Tutorial

Writing and Searching for POJOs in MarkLogic - Part 4

             

Counting Property Values for a Dataset

The broad understanding of the dimensions of the dataset gained through facet analysis can frame the investigation of specific questions. Knowing the genres for the song dataset suggests that, if we want to investigate the breadth of Quincy Jones's career, we could look at the genres of the songs he has produced. Such questions can be answered quickly from a range index.

First, the ValuesLister example defines a producer constraint (much like the artist and writer constraints in a previous example). The query options also identify the range index supplying the list of values (in this case, the genre values).

Query Options for Values

optHandle.withConstraints( 
        optBldr.constraint("producer",
                optBldr.elementQuery(new QName("producer")) )); 
optHandle.withValues( 
        optBldr.values("genre", 
                optBldr.range( 
                        optBldr.elementRangeIndex( 
                                new QName("genre"), 
                                optBldr.stringRangeType( 
                                        "http://marklogic.com/collation/" ))))); 
optMgr.writeOptions("valuesongs", optHandle);

 

To query for the values, the ValuesLister example constructs a ValuesDefinition with the name of the values list (genre) specified in the query options (valuesongs). The example also constructs a StringQueryDefinition, prefixes Quincy Jones with the producer constraint (as with Aretha Franklin and the artist constraint previously), and initializes the ValuesDefinition with the StringQueryDefinition to constrain the values list to the songs produced by Quincy Jones.  

Values List

 

// Name the values list ("genre") and the query options ("valuesongs") that define it.
ValuesDefinition valdef = queryMgr.newValuesDefinition("genre", "valuesongs");

// Constrain the values list to songs produced by Quincy Jones.
StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("producer:\"Quincy Jones\"");
valdef.setQueryDefinition(stringQry);

// Read the values from the range index and print each count and genre.
ValuesHandle genreHandle = queryMgr.values(valdef, new ValuesHandle());
for (CountedDistinctValue value: genreHandle.getValues()) {
        System.out.println(
                "    "+value.getCount()+" "+value.get("xs:string", String.class));
}

The call to QueryManager.values() reads the index, and ValuesHandle parses the list into a Java structure reflecting the values for the constrained subset. This is similar to calling the search() method with a SearchHandle in previous examples, except that the values are read directly from the index. As elsewhere, you can also get the values list as JSON or XML. The example iterates over the list to get each count and value.
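To make the shape of the result concrete, the following plain-Java sketch mimics what a constrained values list reports: for each distinct genre, the number of matching songs. It is only an in-memory analogy. It does not use the MarkLogic API and does not reflect how the server reads its range index. The Song class, the countGenres method, and the sample data are hypothetical, and the sketch assumes that each song contributes at most one count per genre.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GenreCountSketch {
    // Hypothetical stand-in for a persisted song POJO; not a MarkLogic class.
    static final class Song {
        final String producer;
        final List<String> genres;

        Song(String producer, String... genres) {
            this.producer = producer;
            this.genres = Arrays.asList(genres);
        }
    }

    // Counts how many songs by the given producer carry each genre value.
    // A TreeMap keeps the genres sorted, like an ordered list of values.
    static Map<String, Integer> countGenres(List<Song> songs, String producer) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Song song : songs) {
            if (!producer.equals(song.producer)) {
                continue; // plays the role of the producer constraint
            }
            // De-duplicate so a genre repeated within one song counts once.
            for (String genre : new LinkedHashSet<>(song.genres)) {
                counts.merge(genre, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the song dataset.
        List<Song> songs = Arrays.asList(
                new Song("Producer A", "Jazz"),
                new Song("Producer A", "Hard Rock", "Jazz"),
                new Song("Producer B", "Dance"));
        for (Map.Entry<String, Integer> e : countGenres(songs, "Producer A").entrySet()) {
            System.out.println("    " + e.getValue() + " " + e.getKey());
        }
    }
}
```

With the made-up sample data, Jazz gets a count of 2 and Hard Rock a count of 1, while the song by the other producer is left out, much as the producer constraint narrows the values list in the real query.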

The output shows that Quincy Jones has produced a surprising diversity of hit songs:  

Values Output

Hit songs per genre for producer Quincy Jones:

1 Country Soul
1 Dance
...
1 Glam Metal
2 Hard Rock
1 Jazz
...
1 West Coast Hip Hop

Counting Property Co-Occurrence for a Dataset

A top song is a hit in one or more weeks and can be classified in one or more genres; thus, each top song associates weeks with genres. These associations of weeks and genres (called co-occurrences or, when read from the database, tuples) can demonstrate trends over time for genres. For instance, we can investigate the trend for songs produced by Quincy Jones.

In the query options, the producer constraint remains the same as in the previous example (and so isn't included in the fragment below). The TuplesLister example builds the week-genre tuple list over the week and genre range indexes (instead of a values list for one range index).

Query Options for Tuples

 

optHandle.withTuples(
        optBldr.tuples("week-genre",
                optBldr.tupleSources(
                        optBldr.range(
                                optBldr.elementRangeIndex(
                                        new QName("week"),
                                        optBldr.rangeType("xs:date"))),
                        optBldr.range(
                                optBldr.elementRangeIndex(
                                        new QName("genre"),
                                        optBldr.stringRangeType(
                                                "http://marklogic.com/collation/"))))));
optMgr.writeOptions("tuplesongs", optHandle);

 

To query for the tuples, the TuplesLister example constructs a ValuesDefinition with the name of the tuples list (week-genre) specified in the query options (tuplesongs). The example constrains the query to songs produced by Quincy Jones with the same StringQueryDefinition as the previous example (and so doesn't include those statements in the fragment below).

Tuples List

 

ValuesDefinition valdef =
        queryMgr.newValuesDefinition("week-genre", "tuplesongs");
...
valdef.setQueryDefinition(stringQry);

// Format week values as ISO dates (yyyy-MM-dd) for printing.
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");

// Read the week-genre tuples from the range indexes.
TuplesHandle tuplesHandle = queryMgr.tuples(valdef, new TuplesHandle());
for (Tuple tuple: tuplesHandle.getTuples()) {
        System.out.print("    "+tuple.getCount()+" ");
        // Each tuple holds one value per index; print it according to its type.
        for (TypedDistinctValue value: tuple.getValues()) {
                String type = value.getType();
                if ("xs:date".equals(type)) {
                        System.out.print(dateFormat.format(
                                value.get(Calendar.class).getTime()));
                } else if ("xs:string".equals(type)) {
                        System.out.print(value.get(String.class));
                }
        }
        System.out.println();
}

 

The call to QueryManager.tuples() reads the indexes and TuplesHandle parses the tuples into a Java structure reflecting the values for the constrained subset. The example iterates over the tuples to get each value, formatting the date values for weeks using a Java DateFormat.
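As a rough illustration of what a week-genre tuple represents, the sketch below computes co-occurrence counts in memory: every week of a song is paired with every genre of the same song, and each pair is counted across songs. This is an analogy under stated assumptions rather than the server's algorithm. It does not use the MarkLogic API, and the Song class, the countTuples method, and the sample data are hypothetical.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class WeekGenreTupleSketch {
    // Hypothetical stand-in for a song POJO with several weeks and genres.
    static final class Song {
        final List<LocalDate> weeks;
        final List<String> genres;

        Song(List<LocalDate> weeks, List<String> genres) {
            this.weeks = weeks;
            this.genres = genres;
        }
    }

    // Pairs every week of a song with every genre of the same song and
    // counts how many songs share each pair. Keys have the form
    // "week genre"; ISO dates sort chronologically as text, so the
    // TreeMap iterates in date order, then by genre.
    static Map<String, Integer> countTuples(List<Song> songs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Song song : songs) {
            for (LocalDate week : new TreeSet<>(song.weeks)) {
                for (String genre : new TreeSet<>(song.genres)) {
                    counts.merge(week + " " + genre, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the song dataset.
        List<Song> songs = Arrays.asList(
                new Song(Arrays.asList(LocalDate.parse("2000-01-01"),
                                       LocalDate.parse("2000-01-08")),
                         Arrays.asList("Jazz", "Soul")),
                new Song(Arrays.asList(LocalDate.parse("2000-01-08")),
                         Arrays.asList("Jazz")));
        for (Map.Entry<String, Integer> e : countTuples(songs).entrySet()) {
            System.out.println("    " + e.getValue() + " " + e.getKey());
        }
    }
}
```

In the made-up data, both songs were hits in the week of 2000-01-08 and both are classified as Jazz, so that pair has a count of 2; every other pair occurs in a single song.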

The output satisfies the goal of the investigation by showing that Quincy Jones started by producing Country Soul / R&B songs and transitioned through other genres to Hip Hop.

Tuples Output

Hit song genres by week for producer Quincy Jones:

1 1962-06-02 Country Soul
1 1962-06-02 R&B
...
1 1996-07-13 West Coast Hip Hop
1 1996-07-20 West Coast Hip Hop
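One possible way to read a trend out of a tuple list like this is to reduce it to the first week in which each genre appears and then order the genres by that week. The sketch below does this in plain Java. It is not part of the TuplesLister example. The WeekGenre class and the firstWeekPerGenre method are hypothetical, and the input is limited to the four rows visible in the output above; the elided rows are not filled in.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GenreTimelineSketch {
    // Hypothetical holder for one week-genre pair taken from a tuple list.
    static final class WeekGenre {
        final LocalDate week;
        final String genre;

        WeekGenre(String week, String genre) {
            this.week = LocalDate.parse(week);
            this.genre = genre;
        }
    }

    // Returns each genre mapped to its earliest week, ordered by that
    // week (ties broken alphabetically by genre name).
    static Map<String, LocalDate> firstWeekPerGenre(List<WeekGenre> tuples) {
        Map<String, LocalDate> first = new HashMap<>();
        for (WeekGenre t : tuples) {
            first.merge(t.genre, t.week, (a, b) -> a.isBefore(b) ? a : b);
        }
        List<Map.Entry<String, LocalDate>> entries = new ArrayList<>(first.entrySet());
        entries.sort((a, b) -> {
            int byWeek = a.getValue().compareTo(b.getValue());
            return byWeek != 0 ? byWeek : a.getKey().compareTo(b.getKey());
        });
        Map<String, LocalDate> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, LocalDate> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Only the rows shown in the tuples output; elided rows are omitted.
        List<WeekGenre> tuples = Arrays.asList(
                new WeekGenre("1962-06-02", "Country Soul"),
                new WeekGenre("1962-06-02", "R&B"),
                new WeekGenre("1996-07-13", "West Coast Hip Hop"),
                new WeekGenre("1996-07-20", "West Coast Hip Hop"));
        for (Map.Entry<String, LocalDate> e : firstWeekPerGenre(tuples).entrySet()) {
            System.out.println("    " + e.getValue() + " " + e.getKey());
        }
    }
}
```

For these four rows, the reduced list starts with Country Soul and R&B in 1962 and ends with West Coast Hip Hop first appearing on 1996-07-13, which matches the progression described for the full output.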

Summary and Resources

This tutorial provided a quick overview of how to use the Java API to persist and query POJOs in MarkLogic Server. In particular, you learned how to:

  • Write POJOs to and read POJOs from the database.

  • Search persisted POJOs with key-value, string, or structured criteria.

  • Specify Boolean, fulltext, or element-constrained criteria.

  • Perform facet analysis over POJO properties, including bucketing.

  • Extract POJO property values and tuples from indexes.

The MarkLogic Java API works with other kinds of content besides POJOs. You can perform CRUD operations on binary (including PDF and video), JSON, XML, and text documents with collection, permission, and property metadata. You can use multi-statement transactions and optimistic locking control for CRUD operations. In search, you can take advantage of geospatial search and faceting, aggregate functions over indexes (including user-defined aggregates), and flexible snippeting and element extraction for search results. Finally, you can extend the API with server-side transforms and new resource services.

MarkLogic Server has too many capabilities to explore in one tutorial, including Hadoop integration, reverse queries and alerting, server-side content processing pipelines (for conversion, enrichment, or metadata extraction), flexible replication, and ingestion and monitoring tools. Major corporations and government agencies have used MarkLogic Server in mission-critical solutions for years. Whether evaluating NoSQL platforms for an enterprise solution or rolling out a fast implementation on the Express license for a great idea all your own, you can learn more at http://developer.marklogic.com.

Resources

Express downloads - http://developer.marklogic.com/express

Comprehensive tutorial for the Java API - http://developer.marklogic.com/learn/java

Documentation - http://developer.marklogic.com/docs

For future updates to this tutorial, please see http://developer.marklogic.com/learn/java-pojos

Terms

Constraint: a name and specification for how to use an index to qualify documents.

Facet: an enumerative or quantitative property useful for selecting or grouping objects or documents in search.

Key-value search: query criteria expressed as the value of a property.

QName: a qualified name, which may be associated with a namespace for uniqueness and with a short prefix for the namespace URI for convenience.

Query options: the invariant parts of a query including the constraints that specify the names and uses of indexes, the result page size, and so on.

Range index: an index that supports facet queries (either finding documents based on value or values based on documents).

String search: query criteria expressed as a simple Google-like expression with Booleans, constraints, and so on.

Structured search: query criteria expressed as a data structure with Booleans, constraints, and so on.

Values: some or all of the entries in an index.

Tuple: a record based on the co-occurrence of values in documents.


Erik Hennum and Charles Greer
