Tutorial

Writing and Searching for POJOs in MarkLogic - Part 3

                  

Modifying Criteria Dynamically with Structured Search

From time to time, you might need to modify or inspect criteria programmatically. Examples include providing a GUI editor for search criteria, adding hidden criteria, checking for invalid or unauthorized criteria, or generating criteria to reflect the current state of an external resource.

As with query options, you use a builder to create a Java structure. The StructuredSearcher example builds a structured search for the same constrained criteria that the previous example expressed as a string.  

Structured Search

// Build the query against the "constraints" query options.
StructuredQueryBuilder structureBldr =
        queryMgr.newStructuredQueryBuilder("constraints");
// Require both the artist and the writer constraint to match.
StructuredQueryDefinition structuredQry =
        structureBldr.and(
                structureBldr.elementConstraint("artist",
                        structureBldr.term("Aretha Franklin")),
                structureBldr.elementConstraint("writer",
                        structureBldr.term("Otis Redding")));

// Run the search and iterate over the matching documents.
SearchHandle searchHandle = queryMgr.search(structuredQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ...
}

The example uses StructuredQueryBuilder to create a StructuredQueryDefinition specifying the criteria for the artist and writer constraints defined by the constraints query options. Aside from using StructuredQueryDefinition instead of StringQueryDefinition, this example is the same as the previous example, qualifies the same documents, and produces the same output. A Java program, however, could easily change one of the terms or add new complex Boolean conditions without string parsing.
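To illustrate why a structured representation is easier to inspect and modify than a string, the following sketch models criteria as a small tree in plain Java. The Criterion, Term, Constraint, and And types are hypothetical stand-ins invented for this illustration and are not part of the MarkLogic Java API; the "genre" and "Soul" values in main are likewise made up. The sketch only shows that listing the constraints in use, or adding a hidden criterion, becomes a matter of walking or wrapping the tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration only; NOT the MarkLogic API.
final class CriteriaSketch {
    interface Criterion {
        // Appends the names of all constraints found in this subtree.
        void collectConstraintNames(List<String> names);
    }

    static final class Term implements Criterion {
        final String text;
        Term(String text) { this.text = text; }
        public void collectConstraintNames(List<String> names) {
            // A bare term names no constraint.
        }
    }

    static final class Constraint implements Criterion {
        final String name;
        final Criterion inner;
        Constraint(String name, Criterion inner) {
            this.name = name;
            this.inner = inner;
        }
        public void collectConstraintNames(List<String> names) {
            names.add(name);
            inner.collectConstraintNames(names);
        }
    }

    static final class And implements Criterion {
        final List<Criterion> parts;
        And(Criterion... parts) { this.parts = Arrays.asList(parts); }
        public void collectConstraintNames(List<String> names) {
            for (Criterion part : parts) {
                part.collectConstraintNames(names);
            }
        }
    }

    // Inspect: list every constraint used, e.g. to reject unauthorized ones.
    static List<String> constraintNames(Criterion criteria) {
        List<String> names = new ArrayList<>();
        criteria.collectConstraintNames(names);
        return names;
    }

    // Modify: combine the user's criteria with a hidden criterion.
    static Criterion withHidden(Criterion userCriteria, Criterion hidden) {
        return new And(userCriteria, hidden);
    }

    public static void main(String[] args) {
        Criterion user = new And(
                new Constraint("artist", new Term("Aretha Franklin")),
                new Constraint("writer", new Term("Otis Redding")));
        System.out.println(constraintNames(user));
        // "genre"/"Soul" is an invented hidden criterion for this demo.
        Criterion restricted = withHidden(user, new Constraint("genre", new Term("Soul")));
        System.out.println(constraintNames(restricted));
    }
}
```

With a string query, the same inspection would require parsing the query text first; here no parsing is involved.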

If you prefer, you can also write a structured query as a JSON or XML document. While the rest of the tutorial will stick with string queries for consistency, in each case, the search criteria could have been specified with a structured query.

Analyzing a Dataset with Faceted Search

So far, the examples have answered specific questions. To help frame questions, it's also useful to get a broad overview of the dataset. Facet analysis meets that requirement by performing counts or other aggregates on the entire dataset or a subset of interest. The next example supports facet analysis by genre or over time.
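Conceptually, a facet over a field such as genre is a count of items per distinct value, often ordered by descending frequency. The sketch below computes that kind of summary over an in-memory list in plain Java. It illustrates the idea only and makes no claim about how MarkLogic computes facets from range indexes; the FacetSketch class and the sample genre values are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: an in-memory facet count, not the MarkLogic implementation.
final class FacetSketch {
    // Counts occurrences of each value and returns them most frequent first.
    // Ties keep first-seen order because List.sort is stable.
    static Map<String, Integer> facetCounts(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.put(entry.getKey(), entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Invented sample data.
        List<String> genres = Arrays.asList("Pop", "R&B", "Pop", "Country", "R&B", "Pop");
        System.out.println("facet: genre");
        facetCounts(genres).forEach(
                (label, count) -> System.out.println("    " + label + " = " + count));
    }
}
```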

When you imported the package at the start of this tutorial, the import action configured the top songs database. The configuration created range indexes on the genre and week elements. A range index provides a basis for calculating facets. Now, we're ready to take advantage of those genre and week range indexes.

As with the artist and writer indexes in a previous example, the FacettedSearcher example creates constraints for the genre and week indexes in query options. The constraints identify the range indexes and their datatypes. The example sorts the genres in descending order by number of songs in the genre.

Query Options for Facets

 

optHandle.withConstraints(
        optBldr.constraint("genre",
                optBldr.range(
                        optBldr.elementRangeIndex(
                                new QName("genre"),
                                optBldr.stringRangeType(
                                        "http://marklogic.com/collation/")),
                        Facets.FACETED,
                        FragmentScope.DOCUMENTS,
                        null,
                        "frequency-order", "descending")),
        optBldr.constraint("week",
                optBldr.range(
                        optBldr.elementRangeIndex(
                                new QName("week"),
                                optBldr.rangeType("xs:date")))));
optHandle.setReturnResults(false);
optMgr.writeOptions("facetsongs", optHandle);

 

The source code fragment skips over the construction of the QueryOptionsBuilder and the QueryOptionsHandle, which remains the same as in the earlier example. The call to QueryOptionsHandle.setReturnResults(false) tells searches to return just the facet analysis and not a page of search results.

The facetsongs query options have done the heavy lifting of defining the facets. The FacettedSearcher example specifies the facetsongs query options when constructing the string definition. The example performs the facet analysis on the subset of the songs that contain the Grammy term anywhere in the document. A search could use complex Booleans for a smaller subset or no criteria for the entire dataset.

Facet Search

 

StringQueryDefinition stringQry = queryMgr.newStringDefinition("facetsongs");
stringQry.setCriteria("Grammy");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (FacetResult facet: searchHandle.getFacetResults()) {
        System.out.println("facet: "+facet.getName());
        for (FacetValue value: facet.getFacetValues()) {
                System.out.println("    "+value.getLabel()+" = "+value.getCount());
        }
}

As with search results, SearchHandle parses the list of facets into a Java structure with the values and their aggregate counts. You can also read facets as JSON or XML.

The example output analyzes the genres and weeks for all songs having the Grammy term.

Facet Output

facet: genre
    Pop = 79
    R&B = 71
    ...
    Rhythm And Blues = 2
    ...
facet: week
    1940-07-27 = 1
    1940-08-03 = 1
    1940-08-10 = 1
    ...

The output shows that consolidating genre values like R&B and Rhythm And Blues would improve the quality of the dataset. That's fine and to be expected from real-world Big Data. Cleaning up those blemishes won't change the big picture, so we can get value from our dataset immediately. If later applications could benefit from fixing these flaws, the facet analysis has shown us what to fix. We can refine the dataset in place without getting in the way of existing applications. Such flexible, progressive refinement differs from traditional databases, where changes to data structures and associations have a disruptive impact on applications.

Summarizing a Dataset with Limits and Buckets

For some purposes, facet analysis provides too much detail. To get a fast summary of a dataset, you might want to aggregate ranges of values and eliminate outliers.

Query options can limit the number of facet values. When facet values are ordered by descending frequency, the effect is to return the top values. Query options can also define buckets for grouping facet values. The BuckettedSearcher example refines the previous query options to add a limit and buckets:

Query Options for Limits & Buckets

 

optHandle.withConstraints(
        optBldr.constraint("genre",
                optBldr.range(
                        optBldr.elementRangeIndex(
                                new QName("genre"),
                                optBldr.stringRangeType(
                                        "http://marklogic.com/collation/")),
                        Facets.FACETED,
                        FragmentScope.DOCUMENTS,
                        null,
                        "frequency-order", "descending", "limit=10")),
        optBldr.constraint("week",
                optBldr.range(
                        optBldr.elementRangeIndex(
                                new QName("week"),
                                optBldr.rangeType("xs:date")),
                        Facets.FACETED,
                        FragmentScope.DOCUMENTS,
                        optBldr.buckets(
                                optBldr.bucket("1940s", "40s", "1940-01-01", "1950-01-01"),
                                optBldr.bucket("1950s", "50s", "1950-01-01", "1960-01-01"),
                                ...,
                                optBldr.bucket("2000s", "00s", "2000-01-01", "2010-01-01")
                        ))));

 

Other than referring to the revised query options, the BuckettedSearcher example has exactly the same search code as the previous example. Because of the query options changes, however, the example produces only the top genres and groups songs by decade instead of by week.

Facet Output with Limits & Buckets

facet: genre
    Pop = 79
    R&B = 71
    ...
    Country = 8
facet: week
    40s = 4
    50s = 11
    ...
    00s = 67
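The effect of buckets and limits can be sketched in plain Java over in-memory values. The BucketSketch class below is invented for illustration and does not use the MarkLogic API. It assumes each bucket is a half-open range (lower bound inclusive, upper bound exclusive), which matches the adjoining bounds in the bucket definitions above, and it assumes a limit simply keeps the first entries of an already ordered facet. The sample dates and counts are made up.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: in-memory bucketing and limiting, not the MarkLogic implementation.
final class BucketSketch {
    // A labeled date range, assumed half-open: lower <= date < upper.
    static final class Bucket {
        final String label;
        final LocalDate lower;
        final LocalDate upper;
        Bucket(String label, String lower, String upper) {
            this.label = label;
            this.lower = LocalDate.parse(lower);
            this.upper = LocalDate.parse(upper);
        }
        boolean contains(LocalDate date) {
            return !date.isBefore(lower) && date.isBefore(upper);
        }
    }

    // Counts dates per bucket, in bucket order; dates outside every bucket are ignored.
    static Map<String, Integer> bucketCounts(List<Bucket> buckets, List<String> isoDates) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Bucket bucket : buckets) {
            counts.put(bucket.label, 0);
        }
        for (String iso : isoDates) {
            LocalDate date = LocalDate.parse(iso);
            for (Bucket bucket : buckets) {
                if (bucket.contains(date)) {
                    counts.merge(bucket.label, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    // Keeps only the first `limit` entries of an already ordered facet.
    static <K, V> Map<K, V> limit(Map<K, V> ordered, int limit) {
        Map<K, V> top = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : ordered.entrySet()) {
            if (top.size() >= limit) {
                break;
            }
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Bucket> decades = Arrays.asList(
                new Bucket("40s", "1940-01-01", "1950-01-01"),
                new Bucket("50s", "1950-01-01", "1960-01-01"));
        // Invented sample dates.
        List<String> weeks = Arrays.asList("1940-07-27", "1940-08-03", "1950-01-01", "1959-12-26");
        System.out.println(bucketCounts(decades, weeks));

        // Invented counts, already in descending order.
        Map<String, Integer> genres = new LinkedHashMap<>();
        genres.put("Pop", 3);
        genres.put("R&B", 2);
        genres.put("Country", 1);
        System.out.println(limit(genres, 2));
    }
}
```

Because the ranges are half-open, a date such as 1950-01-01 falls into the 50s bucket only, so adjoining buckets never count the same value twice.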

                


Erik Hennum and Charles Greer
