Tutorial

Writing and Searching for POJOs in MarkLogic - Part 2

                         

Annotating the POJO Classes

The dataset for this tutorial consists of top songs extracted from Wikipedia (http://en.wikipedia.org/wiki/Category:Lists_of_number-one_songs_in_the_United_States). Each song is described by a standalone tree structure modelled with nested POJOs (similar to JSON but with strong typing). To enable processing by JAXB, the POJO classes have two JAXB annotations: one on the root class for the tree structure and one on the descr property.

JAXB Annotation

@XmlRootElement
public class TopSong {
        ...
        public Artist getArtist() {
                ...
        }
        @XmlAnyElement
        public Element getDescr() {
                ...
        }
}

 

The descr property contains marked-up text as a target for fulltext search. Other key properties include exactly one artist as well as zero or many writers, producers, genres, and weeks.
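To picture the shape described above, here is a minimal sketch of a nested POJO whose descr property holds marked-up text as a DOM element. The class names SongModelSketch, Song, and Artist, the helper parseMarkup, and the placeholder markup are assumptions made for illustration; the tutorial's real classes are elided in the listing, and the JAXB annotations are left out so the sketch depends only on the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical, simplified stand-ins for the tutorial's elided POJO classes.
class SongModelSketch {

    static class Artist {
        String artistId;    // e.g. a Wikipedia URL
        String artistName;
    }

    static class Song {
        String title;
        Artist artist;                              // exactly one
        List<String> writers = new ArrayList<>();   // zero or many
        Element descr;                              // marked-up text as a DOM element
    }

    // Parses a markup string into a DOM element suitable for a descr-style property.
    static Element parseMarkup(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, e);
        }
    }

    // Builds one song tree using values that appear in the tutorial output.
    static Song respect() {
        Artist artist = new Artist();
        artist.artistId = "http://en.wikipedia.org/wiki/Aretha_Franklin";
        artist.artistName = "Aretha Franklin";

        Song song = new Song();
        song.title = "Respect";
        song.artist = artist;
        song.writers.add("Otis Redding");
        // Placeholder markup, not the real description text.
        song.descr = parseMarkup("<descr><p>Written by <b>Otis Redding</b>.</p></descr>");
        return song;
    }

    public static void main(String[] args) {
        Song song = respect();
        System.out.println(song.title + " | " + song.artist.artistName);
        System.out.println(song.descr.getTextContent());
    }
}
```

Keeping descr as an Element rather than a String preserves the inline markup, so a full-text search can report locations such as a specific paragraph.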

Writing POJOs To the Database

The tutorial source provides the serialized POJOs in XML files. Aside from the descr property, the POJOs are vanilla Java beans and could be loaded from a Java object input stream or any other source.
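As a rough illustration of the alternative mentioned above, the following sketch round-trips a plain bean through Java object streams. SongBean is a hypothetical stand-in for a tutorial POJO without the descr property, since a DOM Element is not guaranteed to be serializable; the helper names toBytes and fromBytes are likewise assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BeanStreamSketch {

    // Hypothetical bean; the real tutorial classes have more properties.
    static class SongBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private String title;
        private List<String> writers = new ArrayList<>();

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public List<String> getWriters() { return writers; }
        public void setWriters(List<String> writers) { this.writers = writers; }
    }

    // Serializes the bean with a Java object output stream.
    static byte[] toBytes(SongBean bean) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(bean);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the bean back from a Java object input stream.
    static SongBean fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SongBean) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SongBean song = new SongBean();
        song.setTitle("Respect");
        song.getWriters().add("Otis Redding");
        SongBean copy = fromBytes(toBytes(song));
        System.out.println(copy.getTitle() + " | " + copy.getWriters());
    }
}
```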

The POJOWriter example creates a database client and iterates over the serialized POJO files, using JAXB to write the POJOs to the database as separate documents. Each document has a unique URI and contains a root object and its subordinate objects. Here's the source code condensed to focus on the important parts (which will also be true of subsequent examples).

Document Write

DatabaseClient dbClient = DatabaseClientFactory.newClient(
        "localhost", 8005, "rest-admin", "x", Authentication.DIGEST);

XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();

JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle writeHandle = new JAXBHandle(context);
for (File songfile: inputDir.listFiles()) {
        TopSong song = ... read the serialized POJO from the file ... ;
        writeHandle.set(song);
        docMgr.write("/topsongs/"+songfile.getName(), writeHandle);
}

dbClient.release();

 

Every application using the API creates a DatabaseClient before interacting with the database and releases the client afterward. Subsequent examples will omit these statements to focus on new ideas.

The example above calls the XMLDocumentManager.write() method to persist each POJO as a document in the database. The JAXBHandle class adapts JAXB for integration into the API. The API uses adapters like JAXBHandle to integrate standard content representations as diverse as binary InputStream, character String, and StAX XMLStreamReader.
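The URI scheme used in the write loop above can be looked at in isolation. The sketch below derives a document URI from each file name, as the loop does, and adds a duplicate check that is an assumption of this sketch rather than part of the original example. The class name DocUriSketch and the input directory name are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DocUriSketch {

    // Same scheme as the write loop: a fixed prefix plus the file name.
    static String docUri(File songFile) {
        return "/topsongs/" + songFile.getName();
    }

    // Collects the URIs and fails fast if two files would map to the same URI.
    static Set<String> docUris(List<File> files) {
        Set<String> uris = new LinkedHashSet<>();
        for (File file : files) {
            String uri = docUri(file);
            if (!uris.add(uri)) {
                throw new IllegalArgumentException("duplicate document URI: " + uri);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // "input" is a hypothetical directory name.
        List<File> files = Arrays.asList(
                new File("input/Aretha-Franklin+Respect.xml"),
                new File("input/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml"));
        for (String uri : docUris(files)) {
            System.out.println(uri);
        }
    }
}
```

Because the files sit in a single directory, their names are already distinct, which is what makes the resulting URIs unique.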

Reading a POJO from the Database

The POJOReader example confirms the previous load by calling the XMLDocumentManager.read() method to get a POJO from the database, again using JAXB. 

Document Read

XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();

JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle readHandle = new JAXBHandle(context);
docMgr.read("/topsongs/Aretha-Franklin+Respect.xml", readHandle);

TopSong song = (TopSong) readHandle.get();
... print the properties of the POJO ...

 

The example prints out the POJO properties, producing the following output:

document: /topsongs/Aretha-Franklin+Respect.xml
title | Respect
artist | Aretha Franklin
writers | Otis Redding
producers | Steve Cropper
genres | Soul
weeks | 1967-06-03 | 1967-06-10

Subsequent examples will search these properties and the text of the descr property.

Searching for the Value of a Property

Now we're ready to investigate the top songs dataset. Looking at the output for Respect, we might wonder whether Otis Redding wrote any other hit songs.

The KeyValueSearcher example finds all documents where the writer element contains the exact value Otis Redding. Such searches resemble equality predicates in the WHERE clause of an SQL query but can operate on varied document structures instead of rigid relational tables.

KeyValue Search

QueryManager queryMgr = dbClient.newQueryManager();

KeyValueQueryDefinition keyValueQry = queryMgr.newKeyValueDefinition();
keyValueQry.put(
        queryMgr.newElementLocator(new QName("writer")), "Otis Redding");

SearchHandle searchHandle = queryMgr.search(keyValueQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        System.out.println("document: "+docSum.getUri());
        for (MatchLocation docLoc: docSum.getMatchLocations()) {
                System.out.println("    location: "+docLoc.getPath());
                System.out.println("    matched:  "+docLoc.getAllSnippetText());
        }
}

 

All queries use a QueryManager. (Subsequent examples skip its construction.) The KeyValueQueryDefinition class specifies the query criteria. The call to QueryManager.search() searches the database. SearchHandle parses the results into a Java structure reflecting documents matched by the query and locations matched within each document. You can also get search results in JSON or XML if you prefer.

The example iterates over the matched documents and locations to generate the following output, which answers the question. Otis Redding wrote two top songs.  

KeyValue Search Output

 

document: /topsongs/Aretha-Franklin+Respect.xml 
    location: /topSong/writers 
    matched:  Otis Redding 
document: /topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml 
    location: /topSong/writers 
    matched:  Otis Redding 

 

 For JSON documents, you can search on the value of a key in much the same way.
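To make the exact-value semantics of the key-value search concrete, here is a plain in-memory analogy. It does not use the MarkLogic API; the class KeyValueAnalogy and its methods are hypothetical, the two matching URIs come from the output above, and the third record is invented purely to show a non-match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyValueAnalogy {

    // Returns the URIs of documents whose writer list contains exactly the given value.
    static List<String> urisWithWriter(Map<String, List<String>> writersByUri, String writer) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : writersByUri.entrySet()) {
            if (entry.getValue().contains(writer)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    static Map<String, List<String>> sampleData() {
        Map<String, List<String>> writersByUri = new LinkedHashMap<>();
        // Only the writer values shown in the tutorial output are listed here.
        writersByUri.put("/topsongs/Aretha-Franklin+Respect.xml",
                Arrays.asList("Otis Redding"));
        writersByUri.put("/topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml",
                Arrays.asList("Otis Redding"));
        // Hypothetical record, included only to demonstrate a non-match.
        writersByUri.put("/topsongs/hypothetical-example.xml",
                Arrays.asList("Another Writer"));
        return writersByUri;
    }

    public static void main(String[] args) {
        for (String uri : urisWithWriter(sampleData(), "Otis Redding")) {
            System.out.println("document: " + uri);
        }
    }
}
```

A partial value such as Otis finds nothing in this sketch, which mirrors the exact-value behavior described for the key-value search.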

Searching for Terms in Text

When investigating a dataset, one question often leads to another. We might wonder whether Aretha Franklin and Otis Redding collaborated on other top songs. We can start with a simple string search.

A string search expresses query criteria including phrases and Booleans similar to the Google search box. You can prompt a user for the criteria, but it's also convenient for specifying static criteria in an application. Like a search engine, the StringSearcher example matches documents that contain both of the phrases Aretha Franklin and Otis Redding in any location.

String Search

StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("\"Aretha Franklin\" AND \"Otis Redding\"");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ...
}

 

The example differs from the previous example only in the use of StringQueryDefinition to specify the criteria.

In some cases, a quick phrase search is enough to get the answer. In this case, however, the output shows that the search was too general.

String Search Output

document: /topsongs/Aretha-Franklin+Respect.xml
    location: /topSong/artist/artistId
    matched:  http://en.wikipedia.org/wiki/Aretha_Franklin
    location: /topSong/artist
    matched:  Aretha Franklin
    location: /topSong/descr/p[1]
    matched:  ...Stax recording artist Otis Redding in 1965. "Respect" became...
document: /topsongs/Jailhouse-Rock-Elvis-Presley+You-Send-Me-Summertime-...
    location: /topSong/descr/p[4]
    matched:  ...Aretha Franklin, The Supremes, Otis Redding

The search matched phrases mentioning Aretha Franklin and Otis Redding in the description, which doesn't indicate whether they collaborated on the song.

Searching for Combinations of Properties

To get a definitive answer to our question, we need to constrain our phrase search to the artist and writer properties. We define constraints with query options. Query options specify the static parts of a query, including not only constraints but also the result page length and so on. You write query options to the database before executing a search; the search itself supplies the dynamic parts of the query, including the criteria, the result page number, and so on.

The ConstrainedSearcher example builds the query options as a data structure in Java:

Query Options for Constraints

 

QueryOptionsManager optMgr =
        dbClient.newServerConfigManager().newQueryOptionsManager();
QueryOptionsBuilder optBldr = new QueryOptionsBuilder();

QueryOptionsHandle optHandle = new QueryOptionsHandle();
optHandle.withConstraints(
        optBldr.constraint("artist",
                optBldr.elementQuery(new QName("artistName"))),
        optBldr.constraint("writer",
                optBldr.elementQuery(new QName("writer"))));

optMgr.writeOptions("constraints", optHandle);

 

As you might expect, the API provides a QueryOptionsManager to write, read, and delete query options. To build options as a Java structure, you use QueryOptionsBuilder and QueryOptionsHandle. In particular, the call to QueryOptionsHandle.withConstraints() specifies constraints on the artist and writer properties. That makes it possible to restrict search phrases to these properties (similar to the key-value search shown earlier). The QueryOptionsManager.writeOptions() call saves the query options under the name constraints.

By the way, because query options are typically set up by an experienced developer and used by other developers in applications, writing them requires a higher level of permissions. While we'll show how to build query options in Java, you can also write query options as JSON or XML documents if you prefer.

Now we can use the query options to constrain the POJO properties where the search matches the phrases. The ConstrainedSearcher example specifies the constraints query options when constructing the StringQueryDefinition object and then prefixes the Aretha Franklin phrase with the artist constraint and the Otis Redding phrase with the writer constraint. 
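The way a constraint prefix pairs with a quoted phrase can be illustrated with a small parser. This is only a sketch of the prefix:"phrase" shape; the server's actual search grammar is richer and is not reproduced here. The class CriteriaSketch, the method constrainedPhrases, and the regular expression are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CriteriaSketch {

    // Matches a word, a colon, and a double-quoted phrase, e.g. artist:"Aretha Franklin".
    private static final Pattern CONSTRAINED_PHRASE =
            Pattern.compile("(\\w+):\"([^\"]*)\"");

    // Extracts constraint-name -> phrase pairs in the order they appear.
    static Map<String, String> constrainedPhrases(String criteria) {
        Map<String, String> phrases = new LinkedHashMap<>();
        Matcher matcher = CONSTRAINED_PHRASE.matcher(criteria);
        while (matcher.find()) {
            phrases.put(matcher.group(1), matcher.group(2));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String criteria = "artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"";
        System.out.println(constrainedPhrases(criteria));
    }
}
```

Each extracted name corresponds to a constraint defined in the query options, which is why the options must be written to the database before the search runs.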

Search Constrained by Options

StringQueryDefinition stringQry = queryMgr.newStringDefinition("constraints");
stringQry.setCriteria(
        "artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ...
}

 

Apart from adding the query options and constraint prefixes, this example is unchanged from the previous version. The result output, however, is much more precise:

Constrained Search Output

document: /topsongs/Aretha-Franklin+Respect.xml
    location: /topSong/artist
    matched:  Aretha Franklin
    location: /topSong/writers
    matched:  Otis Redding

Only one song had this combination of artist and writer, yielding our definitive answer.
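The difference between the unconstrained and constrained searches can be mimicked in memory. The sketch below is an analogy, not the MarkLogic API: each document is reduced to a map from property name to text, using only the values and snippets shown in the outputs above (the second URI is truncated in the source and is kept that way). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConstraintAnalogy {

    // Unconstrained: every phrase must occur somewhere, in any property.
    static boolean matchesAnywhere(Map<String, String> doc, String... phrases) {
        for (String phrase : phrases) {
            boolean found = false;
            for (String text : doc.values()) {
                if (text.contains(phrase)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Constrained: each phrase must occur in its named property.
    static boolean matchesConstrained(Map<String, String> doc, Map<String, String> phraseByProperty) {
        for (Map.Entry<String, String> entry : phraseByProperty.entrySet()) {
            String text = doc.get(entry.getKey());
            if (text == null || !text.contains(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    // /topsongs/Aretha-Franklin+Respect.xml, reduced to the matched snippets.
    static Map<String, String> respect() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("artist", "Aretha Franklin");
        doc.put("writer", "Otis Redding");
        doc.put("descr", "...Stax recording artist Otis Redding in 1965. \"Respect\" became...");
        return doc;
    }

    // /topsongs/Jailhouse-Rock-Elvis-Presley+You-Send-Me-Summertime-... (only descr is known).
    static Map<String, String> otherSong() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("descr", "...Aretha Franklin, The Supremes, Otis Redding");
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> constraints = new LinkedHashMap<>();
        constraints.put("artist", "Aretha Franklin");
        constraints.put("writer", "Otis Redding");

        System.out.println(matchesAnywhere(respect(), "Aretha Franklin", "Otis Redding"));
        System.out.println(matchesAnywhere(otherSong(), "Aretha Franklin", "Otis Redding"));
        System.out.println(matchesConstrained(respect(), constraints));
        System.out.println(matchesConstrained(otherSong(), constraints));
    }
}
```

In this sketch both documents pass the unconstrained test, but only the Respect document passes the constrained one, which matches the two result listings.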

                     


Erik Hennum and Charles Greer
