Tutorial
Writing and Searching for POJOs in MarkLogic - Part 2
Annotating the POJO Classes
The dataset for this
tutorial consists of top songs extracted from Wikipedia
(http://en.wikipedia.org/wiki/Category:Lists_of_number-one_songs_in_the_United_States). Each song is described by a standalone tree
structure modelled with nested POJOs (similar to JSON but with
strong typing). To enable processing by JAXB, the POJO classes have
two JAXB annotations: one on the root class for the tree structure
and one on the descr property.
JAXB Annotation
@XmlRootElement
public class TopSong {
    ...
    public Artist getArtist() {
        ...
    }
    @XmlAnyElement
    public Element getDescr() {
        ...
    }
}
The descr property contains marked-up text as a target for full-text search. Other key properties include exactly one artist as well as zero or more writers, producers, genres, and weeks.
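To make the cardinalities concrete, here is a minimal sketch of the bean shape in plain Java. It is not the tutorial's source: the class names TopSongSketch and ArtistSketch, the artistName property, and the use of LocalDate for weeks are assumptions for illustration, and the JAXB annotations and the descr property are left out so that the sketch compiles with the JDK alone.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean shape. The tutorial's real TopSong class carries
// @XmlRootElement and an @XmlAnyElement getDescr(); both are omitted here.
class TopSongSketch {

    // Nested so the sketch stays in one top-level class; name is assumed.
    static class ArtistSketch {
        private String artistName; // assumed property name

        public String getArtistName() { return artistName; }
        public void setArtistName(String artistName) { this.artistName = artistName; }
    }

    private String title;
    private ArtistSketch artist;                             // exactly one artist
    private final List<String> writers = new ArrayList<>();  // zero or more writers
    private final List<LocalDate> weeks = new ArrayList<>(); // zero or more weeks (type assumed)

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public ArtistSketch getArtist() { return artist; }
    public void setArtist(ArtistSketch artist) { this.artist = artist; }
    public List<String> getWriters() { return writers; }
    public List<LocalDate> getWeeks() { return weeks; }

    // Builds one song from the property values this tutorial prints for "Respect".
    static TopSongSketch respect() {
        ArtistSketch artist = new ArtistSketch();
        artist.setArtistName("Aretha Franklin");
        TopSongSketch song = new TopSongSketch();
        song.setTitle("Respect");
        song.setArtist(artist);
        song.getWriters().add("Otis Redding");
        song.getWeeks().add(LocalDate.parse("1967-06-03"));
        song.getWeeks().add(LocalDate.parse("1967-06-10"));
        return song;
    }

    public static void main(String[] args) {
        TopSongSketch song = respect();
        System.out.println(song.getTitle() + " | " + song.getArtist().getArtistName()
                + " | " + song.getWriters() + " | " + song.getWeeks());
    }
}
```

Multi-valued properties are modelled as lists that start empty, so "zero" needs no special case.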
Writing POJOs To the Database
The tutorial source provides
the serialized POJOs in XML files. Aside from the
descr property, the POJOs are vanilla Java beans and could be
loaded from a Java object input stream or any other
source.
The POJOWriter example creates a database client and iterates over the serialized POJO files, using JAXB to write the POJOs to the database as separate documents. Each document has a unique URI and contains a root object and its subordinate objects. Here's the source code, condensed to focus on the important parts (as are the subsequent examples).
Document Write
DatabaseClient dbClient = DatabaseClientFactory.newClient(
    "localhost", 8005, "rest-admin", "x", Authentication.DIGEST);
XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();
JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle writeHandle = new JAXBHandle(context);
for (File songfile: inputDir.listFiles()) {
    TopSong song = ... read the serialized POJO from the file ... ;
    writeHandle.set(song);
    docMgr.write("/topsongs/"+songfile.getName(), writeHandle);
}
dbClient.release();
Every application using the API creates a DatabaseClient before interacting with the database and releases the client afterward. Subsequent examples will omit these statements to focus on new ideas.
The example above calls the XMLDocumentManager.write() method to persist each POJO as a document in the database. The JAXBHandle class adapts JAXB for integration into the API. The API uses adapters like JAXBHandle to integrate standard content representations as diverse as binary InputStream, character String, and StAX XMLStreamReader.
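The write loop derives each document URI from the file name. The sketch below isolates that mapping using only java.io.File. The class name SongUris and its methods are hypothetical helpers, not part of the tutorial source or the MarkLogic API. It also guards against File.listFiles() returning null, which happens when the path is not a directory or cannot be read.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps serialized POJO files to document URIs in the
// same "/topsongs/" + file name form the write loop uses.
class SongUris {

    static String uriFor(File songFile) {
        return "/topsongs/" + songFile.getName();
    }

    static List<String> urisFor(File inputDir) {
        List<String> uris = new ArrayList<>();
        File[] files = inputDir.listFiles(); // null if not a directory or on I/O error
        if (files == null) {
            return uris;
        }
        Arrays.sort(files); // listFiles() gives no ordering guarantee
        for (File file : files) {
            if (file.isFile()) { // skip subdirectories
                uris.add(uriFor(file));
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        File inputDir = new File(args.length > 0 ? args[0] : ".");
        for (String uri : urisFor(inputDir)) {
            System.out.println(uri);
        }
    }
}
```

Because the URI is the document's identity in the database, two files with the same name would overwrite each other under this scheme.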
Reading a POJO from the Database
The POJOReader example confirms the previous load by calling the XMLDocumentManager.read() method to get a POJO from the database, again using JAXB.
Document Read
XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();
JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle readHandle = new JAXBHandle(context);
docMgr.read("/topsongs/Aretha-Franklin+Respect.xml", readHandle);
TopSong song = (TopSong) readHandle.get();
... print the properties of the POJO ...
The example prints out the POJO properties, producing the following output:
document: /topsongs/Aretha-Franklin+Respect.xml
title | Respect
artist | Aretha Franklin
writers | Otis Redding
producers | Steve Cropper
genres | Soul
weeks | 1967-06-03 | 1967-06-10
Subsequent examples will
search these properties and the text of the descr
property.
Searching for the Value of a Property
Now we're ready to
investigate the top songs dataset. Looking at the output for
Respect,
we might wonder whether Otis Redding wrote any other hit
songs.
The
KeyValueSearcher example finds all documents where the writer
element contains the exact value Otis
Redding. Such
searches resemble equals predicates in the WHERE clause of an SQL
database but can operate on varied document structures instead of
rigid relational tables.
KeyValue Search
QueryManager queryMgr = dbClient.newQueryManager();
KeyValueQueryDefinition keyValueQry = queryMgr.newKeyValueDefinition();
// Match documents whose writer element has exactly this value
keyValueQry.put(
    queryMgr.newElementLocator(new QName("writer")), "Otis Redding");
SearchHandle searchHandle = queryMgr.search(keyValueQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    System.out.println("document: "+docSum.getUri());
    for (MatchLocation docLoc: docSum.getMatchLocations()) {
        System.out.println(" location: "+docLoc.getPath());
        System.out.println(" matched: "+docLoc.getAllSnippetText());
    }
}
All queries use a QueryManager. (Subsequent examples skip its construction.) The KeyValueQueryDefinition class specifies the query criteria. The call to QueryManager.search() searches the database. SearchHandle parses the results into a Java structure reflecting documents matched by the query and locations matched within each document. You can also get search results in JSON or XML if you prefer.
The example iterates over the matched documents and locations to generate the following output, which answers the question: Otis Redding wrote two top songs.
KeyValue Search Output
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/writers
matched: Otis Redding
document: /topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml
location: /topSong/writers
matched: Otis Redding
For JSON documents, you can search on the value of a key in much the same way.
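To illustrate what an exact-value match on an element means, the sketch below applies the same idea to a few in-memory XML strings using the JDK's DOM parser. It is not how the server evaluates the query (the server uses indexes rather than scanning documents). The class ElementValueMatch and the XML shapes are hypothetical, and plain string equality stands in for the server's value comparison.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical in-memory illustration of an element value match.
class ElementValueMatch {

    // Returns the URIs of documents that contain at least one element named
    // elementName whose text content equals value exactly.
    static List<String> match(Map<String, String> xmlByUri, String elementName, String value) {
        List<String> hits = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (Map.Entry<String, String> entry : xmlByUri.entrySet()) {
                Document doc = builder.parse(new InputSource(new StringReader(entry.getValue())));
                NodeList nodes = doc.getElementsByTagName(elementName);
                for (int i = 0; i < nodes.getLength(); i++) {
                    if (value.equals(nodes.item(i).getTextContent())) {
                        hits.add(entry.getKey());
                        break; // one matching element is enough for this document
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return hits;
    }

    // Sample documents with assumed, simplified structure; they deliberately
    // differ in shape to show the match does not depend on a fixed schema.
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("/topsongs/Aretha-Franklin+Respect.xml",
                "<topSong><title>Respect</title>"
                + "<writers><writer>Otis Redding</writer></writers></topSong>");
        docs.put("/topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml",
                "<topSong><writers><writer>Otis Redding</writer></writers></topSong>");
        docs.put("/sketch/description-only.xml",
                "<topSong><descr><p>A cover of a song by Otis Redding</p></descr></topSong>");
        return docs;
    }

    public static void main(String[] args) {
        for (String uri : match(sampleDocs(), "writer", "Otis Redding")) {
            System.out.println("document: " + uri);
        }
    }
}
```

The third sample document mentions the name only inside descr, so it is not a hit: the match is on the value of the writer element, not on text anywhere in the document.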
Searching for Terms in Text
When investigating a dataset, one question often leads to another. We might wonder whether Aretha Franklin and Otis Redding collaborated on other top songs. We can start with a simple string search.
A string search expresses query criteria, including phrases and Boolean operators, in a syntax similar to what you would type into the Google search box. You can prompt a user for the criteria, but a string search is also convenient for specifying static criteria in an application. Like a search engine, the StringSearcher example matches documents that contain both of the phrases Aretha Franklin and Otis Redding in any location.
String Search
StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("\"Aretha Franklin\" AND \"Otis Redding\"");
SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    ...
}
The example differs from the previous example only in the use of StringQueryDefinition to specify the criteria.
In some cases, a quick phrase search is enough to get the answer. In this case, however, the output shows that the search was too general.
String Search Output
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/artist/artistId
matched: http://en.wikipedia.org/wiki/Aretha_Franklin
location: /topSong/artist
matched: Aretha Franklin
location: /topSong/descr/p[1]
matched: ...Stax recording artist Otis Redding in 1965. "Respect" became...
document: /topsongs/Jailhouse-Rock-Elvis-Presley+You-Send-Me-Summertime-...
location: /topSong/descr/p[4]
matched: ...Aretha Franklin, The Supremes, Otis Redding
The search matched phrases mentioning Aretha Franklin and Otis Redding in the description, which doesn't indicate whether they collaborated on the song.
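The over-matching can be reproduced with a small in-memory sketch. Each document is represented as a map from location to text, using the locations and snippets from the output above, and a document matches when every phrase occurs in at least one location. The class AnywherePhraseMatch is hypothetical, and String.contains() is a stand-in for the server's tokenized, index-based phrase matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of an unconstrained "phrase AND phrase" match.
class AnywherePhraseMatch {

    // True when each phrase occurs in at least one location of the document,
    // regardless of which location that is.
    static boolean matchesAll(Map<String, String> textByLocation, String... phrases) {
        for (String phrase : phrases) {
            boolean found = false;
            for (String text : textByLocation.values()) {
                if (text.contains(phrase)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Locations and snippets taken from the search output for "Respect".
    static Map<String, String> respect() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("/topSong/artist", "Aretha Franklin");
        doc.put("/topSong/descr/p[1]",
                "...Stax recording artist Otis Redding in 1965. \"Respect\" became...");
        return doc;
    }

    // The second matched document: both names appear only in the description.
    static Map<String, String> descriptionOnly() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("/topSong/descr/p[4]", "...Aretha Franklin, The Supremes, Otis Redding");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(matchesAll(respect(), "Aretha Franklin", "Otis Redding"));
        System.out.println(matchesAll(descriptionOnly(), "Aretha Franklin", "Otis Redding"));
    }
}
```

Both documents satisfy the criteria because the match ignores where each phrase occurs.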
Searching for Combinations of Properties
To get a definitive answer for our question, we need to constrain our phrase search to the artist and writer properties. We define constraints with query options. Query options specify the static parts of a query, including not only constraints but also the result page length and so on. You write query options to the database before executing a search. The search then supplies the dynamic parts of the query, including the criteria, the result page number, and so on.
The ConstrainedSearcher example builds the query options as a data structure in Java:
Query Options for Constraints
QueryOptionsManager optMgr =
    dbClient.newServerConfigManager().newQueryOptionsManager();
QueryOptionsBuilder optBldr = new QueryOptionsBuilder();
QueryOptionsHandle optHandle = new QueryOptionsHandle();
optHandle.withConstraints(
    optBldr.constraint("artist",
        optBldr.elementQuery(new QName("artistName"))),
    optBldr.constraint("writer",
        optBldr.elementQuery(new QName("writer"))));
optMgr.writeOptions("constraints", optHandle);
As you might expect, the API
provides a QueryOptionsManager to write, read, and delete query options. To build
options as a Java structure, you use QueryOptionsBuilder
and QueryOptionsHandle. In particular, the call to
QueryOptionsHandle.withConstraints()
specifies constraints on
the artist and writer properties. That makes it possible to restrict search
phrases to these properties (similar to the key-value search shown
earlier). The QueryOptionsManager.writeOptions()
call saves the query options
under the name constraints.
By the way, because query options are typically set up by an experienced developer and used by other developers in applications, writing them requires a higher level of permissions. While we'll show how to build query options in Java, you can also write query options as JSON or XML documents if you prefer.
Now we can use the query
options to constrain the POJO properties where the search matches
the phrases. The ConstrainedSearcher example specifies the constraints query options when constructing the
StringQueryDefinition object and then prefixes the Aretha
Franklin phrase
with the artist constraint and the Otis Redding phrase with the writer
constraint.
Search Constrained by Options
StringQueryDefinition stringQry = queryMgr.newStringDefinition("constraints");
stringQry.setCriteria(
    "artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"");
SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    ...
}
Apart from adding the query options and constraint prefixes, this example is unchanged from the previous version. The result output, however, is much more precise:
Constrained Search Output
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/artist
matched: Aretha Franklin
location: /topSong/writers
matched: Otis Redding
Only one song had this combination of artist and writer, yielding our definitive answer.
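The effect of the constraint prefixes can be sketched in plain Java. This is not the server's query parser: the class ConstrainedCriteriaSketch is hypothetical, the regular expression only recognises the prefix:"phrase" form joined by AND, and substring matching stands in for the server's element queries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of constraint-prefixed criteria.
class ConstrainedCriteriaSketch {

    // Matches terms of the form  name:"phrase"
    private static final Pattern TERM = Pattern.compile("(\\w+):\"([^\"]*)\"");

    // Extracts constraint name -> phrase pairs. Simplification: a repeated
    // constraint name keeps only its last phrase, and operators other than
    // AND are not interpreted.
    static Map<String, String> parse(String criteria) {
        Map<String, String> terms = new LinkedHashMap<>();
        Matcher matcher = TERM.matcher(criteria);
        while (matcher.find()) {
            terms.put(matcher.group(1), matcher.group(2));
        }
        return terms;
    }

    // True when every phrase occurs in a value stored under its own constraint name.
    static boolean matches(Map<String, List<String>> valuesByConstraint, Map<String, String> terms) {
        for (Map.Entry<String, String> term : terms.entrySet()) {
            List<String> values = valuesByConstraint.get(term.getKey());
            boolean found = false;
            if (values != null) {
                for (String value : values) {
                    if (value.contains(term.getValue())) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Artist and writer values from the constrained search output for "Respect".
    static Map<String, List<String>> respect() {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("artist", Collections.singletonList("Aretha Franklin"));
        doc.put("writer", Collections.singletonList("Otis Redding"));
        return doc;
    }

    // A document that mentions both names only in its description text.
    static Map<String, List<String>> descriptionOnly() {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("descr", Arrays.asList("...Aretha Franklin, The Supremes, Otis Redding"));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> terms =
                parse("artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"");
        System.out.println(terms);
        System.out.println(matches(respect(), terms));
        System.out.println(matches(descriptionOnly(), terms));
    }
}
```

Tying each phrase to a named property is what excludes the document that mentions both names only in its description.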