Writing and Searching for POJOs in MarkLogic



With traditional relational databases, persisting your
in-memory data structures requires complex ORM (Object-Relational
Mapping) tools to handle the well-known impedance mismatch.
Next-generation NoSQL databases that support variety in stored
information can provide a simpler solution. In this tutorial, find
out how to store and search POJOs in a MarkLogic database without
giving up consistency, reliability, or scale.

A Quick Introduction to
MarkLogic Server

MarkLogic Server is an Enterprise
NoSQL database, supporting a schema-optional document data model,
ACID transactions, security, and real-time search indexing.
Supported document formats include XML, JSON, text, and even binary
(such as video or PDF). Features include:

  • Speed – with a C++ implementation
    optimized for today’s IO systems.

  • Scalability – with a shared-nothing distributed
    architecture.

  • High availability – with replication and
    disaster recovery.

The latest version adds the
MarkLogic Java API to make it easy to take advantage of the server
in your Java applications. For this tutorial, you’ll download the
free version of MarkLogic Server. We’ll work through some typical
data discovery scenarios with a music dataset, executing queries
both to answer specific questions and to get a better overall
understanding of the dataset. To make things simple, we’ll work
with data in a POJO representation. The setup steps consist of
installing MarkLogic Server, downloading the tutorial, and running
a bootstrapping utility that defines a couple of users and creates
the database and REST server.

Installing and Starting
MarkLogic Server

Download and install the latest version of MarkLogic.
Once you’ve installed and started MarkLogic, go to the
browser-based administrative interface (at http://localhost:8001/), which will walk you through
getting an Express license and creating an admin user. (This
tutorial assumes you’ll be running MarkLogic on your local machine;
if that’s not the case, just substitute your server name whenever
you see “localhost” in this tutorial.)

For more detailed instructions on installing and running
MarkLogic, see Installing MarkLogic Server.

Downloading the Tutorial

After starting the server,
download the tutorial source code from http://developer.marklogic.com/media/pojo-tutorial-01.zip.
Unzip the distribution. You’ll find a standard Maven source
structure that you can use, for instance, in m2e. You can, of
course, work with the sources and classes without Maven if you
prefer: look for the sources under the src/main/java directory and for the
runtime environment under the target/classes directory.

In the following sections, we’ll
only show the highlights from the source code and output. To get
the most out of this tutorial, you should view the complete
examples in your IDE or editor and run the examples to see the
complete output.

To run the tutorial examples,
you’ll need to set up a Java 6 runtime environment (preferably the
latest stable distribution). You configure your CLASSPATH in the
usual way:

  • From the command-line, specify the root
    directory for the tutorial classes and the jars for the Java API
    and its lib dependencies on your CLASSPATH.

  • In an IDE such as Eclipse, create a
    project with the tutorial classes in the source directory. Either
    add the jars for the Java API and its lib dependencies to your
    build path or use the Maven POM in the tutorial distribution to
    download these dependencies to your Maven repository.

Setting up the Tutorial’s
Server Environment

This tutorial focuses on application programming rather than
MarkLogic server administration. Therefore, this tutorial provides a
utility to set up the server environment in one step. Before you
start, find and check the values in tutorial.properties.
The default values should be correct for your setup; simply ensure
that the values for tutorial.bootstrap_user
and tutorial.bootstrap_password
match the administrative credentials for the MarkLogic server. Be
wary of modifying the other values shipped with tutorial.properties.
To bootstrap the REST server’s environment, run the following
command at the command line:

Bootstrapping the Tutorial’s server-side environment


java -cp CLASSPATH com.marklogic.client.tutorial.util.CreateDatabaseServer


Alternatively, use an IDE to execute this class’s main method.
When it’s done, this command will have completed the following:

  • Created two users for running your application.

    • The rest-admin user is permitted to configure
      the application. The bootstrapper sets up this user with the
      password “x”.

    • The rest-writer user is allowed to write and
      update documents, as well as execute searches and retrieve
      documents. This user is created with the same password.

  • Created a new database called “TopSongs” for the application
    data, and “TopSongs-modules” to hold extension code.

  • Added two range indexes to the “TopSongs” database to support
    some of the searches below.
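
Before running the bootstrapper, the credentials check described earlier can be sketched in plain Java. This is a minimal illustration rather than part of the tutorial distribution: the class name BootstrapPropertiesSketch and the placeholder values are assumptions, and only the two property keys come from tutorial.properties as described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class BootstrapPropertiesSketch {

    // Parses properties text; a real check would read tutorial.properties from disk.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when both bootstrap credentials are present and non-blank.
    static boolean hasBootstrapCredentials(Properties props) {
        String user = props.getProperty("tutorial.bootstrap_user", "").trim();
        String password = props.getProperty("tutorial.bootstrap_password", "").trim();
        return !user.isEmpty() && !password.isEmpty();
    }

    public static void main(String[] args) {
        // Placeholder values for illustration, not real credentials.
        Properties props = parse(
                "tutorial.bootstrap_user=admin\n"
                + "tutorial.bootstrap_password=changeme\n");
        System.out.println("bootstrap credentials set: " + hasBootstrapCredentials(props));
    }
}
```

A check like this only confirms that the values are present; whether they match the server’s administrative account is still verified by the bootstrapper itself.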

Later, when you want to set
up your own database, REST server, and indexes, go to
http://localhost:8000/appservices/, click the New Database button, select the database, and click the
button. Now we’re ready for a
quick look at the dataset.


Annotating the POJO

The dataset for this
tutorial consists of top songs extracted from Wikipedia
(http://en.wikipedia.org/wiki/Category:Lists_of_number-one_songs_in_the_United_States). Each song is described by a standalone tree
structure modelled with nested POJOs (similar to JSON but with
strong typing). To enable processing by JAXB, the POJO classes have
two JAXB annotations: one on the root class for the tree structure
and one on the descr property.

JAXB Annotation

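... JAXB annotation for the root class ...
public class TopSong {
        public Artist getArtist() { ... }
        ... JAXB annotation for the descr property ...
        public Element getDescr() { ... }
}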


The descr property contains marked-up text as a target for fulltext
search. Other key properties include exactly one artist as well as
zero or many writers, producers, genres, and weeks.
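
To make the shape of the data concrete, here is a minimal sketch of a bean with the properties listed above. It is a hypothetical stand-in, not the tutorial’s TopSong class: the name TopSongSketch and the use of plain strings for every property are assumptions, and the JAXB annotations and the descr element are left out so the sketch compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the tutorial's TopSong bean; names and types are assumptions.
final class TopSongSketch {
    private String title;
    private String artist;                                     // exactly one artist
    private final List<String> writers = new ArrayList<>();    // zero or many
    private final List<String> producers = new ArrayList<>();  // zero or many
    private final List<String> genres = new ArrayList<>();     // zero or many
    private final List<String> weeks = new ArrayList<>();      // ISO dates such as 1967-06-03

    String getTitle() { return title; }
    void setTitle(String title) { this.title = title; }
    String getArtist() { return artist; }
    void setArtist(String artist) { this.artist = artist; }
    List<String> getWriters() { return writers; }
    List<String> getProducers() { return producers; }
    List<String> getGenres() { return genres; }
    List<String> getWeeks() { return weeks; }

    public static void main(String[] args) {
        TopSongSketch song = new TopSongSketch();
        song.setTitle("Respect");
        song.setArtist("Aretha Franklin");
        song.getWriters().add("Otis Redding");
        song.getGenres().add("Soul");
        song.getWeeks().add("1967-06-03");
        System.out.println(song.getTitle() + " | " + song.getArtist()
                + " | " + song.getWriters());
    }
}
```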

Writing POJOs to the Database

The tutorial source provides
the serialized POJOs in XML files. Aside from the
descr property, the POJOs are vanilla Java beans and could be
loaded from a Java object input stream or any other source.

The POJOWriter example creates a database client and iterates over the
serialized POJO files, using JAXB to write the POJOs to the
database as separate documents. Each document has a unique URI and
contains a root object and its subordinate objects. Here’s the
source code condensed to focus on the important parts (which will
also be true of subsequent examples).

Document Write

DatabaseClient dbClient = DatabaseClientFactory.newClient(
        "localhost", 8005, "rest-admin", "x", Authentication.DIGEST);

XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();

JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle writeHandle = new JAXBHandle(context);
for (File songfile: inputDir.listFiles()) {
        TopSong song = ... read the serialized POJO from the file ... ;
        writeHandle.set(song);  // hand the POJO to the handle before writing
        docMgr.write("/topsongs/"+songfile.getName(), writeHandle);
}

dbClient.release();  // release the client when done



Every application using the
API creates a
DatabaseClient before interacting with the database and releases the
client afterward. Subsequent examples will omit these statements to
focus on new ideas.

The example above calls the
XMLDocumentManager.write() method to persist each POJO as a document
in the database. The
JAXBHandle class adapts JAXB for integration into the API. The API
uses adapters like JAXBHandle to integrate standard content
representations as diverse as binary InputStream, character String,
and StAX XMLStreamReader.
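
For a rough picture of what one persisted document might look like, the following sketch builds a small XML tree with the JDK’s DOM and Transformer classes instead of JAXB and the MarkLogic API. The element layout (topSong, artist/artistName, writers/writer) is an assumption inferred from the paths and element names that appear in the search examples, and the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SongXmlSketch {

    // Builds the document URI the same way the write loop does: prefix plus file name.
    static String uriFor(String fileName) {
        return "/topsongs/" + fileName;
    }

    // Serializes a few song properties as XML; the element layout is assumed.
    static String toXml(String title, String artistName, String writer) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("topSong");
            doc.appendChild(root);
            appendText(doc, root, "title", title);
            Element artist = doc.createElement("artist");
            root.appendChild(artist);
            appendText(doc, artist, "artistName", artistName);
            Element writers = doc.createElement("writers");
            root.appendChild(writers);
            appendText(doc, writers, "writer", writer);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize song", e);
        }
    }

    private static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(uriFor("Aretha-Franklin+Respect.xml"));
        System.out.println(toXml("Respect", "Aretha Franklin", "Otis Redding"));
    }
}
```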

Reading a POJO from the Database

The POJOReader example confirms the previous load by calling the
XMLDocumentManager.read() method to get a POJO from the database,
again using JAXB.

Document Read

XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();

JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle readHandle = new JAXBHandle(context);
docMgr.read("/topsongs/Aretha-Franklin+Respect.xml", readHandle);

TopSong song = (TopSong) readHandle.get();
... print the properties of the POJO ...


The example prints out the POJO
properties, producing the following output:


title | Respect

artist | Aretha Franklin

writers | Otis Redding

producers | Steve Cropper

genres | Soul

weeks | 1967-06-03 |

Subsequent examples will
search these properties and the text of the descr property.

Searching for the Value of a Property

Now we’re ready to
investigate the top songs dataset. Looking at the output for
Respect, we might wonder whether Otis Redding wrote any other hit
songs. The KeyValueSearcher example finds all documents where the writer
element contains the exact value Otis Redding. Such
searches resemble equals predicates in the WHERE clause of an SQL
query but can operate on varied document structures instead of
rigid relational tables.

KeyValue Search

QueryManager queryMgr = dbClient.newQueryManager();

KeyValueQueryDefinition keyValueQry = queryMgr.newKeyValueDefinition();
keyValueQry.put(
        queryMgr.newElementLocator(new QName("writer")), "Otis Redding");

SearchHandle searchHandle = queryMgr.search(keyValueQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        System.out.println("document: "+docSum.getUri());
        for (MatchLocation docLoc: docSum.getMatchLocations()) {
                System.out.println("    location: "+docLoc.getPath());
                System.out.println("    matched:  "+docLoc.getAllSnippetText());
        }
}


All queries use a
QueryManager. (Subsequent examples skip its construction.) The
KeyValueQueryDefinition class specifies the query criteria. The call to
QueryManager.search() searches the database. SearchHandle parses the results into a Java structure
reflecting documents matched by the query and locations matched
within each document. You can also get search results in JSON or
XML if you prefer.

The example iterates over the
matched documents and locations to generate the following output,
which answers the question: Otis Redding wrote two top songs.

KeyValue Search Output


document: /topsongs/Aretha-Franklin+Respect.xml 
    location: /topSong/writers 
    matched:  Otis Redding 
document: /topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml 
    location: /topSong/writers 
    matched:  Otis Redding 


For JSON documents, you can
search on the value of a key in much the same way.
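
The exact-value matching described in this section can be mimicked without a server by running an XPath equality test over a few in-memory XML strings. This is only an analogy, assuming a simplified document layout: it uses the JDK’s javax.xml.xpath package rather than the MarkLogic API, and the class name, the third document, and its writer are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ExactValueSearchSketch {

    // Returns the URIs of documents that contain an element with the given
    // name whose text equals the value exactly. The name and value are pasted
    // into the expression, so this sketch assumes they contain no quotes.
    static List<String> search(Map<String, String> docsByUri, String element, String value) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "boolean(//" + element + "[. = '" + value + "'])";
        List<String> hits = new ArrayList<>();
        try {
            for (Map.Entry<String, String> doc : docsByUri.entrySet()) {
                Boolean found = (Boolean) xpath.evaluate(expression,
                        new InputSource(new StringReader(doc.getValue())),
                        XPathConstants.BOOLEAN);
                if (found) {
                    hits.add(doc.getKey());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("/topsongs/Aretha-Franklin+Respect.xml",
                "<topSong><writers><writer>Otis Redding</writer></writers></topSong>");
        docs.put("/topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml",
                "<topSong><writers><writer>Otis Redding</writer></writers></topSong>");
        docs.put("/topsongs/hypothetical-song.xml",
                "<topSong><writers><writer>Someone Else</writer></writers></topSong>");
        for (String uri : search(docs, "writer", "Otis Redding")) {
            System.out.println("document: " + uri);
        }
    }
}
```

Unlike this loop, which parses every document on each search, the database answers the same question from its indexes.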

Searching for Terms in

When investigating a dataset, one
question often leads to another. We might wonder whether Aretha
Franklin and Otis Redding collaborated on other top songs. We can
start with a simple string search.

A string search expresses
query criteria including phrases and Booleans similar to the Google
search box. You can prompt a user for the criteria, but it’s also
convenient for specifying static criteria in an application. Like a
search engine, the
StringSearcher example matches documents that contain both of the
phrases Aretha Franklin and Otis Redding in any location.

String Search

StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("\"Aretha Franklin\" AND \"Otis Redding\"");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ... print the document and its matched locations as in the previous example ...
}


The example differs from the
previous example only in the use of
StringQueryDefinition to specify the criteria.

In some cases, a quick phrase
search is enough to get the answer. In this case, however, the
output shows that the search was too general.

String Search Output





    location: /topSong/artist
    matched:  Aretha Franklin
    matched:  …Stax recording artist Otis Redding in 1965. “Respect” became…
    matched:  …Aretha Franklin, The Supremes, Otis Redding

The search matched phrases
mentioning Aretha Franklin and Otis Redding in the description,
which doesn’t indicate whether they collaborated on the song.
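
Why an unconstrained phrase search over-matches can be shown with a small in-memory sketch. It treats each document as one flattened string and reports a match when every phrase occurs anywhere in it; the class name and both sample documents are hypothetical, and real fulltext search tokenizes and indexes text rather than scanning strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PhraseAnywhereSketch {

    // True when every phrase occurs somewhere in the text, ignoring case.
    static boolean containsAll(String text, List<String> phrases) {
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String phrase : phrases) {
            if (!haystack.contains(phrase.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("Aretha Franklin", "Otis Redding");
        // Hypothetical flattened documents: property values plus description text.
        String collaboration = "artist: Aretha Franklin; writer: Otis Redding";
        String mentionOnly = "artist: Someone Else; descr: in the tradition of "
                + "Aretha Franklin and Otis Redding";
        System.out.println(containsAll(collaboration, phrases));
        System.out.println(containsAll(mentionOnly, phrases));
    }
}
```

Both documents report a match, even though only the first describes a song that Aretha Franklin performed and Otis Redding wrote.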

Searching for
Combinations of Properties

To get a definitive answer
for our question, we need to constrain our phrase search to the
artist and writer properties. We define constraints with query options. Query
options specify the static parts of a query, including not only
constraints but also the result page length and so on. You write query
options to the database before executing a search, which supplies the
dynamic parts of the query, including the criteria, the result page
number, and so on.

The ConstrainedSearcher example builds the query
options as a data structure in Java:

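Query Options for Constraints


// Assumed calls fill the lines missing from this condensed listing:
// newServerConfigManager(), withConstraints(), and constraint().
QueryOptionsManager optMgr =
        dbClient.newServerConfigManager().newQueryOptionsManager();
QueryOptionsBuilder optBldr = new QueryOptionsBuilder();

QueryOptionsHandle optHandle = new QueryOptionsHandle();
optHandle.withConstraints(
        optBldr.constraint("artist",
                optBldr.elementQuery(new QName("artistName"))),
        optBldr.constraint("writer",
                optBldr.elementQuery(new QName("writer"))));

optMgr.writeOptions("constraints", optHandle);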


As you might expect, the API
provides a
QueryOptionsManager to write, read, and delete query options. To build
options as a Java structure, you use QueryOptionsBuilder
and QueryOptionsHandle. In particular, the calls to
QueryOptionsBuilder.elementQuery() specify constraints on the
artist and writer properties. That makes it possible to restrict search
phrases to these properties (similar to the key-value search shown
earlier). The writeOptions()
call saves the query options
under the name constraints.

By the way, because query options
are typically set up by an experienced developer and used by other
developers in applications, writing them requires a higher level of
permissions. While we’ll show how to build query options in Java,
you can also write query options as JSON or XML documents if you
prefer.

Now we can use the query
options to constrain the POJO properties where the search matches
the phrases. The
ConstrainedSearcher example specifies the constraints query options when constructing the
StringQueryDefinition object and then prefixes the Aretha
Franklin phrase with the
artist constraint and the Otis Redding phrase with the writer
constraint.

Constrained Search

StringQueryDefinition stringQry = queryMgr.newStringDefinition("constraints");
stringQry.setCriteria(
        "artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ... print the document and its matched locations as before ...
}


Apart from adding the query
options and constraint prefixes, this example is unchanged from the
previous version. The result output, however, is much more
precise.

Constrained Search Output


    location: /topSong/artist
    matched:  Aretha Franklin
    matched:  Otis Redding

Only one song had this combination
of artist and writer, yielding our definitive answer.
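
Because the criteria string embeds double-quoted phrases, it is easy to get the Java escaping wrong when writing it by hand. A minimal sketch of two helpers that assemble the constrained criteria is shown here; the class and method names are hypothetical, and the sketch assumes phrases contain no double quotes of their own.

```java
final class CriteriaStringSketch {

    // Renders one constrained phrase, for example artist:"Aretha Franklin".
    static String constrained(String constraint, String phrase) {
        return constraint + ":\"" + phrase + "\"";
    }

    // Joins the parts with the AND operator used in the string criteria.
    static String and(String... parts) {
        return String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        String criteria = and(
                constrained("artist", "Aretha Franklin"),
                constrained("writer", "Otis Redding"));
        System.out.println(criteria);
    }
}
```

The resulting string can be passed to setCriteria() in place of a hand-escaped literal.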


Modifying Criteria
Dynamically with Structured Search

From time to time, you might need
to modify or inspect criteria programmatically. Examples include
providing a GUI editor for search criteria, adding hidden criteria,
checking for invalid or unauthorized criteria, or generating
criteria to reflect the current state of an external resource.

As with query options, you use a
builder to create a Java structure. The
StructuredSearcher example builds a structured
search for the same constrained criteria that the previous example
expressed as a string.  

Structured Search

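// Assumed calls: newStructuredQueryBuilder(), and(), elementConstraint().
StructuredQueryBuilder structureBldr =
        queryMgr.newStructuredQueryBuilder("constraints");
StructuredQueryDefinition structuredQry =
        structureBldr.and(
                structureBldr.elementConstraint("artist",
                        structureBldr.term("Aretha Franklin")),
                structureBldr.elementConstraint("writer",
                        structureBldr.term("Otis Redding")));

SearchHandle searchHandle = queryMgr.search(structuredQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
        ... print the document and its matched locations as before ...
}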

The example uses
StructuredQueryBuilder to create a
StructuredQueryDefinition specifying the criteria
for the artist and writer constraints defined by the constraints
query options. Aside from using StructuredQueryDefinition instead
of StringQueryDefinition, this example is the same as the previous
example, qualifies the same documents, and produces the same
output. A Java program, however, could easily change one of the
terms or add new complex Boolean conditions without string
manipulation.

If you prefer, you can also write
a structured query as a JSON or XML document. While the rest of the
tutorial will stick with string queries for consistency, in each
case, the search criteria could have been specified with a
structured query.
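
The benefit of holding criteria as a data structure can be illustrated with a tiny stand-alone model. This sketch is not the StructuredQueryBuilder API; the classes Term and And, and the idea of an allowed-constraint list, are assumptions chosen to show two of the uses mentioned above: adding hidden criteria and checking for unauthorized ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CriteriaTreeSketch {

    // One constrained phrase, such as the artist constraint on "Aretha Franklin".
    static final class Term {
        final String constraint;
        final String phrase;

        Term(String constraint, String phrase) {
            this.constraint = constraint;
            this.phrase = phrase;
        }
    }

    // A conjunction of terms that code can extend or inspect before searching.
    static final class And {
        private final List<Term> terms = new ArrayList<>();

        And add(Term term) {
            terms.add(term);
            return this;
        }

        List<String> constraintNames() {
            List<String> names = new ArrayList<>();
            for (Term term : terms) {
                names.add(term.constraint);
            }
            return names;
        }

        // Returns the terms whose constraint is not on the allowed list.
        List<Term> unauthorized(List<String> allowed) {
            List<Term> rejected = new ArrayList<>();
            for (Term term : terms) {
                if (!allowed.contains(term.constraint)) {
                    rejected.add(term);
                }
            }
            return rejected;
        }
    }

    public static void main(String[] args) {
        And query = new And()
                .add(new Term("artist", "Aretha Franklin"))
                .add(new Term("writer", "Otis Redding"));
        query.add(new Term("genre", "Soul"));  // a hidden criterion added by the program
        System.out.println(query.constraintNames());
        System.out.println(query.unauthorized(Arrays.asList("artist", "writer")).size());
    }
}
```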

Analyzing a Dataset with
Faceted Search

So far, the examples have answered
specific questions. To help frame questions, it’s also useful to
get a broad overview of the dataset. Facet analysis meets that
requirement by performing counts or other aggregates on the entire
dataset or a subset of interest. The next example supports facet
analysis by genre or over time.

When you ran the bootstrapping utility at
the start of this tutorial, it configured the top
songs database. The configuration created range indexes on the
genre and week elements. A range index provides a basis for
calculating facets. Now, we’re ready to take advantage of those
genre and week range indexes.

As with the artist and writer
constraints in a previous example, the
FacettedSearcher example creates constraints for
the genre and week indexes in query options. The constraints
identify the range indexes and their datatypes. The example sorts
the genres in descending order by number of songs in the genre.

Query Options for Facets


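... constraint on the genre range index ...
                                new QName("genre"),
                        "frequency-order", "descending")),
... constraint on the week range index ...
                                new QName("week"),
... call to QueryOptionsHandle.setReturnResults() ...
optMgr.writeOptions("facetsongs", optHandle);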


The source code fragment skips
over the construction of the QueryOptionsBuilder
and QueryOptionsHandle objects, which remains the
same as in the earlier example. The call to
QueryOptionsHandle.setReturnResults() modifies
searches to return just the facet analysis and not a page of search
results.

The facetsongs query options have
done the heavy lifting of defining the facets. The
FacettedSearcher example specifies the facetsongs
query options when constructing the string definition. The example
performs the facet analysis on the subset of the songs that contain
the Grammy term anywhere in the
document. A search could use complex Booleans for a smaller subset
or no criteria for the entire dataset.

Facet Search


StringQueryDefinition stringQry = queryMgr.newStringDefinition("facetsongs");
stringQry.setCriteria("Grammy");

SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (FacetResult facet: searchHandle.getFacetResults()) {
        System.out.println("facet: "+facet.getName());
        for (FacetValue value: facet.getFacetValues()) {
                System.out.println("    "+value.getLabel()+" = "+value.getCount());
        }
}

As with search results, SearchHandle parses the
list of facets into a Java structure with the values and their
aggregate counts. You can also read facets as JSON or XML.

The example output analyzes the
genres and weeks for all songs having the Grammy term.

Facet Output

facet: genre

Pop = 79

R&B = 71

Rhythm And Blues = 2

facet: week

1940-07-27 = 1

1940-08-03 = 1

1940-08-10 = 1


The output shows that
consolidating genre values like R&B and Rhythm And
Blues would improve the quality of the dataset. That’s fine
and to be expected from real-world Big Data. Cleaning up those
blemishes won’t change the big picture, so we can get value from
our dataset immediately. If later applications could benefit from
fixing these flaws, the facet analysis has shown us what to fix. We
can refine the dataset in place without getting in the way of
existing applications. Such flexible, progressive refinement
differs from traditional databases, where changes to data
structures and associations have a disruptive impact on
existing applications.
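
What a facet count amounts to, and how consolidating genre labels changes it, can be sketched in memory. This is an illustration only: the database computes facets from a range index rather than by scanning values, and the class name, the sample genre list, and the single consolidation rule are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GenreFacetSketch {

    // Maps a variant label onto one canonical label; only one rule is shown.
    static String consolidate(String genre) {
        return "Rhythm And Blues".equals(genre) ? "R&B" : genre;
    }

    // Counts each label and orders the result by descending count, then by label.
    static Map<String, Integer> countDescending(List<String> genres) {
        Map<String, Integer> counts = new HashMap<>();
        for (String genre : genres) {
            counts.merge(genre, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.put(entry.getKey(), entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical genre values gathered from a handful of songs.
        List<String> raw = Arrays.asList("Pop", "R&B", "Pop", "Rhythm And Blues", "R&B");
        System.out.println(countDescending(raw));

        List<String> cleaned = new ArrayList<>();
        for (String genre : raw) {
            cleaned.add(consolidate(genre));
        }
        System.out.println(countDescending(cleaned));
    }
}
```

After consolidation the two spellings collapse into a single facet value, which is the kind of in-place refinement the paragraph above describes.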

Summarizing a Dataset
with Limits and Buckets

For some purposes, facet analysis
provides too much detail. To get a fast summary of a dataset, you
might want to aggregate ranges of values and eliminate
infrequent values.

Query options can limit the number
of facet values. When facet values are ordered by descending
frequency, the effect is to return the top values. Query options
can also define buckets for grouping facet values. The
BuckettedSearcher example refines the previous
query options to add a limit and buckets:

Query Options for Limits & Buckets


... constraint on the genre range index ...
                                new QName("genre"),
                        "frequency-order", "descending", "limit=10")),
... constraint on the week range index ...
                                new QName("week"),
                        optBldr.bucket("1940s", "40s", "1940-01-01", "1950-01-01"),
                        optBldr.bucket("1950s", "50s", "1950-01-01", "1960-01-01"),
                        ...
                        optBldr.bucket("2000s", "00s", "2000-01-01", "2010-01-01")
...


Other than referring to the
revised query options, the BuckettedSearcher
example has exactly the same search code as the previous example.
Because of the query options changes, however, the example produces
only the top genres and groups songs by decade instead of by
week.

Facet Output with Limits & Buckets

facet: genre

Pop = 79

R&B = 71

Country = 8

facet: week

40s = 4

50s = 11

00s = 67
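
The effect of buckets and limits can be reproduced in memory with java.time. In this sketch each bucket has a label, an inclusive lower bound, and an exclusive upper bound, which is an assumption about how the bounds passed to the bucket() calls behave; the class names and sample weeks are hypothetical.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DecadeBucketSketch {

    // A labelled date range: lower bound inclusive, upper bound exclusive (assumed).
    static final class Bucket {
        final String label;
        final LocalDate from;
        final LocalDate before;

        Bucket(String label, String from, String before) {
            this.label = label;
            this.from = LocalDate.parse(from);
            this.before = LocalDate.parse(before);
        }

        boolean contains(LocalDate date) {
            return !date.isBefore(from) && date.isBefore(before);
        }
    }

    // Counts the weeks that fall into each bucket, keeping bucket order.
    static Map<String, Integer> countByBucket(List<Bucket> buckets, List<String> weeks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Bucket bucket : buckets) {
            counts.put(bucket.label, 0);
        }
        for (String week : weeks) {
            LocalDate date = LocalDate.parse(week);
            for (Bucket bucket : buckets) {
                if (bucket.contains(date)) {
                    counts.merge(bucket.label, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    // Keeps only the first n entries of an already ordered list (the limit).
    static <T> List<T> topN(List<T> ordered, int n) {
        return ordered.subList(0, Math.min(n, ordered.size()));
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("40s", "1940-01-01", "1950-01-01"),
                new Bucket("50s", "1950-01-01", "1960-01-01"),
                new Bucket("00s", "2000-01-01", "2010-01-01"));
        List<String> weeks = Arrays.asList(
                "1940-07-27", "1949-12-31", "1950-01-01", "2005-05-05");
        System.out.println(countByBucket(buckets, weeks));
        System.out.println(topN(Arrays.asList("Pop", "R&B", "Country"), 2));
    }
}
```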


Counting Property Values
for a Dataset

The broad understanding of the
dimensions of the dataset gained through facet analysis can frame
the investigation of specific questions. Knowing the genres for the
song dataset suggests that, if we want to investigate the breadth of
Quincy Jones’s career, we could look at the genres for the songs he
has produced. Such questions can be answered quickly based on a
range index.

First, the
ValuesLister example defines a producer constraint
(much like the artist and writer constraints in a previous
example). The query options also identify the range index supplying
the list of values (in this case, the genre values).

Query Options for Values

... constraint on the producer property ...
                optBldr.elementQuery(new QName("producer")) ));
... values list on the genre range index ...
                                new QName("genre"),
                                        "http://marklogic.com/collation/" )))));
optMgr.writeOptions("valuesongs", optHandle);


To query for the values, the
ValuesLister example constructs a
ValuesDefinition with the name of the values list
(genre) specified in the query options
(valuesongs). The example also
constructs a StringQueryDefinition, prefixes
Quincy Jones with the producer constraint (as with Aretha Franklin and the artist constraint previously), and initializes the
ValuesDefinition with the StringQueryDefinition to constrain the
values list to the songs produced by Quincy Jones.  

Values List


ValuesDefinition valdef = queryMgr.newValuesDefinition("genre", "valuesongs");

StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("producer:\"Quincy Jones\"");
valdef.setQueryDefinition(stringQry);  // assumed call: constrain the values list

ValuesHandle genreHandle = queryMgr.values(valdef, new ValuesHandle());
for (CountedDistinctValue value: genreHandle.getValues()) {
        System.out.println(
                "    "+value.getCount()+" "+value.get("xs:string", String.class));
}

The call to QueryManager.values() reads the
index and ValuesHandle parses the list into a Java
structure reflecting the values for the constrained subset. That’s
similar to the search() method with a SearchHandle in previous
examples, but in this case, reading directly from the index. As
elsewhere, you can also get the values list as JSON or XML. The
example iterates over the list to get each count and value.

The output shows that Quincy Jones
has produced a surprising diversity of hit songs:  

Values Output

Hit songs per genre for producer Quincy Jones

1 Country Soul

1 Dance

1 Glam Metal

2 Hard Rock

1 Jazz

1 West Coast Hip Hop
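
As a mental model for a constrained values list, the sketch below filters a few in-memory songs by producer and counts the genre values of the remaining songs in label order. It does not read an index, and the Song class, its fields, and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProducerGenresSketch {

    // Minimal song record holding only the properties this sketch needs.
    static final class Song {
        final List<String> producers;
        final List<String> genres;

        Song(List<String> producers, List<String> genres) {
            this.producers = producers;
            this.genres = genres;
        }
    }

    // Counts genre values across the songs credited to the producer, sorted by genre.
    static Map<String, Integer> genreCounts(List<Song> songs, String producer) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Song song : songs) {
            if (!song.producers.contains(producer)) {
                continue;  // outside the constrained subset
            }
            for (String genre : song.genres) {
                counts.merge(genre, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative data, not taken from the tutorial dataset.
        List<Song> songs = Arrays.asList(
                new Song(Arrays.asList("Quincy Jones"), Arrays.asList("Hard Rock", "Dance")),
                new Song(Arrays.asList("Quincy Jones"), Arrays.asList("Hard Rock")),
                new Song(Arrays.asList("Someone Else"), Arrays.asList("Pop")));
        for (Map.Entry<String, Integer> entry : genreCounts(songs, "Quincy Jones").entrySet()) {
            System.out.println("    " + entry.getValue() + " " + entry.getKey());
        }
    }
}
```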

Counting Property
Co-Occurrence for a Dataset

A top song is a hit in one or more
weeks and can be classified in one or more genres; thus, each top
song associates weeks with genres. These associations of weeks and
genres (called co-occurrence or, when read from the database,
tuples) can demonstrate trends over time for genres. For instance,
we can investigate the trend for songs produced by Quincy
Jones.

In the query options, the producer
constraint remains the same as the previous example (and so isn’t
included in the fragment below). The TuplesLister
example builds the week-genre tuple
list over the week and genre range indexes (instead of a values
list for one range index).

Query Options for Tuples


... tuples list over the week and genre range indexes ...
                                        new QName("week"),
                                        new QName("genre"),
...
optMgr.writeOptions("tuplesongs", optHandle);


To query for the tuples, the
TuplesLister example constructs a
ValuesDefinition with the name of the tuples list
(week-genre) specified in the query
options (tuplesongs). The example
constrains the query to songs produced by Quincy Jones with the
same StringQueryDefinition as the previous example (and so doesn’t
include those statements in the fragment below).

Tuples List


ValuesDefinition valdef =
        queryMgr.newValuesDefinition("week-genre", "tuplesongs");

DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");

TuplesHandle tuplesHandle = queryMgr.tuples(valdef, new TuplesHandle());
for (Tuple tuple: tuplesHandle.getTuples()) {
        System.out.print("    "+tuple.getCount()+" ");
        for (TypedDistinctValue value: tuple.getValues()) {
                String type = value.getType();
                if ("xs:date".equals(type)) {
                        ... print the week, formatted with dateFormat ...
                } else if ("xs:string".equals(type)) {
                        ... print the genre ...
                }
        }
        System.out.println();
}


The call to
QueryManager.tuples() reads the indexes and
TuplesHandle parses the tuples into a Java
structure reflecting the values for the constrained subset. The
example iterates over the tuples to get each value, formatting the
date values for weeks using a Java DateFormat.

The output satisfies the goal of
the investigation by showing that Quincy Jones started by producing
Country Soul / R&B songs and transitioned through other genres
to Hip Hop.

Tuples Output

Hit song genres by week for producer Quincy Jones

1 1962-06-02 Country Soul

1 1962-06-02 R&B

1 1996-07-13 West Coast Hip Hop

1 1996-07-20 West Coast Hip Hop
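
Co-occurrence is easiest to see as a cross product computed per document: every week of a song is paired with every genre of the same song, and identical pairs are counted. The sketch below does this in memory; it is not how the server reads range indexes, and the Song class and the two sample songs (modelled loosely on the output above) are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WeekGenreTuplesSketch {

    // Minimal song record holding only weeks (ISO dates) and genres.
    static final class Song {
        final List<String> weeks;
        final List<String> genres;

        Song(List<String> weeks, List<String> genres) {
            this.weeks = weeks;
            this.genres = genres;
        }
    }

    // Pairs each week with each genre of the same song and counts the pairs.
    static Map<String, Integer> tuples(List<Song> songs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Song song : songs) {
            for (String week : song.weeks) {
                for (String genre : song.genres) {
                    counts.merge(week + " " + genre, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(
                new Song(Arrays.asList("1962-06-02"),
                        Arrays.asList("Country Soul", "R&B")),
                new Song(Arrays.asList("1996-07-13", "1996-07-20"),
                        Arrays.asList("West Coast Hip Hop")));
        for (Map.Entry<String, Integer> tuple : tuples(songs).entrySet()) {
            System.out.println("    " + tuple.getValue() + " " + tuple.getKey());
        }
    }
}
```

Because ISO dates sort chronologically as strings, the sorted map lists the tuples in time order, which is what makes the trend readable.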

Summary

This tutorial provided a quick
overview of how to use the Java API to persist and query POJOs in
MarkLogic Server. In particular, you learned how to:

  • Write POJOs to and read POJOs from the
    database.

  • Search persisted POJOs with key-value,
    string, or structured criteria.

  • Specify Boolean, fulltext, or
    element-constrained criteria.

  • Perform facet analysis over POJO
    properties including bucketing.

  • Extract POJO property values and tuples
    from indexes.

The MarkLogic Java API works with
other kinds of content besides POJOs. You can perform CRUD
operations on binary (including PDF and video), JSON, XML, and text
documents with collection, permission, and property metadata. You
can use multi-statement transactions and optimistic locking control
for CRUD operations. In search, you can take advantage of
geospatial search and faceting, aggregate functions over indexes
(including user-defined aggregates), and flexible snippeting and
element extraction for search results. Finally, you can extend the
API with server-side transforms and new resource services.

MarkLogic Server has too many
capabilities to explore in one tutorial, including Hadoop
integration, reverse queries and alerting, server-side content
processing pipelines (for conversion, enrichment, or metadata
extraction), flexible replication, and ingestion and monitoring
tools. Major corporations and government agencies have used
MarkLogic Server in mission-critical solutions for years. Whether
evaluating NoSQL platforms for an enterprise solution or rolling
out a fast implementation on the Express license for a great
idea all your own, you can learn more at http://developer.marklogic.com.


Express downloads – http://developer.marklogic.com/express

Comprehensive tutorial for
the Java API -

Documentation – http://developer.marklogic.com/docs

For future updates to this
tutorial, please see http://developer.marklogic.com/learn/java-pojos


Constraint: a name and
specification for how to use an index to qualify documents.

Facet: an enumerative or
quantitative property useful for selecting or grouping objects or
documents in search.

Key-value search: query criteria
expressed as the value of a property.

QName: a qualified name, which may
be associated with a namespace for uniqueness and with a short
prefix for the namespace URI for convenience.

Query options: the invariant parts
of a query including the constraints that specify the names and
uses of indexes, the result page size, and so on.

Range index: an index that
supports facet queries (either finding documents based on value
or values based on documents).

String search: query criteria
expressed as simple Google-like expressions with Booleans,
constraints, and so on.

Structured search: query criteria
expressed as a data structure with Booleans, constraints, and so
on.

Values: some or all of the entries
in an index.

Tuple: a record based on
co-occurrence of values in documents.




All Posts by Erik Hennum and Charles Greer
