Multi-tier marvel

Introducing MapDB: The agile Java data engine

Cory Isaacson

MapDB, a data engine grounded in Java, has just reached 1.0 status as an Apache 2.0-licensed project. Cory Isaacson runs through the key features, and underlines what makes it so darn agile.

Today there are many database engines available to the Java developer; in fact, there are hundreds of options. Among them are several “pure Java” implementations, including H2, Apache Derby, and others. I have been working in the Java and database development arena for many years, and when I found out about MapDB (and its predecessor projects) I was instantly attracted to the natural approach and agility offered by this open source project.

MapDB allows you as the Java developer to do what you do every day: work with the familiar, natural and powerful Java Collections API, while getting past the usual limits of the Java heap and the garbage collection cost of large data sets. I was able to get up and running with MapDB literally in a few minutes, and now I can create a collection (Map, Set, Queue, etc.) of virtually any size using the well-established Java Collections API. This means that you can even use MapDB in an existing Java application, simply by changing the initialization of the collections you want to back with this engine.

A brief history of MapDB

Prior to MapDB, Jan Kotek (the primary MapDB developer) maintained various versions of the JDBM (Java Database Manager) projects. JDBM itself was a Java port of UNIX DBM and GDBM, C-language databases that provide hash-based key-value stores on disk. Through this experience, Jan saw how he could greatly improve and expand the architecture, so he created MapDB as a completely new implementation. That experience paid off: MapDB offers ease of use, an agile approach to database structure, transaction support, concurrency, and very impressive performance.

Now MapDB 1.0 is released as an Apache 2.0-licensed
project, available at: www.mapdb.org.

A natural API for Java developers

There are many strong features of MapDB, but the one I noticed first was its intuitive and flexible API. For example, if you want to create a Map structure (even one holding hundreds of gigabytes), here is all that is needed:

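import java.io.File;
import java.util.Map;

import org.mapdb.DB;
import org.mapdb.DBMaker;

// Initialize a file-backed MapDB database
DB db = DBMaker.newFileDB(new File("testdb"))
        .closeOnJvmShutdown()
        .make();

// Create (or reopen) a Map stored in the database
Map<String,String> myMap = db.getTreeMap("testmap");

// Work with the Map using the normal Map API.
myMap.put("key1", "value1");
myMap.put("key2", "value2");

String value = myMap.get("key1");

// Transactions are enabled by default; commit to persist the changes
db.commit();
...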

That’s all you need to do: you now have a file-backed Map of virtually any size.

Another very powerful feature is that MapDB supports some of the more advanced Java Collections interfaces, such as ConcurrentNavigableMap. This type of Map goes beyond simple key-value semantics: it is also sorted, so you can access data in key order and find entries near a given key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and lets you do a lot with your MapDB database (I will cover more of these capabilities in a future article).
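To make the sorted-Map operations above concrete, here is a small sketch. It uses the JDK's in-memory ConcurrentSkipListMap as a stand-in, on the assumption that a MapDB tree map behaves the same way through the shared ConcurrentNavigableMap interface; the class name NavigableMapDemo and the sample keys are illustrative only.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class NavigableMapDemo {
    // Builds a small sorted map; any ConcurrentNavigableMap would work here
    static ConcurrentNavigableMap<String, String> sampleMap() {
        ConcurrentNavigableMap<String, String> map = new ConcurrentSkipListMap<>();
        map.put("carrot", "vegetable");
        map.put("apple", "fruit");
        map.put("date", "fruit");
        map.put("banana", "fruit");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, String> map = sampleMap();

        // Keys are kept in sorted order regardless of insertion order
        System.out.println(map.keySet());                  // [apple, banana, carrot, date]

        // Find the keys nearest to one that is not in the map
        System.out.println(map.ceilingKey("blueberry"));   // carrot (smallest key >= "blueberry")
        System.out.println(map.floorKey("blueberry"));     // banana (largest key <= "blueberry")

        // Range view: keys from "b" (inclusive) up to "d" (exclusive)
        System.out.println(map.subMap("b", "d").keySet()); // [banana, carrot]
    }
}
```

Because the code only talks to the ConcurrentNavigableMap interface, the same calls should apply to a map obtained from db.getTreeMap(...).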

What makes MapDB an agile data engine?

When I first met Jan and started talking with him about MapDB, he said something that made a strong impression on me: if you know what data structure you want, MapDB allows you to tailor the structure to your exact application needs. In other words, the schema and the ways you can structure your data are very flexible.

Why was this so important? I learned early in my career that if you can build a data structure that matches what you are trying to do, it can offer far better performance than a “generic” structure (sometimes orders of magnitude better). The details are beyond the scope of this introductory article, but this is a ground-breaking concept: MapDB not only lets you create Maps, Sets, etc. to meet your needs, it also offers tremendous flexibility in its internal implementation of those structures. That is where you can really blow away traditional thinking and achieve the performance you need, often with far less work than the various “workarounds” required by engines that support only a single internal structure.

The key to this capability lies in MapDB’s architecture, and in how that architecture is reflected in the MapDB API itself. Here is a simple diagram of the MapDB architecture:

As you can see from the diagram, there are three tiers in MapDB:

  • Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension that lets you control the exact characteristics of a given database (including its internal format or record structure).
  • Engine: The Engine is the real key to MapDB. This is where the records of a database are managed, including their internal structure, concurrency control and transactional semantics. MapDB already ships with several engines, and it is straightforward to add your own Engine if you need specialized data handling.
  • Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects.
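The separation between the three tiers can be illustrated with a toy model. This is a sketch under loud assumptions: ToyVolume, MemoryVolume, ToyEngine and ToyStringMap are hypothetical names invented here, they are not MapDB’s real Volume or Engine types, and MapDB’s actual interfaces are considerably richer. The point is only that each tier talks to the one below it through a narrow interface, so a storage layer or engine can be swapped without touching the collection-level code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Demo entry point: wire the three toy tiers together
class ThreeTierDemo {
    public static void main(String[] args) {
        ToyStringMap map = new ToyStringMap(new ToyEngine(new MemoryVolume()));
        map.put("key1", "value1");
        map.put("key1", "value2");            // overwrites the existing record in place
        System.out.println(map.get("key1"));  // value2
    }
}

// Tier 3 (Volume): raw bytes stored at numeric positions.
// Hypothetical interface, not MapDB's real Volume class.
interface ToyVolume {
    void write(long position, byte[] data);
    byte[] read(long position);
}

// In-memory volume; a file-backed one could implement the same interface
class MemoryVolume implements ToyVolume {
    private final Map<Long, byte[]> slots = new HashMap<>();

    public void write(long position, byte[] data) { slots.put(position, data.clone()); }

    public byte[] read(long position) {
        byte[] data = slots.get(position);
        return data == null ? null : data.clone();
    }
}

// Tier 2 (Engine): assigns record ids and decides how records are encoded.
// Hypothetical class, not MapDB's real Engine interface.
class ToyEngine {
    private final ToyVolume volume;
    private long nextRecid = 1;

    ToyEngine(ToyVolume volume) { this.volume = volume; }

    synchronized long insert(String record) {
        long recid = nextRecid++;
        update(recid, record);
        return recid;
    }

    synchronized void update(long recid, String record) {
        volume.write(recid, record.getBytes(StandardCharsets.UTF_8));
    }

    synchronized String get(long recid) {
        byte[] data = volume.read(recid);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}

// Tier 1 (collection view): application-facing API built only on the engine.
// For brevity the key -> recid index lives on the Java heap here.
class ToyStringMap {
    private final ToyEngine engine;
    private final Map<String, Long> index = new HashMap<>();

    ToyStringMap(ToyEngine engine) { this.engine = engine; }

    void put(String key, String value) {
        Long recid = index.get(key);
        if (recid == null) index.put(key, engine.insert(value));
        else engine.update(recid, value);
    }

    String get(String key) {
        Long recid = index.get(key);
        return recid == null ? null : engine.get(recid);
    }
}
```

Swapping MemoryVolume for a file-based implementation, or ToyEngine for one with different record encoding, would leave ToyStringMap unchanged, which mirrors the layering described above.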

The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the physical storage layer. This makes for a very agile approach, allowing developers to control exactly what internal structure a given database uses, and what the data structure looks like through the top-level Collections API.

There are many capabilities supported by this
architecture (again the subject of future articles on advanced
MapDB concepts).

However, you don’t need to know any of the internals to gain a lot of agility from the existing features. In the first code example I showed the builder-style API that you use to create a database. That is where the power lies, as the builder methods can be extended to expose any Engine features that are available.

For example, let’s say I want a pure in-memory database without transactional capabilities. All I need to do is adjust the builder calls for a quick configuration change, and my application code doesn’t need to change at all:

import java.util.Map;

import org.mapdb.DB;
import org.mapdb.DBMaker;

// Initialize an in-memory MapDB database
// without transactions (so no commit() is needed)
DB db = DBMaker.newMemoryDB()
        .transactionDisable()
        .closeOnJvmShutdown()
        .make();

// Create a Map:
Map<String,String> myMap = db.getTreeMap("testmap");

// Work with the Map using the normal Map API.
myMap.put("key1", "value1");
myMap.put("key2", "value2");

String value = myMap.get("key1");
...

That’s it! All that was needed was to change the DBMaker call (newMemoryDB() instead of newFileDB(), plus transactionDisable()); everything else works exactly the same.
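The reason the rest of the code stays untouched is that it depends only on the java.util.Map interface. Here is a minimal stdlib-only sketch of that idea: the same method runs unchanged against two different JDK maps. In the examples above the argument would instead come from db.getTreeMap(...); the class and method names here are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

class SwapBackingMap {
    // Application logic written only against the Map interface
    static String storeAndFetch(Map<String, String> myMap) {
        myMap.put("key1", "value1");
        myMap.put("key2", "value2");
        return myMap.get("key1");
    }

    public static void main(String[] args) {
        // Only the expression that creates the map differs between these calls
        System.out.println(storeAndFetch(new TreeMap<>()));               // value1
        System.out.println(storeAndFetch(new ConcurrentSkipListMap<>())); // value1
    }
}
```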

The standard MapDB DBMaker class supports many options, giving you precise control over the type of database you need. This is really valuable: in many applications I have seen developers rely on multiple database engines to get the job done, and while I don’t expect MapDB to replace all other databases, it is incredibly agile and supports many modes of operation.
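For readers less familiar with the builder style that DBMaker uses, here is a toy builder showing how chained option methods work. StoreConfig and its methods are hypothetical; the method names deliberately echo the DBMaker calls from the article, but this is not MapDB’s implementation and it creates no database, it only records the chosen options.

```java
// Hypothetical builder mimicking the fluent, chained style of DBMaker.
// Illustrative only: not MapDB's API, and no storage is created.
class StoreConfig {
    final boolean inMemory;
    final boolean transactions;
    final boolean closeOnShutdown;

    private StoreConfig(Builder b) {
        this.inMemory = b.inMemory;
        this.transactions = b.transactions;
        this.closeOnShutdown = b.closeOnShutdown;
    }

    // Entry points choose the storage kind
    static Builder fileStore() { return new Builder(false); }
    static Builder memoryStore() { return new Builder(true); }

    static final class Builder {
        private final boolean inMemory;
        private boolean transactions = true;     // default: transactional
        private boolean closeOnShutdown = false; // default: caller closes

        private Builder(boolean inMemory) { this.inMemory = inMemory; }

        // Each option returns the builder itself, so calls can be chained
        Builder transactionDisable() { transactions = false; return this; }
        Builder closeOnJvmShutdown() { closeOnShutdown = true; return this; }

        // Terminal call: freeze the chosen options into an immutable config
        StoreConfig make() { return new StoreConfig(this); }
    }

    public static void main(String[] args) {
        StoreConfig cfg = StoreConfig.memoryStore()
                .transactionDisable()
                .closeOnJvmShutdown()
                .make();
        System.out.println(cfg.inMemory + " " + cfg.transactions + " " + cfg.closeOnShutdown); // true false true
    }
}
```

Adding a new option to a builder like this means adding one method and one field, without breaking existing call sites, which is one reason the builder style suits an extensible set of engine features.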

Agile data structures

I briefly covered how MapDB allows you to
customize the characteristics of a given database instance, but
just as important is the ability to create agile data structures –
structures that exactly match your application
requirements.

This is a familiar concept that likely mirrors how you already work when creating standard Java in-memory structures. For example, let’s say you need to look up a Person object by username or by personID. This is simple: create a Person class and two Maps to meet your needs:

public class Person {

    private Integer personID;
    private String username;
    ...

    // Setters and getters go here
    ...

}

// Create a Map of Person by username.
Map<String, Person> personByUsernameMap = ...

// Create a Map of Person by personID.
Map<Integer, Person> personByPersonIDMap = ...
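Here is one runnable version of that plain in-memory approach, assuming ordinary HashMaps for both indexes. The wrapper class PersonLookup, the constructor and the add method are hypothetical additions so the example compiles on its own; Person is nested only to keep everything in one file.

```java
import java.util.HashMap;
import java.util.Map;

class PersonLookup {
    static final class Person {
        private final Integer personID;
        private final String username;

        Person(Integer personID, String username) {
            this.personID = personID;
            this.username = username;
        }

        Integer getPersonID() { return personID; }
        String getUsername() { return username; }
    }

    // Two plain in-memory indexes over the same Person objects
    final Map<String, Person> personByUsernameMap = new HashMap<>();
    final Map<Integer, Person> personByPersonIDMap = new HashMap<>();

    // Every write must go to both maps by hand to keep them consistent
    void add(Person p) {
        personByUsernameMap.put(p.getUsername(), p);
        personByPersonIDMap.put(p.getPersonID(), p);
    }

    public static void main(String[] args) {
        PersonLookup people = new PersonLookup();
        people.add(new Person(1, "alice"));
        people.add(new Person(2, "bob"));
        System.out.println(people.personByUsernameMap.get("bob").getPersonID()); // 2
        System.out.println(people.personByPersonIDMap.get(1).getUsername());     // alice
    }
}
```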

This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key.

You can do the same thing with MapDB, and it is even easier. MapDB supports many constructs for linking Maps and other collections, allowing you to create a schema of related structures that are kept in sync automatically. This avoids a lot of scanning of structures, makes coding fast and convenient, and keeps lookups very fast.
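To show what “kept in sync automatically” means, here is a hand-rolled sketch using plain java.util maps. It is not MapDB’s mechanism (MapDB’s own constructs are not shown here); IndexedMap is a hypothetical class that updates a secondary index on every put and remove, assuming each secondary key is unique.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical primary map plus a secondary index maintained on every write.
// Assumes secondary keys are unique; not MapDB's API.
class IndexedMap<K, V, S> {
    private final Map<K, V> primary = new HashMap<>();
    private final Map<S, K> secondary = new HashMap<>();
    private final Function<V, S> secondaryKeyOf;

    IndexedMap(Function<V, S> secondaryKeyOf) { this.secondaryKeyOf = secondaryKeyOf; }

    void put(K key, V value) {
        V old = primary.put(key, value);
        // Drop the stale index entry, but only if it still points at this key
        if (old != null) secondary.remove(secondaryKeyOf.apply(old), key);
        secondary.put(secondaryKeyOf.apply(value), key);
    }

    void remove(K key) {
        V old = primary.remove(key);
        if (old != null) secondary.remove(secondaryKeyOf.apply(old), key);
    }

    V get(K key) { return primary.get(key); }

    // Look up by the derived secondary key, then fetch from the primary map
    V getBySecondary(S secondaryKey) {
        K key = secondary.get(secondaryKey);
        return key == null ? null : primary.get(key);
    }

    public static void main(String[] args) {
        // Primary key: person id; secondary key: lower-cased name derived from the value
        IndexedMap<Integer, String, String> byId = new IndexedMap<>(name -> name.toLowerCase());
        byId.put(1, "Alice");
        byId.put(2, "Bob");
        byId.put(1, "Alicia");                             // rename also updates the index
        System.out.println(byId.getBySecondary("alicia")); // Alicia
        System.out.println(byId.getBySecondary("alice"));  // null
    }
}
```

Unlike the two-map example, callers write through a single put, so the index cannot drift out of date; a persistent engine would apply the same idea to on-disk structures.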

Wrapping it up

In this article I have provided a very brief summary of MapDB and its key capabilities and features. Most importantly, I discussed how MapDB feels completely natural to a Java developer (thanks to the familiar Java Collections API), and how MapDB is also agile, offering powerful control over many aspects of your database (internal structures as well as exposed schema structures). And perhaps best of all, MapDB is freely available for any use under the Apache 2.0 license.
To learn more, check out: www.mapdb.org.

Author
Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true “shared nothing” architecture for relational databases. Cory has authored numerous articles in a variety of publications including SOA Magazine, Database Trends and Applications, and recently authored the book Software Pipelines and SOA (Addison Wesley). Cory has more than twenty years of experience with advanced software architectures, and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory can be reached at: cory.isaacson@codefutures.com