Multi-tier marvel

Introducing MapDB: The agile Java data engine

Cory Isaacson

MapDB, a data engine grounded in Java, has just reached 1.0 status as an Apache 2.0-licensed project. Cory Isaacson runs through the key features, and underlines what makes it so darn agile.

Today there are many database engines available to the Java developer; in fact, there are hundreds of options. Among them are several “pure Java” implementations, including H2 and Apache Derby. I have been working in the Java and database development arena for many years, and when I found out about MapDB (and its predecessor projects) I was instantly attracted to the natural approach and agility offered by this open source project.

MapDB allows you as the Java developer to do what you do every day: work with the familiar, natural and powerful Java Collections API – while surpassing the traditional limitations of Java Heap Memory and the expense of Garbage Collection with large data sets. I was able to get up and running with MapDB literally in a few minutes, and now I can create virtually any size collection (Map, Set, Queue, etc.) using the well-established Java Collections API. This means that you can even use MapDB with an existing Java application, simply by modifying the initialization for collections that you want to extend with this powerful engine.

A brief history of MapDB

Prior to MapDB, Jan Kotek (the primary MapDB developer) supported various versions of the JDBM (Java Database Manager) projects. JDBM itself was a Java port of UNIX DBM and GDBM, C-language databases that support hash-based key-value stores on disk. Through this experience, Jan saw how he could greatly improve and expand the architecture, and thus created MapDB as a totally new implementation. Jan’s experience paid off, with MapDB offering ease-of-use, an agile approach to database structure, transaction support, concurrency, and very impressive performance.

Now MapDB 1.0 is released as an Apache 2.0-licensed project, available at: www.mapdb.org.

A natural API for Java developers

There are many strong features of MapDB, but the one I noticed first was its intuitive and flexible API. For example, if you want to create a Map structure (even one holding hundreds of gigabytes), here is all that is needed:

import java.io.File;
import java.util.Map;
import org.mapdb.DB;
import org.mapdb.DBMaker;

// Initialize a MapDB database backed by the file "testdb"
DB db = DBMaker.newFileDB(new File("testdb"))
        .closeOnJvmShutdown()
        .make();

// Create a Map:
Map<String,String> myMap = db.getTreeMap("testmap");

// Work with the Map using the normal Map API.
myMap.put("key1", "value1");
myMap.put("key2", "value2");

String value = myMap.get("key1");
...

That’s all you need to do. You now have a file-backed Map of virtually any size.

Another very powerful feature is that MapDB utilizes some of the advanced Java Collections variants, such as ConcurrentNavigableMap. With this type of Map you can go beyond simple key-value semantics, as it is also a sorted Map allowing you to access data in order, and find values near a key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and allows you to do a lot with your MapDB database (I will cover more of these capabilities in a future article). 
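To make the sorted-map idea concrete, here is a minimal sketch that uses only the JDK. ConcurrentSkipListMap stands in for a MapDB-backed map, on the assumption that a MapDB tree map can be handled through the same ConcurrentNavigableMap interface; the class name and sample keys are invented for illustration.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class NavigableMapSketch {

    // Builds a small sorted map. ConcurrentSkipListMap is the JDK's own
    // ConcurrentNavigableMap; it is a stand-in here, not a MapDB class.
    static ConcurrentNavigableMap<String, String> sampleMap() {
        ConcurrentNavigableMap<String, String> map = new ConcurrentSkipListMap<>();
        map.put("key3", "value3");
        map.put("key1", "value1");
        map.put("key5", "value5");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, String> map = sampleMap();

        // Keys come back in sorted order, regardless of insertion order.
        System.out.println(map.keySet());            // [key1, key3, key5]

        // Find the nearest keys around one that is not in the map.
        System.out.println(map.floorKey("key4"));    // key3
        System.out.println(map.ceilingKey("key4"));  // key5

        // Range view: every entry whose key is strictly below "key5".
        System.out.println(map.headMap("key5"));     // {key1=value1, key3=value3}
    }
}
```

Because the calls above belong to the interface rather than to one implementation, the same code should work against any map that implements ConcurrentNavigableMap.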

What makes MapDB an agile data engine?

When I first met Jan and started talking with him about MapDB, he said something that made a strong impression on me: if you know what data structure you want, MapDB allows you to tailor the structure to your exact application needs. In other words, the schema and the ways you can structure your data are very flexible.

Why was this so important? I learned early in my career that if you can make a data structure that matches what you are trying to do, it can offer far better performance compared to a “generic” structure (sometimes orders of magnitude better). While it is beyond the scope of this introductory article, this is a ground-breaking concept because MapDB not only offers you the ability to create Maps, Sets, etc. to meet your needs, it also supports tremendous flexibility in its internal implementation of those structures. That is where you can really blow away traditional thinking, and achieve the performance you need – often with far less work than attempting various “work-arounds” with other engines that support only a single internal structure.

The key to this capability is inherent in MapDB’s architecture, and how it translates to the MapDB API itself. Here is a simple diagram of the MapDB architecture:


As you can see from the diagram, there are 3 tiers in MapDB:

  • Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension that lets you control the exact characteristics of a given database (including its internal format or record structure).
  • Engine: The Engine is the real key to MapDB. This is where the records for a database, including their internal structure, concurrency control and transactional semantics, are controlled. MapDB ships with several engines already, and it is straightforward to add your own Engine if needed for specialized data handling.
  • Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects.

The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the actual physical storage layer. This offers a very agile approach, allowing developers to exactly control what type of internal structure is needed for a given database, and what the actual data structure looks like from the top-level Collections API.
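The separation between the three tiers can be illustrated with a toy sketch in plain Java. Every name below (Volume, HeapVolume, Engine, SimpleEngine, StoredMap) is hypothetical and chosen only to mirror the tier names; none of it is MapDB's real code or API, and the layout is deliberately naive.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: these are NOT MapDB's real interfaces.
class TierSketch {

    // Storage tier: knows only how to keep raw bytes somewhere.
    interface Volume {
        int append(byte[] data);   // returns the slot the bytes were stored in
        byte[] read(int slot);
    }

    // An in-memory Volume; an on-disk one could be swapped in without
    // touching the tiers above it.
    static class HeapVolume implements Volume {
        private final List<byte[]> slots = new ArrayList<>();
        public int append(byte[] data) { slots.add(data.clone()); return slots.size() - 1; }
        public byte[] read(int slot) { return slots.get(slot).clone(); }
    }

    // Engine tier: turns records into bytes and decides how they are stored.
    interface Engine {
        int put(String record);    // returns a record id
        String get(int recordId);
    }

    static class SimpleEngine implements Engine {
        private final Volume volume;
        SimpleEngine(Volume volume) { this.volume = volume; }
        public int put(String record) {
            return volume.append(record.getBytes(StandardCharsets.UTF_8));
        }
        public String get(int recordId) {
            return new String(volume.read(recordId), StandardCharsets.UTF_8);
        }
    }

    // Collections tier: what application code sees. It talks only to the Engine.
    // Simplification: overwriting a key leaves the old record unreclaimed.
    static class StoredMap {
        private final Engine engine;
        private final Map<String, Integer> index = new HashMap<>(); // key -> record id
        StoredMap(Engine engine) { this.engine = engine; }
        void put(String key, String value) { index.put(key, engine.put(value)); }
        String get(String key) {
            Integer id = index.get(key);
            return id == null ? null : engine.get(id);
        }
    }

    public static void main(String[] args) {
        StoredMap map = new StoredMap(new SimpleEngine(new HeapVolume()));
        map.put("key1", "value1");
        System.out.println(map.get("key1"));
    }
}
```

The point of the sketch is the wiring in main: the storage and the engine are chosen when the map is constructed, and the code that calls put and get never has to know which ones were picked.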

There are many capabilities supported by this architecture (again the subject of future articles on advanced MapDB concepts).

However, without knowing any of the internals, you can gain a lot of agility simply by using existing features. In the first simple code example I showed the builder-style API that you use to create a database. That is where the power is, as the builder methods are extensible to support any Engine features that are available.

For example, let’s say I want a pure in-memory database, without transactional capabilities. All I need to do is add a few builder methods for incredibly quick configuration – and my application code doesn’t need to change at all:

import java.util.Map;
import org.mapdb.DB;
import org.mapdb.DBMaker;

// Initialize an in-memory MapDB database
// without transactions
DB db = DBMaker.newMemoryDB()
        .transactionDisable()
        .closeOnJvmShutdown()
        .make();

// Create a Map:
Map<String,String> myMap = db.getTreeMap("testmap");

// Work with the Map using the normal Map API.
myMap.put("key1", "value1");
myMap.put("key2", "value2");

String value = myMap.get("key1");
...

That’s it! All that was needed was to change the DBMaker call to add the new options; everything else works exactly the same.

The standard MapDB DBMaker class supports many options, allowing you specific control over the type of database you need. This is really great, as in many applications I have seen developers rely on multiple database engines to get the job done – and while I don’t expect MapDB to replace all other databases, it is incredibly agile and supports many modes of operation.

Agile data structures

I briefly covered how MapDB allows you to customize the characteristics of a given database instance, but just as important is the ability to create agile data structures – structures that exactly match your application requirements.

This is a familiar concept that likely mirrors how you work with your code when creating standard Java in-memory structures. For example, let’s say you need to look up a Person object by username or by personID. This is simple. You create a Person class and two Maps to meet your needs:

public class Person {

    private Integer personID;
    private String username;
    ...

    // Setters and getters go here
    ...

}

// Create a Map of Person by username.
Map<String,Person> personByUsernameMap = ...

// Create a Map of Person by personID.
Map<Integer,Person> personByPersonIDMap = ...

This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key.
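For readers who want something they can run, the fragment above might be completed along these lines. This is only a sketch: the PersonDirectory wrapper, its method names, the constructor-based Person and the sample values are all assumptions added for illustration, and plain HashMaps stand in for whatever Map implementation you choose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class; only Person and the two map names come
// from the article's fragment.
class PersonDirectory {

    static class Person {
        private final Integer personID;
        private final String username;

        Person(Integer personID, String username) {
            this.personID = personID;
            this.username = username;
        }

        Integer getPersonID() { return personID; }
        String getUsername() { return username; }
    }

    private final Map<String, Person> personByUsernameMap = new HashMap<>();
    private final Map<Integer, Person> personByPersonIDMap = new HashMap<>();

    // Writes the same Person into both maps so either key can find it.
    void add(Person person) {
        personByUsernameMap.put(person.getUsername(), person);
        personByPersonIDMap.put(person.getPersonID(), person);
    }

    Person byUsername(String username) { return personByUsernameMap.get(username); }

    Person byPersonID(Integer personID) { return personByPersonIDMap.get(personID); }

    public static void main(String[] args) {
        PersonDirectory directory = new PersonDirectory();
        directory.add(new Person(42, "jsmith"));

        // Both lookups return the same Person instance.
        System.out.println(directory.byUsername("jsmith").getPersonID());
        System.out.println(directory.byPersonID(42).getUsername());
    }
}
```

Note that the caller is responsible for writing to both maps on every change, which is exactly the bookkeeping that becomes tedious as the number of lookup keys grows.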

You can do the same thing with MapDB, but even more easily. MapDB supports many constructs for the interaction of Maps or other collections, allowing you to create a schema of related structures that can automatically be kept in sync. This avoids a lot of scanning of structures, which keeps lookups fast and the code convenient to write.
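The idea of structures that keep themselves in sync can be sketched in plain Java. The class below is a hypothetical helper written for this illustration, not MapDB's API: it assumes each value maps to a unique secondary key and updates a secondary index on every write to the primary map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of MapDB: a primary map that maintains
// one secondary index automatically. Assumes secondary keys are unique.
class SyncedIndexSketch<K, V, K2> {

    private final Map<K, V> primary = new HashMap<>();
    private final Map<K2, K> secondary = new HashMap<>(); // secondary key -> primary key
    private final Function<V, K2> secondaryKeyOf;

    SyncedIndexSketch(Function<V, K2> secondaryKeyOf) {
        this.secondaryKeyOf = secondaryKeyOf;
    }

    void put(K key, V value) {
        V old = primary.put(key, value);
        if (old != null) {
            // Drop the index entry that pointed at the replaced value.
            secondary.remove(secondaryKeyOf.apply(old));
        }
        secondary.put(secondaryKeyOf.apply(value), key);
    }

    void remove(K key) {
        V old = primary.remove(key);
        if (old != null) {
            secondary.remove(secondaryKeyOf.apply(old));
        }
    }

    V get(K key) {
        return primary.get(key);
    }

    V getBySecondary(K2 secondaryKey) {
        K key = secondary.get(secondaryKey);
        return key == null ? null : primary.get(key);
    }

    public static void main(String[] args) {
        // Primary key: a person ID. Value: a username.
        // Secondary key: the lower-cased username.
        SyncedIndexSketch<Integer, String, String> people =
                new SyncedIndexSketch<Integer, String, String>(name -> name.toLowerCase());

        people.put(1, "Alice");
        System.out.println(people.getBySecondary("alice"));

        // Replacing the value updates the index without any extra calls.
        people.put(1, "Bob");
        System.out.println(people.getBySecondary("alice"));
        System.out.println(people.getBySecondary("bob"));
    }
}
```

The application only ever writes to one structure here; the index follows along, so no scan of the primary map is needed to find a value by its secondary key.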

Wrapping it up

In this article I have provided a very brief summary of MapDB and its key capabilities and features. Most importantly I discussed how MapDB is totally natural for a Java developer (due to the familiar Java Collections API), and how MapDB is agile as well, offering powerful control over many aspects of your database (internal structures as well as exposed schema structures). And perhaps best of all, MapDB is freely available for any use under the Apache 2.0 license.

To learn more, check out: www.mapdb.org

Author
Cory Isaacson
Cory Isaacson is CEO/CTO of CodeFutures Corporation, maker of dbShards, a leading database scalability suite providing a true