Charting a flexible course
Cory Isaacson: “MapDB is a pure Java database, for Java developers”
Cory Isaacson, CEO of CodeFutures (vendors of database scalability suite dbShards, which provides a true "shared nothing" architecture for relational DB) speaks to JAXenter about MapDB - the Apache-licensed open source database especially for Java developers, tipped to become the Java storage engine of the future.
JAX: Can you give our readers an overarching view of what MapDB is all about?
Isaacson: MapDB is a pure Java database, for Java developers. It is natural to use, all based on the Java Collections API (Maps, Lists, Sets).
The key to MapDB is that developers can create a database structure in a new agile paradigm, exactly matching application needs. It is somewhat like creating a schema in a typical database, but goes well beyond what you can do with a typical key-value store. For example, MapDB allows you to create related maps, supporting object semantics with built-in data relationships. This makes it very intuitive to create the ideal structure for the data needed by the application – without the burden of complex ORM (object-relational mapping) frameworks. Just create your maps, bind them together, and use the same semantics and syntax you use today.
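To make the idea of related maps concrete, here is a minimal sketch that uses only standard java.util collections. It is not MapDB's own binding API: the class name RelatedMapsSketch and its methods are hypothetical, and the secondary map is kept in sync by hand. It assumes a primary map from user id to city and a secondary map from city back to the ids that live there.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration: two maps that describe the same relationship
// from both directions, maintained together on every write.
class RelatedMapsSketch {
    // Primary map: user id -> city.
    final Map<Long, String> cityByUser = new ConcurrentHashMap<>();
    // Secondary map: city -> sorted set of user ids in that city.
    final Map<String, NavigableSet<Long>> usersByCity = new ConcurrentSkipListMap<>();

    // Writes to the primary map and updates the secondary map to match.
    // Note: the two updates are not atomic as a pair; this is only a sketch.
    void put(long userId, String city) {
        String oldCity = cityByUser.put(userId, city);
        if (oldCity != null && !oldCity.equals(city)) {
            NavigableSet<Long> oldSet = usersByCity.get(oldCity);
            if (oldSet != null) {
                oldSet.remove(userId);
            }
        }
        usersByCity.computeIfAbsent(city, c -> new ConcurrentSkipListSet<>()).add(userId);
    }

    // Reads through the secondary map; returns an empty set for unknown cities.
    NavigableSet<Long> usersIn(String city) {
        NavigableSet<Long> ids = usersByCity.get(city);
        return ids == null ? new ConcurrentSkipListSet<>() : ids;
    }

    public static void main(String[] args) {
        RelatedMapsSketch sketch = new RelatedMapsSketch();
        sketch.put(1L, "Prague");
        sketch.put(2L, "Prague");
        sketch.put(1L, "Denver"); // user 1 moves; both maps are updated
        System.out.println(sketch.usersIn("Prague"));
        System.out.println(sketch.usersIn("Denver"));
    }
}
```

The point of the sketch is the shape of the data: the application reads and writes ordinary maps, and the relationship between them is maintained at write time rather than through an ORM layer.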
It is very easy to get going with MapDB. All a developer needs to do is add a single jar to the classpath and modify the Map creation syntax, and everything else just works. The project is incredibly flexible and powerful, offering performance comparable to native C-language embedded databases (like BerkeleyDB and LevelDB). With MapDB you can access collections in the 10s to 100s of GBs – the same way you access small in-memory object stores.
MapDB is extremely configurable, using a simple Builder-style API. You can configure caching, the type of store, durability guarantees and many other features. This way you can select the right balance of features and performance needed for your application.
What problems are you trying to solve with this software?
That is a big question – MapDB can be used for many common use cases and problems. The main focus is to offer a natural way for Java developers to access large data stores in a very agile paradigm, with a schema that precisely matches application needs.
One common problem many applications suffer from is running out of Java heap memory, or excessive garbage collection caused by attempting to cram too many objects into the application runtime. This is almost always the result of large in-memory collections (Maps, Lists, etc.) with a lot of “churn”. Convert those to MapDB and now you have the bulk of the data in a durable form on disk – with an automatic in-memory cache – with exactly the same API used for your existing collection code.
Another big problem is how to perform many common database tasks in an easy manner (sorts, iterating through collections, transactions). MapDB supports all of these, using the native Java concurrent APIs plus a few easy-to-learn extensions.
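As a rough illustration of what "native Java concurrent APIs" means in practice, the sketch below performs sorted iteration, a range query and an atomic insert through the standard ConcurrentNavigableMap interface. A ConcurrentSkipListMap from the JDK stands in for the store so the example runs without MapDB on the classpath; the assumption is that a MapDB sorted map would be used through the same interface. The class name SortedAccessSketch and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration using only JDK types.
class SortedAccessSketch {
    // Builds a small sorted map; keys are inserted out of order on purpose.
    static ConcurrentNavigableMap<String, Integer> sample() {
        ConcurrentNavigableMap<String, Integer> stock = new ConcurrentSkipListMap<>();
        stock.put("cherry", 3);
        stock.put("apple", 1);
        stock.put("banana", 2);
        return stock;
    }

    // Iterating the key set of a navigable map yields keys in sorted order.
    static List<String> keysInOrder(ConcurrentNavigableMap<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, Integer> stock = sample();
        // Sorted iteration, no explicit sort step needed.
        System.out.println(keysInOrder(stock));
        // Range query: every entry with a key up to and including "banana".
        System.out.println(stock.headMap("banana", true));
        // Atomic insert-if-missing; an existing value is left untouched.
        stock.putIfAbsent("apple", 99);
        System.out.println(stock.get("apple"));
    }
}
```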
What makes you so convinced that MapDB has a good chance to become "the de facto standard Java storage engine"?
Because MapDB is flexible, fast and freely available for use in any type of project, under the Apache 2.0 License. In many cases there is no other competition, except writing your own solution. It is trivial to plug MapDB into an existing project, and instantly gain all of the power of many other database offerings in one single package (most of which are either commercial closed source projects or offered with restrictive “viral” open source licenses).
The upshot is that MapDB is powerful, agile, flexible – and freely available to use and distribute in any way the developer sees fit.
What’s the history of MapDB’s development?
DBM (Database Manager) was a simple database engine written by Ken Thompson for UNIX – basically a hash table on disk. The JDBM project (a Java port) was started around the year 2000 by a group of developers, with JDBM 1.0 released in 2005. The project languished with not much active support or interest, yet this type of data structure had immense potential.
Jan Kotek worked on persistence for an astronomical application. Originally he modified the H2 database, but SQL had major overhead. In 2010 he spent several weeks doing astronomical observations in a remote region of the Chilean Andes, and by some stroke of luck had the JDBM source code on his laptop. To beat the long boring days (all astronomy activities take place at night, of course), he started modifying and improving JDBM. And as they say, the rest is history.
Jan soon released JDBM2, then subsequently JDBM3. These libraries were widely used by many companies.
Realizing the potential for a full-fledged, powerful and agile Java database relying on native APIs, Jan renamed the project MapDB and left his “day job” to dedicate himself to the project full time in early 2013. At CodeFutures we have long used the JDBM and MapDB projects, and love the database and its potential. In November 2013 CodeFutures brought Jan on the team on a full-time basis, giving him the freedom and economic support needed to fully dedicate his efforts toward making MapDB the leading Java database in the world.
Are there any disadvantages to MapDB being so generic?
There are some disadvantages, oddly enough tied to the agile nature of MapDB’s flexible features. It takes a bit of learning to understand the various configuration options, and the many ways you can structure the data to meet application needs. This learning curve is comparable to, and perhaps easier than, that of other new database options (such as MongoDB and Redis). There are many ways to use MapDB and we are hard at work improving the documentation to address these issues, including how-to recipes for common use cases.
How close would you say MapDB is to meeting its original design goals?
The idea was to make a database natural for Java developers, one that is agile and has the ability to support data structures needed by a wide array of applications. In this regard, Jan has done a great job of meeting these goals.
What’s on the roadmap ahead for MapDB?
Bug fixing, bug fixing, bug fixing.
There are many new features being considered, with a pending TODO list of 400 improvements. Topping the list are append-only file stores, improved snapshots, incremental backups and faster commits. Another very exciting new feature on the list is full support for the new lambda expressions in Java 8; this will enable true parallel processing within large maps, accelerating capabilities such as complex aggregations.
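For readers unfamiliar with Java 8 lambdas, the following sketch shows the general style of parallel aggregation they make possible over any java.util.Map. It uses only JDK classes and an ordinary ConcurrentHashMap; it does not show MapDB's planned lambda support, whose API is not described here. The class name ParallelAggregationSketch and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of lambda-based parallel aggregation over a map.
class ParallelAggregationSketch {
    // Small sample data set: page name -> hit count.
    static Map<String, Long> sample() {
        Map<String, Long> hits = new ConcurrentHashMap<>();
        hits.put("home", 10L);
        hits.put("docs", 20L);
        hits.put("blog", 30L);
        return hits;
    }

    // Sums all values; the stream may split the work across several threads.
    static long total(Map<String, Long> map) {
        return map.values().parallelStream()
                .mapToLong(Long::longValue)
                .sum();
    }

    // Counts entries whose value is at least the given threshold.
    static long countAtLeast(Map<String, Long> map, long threshold) {
        return map.entrySet().parallelStream()
                .filter(entry -> entry.getValue() >= threshold)
                .count();
    }

    public static void main(String[] args) {
        Map<String, Long> hits = sample();
        System.out.println(total(hits));
        System.out.println(countAtLeast(hits, 20L));
    }
}
```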
What attracted CodeFutures to MapDB, and where is the company using it?
CodeFutures used both JDBM3 and MapDB in its products, and always found them very fast, powerful and easy to implement. The focus of the company has always been on high-performance data engines (we don’t offer our own database, we make other engines better) – so it was natural to team up with Jan to expand MapDB’s capabilities.
What would you say are the headline features of MapDB - and which are you personally most excited about?
- Agile data structures, meeting exact needs of Java applications
- Drop-in replacement for Java collections
- Full transaction support
Is there an active community around MapDB?
Yes, there is a mailing list, and many users report bugs and send test cases and patches. We are seeing an expanding group of users and companies taking advantage of MapDB, and we will help to expand the community even further going forward.