How ArangoDB integrates with Java

Native multi-model databases are gaining popularity

Mark Vollmary


In this article, Mark Vollmary talks about native multi-model databases and introduces ArangoDB. If you want to know how ArangoDB integrates with Java, look no further.

Having multiple data models in one database core may sound strange: “It’s impossible!”; “This can’t be fast!”; or “I don’t need this. I can just connect to multiple database systems in my application.” These were some of the most common responses to our multi-model database system, ArangoDB. However, over the past year or so, the typical reactions have changed. There is far less disbelief in what’s possible with ArangoDB.

Today, our native multi-model database is gaining popularity (e.g., on G2Crowd, Gartner and GitHub). In this article, we’ll look at some of the reasons why organizations are exploring this new approach. First, we’ll look at what a native multi-model database is and what ArangoDB is. Then we’ll look at how ArangoDB integrates with Java.

Native multi-model databases

As the name implies, multi-model databases serve more than one data model with one technology. There are two basic types of multi-model databases:

Layered approach

For this type, an additional database is mounted onto another storage technology and connected via an API: e.g., a graph layer mounted on top of a key/value store, a relational database system or a columnar store. Different models are accessed through different query languages.

Native approach

With this simpler approach, one database core and one query language are used for all supported data models.

The ArangoDB team has been working on the native multi-model approach since 2011, when we discovered a way to combine the key/value, document and graph models into one C++ core and to manage the stored data with a declarative query language, the ArangoDB Query Language (AQL). We decided to release this technology under the Apache 2 open-source license. We felt that it’s the best way to provide access to a project of this size, allowing for easy adoption.

Our idea is simple: If we can efficiently combine more than one data model into one core and one query language, then we can provide something useful for fellow developers. If we can do this while also being competitive with leading single-model databases on their home turf, that makes it even better. Fewer datastores to learn, manage and maintain will inherently reduce the complexity of a tech stack. Plus, developers can execute a wider range of queries on data.

Native multi-model is not the ultimate solution for all situations. Our approach is not to force developers to use all data models. Nor can we integrate all models efficiently. It’s more about enabling developers to leverage the advantages of different models for different aspects of their applications. We think this can be done more efficiently with one technology in many cases.


The native multi-model in ArangoDB

At the core, ArangoDB is a transactional document store, storing JSON documents and supporting joins, aggregation or sorting operations similar to typical relational database systems.

If you just store a value in a document, it can have the characteristics of a modern key/value store. As we are based on a document store, ArangoDB stores a bit more data per JSON document compared to classical key/value stores. That’s why we recommend ArangoDB for small to mid-sized key/value use cases. If you need hyperscale, though, we recommend something like Redis.

The interesting part is graphs. A graph is a highly connected dataset, with entities represented by vertices and the relations between them represented by edges. In the visual example below, a graph connection is represented by two airports (i.e., vertices) which are connected via a flight (i.e., edge).

What you can see here are two important characteristics of ArangoDB. First, in order to create the graph, the only thing necessary is to store additional _from and _to attributes in the JSON documents (see the green and orange text).

Second, both the vertices and edges are full, nestable JSON documents. You can store arbitrary attributes in them, create suitable indexes on them and query them with complex filter conditions. By using efficient hashing (i.e., an edge index) on those _from and _to attributes, traversing a graph in ArangoDB can be very fast, thanks to consistent hash-lookup latency.
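To make the idea of an edge index more concrete, here is a minimal conceptual sketch in plain Java. It is not ArangoDB's implementation and does not use the ArangoDB driver; the class EdgeIndexSketch, its methods, the airport keys and the flightNo attribute are all hypothetical names chosen for illustration. It only shows how edges that carry _from and _to attributes can be hashed by their _from value, so that one traversal step becomes a single hash lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: a hash-based index over edge documents.
public class EdgeIndexSketch {

    // Maps a vertex id (the edge's _from value) to its outgoing edge documents.
    private final Map<String, List<Map<String, Object>>> outgoing = new HashMap<>();

    // Builds an edge as a JSON-like document with _from and _to attributes.
    public static Map<String, Object> edge(String from, String to, String flightNo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_from", from);
        doc.put("_to", to);
        doc.put("flightNo", flightNo); // hypothetical extra attribute
        return doc;
    }

    // Registers the edge under its _from value.
    public void add(Map<String, Object> edgeDoc) {
        String from = (String) edgeDoc.get("_from");
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(edgeDoc);
    }

    // One traversal step: a single hash lookup returns all direct neighbours.
    public List<String> neighbours(String vertexId) {
        List<String> result = new ArrayList<>();
        for (Map<String, Object> e : outgoing.getOrDefault(vertexId, Collections.emptyList())) {
            result.add((String) e.get("_to"));
        }
        return result;
    }

    public static void main(String[] args) {
        EdgeIndexSketch index = new EdgeIndexSketch();
        index.add(edge("airports/JFK", "airports/LAX", "AA1"));
        index.add(edge("airports/JFK", "airports/SFO", "UA2"));
        index.add(edge("airports/LAX", "airports/SFO", "DL3"));
        // prints [airports/LAX, airports/SFO]
        System.out.println(index.neighbours("airports/JFK"));
    }
}
```

The cost of each step depends on the number of edges leaving the current vertex, not on the total number of edges stored, which is the property the paragraph above refers to.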

For the graph part, developers can use a wide range of graph algorithms starting with traversals, shortest path and pattern matching. Developers can also use SmartGraphs and distributed graph processing with Pregel for large-scale computation.

With the native approach of ArangoDB, you can combine different access patterns like joins, aggregations, filtering and graph traversals in a single query. Some of our clients started by using one pattern, like joins or graph traversals. Most of them recognized over time that some queries would be better suited to another access pattern. But instead of adding a new database to their applications, they simply rewrote some queries to leverage graphs or joins. This flexibility is one of the key advantages of the native multi-model approach and the main reason for its recent popularity.
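As a rough illustration of combining access patterns, the sketch below holds a single AQL query that filters documents, traverses a graph and aggregates the result. The collection names (airports, flights), the attributes (country, name) and the class CombinedQuerySketch are assumptions made up for this example, not part of any real dataset. The class only prepares the query text and its bind parameters, so it runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: prepares one AQL query that mixes several access patterns.
public class CombinedQuerySketch {

    // Hypothetical collections: "airports" (vertices) and "flights" (edges).
    public static final String QUERY = String.join("\n",
        "FOR airport IN airports",
        "  FILTER airport.country == @country",                     // filtering
        "  FOR destination IN 1..1 OUTBOUND airport flights",       // graph traversal
        "    COLLECT origin = airport.name WITH COUNT INTO routes", // aggregation
        "    SORT routes DESC",
        "    RETURN { origin: origin, routes: routes }");

    // Literal values travel as bind parameters, separate from the query text.
    public static Map<String, Object> bindVars(String country) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("country", country);
        return vars;
    }

    public static void main(String[] args) {
        System.out.println(QUERY);
        System.out.println(bindVars("US"));
    }
}
```

The query text and the bind parameter map could then be handed to the Java driver's query call shown later in this article.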

Regarding queries, there is another useful aspect of ArangoDB, a feature that has contributed to its increasing adoption. The Foxx Framework is a JavaScript framework based on Google’s V8 engine. It’s used for writing data-centric, RESTful microservices that run natively inside ArangoDB. The advantages are unified storage logic and reduced network overhead, and sensitive data never has to leave the database. Moving data-intensive business logic closer to the data not only improves performance, but it also helps to keep the query language out of your application layer and reduce vendor lock-in effects.

ArangoDB, with its native multi-model approach, is worth trying. If you want to start today, you can read our White Paper about the multi-model idea or get the graph course to explore this new storage approach.


Integrating ArangoDB into Java

If you’ve never worked with a non-relational database like ArangoDB before, you might be surprised that most NoSQL databases do not integrate with JDBC or JPA, but provide their own Java API instead. In most cases, NoSQL databases provide a much broader feature set compared to relational databases. Therefore, standardized JDBC or JPA would mean a limitation for those databases.

The Java API of ArangoDB

ArangoDB provides a RESTful API which allows an application to perform any kind of database operation, from simple CRUD operations and complex queries to database administration such as creating databases or collections (the NoSQL equivalent of SQL tables). Like all our other language bindings, the Java driver also uses this REST API to connect to ArangoDB.

The Java API for ArangoDB, which supports Java 1.6+, is fairly complete. It supports synchronous and asynchronous communication. Besides HTTP, the driver supports a second network protocol called VelocyStream. It’s a binary transport protocol which transports the internal storage format of ArangoDB, VelocyPack.

Let’s take a look at how to work with the Java API. Below is an example:

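import com.arangodb.ArangoCursor;
import com.arangodb.ArangoDB;
import com.arangodb.entity.BaseDocument;
import com.arangodb.model.AqlQueryOptions;
import com.arangodb.util.MapBuilder;
import java.util.Map;

// create driver instance
ArangoDB arangoDB = new ArangoDB.Builder().host("localhost", 8529).build();

// write your query; the filter compares each document's name attribute to the bind parameter
String query = "FOR t IN myCollection FILTER t.name == @name RETURN t";

// separate literal values as bind parameters from the query text
Map<String, Object> bindVars = new MapBuilder().put("name", "Homer").get();

// execute the query and define the result type
ArangoCursor<BaseDocument> cursor = arangoDB.db().query(query, bindVars, new AqlQueryOptions(), BaseDocument.class);

// process the results (the lambda requires Java 8 or later)
cursor.forEach(doc -> {
  System.out.println("Key: " + doc.getKey());
});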

If you decide to try ArangoDB, you may want to watch our 10-minute Java tutorial. We are always looking to improve the Java driver for ArangoDB. Please let us know if you have any feedback on it. The latest update of the driver contains a feature requested by the ArangoDB community: we now support built-in load balancing, as well as extended failover.

In complex Java projects, it’s convenient and, in some cases, necessary to use frameworks, like the Spring Framework. Over the past year, we’ve adapted to such projects, and have begun to help these teams to better integrate ArangoDB into their environments.

Spring Data integration

One interesting module of the Spring Framework is Spring Data. It provides a Spring-like programming model for data access. There are many sub-projects of Spring Data which specify data access to different database technologies.

While the query language for ArangoDB is what makes it native, it is a proprietary language. One of our key pillars is efficiency. We want to share software to help others create more efficient applications. To remove an obstacle to such efficiency, we released a Spring Data integration as an abstraction layer. It reduces the need to learn AQL and supports a wide range of database operations.

Next steps for ArangoDB

Many community members have asked for Jackson support directly within the Java driver. So our next project will be to implement an extension for it. Starting with version 3.2, ArangoDB provides a new API to manage Foxx services. This is not yet supported by the driver, but we think it would be useful to Java developers for us to support this as well.

With the upcoming release of 3.3, ArangoDB will support datacenter-to-datacenter (DC2DC) replication. It will also integrate a new replication engine for single-instance usage, which will support “Hot Standby”. Next year, we hope to continue our journey, get plenty of feedback from you guys and possibly integrate a fourth data model. So stay tuned.


Mark Vollmary

Mark Vollmary has been a Senior Java Developer for over 10 years and leads all Java-related projects at ArangoDB. In addition, he focuses more and more on Foxx, ArangoDB’s JavaScript framework. Before joining the company in 2016, he led several key Java projects for insurance companies and gained experience with all kinds of relational databases like Oracle or MySQL.
