“Quite extraordinary release” Neo4j 2.0 drops today
This September, Neo4j’s creator Emil Eifrem told JAXenter that version 2.0 of Neo Technology's signature graph database represented the biggest change in the graph database formula since its launch in 2000.
Although the community has been able to follow each step of Neo4j 2.0’s development, after months of waiting, the final, ready-to-drive version dropped today. You can get tinkering right away by downloading it here.
Prior to this release, in the November edition of the publication, Neo4j’s Rik Van Bruggen gave JAX Magazine readers a snapshot of some of Neo4j 2.0’s key features. Before you take a test drive, check out how the graph database has evolved.
The (r)evolutionary (r)evolution of the graph database
There have been a number of nice articles written about graphs, graph databases, and, more specifically, Neo4j in the past couple of months, each one jumping on the hype train and brimming with ‘revelations’ about the coolness of graph databases in general - especially Neo4j. In this article, we would like to continue this love-in by shining a spotlight on a couple of fantastic new features that are part of Neo4j 2.0 and that, in our humble opinion, are better than sliced bread. Much better.
Under the hood: evolving from a “property graph” to a “labeled property graph”
For the longest time, Neo4j has been using a very specific, rich and expressive data model to represent networks and graphs natively in the Neo4j database. This makes queries (aka traversals) so much easier to think up, translate into a query language (Cypher), execute against a running database server, and then maintain and share with others.
The “property graph” was, and is, a fine data model for highly connected data, allowing you to store data in a) nodes (aka vertices), b) relationships (aka edges) and c) properties on a) and b). Nodes and relationships were always equal citizens, and will continue to be so.
But there were issues with the data model: it did not allow for a “meta-model” that would describe the data structures in the database, so many users had to emulate that themselves. Everyone was reinventing the wheel, introducing “type nodes” and “type properties” into the database that would achieve the meta-model goals somehow - but it really was not a very nice solution.
That’s why Neo4j 2.0 introduces the concept of a “node label” into the data model. This is not just a cosmetic change to a perfectly fine graph database: it’s a fundamental new data model concept that allows users to create “subgraphs” within the property graph.
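To make the labeled property graph concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's implementation or API: the class names, the helper `with_label` and the sample data are all hypothetical, chosen only to show nodes, relationships, properties on both, and labels that carve out a subgraph.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)        # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # e.g. {"name": "Alice"}


@dataclass
class Relationship:
    start: int                                      # id of the start node
    end: int                                        # id of the end node
    rel_type: str                                   # e.g. "BOUGHT"
    properties: dict = field(default_factory=dict)  # relationships carry properties too


# Hypothetical sample data: a node may carry more than one label.
nodes = [
    Node(0, {"Person"}, {"name": "Alice"}),
    Node(1, {"Person", "Customer"}, {"name": "Bob"}),
    Node(2, {"Product"}, {"name": "Widget"}),
]
relationships = [Relationship(1, 2, "BOUGHT", {"quantity": 3})]


def with_label(label):
    """Return the nodes that carry the given label (the label's 'subgraph')."""
    return [node for node in nodes if label in node.labels]


people = sorted(node.properties["name"] for node in with_label("Person"))
customers = [node.properties["name"] for node in with_label("Customer")]
```

No “type node” or “type property” is needed here: the label itself tells you which nodes are people, customers or products.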
Labels do many things for you today, but will do even more for you in the future. Today, Labels:
Provide you with a much simpler data model by doing away with the need for you to create the meta-structure yourself (see above).
Allow for a much cleaner, simpler, and guaranteed indexing mechanism to the data. In the past, indexing of the data in Neo4j was a bit of a problem: there was never a hard guarantee that the data in the index would be the same as the data in the graph - it was left up to the user of the graph to ensure that consistency. Now, the database takes care of this.
Allow for an even more declarative query language (Cypher). Previously, a Cypher query would always begin with a START clause, which made it very clear that your queries would have to be “graph local” or “egocentric” - starting at a starting point and crawling out from there. But you had to decide where to start. You had to tell the database how to approach the crawl - which actually is not how a declarative query language should work. You are supposed to declare what you want and then let the database figure out how to get it. Now, with labels and the new indexing, you can forget about the START clause. Just define the pattern in the MATCH clause, and be done with it - Neo4j will figure it out from there. Much more intuitive.
But imagine what we could do with labels in the future. Labels provide us with structure in the graph, and that structure opens the door to things that we would like to do at some point in the future. We could use Labels for things like:
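The last two points can be illustrated with a toy store in Python. This is a sketch under stated assumptions, not how Neo4j works internally: `TinyGraph` and its methods are hypothetical names. It shows an index tied to a label and property that the store itself keeps in sync on every write, and a `match` call where the caller only declares what is wanted while the store decides between an index lookup and a scan.

```python
from collections import defaultdict


class TinyGraph:
    """Toy store: the store, not the caller, keeps indexes consistent."""

    def __init__(self):
        self.nodes = {}      # node id -> (labels, properties)
        self.indexes = {}    # (label, key) -> {value: set of node ids}

    def create_index(self, label, key):
        # Build the index from the data that already exists.
        index = defaultdict(set)
        for node_id, (labels, props) in self.nodes.items():
            if label in labels and key in props:
                index[props[key]].add(node_id)
        self.indexes[(label, key)] = index

    def create_node(self, labels, **props):
        node_id = len(self.nodes)
        labels = set(labels)
        self.nodes[node_id] = (labels, props)
        # Every write updates the matching indexes in the same step.
        for (label, key), index in self.indexes.items():
            if label in labels and key in props:
                index[props[key]].add(node_id)
        return node_id

    def match(self, label, key, value):
        """Declare what you want; the store picks index lookup or full scan."""
        index = self.indexes.get((label, key))
        if index is not None:
            return sorted(index.get(value, set()))
        return sorted(
            node_id
            for node_id, (labels, props) in self.nodes.items()
            if label in labels and props.get(key) == value
        )


graph = TinyGraph()
alice = graph.create_node(["Person"], name="Alice")
graph.create_index("Person", "name")            # indexes the existing node
bob = graph.create_node(["Person"], name="Bob") # indexed automatically on write
found_bob = graph.match("Person", "name", "Bob")
```

The caller never touches the index directly, so it cannot drift out of step with the data.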
Imposing constraints on the data. We have done a bit of this in 2.0 - but the possibilities for expansion in this domain are large.
Taking the knowledge of the graph that Labels give to us to do much more query optimisation.
Implementing security structures in the graph.
Distributing the graph across multiple machines (provide sharding) - should that ever be required.
The long and short of it is that Labels are new, but they are extremely powerful and make graphs much simpler for novice users.
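As an illustration of the first item, imposing constraints on the data, here is a small Python sketch of a uniqueness rule scoped to a label. It is an assumption-laden toy, not Neo4j's constraint mechanism: `ConstrainedGraph`, `ConstraintViolation` and the e-mail data are hypothetical.

```python
class ConstraintViolation(Exception):
    """Raised when a write would break a uniqueness rule."""


class ConstrainedGraph:
    def __init__(self):
        self.nodes = []     # list of (labels, properties)
        self.unique = {}    # (label, key) -> set of values already used

    def add_unique_constraint(self, label, key):
        # Check existing data first so the rule starts from a valid state.
        seen = set()
        for labels, props in self.nodes:
            if label in labels and key in props:
                if props[key] in seen:
                    raise ConstraintViolation(f"duplicate {label}.{key}")
                seen.add(props[key])
        self.unique[(label, key)] = seen

    def create_node(self, labels, **props):
        labels = set(labels)
        rules = [
            (label, key)
            for (label, key) in self.unique
            if label in labels and key in props
        ]
        # Validate every applicable rule before changing anything.
        for label, key in rules:
            if props[key] in self.unique[(label, key)]:
                raise ConstraintViolation(
                    f"{label}.{key}={props[key]!r} already exists"
                )
        for label, key in rules:
            self.unique[(label, key)].add(props[key])
        self.nodes.append((labels, props))
        return len(self.nodes) - 1


graph = ConstrainedGraph()
graph.add_unique_constraint("Person", "email")
graph.create_node(["Person"], email="alice@example.com")
try:
    graph.create_node(["Person"], email="alice@example.com")
    duplicate_rejected = False
except ConstraintViolation:
    duplicate_rejected = True
```

Because the rule is attached to the label, a node with a different label can reuse the same value without complaint.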
New Neo4j shell tools - MUCH easier import capabilities
What do people usually want to do with a database? Store data in it, right? Well, there used to be a time when importing your data into Neo4j was complicated. Many people have voiced these concerns - and I really believe that Neo4j is finally addressing them in a great way. Yes, of course, if you’re already a Java-loving rocket scientist, these new techniques won’t mean much to you - but to the average mortals out there, they will make a world of difference.
Two things usually complicated the import process:
Do you want to be importing into a running database?
What kind of scale are we looking at? Thousands or millions of things?
In all of these cases, the Neo4j-shell-tools allow you to parametrize the import process, and go from model to reality in a very reasonable timeframe. All you need to do is learn the Neo4j shell tools syntax, fire up the Neo4j shell, and get going.
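The idea of a parametrized import can be sketched in a few lines of Python. This is not the Neo4j shell tools syntax, which you should take from their own documentation: the function `read_batches`, the sample CSV and the statement below are hypothetical. The sketch only shows the principle of one statement with placeholders, filled in from each CSV row, with rows grouped into batches so that the same approach works for thousands or millions of things.

```python
import csv
import io


def read_batches(csv_file, batch_size):
    """Yield lists of row dicts, at most batch_size rows per list."""
    batch = []
    for row in csv.DictReader(csv_file):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partly filled batch


# Hypothetical parametrized statement: {name} and {age} are the
# placeholders that each CSV row would fill in.
STATEMENT = "CREATE (p:Person {name: {name}, age: {age}})"

# In-memory stand-in for a CSV file on disk.
sample = io.StringIO("name,age\nAlice,34\nBob,41\nCarol,29\n")
batches = list(read_batches(sample, batch_size=2))
batch_sizes = [len(batch) for batch in batches]
```

Each batch would then be sent to the database as one unit of work, with every row dict acting as the parameter map for the statement.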
New Neo4j Browser tool - visualisation and more
And then there is the all-new Neo4j Browser tool that is part of the 2.0 release. For those of you who are new to the graph database world, a word of context.
First of all: graph visualisations are important. Almost every user of the Neo4j graph database uses some kind of graph visualisation as part of the user interface. Whether they are using the stock Neo4j webadmin tool, a custom-developed visualisation solution (using tools like d3.js, vivagraph.js, or similar) or a commercial product (like Linkurio.us or KeyLines) - the human-navigable nature of graph exploration solutions is super-interesting, and very different from the traditional “excel spreadsheet” approach of interacting with data.
Secondly: ad hoc queries are important. That’s why Neo4j initiated the development of a declarative, easy-to-write-but-even-easier-to-read query language called Cypher in the first place. But query languages require tools to be able to exploit them in a productive manner - tools that allow you to experiment, learn, retry and iterate on your queries so that you can gradually make more sense of your data.
These are exactly the two things that the new Neo4j Browser delivers: a powerful visualisation solution that allows for flexible colouring of nodes, relationships and paths, and a powerful test and development environment for ad hoc Cypher queries. Very useful.
The new 2.0 release of the world’s leading graph database has a number of amazing new features that have made this powerful release really worth the wait. From a fundamental architecture point of view (Labels), an adoption point of view (easy import) and a development point of view (the new Browser), great progress has been made that should pave the way for many great things to come. It’s not a revolution, it’s an evolution. But it is an evolution that could have revolutionary consequences for the graph database market. I, for one, can’t wait.
Rik Van Bruggen is the regional territory manager for Neo Technology for the BeNeLux, UK, and the Nordics. He has been working for startup companies for most of his career. He has a keen technical interest and is really passionate about business - and about Belgian beer. Twitter: @rvanbruggen