The graph database re-imagined
Ahead of the release of Neo4j 2.0, we talk graphs, and staying in touch with the keyboard, with Emil Eifrem, Neo4j co-founder and CEO of Neo Technology.
Ever since Emil Eifrem, CEO of Neo Technology and co-founder of the Neo4j graph database, first sketched out the core data model for the company’s system on the back of a napkin on a flight to Mumbai in 2000, the formula for Neo Technology’s success has remained “exactly identical” – until now.
With the release of Neo4j 2.0 imminent, Emil is ready to shake things up for the first time in over a decade – hence the 2.0 affix. Not that this will come as a major surprise to the community around Neo4j – thanks to the company’s transparent approach, they’ve been able to follow the progress of what Emil describes as a “quite extraordinary release” each step of the way.
According to Emil, Neo4j 2.0 is probably going to be released towards the end of the year, with December slated for the release of the GA version. It will be the first time that Neo Technology has made changes to the core data model for Neo4j. Emil explains that these changes will make it “a lot more convenient to do certain things – we’ve got a new web UI which we’re calling Neo4j browser, which we think is going to be very impactful.” The first big reveal will be at GraphConnect – Neo Technology’s annual conference for graph databases and their applications.
Today, Emil puts the number of groups working on projects, products and companies related to graph databases at about 30 to 40, including Oracle, IBM, SAP, and Facebook – but back in the days of the first IT boom, it was a very different landscape. Even now, according to Emil, “the majority of the world still doesn’t know that graph databases exist”, but there’s “an explosion of technologies that are addressing this now”.
The seeds for the Neo4j graph database project were first planted in early 2000, during the first big IT boom, when the three co-founders of the project were working at a Swedish start-up that dealt with “a bunch of very hierarchical data”. The team was dealing day in, day out with a mass of “connected, big meshy data”, with multiple parents on every node and a complex security model protecting the data, and was working with everyone from “small advertising agencies, up to the Swedish defence”. Emil, who was CTO, discovered that, at one point, about half of his team were spending their time “just fighting with the relational database”.
A flurry of visits from consultants from companies such as IBM and Oracle followed, and eventually, they “started figuring it out that basically the key problem was that we had all this connected data – all the round data that we squeezed into this relational database – which, if you will, was the square hole – and there was a huge mismatch between the nature and the shape of the data”.
Pondering why there wasn’t a database that worked exactly like a relational database but, rather than exposing tables, exposed a graph model – a networked model, where you have nodes that can connect to other nodes, as in a social network or in a telecom network – led Emil and the two other co-founders of Neo4j to explore the concept of creating “a very flexible model that could represent many things, including this security model that we had.” As he puts it, “that started the key question that drove us, and this was at a point in time where I was young enough, and naive enough, that when I got a good technical idea, I ran with it.”
There were two main motivations for choosing graphs as the backbone of Neo4j. One of them was that connected data was just so much harder to shoehorn into the relational database, as opposed to issues like temporal data logging, which had “easy workarounds”. The second reason was a fundamental belief among the founders that “connected data is just becoming increasingly important”. Emil adds: “Remember, this was way back when dinosaurs ruled the earth, right – before LinkedIn and all the social networks, and we just thought that as an industry we’re pretty good with isolated, disconnected data. But there’s a lot of value being gained from how things are connected.”
There was also a philosophical aspect to the rationale behind the database. For Emil, “the purpose of data is to gain knowledge, and if we can’t use it to gain knowledge, it doesn’t really matter, and for me, that’s all about relating unknown concepts to known concepts. It’s ironic that our dominant database system is poor at handling connections, because that’s what turns data into knowledge.”
As of today, Emil and his team have been running in real 24/7 production for a decade – something he notes is not always an asset, for example, “in Silicon Valley – where it’s very cool if your application was built in three weeks, that’s very cool and hip”. Ultimately though, Neo4j’s innovations have served to set an important benchmark, and, moreover, for a database, “it actually is a really good thing if you’ve been around for a while, because that means that you’re stable, you’re robust, you’ll stay valuable.”
Today, the team at Neo Technology “totally still 100% believe” in the importance of graph databases, regarding the handling of connected data on the whole as “very much still an unsolved problem.” Emil reflects, “We are taking the first steps, and I think we’ve come a long way since ten years ago when we had to solve the problem from scratch, but I would say there is so much more left to be done, and I still think that, generally speaking, the world has not yet reaped the benefits of being able to seamlessly and easily process connected data, and that’s very much still our mission, and something that we very much wake up every day thinking about.”