Making connections

The graph database re-imagined

Lucy Carey

Ahead of the release of Neo4j 2.0, we talk graphs and staying in touch with the keyboard with Emil Eifrem, Neo4j co-founder and CEO of Neo Technology.

Ever since Emil Eifrem, CEO of Neo Technology and co-founder of the Neo4j
graph database, first sketched out the core data
model for the company’s system on the back of a napkin on a flight
to Mumbai in 2000, the formula for Neo Technology’s success has
remained “exactly identical” – until now.

With the release of Neo4j 2.0 imminent, Emil is
ready to shake things up for the first time
in over a decade – hence the 2.0 affix. Not that this will
come as a major surprise to the community around Neo4j – thanks to
the company’s transparent approach, they’ve been able to follow the
progress of what Emil describes as a “quite extraordinary release”
each step of the way. 

According to Emil, Neo4j 2.0 is probably going to be
released towards the end of the year, with December slated for the
release of the GA version. It will be the first time that Neo
Technology has made changes to the core data model for Neo4j. Emil
explains that these changes will make it “a lot more convenient to
do certain things – we’ve got a new web UI which we’re calling
Neo4j browser, which we think is going to be very impactful.” The
first big reveal will be at Graph Connect – Neo Technology’s annual
conference for graph databases and their applications.

Today, Emil puts the number of groups working on
projects, products and companies related to graph databases at
about 30 to 40, including Oracle, IBM, SAP, and Facebook – but back
in the days of the first IT boom, it was a very different
landscape. Even now, according to Emil, “the majority of the world
still doesn’t know that graph databases exist”, but there’s “an
explosion of technologies that are addressing this now”.

The seeds for the Neo4j graph database project
were first planted in early 2000, when the three co-founders of the
project were working at a Swedish start-up during the first big IT
boom, a company that dealt with “a bunch of very hierarchical
data”. The team was dealing day in, day out with a mass of
“connected, big meshy data”, with multiple parents on every node
and a complex security model protecting the data, and working with
everyone from “small advertising agencies, up to the Swedish
defence”. Emil, who was CTO, discovered that, at one point, about
half of his team were spending their time “just fighting with the
relational database”.

A flurry of visits from consultants from
companies such as IBM and Oracle followed, and eventually, they
“started figuring it out that basically the key problem was that we
had all this connected data – all the round data that we squeezed
into this relational database – which if you will, was the square
hole – and there was a huge mismatch between the nature and the
shape of the data”.

Pondering why there wasn’t a database that
worked exactly like a relational database but, rather than
exposing tables, exposed a graph model – a networked model, where
you have nodes that can connect to other nodes, as in a social
network or in a telecom network – led Emil and his two co-founders
to explore the concept of creating “a very flexible model
that could represent many things, including this security model
that we had.” As he puts it, “that started the key question
that drove us, and this was at a point in time where I was young
enough, and naive enough, that when I got a good technical idea, I
ran with it.”
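
To make the networked model concrete, here is a minimal sketch in Python of nodes joined by named relationships, where one node has more than one parent. It is an illustration only and not Neo4j's API or storage model; the `Graph` class, its methods, the `CHILD_OF` relationship name and the sample node names are all hypothetical.

```python
from collections import defaultdict


class Graph:
    """Toy in-memory graph: nodes with properties, directed typed relationships.

    Hypothetical illustration; this is not how Neo4j is implemented.
    """

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.outgoing = defaultdict(list)  # node id -> [(rel_type, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.outgoing[source].append((rel_type, target))

    def neighbours(self, node_id, rel_type):
        return [t for (r, t) in self.outgoing[node_id] if r == rel_type]

    def reachable(self, start, rel_type):
        """Return every node reachable from start by following rel_type."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in self.neighbours(current, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = Graph()
for name in ("report", "marketing", "archive", "company"):
    graph.add_node(name)

# "report" has two parents, which a strict tree cannot express.
graph.relate("report", "CHILD_OF", "marketing")
graph.relate("report", "CHILD_OF", "archive")
graph.relate("marketing", "CHILD_OF", "company")
graph.relate("archive", "CHILD_OF", "company")

ancestors = graph.reachable("report", "CHILD_OF")
```

Following relationships from node to node, as `reachable` does, is the kind of question that would otherwise need repeated self-joins in a relational schema.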

There were two main motivations for choosing
graphs as the backbone of Neo4j. One of them was that
connected data was just so much harder to shoehorn into the
relational database, as opposed to issues like temporal data
logging, which had “easy workarounds”. The second reason was a
fundamental belief among the founders that “connected data is just
becoming increasingly important”. Emil adds:
“Remember, this was way back when dinosaurs ruled the earth,
right – before LinkedIn and all the social networks, and we just
thought that as an industry we’re pretty good with isolated,
disconnected data. But there’s a lot of
value being gained from how things are connected.”

There was also a philosophical aspect to the
rationale behind the database. For Emil, “the purpose of data is to
gain knowledge, and if we can’t use it to gain knowledge, it
doesn’t really matter, and for me, that’s all about relating
unknown concepts to known concepts. It’s ironic that our dominant
database system is poor at handling connections, because that’s
what turns data into knowledge.”

As of today, Emil and his team
have been running in real 24/7 production for a
decade – something he notes is not always an asset, for example,
“in Silicon Valley – where it’s very cool if your application was
built in three weeks, that’s very cool and hip”. Ultimately though,
Neo4j’s innovations have served to set an important benchmark, and,
moreover, for a database, “it actually is a really good thing if
you’ve been around for a while, because that means that you’re
stable, you’re robust, you’ll stay valuable.”

Today, Neo Technology “totally still 100% believe” in the
importance of graph databases, regarding relational databases on
the whole as “very much still an unsolved problem.” Emil
reflects, “We are taking the first steps, and I think we’ve come a
long way since ten years ago when we had to solve the problem from
scratch, but I would say there is so much more left to be done, and
I still think that, generally speaking, the world has not yet
reaped the benefits of being able to seamlessly and easily process
connected data, and that’s very much still our mission, and
something that we very much wake up every day thinking about.”
