Going viral

Not only are epidemics graphs: Graphs are an epidemic

Diana Kupfer, JAX Editorial Team

Turning data into knowledge: JAXenter’s takeaways from GraphConnect London 2013.

Neo4j is, without a doubt, a pioneering
technology. “Graph database”, the term with which it has become
synonymous, had not even been coined when this Java technology
first emerged over a decade ago. 

Since its launch, Neo4j has become an impressive
success story – and the hype continues to grow. In recent years,
Neo Technology, the company which has officially sponsored its
development since 2007, has also organized conferences to
celebrate not only the rise of Neo4j and connected data, but also
the growing community around them. The fifth “GraphConnect” event,
and the first to be held in Europe, took place on 18 and 19 November
in London’s Dexter House, and attracted both users and fans of
Neo4j.

But let’s backtrack a bit. The Neo4j story actually
began in 2000. Back then, it was already clear that traditional
database systems were rapidly reaching their limits: the links
between stored data were too rich, JOIN operations were too
laborious, and query speeds were simply too low.

Little surprise, then, that relational databases have
subsequently faced competition from NoSQL alternatives. Yet even
key-value stores, document databases and column-oriented databases,
which Martin Fowler summed up under the term “aggregate-oriented
databases”, failed to take the connections between the data
sufficiently into account.

For this reason, the Neo4j founders envisioned a
realistic information storage system in the form of networks and
links. One that would store not only the data, but also their
complex web of relationships – and make them totally visible. Back
then, “Nobody was talking about graph databases,” recalled Emil
Eifrem, CEO of Neo Technology, in his conference opening keynote.
In those days, he and his team used terms such as “network-oriented
database” or the shortened version, “Netbase”.

Today, thirteen years later, graph technologies have
seen a virtually meteoric rise. As Eifrem noted, a simple Google
Trends search for the term “Neo4j” is enough to demonstrate it.

The rise and rise of Neo4j in Google Trends.

Between 30 and 40 major companies use Neo4j today,
including Hewlett-Packard, Deutsche Telekom, Oracle, IBM, and
Cisco. Neo4j is now even represented in the insurance sector
– a market you wouldn’t necessarily expect to integrate
cutting-edge technology into its enterprise IT. At
GraphConnect, Frederik Wilhelm and Dr. Andreas Transforms from
Intelligence Solutions AG reported how they convinced the insurance
group “The Bavarian” to adopt Neo4j, while the alternative option,
ObjectDB, drew the short straw.

In 2012, big players such as Google and Facebook
jumped onto the graph bandwagon with Knowledge Graph Search (“Neo4j
Cypher for non-techies,” as Patrick Baumgartner jokes),
consolidating Emil Eifrem, Peter Neubauer and their colleagues’
position as pioneers in the field.

In early 2013, an impressive blog post by developer Max De Marzi
outlined how information from Facebook can be transformed
into Cypher statements – Cypher being Neo4j’s own query
language.

The visualization of large amounts of data and
information networks is a growing branch of science, driven
less by aesthetic playfulness than by the concrete need
to make the relationships within “Big Data” visible, and thus
understandable.

What’s behind this hype? This is how Eifrem explains
it: Up until 1999, Internet search engines such as AltaVista were
keyword-based. Shortly before the turn of the millennium, Google
initiated a paradigm shift with its search algorithm PageRank,
moving away from discrete data towards connected data: “Not only
did they store the documents, but also how they relate to each
other,” said Eifrem. Thus, a shift took place from keyword search
to “social discovery” (Eifrem) – and that’s where graph
technologies play to their strengths.
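
To make the idea of “how documents relate to each other” concrete, here is a minimal sketch of a simplified PageRank computed by power iteration. It is a textbook simplification, not Google’s production algorithm; the page names, the damping factor of 0.85 and the fixed number of iterations are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: the keys link to the pages in their lists.
links = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = pagerank(links)
best_page = max(ranks, key=ranks.get)
```

In this toy example the ranking depends only on the link structure: no keyword from any page is ever inspected.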

Looking at the bigger picture, the realistic
representation of data relationships is the foundation of the
personalized technologies that Robert Scoble and Shel Israel put
under the heading of “The Age of Context”: As exemplified by
Google, the cross-linking of individual data gives rise to semantic
knowledge, which the software can leverage to anticipate the user’s
behavior in any given situation.

Making invisible connections visible

Despite all the buzz around Neo4j’s success, many
GraphConnect speakers recommended examining possible graph database
scenarios carefully. “Know your domain!”, as Tareq Abedrabbo from
the London consulting firm Open Credo put it.

In his talk, “Neo4j in Theory and Practice”, Abedrabbo
differentiated between domain-centric and data-centric
applications. A classic example of the former is a
recommendation engine: a well-defined data model with a “top-down”
design, with flexible but predictable data structures that can be
altered by user input. Data-centric approaches, on the other
hand, are marked by a complex set of data that represents
real-world networks.

In data-centric applications, different data sources
are typically integrated with each other. The design follows the
“bottom-up” principle. A stereotypical example would be
telecommunication networks. Although graph technologies are
“naturally data-driven,” according to the speaker, the categories
are anything but clear-cut. For more domain-centric applications,
Abedrabbo recommends the use of a mapping framework such as Spring
Data Neo4j.

Visualization strategies were presented by Joe Parry
from the startup Cambridge Intelligence, which specializes in this
area. “Data is invisible, but the user has to see it,” was one of
his key messages. The essential task of visualizers is to make the
graphical presentation of data semantically unambiguous and thus
make the intangible tangible, he said.

For example, a thick connecting line (edge, in graph
theory) between two objects (nodes) represents a particularly
strong connection – obvious, one might think. Nevertheless, the same
mistakes are being made over and over. Parry counts 3D
visualizations, poor color schemes, missing tooltips and a lack of
interaction among the most common ones.

In terms of making invisible connections visible, Glen
Ford (Zeebox) arguably provided the most vivid example of the day:
Using the graph model in his talk “Graphing the Second Screen”, he
clearly showed that former “Doctor Who” actor Tom Baker not only made
an appearance in “Blackadder”, but also features as the narrator
in “Little Britain” – a connection which, despite the popularity of
the series, even the most ardent fans in the audience were probably
unaware of.

Many use cases, one thing in common: Performance

The afternoon of the main conference offered a wide
range of testimonials and application scenarios, in which the
different strengths of Neo4j were shown, whether in gene expression
analysis, in studying a user network of a gaming platform, in
building an on-board entertainment system for Lufthansa Systems AG,
or in an impact analysis of web service and cloud integrations.
“Graphs are epidemic,” stated Toby O’Rourke of Gamesys, prompting
one participant to tweet in response.

A frequently cited Neo4j selling point is the
performance increase for queries. A particularly noteworthy
number in this context was presented by Sebastian Verheughe
from the Norwegian telecommunications company Telenor: In the
company’s old SQL-based database system, queries had taken
about 20 minutes. “You could go to lunch while running the query,”
quipped Verheughe. He explained that, with Neo4j, the query time
was reduced “from 20 minutes to seconds”, provided that the Java
code runs fast and that one has a deep understanding of how
traversal works. Although an in-memory database system could do a
similar job, one would then still be bound to SQL and its
complexity.
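
For readers unfamiliar with the term, a traversal can be pictured as follows. This is a minimal sketch of a depth-limited breadth-first traversal over plain adjacency lists; it is neither Neo4j’s engine nor Telenor’s code, and the network, its node names and the function `reachable_within` are hypothetical.

```python
from collections import deque


def reachable_within(adjacency, start, max_hops):
    """Breadth-first traversal from `start`.

    Returns a dict of node -> number of hops for every node that can
    be reached in at most `max_hops` steps.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical network: each node lists its direct neighbours.
network = {
    "customer": ["subscription"],
    "subscription": ["customer", "router-a"],
    "router-a": ["subscription", "router-b"],
    "router-b": ["router-a", "datacentre"],
    "datacentre": ["router-b"],
}
nearby = reachable_within(network, "customer", 2)
```

The work done here depends on how many nodes lie within the hop limit, not on the total size of the network, which is the intuition usually given for why following relationships can outperform repeated JOINs on connected data.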

Cypher and Neo4j 2.0

Compared to that, the first steps with Cypher,
the declarative ASCII query language, are a breeze, as Ian Robinson
demonstrated in his keynote. In a simple case study, he translated
a request (literally: “Which colleagues have similar skills to
me?”) into a data model, a graph pattern and finally a Cypher
query. From Neo4j 2.0 (expected in December) onwards, Cypher will
not only be applicable to graphs, but also to collections.
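
The step from question to graph pattern to query might look roughly like the sketch below. The Cypher text is an illustrative guess in Neo4j 2.0-style syntax, not the query shown in the keynote; the labels, relationship types, names and sample data are all assumptions. The Python function answers the same question over in-memory dictionaries so the example can run without a database.

```python
# Illustrative Cypher (assumed, not taken from the keynote):
# "people at my company who share at least one skill with me".
SIMILAR_SKILLS_QUERY = """
MATCH (me:Person {name: 'Ian'})-[:WORKS_FOR]->(company)<-[:WORKS_FOR]-(colleague),
      (me)-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(colleague)
RETURN colleague.name AS name, count(skill) AS shared
ORDER BY shared DESC
"""

# Hypothetical data model: person -> company, person -> set of skills.
works_for = {"Ian": "Acme", "Bill": "Acme", "Sarah": "Acme", "Ben": "Startup"}
has_skill = {
    "Ian": {"Java", "Neo4j", "REST"},
    "Bill": {"Neo4j", "Ruby"},
    "Sarah": {"Java", "Neo4j", "Scala"},
    "Ben": {"Java", "Neo4j", "REST"},
}


def colleagues_with_similar_skills(me):
    """Return (colleague, shared skill count) pairs, most similar first."""
    results = []
    for person, company in works_for.items():
        if person == me or company != works_for[me]:
            continue  # only other people at the same company
        shared = has_skill[me] & has_skill[person]
        if shared:
            results.append((person, len(shared)))
    # Sort by shared skills (descending), then by name for stable output.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


similar = colleagues_with_similar_skills("Ian")
```

The pattern in the query text mirrors the shape of the question: two paths that start at “me” and meet again at the same colleague, one via the company and one via a skill.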

The next major release brings another new feature, which
Eifrem highlighted in his keynote: node labels. This means that it
will be possible to provide a node with any number of labels. He
also explained that, from 2014, the focus will be on developing
the user experience. In general, it will be easier to use schemas,
which will remain optional.
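
What “any number of labels per node” means as a data structure can be sketched as follows. This is a toy model, not Neo4j’s implementation or API; the class `LabeledGraph`, its methods and the sample nodes are hypothetical.

```python
from collections import defaultdict


class LabeledGraph:
    """Toy node store in which each node carries zero or more labels."""

    def __init__(self):
        self.labels = {}                   # node id -> set of labels
        self.by_label = defaultdict(set)   # label -> set of node ids

    def add_node(self, node_id, *labels):
        """Create the node if needed and attach the given labels."""
        self.labels.setdefault(node_id, set()).update(labels)
        for label in labels:
            self.by_label[label].add(node_id)

    def nodes_with(self, *labels):
        """Return the ids of all nodes that carry every given label."""
        if not labels:
            return set(self.labels)
        groups = [self.by_label.get(label, set()) for label in labels]
        return set.intersection(*groups)


graph = LabeledGraph()
graph.add_node("alice", "Person", "Speaker")
graph.add_node("bob", "Person", "Speaker", "Author")
graph.add_node("acme")  # labels are optional, as schemas are

speakers = graph.nodes_with("Person", "Speaker")
authors = graph.nodes_with("Author")
```

Labels act as lightweight, optional groupings here: a node can belong to several groups at once, or to none.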

For those who wish to get more of a feel for Neo4j and the
innovations around it, we recommend watching this quick
intro by Stefan Armbruster, which was filmed for JAX TV.

Although Neo4j is often lumped together with other “NoSQL”
technologies, virtually no one at GraphConnect used this term, as
an astute participant tweeted. This is an indication of the
community’s self-confidence, and of its confidence that the
technology can stand on its own, independent of overarching or
related trends. And perhaps in the long run, a more
sophisticated understanding of newer database systems will
supersede the “NoSQL” label not only within, but also beyond
the Neo4j community.

Image by Fedor_Ø
