Not only are epidemics graphs: Graphs are an epidemic
Turning data into knowledge: JAXenter’s takeaways from GraphConnect London 2013.
Neo4j is, without a doubt, a pioneering technology. “Graph database”, the term with which it has become synonymous, had not even been coined when this Java technology first emerged over a decade ago.
Since its launch, Neo4j has become an impressive success story, and the hype continues to grow. In recent years, Neo Technology, the company which has officially sponsored its development since 2007, has also organized conferences to celebrate not only the rise of Neo4j and connected data, but also the growing community around it. The fifth GraphConnect event, and the first in Europe, was held on 18 and 19 November in London’s Dexter House, and attracted both users and fans of Neo4j.
But let’s backtrack a bit. The Neo4j story actually began in 2000. Back then, it was already clear that traditional database systems were reaching their limits with highly connected data. The links between stored data were too rich, JOIN operations were too painstaking, and query speeds were just too low.
Little surprise, then, that relational databases have subsequently faced competition from NoSQL alternatives. Yet even key-value stores and document and column-oriented databases, which Martin Fowler summed up with the term “Aggregate-oriented Databases”, failed to take the connections between the data sufficiently into account.
For this reason, the Neo4j founders envisioned an information storage system modelled on the real world, in the form of networks and links: one that would store not only the data, but also their complex web of relationships, and make them fully visible. Back then, “Nobody was talking about graph databases,” recalled Emil Eifrem, CEO of Neo Technology, in his conference opening keynote. In those days, he and his team used terms such as “Network-oriented database” or the shortened version, “Netbase”.
Today, thirteen years later, graph technologies have seen a virtually meteoric rise, which, as Eifrem noted, a simple Google Trends search for the term “Neo4j” demonstrates.
The rise and rise of Neo4j in Google Trends.
Between 30 and 40 major companies today utilise Neo4j, including Hewlett-Packard, Deutsche Telekom, Oracle, IBM, and Cisco. Neo4j is now even represented in the insurance sector, a market you wouldn’t necessarily expect to integrate cutting-edge technology into its enterprise IT. At GraphConnect, Frederik Wilhelm and Dr. Andreas Transforms from Intelligence Solutions AG reported how they convinced the insurance group “The Bavarian” to adopt Neo4j, while the alternative option, ObjectDB, drew the short straw.
In 2012, big players such as Google and Facebook jumped onto the graph bandwagon with Knowledge Graph and Graph Search (“Neo4j Cypher for non-techies,” as Patrick Baumgartner joked), consolidating the position of Emil Eifrem, Peter Neubauer and their colleagues as pioneers in the field.
In early 2013, an impressive blog post by developer Max De Marzi outlined how information from Facebook can be transformed into Cypher statements, Cypher being Neo4j’s own query language.
The visualization of large amounts of data and information networks is a growing branch of science, driven less by aesthetic playfulness than by the concrete need to make the relationships within “Big Data” visible, and thus understandable.
What’s behind this hype? This is how Eifrem explains it: Up until 1999, Internet search engines such as AltaVista were keyword-based. Shortly before the turn of the millennium, Google initiated a paradigm shift with its search algorithm PageRank, moving away from discrete data towards connected data: “Not only did they store the documents, but also how they relate to each other,” said Eifrem. Thus, a shift took place from keyword search to “social discovery” (Eifrem), and that’s where graph technologies play to their strengths.
Looking at the bigger picture, the realistic representation of data relationships is the foundation of the personalized technologies that Robert Scoble and Shel Israel put under the heading of “The Age of Context”: As exemplified by Google, the cross-linking of individual data gives rise to semantic knowledge, which the software can leverage to anticipate the user’s behavior in any given situation.
Making invisible connections visible
Despite all the buzz around Neo4j’s success, many GraphConnect speakers recommended examining possible graph database scenarios carefully. “Know your domain!”, as Tareq Abedrabbo from the London consulting firm Open Credo put it.
In his talk, “Neo4j in Theory and Practice”, Abedrabbo differentiated between domain-centric and data-centric applications. A classic example of the former is a recommendation engine: a well-defined data model with a “top-down” design, with flexible but predictable data structures that can be altered by user input. Data-centric approaches, on the other hand, are marked by a complex set of data that represents real-world networks.
In data-centric applications, different data sources are typically integrated with each other. The design follows the “bottom-up” principle. A stereotypical example would be telecommunication networks. Although graph technologies are “naturally data-driven,” according to the speaker, the categories are anything but clear-cut. For more domain-centric applications, Abedrabbo recommends the use of a mapping framework such as Spring Data Neo4j.
Visualization strategies were presented by Joe Parry from the startup Cambridge Intelligence, which specializes in this area. “Data is invisible, but the user has to see it,” was one of his key messages. The essential task of visualizers is to make the graphical presentation of data semantically unambiguous and thus make the intangible tangible, he said.
For example, a thick connecting line (edge, in graph theory) between two objects (nodes) represents a particularly strong nexus – obvious, one might think. Nevertheless, the same mistakes are being made over and over. For Parry, 3D visualizations, poor color schemes, missing tooltips and lack of interaction are some of the most common mistakes.
In terms of making invisible connections visible, Glen Ford (Zeebox) arguably provided the most vivid example of the day: Using the graph model from his talk “Graphing the Second Screen”, he clearly showed that former “Dr. Who” actor Tom Baker not only made an appearance in “Blackadder”, but also features as the narrator in “Little Britain”. Despite the popularity of these series, even the most ardent fans in the audience were probably unaware of this connection.
Many use cases, one thing in common: Performance
The afternoon of the main conference offered a wide range of testimonials and application scenarios, in which the different strengths of Neo4j were shown, whether in gene expression analysis, in studying a user network of a gaming platform, in building an on-board entertainment system for Lufthansa Systems AG, or in an impact analysis of web service and cloud integrations. “Graphs are epidemic,” stated Toby O’Rourke of Gamesys, a remark that one participant promptly tweeted.
A frequently cited Neo4j selling point is the performance increase for queries. A number that was particularly noteworthy in this context was presented by Sebastian Verheughe from the Norwegian telecommunications company Telenor: In the company’s old database system, SQL queries had taken about 20 minutes. “You could go to lunch while running the query,” quipped Verheughe. He explained that, with Neo4j, the query time was reduced “from 20 minutes to seconds”, provided that the Java code runs fast and that one has a deep understanding of how traversal works. Although an in-memory database system could do a similar job, one would still be bound to SQL and its complexity.
Cypher and Neo4j 2.0
Compared to that, the first steps with Cypher, the declarative ASCII query language, are a breeze, as Ian Robinson demonstrated in his keynote. In a simple case study, he translated a request (literally: “Which colleagues have similar skills to me?”) into a data model, a graph pattern and finally a Cypher query. From Neo4j 2.0 (expected in December) onwards, Cypher will not only be applicable to graphs, but also to collections.
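To give a feel for that three-step translation, here is a minimal sketch in Python. It is not Robinson's example: the names, the skills, the `HAS_SKILL` relationship type and the Cypher string are all assumptions made up for illustration. The plain-Python function simply mimics, on an in-memory edge list, what such a graph pattern would match.

```python
from collections import defaultdict

# Data model (hypothetical toy data): each pair stands in for a
# (:Person)-[:HAS_SKILL]->(:Skill) relationship in a graph.
HAS_SKILL = [
    ("me", "java"),
    ("me", "neo4j"),
    ("me", "rest"),
    ("alice", "java"),
    ("alice", "neo4j"),
    ("bob", "rest"),
    ("carol", "python"),
]

# Graph pattern and query: an illustrative Cypher statement for the same
# question. Labels, property names and relationship types are assumptions,
# not taken from the keynote. The string is shown for comparison only and
# is never executed here.
CYPHER = """
MATCH (me:Person {name: 'me'})-[:HAS_SKILL]->(skill)<-[:HAS_SKILL]-(colleague)
RETURN colleague.name AS name, count(skill) AS shared
ORDER BY shared DESC
"""


def colleagues_with_shared_skills(person, edges=HAS_SKILL):
    """Return (colleague, shared_skill_count) pairs, most similar first."""
    # Group skills by person, like following HAS_SKILL edges outwards.
    skills_of = defaultdict(set)
    for who, skill in edges:
        skills_of[who].add(skill)

    mine = skills_of[person]
    # Keep everyone else who shares at least one skill with `person`.
    shared = {
        other: len(mine & theirs)
        for other, theirs in skills_of.items()
        if other != person and mine & theirs
    }
    # Sort by number of shared skills (descending), then by name.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(colleagues_with_shared_skills("me"))
```

The point of the comparison is that the Cypher pattern reads almost like the drawing of the data model, whereas the imperative version has to spell out the grouping and intersection by hand.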
The next major release brings another new feature, which Eifrem highlighted in his keynote: node labels. This means that it will be possible to provide a node with any number of labels. He also explained that, from 2014, the focus will be on developing the user experience. In general, it will be easier to use schemas, which will remain optional.
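What "any number of labels per node" means in practice can be sketched in a few lines of Python. This is only a toy model of the idea, not how Neo4j implements labels; the class `LabeledGraph`, its methods and the sample nodes are hypothetical.

```python
from collections import defaultdict


class LabeledGraph:
    """Toy model: every node may carry any number of labels."""

    def __init__(self):
        self.labels_of = defaultdict(set)   # node id -> set of labels
        self.nodes_with = defaultdict(set)  # label -> set of node ids

    def add_labels(self, node, *labels):
        """Attach one or more labels to a node (labels are optional)."""
        self.labels_of[node]  # register the node even without labels
        for label in labels:
            self.labels_of[node].add(label)
            self.nodes_with[label].add(node)

    def find(self, *labels):
        """Return the nodes that carry all of the given labels."""
        if not labels:
            return set(self.labels_of)
        sets = [self.nodes_with.get(label, set()) for label in labels]
        return set.intersection(*sets)


if __name__ == "__main__":
    g = LabeledGraph()
    g.add_labels("emil", "Person", "Speaker", "CEO")
    g.add_labels("ian", "Person", "Speaker")
    g.add_labels("neo4j", "Database")
    # Roughly what a Cypher pattern such as (n:Person:Speaker) would select.
    print(g.find("Person", "Speaker"))
```

Keeping a second index from label to nodes is one plausible design choice: it makes "all nodes with this label" a lookup rather than a scan, which is the kind of grouping labels are meant to enable.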
For those who wish to get more of a feel for Neo4j and the innovations around it, we recommend watching this quick intro by Stefan Armbruster, which was filmed for JAX TV.
Although Neo4j is often lumped together with other “NoSQL” technologies, virtually no one at GraphConnect used this term, as an astute participant tweeted. This is an indication of the community’s self-confidence, and of its confidence that the technology can stand on its own, independent of overarching or related trends. And perhaps in the long run, a more sophisticated understanding of newer database systems will supersede the “NoSQL” label not only within, but also beyond the Neo4j community.
Image by Fedor_Ø