The SQL for graphs

Neo4j introduces openCypher, its common graph query language

Natali Vlatko

Co-founder of Neo4j Emil Eifrem has introduced the Oracle-backed openCypher Project – an open source enterprise that aims to provide a common graph query language for any data store, tooling or application provider to query graph data.

Emil Eifrem, CEO of Neo Technology and co-founder of Neo4j, has unveiled a new common graph query language project with the support of Oracle and Databricks, the company behind Apache Spark. The openCypher Project aims to be as conducive to the growth of graph processing and analysis “as SQL was in accelerating the adoption of RDBMS”.

Eifrem believes that there needs to be a common graph query language in order to benefit both vendors and users:

“A high-quality query language that already has broad adoption is extremely valuable because of its reusability across platforms. A common language also helps grow the wider graph space, encouraging healthy competition (another advantage to users). I believe that graph query language is Cypher.”

Cypher is Neo Technology’s third attempt at a graph query language, and it was first released together with Neo4j 2.0 as a first-class citizen. Neo4j uses the term first-class citizen to describe how graph databases store connections, an approach it claims is more efficient than that of other databases.

This release led to widespread adoption of Cypher by Neo4j users, thanks to its use of symbols to express patterns in a way that mirrors how graph data is drawn and intuitively understood.

As for the language itself, Cypher is designed to be human-readable and declarative. It takes much of its keyword inspiration from SQL (WHERE and ORDER BY, for example), which is itself a declarative language for working with sets of data. Pattern matching borrows from SPARQL, while collection semantics are borrowed from Haskell and Python. The syntax also leans on English prose in places.
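To give a rough feel for that mix of symbols and SQL-style keywords, here is a small illustrative query held in a Python string. The Person label, the KNOWS relationship type and the property names are made up for this sketch and do not come from the openCypher announcement; the helper function is likewise hypothetical and only checks which of the SQL-inspired keywords mentioned above appear in the text.

```python
# Illustrative Cypher query held as a Python string. The Person label,
# KNOWS relationship type and property names are hypothetical.
QUERY = """
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name AS friend
ORDER BY friend
"""

# Keywords the article says Cypher borrowed from SQL.
SQL_INSPIRED = ("WHERE", "ORDER BY")


def sql_inspired_keywords(query: str) -> list[str]:
    """Return the SQL-inspired keywords that appear in the query text."""
    upper = query.upper()
    return [kw for kw in SQL_INSPIRED if kw in upper]


print(sql_inspired_keywords(QUERY))
```

The parentheses stand for nodes and the arrow for a directed relationship, so the MATCH line reads much like a sketch of the graph itself.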

Open for discussion

As an open source project, openCypher lets any developer or technology provider use the language and build graph processing capabilities into their own products or applications. The project says it plans to deliver the following:

  • Cypher reference documentation: Comprehensive user documentation describing use of the Cypher query language with examples and tutorials.
  • Technology compatibility kit (TCK): The TCK consists of a number of tests that a software supplier would run in order to self-certify support for a given version of Cypher.
  • Reference implementation: Distributed under the Apache 2.0 license, the reference implementation is a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool. The first planned deliverable is a parser that will take a Cypher statement and parse it into an AST (abstract syntax tree) representation. The reference implementation complements the documentation and tests by providing permissively licensed, working implementations of Cypher that can be used as examples or as a foundation for one’s own implementation.
  • Cypher language specification: Licensed under Creative Commons, the Cypher language specification is a technical expression of the language syntax to enable parsers to auto-generate the query syntax. A full semantic specification is also planned as part of the openCypher project.
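To illustrate what parsing a statement into an AST means in practice, the toy sketch below turns a single relationship pattern into a small tree of Python objects. It is not the openCypher reference parser: the Node and Relationship classes and the parse_pattern function are hypothetical, and the regular expression only understands the one pattern shape shown in the comment.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    variable: str
    label: str


@dataclass
class Relationship:
    rel_type: str
    source: Node
    target: Node


# Matches only the toy form (var:Label)-[:TYPE]->(var:Label).
PATTERN = re.compile(r"\((\w+):(\w+)\)-\[:(\w+)\]->\((\w+):(\w+)\)")


def parse_pattern(text: str) -> Relationship:
    """Parse one directed relationship pattern into a small AST."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported pattern: {text!r}")
    src_var, src_label, rel_type, dst_var, dst_label = match.groups()
    return Relationship(
        rel_type,
        Node(src_var, src_label),
        Node(dst_var, dst_label),
    )


ast = parse_pattern("(a:Person)-[:KNOWS]->(b:Person)")
print(ast)
```

A real parser would be driven by the full grammar rather than a single regular expression, but the output has the same general shape: a structured tree that later stages can inspect instead of raw query text.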

Eifrem wants to get more people and companies involved in the project by reading through and commenting on published language proposals. He’s also invited interested parties to write their own proposal with an implementation, stating that openCypher is explicitly structured around working code.

Author
Natali Vlatko
An Australian who calls Berlin home, via a two-year love affair with Singapore. Natali was an Editorial Assistant for JAXenter.com (S&S Media Group).
