Twitter's open source flock

Twitter open sources graph processing library Cassovary

Chris Mayer

Yet another gift from the social networking behemoth – this one being a “big graph” processing library for the JVM.

Twitter’s recent open source charm offensive has continued with
the release of its big graph processing library, Cassovary, on
GitHub.

The social networking giant has already been right on trend by
offering Scalding, its Scala library for Cascading,
and Cassandra client Cassie to the masses, but now it’s Cassovary’s
turn for attention. Cassovary is written in Scala (but can be used
with other JVM languages) and is described as ‘a simple “big
graph” processing library for the JVM’. Importantly, it is designed
to be space efficient, unlike many JVM-hosted graph libraries.

Revealing the open sourcing on Twitter’s engineering blog,
Pankaj Gupta said that Cassovary is designed from
the ground up to efficiently handle graphs with billions
of nodes and edges. Given the gargantuan operation that is Twitter,
its engineers are surely among the best placed to know how to handle
large-scale graph mining on a big network.

Gupta gave examples of Cassovary at work within Twitter
too:

At Twitter, Cassovary forms the bottom layer of a
stack that we use to power many of our graph-based features,
including “Who to Follow” and “Similar to.” We also use it for
relevance in Twitter Search and the algorithms that determine which
Promoted Products users will see. Over time, we hope to bring more
non-proprietary logic from some of those product features into
Cassovary.

You may be thinking that there are already several graph
mining libraries available, but Cassovary differs from the likes
of Neo4j, the storage-sacrificing JUNG and the C/C++-written SNAP by
deliberately being as simple as possible to use. With no need
for persistence, database functionality or even
partitioning like Apache Giraph, Cassovary
appears to stay out of the complex stuff to allow it to run
efficiently.

For more information on the Cassovary project, check out the
GitHub page, where you can also download the library. Following the
team on Twitter at @cassovary is also a wise move for advice.
