Facebook reveal trillion-edge version of Apache Giraph in Graph Search
The social network open up about their use of Google Pregel-imitating graph processing within their Graph Search tool.
When nascent graph processing platform Apache Giraph hit its
1.0 release back in May, the names associated with the project
added prestige to the announcement. Social networks Twitter,
LinkedIn and Facebook were all namechecked as production users, but
we weren’t privy to just how much they were experimenting.
In a blog post yesterday, Facebook detailed
just how much faith they were putting in the Google Pregel-mimicking
project, using it as a fundamental cog in their Graph Search tool.
With a little rejigging, the company have managed to scale Giraph
to analyse trillions of edges (connections) in less than four
Software engineer Avery Ching explained that Facebook’s hunt for
available scalable software was “impossible last year” when
creating Graph Search.
“We needed a programming framework to express a
wide range of graph algorithms in a simple way and scale them to
massive datasets,” he wrote, pointing to
Giraph as the solution to their requirements. Facebook discarded
other options such as Apache Hive and GraphLab, as they
couldn’t compete with Giraph’s speed or
Ching adds that Giraph was also selected for being
able to run as a MapReduce job and for being written in Java, like
the rest of Facebook’s stack. The ease with which Giraph can read
graphs with HiveIO, Facebook’s own blend of the Apache data
warehousing technology, is another plus point for the platform.
Scalability has been further improved with the link to event-driven
Facebook’s trillion-edge real-world test across 200
commodity machines is believed to be the biggest ever in graph
programming. According to Ching, the biggest reported benchmarks so
far were Twitter’s graph with 1.5 billion edges and the
Yahoo! Altavista graph, comprising 6.6 billion. Facebook’s
social graph is “2 orders of magnitude beyond that scale” according
to Ching.
Even better news for the Apache Giraph community is that Facebook
are pumping this code back into the trunk of the project, making
Giraph more scalable and less memory intensive than previously.
While Graph Search may not yet be mature, having Facebook
demonstrate the potential of the mammoth iterative processing
platform should turn a few heads.
Images courtesy of Facebook