Putting it on the map
Facebook reveal trillion-edge version of Apache Giraph in Graph Search
When nascent graph processing platform Apache Giraph hit its 1.0 release back in May, the names associated with the project added prestige to the announcement. Social networks Twitter, LinkedIn and Facebook were all namechecked as production users, but we weren’t privy to just how much they were experimenting.
However, in a blogpost yesterday, Facebook detailed just how much faith they were putting in the Google Pregel-mimicking project, using it as a fundamental cog in their Graph Search tool. With a little rejigging, the company have managed to scale Giraph to analyse trillions of edges (connections) in less than four minutes.
Software engineer Avery Ching explained that Facebook’s hunt for available scalable software was “impossible last year” when creating Graph Search.
“We needed a programming framework to express a wide range of graph algorithms in a simple way and scale them to massive datasets,” he wrote, pointing to Giraph as the solution to their requirements. Facebook discarded the alternatives, Apache Hive and GraphLab, as they couldn’t compete with Giraph’s speed or performance.
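For readers unfamiliar with the Pregel model that Giraph mimics, the idea is that each vertex runs a small function in lockstep rounds ("supersteps"), reading the messages sent to it in the previous round and sending new ones to its neighbours. The following is a minimal single-machine sketch of that idea in Python, not Giraph's actual Java API; the function name `pregel_min_label` and the choice of algorithm (labelling each connected component with its smallest vertex id) are assumptions made purely for illustration.

```python
from collections import defaultdict


def pregel_min_label(edges):
    """Label every vertex with the smallest vertex id in its connected component.

    Illustrative simulation of vertex-centric supersteps; hypothetical helper,
    not part of Giraph.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Superstep 0: each vertex takes its own id as label and sends it out.
    label = {v: v for v in neighbours}
    inbox = defaultdict(list)
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    # Later supersteps: only vertices that received messages do any work.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                # The label improved, so tell the neighbours about it.
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        # Messages sent in this superstep are delivered in the next one.
        inbox = outbox

    return label
```

Calling `pregel_min_label([(1, 2), (2, 3), (4, 5)])` labels vertices 1, 2 and 3 with 1, and vertices 4 and 5 with 4. In a real deployment the vertices would be partitioned across many machines and the messages would travel over the network, which is where the scaling work described here comes in.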
Ching adds that Giraph was also selected for being able to run as a MapReduce job and for being written in Java, like the rest of Facebook’s stack. The ease with which Giraph can read graphs with HiveIO, Facebook’s own blend of the Apache data warehousing technology, is another plus point for the platform. Scalability has been further improved with the link to event-driven framework Netty.
Facebook’s trillion-edge real-world test over 200 commodity machines is believed to be the biggest ever in graph programming. According to Ching, the biggest reported benchmarks so far are Twitter’s graph with 1.5 billion edges and the Yahoo! Altavista graph, comprising 6.6 billion. Facebook’s social graph is “2 orders of magnitude beyond that scale”, he says.
Even better news for the Apache Giraph community is that Facebook are pumping this code back into the trunk of the project, making Giraph more scalable and less memory intensive than previously. While Graph Search may not yet be mature, having Facebook demonstrate the potential of the mammoth iterative processing platform should turn a few heads.
Images courtesy of Facebook