Nothing to do with beer, sadly

Facebook improves MapReduce job scheduling with Corona

Elliot Bentley
facebook-corona1

Social network’s engineers open source scheduling framework designed to optimise job scheduling efficiency.

The
first social network to obtain one billion active users is likely
to have a big data problem, and indeed generates “over half a
petabyte of new data” every 24 hours.

While the company use MapReduce as the foundation of their data
crunching infrastructure, they found themselves “reaching the
limits of that system” last year, mostly in the capabilities of
MapReduce’s scheduling framework.

Their solution was Corona, which
in an introductory blog
post they describe as a “scheduling
framework that separates cluster resource management from job
coordination”. A cluster manager tracks nodes in the cluster and
free resources, while each job has its own dedicated job tracker,
either sharing the client’s process or having a process of its
own.

The team report that, since deploying Corona, slot refill times
have reduced from 10 seconds to 600 milliseconds, while a test job
run every four minutes has seen its latency drop by roughly
half.

Facebook is notorious for building everything in-house, often
finding existing technologies too slow for their needs. For
example, since 2010 the site has run not on PHP itself, but PHP
transformed by HipHop into C++.

Facebook aren’t the first to work on improving handling of
MapReduce tasks: YARN, also known as MapReduce 2.0, also splits the
JobTracker into two separate daemons – a global ResourceManager and
per-application ApplicationMaster. However, citing
incompatibilities with their modified version of Hadoop Distributed
File System that would be “time-prohibitive and risky to fix”,
Facebook instead opted to build their own version from
scratch.

If you’re interested in the gritty details, the
source has been made available on GitHub
.

Photo by
plasticpeople
.

Author
Comments
comments powered by Disqus