Facebook improves MapReduce job scheduling with Corona
Social network’s engineers open source a scheduling framework designed to improve job scheduling efficiency.
The first social network to obtain one billion active users is likely
to have a big data problem, and indeed generates “over half a
petabyte of new data” every 24 hours.
While the company use MapReduce as the foundation of their data
crunching infrastructure, they found themselves “reaching the
limits of that system” last year, mostly in the capabilities of
MapReduce’s scheduling framework.
Their solution was Corona, which
in an introductory blog post they describe as a “scheduling
framework that separates cluster resource management from job
coordination”. A cluster manager tracks nodes in the cluster and
free resources, while each job has its own dedicated job tracker,
either sharing the client’s process or having a process of its own.
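
The split described above can be illustrated with a minimal Python sketch. This is not Corona's code or API: the class names, method names and the slot model are all hypothetical, chosen only to show a cluster manager that tracks free resources while a per-job tracker handles that job's coordination.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterManager:
    """Hypothetical: tracks nodes and free slots, knows nothing about jobs."""
    free_slots: dict[str, int] = field(default_factory=dict)

    def add_node(self, node: str, slots: int) -> None:
        self.free_slots[node] = slots

    def request(self, wanted: int) -> list[str]:
        """Grant up to `wanted` slots, returning one node name per slot."""
        granted = []
        for node in self.free_slots:
            while self.free_slots[node] > 0 and len(granted) < wanted:
                self.free_slots[node] -= 1
                granted.append(node)
        return granted

    def release(self, node: str) -> None:
        """Hand a slot back once a task has finished with it."""
        self.free_slots[node] += 1


@dataclass
class JobTracker:
    """Hypothetical: one instance per job, coordinating only that job."""
    job_name: str
    tasks: list[str]
    cluster: ClusterManager

    def run(self) -> dict[str, str]:
        """Place every task on a granted slot and record where it went."""
        placement = {}
        pending = list(self.tasks)
        while pending:
            nodes = self.cluster.request(len(pending))
            if not nodes:
                raise RuntimeError("no free slots in the cluster")
            for node in nodes:
                task = pending.pop(0)
                placement[task] = node      # the task would execute here
                self.cluster.release(node)  # slot is free again afterwards
        return placement


cluster = ClusterManager()
cluster.add_node("node-a", 2)
cluster.add_node("node-b", 1)
tracker = JobTracker("wordcount", ["map-0", "map-1", "map-2", "map-3"], cluster)
print(tracker.run())
```

In this sketch the cluster manager never sees task names and the job tracker never sees the slot table, which is the separation of concerns the blog post describes; a real scheduler would also have to deal with failures, fairness and data locality.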
The team report that, since deploying Corona, slot refill times
have fallen from 10 seconds to 600 milliseconds, while a test job
run every four minutes has also seen its latency drop.
Facebook is notorious for building everything in-house, often
finding existing technologies too slow for their needs. For
example, since 2010 the site has run not on PHP itself, but PHP
transformed by HipHop into C++.
Facebook aren’t the first to work on improving the handling of
MapReduce tasks: YARN, also known as MapReduce 2.0, likewise splits the
JobTracker into two separate daemons – a global ResourceManager and
a per-application ApplicationMaster. However, citing
incompatibilities with their modified version of Hadoop Distributed
File System that would be “time-prohibitive and risky to fix”,
Facebook instead opted to build their own version.
If you’re interested in the gritty details, the
source has been made available on GitHub.