Netflix unleashes Genie, a Hadoop Platform-as-a-Service
The on-demand film company open-sources another project, this time for heavy-duty Hadoop clusters in the cloud.
On-demand media giant Netflix has become synonymous with open source development in recent years, releasing parts of its radical infrastructure through its OSS scheme. It seems as if a week doesn't go by without another project being put out in the open, and the latest is sure to intrigue plenty of Hadoop developers.
Genie is Netflix’s own Hadoop Platform-as-a-Service, providing RESTful APIs for job and resource management. Sriram Krishnan and Eva Tse, part of Netflix’s Data Science and Engineering team, detailed the thinking behind Genie’s architecture back in January, explaining that the company had outgrown Hadoop’s traditional approach.
“At Netflix, our Hadoop-based data warehouse is petabyte-scale, and growing rapidly. However, with the big data explosion in recent times, even this is not very novel anymore,” the post reads.
“Our architecture, however, is unique as it enables us to build a data warehouse of practically infinite scale in the cloud (both in terms of data and computational power).”
Genie lets users manage multiple Hadoop clusters within their Amazon Web Services cloud and submit new jobs (Hadoop, Hive or Pig) through a RESTful API. This means users don't need to provision a new Hadoop cluster or install extra clients each time they want to run a job. In addition, administrators can abstract away the Hadoop back-end resources in the cloud.
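To make the idea of REST-based job submission more concrete, here is a minimal sketch of what a client might send. The server URL, path and JSON field names (`jobInfo`, `jobType`, `cmdArgs`, `clusterTags`) are hypothetical placeholders, not Genie's documented schema; the point is only that a single HTTP POST describing one Hadoop, Hive or Pig job replaces a locally installed client.

```python
import json
import urllib.request

# Hypothetical Genie endpoint; the real host, port and path are assumptions.
GENIE_JOBS_URL = "http://genie.example.com/jobs"

SUPPORTED_JOB_TYPES = {"hadoop", "hive", "pig"}


def build_job_request(job_type: str, command_args: str, cluster_tags: list) -> urllib.request.Request:
    """Build (but do not send) a POST request submitting one job to Genie.

    All payload field names are illustrative assumptions, not Genie's API.
    """
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    payload = {
        "jobInfo": {
            "jobType": job_type,          # one Hadoop, Hive or Pig job per request
            "cmdArgs": command_args,      # arguments handed through to the job client
            "clusterTags": cluster_tags,  # hypothetical hint for picking a matching cluster
        }
    }
    return urllib.request.Request(
        GENIE_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_job_request("hive", "-f daily_report.q", ["prod"])
    print(req.get_method(), req.full_url)
    # Actually submitting would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending means the payload can be inspected or logged before anything touches the cluster; the client never needs Hadoop, Hive or Pig installed locally.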
Netflix's Sriram Krishnan is quick to point out in Friday's release blog post that Genie shouldn't be used as a workflow scheduler like Apache Oozie, as a task scheduler, or as an end-to-end resource management tool.
"Genie's unit of execution is a single Hadoop, Hive or Pig job. Genie doesn't schedule or run workflows – in fact, we use an enterprise scheduler (UC4) at Netflix to run our ETL," he explained, before adding that Genie "is a key complementary tool, serving as a repository of clusters, and an API for job management."
Although the tool has been running in Netflix's robust architecture for several months, it's clear that Genie isn't a ready-made fit for everyone. Krishnan dubs the release 'version 0' and says the tool is definitely biased towards its creator's own environment.
Yet given time, Genie has potential to become an important part of the Hadoop community, especially for those with heavy-duty use cases. If your three wishes centre on elasticity, scalability and the fastest data warehouse around, this Genie should be yours to command.