Powder keg friendly

Apache Spark officially graduates from incubator

Lucy Carey

Following a high-profile endorsement from Cloudera earlier this month, the Hadoop-oriented hatchling and MapReduce contender has been judged mature enough to take its next big steps.


Fast data-processing tool Apache Spark has just graduated from the Apache Incubator, a sure sign that it's making steady progress towards becoming a fully fledged member of the Hadoop pachyderm parade.

Apache Spark first flickered into life at the University of California, Berkeley's AMPLab back in 2009, and went open source in 2010. It's designed to be an ultra-rapid data analytics layer on top of the open-source Hadoop Distributed File System (HDFS), as well as other shared file systems such as NFS. It also has a general execution model that can optimize arbitrary operator graphs, and it supports in-memory computing, which lets it query data far more quickly than disk-based engines such as Hadoop's slow-but-steady MapReduce.

The project received a high-profile boost this month when Hadoop-based big data provider Cloudera announced commercial support for the software, citing its ability to handle a wider variety of workloads, and to run them faster than the MapReduce engine, as key deciding factors. Traditionally, MapReduce has been a central fixture of every Hadoop deployment.

Speaking about the enterprise backing, Hadoop co-creator Doug Cutting, who also cast one of the deciding votes for Spark's graduation, said that although he didn't expect Spark to render MapReduce entirely obsolete, "Over time, fewer projects will use MapReduce, and more will use Spark." The exceptions will be "one-shot" projects, which make up a sizeable category of their own.

The next step for Spark will be the establishment of a dedicated project management committee. Matei Zaharia, a co-founder of Databricks, the company created to support Apache Spark, will step into the role of 'Vice President, Apache Spark'.

Other fledgling Spark adopters include IBM's Almaden research group, Yahoo!, Alibaba, TrendMicro and Baidu. There's an active developer community around it, with more than 120 coders from 25 companies having contributed source code to date. It may have only just stumbled out into the big bad world, but this juvenile technology is certainly growing up fast.

Image by Daniel Dionne