Powder keg friendly

Apache Spark officially graduates from incubator

Lucy Carey

Following a high-profile endorsement from Cloudera earlier this month, the Hadoop-oriented hatchling and MapReduce contender has been judged mature enough to take its next big steps.

 

Fast data-processing tool Apache Spark has just been
promoted out of the incubator – a sure sign that it’s making steady
progress towards becoming a fully fledged member of the Hadoop
pachyderm parade.

Apache Spark first flickered into life at the
University of California, Berkeley’s AMPLab back in 2009, going open source
in 2010. It’s designed to be an ultra-rapid layer for data
analytics on top of the open-source Hadoop Distributed File System (HDFS),
as well as other shared file systems like NFS. It also has a general
execution model that can optimize arbitrary operator graphs, and
supports in-memory computing, allowing it to query data more
rapidly than disk-based engines like the slow-but-steady
Hadoop MapReduce.

The project received a high-profile boost this month when
Hadoop-based Big Data provider Cloudera announced commercial support
for the software, citing the technology’s ability to run more varied
workloads and its speedier operation compared with the MapReduce engine
as key deciding factors. Traditionally, MapReduce has been a central
fixture of all Hadoop implementations.

Speaking about the enterprise backing, Hadoop
co-creator Doug Cutting, who was also responsible for casting one
of the deciding votes for the graduation of Spark, commented that,
although he didn’t expect Spark to render MapReduce entirely
obsolete, “Over time, fewer projects will use MapReduce, and more
will use Spark.” Exceptions to this rule will be “one-shot”
projects, which make up a sizeable category of their own.

The next step for Spark will be the establishment of
a dedicated project management committee. Matei Zaharia, the
co-founder of Databricks, the
company created to support Apache Spark, will step into the role of
‘Vice President, Apache Spark’.

Other fledgling Spark adopters include IBM’s Almaden
research group, Yahoo!, Alibaba, TrendMicro, and Baidu. There’s an
active dev community around it, with more than 120 coders from 25
companies having contributed source code to date. It may have just
stumbled out into the big bad world, but this juvenile technology
is certainly growing up fast.

Image by
Daniel Dionne
