Cuddling up to the elephant

Microsoft to open source Hadoop-helper REEF framework

Chris Mayer

Just before Hadoop 2.0 and YARN hit the shelves, Microsoft reveal their piece of the enterprise puzzle: the fault-tolerant REEF framework.

With Hadoop 2.0 just around the corner, we’re
beginning to see a number of open source efforts appear related to
YARN, the next generation resource manager which lets users run
multiple types of job at the same time.

Making Hadoop capable of managing batch and
other processing jobs in the same cluster has understandably created a
buzz in the community, with the potential to greatly reduce
Hadoop's ETL burden and give the big data platform a new analytical purpose.

One company that we didn't expect to join the
big data corral so willingly, however, is Microsoft, who yesterday
announced their intention to open source REEF (Retainable Evaluator
Execution Framework). The framework, which runs on top of YARN, aims to make
it easier to implement scalable, fault-tolerant environments. It is
particularly well suited to building machine-learning jobs,
according to Raghu Ramakrishnan, Microsoft's CTO of Information Services,
speaking at the International Conference on Knowledge Discovery
and Data Mining on Monday.

Details on the framework are sketchy, with only
a few conference session abstracts to go on and no technical
documentation to hand. What we do know is that REEF seems to
be a fairly versatile framework. Through a distributed control-flow
abstraction, it can support MapReduce workloads, graph processing
or iterative algorithms, such as those required for machine
learning.

To separate REEF from the systems built
on top of it, Microsoft have created two standalone systems: a
configuration manager dubbed Tang and an event-driven data movement
framework, Wake. The two are language-agnostic and enable REEF to
work in JVM or .NET environments.
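
With no Wake documentation available at the time of writing, the snippet below is only a generic illustration of what "event-driven data movement" means, written under our own assumptions. The `Stage` class, its `on_next` and `close` methods and the sentinel-based shutdown are all hypothetical and do not represent Wake's actual API. Each stage owns a queue and a worker thread, so a sender hands an event off and carries on rather than waiting for it to be processed.

```python
import queue
import threading
from typing import Any, Callable

# Hypothetical illustration only; this is NOT Wake's API.
_STOP = object()  # private sentinel that tells a stage's worker to exit


class Stage:
    """An event-driven stage: events are queued and handled
    asynchronously, in arrival order, by a single worker thread."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._handler = handler
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_next(self, event: Any) -> None:
        # Hand the event off without waiting for it to be processed.
        self._queue.put(event)

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is _STOP:
                break
            self._handler(event)

    def close(self) -> None:
        # Events queued before close() are still handled; then the worker exits.
        self._queue.put(_STOP)
        self._worker.join()


# Usage: chain two stages, doubling each number and collecting the results.
results = []
sink = Stage(results.append)
doubler = Stage(lambda x: sink.on_next(x * 2))
for i in range(5):
    doubler.on_next(i)
doubler.close()  # drain the first stage before closing the second
sink.close()
print(results)  # [0, 2, 4, 6, 8]
```

Because each hypothetical stage drains its queue on one thread, events leave in the order they arrived; closing the upstream stage first guarantees every forwarded event reaches the sink before it shuts down.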

Further information on REEF should emerge closer to its
full open sourcing. The project's flexibility, as well as the nature
of the problems it is tackling (the ones enterprises want
Hadoop to sort out), makes it an interesting one to watch. Equally
intriguing to monitor is Microsoft's overt embrace of Hadoop, after
putting the brakes on its own big data framework, Dryad, in late
2011. But rather than being a Hadoop freeloader, they are finally
beginning to give back with useful open source efforts such as
REEF, which could help shape Hadoop's future direction.

Image courtesy of Derek Keats
