Microsoft all-in on Hadoop?

Apache Hadoop 0.23.0 released as Microsoft seemingly switch allegiances

Chris Mayer

The Apache PMC votes for next major update

After over a year without a major release, we can finally get to
grips with big data framework Apache Hadoop 0.23.0. The Cloudera team
promise some exciting new improvements over the 0.20 series, such as
HDFS Federation and a next-generation MapReduce framework.

Other add-ons include a new Maven build system plus Kerberos,
though the team are staying fairly tight-lipped about future
updates.

The timing of the release couldn’t have been better, as Microsoft
have ditched their own big data project Dryad, instead opting for a
Windows implementation of Hadoop. It shows how Hadoop is dominating
the Big Data market all on its lonesome. A test release for Windows
Server is scheduled for next year. Only last month, Microsoft said it
was committed to working towards an alternative to Hadoop, but that
no longer seems to be the case, which is a testament to Hadoop’s
efforts.

Features of Hadoop 0.23.0

  • HDFS Federation: the big upgrade here is scalability, achieved by
    allowing multiple independent namenodes, each managing a portion
    of the filesystem namespace.
  • The next-generation MapReduce 2 is a rewrite of the MapReduce
    runtime to overcome scalability bottlenecks in the jobtracker. It
    is based on a new framework called YARN for cluster resource
    management, plus a MapReduce “application” which runs users’ jobs
    on YARN. In this design MapReduce becomes a user-space library,
    which also allows other parallel applications to run on Hadoop
    clusters alongside MapReduce applications. However, be aware that
    0.23.0 doesn’t ship MapReduce 1 (which runs jobtrackers and
    tasktrackers), although it does support the old APIs.
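To make the federation idea more concrete, here is a toy Python model. It is an illustrative sketch, not Hadoop code: each hypothetical `NameNode` object owns one slice of the namespace, and a hypothetical `MountTable` routes a path to the namenode owning its longest matching prefix. The prefix-based routing is an assumption made for illustration; the release notes quoted here don't say how clients find the right namenode.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Hypothetical stand-in for one independent namenode."""
    name: str
    # Metadata this namenode owns: path -> list of block IDs
    files: dict = field(default_factory=dict)

    def create(self, path: str, blocks: list) -> None:
        self.files[path] = list(blocks)


class MountTable:
    """Hypothetical routing table: namespace prefix -> owning namenode."""

    def __init__(self) -> None:
        self.mounts: dict = {}

    def mount(self, prefix: str, namenode: NameNode) -> None:
        # Normalise "/user/" to "/user", but keep the root as "/"
        self.mounts[prefix.rstrip("/") or "/"] = namenode

    def resolve(self, path: str) -> NameNode:
        """Return the namenode owning the longest prefix of `path`."""
        best_prefix, best_nn = None, None
        for prefix, nn in self.mounts.items():
            # Match whole path components only ("/user" must not match "/users")
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best_prefix is None or len(prefix) > len(best_prefix):
                    best_prefix, best_nn = prefix, nn
        if best_nn is None:
            raise FileNotFoundError(f"no namenode mounted for {path}")
        return best_nn


if __name__ == "__main__":
    users, logs = NameNode("nn1"), NameNode("nn2")
    table = MountTable()
    table.mount("/user", users)
    table.mount("/logs", logs)

    for path in ["/user/alice/data.csv", "/logs/2011-11.log"]:
        owner = table.resolve(path)
        owner.create(path, blocks=[1, 2])
        print(path, "->", owner.name)
```

Because each namenode only holds metadata for its own prefix, adding another namenode spreads the namespace load instead of growing a single master, which is the scalability gain the feature list describes.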
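The YARN split can be sketched the same way. Again this is a simplified, hypothetical model rather than YARN’s API: a generic `ResourceManager` only hands out container counts, while `run_mapreduce` plays the part of the MapReduce “application” (in real YARN, a per-application ApplicationMaster) that decides what runs inside the containers it is granted. Tasks run sequentially here, and the word-count job is only an example.

```python
from collections import defaultdict


class ResourceManager:
    """Hypothetical cluster-wide scheduler; it knows nothing about MapReduce."""

    def __init__(self, total_containers: int) -> None:
        self.free = total_containers

    def allocate(self, wanted: int) -> int:
        """Grant up to `wanted` containers, limited by what is free."""
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count


def run_mapreduce(rm, records, map_fn, reduce_fn, num_tasks):
    """Hypothetical MapReduce 'application' running on top of the RM.

    It negotiates containers, runs one map task per granted container
    (sequentially, for simplicity), groups intermediate pairs by key and
    reduces each group, then hands the containers back.
    """
    containers = rm.allocate(num_tasks)
    if containers == 0:
        raise RuntimeError("cluster has no free containers")
    try:
        # One input split per granted container
        splits = [records[i::containers] for i in range(containers)]
        grouped = defaultdict(list)
        for split in splits:  # each iteration stands in for a map task
            for record in split:
                for key, value in map_fn(record):
                    grouped[key].append(value)
        # Grouping by key stands in for the shuffle; now run the reduce phase
        return {key: reduce_fn(key, values) for key, values in grouped.items()}
    finally:
        rm.release(containers)


if __name__ == "__main__":
    rm = ResourceManager(total_containers=4)
    counts = run_mapreduce(
        rm,
        ["big data", "big cluster", "data data"],
        map_fn=lambda line: [(word, 1) for word in line.split()],
        reduce_fn=lambda word, ones: sum(ones),
        num_tasks=3,
    )
    print(counts)
```

Since the resource manager never sees a map or reduce function, any other parallel application could call `allocate` and `release` on the same cluster, which is what lets non-MapReduce workloads share a Hadoop cluster in this design.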

The team do stress that this version is not yet ready for production
use and should only be used for testing at this early stage, ahead of
later 0.23 releases. For further details, Cloudera have chronicled it
all on their blog.
