Prepare for next-gen Hadoop

Apache Hadoop 2.0 Alpha released

Chris Mayer

Now that Hadoop 1.0 has settled into the fold, we get a glimpse of what the next version will look like, as the team release an alpha ‘preview release’ of Hadoop 2.0.

If you’re a regular reader of this site, you will have no doubt
heard me proclaiming that Hadoop is set to have a barnstorming
2012, or what is rather more likely, heard me prattling on about
it. But with good reason of course.

At the turn of the year, Apache Hadoop 1.0 landed, more as a
flare to the enterprise world signalling that it was ready for them
to adopt as part of their setup. Since then, the team have been
eagerly plotting what comes next for it.

Fortunately for us, this week the Hadoop team at Hortonworks
offered a peep at Apache Hadoop 2.0 through an alpha release.
Arun Murthy, release manager for Hadoop, was pleased to reveal it
to the wider Apache Hadoop community, but was keen to
stress that this was a preview release, and thus not ready to be
deployed in production scenarios. Not that it really matters:
there’s enough to salivate over.

The future of large-scale data processor MapReduce has been placed in
YARN, which was first introduced in 0.23 and has been substantially
bolstered on the way to Hadoop 2.0. YARN rethinks MapReduce’s old
ways by breaking Hadoop’s JobTracker functionality into two
daemons. The ResourceManager manages the resources in any given
cluster, whilst a per-application ApplicationMaster handles job
scheduling for its application and negotiates with the ResourceManager
to siphon off enough resources to run it.
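
To make that split a little more concrete, here is a minimal toy sketch in Python. It is not Hadoop code and does not use any Hadoop API: the class names simply mirror the two daemons, and the notion of "slots", the method names and the task labels are all assumptions made up for illustration. It only shows the division of labour, with one cluster-wide object tracking capacity and one object per application asking for a share and then scheduling its own tasks.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Toy cluster-wide daemon: it only tracks free capacity (hypothetical 'slots')."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are currently free, up to the request.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, slots: int) -> None:
        # Return slots to the shared pool once an application is done.
        self.free_slots += slots


@dataclass
class ApplicationMaster:
    """Toy per-application daemon: negotiates resources, then schedules its own tasks."""
    name: str
    tasks: list
    granted: int = 0

    def negotiate(self, rm: ResourceManager) -> int:
        # Ask for one slot per task; the ResourceManager may grant fewer.
        self.granted = rm.allocate(len(self.tasks))
        return self.granted

    def run(self, rm: ResourceManager) -> list:
        # Scheduling is local to the application: split the tasks into
        # waves no larger than the number of slots that were granted.
        if self.granted == 0:
            raise RuntimeError(f"{self.name}: no resources granted")
        waves = [
            self.tasks[i:i + self.granted]
            for i in range(0, len(self.tasks), self.granted)
        ]
        rm.release(self.granted)
        self.granted = 0
        return waves


if __name__ == "__main__":
    rm = ResourceManager(free_slots=4)
    wordcount = ApplicationMaster("wordcount", ["map-0", "map-1", "map-2"])
    sort = ApplicationMaster("sort", ["map-0", "map-1", "map-2"])
    print(wordcount.negotiate(rm), sort.negotiate(rm))
    print(wordcount.run(rm))
    print(sort.run(rm))
```

The point of the sketch is that the ResourceManager never needs to know what a task is; it only hands out capacity, while each application decides for itself how to use whatever it was granted.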

Other new features promised for Hadoop 2.0 include HA (manual
failover) and HDFS, alongside general performance enhancements, giving
Hadoop another turbo boost in dealing with vast amounts of data
storage and processing. Another useful architectural improvement is
new wire compatibility for both HDFS and YARN, using protobufs.

Murthy also detailed some of the features planned along
the road towards the next stable version, all of which will come
from the wider community. These include HDFS and auto-failover for
HA, which led Murthy to believe that there are
‘definitely good times ahead’ for Hadoop itself.

We couldn’t agree more. Hadoop 2.0 looks to be a proper
rethinking of the first effort, taking on board lessons learned
and pushing the boundaries a bit further for the Big Data
phenomenon. Now the real hard work begins: sorting out all the
APIs and testing the new features before we can see Hadoop 2.0 in
action properly.

In the meantime at least, visit the Apache Hadoop page to download
hadoop-2.0.0-alpha, and the Documentation page for more.
