Prepare for next-gen Hadoop
Apache Hadoop 2.0 Alpha released
If you're a regular reader of this site, you will have no doubt heard me proclaiming that Hadoop is set to have a barnstorming 2012, or what is rather more likely, heard me prattling on about it. But with good reason of course.
At the turn of the year, Apache Hadoop 1.0 landed, more as a flare to the enterprise world signalling that it was ready for them to adopt as part of their setup. Since then, the team have been eagerly plotting what comes next for it.
Fortunately for us, the Hadoop team at Hortonworks this week offered a peep at Apache Hadoop 2.0 through an alpha release. Hadoop release manager Arun Murthy was pleased to reveal all to the wider Apache Hadoop community, but was keen to stress that this is a preview release and thus not ready to be deployed in production. Not that it really matters: there's enough to salivate over.
The future of MapReduce, Hadoop's large-scale data processing framework, has been placed in YARN, which was first introduced in 0.23 and has been bolstered considerably on the way to Hadoop 2.0. YARN rethinks MapReduce's old ways by breaking Hadoop's JobTracker functionality into two daemons. The ResourceManager manages the resources across the cluster, whilst a per-application ApplicationMaster handles job scheduling for its own application and negotiates with the ResourceManager to siphon off enough resources to run it.
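To make that split a little more concrete, here is a toy Python model of the division of labour. It is only a sketch and not the YARN API: the class names borrow YARN's terms for the cluster-wide daemon (ResourceManager) and the per-application daemon (which YARN calls the ApplicationMaster), but the methods `allocate`, `release` and `run`, the notion of a simple container count, and the one-task-per-container rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Toy cluster-wide daemon: it only tracks how many containers are free."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        """Grant as many containers as are free, up to the number requested."""
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        """Return containers to the shared pool."""
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Toy per-application daemon: schedules its own tasks, nobody else's."""
    name: str
    tasks: int
    completed: int = 0
    rounds: int = 0

    def run(self, rm: ResourceManager) -> None:
        """Negotiate containers round by round until every task has run."""
        while self.completed < self.tasks:
            remaining = self.tasks - self.completed
            granted = rm.allocate(remaining)
            if granted == 0:
                raise RuntimeError(f"{self.name}: cluster has no free containers")
            # Hypothetical rule: each granted container runs exactly one task.
            self.completed += granted
            self.rounds += 1
            rm.release(granted)


if __name__ == "__main__":
    rm = ResourceManager(free_containers=4)
    job = ApplicationMaster(name="wordcount", tasks=10)
    job.run(rm)
    print(job.name, "finished", job.completed, "tasks in", job.rounds, "rounds")
```

The point of the sketch is that the ResourceManager knows nothing about tasks or jobs. It only hands out capacity, while each application carries its own scheduling logic.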
Other new features promised for Hadoop 2.0 include HDFS HA (manual failover) and HDFS Federation, alongside general performance enhancements, giving Hadoop another turbo boost in dealing with vast amounts of data storage and processing. Another useful architectural improvement is wire compatibility for both HDFS and YARN, achieved by using protobufs for communication.
Murthy also detailed some of the features planned along the road towards the next stable version, all of which will come from the wider community. These include HDFS Snapshots and auto-failover for the HA NameNode, which led Murthy to believe that there are 'definitely good times ahead' for Hadoop itself.
We couldn't agree more. Hadoop 2.0 looks to be a proper rethinking of the first effort, taking on board the lessons learned and pushing the boundaries a bit further for the Big Data phenomenon. Now the real hard work begins: sorting out all the APIs and testing the new features before we can see Hadoop 2.0 in action properly.