Microsoft all-in on Hadoop?
Apache Hadoop 0.23.0 released as Microsoft seemingly switch allegiances
After over a year without a major release, we can finally get to grips with the big data framework Apache Hadoop 0.23.0. The Cloudera team promise some exciting new improvements over the 0.20 series, such as HDFS federation and a new MapReduce framework.
Other additions include a new Maven build system plus Kerberos HTTP SPNEGO support. However, the team are staying fairly tight-lipped over future updates.
The timing of the release couldn't have been better for the data-intensive company, as Microsoft have discontinued their big data project Dryad, opting instead for a Windows implementation of Hadoop, which shows how Hadoop is dominating the big data market all on its lonesome. A test version for Windows Server is scheduled for next year. Only last month Microsoft said it was committed to working towards an alternative to Hadoop, but seemingly not anymore, which is testament to Hadoop's efforts.
Features of 0.23.0
- The big upgrade here is HDFS federation, which improves scalability by allowing multiple independent namenodes, each managing a portion of the namespace.
- The next-generation MapReduce 2 is a rewrite of the MapReduce runtime to overcome scalability bottlenecks in the jobtracker. It is based on a new framework called YARN for cluster resource management, and a MapReduce “application” which runs users’ jobs on YARN. In this design MapReduce becomes a user-space library, which also allows other parallel applications to run on Hadoop clusters alongside MapReduce applications. Be aware, however, that 0.23.0 doesn't come with MapReduce 1 (which runs jobtrackers and tasktrackers), although it does support the old APIs.
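
To picture what "each managing a portion of the namespace" means, here is a toy sketch in Python. It is not Hadoop code and does not use any Hadoop API: the mount points, the hostnames and the resolve_namenode helper are all made up for illustration, and it simply assumes that a path is routed to whichever namenode owns the longest matching prefix.

```python
# Toy model only: these mount points and hostnames are hypothetical,
# not taken from any real Hadoop configuration.
MOUNT_TABLE = {
    "/user": "namenode1.example.com",
    "/data": "namenode2.example.com",
    "/tmp": "namenode3.example.com",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the namenode assumed to manage `path` (longest-prefix match)."""
    best = None
    for prefix in mount_table:
        # Compare whole path components so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namenode manages {path!r}")
    return mount_table[best]


if __name__ == "__main__":
    for p in ("/user/alice/report.txt", "/data/logs/2011", "/tmp"):
        print(p, "->", resolve_namenode(p))
```

Because each namenode in this model only ever sees its own slice of the directory tree, adding another slice means adding another entry rather than loading more onto a single namenode, which is the scalability argument in a nutshell.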
The team do stress that this version is not ready for production use just now and should be used for testing at this early stage, ahead of later 0.23 versions. For further details, Cloudera have chronicled it all on their website.