The head of the herd
Hadoop 2 is GA: the elephant returns with added YARN
After months of anticipation and four years of hard graft, the Apache Software Foundation (ASF) has finally drawn back the curtains on the next-generation version of its open-source data processing framework, Hadoop.
Hadoop 2 can run multiple applications simultaneously, letting users process the same data quickly and efficiently in several different ways at once. It has now reached the level of stability and enterprise-readiness that Apache requires for a general availability (GA) release.
The big news in this release is the addition of YARN (Yet Another Resource Negotiator). YARN splits two key functions into separate daemons: a global ResourceManager handles resource management, while a per-application ApplicationMaster handles job scheduling and monitoring. This broadens Hadoop’s processing options and architecture. YARN perches on top of HDFS (the Hadoop Distributed File System) and serves as a large-scale, distributed operating system for big data applications, letting multiple applications run simultaneously for more efficient support of data throughout its entire lifecycle.
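To make the split concrete, here is a toy model in Python. It is a minimal sketch, not Hadoop’s API: the class names borrow YARN’s terms for the two roles (a global ResourceManager and a per-application ApplicationMaster), but every method, the `Container` type, first-fit placement and memory-only accounting are hypothetical simplifications. Real YARN also tracks CPU, uses pluggable schedulers, and runs each ApplicationMaster inside a container of its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Container:
    """A slice of one node's memory granted to one application."""
    app_id: str
    node: str
    memory_mb: int


class ResourceManager:
    """Hypothetical global daemon: tracks free memory on each node and grants
    containers. It knows nothing about what the applications compute."""

    def __init__(self, node_memory_mb: Dict[str, int]) -> None:
        self.free_mb = dict(node_memory_mb)  # node name -> free memory (MB)

    def allocate(self, app_id: str, memory_mb: int) -> Optional[Container]:
        # Simplifying assumption: first-fit placement on memory alone.
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                return Container(app_id, node, memory_mb)
        return None  # no node currently has enough free memory

    def release(self, container: Container) -> None:
        self.free_mb[container.node] += container.memory_mb


class ApplicationMaster:
    """Hypothetical per-application manager: schedules and monitors only its
    own tasks, borrowing capacity from the shared ResourceManager."""

    def __init__(self, app_id: str, rm: ResourceManager,
                 tasks: List[str], memory_per_task_mb: int) -> None:
        self.app_id = app_id
        self.rm = rm
        self.memory_per_task_mb = memory_per_task_mb
        self.pending = list(tasks)                 # tasks not yet launched
        self.running: Dict[str, Container] = {}    # task -> its container

    def schedule(self) -> int:
        """Launch as many pending tasks as current capacity allows."""
        launched = 0
        while self.pending:
            container = self.rm.allocate(self.app_id, self.memory_per_task_mb)
            if container is None:
                break  # cluster is full; try again after something finishes
            self.running[self.pending.pop(0)] = container
            launched += 1
        return launched

    def finish(self, task: str) -> None:
        """Monitoring step: a task completed, so hand its container back."""
        self.rm.release(self.running.pop(task))


# Two different applications share one cluster at the same time.
rm = ResourceManager({"node1": 4096, "node2": 4096})
batch = ApplicationMaster("mapreduce-job", rm, ["map0", "map1", "map2"], 2048)
stream = ApplicationMaster("streaming-app", rm, ["s0", "s1", "s2"], 1024)

print(batch.schedule())   # 3
print(stream.schedule())  # 2 (s2 waits for capacity)
batch.finish("map0")      # frees 2048 MB on node1
print(stream.schedule())  # 1
print(rm.free_mb)         # {'node1': 1024, 'node2': 0}
```

The design choice the sketch is meant to highlight: the ResourceManager never needs to know that one application is a MapReduce job and the other is not, which is what lets non-MapReduce workloads share the same cluster.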
The addition of YARN is critical to the future of Hadoop. Although MapReduce’s batch approach was a driving factor in Hadoop’s initial adoption, the fact that a cluster could run only MapReduce jobs, and could not deliver satisfactory real-time analytics, has been a bugbear for some users in recent years. YARN will hopefully dispel a good deal of people’s reservations about adopting Hadoop going forward.
Other additions include support for Microsoft Windows, snapshots of data in HDFS, and NFSv3 access to HDFS.
Original Hadoop creator and ASF Board member Doug Cutting credits a large portion of Hadoop’s success to Apache’s open-source model, which he says has allowed a wide range of users and vendors to collaborate productively on “a platform shared by all”.
Since it was created in 2005 as part of the open-source Nutch search engine project, and later developed extensively at Yahoo, Hadoop has gone from strength to strength and has been used by everyone from AOL and Rackspace to Apple, Facebook and Twitter. The explosion of social media, and the concomitant surge in demand for hugely scalable real-time data processing, has proved a bit of a stumbling block for the little elephant. However, this release shows that the ASF and the huge community around it are more than capable of rising to the challenge of keeping Hadoop relevant for years to come.
Image by NuttyIrishmanKnits