The head of the herd

Hadoop 2 is GA: the elephant returns with added YARN

Lucy Carey

The souped-up, next-generation version of the data processing framework has just been declared stable and ready for general release.

After months of anticipation and four years of hard graft, the
Apache Software Foundation (ASF) has finally drawn back the
curtain on the next-generation version of its open source data
processing framework, Hadoop.

Hadoop 2 can run multiple applications simultaneously, so users
can work with the same data in several different ways, quickly and
efficiently. It has now reached the level of stability and
enterprise readiness that Apache requires for a GA (general
availability) release.

The big news in this release is the addition of YARN (Yet Another
Resource Negotiator), which splits key functions into two separate
daemons: one handles resource management, the other job scheduling
and monitoring. This broadens Hadoop's processing options and
architecture. YARN sits on top of HDFS (the Hadoop Distributed File
System) and acts as a large-scale, distributed operating system for
big data applications, allowing multiple applications to run
simultaneously and supporting data more efficiently throughout its
entire lifecycle.

The addition of YARN is critical to Hadoop's future. Although
MapReduce's batch approach was a driving factor in the platform's
early adoption, its inability to multitask and to deliver
satisfactory real-time analytics has been a bugbear for some users
in recent years. YARN should dispel many of those reservations
about adopting Hadoop going forward.

Other additions include support for Microsoft Windows, snapshots
for data in HDFS, and NFSv3 access to HDFS.

Doug Cutting, Hadoop's original creator and an ASF Board member,
credits much of Hadoop's success to Apache's open source model,
which he says has allowed a wide range of users and vendors to
collaborate productively on "a platform shared by all".

Since the platform was created in 2005 as part of Yahoo's search
engine project, Hadoop has gone from strength to strength, and has
been used by everyone from AOL and Rackspace to Apple, Facebook and
Twitter. The explosion of social media, and the accompanying surge
in demand for hugely scalable real-time data processing, has proved
something of a stumbling block for the little elephant. However,
this release shows that the ASF and the huge community around
Hadoop are more than capable of rising to the challenge of keeping
it relevant for years to come.

Image by NuttyIrishmanKnits
