Apache Hadoop 3.0 is here
Hadoop is back! Version 3.0.0 of the open source software framework for reliable, scalable, distributed computing brings a lot of new features, including an early preview (alpha 2) of a major revision of YARN Timeline Service, a rewrite of the shell scripts and more.
Apache Hadoop 3.0 is here! According to Andrew Wang, Apache Hadoop 3 release manager, this version “represents the combined efforts of hundreds of contributors over the five years since Hadoop 2.”
It is also the project’s biggest release to date, which is one more reason to celebrate.
Proud to announce that Apache @Hadoop 3.0.0 is GA! It incorporates over 6,000 changes since we started on 3.0.0 over a year ago. Thanks go out to all the contributors who helped make this release possible.
— Andrew Wang (@umbrant) December 14, 2017
The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.
Apache Hadoop 3.0: Major changes
Minimum required Java version increased to Java 8
All Hadoop JARs are now compiled to target a Java 8 runtime, which means that those of you who are still using Java 7 or below must upgrade to Java 8 before running Hadoop 3.0.
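If you are not sure which runtime a given machine will use, a quick check along the following lines may help. This is only a minimal sketch: the class name `JavaVersionCheck` and its helper methods are made up for illustration and are not part of Hadoop. It relies on the standard `java.specification.version` system property, which reads `1.8` on Java 8 and `9`, `11`, and so on for later releases.

```java
public class JavaVersionCheck {

    // Hypothetical helper: turns a specification version string into a major
    // version number. Handles both the old "1.x" scheme ("1.7", "1.8") and
    // the newer scheme ("9", "11", "17").
    static int majorVersion(String version) {
        String[] parts = version.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Hypothetical helper: true if the given version is at least the minimum.
    static boolean meetsMinimum(String version, int minimum) {
        return majorVersion(version) >= minimum;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!meetsMinimum(spec, 8)) {
            System.err.println("Java " + spec + " is too old: Hadoop 3.0 needs Java 8 or later");
            System.exit(1);
        }
        System.out.println("Java " + spec + " meets the Java 8 minimum");
    }
}
```

Run it with the same `java` binary that your cluster nodes use, since the result only describes the runtime that executes the check.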
Early preview of YARN Timeline Service major revision
Hadoop 3.0 also brings an early preview (alpha 2) of YARN Timeline Service v.2, a major revision of the service that addresses two major challenges:
- improving scalability and reliability of Timeline Service
- enhancing usability by introducing flows and aggregation
Shell script rewrite
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and to add some new features. However, keep in mind that some of the changes could break existing installations. You’ll find the incompatible changes in the release notes, with related discussion on HADOOP-9902.
MapReduce task-level native optimization
MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.
Shaded client jars
The hadoop-client Maven artifact available in 2.x releases pulls Hadoop’s transitive dependencies onto a Hadoop application’s classpath. This can be problematic if the versions of these transitive dependencies conflict with the versions used by the application.
HADOOP-11804 adds new hadoop-client-api and hadoop-client-runtime artifacts that shade Hadoop’s dependencies into a single jar. This avoids leaking Hadoop’s dependencies onto the application’s classpath.
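To see whether the kind of classpath conflict described above affects your application, it can help to find out which jar a given class is actually loaded from. The following is a minimal sketch using only standard JDK calls; the class name `ClassOrigin` and the method `originOf` are made up for illustration and are not part of Hadoop.

```java
import java.security.CodeSource;

public class ClassOrigin {

    // Hypothetical helper: returns the location (usually a jar or directory)
    // a class was loaded from, or a placeholder when the JVM does not expose
    // one, as is typically the case for core JDK classes.
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // java.lang.String is only a default so the sketch runs on its own.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + originOf(Class.forName(name)));
    }
}
```

Running this on your application’s classpath with the name of a class from a shared third-party library shows which copy wins. If the jar it reports is one that Hadoop pulled in rather than the one your application declared, that is the kind of conflict the shaded artifacts are meant to avoid.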
You can find all the major changes in the Apache Hadoop 3.0.0 release documentation.