Major milestone alert

Apache Hadoop 3.0 is here

Gabriela Motroc


Hadoop is back! The latest version (3.0.0) of the open source software framework for reliable, scalable, distributed computing brings many new features, including an early preview (alpha 2) of a major revision of YARN Timeline Service, a shell script rewrite and more.

Apache Hadoop 3.0 is here! According to Andrew Wang, Apache Hadoop 3 release manager, this version “represents the combined efforts of hundreds of contributors over the five years since Hadoop 2.”

It is also the project's biggest release ever, which is one more reason to celebrate.

The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.

Chris Douglas, Vice President of Apache Hadoop

Apache Hadoop 3.0: Major changes

Minimum required Java version increased to Java 8

All Hadoop JARs are now compiled targeting a runtime version of Java 8, which means that anyone still running Java 7 or below must upgrade to Java 8 before moving to Hadoop 3.0.
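A quick way to confirm that a node meets this requirement is to check the JVM's reported specification version before upgrading. The following is a minimal sketch, not part of Hadoop: the class name `JavaVersionCheck` and its methods are hypothetical, and it relies only on the standard `java.specification.version` system property.

```java
final class JavaVersionCheck {

    /** Turns a java.specification.version value into a major version number. */
    static int majorVersion(String specVersion) {
        // Java 8 and earlier report "1.x" (e.g. "1.8"); Java 9 and later report "9", "11", ...
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    /** True if the given specification version is Java 8 or newer. */
    static boolean meetsHadoop3Minimum(String specVersion) {
        return majorVersion(specVersion) >= 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!meetsHadoop3Minimum(spec)) {
            System.err.println("Java " + spec + " is too old; Hadoop 3.0 needs at least Java 8.");
            System.exit(1);
        }
        System.out.println("Java " + spec + " satisfies the Java 8 minimum.");
    }
}
```

The sketch avoids Java 8 language features on purpose, so it can be compiled for and run on the older JVM it is meant to detect.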

Early preview of YARN Timeline Service major revision

Hadoop 3.0 also brings an early preview (alpha 2) of YARN Timeline Service v.2, a major revision that addresses two major challenges:

  • improving scalability and reliability of Timeline Service
  • enhancing usability by introducing flows and aggregation



Shell script rewrite

The Hadoop shell scripts have been rewritten to fix many long-standing bugs and to add some new features. Keep in mind, however, that some of these changes could break existing installations. The incompatible changes are listed in the release notes, with related discussion on HADOOP-9902.

MapReduce task-level native optimization

MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.
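The native collector is opt-in and is selected per job through configuration. The sketch below shows the idea with a plain `java.util.Properties` object standing in for Hadoop's `Configuration`, so that it runs without Hadoop on the classpath. The property key and collector class name are given as recalled from the Hadoop documentation and should be verified against the release in use; the class `NativeCollectorSettings` is hypothetical.

```java
import java.util.Properties;

final class NativeCollectorSettings {

    // Assumed names, to be checked against the Hadoop 3.0 documentation.
    static final String COLLECTOR_KEY = "mapreduce.job.map.output.collector.class";
    static final String NATIVE_COLLECTOR =
            "org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator";

    /** Returns a copy of the given settings with the native collector selected. */
    static Properties withNativeCollector(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty(COLLECTOR_KEY, NATIVE_COLLECTOR);
        return copy;
    }

    public static void main(String[] args) {
        Properties jobSettings = withNativeCollector(new Properties());
        System.out.println(COLLECTOR_KEY + "=" + jobSettings.getProperty(COLLECTOR_KEY));
    }
}
```

In a real job the same key and value would be set on the job's configuration rather than on a `Properties` object, and the native libraries would need to be available on the cluster nodes.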

Shaded client jars

The hadoop-client Maven artifact available in 2.x releases pulls Hadoop’s transitive dependencies onto a Hadoop application’s classpath. This can be problematic if the versions of these transitive dependencies conflict with the versions used by the application.

HADOOP-11804 adds new hadoop-client-api and hadoop-client-runtime artifacts that shade Hadoop’s dependencies into a single jar. This avoids leaking Hadoop’s dependencies onto the application’s classpath.
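When diagnosing the kind of conflict described above, it helps to see which jar a class was actually loaded from. The following sketch uses only standard JDK calls; the class name `ClassOrigin` is hypothetical, and the class names passed on the command line are whatever the application wants to inspect.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Describes where the given class was loaded from, typically a jar URL. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the JDK itself (e.g. java.lang.String) usually have no code source.
        if (source == null || source.getLocation() == null) {
            return "(no code source, likely a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments to see their origin.
        for (String name : args) {
            System.out.println(name + " -> " + locate(Class.forName(name)));
        }
    }
}
```

Run with the application's own classpath, this shows whether a given library class comes from the application's copy of a dependency or from one pulled in transitively by a Hadoop artifact.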

The full list of major changes is available in the Apache Hadoop 3.0.0 release documentation.

Gabriela Motroc
Gabriela Motroc was editor of and JAX Magazine. Before working at Software & Support Media Group, she studied International Communication Management at the Hague University of Applied Sciences.
