Apache Hadoop 3.2 is here!
Big data was born on the back of Apache Hadoop. Today, we take a look at the latest version of this open source framework for distributed computing, Hadoop 3.2! Highlights include powerful new features and deep learning applications.
Apache Hadoop is rightly seen as the foundation for the modern big data ecosystem, so it’s exciting to see the latest version of this open source framework for scalable, distributed computing! Hadoop 3.2 arrived earlier this week with a number of upgrades and improvements.
“This latest release … diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps,” said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop.
This marks the first stable release of the Apache Hadoop 3.2 line. It contains a whopping 1092 bug fixes, improvements, and enhancements since 3.1.0. There’s a full list of all the major changes available here.
Hadoop 3.2 highlights
One of the biggest and most active Apache Software communities, the Hadoop community continues to help drive big data innovation. Here are some of the latest improvements to Hadoop:
- The ABFS filesystem connector now supports the latest Azure Data Lake Storage Gen2.
- The enhanced S3A connector is more resilient to throttled AWS S3 and DynamoDB I/O.
- Node Attributes support in YARN helps tag nodes with multiple labels based on their attributes and supports placing containers based on expressions over these labels.
- The Storage Policy Satisfier lets HDFS (Hadoop Distributed File System) applications move blocks between storage types when they set storage policies on files and directories.
- Hadoop Submarine enables data engineers to easily develop, train, and deploy deep learning models in TensorFlow on the very same Hadoop YARN cluster.
- The C++ HDFS client supports asynchronous I/O to HDFS for downstream projects like Apache ORC.
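To make the node attributes item above a little more concrete, here is a minimal toy sketch of the idea: nodes carry key/value attributes, and a container may only land on nodes that satisfy a set of constraints. This is not the YARN API. The `Node` class, the `matches` and `candidate_nodes` helpers, the attribute names (`os`, `gpu`), and the `=` / `!=` operators are all assumptions made up for illustration.

```python
# Toy model only: this is NOT the YARN API. All names below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Key/value attributes describing the node, e.g. {"os": "centos7"}.
    attributes: dict = field(default_factory=dict)


def matches(node, constraints):
    """Return True if the node satisfies every (key, op, value) constraint."""
    for key, op, value in constraints:
        actual = node.attributes.get(key)
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
    return True


def candidate_nodes(nodes, constraints):
    """Names of nodes where a container with these constraints could be placed."""
    return [n.name for n in nodes if matches(n, constraints)]


cluster = [
    Node("node1", {"os": "centos7", "gpu": "true"}),
    Node("node2", {"os": "centos7", "gpu": "false"}),
    Node("node3", {"os": "ubuntu18", "gpu": "true"}),
]

# A container that needs a CentOS 7 node with a GPU.
gpu_centos = candidate_nodes(cluster, [("os", "=", "centos7"), ("gpu", "=", "true")])
# A container that must avoid CentOS 7 nodes.
not_centos = candidate_nodes(cluster, [("os", "!=", "centos7")])

print(gpu_centos)
print(not_centos)
```

In a real cluster, the scheduler makes this decision from attributes registered on each node. The sketch only shows the matching logic in miniature.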
Hadoop 3.2 also comes with upgrades for long-running services, including support for in-place, seamless upgrades of long-running containers through the YARN Native Service API and CLI.
The project itself is made up of a number of modules, like Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN, Hadoop MapReduce, and Hadoop Ozone.
Getting Apache Hadoop
Apache Hadoop is an open source project under the Apache Software Foundation and it relies on developers like you to grow and evolve. Contributions are welcome and necessary for the health of this vital foundation for the modern big data ecosystem. More information is available here.