Another project gets its stripes

Apache promotes Sqoop to Top-Level Project status

Chris Mayer

The Hadoop-to-SQL connector gets top billing at Apache.

The Apache Software Foundation today announced that Apache Sqoop, a bulk data transfer tool, is the latest in a wave of Apache projects to graduate from the Apache Incubator and become a Top-Level Project.

Apache Sqoop is designed to move bulk data efficiently between the big data framework Apache Hadoop and structured datastores such as relational databases and NoSQL systems. It lets users import data from external datastores and enterprise data warehouses into the Hadoop Distributed File System (HDFS) or into related systems such as Apache Hive and Apache HBase.

“The Sqoop Project has demonstrated its maturity by graduating from the Apache Incubator,” explained Arvind Prabhakar, Vice President of Apache Sqoop. “With jobs transferring data on the order of billions of rows, Sqoop is proving its value as a critical component of production environments.”

As Hadoop has grown, so has the sheer volume of data that enterprises need to transfer. Accessing data held in external systems from MapReduce applications running on large clusters is challenging, and this is where Sqoop comes in, delivering fast transfers while making efficient use of system and network resources. Sqoop also copies data quickly from external systems into Hadoop, which makes data analysis more efficient and mitigates the risk of placing excessive load on those external systems.

“Connectivity to other databases and warehouses is a critical component for the evolution of Hadoop as an enterprise solution, and that’s where Sqoop plays a very important role,” said Deepak Reddy, Hadoop Manager at “We use Sqoop extensively to store and exchange data between Hadoop and other warehouses like Netezza. The power of Sqoop also comes in the ability to write free-form queries against structured databases and pull that data into Hadoop.”
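For readers unfamiliar with the free-form queries Reddy mentions: the Sqoop 1 command line accepts an arbitrary SQL statement through `--query`. The statement must contain the literal `$CONDITIONS` token, which Sqoop replaces with range predicates so that parallel map tasks each fetch a slice of the result, partitioned on the column named by `--split-by`; a `--target-dir` in HDFS is also required. The Python sketch below only assembles such a command as an argument list. The JDBC URL, username, table and column names, and HDFS path are all hypothetical, and actually running the command assumes a machine with the Sqoop 1 CLI installed.

```python
import shlex


def build_query_import(jdbc_url, username, query, split_by, target_dir, num_mappers=4):
    """Assemble a Sqoop 1 free-form query import as an argv list.

    Every value comes from the caller; nothing here refers to a real deployment.
    """
    # Sqoop substitutes $CONDITIONS with per-mapper range predicates,
    # so a free-form query must contain the literal token.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain the $CONDITIONS token")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of placing it on the command line
        "--query", query,
        "--split-by", split_by,  # column used to partition work across mappers
        "--target-dir", target_dir,  # HDFS destination, required for query imports
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical connection details and schema, for illustration only.
cmd = build_query_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl",
    query="SELECT o.id, o.total, c.region FROM orders o "
          "JOIN customers c ON o.customer_id = c.id WHERE $CONDITIONS",
    split_by="o.id",
    target_dir="/user/etl/orders_by_region",
)

if __name__ == "__main__":
    # Print the equivalent shell command; passing cmd to subprocess.run(cmd, check=True)
    # would execute it where Sqoop is installed.
    print(shlex.join(cmd))
```

Passing an argument list rather than a single shell string means no shell ever sees `$CONDITIONS`; when typing the command by hand, the query needs single quotes for the same reason, which is what `shlex.join` produces.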

“Sqoop has been an integral part of our production data pipeline,” said Bohan Chen, Director of the Hadoop Development and Operations team at Apollo Group. “It provides a reliable and scalable way to import data from relational databases and export the aggregation results to relational databases.”

Since entering the Apache Incubator in June of last year, Sqoop has been quickly embraced as a leading choice for SQL-to-Hadoop data transfer. Sqoop connects to databases such as MySQL, PostgreSQL, Oracle, SQL Server and DB2, and supports drop-in connectors that provide high-speed connectivity with specialized systems such as enterprise data warehouses.

With this nod from the Apache Software Foundation, Apache Sqoop looks set to become a major part of the Hadoop landscape.
