Another project gets its stripes

Apache promotes Sqoop to Top Level Project status

Chris Mayer

The Hadoop-to-SQL connector gets top billing at Apache.

The Apache Software Foundation today announced that the bulk data
transfer tool Apache Sqoop is the latest in the wave of Apache
projects to graduate from the Apache Incubator and become a
Top-Level Project.

Apache Sqoop is designed to ease the movement of bulk data between
the big data framework Apache Hadoop and structured datastores such
as relational databases, as well as NoSQL stores. It allows users to
import data from external datastores and enterprise data warehouses
into the Hadoop Distributed File System (HDFS) or related systems
such as Apache Hive and HBase, and to export it back out again.
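A typical import might look like the following sketch. The host, database, table names and credentials are hypothetical placeholders, not from the article:

```shell
# Import a table from a MySQL database into HDFS.
# Connection details and table names here are illustrative only.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/sales/orders

# Or load the same table straight into a Hive table instead:
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --hive-import
```

The `--hive-import` flag is what lets Sqoop target Hive directly rather than leaving raw files in HDFS.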

“The Sqoop Project has demonstrated its maturity by graduating from
the Apache Incubator,” explained Arvind Prabhakar, Vice President
of Apache Sqoop. “With jobs transferring data on the order of
billions of rows, Sqoop is proving its value as a critical
component of production environments.”

As Hadoop has grown, so has the sheer volume of data that
enterprises need to transfer. Accessing this data from MapReduce
applications running on large clusters is a challenging task, and
this is where Sqoop comes in, providing fast transfers and
efficient utilization of system and network resources. In addition,
Sqoop allows fast copying of data from external systems into Hadoop
to make data analysis more efficient, and mitigates the risk of
placing excessive load on those external systems.

“Connectivity to other databases and warehouses is a critical
component for the evolution of Hadoop as an enterprise solution,
and that’s where Sqoop plays a very important role,” said Deepak
Reddy, Hadoop Manager. “We use Sqoop extensively to
store and exchange data between Hadoop and other warehouses like
Netezza. The power of Sqoop also comes in the ability to write
free-form queries against structured databases and pull that data
into Hadoop.”
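The free-form query capability mentioned above can be sketched like this; Sqoop requires the literal `$CONDITIONS` token in the WHERE clause so it can split the query across parallel map tasks (the connection details and schema are hypothetical):

```shell
# Import the result of an arbitrary SQL query rather than a whole table.
# $CONDITIONS is replaced by Sqoop with per-task split predicates.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --query 'SELECT o.id, o.total, c.name FROM orders o
           JOIN customers c ON o.cust_id = c.id
           WHERE $CONDITIONS' \
  --split-by o.id \
  --target-dir /data/sales/order_details
```

Because Sqoop cannot infer a primary key from an arbitrary query, `--split-by` must name the column used to partition the work.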

“Sqoop has been an integral part of our production data pipeline,”
said Bohan Chen, Director of the Hadoop Development and Operations
team at Apollo Group. “It provides a reliable and scalable way to
import data from relational databases and export the aggregation
results to relational databases.”
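The export direction described here works symmetrically to import. A minimal sketch, assuming a hypothetical reports database and an existing target table:

```shell
# Export aggregated results from HDFS back into a relational table.
# The target table must already exist in the database.
sqoop export \
  --connect jdbc:mysql://db.example.com/reports \
  --username analyst -P \
  --table daily_revenue \
  --export-dir /data/output/daily_revenue
```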

Since entering the Apache Incubator in June of last year, Sqoop has
quickly been embraced as the ideal choice for a SQL-to-Hadoop data
transfer solution. Sqoop connects to systems such as MySQL,
PostgreSQL, Oracle, SQL Server and DB2, and allows for the
development of drop-in connectors that provide high-speed
connectivity with specialized systems like enterprise data
warehouses.

With this nod from the Apache Software Foundation, the road looks
set for Apache Sqoop to become a huge part of the Hadoop ecosystem.