Spring for Apache Hadoop hits first major release
SpringSource release their Hadoop-helping project for Spring users, which aims to streamline the heavy lifting process for newcomers.
Almost a year to the day since its
opening milestone, Spring for Apache Hadoop
has reached its first major release, melding together the data
processing technology with the established enterprise Java framework.
Hadoop might be written in Java, but it’s evolved since its
inception to include different components beyond HDFS and MapReduce
alone. With so many offshoot projects like Hive, Pig and HBase
performing key jobs in the guts of Hadoop, the
SpringSource team have spent much of the last
year making it easier for Spring users to get to grips with
Hadoop’s many limbs, according to
SpringSource’s Costin Leau. This has
been done by applying Spring’s familiar
Template API design pattern to
create helper classes.
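For readers unfamiliar with the template idea, the following is a minimal, self-contained sketch in plain Java. It is not the Spring for Apache Hadoop API: FakeConnection and FakeTemplate are hypothetical names invented for illustration. It only shows the general shape of the pattern, in which a helper class owns resource handling and error translation while the caller supplies just the work to be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TemplateSketch {

    // Hypothetical stand-in for a client connection to some data store.
    static class FakeConnection implements AutoCloseable {
        private final List<String> log;

        FakeConnection(List<String> log) {
            this.log = log;
            log.add("open");
        }

        String query(String q) {
            log.add("query:" + q);
            return "result-of-" + q;
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // Hypothetical template: opens and closes the connection around the
    // caller's callback, so the caller never manages the resource itself.
    static class FakeTemplate {
        final List<String> log = new ArrayList<>();

        <T> T execute(Function<FakeConnection, T> action) {
            try (FakeConnection conn = new FakeConnection(log)) {
                return action.apply(conn);
            } catch (RuntimeException e) {
                // Translate low-level failures into one consistent exception type.
                throw new IllegalStateException("Data access failed", e);
            }
        }
    }

    public static void main(String[] args) {
        FakeTemplate template = new FakeTemplate();
        String result = template.execute(conn -> conn.query("count"));
        System.out.println(result);        // prints "result-of-count"
        System.out.println(template.log);  // prints "[open, query:count, close]"
    }
}
```

The caller's code shrinks to a single lambda, which is the convenience that template-style helper classes are meant to offer.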
The team also recognise that Hadoop applications can become
unwieldy if you add too many components to the mix. Leau says that
the team want to encourage a “start small and grow” approach
through the introduction of “various runner classes” for Hadoop’s
supporting cast (Hive, Pig, Cascading).
Integration with existing Spring projects aims to streamline
the heavy lifting process for newcomers. The use of Spring
Integration alongside Spring Hadoop allows developers to filter
event streams before they go into HDFS or into another NoSQL store.
Users can also bring in Spring Batch, a batch processing framework,
to orchestrate the processing side of Hadoop.
With Spring’s roots firmly in the enterprise, portability is
another key design goal of the project. Leau says that they
are testing against some of the major Hadoop
distributions (including Cloudera CDH3 and CDH4, and
Greenplum HD) as well as the vanilla codebase to ensure
reliability. With Hadoop 2.0 still in the pipeline, there are no
immediate plans to support the next major version, but Leau says
the team will keep a “close eye” on development.
The release has come at just the right time – with Hadoop
maturing and cementing itself as an enterprise technology of
choice, SpringSource are in a prime position to make Spring a key
component for big data developers’ stacks. Now the challenge is to
garner a strong community behind it.