Spring for Apache Hadoop hits first major release
SpringSource release their Hadoop-helping project for Spring users, which aims to streamline the heavy lifting process for newcomers.
Almost a year to the day since its opening milestone, Spring for Apache Hadoop has reached its first major release, melding together the data processing technology with the established enterprise Java framework.
Hadoop might be written in Java, but it has evolved since its inception to include components beyond HDFS and MapReduce. With offshoot projects such as Hive, Pig and HBase performing key jobs in the guts of Hadoop, the SpringSource team have spent much of the last year making it easier for Spring users to get to grips with Hadoop’s many limbs, said SpringSource’s Costin Leau. They have done this by applying Spring’s familiar Template API design pattern to create helper classes.
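For readers unfamiliar with the template pattern, the idea can be sketched in plain Java. This is only an illustration under stated assumptions: FakeConnection and FakeTemplate are hypothetical names invented here, not classes from Spring for Apache Hadoop, and the real helper classes may be shaped quite differently. The point is the division of labour: the template opens and closes the resource and translates errors, while the caller supplies only the work to be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TemplateSketch {

    /** Hypothetical stand-in for a client connection (for example to Hive or HBase). */
    static class FakeConnection implements AutoCloseable {
        final List<String> log;

        FakeConnection(List<String> log) {
            this.log = log;
            log.add("open");
        }

        String query(String q) {
            log.add("query:" + q);
            return "result of " + q;
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    /** Hypothetical template: owns the connection lifecycle and error translation. */
    static class FakeTemplate {
        final List<String> log = new ArrayList<>();

        <T> T execute(Function<FakeConnection, T> callback) {
            // try-with-resources closes the connection even if the callback fails
            try (FakeConnection conn = new FakeConnection(log)) {
                return callback.apply(conn);
            } catch (RuntimeException e) {
                // translate low-level failures into a single unchecked exception type
                throw new IllegalStateException("data access failed", e);
            }
        }
    }

    public static void main(String[] args) {
        FakeTemplate template = new FakeTemplate();
        // The caller writes only the query; no open/close boilerplate
        String result = template.execute(conn -> conn.query("show tables"));
        System.out.println(result);
        System.out.println(template.log);
    }
}
```

The caller never touches the connection lifecycle, which is the familiarity the pattern is meant to carry over from Spring's other template-style helpers.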
The team also recognise that Hadoop applications can become unwieldy if you add too many components to the mix. Leau says that the team want to encourage a “start small and grow” approach through the introduction of “various runner classes” for Hadoop’s supporting cast (Hive, Pig, Cascading).
Integration with existing Spring projects aims to streamline the heavy lifting process for newcomers. The use of Spring Integration alongside Spring Hadoop allows developers to filter event streams before they go into HDFS or into another NoSQL store. Users can also move up to Spring Batch, Spring’s batch processing framework, to manage the processing side of Hadoop as multi-step jobs.
With Spring’s roots firmly in the enterprise, portability is another key design goal of the project. Leau says that the team are testing against some of the major Hadoop distributions (including Cloudera CDH3 and CDH4, and Greenplum HD) as well as the vanilla codebase to ensure reliability. With Hadoop 2.0 still in the pipeline, there are no immediate plans to support the next major version, but Leau says the team will keep a “close eye” on development.
The release has come at just the right time – with Hadoop maturing and cementing itself as an enterprise technology of choice, SpringSource are in a prime position to make Spring a key component for big data developers’ stacks. Now the challenge is to garner a strong community behind it.