Spring developers get Hadoop integration

SpringSource embraces Big Data with Spring Hadoop 1.0.0 M1

With the whole Java ecosystem going 'gaga' about the possibilities of Hadoop, one dominant community had yet to dip its toe in the water when it came to storing and processing huge amounts of unstructured data. That is, until now.

Costin Leau announced on the SpringSource blog that the first milestone of Spring Hadoop 1.0.0, an offshoot of the Spring Data umbrella project, had arrived, demonstrating some fairly advanced capabilities for a first milestone.

This is likely down to Spring waiting for Hadoop to reach its first stable version before joining the rest of the Java ecosystem. A wise decision, as Hadoop has finally signalled its intention to lead the Big Data revolution that we are in the midst of. Hortonworks CEO Rob Bearden recently said that the Apache Software Foundation can be bigger than JBoss, SpringSource and MySQL combined.

Spring Hadoop is flexible: it can handle everything from simple stand-alone vanilla MapReduce jobs to more complex tasks such as interacting with data from multiple enterprise data stores or coordinating a complex workflow of HDFS, Pig, or Hive jobs. Leau says that 'Spring Hadoop stays true to the Spring philosophy offering a simplified programming model and addresses "accidental complexity" caused by the infrastructure.'

The 'Hello World' of Hadoop is the word count example, a simple use case that exposes the base Hadoop capabilities, which you can see below.

<!-- configure Hadoop FS/job tracker using defaults -->
<hdp:configuration />

<!-- define the job -->
<hdp:job id="word-count"
  input-path="/input/" output-path="/output/"
  mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
  reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>

<!-- execute the job -->
<bean id="runner" class="org.springframework.data.hadoop.mapreduce.JobRunner"
                  p:jobs-ref="word-count"/>
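
For readers new to MapReduce, the sketch below illustrates roughly what the mapper and reducer named in the configuration do: the map step emits a (word, 1) pair for each token, and the reduce step sums the pairs per word. It is plain Java with no Hadoop or Spring dependency, and the names WordCountSketch, map, reduce and count are hypothetical, chosen for illustration only. It is not the code of the Hadoop example classes, and it runs in a single JVM rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: an in-memory imitation of the word count job's logic.
// Requires Java 9 or later (Map.entry, List.of).
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum the emitted counts for each distinct word.
    // A TreeMap keeps the words in sorted order for readable output.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run both steps over a list of input lines.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or not to be", "be quick")));
    }
}
```

In a real job, Hadoop distributes the map calls across the input splits and groups the pairs by key before the reduce step; the sketch collapses all of that into two method calls.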

Leau adds that 'Spring Hadoop does not require one to rewrite your MapReduce job in Java, you can use non-Java streaming jobs seamlessly: they are just objects (or as Spring calls them beans) that are created, configured, wired and managed just like any other by the framework in a consistent, coherence manner [sic].' This means the developer can mix and match according to their preference and requirements without having to worry about integration issues.

Spring Hadoop also stands out in the way it handles HDFS support, which extends to scripting from a variety of JVM languages and engines such as Groovy, JRuby, Jython, and Rhino.

For a fuller introduction to Spring Hadoop and all the tricks in its arsenal, such as handling MapReduce, Hive, Pig, and Cascading jobs, check out the announcement. There's even a promise to detail how Spring Batch integration provides tasklets for various Hadoop interactions and how Spring Integration can be used for event triggering. Coordination with other Spring projects is of course vital for Spring Hadoop to achieve liftoff.

They may have taken their time before showing off their first milestone but boy, is it worth it.

Chris Mayer
