JAX Magazine: The free PDF magazine powered by JAXenter!
It has been dubbed 'the Swiss Army knife of the 21st century' whilst scooping technology awards left, right and centre. Simply put, the technology to watch in 2012 is Hadoop. After six years of gestation, the project released its first stable version, Hadoop 1.0, in December 2011. It was a signal to the late-comers to the next-generation data storage and processing revolution that Hadoop is here - and here for the long run, it seems, as the epicentre of this decade's data processing explosion, just as SQL was 30 years earlier.

For those of you who have somehow avoided it, Hadoop is essentially a Java-based framework split into two core parts - HDFS (the data storage system) and MapReduce (which processes that data). But Hadoop has become so much more in the eight years since Doug Cutting was inspired to create his open source implementation of the MapReduce framework. Sub-projects such as Pig and Hive have sprung up around it, each becoming an integral part of the standard Hadoop stack and a piece in the Big Data jigsaw. The project has no doubt been helped by achieving Apache Top Level status back in 2008, giving it huge community backing.

Consequently, large vendors have been allured by Hadoop's charm, low cost and massive scalability being the two main draws to the yellow elephant as the solution for their petabytes of data. Nearly every mega-corporation wants a Hadoop distribution tailored to its needs. Whether it's a social networking giant like Facebook or Twitter, an entertainment company like Netflix or a dating site like eHarmony, practically every large enterprise struggles to store and process reams and reams of data reliably. They are all looking to Hadoop to solve that problem, and backing it to become a Big Data mammoth.
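For readers new to the model, the HDFS/MapReduce split boils down to mapping a function over input records and then reducing the values grouped by key. A minimal sketch in plain Java of the canonical word count illustrates the idea (no Hadoop dependency; the class and method names here are illustrative, not Hadoop's actual API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyMapReduce {
    // Map phase: emit (word, 1) for every word in every input line.
    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    // Here both phases are collapsed into one in-memory loop; Hadoop runs
    // them distributed across a cluster, reading input from HDFS.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {                       // "map" over input records
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                counts.merge(word, 1, Integer::sum);      // "reduce" by key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big elephant", "data everywhere");
        System.out.println(wordCount(lines)); // {big=2, data=2, elephant=1, everywhere=1}
    }
}
```

The appeal of the real framework is that the same two-phase shape scales from this toy loop to petabytes spread over thousands of machines.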
Palo Alto-based Cloudera was the first to test the enterprise waters back in 2009 with its CDH distribution, and since then many offshoot companies have appeared, each offering its own flavour of Hadoop. Last year saw the Hadoop boom, with performance-focused MapR and Yahoo spin-off Hortonworks appearing with their own twist on things. We have an exclusive interview with Hortonworks founder and Hadoop PMC Chair Arun Murthy about the latest stable version, Hadoop 1.0, and the wider ecosystem. What businesses really want, though, is to analyse these vast amounts of data effectively; the arrival of Hadoop analytics vendors like Hadapt is really the launchpad for Hadoop being accepted by the masses.

This issue also shows Hadoop in action. Sujee Maniyam shows us how MapReduce can be used to measure the effectiveness of an advertising campaign, whilst Isabel Drost delves into various aspects of the Hadoop ecosystem and how to apply Hadoop to your own needs. Josh Wills, Director of Data Science at Cloudera, provides an excellent look at how Hadoop can be used to analyse adverse drug events. There's no better way of learning about Hadoop than seeing first hand what it can do. Finally, we look at the inevitable combination of Hadoop and the NoSQL Couchbase server: Matt Ingenthron shows us how the Sqoop connector can unite the duo to perform some deep analytics.

The biggest indication of Hadoop hitting the mainstream is that five major vendors have all made positive noises about the project. All have teamed up with Hadoop companies to create their own distributions, and even the previously hesitant Microsoft has backed Hadoop, leaving Dryad by the wayside. With Oracle joining forces with Cloudera to release their 'Big Data Appliance' last month, jumping in first with an aggressive pricing policy, the competition is set to get fierce.
But one thing is clear - the work of those rivals will all filter back into the open source implementation, strengthening its core. Hadoop is at the top table, and it is here to stay. This is the year of the elephant. Enjoy the issue.