Hadoop Evolved: Doug Cutting tells of the platform’s origins and future
The second keynote of Tuesday was given by the Hadoop creator himself, Doug Cutting, who provided an excellent overview of Big Data and the project he founded.
The second keynote of JAX London Day One was provided by Big Data royalty: Doug Cutting, the creator of Lucene, Nutch and the ubiquitous Big Data platform Hadoop. He gave us a glimpse into the future of Hadoop, as well as charting its humble origins.
Big Data Con London’s banner keynote saw Cutting begin with the inception of Hadoop, which came after he saw Google’s papers on its distributed file system and MapReduce, published in 2003 and 2004 respectively. Cutting explained that he (amongst others, no doubt) “immediately saw the applicability”.
Yahoo! were the first to see the platform’s potential as the ‘kernel of the Big Data ecosystem’, and Cutting joined the company in 2006 to help make the automation of the laborious distributed process a reality. Ten engineers initially took on the challenge of maturing the technology and scaling it across thousands of commodity servers. Yahoo!’s intention to make it open source was a big draw for Cutting, who now acts as Apache Software Foundation Chairman, as well as working as Cloudera’s Chief Architect.
He added that “running on commodity hardware makes a big difference. Doing something at ten times the cost means it can run ten times further”.
The rest, as they say, is history. Hadoop was adopted early by the likes of Facebook and Twitter, and Cutting believes the platform is now robust and reliable enough to be used throughout the enterprise world, citing use cases within the healthcare and financial sectors, amongst others.
Interestingly, Cutting admitted during the keynote that, taken alone, the core storage and computational components of Hadoop weren’t groundbreaking; their real power comes from working in unison, which provides the “real scalability”.
“MapReduce is the hammer, all data looks like nails”, insisted Cutting, before adding: “It solves a wide reach of problems. It’s not ideal for all but it’s impressive in the number [of problems] it addresses”.
He continued: “The interesting part is the reliability – that’s why people use this. That’s not easy to get, but once it’s there, the framework acts as a basis”.
It’s really quite astounding to see how Hadoop has blossomed over the past few years. Cutting went on to detail the sheer number of Big Data projects that have sprung up around the HDFS/MapReduce project.
Cutting highlighted the NoSQL project HBase as one Big Data project within the ecosystem that was deserving of attention, due to its “complementary” nature within the whole project. Cutting even described it, the first non-batch component of the Bigtop distribution, as the basis for future systems of that nature. Despite a slow initial pickup, Cutting said, HBase had still attained rapid adoption.
Finally, the Hadoop creator tackled the question we were most keen to hear answered: what is the Holy Grail of Big Data? What could provide the linear scaling and the global reliability that the business world craves? The answer, according to Cutting, could lie within Google’s recent Spanner paper.
“It’s an impressive piece of work and gives me great optimism looking at the framework that it has legs”, said Cutting. Just like with Hadoop, Google appears to be leading the way for Big Data. We were informed that 26 authors took part in the Spanner publication over 5 years, showing their commitment to the cause. We may not be there yet, but given their track record, you’d be foolish to bet against Spanner and what it can do.
Stay tuned for the keynote session in the coming weeks, as well as a brief interview with Doug himself.