Hadoop Evolved: Doug Cutting tells of the platform's origins and future
Tuesday's second keynote was given by the Hadoop creator himself, Doug Cutting, who provided an excellent overview of Big Data and the project he founded.
The second keynote of JAX London Day One was provided by Big
Data royalty: Doug Cutting, the creator of Lucene, Nutch and the
ubiquitous Big Data platform Hadoop, gave us a glimpse into the
future for Hadoop, as well as charting its humble origins.
Big Data Con London’s banner keynote saw Cutting begin with the
inception of Hadoop, which came after he saw Google’s papers on its
distributed file system and MapReduce, published in
2003 and 2004 respectively. Cutting explained that from then on he
(amongst others, no doubt) “immediately saw the applicability”.
Yahoo! were the first to see the platform’s potential as the
‘kernel of the Big Data ecosystem’, and Cutting joined in 2006 to
help make the automation of the laborious distributed process a
reality. Ten engineers initially took on the challenge of maturing
the technology and scaling it across thousands of commodity servers.
Yahoo!’s intention to make it open source was a big draw for
Cutting, who now acts as Apache Software Foundation Chairman, as
well as working as Cloudera’s Chief Architect.
He added that “running on commodity hardware makes a big
difference. Doing something at ten times the cost means it can run
ten times further”.
The rest, as they say, is history. Hadoop was adopted early by the
likes of Facebook and Twitter, but Cutting believes the platform is
robust and reliable enough to be used throughout the enterprise
world, citing use cases within the healthcare and financial sectors.
Interestingly, Cutting admitted during the keynote that alone,
the core storage and computational components of Hadoop weren’t
groundbreaking, but their real power comes in unison provided the
“MapReduce is the hammer, all data looks like nails,” insisted
Cutting, before adding: “It solves a wide reach of problems. It’s not
ideal for all but it’s impressive in the number [of problems] it
He continued: “The interesting part is the reliability – that’s
why people use this. That’s not easy to get, but once it’s there,
the framework acts as a basis”.
It’s really quite astounding to see how Hadoop has blossomed over
the past few years. Cutting went on to detail the sheer number
of Big Data projects that have sprung up around the HDFS/MapReduce
Cutting highlighted the NoSQL project HBase as one Big Data
project within the ecosystem that was deserving of attention,
due to its “complementary” nature within the whole project. Cutting
even proclaimed HBase, the first non-batch component of the BigTop
distribution, as the basis for future systems of that
nature. Despite a slow start, he said, HBase had since been
adopted rapidly.
Finally, the Hadoop creator tackled the question we were most
keen to hear answered: what is the Holy Grail of Big Data? What could
provide the linear scaling and the global reliability that the
business world craves? The answer, according to Cutting, could lie
in Google’s recent Spanner paper.
“It’s an impressive piece of work and gives me great optimism
looking at the framework that it has legs”, said Cutting. Just like
with Hadoop, Google appears to be leading the way for Big Data. We
were informed that 26 authors took part in the Spanner publication
over 5 years, showing their commitment to the cause. We may not be
there yet, but given their track record, you’d be foolish to bet
against Spanner and what it can do.
Stay tuned for the keynote session in the coming weeks, as
well as a brief interview with Doug himself.