No big problem

Hadoop isn’t ready for the elephant graveyard

Lucy Carey

Google might have ditched its old tried and tested formula, but DataTorrent’s John Fanelli has some choice words for Hadoop detractors.

Urs Hölzle, Google’s Senior Vice President of
Technical Infrastructure, dealt some California shade to the Hadoop
community at last week’s Google I/O conference in San Francisco
with his declaration that the company was, like, so over the
traditional Hadoop/MapReduce approach. Instead, the
Chocolate Factory has concocted its own system – Google Cloud
Dataflow – which Hölzle said, twisting the knife, was far better
suited to dealing with the multi-petabyte scale data loads Google
now experiences.

Google sits at the extreme end of the scale, but enterprises
across the board are finding themselves dealing with influxes
of data on an unprecedented level, and for many, Hadoop-flavoured
tech, which was originally cooked up in the early 2000s, simply
isn’t cutting it the way it used to. For many, Google’s
announcement was the final sign-off for the technology it
pioneered, but there remains a sizable army who’d beg to differ.
Among them is DataTorrent’s VP of Marketing, John Fanelli.

As purveyors of the speedy analytics solution DataTorrent
Real Time Streaming (RTS) – which happens to be built on
Hadoop 2.0 – you’d expect DataTorrent to have a vested
interest in defending the beleaguered pachyderm. And interestingly,
in spite of Google moving away from the once inseparable
Hadoop/MapReduce powerhouse pair, Fanelli thinks Google has
one too.

This Monday, Google also made waves in the
Hadoop-sphere with an $80 million investment in leading Hadoop
distributor MapR, which Fanelli agrees is confirmation that, in
spite of recent indications to the contrary, the company remains
“serious about Hadoop 2.0 and YARN.”

For Fanelli, Hadoop 2.0, introduced last year,
was a game changer for the tech, taking it from software which
could only “process via MapReduce functionality, which is, by
design, batch processing,” to a solution in which “Hadoop can now
process big data with multiple execution models, MapReduce being
one of them when still in batch mode.”

With this new functionality, DataTorrent were
able to build RTS, which enables enterprises “to take action in
real-time as a result of high-performance complex processing of
data as it is created.” YARN, which provides the ability to
process big data with multiple execution models, is also a key
component, effectively enabling Hadoop 2.0 to become “a big-data
operating system.” It’s thanks to YARN that a vibrant
ecosystem of other new technologies has sprouted up beside
DataTorrent, including Spark, Tez and HBase.
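
For readers who want the batch-versus-real-time distinction made concrete, here is a minimal sketch in plain Python. It is not Hadoop, YARN, DataTorrent RTS or Google Cloud Dataflow code; the function names and the sample events are hypothetical and exist only for illustration. The batch version produces a single answer once all the data has been read, while the streaming version hands back an updated answer after every record, which is roughly what it means to act on data "as it is created."

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(records: Iterable[str]) -> Counter:
    """Batch style: consume the whole data set, then return one result."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def streaming_word_count(records: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: yield an updated result after each record arrives."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
        # Yield a snapshot so the caller can react before more data arrives.
        yield Counter(counts)


# Hypothetical sample data standing in for a log feed.
events = ["error disk", "ok", "error network"]

batch_result = batch_word_count(events)
stream_results = list(streaming_word_count(events))
```

In the streaming case a caller could raise an alert the moment the first "error" shows up, rather than waiting for the full job to finish. Once the input is exhausted, both approaches agree on the final tally.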

Of course, given Google’s past as arch-Hadoop
Svengali, there’s always the prospect that big data devs will
follow past form and slowly trickle away from Hadoop. After all, it
was a Google white paper that helped push the elephant to
prominence in the first place. Fanelli is unruffled on this front,
and tells JAX that, in his opinion, “In this particular case,
DataTorrent RTS is ahead of the Google shift,” adding that, “Google
Dataflow is helping to bring awareness to the fact that big data
can be processed in real-time, not just batch.”

And as for the dark whispers emanating from certain
quarters that Google’s new in-house solution, which DataTorrent
claims can’t match their own tech, is a veiled attempt on the
search giant’s part to snatch control of big data after it
foolishly let the opportunity slip out of its claws a decade ago?
(We know. Who could possibly cast such aspersions on the least
evil company on the internet?) Fanelli is diplomatic, telling JAX
that we will have to “ask Google what their plans are” to such
“ominous” sounding questions. In his view, this is merely Google
“responding to customer demand for real-time processing,” in very
much the same way as DataTorrent.
