No big problem
Hadoop isn't ready for the elephant graveyard
Urs Hölzle, Google’s Senior Vice President of Technical Infrastructure, dealt some California shade to the Hadoop community at last week’s Google I/O conference in San Francisco with his declaration that the company was, like, so over the traditional Hadoop/MapReduce approach. Instead, the Chocolate Factory has concocted its own system - Google Cloud Dataflow - which Hölzle said, twisting the knife, was far better suited to dealing with the multi-petabyte scale data-loads Google now experiences.
Although Google sits towards the apex of the bell curve, enterprises across the board are finding themselves dealing with influxes of data on an unprecedented level, and for many, Hadoop-flavoured tech, which was originally cooked up in the early 2000s, simply isn’t slicing it the way it used to. Whilst, for many, Google’s announcement was the final sign-off for the technology it pioneered, there remains a sizable army who’d beg to differ. Among them is DataTorrent’s VP of Marketing, John Fanelli.
DataTorrent are the purveyors of speedy analytics solution DataTorrent Real Time Streaming (RTS), which happens to be built on Hadoop 2.0, so you’d expect them to have a vested interest in defending the beleaguered pachyderm. Interestingly, in spite of Google moving away from the once inseparable Hadoop/MapReduce powerhouse pair, Fanelli thinks Google has one too.
This Monday, Google also made waves in the Hadoop-sphere with an $80 million investment in leading Hadoop distributor MapR, which Fanelli agrees is confirmation that, in spite of recent indications to the contrary, Google remains “serious about Hadoop 2.0 and YARN.”
For Fanelli, the introduction of Hadoop 2.0 last year was a game changer for the tech, taking it from software which could only "process via MapReduce functionality, which is, by design, batch processing," to a platform that "can now process big data with multiple execution models, MapReduce being one of them when still in batch mode."
With this new functionality, DataTorrent were able to build RTS, which enables enterprises “to take action in real-time as a result of high-performance complex processing of data as it is created.” YARN, which provides the ability to process big data with multiple execution models, is also a key component, effectively enabling Hadoop 2.0 to become “a big-data operating system.” It’s thanks to YARN that a vibrant ecosystem of other new technologies has sprouted up alongside DataTorrent, including Spark, Tez and HBase.
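For readers who want the batch-versus-streaming distinction in concrete terms, here is a minimal Python sketch. It does not use Hadoop, YARN or DataTorrent RTS, and the function names `batch_count` and `streaming_count` are made up purely for illustration. The only point is the difference in when results become available: once at the end of a job, or continuously as each record arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(events: list[str]) -> Counter:
    """Batch model (hypothetical helper): the whole data set must
    exist before processing starts, and one result comes out at the end."""
    return Counter(events)


def streaming_count(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming model (hypothetical helper): update a running result
    for every event and emit a snapshot straight away."""
    running: Counter = Counter()
    for event in events:
        running[event] += 1
        # Yield a copy so earlier snapshots are not mutated later on.
        yield Counter(running)


events = ["click", "view", "click"]

# Batch: a single answer once all the events have been collected.
final = batch_count(events)

# Streaming: an up-to-date answer after every single event.
snapshots = list(streaming_count(events))
```

In the streaming version a consumer could act on each snapshot as soon as it is produced, which is roughly the "take action in real-time" idea Fanelli describes, albeit at toy scale and on a single machine.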
Of course, given Google’s past as arch-Hadoop Svengali, there’s always the prospect that big data devs will follow past form and slowly trickle away from Hadoop. After all, it was a Google white paper that helped push the elephant to prominence in the first place. Fanelli is unruffled on this front, and tells JAX that, in his opinion, “In this particular case, DataTorrent RTS is ahead of the Google shift,” adding that, “Google Dataflow is helping to bring awareness to the fact that big data can be processed in real-time, not just batch.”
And as for the dark whispers emanating from certain quarters that Google’s new in-house solution, which DataTorrent claims can’t match their own tech, is a veiled attempt on the search giant’s part to snatch control of big data after it foolishly let the opportunity slip out of its claws a decade ago? (We know. Who could possibly cast such aspersions on the least evil company on the internet?) Fanelli is diplomatic, telling JAX that we will have to “ask Google what their plans are” when it comes to such “ominous”-sounding questions. In his view, this is merely Google “responding to customer demand for real-time processing,” in very much the same way as DataTorrent.