Nifty - but is it performant?
Google puts HDFS-less Hadoop in its cloud
The Hadoop Distributed File System is so 2003 – at least according to Google, which is testing a connector that allows Hadoop to gobble data straight off its Cloud Storage platform.
Announced in a blog post, the ‘Google Cloud Storage connector for Hadoop’ plugs Hadoop instances hosted on Compute Engine directly into Google Cloud Storage, the company’s equivalent of Amazon’s S3.
This, says Product Manager Jonathan Bingham, will provide all the advantages of Cloud Storage – high availability, interoperability and accessibility – without the need to move data to a specialised Hadoop Distributed File System (HDFS).
HDFS is a core component of Hadoop, but it requires time-consuming routine maintenance, whereas Cloud Storage “just works”, says Bingham. Data persists even when Hadoop instances are shut down, and costs are theoretically lower. The only catch, it seems, may be performance, which Bingham claims is “comparable” to that of HDFS.
Though much of Hadoop’s early development took place at Yahoo, it was heavily inspired by papers Google published in 2003 and 2004 describing its Google File System and MapReduce. The latest version of Google File System, named Colossus, underpins not only Google’s internal products, but the commercial Cloud Storage service, too.
Google is not the only company replacing traditional HDFS: Red Hat last year open-sourced a Hadoop plug-in for Red Hat Storage Server (formerly called Gluster). In Google’s case, however, the alternative is less about performance and more about sheer convenience.