Nifty - but is it performant?

Google puts HDFS-less Hadoop in its cloud

Elliot Bentley

Don’t worry about the file system for your precious Big Data, says Google – just keep it in our Cloud Storage.

The Hadoop Distributed File System is so 2003 – at least according to
Google, which is testing a connector that allows Hadoop to gobble
data straight off its Cloud Storage platform.

Announced in a blog post, the ‘Google Cloud Storage connector for Hadoop’
plugs Compute Engine-hosted instances of Hadoop directly into its
equivalent of Amazon’s S3, Google Cloud Storage.

This, says Product Manager Jonathan Bingham, will provide all
the advantages of Cloud Storage – high availability,
interoperability and accessibility – without the need to move data
to a specialised Hadoop Distributed File System (HDFS).

A crucial aspect of Hadoop, the HDFS requires time-consuming
routine maintenance, whereas Cloud Storage “just works”, says
Bingham. Data persists even when Hadoop instances are shut down and
costs are theoretically lower. The only catch, it seems, may be in
performance, which Bingham claims is “comparable” to HDFS.

Though Hadoop was developed and open-sourced by
Yahoo, it was heavily inspired by Google papers from 2003 and 2004
describing its own Google File System and MapReduce. The latest
version of Google File System, named Colossus,
underpins not only Google’s internal products, but the commercial
Cloud Storage service, too.

Google is not the only company replacing the
traditional HDFS: Red Hat last year open-sourced a Hadoop plug-in
for Red Hat Storage Server (formerly called Gluster).
However, in Google’s case this alternative is less about
performance and more about sheer convenience.
