Hadoop Hive-as-a-service Qubole notches up $7m funding
Created by those behind Apache Hive and Facebook’s analytics platform, is Qubole more than just another Big-Data-as-a-service startup?
The slew of Big Data-as-a-service startups, or the money being
pumped into them, shows no sign of letting up, with Hadoop churner
Qubole raising $7m in their
opening round of funding.
Qubole is aimed predominantly at data
scientists and ETL engineers who want little fuss when it comes
to analysing data pipelines.
Its founders, Ashish Thusoo and Joydeep Sen
Sarma, are responsible for Hadoop’s data warehousing system Hive
and its querying language, which they co-authored while working on
Facebook’s analytics platform.
Naturally, the project is at the heart of the
Qubole Data Service, and could be considered a cloud version of
Hive itself. Qubole processes unstructured data and
lets users run quick Hive jobs within
Amazon Web Services. The platform calls upon a number of analytics
tools like R, SQL sources such as MySQL, as well as NoSQL databases
like MongoDB, before pushing it to typical business intelligence tools.
Since coming out of beta in December, Qubole has
processed around half a petabyte of data from clients. Thusoo
believes this quick milestone “demonstrates Qubole’s growth and
“We are very excited to raise the bar again as
we continue to innovate on behalf of our users,” he said in a press release.
The duo’s Hadoop heritage has helped them
optimise the framework, with claims that their platform runs Hive
queries and Hadoop jobs five times faster than Amazon Elastic MapReduce.
The hookup to Amazon Web Services is undoubtedly
Qubole’s strongest selling point, with a ready-made customer base
at their disposal, who won’t have to learn anything new when keeping
tabs on their cluster.
But is Qubole just another name in an already
competitive field? Data analysis platform Hadapt, for
example, offers a similar concept and was
launched in 2011. Hive itself has been
around even longer and is beginning to show its age.
Leading Hadoop vendors are moving on from Hive, or are opting to
renovate it in their distributions. Cloudera have
Impala, MapR have
Drill and Hortonworks have the
Stinger Initiative, which is promising to make Hive 100 times
quicker with a new processing framework called Tez. It seems in
this world, you have to adapt to survive – can Qubole do the same
in the long run?