Hadoop Hive-as-a-service Qubole notches up $7m funding

Chris Mayer

Created by those behind Apache Hive and Facebook’s analytics platform, is Qubole more than just another Big-Data-as-a-service startup?

The slew of Big Data-as-a-service startups, or the money being pumped into them, shows no sign of letting up, with Hadoop churner Qubole raising $7m in their opening round of funding.

Qubole is aimed predominantly at data scientists and ETL engineers who want little fuss when it comes to analysing data pipelines.

Its founders, Ashish Thusoo and Joydeep Sen Sarma, are responsible for Hadoop’s data warehousing system Hive and its querying language, which they co-authored while working on Facebook’s analytics platform.

Naturally, Hive is at the heart of the Qubole Data Service, which could be considered a cloud version of Hive itself. Qubole processes unstructured data and lets users run quick Hive jobs within Amazon Web Services. The platform draws on a number of analytics tools like R, SQL sources such as MySQL, and NoSQL databases like MongoDB, before pushing the results to typical business intelligence applications.

Since coming out of beta in December, Qubole has processed around half a petabyte of data from clients. Thusoo believes this quick milestone “demonstrates Qubole’s growth and viability”.

“We are very excited to raise the bar again as we continue to innovate on behalf of our users,” he said in a press release.

The duo’s Hadoop heritage has helped them optimise the framework, with claims that their platform runs Hive queries and Hadoop jobs five times faster than Amazon Elastic MapReduce does.

The hookup to Amazon Web Services is undoubtedly Qubole’s strongest selling point, giving it a ready-made customer base who won’t have to learn something new when keeping tabs on their cluster.

But is Qubole just another name in an already competitive field? Data analysis platform Hadapt, for example, offers a similar concept and was launched in 2011. Hive itself has been around even longer and is beginning to show its age.

Leading Hadoop vendors are moving on from Hive, or are opting to renovate it in their distributions. Cloudera have Impala, MapR have Drill and Hortonworks have the Stinger Initiative, which is promising to make Hive 100 times quicker with a new processing framework called Tez. It seems in this world, you have to adapt to survive – can Qubole do the same in the long run?
