Sting in the tail?
Hortonworks announce Stinger to solve Hadoop’s real-time headache
The race to make Hadoop faster in the enterprise world has heated up. In October, Cloudera unveiled Impala, a real-time query engine, whilst MapR put their weight behind Apache Drill, a real-time analytics project.
Now it’s Hortonworks’ turn to show their hand, but curiously, they’ve opted to revitalise a piece of existing Hadoop furniture rather than offer something new. Alan Gates, co-founder of Hortonworks, today revealed details of the Stinger Initiative, a plan to make Apache Hive “up to 100 times faster” within their Hadoop distribution.
Hortonworks’ strategy for boosting Hive, Hadoop’s data warehousing project, has several strands. The first is to tune Hive to focus more on SQL-like queries, or, as Gates puts it, to make it “a more suitable tool for the decision support queries people want to perform on Hadoop”. Separately, Gates said that changes to Hive’s execution engine will cut query times enough for the tool to “answer human-time use cases.”
Alongside heavy tinkering with Hive’s existing structure, Hortonworks have also announced Tez, a latency-reducing runtime framework for processing “complex” data tasks. Submitted as a proposal to the Apache Incubator yesterday, Tez also works natively with YARN, the MapReduce overhaul set to be the centrepiece of Hadoop 2.0.
Gates also explained that introducing a new columnar file format, called ORCFile, to the community would modernise Hive and make it store data more efficiently. The company realise that collaborating with heavy enterprise users of Hadoop, such as Facebook, is key to getting the format adopted throughout the community.
“At Hortonworks, we believe in the power of the open source community to innovate faster than any proprietary offering,” explained Gates, adding that the “initiative is proof of this once again as we collaborate with others to improve Hive performance.”
A full preview of the Stinger Initiative is expected at Hadoop Summit Amsterdam in March. Hortonworks still believe that the answer to Hadoop’s real-time problem lies in projects that are already established, rather than in introducing new blood to a packed ecosystem. If it proves possible to renovate and improve the tools already present in the enterprise environment, rather than thrusting new ones onto developers, then Hortonworks’ approach may ultimately pay off.
Image courtesy of Svadilfari