The road to YARN

Hortonworks speed up Hive by 50x in first phase of Stinger Initiative

Chris Mayer

The latest version of Hortonworks’ main Hadoop data platform suggests the company is on track for 100x faster SQL querying

Despite the competition opting to push shiny new SQL engines for Hadoop, Hortonworks chose to refurbish a piece of the established furniture with the Stinger Initiative.

Yesterday, the company released the latest version of its flagship product, Hortonworks Data Platform, with a heavy focus on boosting the performance of the Apache Hive data warehouse, which is seen as the de facto standard for SQL access in Hadoop.
The company claim that, with the help of the Apache Hive community, the latest distribution speeds up queries by up to 50x, thanks largely to a new columnar file format, ORC File (Optimized Row Columnar). The release also broadens the range of SQL semantics offered in Hadoop.

While a 50x improvement shouldn’t be sniffed at, it’s only the start of Hortonworks’ Stinger plans. The restructuring of Hive represents a large portion of the work towards the ambitious 100x improvement proclaimed back in February. With the first phase out of the way, attention now turns to Tez, a simplified data processing application framework that avoids writing intermediate results to disk between processing steps. Hortonworks see Tez as the gateway to YARN, the next-generation MapReduce that is the cornerstone of Hadoop 2.0, which is scheduled to arrive this summer.

The third phase of the Stinger Initiative is to add a new vectorized query engine designed to exploit modern hardware architectures. However, Hortonworks VP Bob Page told GigaOM that no target date had yet been set for this phase.
Time might be of the essence for Hortonworks, though, as the enterprise need for speed ramps up. Rival Cloudera launched their own speedy interactive query engine last month.

Hortonworks believe that the “vast majority of Hadoop deployments” use Hive for “proven and scalable SQL”, so sticking to what enterprises know should stand them in good stead when the Stinger Initiative fully comes to fruition. Yet should they dally, there’s nothing to stop Cloudera and MapR (with Apache Drill) from swooping in with their own offerings.

Image courtesy of carolynconner
