Hortonworks speed up Hive by 50x in first phase of Stinger Initiative
The latest version of Hortonworks’ main Hadoop data platform suggests the company is on track for 100x faster SQL querying
While the competition has opted to push shiny new SQL engines for Hadoop, Hortonworks has chosen to refurbish a piece of the established furniture through its Stinger Initiative.
Yesterday, the company released Hortonworks Data Platform 1.3, the latest version of its flagship product, with a heavy focus on boosting the performance of the data warehouse Apache Hive, which is seen as the de facto standard for SQL access in Hadoop.
The company claims that, with the help of the Apache Hive community, the latest distribution improves querying speed by up to 50x thanks to the addition of a new columnar file format, ORC File. It also broadens the range of SQL semantics offered in Hadoop.
While a 50x improvement shouldn’t be sniffed at, it’s only the start of Hortonworks’ Stinger plans. This first phase represents a large portion of the work towards the ambitious 100x improvement proclaimed back in February. With it out of the way, attention now turns to Tez, a simplified data processing application framework that eliminates disk writes. Hortonworks see Tez as the gateway to YARN, the next-generation MapReduce that is the cornerstone of Hadoop 2.0, which is scheduled to arrive this summer.
The third phase of the Stinger Initiative is to add a new vector query engine aimed at modern hardware architectures. However, Hortonworks VP of Products Bob Page told GigaOM this week that no target date has yet been set for this part.
Time might be of the essence for Hortonworks, though, as the enterprise need for speed ramps up. Rival Cloudera launched their interactive query engine, Impala, last month, offering a speedy alternative.
Hortonworks believe that the “vast majority of Hadoop deployments” use Hive for “proven and scalable SQL”, so sticking to what enterprises know should stand them in good stead when the Stinger Initiative fully comes to fruition. Yet should they dally, there’s nothing to say Cloudera and MapR (with Apache Drill) couldn’t swoop in with their own offerings.
Image courtesy of carolynconner