Out of the blocks
Cloudera’s real-time query engine Impala goes GA
Cloudera’s newest open source project, which promises to speed up SQL queries in Hadoop, has hit its first major release, six months after its first public airing.
Impala arrived back in October after two years of intensive development, billed as the first real-time interactive query engine for data stored in Hadoop. Since its unveiling, Cloudera has been fine-tuning the project through testing and has picked up high-profile users such as 37signals and Expedia. Partners Splunk and Pentaho already offer Impala integration as part of their platforms.
One of Hadoop’s major enterprise sticking points is how slowly it processes data: companies now want analysis faster than Hadoop itself can deliver it. Fast SQL querying, which makes the platform more appealing to SQL developers, is an important step towards closing that gap.
The battle lines were drawn in the second half of 2012, as other Hadoop vendors lifted the covers off projects, each with its own method of going ‘beyond batch’. In August, for example, MapR pushed the Google Dremel-inspired Drill to the Apache Incubator, a project said to provide a more analytical method of processing. Hortonworks only showed its hand this year, choosing to renovate what was already in Hadoop, namely Hive, through the Stinger Initiative.
Curiously, though, it is Cloudera that has taken the plunge first by making its “massively parallel processing” query engine generally available. Impala can query data stored in the standard Hadoop filesystem (HDFS), in the non-relational HBase, and in Parquet, the new columnar format Cloudera launched with Twitter last month. Impressively, a single copy of the data can be shared and reused across workloads, eliminating the need to migrate it.
With Cloudera laying down an early marker, will the competition be forced to pick up the pace?
Image courtesy of Ludovic