MapR release HBase-heavy Hadoop distro, M7
Every Hadoop vendor wants to be different. MapR opt for a NoSQL approach to their special Hadoop sauce
Not wanting Cloudera to steal the limelight with Impala, MapR have unleashed their latest distribution, with yet another take on how best to push Hadoop on.
MapR were one of the first Hadoop vendors on the scene, making their name with their advanced flavour of Hadoop. But now it appears that offering non-relational capabilities is at the centre of MapR’s minds, with the crosshairs firmly fixed on HBase, the NoSQL distributed database.
The company first outlined plans to speed up the HBase layer of Hadoop back in October in their whitepaper for the M7 distribution. With research telling them that 45% of Hadoop shops [PDF] are using HBase in production, it seems like a natural choice for MapR to fine-tune the NoSQL layer.
The top-end M7 edition aligns HBase with MapR’s closed-source version of the Hadoop filesystem, so they share a single data layer. M7 splits up HBase database tables and stores them within the MapR filesystem, which the company say gives a huge boost in performance. They boldly boast that M7 delivers a rapid “one million operations/sec with a ten node cluster”, supports one trillion tables in one cluster and “ensures 99.999% availability for HBase and Hadoop applications”.
M7 is the top-end distribution offered by MapR, and is primarily targeted at heavy NoSQL enterprise users. M3 is the free community version of MapR’s Hadoop stack, while M5 is the half-way house which opens up extra features such as mirroring and snapshotting within the company’s filesystem, with additional tech support.
The competition is heating up as vendors look for new ways to distinguish themselves from the pack. Parallel to rolling out the release, MapR have announced the inclusion of a search engine within their distribution, in conjunction with Lucidworks, the company behind Apache Lucene and Solr. Currently in private beta, the search feature lets users index and search standard files without needing to perform conversion or transformation, as well as clone and snapshot files within the filesystem.
There’s no room for Apache Drill, the Google Dremel-mimicking system for analysing large datasets, within M7 just yet. The project, led by MapR, is still under heavy development in the Apache Incubator.