Apache Cassandra leads the way in latest NoSQL benchmark
It’s the battle of the databases – Apache Cassandra comes out on top in both throughput and latency tests. Other popular NoSQL names like HBase, Couchbase and MongoDB all faltered somewhere along the line.
E-commerce consultancy firm End Point have published the results of a series of performance tests on NoSQL databases using the Yahoo! Cloud Serving Benchmark (YCSB), with Apache Cassandra coming out on top over the likes of HBase, Couchbase and MongoDB.
The report shows that Apache Cassandra performed significantly better than Couchbase 3.0, MongoDB 3.0 (WiredTiger included), and HBase 0.98 in throughput and latency.
Passing the test
The tests ran on Amazon Web Services EC2 instances using the i2.xlarge class of instances for the database nodes. Client nodes were the c3.xlarge class of instances to drive the test activity. Each test was performed three times on three different days, in order to minimise the effect of AWS CPU and I/O variability.
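The article does not say how End Point combined the three runs of each test. As a minimal sketch, assuming the runs are simply averaged and their spread checked, repeated measurements could be summarised like this. The function name and the throughput figures are hypothetical and are not taken from the report.

```python
from statistics import mean, stdev


def summarise_runs(throughputs):
    """Return (mean, coefficient of variation) for repeated benchmark runs."""
    avg = mean(throughputs)
    # Sample standard deviation relative to the mean; 0.0 for a single run.
    spread = stdev(throughputs) / avg if len(throughputs) > 1 else 0.0
    return avg, spread


# Hypothetical ops/sec figures for one workload, measured on three different days.
runs = [10000.0, 11000.0, 12000.0]
avg, cv = summarise_runs(runs)
print(f"mean throughput: {avg:.0f} ops/sec, run-to-run variation: {cv:.1%}")
```

A large variation between days would suggest that cloud noise, rather than the database itself, is driving the numbers.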
End Point focused the tests on workloads, data volumes, and conditions most commonly found in production environments. They scaled each database from 1 to 32 nodes for a variety of tests that included load, insert-heavy, read-intensive, analytic and other typical transactional workloads.
Databases began empty, with instances using Ubuntu 14.04 LTS AMI in HVM virtualisation mode, customised with Java 7 and the software required for each database. A script was used to drive the benchmark process, which included management of the start up, configuration and termination of instances, plus the commands required to run the tests.
New instances were used for each test, to reduce the impact of any “lame instance” or “noisy neighbour” effect on one test versus another.
All hail Cassandra
Jon Jensen, CTO of End Point, reported that Apache Cassandra was clearly the top performer throughout the study. Cassandra was the only database that performed durable write operations during the load process: Couchbase, HBase, and MongoDB all had to be configured for non-durable writes to complete in a reasonable amount of time.
Couchbase had to be eliminated from mixed operational and analytical workload tests because it didn’t support scan operations, producing the error: “Range scan is not supported”.
With regard to latency, Cassandra once again outperformed the bunch, exhibiting the lowest and most consistent latency numbers for each test (with lower being better). When the report turned to database idiosyncrasies and the difficulties encountered, Cassandra wasn’t even mentioned, with the following listed for Cassie-competitors:
- Couchbase was difficult to compile from source
- Couchbase doesn’t actually support a scan operation
- Couchbase uses more memory per thread and database connection
- Due to storage of data in discrete chunks, marked with start and end key values, HBase and MongoDB made it difficult to evenly distribute data among their shards
- HBase encountered problems at higher node counts without any on-disk compression, which was disabled for all tests
- MongoDB seemed unable to efficiently handle range scans in a sharded configuration
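To illustrate the data distribution point in the list above, here is a toy sketch. It is not how HBase or MongoDB are implemented. It compares placing keys into chunks bounded by start and end key values with placing them by a hash of the key. The split points, shard count and key format are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 4

# Hypothetical chunk boundaries: shard 0 holds keys below "g", shard 1 keys
# below "n", shard 2 keys below "t", and shard 3 everything else.
SPLIT_POINTS = ["g", "n", "t"]


def range_shard(key):
    """Pick the shard whose key range contains the key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_shard(key):
    """Pick a shard from a hash of the key, ignoring key order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Hypothetical keys sharing a common prefix.
keys = [f"user{i:06d}" for i in range(10000)]

range_counts = Counter(range_shard(k) for k in keys)
hash_counts = Counter(hash_shard(k) for k in keys)

print("range-based:", dict(range_counts))
print("hash-based: ", dict(hash_counts))
```

With these assumed boundaries every key sorts after "t", so the range-based scheme puts all the data on one shard until the chunks are split and rebalanced, while the hash-based scheme spreads the same keys roughly evenly.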
Although Cassandra was found to be the best performing NoSQL database in both throughput and latency in the scale-out tests that were run, Jensen also said that anyone assessing a database’s performance should “test the engine under specific use case and deployment conditions intended for a particular production application”.
The above tests are just another notch to add to Cassandra’s growing belt, with Dice.com’s recent tech salary survey revealing Cassandra comes in at number 2 in the top 10 highest paying tech skills. Developers with Cassandra knowledge and experience can find themselves earning up to $128,646 USD per annum.