Why SimpleGeo and others are jumping onto the NoSQL bandwagon.
I'd like to ask a few questions to the people who honestly think RDBMSs can compete with NoSQL solutions at large scale. – Joe Stump.
Joe Stump, CTO and co-founder of SimpleGeo and former Lead Architect for Digg, has published a blog post defending a number of high-profile organisations that have recently switched from MySQL to NoSQL storage systems.
He cites the parentage of the NoSQL databases Cassandra, BigTable and Dynamo as evidence that, in some instances, NoSQL databases really are necessary. Cassandra was created by Facebook, BigTable by Google and Dynamo by Amazon. He questions whether these organisations would have gone to the effort of creating and implementing these tools if continuing with an RDBMS had been a viable option.
Stump then moves on to what he perceives as MySQL’s shortcomings. The first is the sticky matter of highly heterogeneous datasets with varying indexes. Digg has a large number of these tables, which causes disk I/O problems: the indexes for different tables are stored on different parts of the disks, so concurrent reads and writes force the drives to seek back and forth between them. However, he acknowledges there are workarounds:
“I know that people have found ways around this, such as 37Signals systems guy putting 15 x 15k RPM drives on his DB server. Assuming $500 a disk (15k disks range from $300 to $800 on Newegg) that’s $7,500 just for disks.” Joe Stump.
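The quote assumes $500 per disk but also gives a $300 to $800 price range. A quick back-of-envelope sketch (the function name is illustrative; all figures come from the quote) shows how wide the resulting spread is for the 15-drive array:

```python
# Back-of-envelope cost of the 15-drive array mentioned in the quote.
# Disk count, the $500 assumption and the $300-$800 range all come from the quote.

NUM_DISKS = 15


def disk_array_cost(num_disks: int, price_per_disk: int) -> int:
    """Total cost in dollars of buying num_disks drives at a single unit price."""
    return num_disks * price_per_disk


if __name__ == "__main__":
    low = disk_array_cost(NUM_DISKS, 300)
    assumed = disk_array_cost(NUM_DISKS, 500)
    high = disk_array_cost(NUM_DISKS, 800)
    print(f"low: ${low:,}  assumed: ${assumed:,}  high: ${high:,}")
```

At the ends of the quoted range, the same array costs between $4,500 and $12,000 for the disks alone.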
This leads to another drawback Stump sees in MySQL: price. To get an RDBMS to scale to a few thousand reads/writes per second, he argues, you need expensive MS SQL servers with SSDs, and few startups can afford to spend that much on a single server. Stump then factors in the theoretical cost of the data centres needed to support RDBMSs and once again arrives at a hefty sum.
The difficulty of adding a new server to an RDBMS cluster also comes under fire in Stump’s blog post. “If we identify a hot spot in our Cassandra cluster, we can have a new node bootstrapped into our cluster in about five minutes,” he claims, also noting NoSQL’s ability to automatically rebalance the entire cluster when a new node is bootstrapped into it.
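Part of why a node can join so quickly is that Cassandra, like Dynamo, partitions data around a token ring using consistent hashing: a new node takes over only one slice of the ring, and every other key stays where it is. The sketch below is a generic illustration of that idea, not Cassandra's code; the `Ring` class, the `token` function, MD5 as the hash and the 2**32 ring size are all assumptions chosen for brevity.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # illustrative ring size; real systems use a much larger token space


def token(name: str) -> int:
    """Hash a node name or row key to a position on the ring (MD5, chosen for illustration)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % RING_SIZE


class Ring:
    """Minimal consistent-hash ring: a key is owned by the first node token
    at or after the key's token, wrapping around to the start of the ring."""

    def __init__(self, nodes):
        self._tokens = []  # node tokens, kept sorted
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """'Bootstrap' a node: it takes over only the arc between its
        predecessor's token and its own token."""
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def owner(self, key: str) -> str:
        t = token(key)
        i = bisect.bisect_left(self._tokens, t) % len(self._tokens)
        return self._owner[self._tokens[i]]


if __name__ == "__main__":
    ring = Ring([f"node-{i}" for i in range(1, 5)])
    keys = [f"key-{i}" for i in range(10_000)]
    before = {k: ring.owner(k) for k in keys}

    ring.add_node("node-5")
    moved = [k for k in keys if ring.owner(k) != before[k]]

    # Every key that changed owner moved to the new node; all other keys stayed put.
    assert all(ring.owner(k) == "node-5" for k in moved)
    print(f"{len(moved)} of {len(keys)} keys moved, all to node-5")
```

A real Cassandra bootstrap also streams the affected data to the new node and keeps several replicas of each range; this sketch only shows why the ownership change is confined to one part of the ring.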
However, he concludes that much of NoSQL’s appeal lies not in its additional features but in its lower cost. Stump isn’t shy about sharing exact figures, revealing that he runs a 50-node cluster across three data centres on Amazon’s EC2 service for around $10,000 a month.
“I’m happy to put my $/write, $/read, and $/GB numbers for my NoSQL setup against anyone’s RDBMS numbers,” he says.
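The metrics Stump proposes are simple ratios of monthly spend to work done. A minimal sketch follows, assuming hypothetical `MonthlyBill` and `MonthlyWorkload` records; only the $10,000 bill and the 50-node count come from the article, which gives no read, write or storage totals.

```python
from dataclasses import dataclass


@dataclass
class MonthlyBill:
    cost_usd: float   # total monthly spend
    nodes: int        # number of nodes in the cluster


@dataclass
class MonthlyWorkload:
    reads: int        # reads served in the month
    writes: int       # writes served in the month
    stored_gb: float  # data stored, in GB


def cost_per_node(bill: MonthlyBill) -> float:
    return bill.cost_usd / bill.nodes


def unit_costs(bill: MonthlyBill, load: MonthlyWorkload) -> dict:
    # Each ratio charges the whole bill to one dimension,
    # so the three figures are not additive.
    return {
        "$/read": bill.cost_usd / load.reads,
        "$/write": bill.cost_usd / load.writes,
        "$/GB": bill.cost_usd / load.stored_gb,
    }


if __name__ == "__main__":
    # Only the bill and node count come from the article.
    stump = MonthlyBill(cost_usd=10_000, nodes=50)
    print(f"${cost_per_node(stump):,.0f} per node per month")
```

With the article's figures this works out to about $200 per node per month; comparing $/read, $/write and $/GB against an RDBMS setup would require plugging in real workload totals for both systems.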