Why SimpleGeo and others are jumping onto the NoSQL bandwagon.
I’d like to ask a few questions to the people who honestly think RDBMSs can compete with NoSQL solutions at large scale. – Joe Stump.
Joe Stump, CTO and co-founder of SimpleGeo and former Lead
Architect for Digg, has posted a blog post defending a number of high-profile
organisations who have recently switched from MySQL to NoSQL.
He cites the parentage of the NoSQL databases Cassandra,
BigTable and Dynamo as evidence that, in some instances,
NoSQL databases really are necessary. Cassandra was created by
Facebook, BigTable by Google and Dynamo by Amazon. He questions
whether these organisations would have gone to the effort of
creating and implementing these tools if continuing with an RDBMS
was a viable option.
Stump then moves on to what he perceives as MySQL’s shortcomings.
The first of these is the sticky matter of highly heterogeneous
datasets with varying indexes. Digg has a large number of these
tables, which causes problems with disk input/output as the indexes
for different tables are stored on different parts of the disks.
This can become a bottleneck under concurrent reads/writes. However, he
acknowledges there are workarounds:
“I know that people have found ways around this, such as
37Signals systems guy putting 15 x 15k RPM drives on his DB server.
Assuming $500 a disk (15k disks range from $300 to $800 on Newegg)
that’s $7,500 just for disks.” Joe Stump.
Which leads us to another drawback of MySQL for Stump: price. In
order to get RDBMSs to scale up to a few thousand reads/writes per
second, he argues you need expensive MS SQL servers with SSD
drives, and few startups can afford to spend so much money on a
single server. Stump then factors in the theoretical cost of the
data centres needed to support RDBMSs, and once again reaches a
The difficulty of adding a new server to an RDBMS cluster is
another feature that comes under fire in Stump’s blog post. “If we
identify a hot spot in our Cassandra cluster, we can have a new
node bootstrapped into our cluster in about five minutes,” he
claims, with a side-nod to NoSQL’s ability to automatically
rebalance the entire cluster when a new node is bootstrapped into it.
However, he concludes that much of NoSQL’s appeal doesn’t lie in
its additional features, but in the lower costs of NoSQL solutions.
Stump isn’t shy about tossing exact figures out there, revealing that
he’s running a 50-node cluster, across three data centres, on
Amazon’s EC2 service for around $10,000 a month.
“I’m happy to put my $/write, $/read, and $/GB numbers for my
NoSQL setup against anyone’s RDBMS numbers,” he says.
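
For readers who want to make the same comparison with their own numbers, the per-unit metrics Stump mentions come down to simple division. The sketch below uses only the two figures stated above (about $10,000 a month for 50 nodes). The write rate and stored volume are hypothetical placeholders, not numbers from Stump's post, and the function names are invented for illustration.

```python
# Figures quoted in the article.
MONTHLY_COST_USD = 10_000
NODE_COUNT = 50

# Approximation: a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def cost_per_node(monthly_cost: float, nodes: int) -> float:
    """Average monthly cost of a single node in the cluster."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return monthly_cost / nodes


def cost_per_unit(monthly_cost: float, units_per_month: float) -> float:
    """Monthly cost divided by a monthly volume (writes, reads or GB)."""
    if units_per_month <= 0:
        raise ValueError("units_per_month must be positive")
    return monthly_cost / units_per_month


if __name__ == "__main__":
    # HYPOTHETICAL workload, chosen only to show the arithmetic.
    writes_per_second = 2_000
    stored_gb = 5_000

    writes_per_month = writes_per_second * SECONDS_PER_MONTH

    print(f"$/node/month: {cost_per_node(MONTHLY_COST_USD, NODE_COUNT):.2f}")
    print(f"$/write:      {cost_per_unit(MONTHLY_COST_USD, writes_per_month):.8f}")
    print(f"$/GB/month:   {cost_per_unit(MONTHLY_COST_USD, stored_gb):.2f}")
```

On the quoted figures, the cluster averages $200 per node per month. The other two results depend entirely on the placeholder workload, so substitute measured values before comparing them against an RDBMS setup.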