How big is your data?
MongoDB mocked after posting “100GB Scaling Checklist”
NoSQL sceptics have had a field day with a blog post from MongoDB (the company previously known as 10gen) warning of the difficulties of scaling the document-oriented database up to a measly 100GB.
Despite being pulled – and later reinstated with a new introduction – the blog post has been held up as undermining MongoDB’s promises of being able to handle “big data”.
It started with the best of intentions: to tie in with a recorded webinar by Chris Winslet of MongoHQ (a company independent of MongoDB), MongoDB posted a checklist on its blog for users of the NoSQL database with moderately large quantities of data.
“Surpassing 100GB of data in your application requires you to have in-depth knowledge of how to operate and run MongoDB,” read the opening sentence. “MongoHQ recommends going through the 100GB Scaling Checklist as you grow.” In slides for the accompanying webinar, Winslet wrote that “100GB is relatively big data”.
This was leapt upon by critics of the database, who saw it as an admission that it’s far from suited to dealing with genuinely large datasets. Gwen Shapira, a DBA at Cloudera, quipped on Twitter: “#mongoDB: the big data platform that is challenging to scale over 100GB”.
For reasons unclear to JAXenter, the original blog post (preserved by Google’s cache) was taken down, only to be reinstated several hours later. The reinstated post came with a revised introduction, emphasising that “most systems” require specialised knowledge when scaling to large sizes, and a new title that omitted the 100GB figure. The checklist itself remained the same.
That didn’t stop MongoDB’s critics from piling on, however. Markus Winand used it as a key example in his own blog post, titled “MongoDB is to NoSQL like MySQL to SQL — in the most harmful way”, which reached the front page of Hacker News. The fact that MongoDB’s creators felt the need to publish the guide, he wrote, “gives me the impression that scaling MongoDB to that size is a serious issue”.
Whether this is true or not is a matter of great contention – after all, big data is about more than just sheer volume. But if this event proves anything, it’s that even one of the most popular NoSQL databases has yet to gain developers’ trust. And that, given the opportunity, internet commentators will tear into any slip-up.