How NoSQL saved Draw Something
Draw Something's success aided by Couchbase - chat with CEO, Bob Wiederhold - Part 3
How does a developer cope with the pressures of scaling that much? Not to mention the pressure from the developer/user?
The social gaming business is a “hits” business. Every developer dreams of having a game that goes viral and gets into the “top 10”. Developers naturally focus most of their attention on the creative aspects of their game. But if they don’t also focus on scalability, all their hard work could be wasted when the game begins to go viral and falls over under heavy load. If app performance degrades due to scaling problems, or the game just crashes, users quickly get frustrated and move on to other games, and your opportunity for a hit game vanishes overnight. This nightmare is exactly what happened to EA’s Simpsons Tapped Out.
Once the game goes viral, though, the pressure to keep it up and running generally falls on the operations or DevOps people. There is enormous pressure to make sure the game stays up with high performance 24x7 without a second of downtime. There is no room for error, so every precaution is taken to make sure there isn’t a problem. During critical periods of growth, when you are trying to characterize the database workload of the game, the database is monitored 24 hours a day.
As you suggest, there is huge pressure and very little sleep for the first few weeks after a game goes viral. After that, things settle down: the database workload is well characterized, the ongoing growth of the game becomes more predictable, the pressure subsides, and a good night's sleep returns.
What sort of issues occurred with Draw Something? Anything particularly rare?
There was nothing rare. We worked through a few database configuration issues early on and we helped OMGPOP move to much beefier servers. In the first couple of weeks, an unusual number of servers failed and had to be failed over, and backup strategies had to be changed because of the rapid growth of data, among other things.
For a slow-growing app, you have plenty of time to deal with all these issues and it’s not a problem. For a game that goes viral, you are forced to deal with them in a very compressed period of time, and the stakes for getting things right are very high.
How many Couchbases were they employing? And how many were killed in the process?
The infographic shows how OMGPOP’s database cluster grew over time. When they contacted us, they had 6 servers in their cluster. It’s now well over 100 servers. I don’t know exactly how many servers had to be failed over, but I believe it was over a dozen.