Embracing the Big Data revolution
Netflix pushing the NoSQL agenda with Cassandra tools Astyanax and Priam
We've already touted this year as the one in which enterprises finally embrace Big Data and NoSQL solutions for managing reams and reams of their data. As Hadoop and Cassandra both matured towards the end of last year, many large corporations put their weight behind the NoSQL revolution, and many appear to be giving a little back to the community to further the Apache projects.
As one of the world's biggest video streaming hubs, Netflix have had to cope with a rapidly growing user base and an expanding catalogue, and they've been pretty vocal about moving their cloud-based infrastructure to NoSQL options such as Amazon SimpleDB, Hadoop/HBase and Cassandra. With the company expanding rapidly, and seemingly taking on the world after launching a UK version in January, they needed a scalable, low-latency and robust system to keep things running as smoothly as possible and stop customers jumping ship to a rival. Their site chronicles last year's 'open source journey' (as they put it), and is well worth a read.
Now they've announced, through their tech blog, two open source projects to make life easier for those moving to a dynamic database that scales horizontally. The Netflix team must have spent an afternoon flicking through a book of Greek mythology, as they've dubbed the two tools Astyanax and Priam. The former is a Java client for Cassandra with an improved API and connection management; the latter, open sourced last week, is a set of tools for managing configuration.
Priam runs alongside Cassandra on each node, extending Cassandra's functionality with 'a dependable backup and recovery process'. It takes daily snapshots and incremental backups of all of Netflix's clusters and stores them in S3. This is a must for anyone deploying to the cloud, where a single mistake could be critical. Priam can also restore data, whether to a complete ring or to part of one, and it can restore data into a test environment.
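The announcement doesn't spell out how Priam decides which files to pull back from S3, but the general rule for combining full snapshots with incremental backups is simple: start from the latest snapshot taken at or before the target time, then replay every incremental taken after that snapshot, up to the target. The Java sketch below illustrates that generic rule only; `RestorePlanner`, `BackupFile`, `Kind` and `plan` are hypothetical names, not Priam's API, and the in-memory list stands in for an S3 listing.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RestorePlanner {
    // Hypothetical backup types: a full snapshot or an incremental since an earlier point.
    enum Kind { SNAPSHOT, INCREMENTAL }

    // Hypothetical stand-in for one backup object stored in S3.
    record BackupFile(String key, Kind kind, Instant takenAt) {}

    // Chooses the files needed to restore to targetTime: the newest snapshot at or before
    // the target, followed by the incrementals taken after it (up to the target), oldest first.
    static List<BackupFile> plan(List<BackupFile> files, Instant targetTime) {
        BackupFile base = null;
        for (BackupFile f : files) {
            boolean usable = f.kind() == Kind.SNAPSHOT && !f.takenAt().isAfter(targetTime);
            if (usable && (base == null || f.takenAt().isAfter(base.takenAt()))) {
                base = f;
            }
        }
        if (base == null) {
            throw new IllegalStateException("no snapshot at or before " + targetTime);
        }
        List<BackupFile> incrementals = new ArrayList<>();
        for (BackupFile f : files) {
            if (f.kind() == Kind.INCREMENTAL
                    && f.takenAt().isAfter(base.takenAt())
                    && !f.takenAt().isAfter(targetTime)) {
                incrementals.add(f);
            }
        }
        incrementals.sort(Comparator.comparing(BackupFile::takenAt));
        List<BackupFile> result = new ArrayList<>();
        result.add(base);
        result.addAll(incrementals);
        return result;
    }

    public static void main(String[] args) {
        List<BackupFile> files = List.of(
                new BackupFile("snap-01", Kind.SNAPSHOT, Instant.parse("2012-02-01T00:00:00Z")),
                new BackupFile("inc-a", Kind.INCREMENTAL, Instant.parse("2012-02-01T06:00:00Z")),
                new BackupFile("snap-02", Kind.SNAPSHOT, Instant.parse("2012-02-02T00:00:00Z")),
                new BackupFile("inc-b", Kind.INCREMENTAL, Instant.parse("2012-02-02T06:00:00Z")),
                new BackupFile("inc-c", Kind.INCREMENTAL, Instant.parse("2012-02-02T18:00:00Z")));
        // Restore to midday on 2 February: snap-02 plus inc-b.
        for (BackupFile f : plan(files, Instant.parse("2012-02-02T12:00:00Z"))) {
            System.out.println(f.key());
        }
    }
}
```

Older snapshots and their incrementals are ignored because the newer snapshot already contains that data, which is why daily snapshots keep the replay chain short.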
Priam also stands out for its support of multi-region clusters (a must for Netflix, which has spread across multiple AWS regions in record time), which it achieves by interlacing token assignments between regions. Most intriguing is Priam's REST API, which, as the team state, was built with the goal of managing multiple clusters. It does this by providing 'hooks that support external monitoring and automation scripts. They provide the ability to backup, restore a set of nodes manually and provide insights into Cassandra's ring information. They also expose key Cassandra JMX commands such as repair and refresh.'
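To make 'interlacing tokens between regions' more concrete, here is a simplified Java sketch, not Priam's actual code. It assumes Cassandra's RandomPartitioner token space of 0 to 2^127, spaces each region's nodes evenly around the ring, and shifts every region by a small, hypothetical offset (here just the region's index) so that tokens from different regions alternate without colliding. `TokenInterlacing`, `token` and `ring` are illustrative names.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenInterlacing {
    // Assumed RandomPartitioner token space: [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // Token for the node at `position` (0-based) in a region of `nodesPerRegion` nodes.
    // Nodes within a region are evenly spaced; the region index is added as a tiny offset
    // (hypothetical choice) so regions interlace. Assumes regionIndex is far smaller than the spacing.
    static BigInteger token(int nodesPerRegion, int position, int regionIndex) {
        BigInteger spacing = RING_SIZE.divide(BigInteger.valueOf(nodesPerRegion));
        return spacing.multiply(BigInteger.valueOf(position)).add(BigInteger.valueOf(regionIndex));
    }

    // Builds the whole ring as token -> "region:position", kept sorted by token.
    static TreeMap<BigInteger, String> ring(List<String> regions, int nodesPerRegion) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (int r = 0; r < regions.size(); r++) {
            for (int p = 0; p < nodesPerRegion; p++) {
                ring.put(token(nodesPerRegion, p, r), regions.get(r) + ":" + p);
            }
        }
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, String> ring = ring(List.of("us-east-1", "eu-west-1"), 3);
        // Walking the ring in token order alternates between the two regions.
        for (Map.Entry<BigInteger, String> e : ring.entrySet()) {
            System.out.println(e.getValue() + " -> " + e.getKey());
        }
    }
}
```

Because each region on its own still divides the ring evenly, every region holds a balanced share of the data, while the small offset keeps tokens unique across the combined multi-region ring.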
Netflix's Cassandra stats are impressive: they run 57 Cassandra clusters, and Priam backs up tens of terabytes of data to S3 every day. Even more impressive, nodes are replaced automatically almost every day. It looks like Netflix have offered up a gem of a NoSQL toolkit for the enterprise. We're just glad that everyone is pushing the capabilities of Cassandra further. What will we see next?