Embracing the Big Data revolution

Netflix pushing the NoSQL agenda with Cassandra tools Astyanax and Priam

Chris Mayer

As Netflix becomes an all-consuming video streaming behemoth, they’ve been chronicling their transition to NoSQL to cope with the huge demand. Now they offer two Cassandra tools to the masses.

We’ve already touted this year as the year when enterprises finally embrace Big Data and NoSQL solutions for managing reams and reams of their data. As Hadoop and Cassandra both matured at the back end of last year, many large corporations have put their weight behind the NoSQL revolution, and many appear to be giving a little back to the community in order to further the Apache projects.

One of the world’s biggest video streaming hubs, Netflix have had to tackle the problem of coping with a rapidly increasing user base and an expanding catalogue, and they’ve been pretty vocal about transferring their cloud-based infrastructure to NoSQL options like Amazon SimpleDB, Hadoop/HBase and Cassandra. With the company growing rapidly, and seemingly taking on the world after launching a UK version in January, they needed a scalable, low-latency and robust system to make things run as smoothly as possible and to avoid customers jumping ship for a rival. Their site chronicles their ‘open source journey’ (as they put it) of last year, and is well worth a read.

Now they’ve announced, through their tech blog, two open source options for making life easier for those transferring to a dynamic database that scales horizontally. The Netflix team must have spent an afternoon flicking through a Greek mythology book, as they’ve dubbed the two tools Astyanax and Priam. The former is a Java Cassandra client with an improved API and connection management; the latter, open sourced last week, is a set of tools for managing configuration.

Priam runs alongside Cassandra on each node, bolstering Cassandra’s functionality with ‘a dependable backup and recovery process’. It takes daily snapshots and incremental backups of the data in all of Netflix’s clusters and backs them up to S3. This functionality is a must for anyone deploying to the cloud, where a single mistake could be critical. Priam can also restore data, supporting the restoration of a complete or partial ring. There’s also the ability to restore data into a test environment.
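To make the snapshot-plus-incremental idea concrete, here is a minimal Python sketch of how a restore plan might be chosen: take the most recent snapshot at or before the restore point, then every incremental backup taken after it. This is the general pattern only, not Priam’s actual code; the `Backup` record and the `files_to_restore` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    """Hypothetical record describing one backup object (e.g. stored in S3)."""
    taken_at: datetime
    kind: str  # either "snapshot" or "incremental"


def files_to_restore(backups, restore_point):
    """Return the latest snapshot at or before restore_point,
    followed by the incrementals taken after that snapshot.

    Illustrative logic only; not taken from Priam.
    """
    # Ignore anything newer than the point we want to restore to.
    eligible = sorted(
        (b for b in backups if b.taken_at <= restore_point),
        key=lambda b: b.taken_at,
    )
    snapshots = [b for b in eligible if b.kind == "snapshot"]
    if not snapshots:
        raise ValueError("no snapshot at or before the restore point")

    base = snapshots[-1]
    incrementals = [
        b for b in eligible
        if b.kind == "incremental" and b.taken_at > base.taken_at
    ]
    return [base] + incrementals
```

With a daily snapshot, a restore only ever has to replay less than a day of incrementals on top of the base snapshot, which keeps recovery time bounded.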

Priam is also unique in the way it supports multi-regional clusters (a must for Netflix, as they have spread across multiple AWS regions in record time) by interlacing token allocations between regions. Most intriguing is Priam’s REST API, which, as the team state, was designed to support managing multiple clusters. It does this by using ‘hooks that support external monitoring and automation scripts. They provide the ability to backup, restore a set of nodes manually and provide insights into Cassandra’s ring information. They also expose key Cassandra JMX commands such as repair and refresh.’
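The idea of interlacing tokens between regions can be sketched roughly as follows. Each region spaces its nodes evenly around the ring and is then shifted by a small per-region offset, so that neighbouring tokens alternate between regions. This is a simplified illustration, not Priam’s actual allocation code: the token space of 2**127 (Cassandra’s RandomPartitioner) and the use of the region’s list position as the offset are both assumptions, and the function names are hypothetical.

```python
# Assumed size of the token space (Cassandra's RandomPartitioner).
MAX_TOKEN = 2 ** 127


def region_tokens(nodes_per_region, region_offset):
    """Evenly spaced tokens for one region, shifted by a small offset.

    The offset scheme is hypothetical and chosen only for illustration.
    """
    step = MAX_TOKEN // nodes_per_region
    return [slot * step + region_offset for slot in range(nodes_per_region)]


def interlaced_ring(regions, nodes_per_region):
    """Build the combined ring as (token, region) pairs sorted by token."""
    ring = []
    for offset, name in enumerate(regions):
        for token in region_tokens(nodes_per_region, offset):
            ring.append((token, name))
    return sorted(ring)


if __name__ == "__main__":
    for token, region in interlaced_ring(["us-east-1", "eu-west-1"], 3):
        print(region, token)
```

Walking the sorted ring, the regions alternate, so each region on its own still covers the whole token range evenly and no two nodes ever share a token.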

Netflix’s Cassandra stats are impressive. They have 57 Cassandra clusters, and Priam backs up tens of TBs of data to S3 per day. Even more impressive is that nodes get replaced automatically, almost daily. It looks like Netflix have offered a gem of a NoSQL helper tool for the enterprise. We’re just glad that everyone is pushing the capabilities of Cassandra further. What will we see next?

