Don’t get hysterical

Netflix open sources cloud performance library Hystrix

Elliot Bentley

The latest showing in the Netflix OSS cinema is designed to improve tolerance of both latency and faults.

As well as dominating the online video market, Netflix are known for their early adoption of AWS and daring approaches to cloud computing.

Luckily for us, they also generously open source much of their most innovative technology straight onto GitHub.

The latest release is Hystrix, a library designed to improve latency tolerance and fault tolerance by isolating points of access to remote systems and services. Written in Java, it aims to stop cascading failures, to fail fast and recover rapidly, and to enable real-time monitoring. The name is derived from a genus of porcupine, hence the cute, spiny logo.
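To make "isolating points of access" and "fail fast" a little more concrete, here is a minimal sketch in plain Java. It does not use Hystrix's API: the `IsolatedCall` class, its `execute` method, the pool size and the timeout values are all hypothetical, chosen only to illustrate running a dependency call on its own small thread pool, giving up after a timeout and returning a fallback value instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; this is NOT Hystrix's API.
class IsolatedCall {
    // A small dedicated pool, so a slow dependency cannot tie up the caller's own threads.
    // Daemon threads ensure a hung call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "dependency-pool");
        t.setDaemon(true);
        return t;
    });

    // Runs the call on the dedicated pool; on timeout or failure, returns the fallback.
    static <T> T execute(Callable<T> call, long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            }
            future.cancel(true); // stop waiting on the latent call
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        // A healthy dependency answers well within the timeout.
        String fast = execute(() -> "recommendations", 200, () -> "fallback");
        // A latent dependency is abandoned after 200 ms and the fallback is used.
        String slow = execute(() -> {
            Thread.sleep(2_000);
            return "recommendations";
        }, 200, () -> "fallback");
        System.out.println(fast);
        System.out.println(slow);
    }
}
```

The point of the design is that the caller waits at most the timeout it chose, regardless of how slowly the backend responds.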

The impressive documentation on the wiki explains some of Hystrix’s technical details, and why fault tolerance is so crucial to their infrastructure. While a single user request will merely be blocked if one backend system becomes latent, many such requests at once can bring down the entire system:

Worse than failures, these applications can also result in increased latencies between services which backs up queues, threads and other system resources causing even more cascading failures across the system.
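The backing-up of threads described in that passage can be illustrated with a bounded thread pool that rejects work immediately once it is saturated, rather than letting requests pile up behind a latent dependency. This is a sketch in plain Java, not Hystrix's implementation: the `BoundedPoolDemo` class and its `countRejected` method are hypothetical names, and the latent backend is simulated with a latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo for illustration only; this is NOT Hystrix's implementation.
class BoundedPoolDemo {
    // Sends `requests` calls to a simulated latent dependency guarded by a pool of
    // `poolSize` threads, and returns how many calls were rejected immediately.
    static int countRejected(int poolSize, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),               // no queueing: hand off to a thread or fail
                new ThreadPoolExecutor.AbortPolicy());  // reject when every thread is busy
        CountDownLatch release = new CountDownLatch(1); // stands in for the latent backend
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // block until the "backend" finally responds
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // failed fast instead of waiting behind blocked calls
            }
        }
        release.countDown(); // let the blocked calls finish
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(countRejected(3, 10) + " of 10 requests were rejected immediately");
    }
}
```

With three threads and ten requests, the first three calls occupy the pool and the remaining seven are turned away at once, so the damage from one slow dependency stays confined to its own pool.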

Netflix’s approach is that, in distributed networks as complex as their own, failure is inevitable and should be prepared for accordingly. Chaos Monkey, another project open-sourced over the summer, randomly terminates instances running in both production and testing environments in order to test the tolerance of Netflix’s system.

Not yet open sourced (but “coming soon”) is an integrated dashboard for real-time Hystrix monitoring, seen below.

Hystrix has also been added to the Netflix Open Source Center, which represents each project with a generic poster. For Hystrix, it’s a grim-looking ‘Thriller’: perhaps a reflection of the rough conditions the library is designed to help endure. 
