Don’t get hysterical

Netflix open sources cloud performance library Hystrix

Elliot Bentley

The latest showing in the Netflix OSS cinema is designed to improve tolerance of both latency and faults.

As well as dominating the online video market, Netflix are known for their
early adoption of AWS and daring approaches to cloud computing.

Luckily for us, they also generously open source much of their
most innovative technology straight onto GitHub.

The latest release is Hystrix, a library
designed to improve latency tolerance and fault tolerance by
isolating points of access. Written in Java, it aims to stop
cascading failures, fail fast and recover rapidly, and enable
real-time monitoring. The name is derived from a genus of porcupine,
hence the cute, spiny logo.
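
To make the idea of isolating a point of access concrete, here is a minimal sketch written against the standard java.util.concurrent classes only. It is not Hystrix's API or implementation: the class name IsolatedCall, its constructor parameters and the fallback argument are all hypothetical, chosen purely for illustration. It assumes that each dependency gets its own small thread pool, that callers wait only for a fixed timeout, and that a fallback value is returned when the call is slow or fails.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Hystrix.
public class IsolatedCall {
    // A small dedicated pool: a slow dependency can occupy at most these threads.
    // Note: this simple pool queues extra work without bound; a stricter
    // version would reject calls once the pool is saturated.
    private final ExecutorService pool;
    private final long timeoutMillis;

    public IsolatedCall(int poolSize, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(poolSize, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // do not keep the JVM alive for stuck calls
            return t;
        });
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the dependency call on the isolated pool. Returns the fallback
    // if the call throws or does not finish within the timeout (fail fast).
    public <T> T execute(Callable<T> dependency, T fallback) {
        Future<T> future = pool.submit(dependency);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            future.cancel(true);
            return fallback;
        } catch (Exception e) { // TimeoutException or ExecutionException
            future.cancel(true); // stop waiting on the latent call
            return fallback;
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        IsolatedCall call = new IsolatedCall(2, 500);
        // A healthy dependency answers within the timeout.
        System.out.println(call.execute(() -> "live response", "fallback"));
        // A latent dependency is abandoned after 500 ms.
        System.out.println(call.execute(() -> {
            Thread.sleep(5_000);
            return "too late";
        }, "fallback"));
        call.shutdown();
    }
}
```

The design choice this illustrates is that the caller's thread is never held longer than the timeout, so one latent backend cannot absorb every request thread in the application.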

The impressive documentation on the wiki explains
some of Hystrix’s technical details, and why fault tolerance is so
crucial to Netflix’s infrastructure. While a single latent backend
system will block an individual user request, a high volume of
such requests can bring down the entire system:

Worse than failures, these applications can also result in
increased latencies between services which backs up queues, threads
and other system resources causing even more cascading failures
across the system.

Netflix’s approach is that, in distributed networks as complex
as their own, failure is inevitable and should be prepared for
accordingly. Chaos Monkey, another project open-sourced over the summer,
terminates processes running in both production and testing
environments in order to test the tolerance of Netflix’s
system.

Not yet open sourced (but “coming soon”) is an integrated dashboard
for real-time Hystrix monitoring, seen below.


Hystrix has also been added to the Netflix Open Source Center,
which represents each project with a generic poster. For Hystrix,
it’s a grim-looking ‘Thriller’: perhaps a reflection of the rough
conditions the library is designed to help endure. 
