Destroy responsibly!

Turn failure into resilience with Gremlin Free – World’s first hosted chaos engineering service

Eirini-Eleni Papadopoulou
chaos engineering
Shutterstock / Albert Ziganshin

If you had the chance to test your system’s limits in order to make it more resilient before an unexpected failure cost you money, wouldn’t you do it? Gremlin Free, based on the principles of chaos engineering, is a service that offers just that!

Gremlin, created by former Amazon, Netflix, Google, and Dropbox engineers, is the first hosted chaos engineering service. The Gremlin team has now announced the availability of Gremlin Free, which gives DevOps teams an easy way to get started with chaos engineering.

For those of you who are not familiar with the principles of chaos engineering, an introduction is in order.

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Basically, you break your system on purpose to make it more resilient before an unexpected failure breaks it for you, thus costing you [even more] money!

So, now that we've had a quick overview of what chaos engineering is, let’s jump right in and have a look at what Gremlin Free has to offer!


How does it work?

Gremlin Free offers several different attacks that you can use to inject failure into your system. The main categories are:

  • Resource: Starve your application of critical resources
  • State: Change the state of the environment your application is running within
  • Network: Simulate the inherently unreliable behavior of the network
  • Request: Impact individual requests as they hit the wire

To make it a bit clearer, here’s what you can do with Gremlin Free:

Shutdown – The Shutdown attack allows you to shut down or reboot one or many hosts or containers. You can select a set of specific hosts or choose to impact a random number of tagged hosts, and you can also specify whether you’d like the attack to start right away or run on a schedule.
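
To make the idea of impacting a random number of tagged hosts more concrete, here is a minimal Python sketch of tag-based target selection. It is not Gremlin's implementation or API: the `pick_targets` function, the host inventory, and the tag names are all hypothetical and exist only for illustration.

```python
import random
from typing import Dict, List, Optional


def pick_targets(
    hosts: Dict[str, List[str]],
    tag: str,
    count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return up to `count` randomly chosen host names that carry `tag`.

    `hosts` maps a host name to its list of tags (a hypothetical inventory).
    """
    rng = rng or random.Random()
    # Sort the candidates so a seeded RNG gives reproducible picks.
    tagged = sorted(name for name, tags in hosts.items() if tag in tags)
    # Never ask for more hosts than actually carry the tag.
    return rng.sample(tagged, min(count, len(tagged)))


# Hypothetical inventory: two web hosts and one database host.
inventory = {"web-1": ["web"], "web-2": ["web"], "db-1": ["db"]}
victims = pick_targets(inventory, tag="web", count=1)
```

Passing a seeded `random.Random` makes an experiment repeatable, which helps when you want to rerun the same scenario after a fix.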

Consume CPU resources – With this function, you can find out how your autoscaled instances behave when their CPUs are consumed. You target the instances, select how many cores to consume and for how long, and using your favorite monitoring tool, watch the CPU consumption increase along with the number of instances in place to handle your traffic.
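
The two knobs of the CPU attack, how many cores and for how long, can be illustrated with a small Python sketch. This is a stand-in under stated assumptions, not how Gremlin does it: the hypothetical `burn_cpu` helper simply starts one busy-looping child process per requested core and leaves the actual core placement to the operating system's scheduler.

```python
import subprocess
import sys
from typing import List

# Program run by each child: spin in a tight loop until the deadline passes.
_BURN_SNIPPET = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < deadline:\n"
    "    pass\n"
)


def burn_cpu(cores: int, seconds: float) -> List[int]:
    """Keep `cores` worker processes busy for roughly `seconds` seconds.

    Returns the exit code of every worker (0 means it finished cleanly).
    """
    if cores < 1 or seconds <= 0:
        raise ValueError("cores must be >= 1 and seconds must be > 0")
    # One child interpreter per requested core; the OS decides where each runs.
    workers = [
        subprocess.Popen([sys.executable, "-c", _BURN_SNIPPET, str(seconds)])
        for _ in range(cores)
    ]
    # Block until every worker has reached its deadline and exited.
    return [worker.wait() for worker in workers]
```

Calling `burn_cpu(cores=2, seconds=30)` on a test machine while watching your monitoring dashboard gives a rough feel for what such an attack looks like from the outside.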

Failure as a Service – Gremlin gives you full control at all times through an intuitive UI, CLI, and API. You can quickly halt and revert all attacks at the click of a button, returning your hosts to a healthy state. If the Gremlin client ever loses communication with Gremlin's control plane, all attacks will be halted and reverted.
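
The fail-safe behavior described here resembles a dead man's switch: attacks keep running only while the client keeps hearing from the control plane. The following Python sketch shows that general pattern under stated assumptions. The `Attack` and `DeadMansSwitch` classes and the timeout value are hypothetical and are not taken from Gremlin's client.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attack:
    """A running attack plus the callback that undoes its effect (hypothetical)."""
    name: str
    revert: Callable[[], None]
    active: bool = True

    def halt(self) -> None:
        # Revert at most once, even if halt() is called repeatedly.
        if self.active:
            self.revert()
            self.active = False


class DeadMansSwitch:
    """Halts every registered attack when heartbeats stop arriving."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat = 0.0
        self.attacks: List[Attack] = []

    def register(self, attack: Attack) -> None:
        self.attacks.append(attack)

    def heartbeat(self, now: float) -> None:
        # Called whenever the control plane was reached successfully.
        self.last_heartbeat = now

    def halt_all(self) -> None:
        for attack in self.attacks:
            attack.halt()

    def check(self, now: float) -> bool:
        """Halt everything if the last heartbeat is too old; return True if so."""
        if now - self.last_heartbeat > self.timeout_s:
            self.halt_all()
            return True
        return False


# Usage with explicit timestamps so the behavior is easy to follow.
switch = DeadMansSwitch(timeout_s=10.0)
switch.register(Attack("cpu", revert=lambda: print("cpu attack reverted")))
switch.heartbeat(now=0.0)
switch.check(now=5.0)   # heartbeat is fresh, the attack keeps running
switch.check(now=11.0)  # heartbeat is stale, the attack is halted and reverted
```

Taking the current time as a parameter instead of reading the clock inside the class keeps the sketch deterministic and easy to test.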

Head over to the official documentation to find out more about this tool and how it can help you build more resilient systems.


Getting started

If you can’t wait to let the gremlins wreak havoc in your system, you can find Gremlin Free available for Ubuntu 16.04, CentOS 7, Docker Container, and Kubernetes.

Destroy responsibly!

Eirini-Eleni Papadopoulou was the editor for Coming from an academic background in East Asian Studies, she decided that it was time to go back to her high-school hobby that was computer science and she dived into the development world. Other hobbies include esports and League of Legends, although she never managed to escape elo hell (yet), and she is a guest writer/analyst for competitive LoL at TGH.
