Turn failure into resilience with Gremlin Free – World’s first hosted chaos engineering service
If you had the chance to test your system’s limits in order to make it more resilient before an unexpected failure cost you money, wouldn’t you do it? Gremlin Free, based on the principles of chaos engineering, is a service that offers just that!
Gremlin, the first hosted chaos engineering service, was created by former Amazon, Netflix, Google, and Dropbox engineers. The Gremlin team has now announced the availability of Gremlin Free, which gives DevOps teams an easy way to get started with chaos engineering.
For those of you who are not familiar with the principles of chaos engineering, an introduction is in order.
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
Basically, you break your system on purpose to make it more resilient before an unexpected failure breaks it for you and costs you even more money!
So, now that we’ve had a quick overview of what chaos engineering is, let’s jump right in and have a look at what Gremlin Free has to offer!
How does it work?
Gremlin Free offers several different attacks that you can use to inject failure into your system. The main categories are:
- Resource: Starve your application of critical resources
- State: Change the state of the environment your application is running within
- Network: Simulate the inherently unreliable behavior of the network
- Request: Impact individual requests as they hit the wire
To make this more concrete, here’s what you can do with Gremlin Free:
Shutdown – The Shutdown attack lets you shut down or reboot one or many hosts or containers. You can select a set of specific hosts or choose to impact a random number of tagged hosts. You can also specify whether the attack should start right away or run on a schedule.
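To picture the "random number of tagged hosts" option, here is a minimal Python sketch of how such a selection could work. It is not Gremlin's implementation or API: the `pick_targets` function, the host names, and the tag values are all hypothetical and exist only for illustration.

```python
import random


def pick_targets(hosts, tag, count, rng=None):
    """Return `count` randomly chosen host names that carry `tag`.

    `hosts` maps a host name to the set of tags attached to it.
    All names here are hypothetical; this is not Gremlin's API.
    """
    rng = rng or random.Random()
    # Sort the candidates so a seeded generator gives repeatable picks.
    tagged = sorted(name for name, tags in hosts.items() if tag in tags)
    if count > len(tagged):
        raise ValueError(f"only {len(tagged)} hosts carry the tag {tag!r}")
    return rng.sample(tagged, count)


# Hypothetical inventory used only for this example.
inventory = {
    "web-1": {"web", "prod"},
    "web-2": {"web", "prod"},
    "web-3": {"web", "staging"},
    "db-1": {"db", "prod"},
}
targets = pick_targets(inventory, tag="web", count=2)
```

Keeping the blast radius to a small random subset of a tagged group is what lets you check whether the remaining hosts absorb the traffic.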
Consume CPU resources – With this attack, you can find out how your autoscaled instances behave when their CPUs are consumed. You target the instances, select how many cores to consume and for how long, and then use your favorite monitoring tool to watch CPU consumption rise along with the number of instances in place to handle your traffic.
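For a feel of what consuming a chosen number of cores for a chosen duration means, the sketch below does it locally with nothing but the Python standard library. It is an illustration under simple assumptions, not how Gremlin implements the attack: `consume_cpu` and `BURN_SNIPPET` are hypothetical names, and the sketch starts one busy-looping worker process per requested core.

```python
import subprocess
import sys

# Small program run by every worker: spin in a tight loop until the deadline.
BURN_SNIPPET = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < deadline:\n"
    "    pass\n"
)


def consume_cpu(cores, seconds):
    """Keep `cores` worker processes busy for `seconds`; return their exit codes."""
    if cores < 1 or seconds <= 0:
        raise ValueError("cores must be >= 1 and seconds must be > 0")
    # One process per core, because a single busy loop cannot load more than one core.
    workers = [
        subprocess.Popen([sys.executable, "-c", BURN_SNIPPET, str(seconds)])
        for _ in range(cores)
    ]
    # Block until every worker has reached its deadline and exited.
    return [worker.wait() for worker in workers]
```

Calling `consume_cpu(cores=2, seconds=30)` while a monitoring dashboard is open should show two cores pinned for roughly thirty seconds. The operating system decides which cores the workers land on, so this sketch does not pin a worker to a specific core.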
Failure as a Service – Gremlin gives you full control at all times through an intuitive UI, CLI, and API. You can quickly halt and revert all attacks at the click of a button, returning your hosts to a healthy state. If the Gremlin client ever loses communication with Gremlin’s control plane, all attacks are halted and reverted.
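The "halt and revert if contact is lost" behavior resembles a dead man's switch. The Python sketch below shows that general pattern under stated assumptions and is not Gremlin's client code: the `FailsafeAttack` class, its method names, and the timeout value are hypothetical.

```python
import time


class FailsafeAttack:
    """A running attack that reverts itself when the control plane goes quiet.

    Hypothetical illustration of a dead man's switch, not Gremlin's client.
    """

    def __init__(self, revert, timeout, clock=time.monotonic):
        self._revert = revert      # callback that undoes the injected failure
        self._timeout = timeout    # seconds of silence tolerated
        self._clock = clock        # injectable clock, handy for testing
        self._last_contact = clock()
        self.active = True

    def heartbeat(self):
        """Record a successful exchange with the control plane."""
        if self.active:
            self._last_contact = self._clock()

    def halt(self):
        """Stop the attack and revert it; calling this twice is harmless."""
        if self.active:
            self.active = False
            self._revert()

    def check(self):
        """Halt if the control plane has been silent too long; return `active`."""
        if self.active and self._clock() - self._last_contact > self._timeout:
            self.halt()
        return self.active


# Usage with a hypothetical revert action that just records the event.
events = []
attack = FailsafeAttack(revert=lambda: events.append("reverted"), timeout=5.0)
attack.heartbeat()  # contact is fresh, so the attack keeps running
attack.halt()       # the manual "halt" button takes the same revert path
```

Routing both the manual halt and the lost-contact case through the same `halt` method means the revert logic is exercised every time an attack ends, not only in an emergency.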
Head over to the official documentation to find out more information on this tool and how it can help you build more resilient systems.