Unleash chaos engineering: Kubethanos kills half your Kubernetes pods
Chaos engineering is the art of destruction. Since Netflix unleashed Chaos Monkey onto the world, chaos engineering has been used to test system resiliency and see how well a system really holds up when parts of it fail. Kubethanos is a new open source tool for Kubernetes that kills half of your pods at random, so you can see how your system (and your team) behaves under the threat of catastrophic failure.
Chaos engineering can help programmers find software resiliency holes and practice handling worst-case scenarios. Perhaps the most well-known example of chaos engineering is Netflix’s Chaos Monkey, an open source resiliency tool. Chaos Monkey randomly kills virtual machine instances and containers inside of the production environment.
It isn’t the only chaotic neutral tool in town though. Taking inspiration from Chaos Monkey, Kubethanos is a new tool for Kubernetes pods. As the name implies, it terminates half of your Kubernetes pods, selected at random.
Can your system handle the snap?
Created by software engineer and GitHub user Berkay-Dincer, Kubethanos wreaks havoc.
From the GitHub README:
kubethanos kills half of your pods randomly to engineer chaos in your preferred environment, gives you the opportunity to see how your system behaves under failures.
The project is written in Go and is open source under the MIT license.
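The core idea in the README, picking half of the pods at random, can be sketched in a few lines. This is an illustrative Python sketch, not the project's Go code: the function name `pick_victims`, the pod names, and the choice to round down for odd pod counts are all assumptions made for the example.

```python
import random


def pick_victims(pods, rng=None):
    """Return a random half of `pods` without modifying the input list.

    Assumption: odd counts are rounded down, so 5 pods yield 2 victims.
    """
    rng = rng or random.Random()
    # sample() draws without replacement, so no pod is picked twice.
    return rng.sample(pods, len(pods) // 2)


# Hypothetical pod names, used only for illustration.
pods = [f"web-{i}" for i in range(6)]
victims = pick_victims(pods)
survivors = [p for p in pods if p not in victims]
print("terminate:", victims)
print("keep:", survivors)
```

Passing a seeded `random.Random` makes a run repeatable, which can help when you want to replay the same "snap" while debugging.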
Valid parameters include:
- --namespaces=!kubesystem,foo-bar
- --kubeconfig
- --healthcheck
- --interval
- --dry-run
- --debug
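The namespaces example above, `!kubesystem,foo-bar`, suggests a comma-separated filter in which a leading `!` excludes a namespace. The text here does not spell out the semantics, so the following Python sketch is only one plausible reading: the helper names and the rule that an empty include set means "everything not excluded" are assumptions, not Kubethanos's documented behavior.

```python
def parse_namespaces(spec):
    """Split a spec such as "!kubesystem,foo-bar" into (included, excluded) sets."""
    included, excluded = set(), set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("!"):
            # Assumption: a leading "!" marks a namespace to exclude.
            excluded.add(item[1:])
        else:
            included.add(item)
    return included, excluded


def namespace_allowed(namespace, spec):
    """Return True if pods in `namespace` would be eligible under `spec`."""
    included, excluded = parse_namespaces(spec)
    if namespace in excluded:
        return False
    # Assumption: with no explicit includes, every non-excluded namespace is eligible.
    return not included or namespace in included


print(namespace_allowed("foo-bar", "!kubesystem,foo-bar"))
print(namespace_allowed("kubesystem", "!kubesystem,foo-bar"))
```

Whatever the exact rules turn out to be, the `--dry-run` flag is the natural way to check which pods a filter actually matches before any of them are killed.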
In a word: Why?
The thrill of unleashing absolute destruction isn’t even the best part. Essentially, breaking things on purpose gives us a better understanding of how a system behaves.
For Kubernetes, chaos testing has major benefits. Deliberately injecting failures in a safe, controlled manner exposes weaknesses before they turn into catastrophic failures. Used properly, chaos engineering can prevent widespread outages, uncover potential failures, and boost confidence in your system.
Matthew Helmke wrote about the importance of testing reliability in Kubernetes. Helmke says:
Chaos Engineering begins with the recognition that while helpful technologies like microservices and containers add scalability and greater flexibility, it comes at the cost of complexity. The more complex something is, the more likely some part of it will fail.
Failure of application components is inevitable. However, downtime is not. CE looks at your application and tries to imagine where it is most likely to fail.
It is more expected to face the issues due to even more distributed nature of serverless compared to “traditional” microservices. So, the best way to get ourselves ready is to experience some issues in advance under our control…. After you discover the issues as a result of a chaos engineering experiment, you can implement some fixes like exponential backoffs, tuning the timeouts or circuit breakers.
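One of the fixes mentioned above, exponential backoff, is easy to sketch. The Python example below is a minimal illustration under stated assumptions: `call_with_backoff`, its default delays, and the use of random "full jitter" are choices made for this example, not something prescribed by the quoted authors or by any particular library.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying failures with exponentially growing waits.

    The wait ceiling doubles after every failed attempt (base_delay, 2x, 4x, ...)
    up to `max_delay`. The actual wait is drawn at random below that ceiling so
    that many clients do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

A caller would wrap a request to a service whose pods may have just been killed, for example `call_with_backoff(lambda: fetch_profile(user_id))`, where `fetch_profile` is a hypothetical function standing in for your own client code. The `sleep` parameter exists so that tests can substitute a stub instead of actually waiting.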
More Kubernetes chaos engineering tools
Add to your arsenal. Some other similar tools include:
- Gremlin: In November 2019, Gremlin announced native Kubernetes support for its client. Release the chaotic monsters upon your clusters.
- Chaoskube: Test your system against the clock. By default, running Chaoskube kills a pod in any namespace every 10 minutes.
- Kube-monkey: Chaos Monkey for Kubernetes clusters. You can even terminate all of your pods with the
- PowerfulSeal: Embrace the seal and unleash it upon targeted pods. An interactive, autonomous, and label mode are available.
- Litmus: Find weaknesses in your Kubernetes applications with a cloud-native approach to chaos engineering.
- Chaos Toolkit: An open API for chaos engineering with a Kubernetes extension.