Chaos engineering with Kubernetes
Kubernetes makes it easy to give engineers the ability to deploy their apps to dedicated, isolated namespaces. In this article you will learn what Kubernetes is and how to use Chaos Engineering to help you reach your company’s objectives. Discover what containers, monoliths, and microservices are, why containers are useful, and how to lay down the foundations of success with Kubernetes.
What is Kubernetes?
Kubernetes is an open source platform for deploying and managing containerized applications, which makes it a natural fit for applications built as microservices instead of as a monolith. This article gives a high-level explanation in plain language to provide a basic understanding of the technology and where it fits in today’s enterprise software world.
What are monoliths and microservices?
A monolith is an application that is deployed as one big chunk of code, traditionally to an in-house data center directly on server machines that have only an operating system installed on them. These are typically referred to as “bare metal” deployments.
A microservice architecture identifies functions in your application that can stand alone, each requiring only a clearly defined input and returning a clearly defined output. These functions are each deployed separately from one another, whether on bare metal machines in a data center or in containers such as those run by Docker.
Microservices are beneficial to enterprises because they enable faster, smaller updates of software. Instead of redeploying your entire code base after a simple fix or addition is made, you only deploy the small microservice containing the new changes. In addition, your application will scale better under heavy use, saving time and preventing user frustration.
What are containers?
A container is a defined instance of a bundled function.
Put simply, software engineers take one function from the main application and put it in an isolated package, bundled with any and all software dependencies needed to run that function.
Containers are different from deploying using virtualization. Virtualization starts with a base operating system, then a virtualization platform layer, then individual virtual machines on top of that. Individual virtual machines each have their own instance of a complete operating system, then all of the program dependencies, and on top of that, the application software you want to run.
A container can stand alone and run by itself on the container platform it was created for. There is only one instance of an operating system, running at the base layer, then the container platform runs on top of that. Containers are leaner, and as a result, use fewer system resources.
Why are containers useful?
The software in the container can only communicate in and out using the defined inputs and outputs specified when the original container was built. The container can be copied, and multiple instances of that containerized function can run at the same time.
This is useful because as the number of users of an application grows, more resources are required to keep the application running. An application that has had its most used features broken out into microservice containers is easier to scale to meet demand because you can quickly and easily create more instances of the feature and then balance the increased user load across them.
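To make the idea of balancing load across instances concrete, here is a minimal sketch of round-robin routing. It is only an illustration: the class name, the simulate helper, and the "checkout" instance names are hypothetical, and real load balancers take health and latency into account as well.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out instances in turn so each gets an equal share of requests."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self.instances = list(instances)
        self._next = cycle(self.instances)

    def route(self, request):
        # The request itself is ignored here; a real balancer might inspect it.
        return next(self._next)


def simulate(instance_count, request_count):
    """Return how many requests each hypothetical instance handled."""
    names = [f"checkout-{i}" for i in range(instance_count)]
    balancer = RoundRobinBalancer(names)
    return Counter(balancer.route(n) for n in range(request_count))


if __name__ == "__main__":
    print(simulate(2, 900))  # two instances share the load
    print(simulate(3, 900))  # a third instance lowers the load on each
```

With the same 900 requests, adding a third instance drops the work per instance from 450 requests to 300, which is the scaling benefit described above.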
Containers are also useful because of the intentional isolation of the contained function, which enhances overall security. By design, the code in one container cannot directly reach into the code in another container. The input/output limits prevent this.
With containers, it is also possible to upgrade just one feature at a time, perhaps upgrading just one container instance of a feature for testing while running most of your application using the current version. This provides great flexibility and the opportunity to roll back quickly if problems are found well before those problems have any chance of affecting your users.
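One common way to upgrade a single feature for a small share of traffic is to route a fixed percentage of users to the new version. The sketch below assumes hypothetical version labels "v1" and "v2" and a hypothetical pick_version function. It is one possible approach, not the mechanism of any particular platform.

```python
import zlib


def pick_version(user_id, canary_percent, stable="v1", canary="v2"):
    """Route a fixed slice of users to the canary version.

    Hashing the user ID keeps each user on the same version between requests.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return canary if bucket < canary_percent else stable


def rollout_share(user_ids, canary_percent):
    """Fraction of the given users that would see the canary version."""
    hits = sum(pick_version(u, canary_percent) == "v2" for u in user_ids)
    return hits / len(user_ids)


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    print(rollout_share(users, 5))  # a small slice of users sees the new version
    print(rollout_share(users, 0))  # rolling back sends everyone to the stable version
```

Rolling back is then a matter of setting the percentage to zero, which returns every user to the current version.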
There are multiple container platform types and vendors. Some common ones include Docker, Kubernetes, LXC, and Mesos. Each has its strengths and weaknesses.
What makes Kubernetes different?
As you can imagine, running all of these containers creates some complexity. It is difficult to keep track of all of the container instances of a specific microservice. We must also track the load balancing required to spread the work across all instances, and the networking required to make all of the microservices in the application communicate and operate smoothly.
The value added by Kubernetes, besides being free and open source so we can deploy it without paying licensing fees, is orchestration. It provides a way to automate important tasks related to keeping containers healthy and running, or replacing them quickly when something fails.
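The core idea behind this kind of automation can be sketched as a loop that compares the state you want with the state you observe and then works out what to change. The function below is a simplified illustration of that idea, not Kubernetes code. The reconcile name, the action labels, and the replica names are all hypothetical.

```python
def reconcile(desired_replicas, observed):
    """Compare desired state with observed state and return the actions to take.

    observed maps a container name to True (healthy) or False (failed).
    """
    healthy = sorted(name for name, ok in observed.items() if ok)
    failed = sorted(name for name, ok in observed.items() if not ok)

    # Failed containers are always removed.
    actions = [("remove", name) for name in failed]

    gap = desired_replicas - len(healthy)
    if gap > 0:
        # Too few healthy containers: start replacements.
        actions += [("start", f"replica-new-{i}") for i in range(gap)]
    elif gap < 0:
        # Too many healthy containers: stop the surplus.
        actions += [("stop", name) for name in healthy[desired_replicas:]]
    return actions


if __name__ == "__main__":
    # One of three containers has failed, so it is removed and replaced.
    print(reconcile(3, {"a": True, "b": False, "c": True}))
```

An orchestrator would run a check like this continuously, so a failed container is replaced without anyone being paged to do it by hand.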
All of this is necessary because of complexity.
This complexity can be a struggle to secure. It can be a struggle to understand. It can be a struggle to configure and keep running. You need to have a plan.
What are the foundations of success with Kubernetes?
DevOps was created as a new perspective that merges development and operations, so that the person or team writing code is also responsible for making sure the code works when deployed. Kubernetes makes it easy to give engineers the ability to deploy their apps to dedicated, isolated namespaces. It also has good permissions handling for large-scale microservices deployments.
Site Reliability Engineering (SRE) popped up as a new discipline focused solely on keeping complex web applications up and running, or fixing them quickly when they fail.
Just because a technology helps manage complexity, don’t let yourself believe that complexity has gone away. Specialized engineering skills and resources to properly use those skills are what keep our applications available.
We all know how expensive downtime is and want to avoid it. One of the newest and most useful disciplines being adopted and applied by DevOps and SRE practitioners is Chaos Engineering (CE).
Chaos Engineering begins with the recognition that while helpful technologies like microservices and containers add scalability and greater flexibility, they come at the cost of complexity. The more complex something is, the more likely some part of it will fail.
Failure of application components is inevitable. However, downtime is not. CE looks at your application and tries to imagine where it is most likely to fail.
Chaos Engineers then create a hypothesis of how to test the reliability of your application in the event of the imagined failure and then test that hypothesis in a controlled manner to see what really happens. The data collected from CE tests enables engineers to harden systems, making them more reliable overall should problems occur.
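The shape of such an experiment can be sketched in a few lines: check the steady state, inject one failure, measure, and always roll back. Everything here is a toy model under stated assumptions. ToyService and run_experiment are hypothetical names, and the hypothesis being tested is "requests still succeed when one replica is down."

```python
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Hypothetical stand-in for a real service: replicas that are up or down."""
    replicas: dict = field(
        default_factory=lambda: {"r1": True, "r2": True, "r3": True}
    )

    def handle_request(self):
        # In this toy model a request succeeds if at least one replica is up.
        return any(self.replicas.values())


def run_experiment(service, target, requests=100):
    """Inject one failure, check the hypothesis, then always restore the target."""
    # Steady state: confirm the service is healthy before injecting anything.
    if not service.handle_request():
        return {"ran": False, "reason": "service unhealthy before experiment"}

    original = service.replicas[target]
    service.replicas[target] = False  # the injected failure
    try:
        successes = sum(service.handle_request() for _ in range(requests))
    finally:
        service.replicas[target] = original  # roll back even if measuring fails

    success_rate = successes / requests
    return {
        "ran": True,
        "success_rate": success_rate,
        "hypothesis_held": success_rate == 1.0,
    }


if __name__ == "__main__":
    print(run_experiment(ToyService(), "r1"))
    print(run_experiment(ToyService(replicas={"only": True}), "only"))
```

With three replicas the hypothesis holds. With a single replica it does not, which is exactly the kind of finding that tells engineers where to harden the system.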
Can chaos engineering work with Kubernetes?
The short answer is yes, of course! Chaos Engineering with Kubernetes is not only possible, but vital.
There are some differences to consider when designing chaos experiments on Kubernetes. For example, when you have multiple containers running on the same host, you must remember that host resources are shared across all of the running containers.
This means that unless you design your experiment to limit the scope of the attack and its blast radius, you may cause unexpected outcomes. For example, an experiment you thought would impact only one container may actually degrade the performance of several containers running on the same host.
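A rough model shows how this can happen. The sketch below assumes a deliberately simplified rule: when containers ask for more CPU than the host has, every container is scaled down proportionally. Real schedulers are more sophisticated, and the allocate_cpu function and container names are hypothetical.

```python
def allocate_cpu(host_cores, demands, limits=None):
    """Split host CPU among containers in a simplified, proportional way.

    demands: container name -> cores it wants.
    limits:  optional container name -> maximum cores it may use.
    If total demand exceeds the host, every container is scaled down.
    """
    limits = limits or {}
    capped = {
        name: min(want, limits.get(name, want)) for name, want in demands.items()
    }
    total = sum(capped.values())
    if total <= host_cores:
        return capped
    scale = host_cores / total
    return {name: want * scale for name, want in capped.items()}


if __name__ == "__main__":
    # A CPU attack makes "target" demand 7 cores on a 4-core host.
    attack = {"target": 7, "neighbor": 1}
    print(allocate_cpu(4, attack))                 # the neighbor is starved too
    print(allocate_cpu(4, attack, {"target": 2}))  # a limit contains the impact
```

In this model the neighbor loses half of its CPU when the attacked container has no limit, and keeps all of it when the attacked container is capped at two cores.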
There are ways to prevent this. The most obvious is to learn from experts how to design experiments that actually test what you want to test. Another is to use a vendor such as Gremlin, which offers a Chaos Engineering tool designed to treat Kubernetes as a first-class citizen.
For example, here is what it looks like in the Gremlin tool UI when we are preparing to run an experiment on our Kubernetes deployment. Notice that at first, nothing is selected.
Compare that to this second image where we have selected objects in our Kubernetes cluster based on object type.
After this step, we select the type of attack we want to run and the magnitude we want to set for that attack. Anyone willing to learn more about their system and how to properly design a useful chaos experiment can do this. Learn more about using Gremlin with Kubernetes in their product announcement.
If Kubernetes looks like something useful to your company, the best advice is to talk to the application and engineering experts in your own company first. Talk about the resources needed to do the best job possible, focusing on reliability in your discussions. How do we do this in a way that increases our uptime and limits our downtime? Do that, and you will be successful.