Moving to cloud-native applications and data with Kubernetes and Apache Cassandra
Moving your applications to run in the cloud is attractive to developers. Who doesn’t like the idea of being able to easily scale out and have someone else worry about the hardware? However, designing your applications with cloud-native methodologies involves more than migrating them to a cloud platform or using cloud services.
What does this mean in practice? It involves understanding the role that containers and orchestration tools play in automating your applications, how to use APIs effectively, and how other elements like data are affected by dynamic changes to your application infrastructure. More specifically, it means running your application using virtually unlimited compute and storage in the cloud alongside a move to distributed data. Apache Cassandra was built for cloud data and is now becoming the choice of developers for cloud-native applications.
How did we get here?
Let’s look at how we got to today. Over the past twenty years, there have been several big trends in distributed computing. Reliable networking at scale was the big area of focus in the 2000s, which made it possible to link multiple locations and services together so that they could function at the velocity and volume the Internet demanded. This was followed in the 2010s by the move of compute and storage to the cloud, which used the power of that distributed network to link application infrastructure together on demand and elastically. That works well for the application itself, but it does not change how we have been managing data.
Managing a distributed database like Cassandra can be complex. To manage transactions across multiple servers, you need some understanding of the tradeoffs described by Brewer’s Theorem, which covers Consistency, Availability and Partition Tolerance (CAP): respectively, how a database keeps data consistent across nodes, whether that data remains available, and what happens when nodes in different locations cannot reach one another. More importantly, you need to know how the database reacts under non-ideal conditions, namely the inevitable failures that happen in a system with multiple parts.
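One way to make the consistency and availability tradeoff concrete is the replica arithmetic behind quorum reads and writes. The sketch below is illustrative only: the function names are hypothetical, and it models the counting rule (a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor), not how Cassandra implements it.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must overlap any set of write replicas."""
    return read_acks + write_acks > replication_factor


def tolerated_failures(replication_factor: int, acks_required: int) -> int:
    """Replicas that can be unreachable while a request at this level still succeeds."""
    return replication_factor - acks_required


if __name__ == "__main__":
    rf = 3  # assumed replication factor, chosen for illustration
    q = quorum(rf)
    # Quorum reads and writes: consistent, and still available with one replica down.
    print(q, reads_see_latest_write(rf, q, q), tolerated_failures(rf, q))
    # Single-replica reads and writes: more failures tolerated, but reads may be stale.
    print(reads_see_latest_write(rf, 1, 1), tolerated_failures(rf, 1))
```

Requiring fewer acknowledgements keeps requests succeeding through more failures, at the cost of possibly stale reads. That is the kind of tradeoff the theorem describes.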
Not only does your database have to manage failure cases, it also has to do this while maintaining data consistency, availability and partition tolerance across multiple locations. This is exactly what Cassandra was built to do, and it has proven itself in just those tough conditions. Being rooted in a distributed foundation has given Cassandra the ability to support hybrid cloud, multi-cloud or geographically distributed environments from the beginning. As applications have been built to withstand failures and scalability problems, Cassandra has been the database of choice for developers.
Today, we have more developers using microservice designs to decompose applications into smaller and more manageable units. Each unit fulfills a specific purpose and can scale independently using containers. To manage these container instances, the container orchestration tool Kubernetes has become the de facto choice.
Kubernetes can handle creating new container instances as needed, which can help scale the amount of compute power available for the application. Similarly, Kubernetes dynamically tracks the health of running containers: if a container goes down, Kubernetes handles restarting it, and can schedule its replacement on other hardware. You can rapidly build microservice-powered applications and ensure they run as designed across any Kubernetes platform. Being able to run continuously and avoid downtime, even while things are going wrong, is a powerful attribute for an application.
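The behaviour described above, comparing the desired number of containers with what is actually running and then correcting the difference, can be sketched as a small loop. This is a toy model under stated assumptions: `ToyCluster`, its fields and the `app-N` names are hypothetical, and nothing here calls the Kubernetes API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a cluster; not the Kubernetes API."""
    desired_replicas: int
    running: dict = field(default_factory=dict)  # container name -> healthy?
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self) -> list:
        """Compare desired and observed state and return the actions taken."""
        actions = []
        # Remove containers that failed their health check.
        for name in [n for n, healthy in self.running.items() if not healthy]:
            del self.running[name]
            actions.append(f"removed {name}")
        # Start containers until the desired count is met.
        while len(self.running) < self.desired_replicas:
            name = f"app-{next(self._ids)}"
            self.running[name] = True
            actions.append(f"started {name}")
        # Stop the newest containers if the desired count was lowered.
        while len(self.running) > self.desired_replicas:
            name = next(reversed(self.running))
            del self.running[name]
            actions.append(f"stopped {name}")
        return actions


if __name__ == "__main__":
    cluster = ToyCluster(desired_replicas=3)
    print(cluster.reconcile())          # initial scale-up
    cluster.running["app-1"] = False    # simulate a failed container
    print(cluster.reconcile())          # failed container is replaced
```

Running `reconcile()` repeatedly is safe: once the observed state matches the desired state, it takes no further action.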
In order to run Kubernetes together with Apache Cassandra, you will need to use a Cassandra Operator within your Kubernetes cluster. This allows Cassandra nodes to run on top of your existing Kubernetes cluster as a service. Operators provide an interface between Kubernetes and more complex processes like Cassandra to allow them to be managed together. Starting a Cassandra cluster, scaling it and dealing with failures are handled via the Kubernetes Operator in a way that Cassandra understands.
Since Cassandra nodes are stateful services, you will need to provision additional parts of your Kubernetes cluster. Cassandra’s storage requirements can be met by using PersistentVolumes and StatefulSets, which guarantee that data volumes are reattached to the same nodes after any restart. Containers for Cassandra nodes are built on the assumption that data lives on external volumes, and those volumes are a key element in the success of a cluster deployment. When properly configured, a single YAML file can deploy both the application and data tiers in a consistent fashion across a variety of environments.
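To show how a StatefulSet ties each node to its own volume, here is a minimal sketch that builds a manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts as well as YAML. The helper name `cassandra_statefulset`, the image tag, the volume size and the mount path are assumptions chosen for illustration. An operator would normally generate something considerably more complete than this.

```python
import json


def cassandra_statefulset(name: str, replicas: int, storage: str) -> dict:
    """Build a bare-bones StatefulSet manifest (illustrative, not production-ready)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Assumes a headless Service with this name already exists.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": "cassandra:4.1",  # assumed image tag
                        "ports": [{"containerPort": 9042, "name": "cql"}],
                        # Data directory mounted from the per-node volume below.
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/cassandra"}],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per replica and follows
            # that replica across restarts and rescheduling.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cassandra_statefulset("cassandra", 3, "10Gi"), indent=2))
```

The `volumeClaimTemplates` entry is what gives each replica a stable volume. The claim name must match the `volumeMounts` name for the data directory to land on persistent storage.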
As you look at adopting microservices and using application containers, you can take advantage of fully distributed computing to help scale out. However, to really take advantage of this, you need to include distributed data in your planning. While Kubernetes can make it easier to automate and manage cloud-native applications, using Cassandra can complete the picture.
Bringing together Apache Cassandra and Kubernetes can make it easier to scale out applications. Planning this process involves understanding how distributed compute and distributed data can work together, in order to take advantage of what cloud-native applications can really deliver.