Kubernetes-Managed Object Storage for the Win
Manage Object Storage as Code with MinIO and Kubernetes to Streamline DevOps and IT Operations
Kubernetes is incredibly valuable for DevOps and IT teams because it treats infrastructure as code, delivering full-scale automation to both stateful and stateless components of the software stack.
As they say, mileage may vary. To maximize the value your team derives from running Kubernetes and software-defined infrastructure, you need to maximize your ability to treat components as code and orchestrate them. The more you put into Kubernetes, the more value you receive. To max out value, put EVERYTHING in containers, including infrastructure applications, business applications and data.
Applications that run in containers are designed to be stateless, but state has to be maintained somewhere. That somewhere is object storage (not legacy block and file), and that object storage needs to run IN the container so Kubernetes can automate the whole infrastructure, both stateful and stateless.
A Kubernetes architecture is dynamic, with containers continuously being created and destroyed based on developer specifications and load. Pods and containers self-heal, restart and replicate in this dynamic environment. Legacy block and file persistent storage is a physical entity that can’t be dynamically created, moved and destroyed. The benefits of Kubernetes-based infrastructure orchestration, particularly the benefit of portability, are considerably diminished if the object store is left to bare metal or public cloud storage services.
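The self-healing described above comes from Kubernetes controllers repeatedly comparing desired state with observed state and acting on the difference. The sketch below is a toy, single-process illustration of that reconcile pattern. It assumes a hypothetical `Cluster` class that only tracks pod names in a set; it does not talk to the Kubernetes API and is not how any particular controller (including the MinIO Operator) is implemented.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: just the set of running pod names."""
    running: set = field(default_factory=set)
    _ids: itertools.count = field(default_factory=itertools.count)

    def create_pod(self, prefix: str) -> str:
        name = f"{prefix}-{next(self._ids)}"
        self.running.add(name)
        return name

    def delete_pod(self, name: str) -> None:
        self.running.discard(name)


def reconcile(cluster: Cluster, prefix: str, desired: int) -> list:
    """One pass of a reconcile loop: converge the pod count toward `desired`.

    Returns the (action, pod_name) pairs that were applied.
    """
    actions = []
    observed = sorted(p for p in cluster.running if p.startswith(prefix + "-"))
    # Too few pods: create replacements (the "self-heal" step).
    for _ in range(desired - len(observed)):
        actions.append(("create", cluster.create_pod(prefix)))
    # Too many pods: remove the surplus.
    for name in observed[desired:]:
        cluster.delete_pod(name)
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(cluster, "minio", 4))  # four "create" actions
    cluster.delete_pod("minio-1")          # simulate a failed pod
    print(reconcile(cluster, "minio", 4))  # [('create', 'minio-4')]
```

A real controller runs this comparison continuously against the API server; the point here is only that nothing outside the loop has to intervene when a pod disappears.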
We have seen this before, for example when VMware created the concept of the software-defined datacenter (SDDC). Ask any infrastructure team and they will all say the same thing: to get the most value out of SDDC, you have to virtualize the entire datacenter. Every application left behind on bare metal becomes a nightmare to manage, update and scale. Leave enough applications behind and all SDDC benefits are lost.
The same is true for Kubernetes. If you only use Kubernetes for the applications, you are only tapping a fraction of its value. Let’s explore this a little deeper.
Evolution of Object Storage as Code
Kubernetes treats CPU, network and storage as abstractions so applications and data stores can run as containers anywhere. In particular, the data stores include all persistent services (databases, message queues, object stores, etc.).
From the Kubernetes perspective, object stores are no different from any other key-value store or database. By placing the object store in a container, the storage layer is reduced to the physical or virtual drives underneath. This point is critical. If we cannot reduce the storage layer to physical or virtual drives, we lose the ability to deliver hybrid cloud portability. Workloads and persistent data that live in containers can be deployed, managed and moved between Kubernetes environments.
Modern applications, in particular those built to run on Kubernetes, are designed to handle availability, replication, scaling and encryption themselves, making them completely independent of the infrastructure. Application independence creates storage independence. Storage must run IN the container in order to deliver observability, data placement, maintenance operations and failure handling.
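To make “availability handled within the application” concrete: an object store running in a container sees only drives, so it must protect data itself. MinIO does this with Reed-Solomon erasure coding. The sketch below swaps in the simplest possible scheme, a single XOR parity shard, which tolerates the loss of any one drive. The shard layout and the `encode`/`decode` names are illustrative assumptions, not MinIO’s on-disk format.

```python
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length shards."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split `data` into k equal data shards plus one XOR parity shard.

    Each of the k + 1 shards would be written to a different drive.
    """
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(_xor, shards)]


def decode(shards: list, length: int) -> bytes:
    """Rebuild the original bytes when at most one shard (drive) is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost shard")
    if missing:
        # XOR of all surviving shards (parity included) equals the lost one.
        survivors = [s for s in shards if s is not None]
        shards = list(shards)
        shards[missing[0]] = reduce(_xor, survivors)
    return b"".join(shards[:-1])[:length]  # drop parity and padding


if __name__ == "__main__":
    blob = b"hello, object storage"
    drives = encode(blob, k=4)  # 5 shards, one per drive
    drives[2] = None            # one drive fails
    print(decode(drives, len(blob)) == blob)
```

Real erasure codes generalize this to several parity shards so that multiple simultaneous drive failures can be survived.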
There was a time when applications relied on databases to store and work with structured data, and storage, such as local drives or distributed file systems, to house unstructured and even semi-structured data. However, the rapid rise in unstructured data challenged this model. POSIX was too chatty and had too much overhead to allow applications to perform at scale across regions and continents.
These shortcomings led to the development of object storage, which is designed for RESTful APIs (as pioneered by AWS S3). Applications were now freed from managing local storage, making them effectively stateless, because their state lives in the remote storage system.
Well-designed modern applications that work with some kind of data (logs, metadata, blobs, etc.) conform to the cloud-native (RESTful API) design principle by saving their state to an appropriate storage system. REST APIs can only take infrastructure so far, however. They address application-storage communication, such as PUT and GET (write and read) requests and tracking metadata and versions, but they rely on Kubernetes for container orchestration and automation.
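At the wire level, the application-storage contract described above is simply HTTP verbs on bucket/key paths. The sketch below uses only the Python standard library to stand up a toy, in-memory object endpoint and a client that PUTs and GETs an object. It is heavily simplified: there is no authentication or request signing (real S3-compatible services such as MinIO expect AWS Signature Version 4 signed requests), and nothing of the S3 API beyond the two verbs. `OBJECTS`, `start_server`, `put_object` and `get_object` are hypothetical names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Toy object table: "/bucket/key" path -> object bytes.
OBJECTS: dict = {}
LOCK = threading.Lock()


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with LOCK:
            OBJECTS[self.path] = body  # whole-object write
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        with LOCK:
            body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404, "NoSuchKey")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server() -> ThreadingHTTPServer:
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(endpoint: str, bucket: str, key: str, data: bytes) -> int:
    req = urllib.request.Request(f"{endpoint}/{bucket}/{key}", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get_object(endpoint: str, bucket: str, key: str):
    """Return the object bytes, or None if the key does not exist."""
    try:
        with urllib.request.urlopen(f"{endpoint}/{bucket}/{key}") as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


if __name__ == "__main__":
    server = start_server()
    endpoint = f"http://127.0.0.1:{server.server_address[1]}"
    put_object(endpoint, "logs", "app.log", b"started")
    print(get_object(endpoint, "logs", "app.log"))
    server.shutdown()
```

Note that the client functions hold no state of their own; everything they need is the endpoint and the bucket/key. Everything the REST layer does not cover here, such as restarting the storage process or moving it to another node, is what Kubernetes orchestration adds.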
Kubernetes Native Object Storage
Kubernetes-native storage applications (like MinIO) are designed to leverage the flexibility containers bring. Agile and DevOps best practices dictate that applications and CI/CD processes be simple and straightforward, independent of the underlying infrastructure and consistent in how they access it. Simply put, containers need to run the same way everywhere in order to be portable across development, test and production. Because hardware infrastructure varies from one environment to the next, it makes sense for Kubernetes to be the point of contact between all the disaggregated infrastructure, applications and data stores.
Therefore, storage applications cannot make assumptions about the environment in which they are deployed.
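Because the same container image has to run unchanged in development, test and production, anything environment-specific must come from outside the image. One common way to do this is to read settings from environment variables, which Kubernetes can populate from a ConfigMap or Secret. The sketch below assumes hypothetical variable names (`OBJECT_STORE_ENDPOINT`, `OBJECT_STORE_BUCKET`, `OBJECT_STORE_REGION`) and defaults; they are illustrative, not MinIO settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class StorageConfig:
    endpoint: str
    bucket: str
    region: str


def load_config(env: Optional[Mapping[str, str]] = None) -> StorageConfig:
    """Build storage settings from the environment instead of hard-coding them.

    Variable names and defaults are hypothetical examples.
    """
    env = os.environ if env is None else env
    endpoint = env.get("OBJECT_STORE_ENDPOINT")
    if not endpoint:
        raise RuntimeError("OBJECT_STORE_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"invalid endpoint URL: {endpoint!r}")
    return StorageConfig(
        endpoint=endpoint.rstrip("/"),
        bucket=env.get("OBJECT_STORE_BUCKET", "app-data"),
        region=env.get("OBJECT_STORE_REGION", "us-east-1"),
    )
```

The same code then points at, say, a local MinIO instance on its default port 9000 in development and at a different endpoint in production, with only the injected environment changing.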
In the Kubernetes world, services are simplified and abstracted: applications do application things and storage does storage things. The application doesn’t have to think about storage; it just happens, all inside a container that can be expanded, moved or wiped out.
This is the cloud-native way.
There are certainly non-cloud-native ways. For example, you could solve this problem with the Container Storage Interface (CSI), but sophisticated architects and developers won’t, because it adds needless complexity and scalability challenges. CSI-based persistent volumes (PVs) bring their own management and redundancy layers, which generally compete with the stateful application’s own design.
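One way to see why stacked redundancy layers compete is to count raw capacity. Suppose, hypothetically, that an object store erasure-codes each object into 12 data and 4 parity shards, and that it runs on volumes whose storage backend keeps 3 replicas of every block. The overheads multiply rather than add, as this small calculation shows; the layout numbers are assumptions chosen for illustration.

```python
from fractions import Fraction


def raw_bytes_per_logical_byte(data_shards: int, parity_shards: int,
                               volume_replicas: int = 1) -> Fraction:
    """Raw capacity consumed per byte stored when layers are stacked.

    data_shards / parity_shards: the object store's own erasure-coding layout.
    volume_replicas: copies kept by the volume layer underneath (1 = plain drives).
    """
    if data_shards < 1 or parity_shards < 0 or volume_replicas < 1:
        raise ValueError("invalid layout")
    return Fraction(data_shards + parity_shards, data_shards) * volume_replicas


if __name__ == "__main__":
    for replicas in (1, 3):
        overhead = raw_bytes_per_logical_byte(12, 4, replicas)
        print(f"EC 12+4 on {replicas}x volumes: "
              f"{float(overhead):.2f} raw bytes per byte stored")
```

With plain drives the store pays about 1.33 raw bytes per byte; on 3x-replicated volumes it pays 4, even though the erasure code alone already tolerates drive failures.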
Apache Spark, in the cloud-native world, runs without state on Kubernetes because it hands off state to other systems. The Spark containers themselves run completely stateless, separating compute from storage. Analytics platforms from Presto and TensorFlow to R and Jupyter notebooks follow the same pattern. Applications that offload state to remote cloud storage systems are much easier to scale and manage. As a bonus, they are also portable to different Kubernetes environments.
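The pattern above, compute that keeps nothing locally and reads and writes all state through a storage API, can be reduced to a few lines. In this sketch a plain dict stands in for a remote bucket (the hypothetical `Store` alias), and `word_count_task` is a hypothetical task, not Spark code. Because the task’s only state lives in the store and it writes its result under a deterministic key, a task that crashes can simply be rerun on any replica.

```python
from collections import Counter
from typing import MutableMapping

# Stand-in for a remote bucket: object key -> object bytes (hypothetical).
Store = MutableMapping[str, bytes]


def word_count_task(store: Store, input_key: str) -> str:
    """Stateless task: read one input object, write one result object.

    Nothing is kept in the process between calls, so any replica can run
    (or re-run) the task for any key and produce the same output.
    """
    output_key = "results/" + input_key.split("/", 1)[-1] + ".counts"
    if output_key in store:  # already completed by an earlier attempt
        return output_key
    words = store[input_key].decode("utf-8").split()
    counts = Counter(words)
    lines = [f"{word}\t{n}" for word, n in sorted(counts.items())]
    store[output_key] = "\n".join(lines).encode("utf-8")
    return output_key
```

For example, with `bucket = {"input/part-0": b"to be or not to be"}`, calling `word_count_task(bucket, "input/part-0")` writes `results/part-0.counts`, and calling it again changes nothing. Swapping the dict for a real object storage client is left as an assumption.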
MinIO is Kubernetes Native Object Storage
Kubernetes’ value lies in its ability to treat infrastructure as code, delivering full-scale automation to both stateful and stateless components of the software stack. To get the most out of Kubernetes, you must put the maximum number of components inside containers, including storage and persistent data.
MinIO is built for this: it easily fits in containers (~45 MB), it is designed for RESTful APIs and it includes Kubernetes-native management tools (MinIO Operator and Plugin) to deliver the richest and most reliable object storage experience for DevOps and IT teams.
When you are native to Kubernetes, you can run anywhere it does without rewriting code: in the public cloud, in a private cloud, in your datacenter and at the edge.