Benefits and issues

Entering a new state as containerization heads mainstream

Jim Scott

The growing popularity of Docker and Kubernetes has pushed containerization into the mainstream, and with good reason. However, even with the many benefits containers offer, there are still issues to overcome when it comes to delivering truly stateful applications that need persistent data.

Containerization has proven a major benefit for developers keen to deliver more agile and reliable applications. The trend has gained momentum with the growing popularity of Docker and Kubernetes, the latter driving mainstream adoption through orchestration capabilities that further simplify the deployment, scaling, and management of containerized applications.

Despite these benefits, delivering truly stateful applications that need persistent data still poses challenges. For example, how do you provide persistent storage volumes that give access to data located on-premises, across clouds, and at the edge? Kubernetes manages and orchestrates containers, but it does not natively offer everything an organization needs to run containerized services successfully in production. The gaps include maintaining data across sessions, sharing data across applications, disaster recovery for critical business data, and data tiering for huge data volumes.

Growing demand for agility

First, it’s worth understanding the background, benefits, and limitations of the technology options. Containers are one of the fastest-growing new technologies, owing to their ability to vastly improve the application development experience.

Containers can be considered operating system (OS) virtualization, in which workloads share the resources of a single OS. Although they have gone mainstream only in the last few years, their adoption has been rapid: one major global study of 1,100 senior IT and business executives found that 40% are already using containers in production, half of them for mission-critical workloads. Docker is the most popular container platform by a wide margin, but alternatives are available, such as rkt from CoreOS, LXD from Canonical and Azure Container Instances from Microsoft.

In a development environment, containers can do much that virtual machines (VMs) cannot, such as being launched or discarded almost instantly, without the overhead of booting a separate guest OS. The technology also plays a major role in moving development work seamlessly from one environment or platform to another.


Containers give each workload its own isolated view of resources such as processor, memory, service accounts and libraries, which are essential to the development process. Containers run as a group of namespaced processes on a shared operating system, which makes them fast to start and easy to maintain. They can be packaged with all the supporting elements an application needs, which makes them especially popular with developers. Unlike virtual machines, containers can be spun up in seconds, and their images can be stored in libraries for reuse. They are also portable: an application that runs in a container can, in theory, be moved to any operating system that supports that type of container.

Welcome to the pod

Kubernetes has been a big step toward making containers mainstream. It introduced a higher-level abstraction called a “pod”: a group of one or more containers that are scheduled together and share networking and storage, so that services that depend on one another can run side by side. This simplifies the administration of large containerized environments. Kubernetes also balances traffic across containers and schedules them onto nodes that have the resources they request. It monitors container health, restarts containers that fail or don’t respond to pre-defined health checks, reschedules containers when nodes die and can automatically roll back problematic changes. The same workloads can run on servers on-premises or in the cloud. Altogether, these features give IT organizations unprecedented productivity benefits, enabling a single administrator to manage thousands of containers running simultaneously.
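To make the pod idea concrete, here is a minimal sketch that builds a two-container Pod manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts. The pod name, image names, the `/healthz` path and port 8080 are hypothetical placeholders; the field names (`livenessProbe`, `restartPolicy` and so on) are standard Kubernetes Pod fields. If the liveness probe fails `failureThreshold` times in a row, the kubelet restarts that container.

```python
import json


def web_pod_manifest(name: str, image: str, sidecar_image: str) -> dict:
    """Build a two-container Pod spec. Names, images, path and port are hypothetical."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Applies to every container in the pod: restart on exit or failed liveness probe.
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": "web",
                    "image": image,
                    "ports": [{"containerPort": 8080}],
                    "livenessProbe": {
                        "httpGet": {"path": "/healthz", "port": 8080},
                        "initialDelaySeconds": 5,  # give the app time to boot
                        "periodSeconds": 10,       # probe every 10 seconds
                        "failureThreshold": 3,     # restart after 3 consecutive failures
                    },
                },
                # A second container scheduled alongside "web", sharing its network namespace.
                {"name": "log-shipper", "image": sidecar_image},
            ],
        },
    }


if __name__ == "__main__":
    manifest = web_pod_manifest(
        "shop-web", "example.com/shop-web:1.0", "example.com/log-shipper:1.0"
    )
    print(json.dumps(manifest, indent=2))
```

Saved as, say, `make_pod.py`, it could be applied with `python make_pod.py > pod.json && kubectl apply -f pod.json`. In practice pods are usually created indirectly through a Deployment, which is also what provides the rollback behavior described above.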

Developer agility

Ultimately, developers using containers can move their test and development projects directly into production without major porting efforts. This ability to transfer workloads from one environment to another is likely to become much more important in the emerging hybrid IT environment, in which infrastructure combines existing legacy systems, on-premises and off-premises private cloud, and public cloud.

It’s no surprise that containers and microservices have grown in popularity in lockstep; they go together perfectly. Microservices suit a container environment because they typically perform a limited set of tasks and are called upon only when needed. Services can be stored in a library, spun up quickly on demand, and then shut down until they are needed again.

Stateless vs. stateful

As a rule, containers are stateless, meaning that they don’t retain persistent information. When they shut down, any data that was in memory or stored inside the container goes away. Since microservices are miniature processing engines, they typically don’t require persistent data. Containers also include all the supporting software needed to run the application, which minimizes the risk of conflicts and failures caused by differences in the surrounding environment. Microservices embedded in containers are self-contained, portable and consistent.
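The difference between state held inside a container and state held in an external store can be sketched in a few lines of Python. `ExternalStore`, `StatefulCounter` and `StatelessCounter` are hypothetical classes: a plain dictionary stands in for a real database or persistent volume, and discarding an instance and creating a new one simulates a container restart.

```python
class ExternalStore:
    """Hypothetical stand-in for a database or volume that lives outside any container."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class StatefulCounter:
    """Keeps counts in container memory, so they vanish when the instance is replaced."""

    def __init__(self) -> None:
        self._counts = {}

    def hit(self, user: str) -> int:
        self._counts[user] = self._counts.get(user, 0) + 1
        return self._counts[user]


class StatelessCounter:
    """Holds no data of its own; every read and write goes to the external store."""

    def __init__(self, store: ExternalStore) -> None:
        self.store = store

    def hit(self, user: str) -> int:
        count = self.store.get(user) + 1
        self.store.put(user, count)
        return count


if __name__ == "__main__":
    store = ExternalStore()
    stateful, stateless = StatefulCounter(), StatelessCounter(store)
    for _ in range(2):  # first "container" lifetime: two requests
        stateful.hit("alice")
        stateless.hit("alice")

    # Simulated restart: throw the instances away and launch fresh ones.
    stateful, stateless = StatefulCounter(), StatelessCounter(store)
    print(stateful.hit("alice"))   # 1: the in-memory history was lost
    print(stateless.hit("alice"))  # 3: the history survived in the external store
```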

The stateless nature of containers can be a problem in some cases, particularly as the number of instances grows. While it is possible to store data inside containers, it’s not considered a best practice. A better approach is to keep data in a separate data store and access it when the container launches. Containers enable a wide variety of big data scenarios.


For example, a web server can run in a container to provide public access to data without risking exposure of sensitive information in adjacent containers. In this scenario, the web server can selectively pull user profile data from a relational (SQL) database in another container and combine it with analytics data from a third container running a NoSQL database to deliver individualized shopping recommendations without compromising security.
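A minimal sketch of that web-tier logic, assuming an in-memory SQLite database as a stand-in for the relational container and a dictionary of documents as a stand-in for the NoSQL container. The schema, the sample data and the `recommend` function are all hypothetical; the point is that the query selects only the non-sensitive columns it needs, so the `email` column never reaches the public-facing container.

```python
import sqlite3
from typing import Optional


def make_profile_db() -> sqlite3.Connection:
    """Stand-in for the RDBMS container: in-memory SQLite with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profiles "
        "(user_id TEXT PRIMARY KEY, name TEXT, email TEXT, favorite_category TEXT)"
    )
    conn.executemany(
        "INSERT INTO profiles VALUES (?, ?, ?, ?)",
        [
            ("u1", "Alice", "alice@example.com", "books"),
            ("u2", "Bob", "bob@example.com", "games"),
        ],
    )
    return conn


# Stand-in for the NoSQL container: documents keyed by category (hypothetical data).
ANALYTICS = {
    "books": {"trending": ["Dune", "Neuromancer", "Snow Crash"]},
    "games": {"trending": ["Tetris", "Portal"]},
}


def recommend(conn: sqlite3.Connection, analytics: dict, user_id: str, limit: int = 2) -> Optional[dict]:
    # Select only the columns the web tier needs; email stays inside the database.
    row = conn.execute(
        "SELECT name, favorite_category FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    name, category = row
    # Combine the profile with analytics data from the second store.
    trending = analytics.get(category, {}).get("trending", [])
    return {"name": name, "recommendations": trending[:limit]}


if __name__ == "__main__":
    db = make_profile_db()
    print(recommend(db, ANALYTICS, "u1"))  # {'name': 'Alice', 'recommendations': ['Dune', 'Neuromancer']}
```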

Resource efficiency also makes containers good candidates for event-driven applications. One use case is streaming data that can be processed in parallel and then combined for delivery to an analytics engine. Machine learning is another good example: algorithms running in separate containers can access the same data for different kinds of analysis, which can improve both the speed and the quality of results.
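The parallel-then-combine pattern for streaming data can be sketched with Python’s standard `concurrent.futures`. Here threads stand in for separate containers and a list stands in for the stream; in a real deployment each partition would typically be handled by its own container and the events would come from a message broker. The event format and the `summarize` step are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(events: list, n: int) -> list:
    """Split the stream into n interleaved partitions, one per worker."""
    return [events[i::n] for i in range(n)]


def summarize(chunk: list) -> Counter:
    """Per-worker step: count events by type within one partition."""
    return Counter(event["type"] for event in chunk)


def process_stream(events: list, workers: int = 3) -> Counter:
    """Fan partitions out to parallel workers, then merge their partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, partition(events, workers)))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts instead of replacing them
    return combined  # the merged summary is what would go to the analytics engine


if __name__ == "__main__":
    stream = [{"type": t} for t in ["click", "view", "click", "buy", "view", "click"]]
    print(process_stream(stream))
```

Because each partial result is a count, merging is order-independent, which is what lets the partitions be processed in any order and on any number of workers.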

Solving the data bottleneck

Containers are quick to launch but loading data into containers can be slow by comparison. For that reason, it’s tempting to keep a persistent copy of data obtained from, say, a Kafka stream inside the container. The problem is that containers work best when they’re stateless, and storing data inside them makes them stateful, or heavy. A profusion of stateful containers can quickly become an administrative nightmare, as well as a security risk.

A better approach is to separate data into a persistent and flexible data store that any container can access. The catch is that not all data stores are appropriate for all types of data. NAS filers, for example, serve files and can’t provide block storage, and some storage subsystems are too slow to handle streaming data at all. An increasingly common approach is to use a distributed data platform together with a Persistent Application Client Container (PACC). Together, these technologies make it possible for containers to store their operating state upon shutdown and to load any kind of data, including structured, unstructured and streaming data, from a single persistent store. The approach is designed to scale linearly and provides a single platform that includes authentication and authorization within one global namespace.
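The idea of saving operating state on shutdown and reloading it on launch can be sketched generically. This is not the PACC interface; it is a minimal illustration, assuming a directory (here a temporary one) that stands in for a persistent volume or data platform mount shared by successive containers. `OffsetCheckpoint` and `run_consumer` are hypothetical names, and the consumer tracks its position in a stream such as a Kafka topic rather than copying the data itself.

```python
import json
import tempfile
from pathlib import Path


class OffsetCheckpoint:
    """Persists a stream consumer's position outside the container.

    `root` stands in for a persistent volume or shared data platform mount (assumption).
    """

    def __init__(self, root: Path, consumer_id: str) -> None:
        self.path = root / f"{consumer_id}.json"

    def load(self) -> int:
        if self.path.exists():
            return json.loads(self.path.read_text())["offset"]
        return 0  # no checkpoint yet: start from the beginning of the stream

    def save(self, offset: int) -> None:
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"offset": offset}))
        tmp.replace(self.path)  # rename is atomic on POSIX, so readers never see a partial file


def run_consumer(stream: list, checkpoint: OffsetCheckpoint, batch: int) -> list:
    """One container lifetime: resume from the saved offset, process a batch, save state."""
    start = checkpoint.load()
    processed = stream[start:start + batch]
    checkpoint.save(start + len(processed))
    return processed


if __name__ == "__main__":
    stream = ["m0", "m1", "m2", "m3", "m4"]  # hypothetical messages
    with tempfile.TemporaryDirectory() as volume:
        first = OffsetCheckpoint(Path(volume), "consumer-a")
        print(run_consumer(stream, first, batch=2))        # ['m0', 'm1']
        replacement = OffsetCheckpoint(Path(volume), "consumer-a")
        print(run_consumer(stream, replacement, batch=2))  # ['m2', 'm3']
```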

Stateful containers bring auxiliary benefits as well. One is resiliency: a failed container can be restarted quickly from the last known data snapshot. Another is an effective audit trail of the steps each container has completed, which avoids having to redo a complex process. For example, a containerized application for processing a car loan can leave behind an audit trail to show that every step has been completed as required for compliance.
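A minimal sketch of the car loan example, assuming a hypothetical list of processing steps and a plain list as the persistent audit log shared by all container instances. Each completed step is recorded, so a replacement container skips work already done instead of starting over, and the finished log doubles as the compliance record. The `fail_at` parameter exists only to simulate a crash.

```python
from typing import Optional

# Hypothetical loan-processing steps, in order.
STEPS = ["verify_identity", "check_credit", "price_loan", "issue_contract"]


def run_loan_application(app_id: str, audit_log: list, fail_at: Optional[str] = None) -> list:
    """Run the remaining steps for one application, recording each completed step.

    `audit_log` stands in for a persistent store shared by all container instances;
    `fail_at` simulates a crash so the restart behavior can be demonstrated.
    """
    done = {entry["step"] for entry in audit_log if entry["app_id"] == app_id}
    executed = []
    for step in STEPS:
        if step in done:
            continue  # completed by an earlier container: skip instead of redoing it
        if step == fail_at:
            raise RuntimeError(f"container crashed during {step}")
        # ... the real work for this step would happen here ...
        audit_log.append({"app_id": app_id, "step": step})
        executed.append(step)
    return executed


if __name__ == "__main__":
    log = []
    try:
        run_loan_application("loan-42", log, fail_at="price_loan")
    except RuntimeError as err:
        print(err)  # container crashed during price_loan
    print(run_loan_application("loan-42", log))  # ['price_loan', 'issue_contract']
    print([entry["step"] for entry in log] == STEPS)  # True: a complete, ordered audit trail
```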

Simplifying data management

The largest single benefit, however, is for developers. Because the data plane is separated from the containerization elements, developers don’t have to worry about managing the storage infrastructure. They can simply point the container at the converged data platform, which in turn manages the information lifecycle across different types of data, such as databases, files, streams and messaging, whether the sources are local, in the cloud or third-party. If data types or sources change, the containerized application does not need to be modified, because the converged data platform provides an abstraction layer that serves up the required data seamlessly.
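The abstraction-layer argument can be illustrated with an ordinary interface in Python. This is a generic sketch, not the API of any converged data platform: `RecordSource`, `CsvFileSource`, `StreamSource` and `total_amount` are hypothetical names, CSV text stands in for files on a shared volume, and a list of dictionaries stands in for decoded stream messages. The application function depends only on the interface, so swapping the source requires no change to it.

```python
import csv
import io
from typing import Iterable, Iterator, Protocol


class RecordSource(Protocol):
    """The only thing the application depends on: something that yields records."""

    def records(self) -> Iterable[dict]: ...


class CsvFileSource:
    """Records from CSV text (stands in for files on a shared volume)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def records(self) -> Iterator[dict]:
        return iter(csv.DictReader(io.StringIO(self.text)))


class StreamSource:
    """Records from already-decoded messages (stands in for a stream consumer)."""

    def __init__(self, messages: list) -> None:
        self.messages = messages

    def records(self) -> Iterator[dict]:
        return iter(self.messages)


def total_amount(source: RecordSource) -> float:
    """Application logic: unchanged no matter which source backs it."""
    return sum(float(r["amount"]) for r in source.records())


if __name__ == "__main__":
    print(total_amount(CsvFileSource("amount\n1.5\n2.5\n")))          # 4.0
    print(total_amount(StreamSource([{"amount": 3}, {"amount": 1}])))  # 4.0
```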

The benefits of both stateless and stateful applications serve an overarching goal: creating an agile development and deployment environment that is unconstrained by the traditional limitations of systems management. Tasks that used to require manual intervention now can, and should, be automated. Seamless orchestration across multiple servers should be assumed. All these capabilities are available today for organizations that are ready to embrace them.


Jim Scott

Jim Scott is an experienced leader having worked in financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals and geographical management systems. He is a cofounder of the Chicago Hadoop Users Group (CHUG) where he helped grow a now flourishing community around next generation technologies. Scott has built systems scaling to 50+ billion transactions per day, and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts. His passion is in building combined big data and blockchain solutions.
