Doing DevOps

The resilient and highly scalable applications cloud

Walid Farag

DevOpsCon 2015 speaker Walid Farag gives us an in-depth look into the modern datacenter by harnessing IT operations processes and tying it all with the bigger picture. He looks to answer the following question: Why choose business agility?

Business agility means delivering faster, learning faster and hence growing faster. IT operations is an integral part of this challenging process.

Resilient and highly scalable application clouds are the basis for the modern datacenter. But the datacenter is not isolated from the bigger picture, which starts with the business vision and objectives. In this article we try to answer two questions: why pursue business agility, and how can it be implemented smoothly in DevOps organisations?

Growing business faster with Lean Startup

Lean startup is a method that combines iterative product or service development, business-hypothesis-driven experimentation, and validated learning (a process by which one learns by trying out an idea and then measuring results to validate the effect).

Applying lean startup allows for minimal investment until results are evaluated. The goal is to learn quickly what works and discard what does not work.

Figure 1 – Lean startup

Boost performance with an Agile development strategy

Agility enables businesses to reach their goals effectively and to react quickly and smoothly to market changes. You can rely on two simple principles:

  • Release as often as possible: You will learn fast what works and what doesn’t, and you can base your decisions on data rather than assumptions. Release frequency can range from many times a day to once every 2-3 weeks. In all cases, to keep deployments simple, you have to deliver small releases/packages/services.
  • Decouple your software components: This will enable you to deploy different parts of your software product independently.

Figure 2 – Bounded contexts

Figure 2 above presents a very high-level architecture of an internet shopping portal. Every subdomain (e.g. Shipment or Credit Worthiness) is represented by a software service. Related subdomains are grouped logically in a Bounded Context. This helps later when planning and assigning the responsible development teams.

These services are decoupled, and they are developed and deployed independently. Development teams can then work autonomously. Dependencies between teams decrease, and development speed increases as a result.

Of course, the teams should still coordinate the development of interfaces and interactions between subdomains. Because services are developed independently, there must be a clear system for semantic versioning and deprecation.
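As a rough illustration of what such a versioning rule could look like, here is a minimal Python sketch. It assumes plain MAJOR.MINOR.PATCH version strings and the usual semantic versioning rule of thumb: a consumer can use a provider with the same major version that is not older than the version it was built against. The function names are invented for this example, and special cases such as pre-release tags or 0.x versions are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string such as '2.4.1'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_compatible(required: str, provided: str) -> bool:
    """Return True if a consumer built against `required` can use `provided`.

    Same major version, and the provided version is not older than the
    required one (tuples compare field by field).
    """
    req, prov = parse_version(required), parse_version(provided)
    return prov.major == req.major and prov >= req


if __name__ == "__main__":
    # A Shipment client built against 1.2.0 talking to newer services.
    print(is_compatible("1.2.0", "1.4.3"))
    print(is_compatible("1.2.0", "2.0.0"))
```

A check like this could run in a build pipeline, so that a breaking interface change (a new major version) is noticed before independently deployed services meet in production.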

Challenges with classic IT operations

Deploying and running these services depends heavily on your IT infrastructure and processes. Dynamic and flexible IT is therefore key to the success of any business. However, challenges with traditional software development and IT operations can hinder smooth development and operations processes. Some of them are listed here.

  • Deployment & Configuration
  • Quality & Performance
  • Availability Expectations
  • Utilization of Infrastructure
  • Monitoring

These challenges are constantly growing, especially with the increasing number of applications and their complexity and dependencies. New-generation IT operations require a new way of thinking about application development and operations.

Accelerate deployment with a DevOps strategy

DevOps (a clipped compound of “development” and “operations”) is a software development method that stresses communication, collaboration, integration, automation, and measurement of cooperation between software developers and other information-technology (IT) professionals (source: Wikipedia).

DevOps has therefore become increasingly important, even in traditional organizations. It offers a great opportunity to let developers think deeply about how the application runs in operations. Similarly, IT operations come to understand the application requirements and design much better.

Yet DevOps teams need a platform on which they can work together across the different stages of the software lifecycle.

Our goal

We aim to make business agility run smoothly. Our IT operations strategy is to boost performance and save infrastructure costs without compromising IT security. Fortunately we have many options to reach our goal. Below are some of them. We will examine them shortly in this article.

  • Standard Delivery Unit
  • Boost Utilization of Infrastructure
  • Automated Deployment & Configuration
  • Dynamic Service Discovery
  • Elastic Monitoring
  • Self Service Mode

I like to call IT operations with the above capabilities a “Modern Datacenter”. I heard this term from Mitchell Hashimoto, an innovator behind powerful open source products such as Consul, a service discovery and configuration tool.

A modern datacenter runs a resilient and highly scalable applications cloud. Such a cloud is an accelerator for any modern IT and DevOps organization.

The Modern Datacenter

The next part of this article presents how to implement the above capabilities with proven technologies used by internet giants.

Docker: Standardising deliverables to IT operations

Docker is an amazing technology. It enables you to package your application and its entire configuration inside a container. Just like a container carried on a ship, the IT operations staff do not need to know what is inside a container. Rather, they need to know it from the outside, for example the CPU, memory and storage requirements of the container’s application.

Packaging the application inside a container simplifies and accelerates deployments. The application’s technology stack is no longer relevant for IT operations: Java, PHP, MySQL and Perl applications are all just deployed inside containers. It is the developers’ responsibility to deliver preconfigured Docker images, which will run as containers later.

IT operations have the responsibility to write standard deployment scripts, which are valid for all containers (applications). The scripts start containers. That’s it!
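To make the idea of one standard script for all containers more concrete, here is a minimal Python sketch. It only builds the docker run command line and does not execute it. The ContainerSpec type, the container name and the image name are assumptions made up for this example; the flags used (--detach, --name, --memory, --publish) are standard docker run options.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContainerSpec:
    """What operations needs to know about a container from the outside."""
    name: str
    image: str
    memory: str  # memory limit, e.g. "512m"
    ports: Dict[int, int] = field(default_factory=dict)  # host port -> container port


def docker_run_command(spec: ContainerSpec) -> List[str]:
    """Build the same `docker run` command shape for any image."""
    command = ["docker", "run", "--detach",
               "--name", spec.name,
               "--memory", spec.memory]
    for host_port, container_port in sorted(spec.ports.items()):
        command += ["--publish", f"{host_port}:{container_port}"]
    command.append(spec.image)
    return command


if __name__ == "__main__":
    # Hypothetical web front end of the shopping portal.
    web = ContainerSpec(name="shop-web", image="example/shop-web:1.0",
                        memory="512m", ports={8080: 80})
    print(" ".join(docker_run_command(web)))
```

The script never looks inside the image. Whether it contains a Java or a PHP application makes no difference to the command that starts it.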

Another important feature is isolation. Applications running in containers are isolated from other containers, so different applications can use the same ports. This saves configuration time and reduces human error. It also enables switching between, or running, different versions of an application at the same time.

Figure 3 – Docker layers

Docker offers another nice feature: a Docker image is physically stored as layers. For example, you can create an image with an operating system like Ubuntu or Red Hat. This layer is reusable. Another layer can be installed on top of it, let us say an application server, a web server or a mail server. In this case the operating system layer is not modified; the upper layers simply lie on top of it.

When running a Docker container from a Docker image, the container is started in a new layer, where log files or run-time data might be stored. The image layers below it are immutable. Should the container not run properly for any reason (e.g. the application has a memory leak), simply stop the container, delete it and start a new container from the original image state.

This feature results in a fast, consistent and standard recovery process. It certainly saves time in production.
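The layering idea can be modelled in a few lines of Python. This is only a conceptual sketch and not how Docker stores images on disk: read-only mappings stand in for image layers, a plain dictionary stands in for the container's writable layer, and all file names and contents are made up.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only image layers, lowest first. MappingProxyType rejects writes.
os_layer = MappingProxyType({"/etc/os-release": "ubuntu"})
server_layer = MappingProxyType({"/opt/server/start.sh": "run web server"})
IMAGE_LAYERS = [os_layer, server_layer]


def start_container(image_layers):
    """Return a container file system: a fresh writable layer over the image."""
    writable_layer = {}
    # ChainMap looks keys up from the first mapping to the last,
    # but writes only ever go into the first mapping.
    return ChainMap(writable_layer, *reversed(image_layers))


if __name__ == "__main__":
    container = start_container(IMAGE_LAYERS)
    container["/var/log/app.log"] = "started"      # lands in the writable layer
    print("/var/log/app.log" in container)

    # "Delete the container and start a new one": the image is unchanged,
    # so the new container has none of the old run-time data.
    fresh = start_container(IMAGE_LAYERS)
    print("/var/log/app.log" in fresh)
```

Because nothing ever writes into the image layers, recovery amounts to throwing away the top layer and starting again from a known state.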

Mesos: Turning server fleet into resource pools

Mesos is a great tool that abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual) (source: Apache Mesos).

Figure 4 – Mesos abstracts server fleet into resources

Mesos enables you to turn a fleet of servers (virtual machines or bare metal) into a resource pool. No matter whether the servers run on premise or in a public cloud, IT operations start applications through Mesos.

With this abstraction provided by Mesos, machines are no longer dedicated to individual applications.

Mesos supports Docker too. With a single command, your containerized applications start automatically anywhere in your cloud where resources are available.

When a machine or a rack crashes, applications can be restarted on other machines automatically.
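The following Python sketch illustrates the resource-pool idea in a strongly simplified form. It is not the Mesos scheduling algorithm (Mesos offers resources to frameworks, which then decide what to launch). It only shows the principle that a task asks for CPU and memory rather than for a particular machine, and that tasks from a failed machine can be placed elsewhere. All class, machine and task names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Machine:
    name: str
    cpus: float
    mem_mb: int
    # task name -> (cpus, mem_mb) currently reserved on this machine
    tasks: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def free(self) -> Tuple[float, int]:
        used_cpus = sum(cpus for cpus, _ in self.tasks.values())
        used_mem = sum(mem for _, mem in self.tasks.values())
        return self.cpus - used_cpus, self.mem_mb - used_mem


class ResourcePool:
    def __init__(self, machines: List[Machine]):
        self.machines = {machine.name: machine for machine in machines}

    def launch(self, task: str, cpus: float, mem_mb: int) -> Optional[str]:
        """Place the task on the first machine with enough free resources."""
        for machine in self.machines.values():
            free_cpus, free_mem = machine.free()
            if free_cpus >= cpus and free_mem >= mem_mb:
                machine.tasks[task] = (cpus, mem_mb)
                return machine.name
        return None  # no capacity anywhere in the pool

    def machine_failed(self, name: str) -> Dict[str, Optional[str]]:
        """Drop a crashed machine and relaunch its tasks on the rest of the pool."""
        lost = self.machines.pop(name)
        return {task: self.launch(task, cpus, mem)
                for task, (cpus, mem) in lost.tasks.items()}


if __name__ == "__main__":
    pool = ResourcePool([Machine("node-a", 4, 8192), Machine("node-b", 4, 8192)])
    print(pool.launch("shop-web", cpus=2, mem_mb=2048))
    print(pool.machine_failed("node-a"))
```

The caller never names a machine when launching a task, which is the main point: scaling the infrastructure means adding machines to the pool, not reconfiguring applications.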

Here is a summary of Mesos benefits:

  • IT operations only deal with resources, no longer with single machines
  • Standard installation templates (Master & Slave) instead of special installation procedures
  • Fast installation & configuration
  • Resilience (Elasticity)
  • Infrastructure scalability out of the box (Simply add new Mesos slaves)

Marathon: Running and scaling cloud applications

Marathon is an Apache Mesos framework for long-running applications. Given that you have Mesos running as the kernel for your datacenter, Marathon is the “init” or “upstart” daemon. Marathon provides a REST API for starting, stopping, and scaling applications. Like Mesos, Marathon can also run in highly-available mode by running multiple copies. The state of running tasks gets stored in the Mesos state abstraction (source: Marathon documentation).

Figure 5 – Marathon running apps through Mesos

Marathon offers the following benefits:

  • Automated startup in the Cloud
  • Application scalability (only a RESTful API call)
  • Application fault-tolerance out of the box
  • Rolling upgrades
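To give an impression of what starting and scaling an application through a REST API involves, here is a Python sketch that only builds the request bodies and sends nothing. The field names (id, cpus, mem, instances, container) and the /v2/apps paths follow Marathon's v2 REST API as I understand it, so check them against the documentation of the version you run. The application id and image name are made up.

```python
import json
from typing import Tuple


def marathon_app(app_id: str, image: str, cpus: float,
                 mem_mb: int, instances: int) -> dict:
    """Build an app definition intended for POST /v2/apps."""
    return {
        "id": app_id,
        "cpus": cpus,          # CPU share per instance
        "mem": mem_mb,         # memory per instance in MB
        "instances": instances,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


def scale_request(app_id: str, instances: int) -> Tuple[str, str, dict]:
    """Return (method, path, body) for changing the instance count of an app."""
    return "PUT", f"/v2/apps/{app_id.strip('/')}", {"instances": instances}


if __name__ == "__main__":
    app = marathon_app("/shop/web", "example/shop-web:1.0",
                       cpus=0.5, mem_mb=512, instances=2)
    print(json.dumps(app, indent=2))
    print(scale_request("/shop/web", 5))
```

Scaling from two to five instances is then a single small request. Where the additional instances run is decided by the platform and not by the caller.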

Consul: Automatic discovery at scale

Running hundreds or even thousands of dependent applications in the cloud is a great challenge in terms of configuration, monitoring and maintenance.

Consul is a tool for discovering and configuring services in your infrastructure.

Figure 6 – Service discovery with Consul

When, for example, a database server is started in the cloud by Marathon/Mesos, its location is registered in Consul. A web service or a portal can then easily connect to its database by looking up the location of the database server at run time.
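As an illustration of registration and lookup, the Python sketch below builds the data involved without contacting a Consul agent. The payload keys and the paths mentioned in the comments (/v1/agent/service/register, /v1/health/service/<name>) follow Consul's HTTP API as I know it, and the <service>.service.consul name follows its DNS interface, so verify them against your Consul version. The service name, address and port are invented.

```python
def registration_payload(name: str, address: str, port: int) -> dict:
    """Body intended for PUT /v1/agent/service/register on the local agent."""
    return {
        "ID": f"{name}-{address}-{port}",   # unique per running instance
        "Name": name,
        "Address": address,
        "Port": port,
        # Simple TCP health check so that dead instances drop out of lookups.
        "Check": {"TCP": f"{address}:{port}", "Interval": "10s"},
    }


def dns_name(service: str, domain: str = "consul") -> str:
    """Name that a client can resolve through Consul's DNS interface."""
    return f"{service}.service.{domain}"


def healthy_instances_path(service: str) -> str:
    """HTTP path that lists only instances whose health checks pass."""
    return f"/v1/health/service/{service}?passing"


if __name__ == "__main__":
    # The database registers itself after being started somewhere in the cloud.
    print(registration_payload("shop-db", "10.0.0.5", 3306))
    # The portal does not need a configured host name, only the service name.
    print(dns_name("shop-db"))
    print(healthy_instances_path("shop-db"))
```

The portal's configuration then contains a stable service name instead of a host address, so it keeps working when the database is restarted on another machine.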

Consul has many other benefits:

  • Service discovery and Key/Value Store for dynamic configuration
  • Health checking for dynamic monitoring
  • Multi datacenter support
  • RESTful API
  • DNS interface

Elastic monitoring

With this level of complexity and automation, it is no longer practical to rely on classic monitoring and troubleshooting methods and tools.

Elasticsearch, Logstash and Kibana form a group of innovative and scalable products for shipping, filtering, storing and searching applications’ log files. What gets collected is configurable, so your DevOps teams can decide what should be monitored and how.

Figure 7 – Monitoring with ElasticSearch

With Logstash, all logs are shipped to a central service, where they are stored and indexed in Elasticsearch. This step is essential for searching for correlated and complex events/incidents inside your datacenter.
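Once logs are indexed centrally, finding related events becomes a query rather than a login to many machines. The Python sketch below builds such a query in Elasticsearch's JSON query DSL without sending it. The message and @timestamp fields are the ones Logstash commonly writes; the service field is an assumption about how your logs are tagged, and the exact query syntax differs between Elasticsearch versions.

```python
import json


def error_search(service: str, text: str, since: str = "now-15m") -> dict:
    """Build a search body for recent log lines of one service."""
    return {
        "query": {
            "bool": {
                # Full-text match on the log message.
                "must": [{"match": {"message": text}}],
                # Exact filters: which service, and how far back in time.
                "filter": [
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 50,
    }


if __name__ == "__main__":
    # Intended for a search request against the log indices,
    # for example POST /logstash-*/_search.
    print(json.dumps(error_search("shop-web", "connection timeout"), indent=2))
```

Dropping the service filter turns the same query into a search across all applications, which is how incidents that span several services can be correlated.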

Kibana offers a highly configurable user interface on top of Elasticsearch. Your DevOps teams can monitor and dig for events much more easily and quickly than with classic methods.

Elasticsearch provides the following features out of the box:

  • Distributed full-text search engine
  • Scalable architecture
  • Multitenant-capable
  • RESTful web interface

Security – Threat and defence models

Information security is the practice of defending information from unauthorised access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction (source: Wikipedia).

Our goal is to achieve:

  • Confidentiality
  • Integrity (maintaining and assuring the accuracy and completeness of data over its entire life-cycle). This means that data cannot be modified in an unauthorised or undetected manner

Identifying and defining possible threats is specific to every situation, so I think that there is no silver-bullet solution. Your DevOps teams should design security to fit your threat model.

Focusing on the tools mentioned above, possible threats include:

  • Non-cluster members access data
  • Manipulating cluster state
  • Fake service registration
  • Denial of service against a cluster node/agent

The above-mentioned tools come with specific, configuration-based defence solutions out of the box. They address:

  • Confidentiality with encryption
  • Authenticity with key/certificate for each node/agent
  • Authorisation with access control lists
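As an example of how access control lists can counter a threat such as fake service registration, here is a small Python sketch. It is a generic illustration and not the ACL implementation of any of the tools above. The tokens, prefixes and the rule that the longest matching prefix wins are assumptions chosen for this example.

```python
from typing import Dict, List, Tuple

# token -> list of (service name prefix, policy); all values are invented.
ACL_RULES: Dict[str, List[Tuple[str, str]]] = {
    "token-shop-team": [("shop-", "write")],
    "token-monitoring": [("", "read")],   # empty prefix matches every service
}


def may_register(token: str, service: str) -> bool:
    """Allow a registration only if the most specific matching rule grants write."""
    rules = ACL_RULES.get(token, [])      # unknown tokens have no rules: default deny
    matches = [(prefix, policy) for prefix, policy in rules
               if service.startswith(prefix)]
    if not matches:
        return False
    _, policy = max(matches, key=lambda rule: len(rule[0]))
    return policy == "write"


if __name__ == "__main__":
    print(may_register("token-shop-team", "shop-db"))
    print(may_register("token-monitoring", "shop-db"))
    print(may_register("stolen-or-unknown", "shop-db"))
```

The important design choice is the default: a request without a known token, or outside the prefixes granted to it, is rejected rather than accepted.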

Additionally, classic security zones (Admin, Private, and Public) offer further defence measures against security threats.

Roles in next generation DevOps

This architecture supports one more agility accelerator. Infrastructure teams focus on installing and maintaining the platform described in this article. They create APIs (e.g. start, stop, deploy, scale, destroy) for development teams.

Developers take care of the applications’ details. Moreover, they can deploy, run and monitor the applications themselves. Development teams are then responsible for healthy operations of their applications.

This very clear separation of concerns should result in even higher speed, better quality and hence lower costs.

Orchestrating and scaling agility

Over the years, I have developed the view that business agility must apply a 360° concept, where business and IT are synchronised and move together to realise the business vision and objectives faster and more safely.

Figure 8 – Scaling agility with autonomous teams and cloud-ready architecture

That said, agility is not only a methodology such as Scrum, LeSS (Large Scale Scrum) or Kanban. It is rather a set of values, principles and practices. In this article, we focused on the support we get from a cloud-ready architecture that enables scaling and orchestrating agility. Figure 8 presents this idea at a high level.

This article is an attempt to present a resilient and highly scalable applications cloud from a very high level. I hope you find it valuable. Your feedback will make me happy. Please contact me if you have any questions or complaints.

Read more about Walid Farag’s DevOpsCon 2015 presentation here.


Walid Farag

Walid is an Agile coach and systems architect. His focus is on scaling agile software development with Kanban Ace and LeSS (Large Scale Scrum). Walid also focuses on Cloud Ready Architecture and Containers (Docker). His aim is to design the best-performing continuous delivery pipeline while reaching the maximum utilisation of the infrastructure.
