Apache CloudStack allows DevOps teams to seamlessly provision compute, storage, and network resources
One of CloudStack’s greatest benefits is that it performs all IaaS functions in one integrated, reliable software platform; users often describe it with the tagline “It just works”. We invited Giles Sirett, PMC member of the Apache CloudStack project, to dissect the project and tell us what’s next for CloudStack.
JAXenter: What is the idea behind Apache CloudStack?
Giles Sirett: Apache CloudStack is a proven, multi-tenant infrastructure control plane that orchestrates the provisioning and management of network, storage, and compute components. It exposes an HTTP API that allows system administrators to provision and manage data centers programmatically. This API has been used to integrate CloudStack’s infrastructure management with configuration management tools (e.g. Ansible, Puppet, Chef, Terraform), allowing DevOps teams to seamlessly provision compute, storage, and network resources and deploy an application onto them.
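To make the HTTP API more concrete, here is a minimal Python sketch of building a signed request. It assumes the signing scheme described in the CloudStack API documentation (URL-encode values, sort parameters by name, lowercase the query string, HMAC-SHA1 it with the user's secret key, Base64-encode the digest). The endpoint URL and key values are placeholders, and the helper names `sign_request`, `build_url`, and `call_api` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request


def sign_request(params: dict, api_key: str, secret_key: str) -> str:
    """Return a signed query string for a CloudStack API call (sketch)."""
    all_params = dict(params, apikey=api_key)
    # URL-encode each value (spaces become %20) and sort by field name.
    query = "&".join(
        f"{name}={urllib.parse.quote(str(value), safe='')}"
        for name, value in sorted(all_params.items(), key=lambda kv: kv[0].lower())
    )
    # HMAC-SHA1 over the lowercased query string, keyed by the secret key.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


def build_url(endpoint: str, params: dict, api_key: str, secret_key: str) -> str:
    """Combine the API endpoint with a signed query string."""
    return f"{endpoint}?{sign_request(params, api_key, secret_key)}"


def call_api(endpoint: str, params: dict, api_key: str, secret_key: str) -> dict:
    """Send the signed request and decode a JSON response (needs a live server)."""
    with urllib.request.urlopen(build_url(endpoint, params, api_key, secret_key)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder endpoint and credentials; substitute real values.
    print(build_url(
        "http://localhost:8080/client/api",
        {"command": "listZones", "response": "json"},
        api_key="YOUR_API_KEY",
        secret_key="YOUR_SECRET_KEY",
    ))
```

Configuration management tools typically wrap exactly this kind of signed call, so the same request works whether it comes from a script, Ansible, or Terraform.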
The CloudStack control plane automates the efficient allocation and balancing of datacenter resources. It subdivides the infrastructure into partitions (i.e. regions, zones, pods, and clusters) that compartmentalize failure modes to increase application resilience. The following goals guide its design and evolution:
- Infrastructure Normalization: CloudStack provides a set of core abstractions that provide a vendor agnostic interface to infrastructure components such as hypervisors, network appliances, container runtimes, and storage devices. These abstractions allow uniform management of infrastructures composed of a diverse set of component types.
- Control and Data Plane Separation: CloudStack strictly separates operations that control infrastructure components (e.g. creating a virtual machine, mounting a volume, taking a snapshot) from data transfer. This design ensures the liveness of automation control by preventing large, high-latency transfers from blocking the delivery and processing of low-latency management operations.
- Operational Simplicity: CloudStack shifts user focus from infrastructure maintenance to service capability by emphasizing ease of deployment and upgrade. For example, CloudStack dynamically scales out I/O intensive operations (e.g. network routing and secondary storage transfer) by automatically deploying virtual machines to handle the load.
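CloudStack enforces control/data plane separation architecturally (for example, by handing storage transfers to dedicated virtual machines). The in-process Python analogy below only illustrates the principle: short control operations and bulk transfers are queued on separate worker pools, so a backlog of transfers cannot starve management operations. `Dispatcher` and its methods are hypothetical names, not CloudStack code.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Dispatcher:
    """Hypothetical dispatcher with separate control-plane and data-plane pools."""

    def __init__(self, control_workers: int = 4, data_workers: int = 2) -> None:
        # A backlog of bulk transfers can exhaust only the data pool,
        # never the workers that serve short management operations.
        self._control = ThreadPoolExecutor(max_workers=control_workers)
        self._data = ThreadPoolExecutor(max_workers=data_workers)

    def submit_control(self, fn, *args) -> Future:
        """Queue a short, latency-sensitive operation (e.g. start a VM)."""
        return self._control.submit(fn, *args)

    def submit_data(self, fn, *args) -> Future:
        """Queue a long-running transfer (e.g. copy a template)."""
        return self._data.submit(fn, *args)

    def shutdown(self) -> None:
        self._control.shutdown(wait=True)
        self._data.shutdown(wait=True)


if __name__ == "__main__":
    release = threading.Event()

    def copy_template(name: str) -> str:
        release.wait()  # stands in for a slow, large transfer
        return f"copied {name}"

    def start_vm(vm_id: str) -> str:
        return f"started {vm_id}"

    d = Dispatcher(data_workers=1)
    copies = [d.submit_data(copy_template, f"tmpl-{i}") for i in range(3)]
    # The control operation completes while every transfer is still blocked.
    print(d.submit_control(start_vm, "vm-1").result(timeout=1))
    release.set()
    print([f.result() for f in copies])
    d.shutdown()
```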
In 2008, inspired by Amazon EC2, cloud.com created the CloudStack project to provide private companies with a private, self-service Infrastructure-as-a-Service (IaaS). As originally conceived, low-churn virtual machine workloads were the primary compute model. Since then, CloudStack has evolved to provision and manage bare metal servers to deploy hypervisors and big data platforms (e.g. Apache Hadoop, Riak, and Apache Spark). Today, support for high-churn, containerized workloads is being added by embedding container management platforms (e.g. Kubernetes, Docker Swarm, Mesos DC/OS). Built on CloudStack’s core abstractions, these additional compute models leverage a battle-tested orchestration model that communicates over a diverse set of network topologies and storage infrastructures.
In 2011, Citrix acquired cloud.com, and in 2012 it donated the trademarks and source code to the Apache Software Foundation (ASF). This donation transferred stewardship of the project from a vendor to a community, providing users with the transparency necessary to drive the project’s direction. Anyone may contribute to the Apache CloudStack community, and all contributions are considered on their merit rather than on the amount of money contributed. Today, hundreds of contributors develop new features and assist users via Slack, IRC, and mailing lists that carry thousands of messages per month.
CloudStack underpins the service offerings of many large scale public cloud providers, such as Datapipe, BT, and Exoscale. Enterprises, such as the University of Sao Paulo, also rely on CloudStack to increase their operational agility and reduce IT costs.
JAXenter: Can you tell us more about what is under this project’s hood?
Giles Sirett: The CloudStack Management Server (MS) hosts the control plane including workload management, image (template/ISO) indexing, and virtual network management. It orchestrates management of the underlying compute, storage and network devices. Internally, the management server exposes a set of core abstractions and orchestration services which use plugins to interface with devices. Plugins are packaged as Java Archives (JARs) and injected into the Management Server via Spring at runtime. Using this plugin model, CloudStack integrates a broad range of hypervisors, network devices, and storage platforms. Finally, control plane communication is isolated at layer 2 to provide robust and secure multi-tenancy.
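CloudStack's real plugins are Java JARs injected by Spring; since this sketch uses Python, a decorator-based registry stands in for Spring injection. It shows the general pattern of an orchestrator coding against one vendor-agnostic interface while device-specific plugins are looked up by type at runtime. `HypervisorDriver`, the driver classes, and `driver_for` are hypothetical and do not mirror CloudStack's actual Java interfaces.

```python
from abc import ABC, abstractmethod


class HypervisorDriver(ABC):
    """Hypothetical vendor-agnostic interface the orchestrator codes against."""

    @abstractmethod
    def start_vm(self, vm_id: str) -> str: ...

    @abstractmethod
    def stop_vm(self, vm_id: str) -> str: ...


# Maps a hypervisor type to the plugin class that handles it.
_REGISTRY: dict[str, type[HypervisorDriver]] = {}


def plugin(hypervisor_type: str):
    """Class decorator that registers a driver under a hypervisor type."""
    def wrap(cls: type[HypervisorDriver]) -> type[HypervisorDriver]:
        _REGISTRY[hypervisor_type] = cls
        return cls
    return wrap


@plugin("KVM")
class KvmDriver(HypervisorDriver):
    def start_vm(self, vm_id: str) -> str:
        return f"kvm: started {vm_id}"

    def stop_vm(self, vm_id: str) -> str:
        return f"kvm: stopped {vm_id}"


@plugin("XenServer")
class XenServerDriver(HypervisorDriver):
    def start_vm(self, vm_id: str) -> str:
        return f"xenserver: started {vm_id}"

    def stop_vm(self, vm_id: str) -> str:
        return f"xenserver: stopped {vm_id}"


def driver_for(hypervisor_type: str) -> HypervisorDriver:
    """Look up and instantiate the plugin for a host's hypervisor type."""
    try:
        return _REGISTRY[hypervisor_type]()
    except KeyError:
        raise ValueError(f"no plugin registered for {hypervisor_type!r}") from None


if __name__ == "__main__":
    for hv in ("KVM", "XenServer"):
        print(driver_for(hv).start_vm("vm-42"))
```

The orchestration code only ever calls `driver_for(...)` and the interface methods, so adding support for a new device type means registering one more class rather than changing the core.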
As depicted in the diagram, the Management Server allocates resources to the following partition types based on their failure/recovery model:
- Region: One or more datacenters in a geographic area
- Zone: The combination of a power source and Internet uplink within a datacenter. A datacenter may be composed of multiple zones if multiple power sources and Internet uplinks are available. A region contains one or more zones.
- Pod: One or more racks that share a top-of-rack switch. A zone contains one or more pods.
- Cluster: A grouping of hosts with the same hypervisor and, typically, the same hardware configuration. A cluster is also the unit of division for primary storage pools. A pod is composed of one or more clusters.
A Host represents a physical server that runs a hypervisor or scale-out/big data platform. Using this partitioning model, applications that support distributed operation can be deployed in a manner resilient to power and Internet outages as well as switch, server, and storage pool failures.
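The partition hierarchy above can be modeled as a path (region, zone, pod, cluster) per host. The hypothetical helper below, not CloudStack's own deployment planner, greedily spreads application replicas so that no two share a partition at a chosen level, which is how the hierarchy translates into resilience against pod- or zone-level failures.

```python
from dataclasses import dataclass

# Partition levels from largest to smallest failure domain.
LEVELS = ("region", "zone", "pod", "cluster")


@dataclass(frozen=True)
class Host:
    name: str
    region: str
    zone: str
    pod: str
    cluster: str


def spread(hosts: list[Host], replicas: int, level: str = "pod") -> list[Host]:
    """Greedily pick hosts so that no two replicas share a partition at `level`."""
    depth = LEVELS.index(level) + 1
    chosen: list[Host] = []
    used: set[tuple[str, ...]] = set()
    for host in hosts:
        # Compare full paths, so pod "p1" in two different zones is distinct.
        path = tuple(getattr(host, lvl) for lvl in LEVELS[:depth])
        if path in used:
            continue
        used.add(path)
        chosen.append(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError(f"only {len(chosen)} distinct {level}s for {replicas} replicas")
```

Choosing `level="zone"` trades capacity for tolerance of power or uplink loss, while `level="pod"` only guards against a top-of-rack switch failure.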
To increase capacity and operational resilience, multiple Management Servers can be clustered to distribute orchestration and device monitoring work. Because infrastructure operations can take significant wall-clock time to complete (e.g. copying a large template from secondary to primary storage, or snapshotting a large volume), the Management Server dispatches these operations as persistent jobs that are performed asynchronously. If a Management Server fails while one or more long-running operations are in progress, the other cluster members assume ownership of the jobs initiated by the downed Management Server.
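From the API client's side, asynchronous commands return a job ID that can be polled with `queryAsyncJobResult`. The Python sketch below illustrates the server-side idea described above under stated assumptions: jobs are persisted (here in an in-memory SQLite table) before work starts, each records its owning Management Server, and a survivor can take over a failed server's unfinished jobs. The `jobs` schema and the helper functions are hypothetical, not CloudStack's actual database layout.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    command  TEXT NOT NULL,
    owner_ms TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'IN_PROGRESS'
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def submit(db: sqlite3.Connection, command: str, owner_ms: str) -> int:
    """Persist a job before work starts, so it survives a server crash."""
    with db:  # commits the transaction on success
        cur = db.execute(
            "INSERT INTO jobs (command, owner_ms) VALUES (?, ?)", (command, owner_ms)
        )
    return cur.lastrowid


def complete(db: sqlite3.Connection, job_id: int, status: str = "SUCCEEDED") -> None:
    with db:
        db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))


def take_over(db: sqlite3.Connection, dead_ms: str, survivor_ms: str) -> int:
    """Reassign a failed server's unfinished jobs; return how many moved."""
    with db:
        cur = db.execute(
            "UPDATE jobs SET owner_ms = ? WHERE owner_ms = ? AND status = 'IN_PROGRESS'",
            (survivor_ms, dead_ms),
        )
    return cur.rowcount


def jobs_owned_by(db: sqlite3.Connection, ms: str) -> list[tuple[int, str]]:
    """List (id, command) of unfinished jobs owned by a Management Server."""
    rows = db.execute(
        "SELECT id, command FROM jobs WHERE owner_ms = ? AND status = 'IN_PROGRESS' ORDER BY id",
        (ms,),
    )
    return rows.fetchall()
```

Only unfinished jobs are reassigned; completed jobs keep their original owner as a record of who ran them.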
JAXenter: Can you give us an example of a typical use-case?
Giles Sirett: The use-cases for Apache CloudStack fall into two categories:
- Public cloud providers and service providers
- Enterprise users
There are many hundreds of public cloud providers who use CloudStack to deliver IaaS to their customers. These range from massive, global providers to small ISPs that need to adopt a cloud model. Because CloudStack is an open source project with a liberal license, it is difficult to know all of its users, but large-scale public cloud providers such as Datapipe, British Telecom, China Telecom, Interoute, and Exoscale rely on CloudStack to deliver their IaaS service offerings. We have a list of known users at https://cloudstack.apache.org/users.html.
Public cloud providers choose CloudStack because it provides all of the capabilities required to deliver multi-tenant IaaS services in a cohesive, reliable platform that is easily deployed and managed. It does not require months of engineering time to integrate and deploy because its components are “pre-integrated”. CloudStack also has a tightly defined scope: orchestrating core infrastructure components. The community therefore focuses on making CloudStack a robust, composable infrastructure control plane that other systems can easily integrate with. It’s possible to shut CloudStack down completely and let the infrastructure continue operating without interruption.
We loosely define our other category of users as “Enterprise users” or “non-service-providers”. These are organizations that wish to automate and orchestrate their infrastructure internally. The drivers for this need vary across organizations: sometimes it’s to act as the basis for a CI/CD environment, sometimes it’s to underpin a PaaS platform, and sometimes it’s to automate the build of large-scale test environments. A number of very large Fortune 500 companies use CloudStack on this basis.
JAXenter: What is the gap that CloudStack fills? What is its key feature?
Giles Sirett: The unique feature of CloudStack is that it performs all IaaS functions in one integrated, reliable software platform. Users often describe CloudStack with the tagline “It just works”, and the community takes great pride in this reputation. Orchestrating infrastructure and managing a virtualized networking model is a critical operational function, and we perform it with high reliability and operational simplicity. It’s no secret that CloudStack is constantly compared to OpenStack (and the media and analysts always like to frame this as some form of race). While the two have similar feature sets, they employ different architectures and operational models. Rather than requiring operators to deploy and maintain 15 different pet components, the CloudStack control plane uses one component deployed across multiple nodes for scalability and resilience. CloudStack can also run different workload types side by side on the same infrastructure, for example deploying an ERP system into one set of VMs and running Kubernetes in another set of VMs to manage containers for mobile applications. If physically co-locating these workloads would present a technical or policy issue, CloudStack’s host tagging feature can be used to allocate them to separate hosts. For developers, multiple workload support is a powerful tool for migrating systems to cloud native models and/or maintaining a diverse portfolio of systems.
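Host tagging can be pictured as a simple filter: an offering that names a tag may only be placed on hosts carrying that tag, which keeps, say, ERP VMs and Kubernetes VMs on disjoint hosts. The Python sketch below is a simplified, hypothetical model of that check; `Host`, `Offering`, and `eligible_hosts` are illustrative names, not CloudStack's placement code, and the real matching rules may be richer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Offering:
    name: str
    host_tag: Optional[str] = None  # None: the offering may use any host


def eligible_hosts(hosts: list[Host], offering: Offering) -> list[Host]:
    """Return the hosts an offering may be placed on under tag matching."""
    if offering.host_tag is None:
        return list(hosts)
    return [h for h in hosts if offering.host_tag in h.tags]


if __name__ == "__main__":
    hosts = [Host("h1", {"erp"}), Host("h2", {"k8s"}), Host("h3", {"erp"})]
    erp = Offering("erp-large", host_tag="erp")
    k8s = Offering("k8s-worker", host_tag="k8s")
    print([h.name for h in eligible_hosts(hosts, erp)])
    print([h.name for h in eligible_hosts(hosts, k8s)])
```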
JAXenter: Which parts of CloudStack need to be improved?
Giles Sirett: Our broad developer community does a great job of keeping our feature set current; however, we do struggle to gain the attention (that we think we deserve) from the market (or rather, from marketing). In turn, that means vendors only want to integrate with CloudStack when their customers demand it. We would like to see more networking and storage vendors engage with CloudStack so that we can integrate with their enhanced functionality.
JAXenter: What does the future hold for CloudStack?
Giles Sirett: As the industry focus moves up the stack from IaaS, the need for an infrastructure orchestrator that “just works” has become critical. Emerging models such as containers, PaaS, and FaaS cannot operate without the provisioning of core compute, network, and storage services. These models also require a control plane capable of handling high-churn workloads, where resources are rapidly provisioned, consumed, and discarded. To support these high-churn, high-density workloads, the control plane will be further disambiguated and automatically distributed by the Management Server across the available compute resources.
CloudStack will also gain increased visibility into infrastructure performance. Heuristically analyzing this detailed telemetry data will allow the control plane to increase resource density and proactively recognize and isolate failures. We’ve already started to see the first signs of this direction with the recent addition of host power management and the upcoming HA service that will use it to reboot hung servers. Our roadmap is also crammed with other items that will start to position CloudStack as the de facto, reliable infrastructure orchestrator, such as network service chaining and a pluggable, distributed scheduler capable of managing a wider range of compute types (e.g. containers, batch job execution, NoSQL databases). As these features mature, the scope of the CloudStack control plane will grow into a complete data center management platform.
The community has also invested heavily in improving its development agility and release cadence. The competitive nature of CloudStack users’ businesses requires rapid adaptation to dynamic customer demand. Every two months, the community makes a feature release; an LTS release is cut every six months and supported for 20 months. The largest engineering challenge currently facing the community is testing the myriad possible environment components and system configurations employed across the user base. This problem requires tremendous compute power to solve properly. To address it, we are developing a distributed testing system, much like SETI@Home, that allows anyone with a CloudStack instance to contribute to testing the system.
Thank you very much!