Apache Mesos 1.5: Improved performance and Container Storage Interface support
Apache Mesos 1.5 is here! What new features does this open source cluster management tool have to offer? Significant improvements to resource management, increased storage, and Container Storage Interface support, for starters.
It’s been a while since we last heard from Apache Mesos, but the project has used that time to make some real improvements to this open source cluster management tool. Apache Mesos 1.5 brings better resource management, new storage support, faster performance, and more.
Container Storage Interface
One of the more exciting parts of Mesos 1.5 is its improved storage support, built on the Container Storage Interface (CSI). While it is still experimental in 1.5, it is expected to be fully operational in future releases.
Along with support from the community, representatives from Mesos, Kubernetes, Cloud Foundry, and Docker have developed a plug-in that will work with all container orchestration platforms. Instead of building vendor-specific code, they developed an interface that will remain consistent across storage vendors.
As explained by long-time Mesos committer Jie Yu, the Container Storage Interface is a “system that allows storage vendors to plug into Mesos using a consistent interface. Then we don’t need to maintain code for each vendor, which requires specific expertise.”
Additionally, projects like Apache Mesos have a number of vendors, increasing the difficulty of managing various release cycles. Instead, they have gone with an elegant “out-of-tree” solution for storage.
Here’s a look at what the high-level architecture of the CSI in Mesos would look like.
Containers continue to be an important part of Apache Mesos. Mesos 1.5 adds support for container image garbage collection: unused image layers can be garbage collected both automatically and manually, which helps users avoid unbounded disk space usage in the Docker image store.
Mesos 1.5 also boasts a set of new operator APIs for launching and managing a new primitive called the “Standalone Container”. It is similar to a container launched by a framework on the Mesos agent, except that it is launched directly on the agent by the operator, cutting out the framework middleman. Standalone Containers do not use a Mesos executor and are limited in some cases, but can still use common containerization features.
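To illustrate the idea of collecting unused layers, here is a toy sketch in Python. It is not Mesos code: the image names, layer IDs, and the `unused_layers` helper are all made up for illustration, and the real containerizer tracks far more state.

```python
# Toy model of image-layer garbage collection (not Mesos code).
# Each image is a list of layer IDs; layers can be shared between images.

def unused_layers(store, images_in_use):
    """Return the layer IDs that no in-use image references."""
    all_layers = {layer for layers in store.values() for layer in layers}
    live_layers = {layer for name in images_in_use for layer in store[name]}
    return all_layers - live_layers

# Hypothetical image store: names and layer IDs are invented.
store = {
    "nginx:1.13": ["base", "libs", "nginx"],
    "redis:4.0": ["base", "redis"],
    "old-app:0.1": ["base", "libs", "old-app"],
}

# Only nginx is still running, so layers unique to the other images can go.
reclaimable = unused_layers(store, images_in_use={"nginx:1.13"})
print(sorted(reclaimable))  # ['old-app', 'redis']
```

Shared layers such as `base` survive because a live image still references them, which is why collection works per layer rather than per image.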
While Standalone Containers have been released in support of the Container Storage Interface, the APIs are not restricted and can be used on their own.
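As a rough idea of what an operator request for a Standalone Container might look like, the Python sketch below only builds a JSON body and prints it. The call type `LAUNCH_CONTAINER` and the field names are assumptions based on the v1 agent operator API and should be checked against the official API documentation; the container ID, command, and resource amounts are invented.

```python
import json

def launch_container_call(container_id, shell_command, cpus, mem_mb):
    """Build a JSON body for a standalone container launch (fields assumed)."""
    return {
        "type": "LAUNCH_CONTAINER",  # assumed call type, verify in the docs
        "launch_container": {
            "container_id": {"value": container_id},
            "command": {"value": shell_command},
            "resources": [
                {"name": "cpus", "type": "SCALAR", "scalar": {"value": cpus}},
                {"name": "mem", "type": "SCALAR", "scalar": {"value": mem_mb}},
            ],
        },
    }

# Hypothetical values: no framework or executor is involved in this request.
body = launch_container_call("my-standalone-container", "sleep 100", 0.1, 32)
print(json.dumps(body, indent=2))
```

Nothing is sent over the network here; an operator would post a body like this to the agent's operator API endpoint.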
Other improvements in Apache Mesos 1.5
Other notable improvements in Apache Mesos 1.5 include a fix to a fairly irritating operator problem. All previous releases of Mesos made it impossible to change the configuration of an agent. If you wanted to do so, you had to kill all the tasks running on that agent and restart with a brand new agent ID. The smallest of changes would lead to a “nasty looking error”, even if it was something as minimal as an additional attribute or newly connected hard drive.
Thankfully, this has been mitigated in Mesos 1.5. Now, operators can use the new agent command-line flag --reconfiguration_policy to configure which types of operations are allowed on agents and which should lead to errors. Agents are now capable of tolerating additions or changes without losing the plot.
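The kind of check such a policy implies can be sketched in a few lines of Python. This is a toy model, not the agent's actual logic: the `is_allowed` function and the dictionary layout are hypothetical, and the policy names `equal` and `additive` are used here as labels, so consult the agent flag documentation for the values your release accepts.

```python
def is_allowed(old, new, policy):
    """Decide whether an agent could keep its ID after a config change."""
    if policy == "equal":
        # Strict mode: any difference at all is rejected.
        return old == new
    if policy == "additive":
        # Existing attributes must be unchanged; new ones may appear.
        attrs_ok = all(new["attributes"].get(key) == value
                       for key, value in old["attributes"].items())
        # Every existing resource must still be there in at least the
        # same amount; extra resources are fine.
        res_ok = all(new["resources"].get(key, 0) >= amount
                     for key, amount in old["resources"].items())
        return attrs_ok and res_ok
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical agent configs: an attribute and some disk were added.
old = {"attributes": {"rack": "r1"}, "resources": {"cpus": 8, "disk": 1000}}
new = {"attributes": {"rack": "r1", "ssd": "true"},
       "resources": {"cpus": 8, "disk": 2000}}

print(is_allowed(old, new, "equal"))     # False
print(is_allowed(old, new, "additive"))  # True
print(is_allowed(new, old, "additive"))  # False: this direction removes things
```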
A big focus of the Mesos 1.5 release is performance. Master failover throughput has improved by 450-600%, leading to an 80-85% reduction in time-to-completion. Mesos is architected to use a centralized master with standby masters that participate in a quorum for high availability, and the leading master stores the state of the cluster in memory.
Rebuilding the master’s in-memory state can be expensive for large clusters. So, a number of high impact changes including protobuf 3.5.0 support, copy elimination in the master, and copy elimination in libprocess have led to this dramatically improved performance. See the working group progress report for more details.
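The throughput and time-to-completion figures quoted above describe the same gain from two angles, and a quick calculation shows they roughly line up. The sketch assumes a fixed amount of failover work and reads "improved by N%" as N% more throughput than before, which is an interpretation rather than something the release notes spell out.

```python
def time_reduction(throughput_gain_pct):
    """Fraction of time saved when throughput rises by the given percent."""
    speedup = 1 + throughput_gain_pct / 100  # e.g. a 450% gain is 5.5x
    return 1 - 1 / speedup

for gain in (450, 600):
    print(f"{gain}% more throughput -> {time_reduction(gain):.1%} less time")
# 450% more throughput -> 81.8% less time
# 600% more throughput -> 85.7% less time
```

Under that reading, the throughput range lands close to the quoted 80-85% reduction in time-to-completion.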
As for resource management, several quota guarantee improvements were made in release 1.5. Mesos 1.5 does a better job of ensuring that a role receives its quota and that a role does not exceed its quota. For example, accounting for reservations prevents roles from “gaming” the quota system. Resources are also now allocated in a fine-grained manner to prevent roles from exceeding their quota.
And finally, Windows support has improved significantly, as have the CPU and memory statistics. The Mesos fetcher has been ported to Windows and libprocess can now be built on Windows with OpenSSL support.
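A toy calculation makes both quota points concrete. The `grant` function below is a hypothetical simplification, not the Mesos allocator: it charges reservations against the quota and hands out only the remaining headroom instead of everything that is available.

```python
def grant(quota, allocated_unreserved, reserved, available):
    """Return how many CPUs a role may receive from what is available."""
    # Reservations count against quota even when idle, so a role cannot
    # sidestep its quota by reserving resources first.
    charged = allocated_unreserved + reserved
    headroom = max(0.0, quota - charged)
    # Fine-grained allocation: hand out only the headroom, not the
    # whole chunk of available resources.
    return min(available, headroom)

# Hypothetical role: quota of 10 CPUs, 4 allocated, 3 reserved, 8 on offer.
print(grant(quota=10.0, allocated_unreserved=4.0, reserved=3.0, available=8.0))
# 3.0

# Ignoring the reservation would have left 6 CPUs of headroom instead.
print(grant(quota=10.0, allocated_unreserved=4.0, reserved=0.0, available=8.0))
# 6.0
```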
Getting Apache Mesos 1.5
The full changelog is available here. Upgrading from a Mesos 1.4 cluster is straightforward. There is a detailed upgrade guide available here. If you’re interested in getting involved, Apache Mesos relies on community support and welcomes all help.