“Transition to microservices while running under full steam is not easy”
SoundCloud’s architecture started as a Ruby-on-Rails monolith, which later had to be broken into microservices to cope with the growing size and complexity of the site. Recently, the company has started to migrate to Kubernetes. We talked to JAX DevOps speakers Fabian Reinartz and Björn Rabenstein about the current Prometheus setup at SoundCloud and what it’s like to monitor a large-scale Kubernetes cluster.
In this interview, Fabian Reinartz, an engineer at CoreOS and one of the Prometheus core developers, and Björn Rabenstein, the team lead of Production Engineering at SoundCloud, both speakers at next week’s JAX DevOps, talk about why Prometheus and Kubernetes are a match made in open-source heaven and describe the current Prometheus setup at SoundCloud, which monitors a large-scale Kubernetes cluster.
JAXenter: SoundCloud has broken up its Ruby-on-Rails-based monolith into microservices. What was the reason for that?
Fabian Reinartz and Björn Rabenstein: It is the old story: Starting with a single, monolithic Ruby-on-Rails application is a great idea for startups. You get something going soon, and you can iterate quickly. But in the unlikely case that your startup is a success and suddenly has a lot of users, you quickly hit scalability limits. Also, growing complexity of the monolith slows down your iteration speed. That happened at SoundCloud. Resource usage hit us left and right, and launching a new feature became a more and more painful and arcane process.
JAXenter: How would you describe your experiences? What worked well, what caused problems?
Fabian Reinartz and Björn Rabenstein: Obviously, a transition to microservices while running under full steam is not easy. We decided against a big bang migration, and remnants of the old monolith, affectionately called “the Mothership”, are still around today. If you want to learn more about the details, there are a number of talks out there to watch, and posts on our tech blog.
For the subject of our talk, the most relevant aspects are (1) how to run the many small applications that are part of a microservice architecture and (2) how to monitor those many small applications in a scalable and meaningful way. For (1), we created Bazooka, a Heroku-like in-house platform to build, deploy, and manage containerized applications. For (2), we created Prometheus. While Bazooka is now being replaced by Kubernetes, Prometheus became an industry-wide success and is now used by more than a hundred organizations.
JAXenter: Why did you opt for Kubernetes?
Fabian Reinartz and Björn Rabenstein: Instead of maintaining and improving Bazooka, we decided last year to replace it with one of the quickly developing open-source alternatives that entered the field after we created Bazooka. We created a gigantic feature matrix to compare our options. Kubernetes only won by a small margin, but it did so as a relatively young project with a lot of anticipated development still ahead of it.
JAXenter: What is Prometheus?
Fabian Reinartz and Björn Rabenstein: Prometheus is an open-source systems monitoring and alerting ecosystem. It was built with modern cloud and container environments in mind and supports multiple types of dynamic service discovery (Kubernetes, Marathon, EC2, etc.). It offers a multi-dimensional data model and a powerful query language. The project’s website has all the details.
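To make the idea of a multi-dimensional data model more concrete, here is a minimal Python sketch. It is not Prometheus code and does not use any Prometheus API; the metric name, label names, sample values, and the helper functions `select` and `sum_by` are all hypothetical and chosen only for illustration. The point it tries to show is that a time series is identified by a metric name plus a set of key/value labels, and that a query can filter and aggregate along any of those labels.

```python
from collections import defaultdict

# Hypothetical samples, one per series: (metric name, labels, current value).
# The names and numbers are invented for this illustration.
SAMPLES = [
    ("http_requests_total", {"service": "api", "instance": "a1", "status": "200"}, 120.0),
    ("http_requests_total", {"service": "api", "instance": "a2", "status": "200"}, 80.0),
    ("http_requests_total", {"service": "api", "instance": "a1", "status": "500"}, 3.0),
    ("http_requests_total", {"service": "search", "instance": "s1", "status": "200"}, 50.0),
]


def select(samples, name, **matchers):
    """Return the samples with the given metric name whose labels equal every matcher."""
    return [
        (n, labels, value)
        for n, labels, value in samples
        if n == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(samples, label):
    """Sum the sample values, grouped by the value of one label."""
    totals = defaultdict(float)
    for _, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Filter on one dimension (service), then aggregate along another (status).
api_samples = select(SAMPLES, "http_requests_total", service="api")
per_status = sum_by(api_samples, "status")

# The same data can be sliced along a different dimension without re-instrumenting.
per_service = sum_by(select(SAMPLES, "http_requests_total"), "service")

print(per_status)
print(per_service)
```

Because the dimensions are plain labels rather than parts of a hierarchical metric name, the same samples answer both "requests per status code for one service" and "requests per service" without any change to how the data was recorded.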
JAXenter: What plans do you have for Prometheus? Where do you see the project, from a development perspective, in the next few months?
Fabian Reinartz and Björn Rabenstein: The Prometheus project has more than 200 contributors and is in very active development. Most of the many components have frequent releases with many enhancements and new features. Also, there is a steadily growing number of third-party integrations, either as part of the Prometheus project itself or externally maintained. Here is the list.
Hot off the press is the new Alertmanager, a complete rewrite of the previous version, which was merely meant as a proof of concept but already revolutionized the way alerting is handled in many organizations. The various ways of integrating with Kubernetes are a current hotspot of development, too. We expect convergence towards a stable feature set in both areas over the coming months.
The upcoming release 0.18 of the Prometheus server, the central component of the ecosystem, features a lot of internal improvements, which are nevertheless visible to the user as they will spectacularly improve performance in certain use cases and increase storage efficiency by a factor of two to three.
Thank you very much!
Fabian Reinartz and Björn Rabenstein will be delivering a talk at JAX DevOps that focuses on the current Prometheus setup at SoundCloud and on monitoring a large-scale Kubernetes cluster.