#AboutLastWeek: New releases everywhere! Plus a machine learning gift
Each Monday we take a step back and analyze what has happened in the previous week. Last week we witnessed the launch of Prometheus 1.0, Mesos 1.0 and Spark 2.0, we discovered why Go is a beloved programming language and we dived deep into machine learning.
Prometheus 1.0: “This success was in no way obvious at the time the project began”
Prometheus 1.0 was launched recently, and it delivers a stable API and user interface. In short, “Prometheus 1.0 means upgrades won’t break programs built atop the Prometheus API, and updates won’t require storage re-initialization or deployment changes.” We asked Björn Rabenstein, engineer at SoundCloud and Prometheus core developer, to talk about the features and benefits of Prometheus 1.0 and reveal what’s next for this open-source systems monitoring and alerting toolkit originally built at SoundCloud.
“Prometheus has really paid off for SoundCloud, both in terms of what Prometheus has enabled (running a very complex site reliably) and what Prometheus has saved us (less operational effort to set up and run monitoring and to detect and investigate outages, less money paid to external monitoring providers), not to mention the more vague gains like tech credibility. But this success was in no way obvious at the time the project began. And the investment was huge compared to the size and available resources of the company. I only joined a year after the initial decision to invest in Prometheus, and I like to joke that I would have rejected the project if I had been in charge back then. Sometimes you just have to be bold, which is obviously easy to say in hindsight, when you already know you did the right thing.”
Rabenstein also revealed that one thing Prometheus and Kubernetes have in common is “the idea of labels. Everything is labeled in Kubernetes, and selections can happen along arbitrary label dimensions. The same is true for Prometheus, where time series are labeled (and everything in Prometheus is a time series or acts on time series, including alerts). The labels from Kubernetes easily translate into Prometheus labels. If needed, they propagate through the full stack. The page you receive on your mobile phone may very well feature a label you have assigned to a container at start-up time.”
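The label idea Rabenstein describes can be sketched in a few lines of Python. This is only an illustration of selecting along arbitrary label dimensions, not Prometheus or Kubernetes code: the `select` helper, the metric name and every label name and value below are hypothetical.

```python
from typing import Dict, List

# A "time series" is reduced here to its label set; sample values are omitted.
Labels = Dict[str, str]


def select(series: List[Labels], matchers: Labels) -> List[Labels]:
    """Return every series whose labels equal all of the requested label values."""
    return [
        labels
        for labels in series
        if all(labels.get(name) == value for name, value in matchers.items())
    ]


# Hypothetical series, labeled the way containers might be labeled at start-up.
all_series = [
    {"metric": "http_requests_total", "app": "api", "env": "prod", "zone": "eu"},
    {"metric": "http_requests_total", "app": "api", "env": "staging", "zone": "eu"},
    {"metric": "http_requests_total", "app": "web", "env": "prod", "zone": "us"},
]

# Any label can serve as a selection dimension, alone or combined with others.
prod_series = select(all_series, {"env": "prod"})
eu_api_series = select(all_series, {"app": "api", "zone": "eu"})
```

Because no label is special, the same helper answers "everything in production" and "the API in the EU zone" without any change to how the series are stored.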
Read the entire interview here.
Apache Mesos 1.0 — a bag full of surprises
Vinod Kone, engineer at Mesosphere, wrote in a blog post announcing the release of Mesos 1.0 that this milestone could have been reached “a long time ago,” but they chose to wait and develop new HTTP APIs to make the lives of both framework developers and cluster operators easier. Before the release of Mesos 1.0, there was “the driver-based Framework API used by schedulers and executors and the REST-based HTTP API used by operators and tools.” The problem was that just a few languages had bindings for the driver, which limited the languages that frameworks could be written in. Plus, both the client and the server had to open connections to one another in order to communicate, which meant that it wasn’t the easiest job for clients to run inside containers or behind firewalls.
One of the most exciting parts of the HTTP API is its experimental support for event streams, Kone claims. “Instead of continuously polling the heavyweight /state endpoint, clients (e.g., service discovery systems) can now get events streamed to them directly by the master.”
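To make the difference from polling concrete, here is a minimal Python sketch of a client reading a stream of events. It assumes a length-prefixed framing of the kind the Mesos documentation describes for streaming responses (a decimal byte count, a newline, then the record), and it reads from an in-memory buffer rather than a real master. The `read_records` and `encode` helpers, the event names and the `task_id` field are illustrative assumptions, not the actual Mesos API.

```python
import io
import json
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO) -> Iterator[dict]:
    """Yield JSON events from a stream framed as '<length>\\n<payload>'."""
    while True:
        header = stream.readline()
        if not header:
            return  # the sender closed the stream
        length = int(header.strip())
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("stream ended in the middle of a record")
        yield json.loads(payload)


def encode(event: dict) -> bytes:
    """Frame one event the way read_records expects (used for the demo only)."""
    body = json.dumps(event).encode("utf-8")
    return str(len(body)).encode("ascii") + b"\n" + body


# Stand-in for a long-lived HTTP response body; event contents are made up.
sample = io.BytesIO(
    encode({"type": "SUBSCRIBED"})
    + encode({"type": "TASK_ADDED", "task_id": "demo-1"})
)

events = list(read_records(sample))
```

With a real connection the loop would block on `readline()` until the next event arrives, so the client reacts to each change as it happens instead of repeatedly downloading the full cluster state.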
Spark 2.0 focuses on speed, simplicity and more
Spark 2.0 can be summed up in three words: “easier, faster, and smarter.” The Spark Survey 2015 revealed that Spark users initially chose it because of its ease of use and performance, and Spark 2.0 takes both one step further.
The release puts an emphasis on standard SQL support and on unifying the DataFrame/Dataset API. On the SQL side, Spark’s support has been expanded thanks to a new ANSI SQL parser and subqueries. Since SQL has been one of the main interfaces to Spark, these expanded capabilities greatly reduce the effort of porting legacy applications. On the programmatic side, the team behind Apache Spark has streamlined the APIs: DataFrames and Datasets are unified in Scala/Java, SparkSession serves as a new entry point, the Accumulator API is simpler and more performant, and the DataFrame-based machine learning API emerges as the primary ML API, among other changes. Find all the details here.
“Go isn’t a silver bullet and it might not be a good fit for you or your project”
Go adoption has not always been strong, but that changed after it was adopted by high-profile projects, including Docker. Go has been used by The New York Times and BBC Worldwide, as well as Booking.com, Dropbox, SoundCloud and others. If the pace continues, Go could become the next Java in the enterprise, according to a blog post by Shiju Varghese, a Solutions Architect and published author. We talked to Matt Aimonetti, the co-founder and CTO of Splice, about the benefits of Go, its community and what can be done to improve this programming language.
“Go is a new hot language and its adoption is skyrocketing. However, I started noticing that some considered the language to be an advanced language and therefore not suitable for novice programmers. New programmers are told that they should first learn an ‘easier’ language and then move to Go. In my post, I argued that new developers might actually want to start with Go. I also wanted to make sure developers outside of the community don’t perceive the Go community as an elitist community only reserved to top computer scientists and engineers in big companies. Finally, I’m encouraging the community to create more documentation, posts and books for new programmers.”
Read the entire interview here.
The ABCs of machine learning
Machine learning may sound futuristic, but it’s not. Speech recognition systems such as Cortana and search in e-commerce systems have already shown us the benefits and challenges that go hand in hand with this technology. In our machine learning series we will introduce you to several tools that make all this possible. First stops: CognitiveJ and MLlib, Apache Spark’s scalable machine learning library.
CognitiveJ: What’s under the hood
CognitiveJ is a Java library which gives developers access to a rich collection of powerful image processing features such as facial detection, gender and age identification and person recognition.
CognitiveJ has been written entirely in Java and at its heart uses Microsoft’s Project Oxford services, which are constantly being evolved by some of the leading and smartest researchers in the field. The use cases for CognitiveJ are vast and varied, from photo grouping and recognizing which people appear in images to retrieving emotions from facial images. Personally, I’m working on a Spring Security extension that uses CognitiveJ to provide multi-factor authentication from an Android app which uses the on-device camera to capture an image of the person holding the phone and validate that the person is who they say they are – this would not be possible without CognitiveJ.
What is the idea behind Apache MLlib?
According to Xiangrui Meng, Apache Spark PMC member and software engineer at Databricks, MLlib’s mission is “to make practical machine learning easy and scalable. We want to make it easy for data scientists and machine learning engineers to build real-world machine learning (ML) pipelines. It includes not only fitting models but also stages such as data collection and labelling, feature extraction and transformation, model tuning and evaluation, model deployment, etc. This becomes a very hard problem when people try to solve each stage using different libraries and then chain them together in production (think of different languages, different tuning tips, different data formats, different resource requirements, etc.).
“MLlib, combined with other components of Apache Spark, provides a unified solution under the same framework. For example, one can use Spark SQL to generate training data from different sources and then pass it directly to MLlib for feature engineering and model tuning, instead of using Hive/Pig for the first half and then downloading the data to a single machine to train models in R. The latter is actually very common in practice but painful to maintain. Spark MLlib makes life easier for data scientists and machine learning engineers so that they can focus on building better ML models and applications.”
Meng also revealed that they use Scala to implement core algorithms and utilities in MLlib and expose them in Scala as well as Java, Python, and R. He believes that the future of MLlib lies in the community and that it will “expand quickly and gain more and more contributors and users in the future.”