Machine learning at Elasticsearch: In quest of data anomalies
Machine learning is showing up in all sorts of places in tech. These days, it can even be found at work inside search engines. We talked with Shay Banon, Founder & CEO of Elastic, creator of Elasticsearch, about machine learning and its impact on the field of search engines.
JAXenter: As CEO of Elastic and creator of Elasticsearch, you were there from its very beginning. What were the major milestones of Elasticsearch since its initial release in 2010?
Shay Banon: When I first released Elasticsearch, I had one goal: make it simple for a developer to get started, download, and install Elasticsearch on their laptop, load data into it, and get really fast results in milliseconds or less. Today we have more than 130 million downloads of our software and our community has grown to more than 100,000 developers across 100 countries.
While there are lots of individual milestones for Elasticsearch, I’ll highlight a few company milestones that make us who we are today. Early on in 2013, Kibana and Logstash joined forces with Elasticsearch to create the de facto open source logging solution. Just as I built search out of frustration with existing tools, the creators of these projects, Rashid Khan and Jordan Sissel, were also frustrated with off-the-shelf products, and created these projects to help them do their jobs as system and network administrators.
A few years later in 2015, we changed the company name from Elasticsearch to Elastic, as we had become a multi-product, multi-use-case company beyond just search. In the same year, we acquired a Norwegian SaaS company (now Elastic Cloud) so that we could offer users a way to deploy our products in the cloud, and Packetbeat, an open source project based in Berlin, decided to join us.
Last year, as our products extended to Beats, we renamed the popular “ELK” to the Elastic Stack and introduced X-Pack, a single installation for all of our commercial features. Earlier this year, we formed a new partnership with Google to provide Elastic Cloud on GCP, launched Elastic Cloud Enterprise (ECE) for enterprises to deploy and manage multiple Elastic Stack environments on-premise or in a private cloud, and we just acquired a SaaS APM company based in Copenhagen.
JAXenter: Elastic is used by millions of people and a wide array of companies. What, in addition to the fact that it is open source, makes Elasticsearch so popular?
Shay Banon: Elasticsearch was created to put the power of data exploration in the hands of users. There are a lot of things about it that make it popular with developers: it’s easy to get started and one can download it on a laptop; it works great for both structured and unstructured data; Elasticsearch horizontally scales; ingesting data into Elasticsearch is easy with 200+ connectors; Kibana visualizations are intuitive, powerful and provide real-time exploration; and everything works on-premise or in the cloud.
JAXenter: You recently added the first machine learning functions into the Elastic Stack. How does the Elastic Stack benefit from this advancing technology? How does it work?
Shay Banon: Machine learning is a natural extension of the powerful search and analytics capabilities in Elasticsearch. As our users continue to store more and more data in Elasticsearch, machine learning will help them automatically detect and spot anomalies in their data without having to use third party data science tools.
While many methods for machine learning exist, our approach is different. We do not provide a generic machine learning framework for developers to use. Instead, we’ve made machine learning something that is focused on delivering value for a critical use case: time series anomaly detection. For users with time series data in Elasticsearch, a simple installation of X-Pack will allow them to begin working with machine learning. Using a Kibana interface, users can set up and configure machine learning jobs, instantly view results, spot anomalies and drill into probable causes, and use our alerting features to take real-time action.
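To make the idea of time series anomaly detection concrete, here is a minimal sketch in Python. It is not Elastic's algorithm, which the interview does not describe. As a stand-in it uses a simple trailing-window z-score: a point is flagged when it sits far from the mean of the points just before it. The function name `find_anomalies`, the window size, the threshold, and the sample data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the previous
    `window` points by more than `threshold` standard deviations.

    Illustrative stand-in only; not the method used by X-Pack.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical requests-per-minute counts with one obvious spike.
requests_per_minute = [100, 102, 98, 101, 99, 103, 100, 400, 101, 99]
print(find_anomalies(requests_per_minute))  # prints [7]
```

A production system would also have to handle seasonality, trends, and missing data, which a fixed window and threshold do not address.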
As we’ve made machine learning a native feature of the Elastic Stack, in the future, we can apply machine learning to other use cases like application search and APM.
JAXenter: What impact does the integration of this new technology have on the users? What benefits are there to be expected?
Shay Banon: None. As long as users are using the 5.5 release, they can use machine learning and install it with X-Pack in just a few steps. The benefits are boundless. Users can keep ingesting and storing more and more data in Elasticsearch and use machine learning to detect signals and anomalies, and automate alerts. This solves some of the most pressing IT operations and security analytics use cases at scale.
JAXenter: Are there also disadvantages to this new development?
Shay Banon: Today this only works with time series data, such as log files, application and performance metrics, network flows, and financial or transactional data, which is a lot.
JAXenter: How will machine learning influence search engines in the near and distant future?
Shay Banon: Machine learning is already used in major search engines like Google. It is used in ranking, query classification, document understanding, and user classification. Our customers like BlaBlaCar, Expedia, Groupon, Uber, and Yelp use machine learning on top of the data stored in Elasticsearch to drive personalization, offers and monetization strategies, and to give their users the best online or mobile experience. In the future, like we’ve done with time series anomaly detection, we can give customers a way to apply machine learning to other use cases.
JAXenter: Like you said above — earlier this year, you announced your collaboration with the Google Cloud Platform. When should one use GCP and when AWS?
Shay Banon: We believe in giving developers the choice. They should be able to build and run their applications in whatever cloud they want. There are some technical and pricing benefits with each. For us, Elastic Cloud is the same product whether one wants to use GCP or AWS.
It is important to note that Elastic Cloud is not the same product as AWS Elasticsearch Service. We do not support that product nor do we have a partnership with AWS like we do with Google.
JAXenter: Will there be a collaboration with Microsoft Azure and IBM Bluemix, too?
Shay Banon: We already work with both Microsoft and IBM in a big way. Elasticsearch is the search technology within Microsoft Azure and users can spin up clusters of the Elastic Stack on Azure. With IBM, the Elastic Stack is the logging and monitoring solution for Bluemix and is used within IBM Watson. At an open source level, IBM developers are the key contributors to our Kibana globalization efforts. Like we’ve just done with Google, in the future, we can offer Elastic Cloud on Azure or Bluemix/Softlayer.
JAXenter: What does Elastic have in store for the rest of 2017 and next year?
Shay Banon: Earlier this summer, we acquired Opbeat, an application performance management (APM) company, based in Copenhagen. We’re super excited they decided to join our team as they’ve built a wonderful SaaS APM solution for developers to instrument their applications and monitor their code. Like with machine learning, APM is yet another extension of the Elastic Stack. This will provide our users with the ability to have an end-to-end solution for search, logging, metrics, and application monitoring. APM will be part of our open source.
Our major 6.0 release is also coming this fall. It has many new features across the entire Elastic Stack. We’ve created an entirely new upgrade experience for migrating applications to new versions, worked hard in Lucene 7 to make searches even faster and more efficient, created a new Kibana query language called Kuery, developed many new alerting and security features in X-Pack, and will be delivering a user interface to manage Logstash pipelines.