Elasticsearch 6.0 is here
Elasticsearch 6.0 is here! The team knows that a full cluster restart when upgrading to a new major version is not something users look forward to, so they’ve created a new upgrade experience for migrating applications to new versions. Let’s take a closer look at the 6.0 release.
Look who it is: our readers’ favorite distributed, RESTful search and analytics engine. We’re not just saying that, either. According to the results of our yearly survey, 59 percent of respondents were very excited about Elasticsearch, making it the runner-up. The winner, PostgreSQL, managed to excite 63 percent of respondents about the prospect of using it this year.
Find out more about the results here.
Elasticsearch 6.0 is here
A couple of months ago we talked with Shay Banon, founder and CEO of Elastic and the creator of Elasticsearch, about what’s coming later this year — and implicitly about Elasticsearch 6.0.
[Elasticsearch 6.0] has many new features across the entire Elastic Stack. We’ve created an entire new upgrade experience for migrating applications to new versions, worked hard in Lucene 7 to make searches even faster and more efficient, created a new Kibana query language called Kuery, developed many new alerting and security features in X-Pack, and will be delivering a user interface to manage Logstash pipelines.
Read the entire interview here.
They’ve kept their promise: upgrading is easier than ever. “You can now do a rolling upgrade (without any cluster downtime) from the latest Elasticsearch 5.x (currently 5.6.3) to Elasticsearch 6.x,” according to the blog post announcing the 6.0 release.
However, there are a few exceptions, such as using X-Pack Security without SSL/TLS enabled. TLS between nodes is required by X-Pack Security in 6.0, and if you aren’t already using it, the only way to enable it is to do a full cluster restart. You should read the Stack upgrade docs before you begin the upgrade process.
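Under the hood, a rolling upgrade is bracketed by a couple of cluster-settings calls. Here is a minimal sketch of the request bodies involved, assuming the standard procedure from the upgrade docs; the HTTP client and host are left out, and only the JSON payloads sent to PUT _cluster/settings are shown:

```python
import json

# Step 1: before stopping a node, disable shard allocation so the cluster
# does not start copying shards around while the node is offline.
disable_allocation = {
    "transient": {"cluster.routing.allocation.enable": "none"}
}

# Step 2: stop the node, upgrade it, restart it, and wait for it to rejoin.

# Step 3: re-enable allocation (a null value removes the transient
# override) and wait for the cluster to go green before moving on to
# the next node.
enable_allocation = {
    "transient": {"cluster.routing.allocation.enable": None}
}

# Python's None serializes to JSON null, which is what the API expects.
print(json.dumps(disable_allocation))
print(json.dumps(enable_allocation))
```

Repeating the stop/upgrade/restart cycle node by node, with allocation disabled around each restart, is what keeps the cluster available throughout.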
One of the stars of the Elasticsearch 6.0 release is sequence IDs, which allow for operations-based shard recovery. In short, each shard can replay just the operations it is missing, making the recovery process much more efficient. Previously, the process was long and costly: if a node disconnected from the cluster due to a network problem or a restart, each shard on the node had to be resynced by comparing segment files with the primary shard and copying over any segments that differed.
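The idea can be illustrated with a toy model — purely illustrative Python, not Elasticsearch’s actual implementation: the primary tags every operation with a monotonically increasing sequence number, and a rejoining replica asks only for the operations above its local checkpoint instead of copying whole segments.

```python
class Primary:
    def __init__(self):
        self.seq_no = -1
        self.translog = []          # retained (seq_no, operation) pairs

    def index(self, op):
        self.seq_no += 1
        self.translog.append((self.seq_no, op))

    def ops_since(self, checkpoint):
        """Operations a replica is missing, given its local checkpoint."""
        return [(s, op) for s, op in self.translog if s > checkpoint]


class Replica:
    def __init__(self):
        self.checkpoint = -1        # highest seq_no applied so far
        self.ops = []

    def recover_from(self, primary):
        # Replay only the missing operations instead of resyncing segments.
        for s, op in primary.ops_since(self.checkpoint):
            self.ops.append(op)
            self.checkpoint = s


primary = Primary()
replica = Replica()
for doc in ["a", "b"]:
    primary.index(doc)
replica.recover_from(primary)       # replica catches up on "a" and "b"

primary.index("c")                  # written while the replica was away
replica.recover_from(primary)       # replays only "c", not everything
```

The cheap case — a brief disconnect followed by a small replay — is exactly what used to trigger an expensive segment-file comparison.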
Sparsely populated fields get a major improvement
“Doc-values provide a fast columnar data store — it’s part of the magic that makes aggregations so fast in Elasticsearch.” Now, you only pay for what you use: while the amount of space used by dense fields remains unchanged, sparse fields will be significantly smaller. The result is that disk space usage and merge times are reduced, and query throughput is improved because the file system cache can be better utilized.
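A back-of-the-envelope sketch of why this matters — the numbers are illustrative, not measurements. Conceptually, the old dense layout reserved a doc-values slot for every document in a segment, while the new layout only stores entries for documents that actually have a value:

```python
num_docs = 1_000_000
docs_with_field = 1_000          # field present in 0.1% of documents

dense_slots = num_docs           # old layout: one slot per doc, mostly empty
sparse_slots = docs_with_field   # new layout: one entry per doc with a value

print(f"dense layout:  {dense_slots:>9} slots")
print(f"sparse layout: {sparse_slots:>9} slots")
```

For indices with many mostly-empty fields — a common pattern when several document types share an index — the savings add up across every sparse field.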
Index sorting

“Imagine that you have a large search-heavy index. Searches should be super-fast, but a significant part of every search request is sorting the results into the correct order in order to return just the top 10 best hits.” The biggest benefit of index sorting is that you pay the price of sorting at index time (a 30-40% hit to indexing throughput) rather than at search time. The result is that a search can terminate as soon as it has gathered sufficient hits.
However, if you want to use index sorting, you should know that your documents must be sorted at index time in the same order as your primary sort criterion at search time, so it won’t work well when your primary sort is on the relevance _score. And if you want to use it for searches with aggregations, forget about it: aggregations have to examine all documents regardless and can’t terminate early.

The non-obvious benefit of index sorting is that sorting on low-cardinality fields such as is_published (commonly used as filters) can result in more efficient searches, since all potentially matching documents are grouped together.
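Enabling index sorting is a matter of index settings at creation time. In the sketch below, the index.sort.* keys are the real setting names, while the index and field names are made up for illustration; the dict would be the body of the PUT request that creates the index:

```python
import json

# Hypothetical index sorted first by the low-cardinality is_published
# flag, then by publish_date, both descending. The sort fields must be
# indexed fields with doc values; index sorting cannot be changed after
# the index is created.
create_index_body = {
    "settings": {
        "index": {
            "sort.field": ["is_published", "publish_date"],
            "sort.order": ["desc", "desc"]
        }
    }
}

# Body for: PUT /my_articles
print(json.dumps(create_index_body, indent=2))
```

Because the sort order is fixed per index, it pays to choose the field(s) your most common queries sort or filter on.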
Searches are more scalable
Searches across many shards have been made more scalable by adding:
- A fast pre-check phase which can immediately exclude any shards that can’t possibly match the query.
- Batched reduction of results to reduce memory usage on the coordinating node.
- Limits to the number of shards which are searched in parallel, so that a single query cannot dominate the cluster.
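These improvements surface as tunable search-request parameters. In the sketch below, the parameter names are real query-string parameters (their defaults may differ by version), while the index name and the chosen values are illustrative:

```python
from urllib.parse import urlencode

# Knobs corresponding to the three scalability improvements above:
params = {
    "pre_filter_shard_size": 128,        # shard count above which the pre-check phase kicks in
    "batched_reduce_size": 512,          # shard results reduced per batch on the coordinating node
    "max_concurrent_shard_requests": 5,  # cap on shards searched in parallel per node
}

# A search across a wildcard index pattern hits many shards, which is
# exactly where these limits matter.
url = "/logs-*/_search?" + urlencode(params)
print(url)
```

Most users can leave these at their defaults; they exist so that a single many-shard query cannot monopolize the coordinating node or the cluster.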