Sing louder and improve your search analytics

How to supercharge Elasticsearch with Siren

Giovanni Tummarello
©  Shutterstock / 13_Phunkod

Search has evolved from information retrieval into big data analytics. In this article, Giovanni Tummarello explains how you can improve your search analytics with Siren for data visualization, joins across backends, and more!

Today, Elasticsearch is one of the most widely deployed open-source search and analytics engines. Its core support company, Elastic Co., is now publicly listed and valued in excess of $6.5B. To understand its success, one has to understand how information retrieval has proven over time to be naturally well suited to analytics and to “giving the big picture”. This complements its original mission of finding individual documents.

As search turned into real-time, interactive analytics, the race was on for Elasticsearch to become an engine where all sorts of structured and unstructured information could not only be stored and searched, but also instantly analyzed as a whole. Now, database records, transactions, and machine logs recording website visits and low-level operations can all be a source of analysis, along with any source of text!

If I were to make an educated guess, only 15% of companies today use Elasticsearch for pure search. The other 85% focus on its large-scale analytics and retrieval capabilities over machine-generated, structured, or semi-structured logs. There is no end to this need in any sector: every time a machine, application, firewall, or web server produces a log, anybody who wants to analyze that log needs a combination of search and analytics, which Elasticsearch can provide.

Typical analytics in Elasticsearch 

Using Elasticsearch, you can compute aggregations along different dimensions, such as time, document properties, or whether a record matches a certain query. You can then plot those aggregations as graphs, for example pie charts, histograms, or time series.

For example, in a telephony data center, you may have peak requests at 3 PM. If you want to understand how that peak breaks down, you can split it by customer or by infrastructure segment. This type of analysis, together with the underlying ability to scale by adding servers, is what makes Elasticsearch a popular choice.
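As a rough illustration of the telephony breakdown above, the following Python sketch builds an Elasticsearch search body that counts requests per hour and splits each hour by customer. The field names `timestamp` and `customer_id` are assumptions made for this example, not part of any real schema, and the body would still need to be sent to a cluster's search endpoint.

```python
import json


def peak_breakdown_query(day: str, customer_field: str = "customer_id") -> dict:
    """Build a search body: requests per hour on `day`, split by customer.

    `timestamp` and `customer_field` are hypothetical field names.
    """
    return {
        "size": 0,  # return aggregations only, no individual documents
        "query": {
            # Restrict to a single day using Elasticsearch date math.
            "range": {"timestamp": {"gte": day, "lt": f"{day}||+1d"}}
        },
        "aggs": {
            "per_hour": {
                # Recent Elasticsearch versions use `calendar_interval`;
                # older releases expect `interval` instead.
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "hour",
                },
                "aggs": {
                    # Within each hourly bucket, the ten busiest customers.
                    "by_customer": {
                        "terms": {"field": customer_field, "size": 10}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(peak_breakdown_query("2019-03-15"), indent=2))
```

Each hourly bucket in the response would then carry its own list of customer buckets, which is the shape a stacked histogram or time series chart needs.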

The evolution of Elasticsearch

Elasticsearch is currently released as an open source tool under the Apache 2.0 License. However, it is evolving and becoming more commercial. While you can still use the free version, the commercial product is strongly controlled by Elastic. If you want to put password protection in front of your search, for example because of GDPR considerations, you need to upgrade to the commercial offering. Alerts, cross-cluster replication monitoring, and prediction are also features that are available only in the commercial version.


OK, but what can’t it do?

So how can I improve Elasticsearch? For one, Elasticsearch cannot perform joins across indexes. Sure, you can look up an individual value across two or more indexes and see if it appears in both. That’s a simple join. Unfortunately, you cannot ask “of all the callers on March 15, how many also called on March 16? Was the call shorter on average?” Without joins, it becomes harder to find the relationships that could provide valuable data. The use cases for joining the dots at scale are endless.
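To make the caller question concrete, here is a minimal Python sketch of the join that would otherwise have to be done in application code after pulling both days of records out of the search engine. The records and the field names `caller`, `day`, and `seconds` are made up for illustration, and this is not how Siren implements joins; it only shows the logic the question requires.

```python
from statistics import mean

# Hypothetical call records, as they might be exported from an index.
CALLS = [
    {"caller": "alice", "day": "03-15", "seconds": 300},
    {"caller": "bob", "day": "03-15", "seconds": 120},
    {"caller": "carol", "day": "03-15", "seconds": 60},
    {"caller": "alice", "day": "03-16", "seconds": 100},
    {"caller": "bob", "day": "03-16", "seconds": 80},
    {"caller": "dave", "day": "03-16", "seconds": 200},
]


def repeat_callers(calls, first_day, second_day):
    """Return the set of callers that appear on both days (the join key)."""
    first = {c["caller"] for c in calls if c["day"] == first_day}
    second = {c["caller"] for c in calls if c["day"] == second_day}
    return first & second


def average_duration(calls, callers, day):
    """Average call length on `day`, restricted to the given callers."""
    durations = [
        c["seconds"]
        for c in calls
        if c["day"] == day and c["caller"] in callers
    ]
    return mean(durations) if durations else None


if __name__ == "__main__":
    both = repeat_callers(CALLS, "03-15", "03-16")
    print(sorted(both))                          # callers seen on both days
    print(average_duration(CALLS, both, "03-15"))  # their average on day one
    print(average_duration(CALLS, both, "03-16"))  # their average on day two
```

This works for a handful of records, but it requires moving every candidate record to the client first, which is exactly what stops scaling once the indexes hold millions of documents.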

In addition, Elasticsearch tooling is not geared towards producing a visual knowledge graph representation, that is, a relational graph showing how records are connected across indexes. Visualizing complex scenarios or events where the related data is spread across several indexes is currently not among its use cases.

When is this important? Well, take Security Information and Event Management (SIEM) applications, for example. Elasticsearch would seem ideal for storing the huge volume of real-time logs that network appliances and security products create. As it stands, however, it lacks the critical ability to investigate across logs and across different backend systems.

Siren supercharges Elasticsearch

At Siren, we have a lifelong passion for search engines, starting over ten years ago with the first Siren (Semantic Information Retrieval) engine. Back then, it was an extension of the Apache Solr search engine. Since Elasticsearch came on the scene, we’ve been super excited about the innovation it has brought and the potential for extending it.

Out of this work comes the Siren Platform, which extends the core ELK (Elasticsearch, Logstash, and Kibana) capabilities with the ability to join the dots across indexes and different backends. Siren even lets you join with data that isn’t in Elasticsearch!

A semantic data model on top of Elasticsearch

Conceptually, Siren works by first allowing you to define a semantic data model, or ontology, that specifically fits your data and use case. Concepts are defined and can be mapped to indexes or to keys in indexes (e.g. an IP address, a social security number, a user ID, etc.).

This is an example of a relational data model defined to meaningfully tie together indexes that were previously disconnected:


On the left, we have the data indexes: these are Elasticsearch indexes or indexes that Siren can access on other systems. In this case, we have streams coming from different security systems, which are then interconnected by the data model by specifying which fields connect to which concepts.
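The idea of connecting index fields to shared concepts can be sketched in a few lines of Python. This is only a conceptual illustration: the index names, field names, and the `FieldMapping` structure are hypothetical and do not reflect Siren's actual data model format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """States that a field in an index holds values of a given concept."""
    index: str
    field: str
    concept: str


# Hypothetical mappings for three security log indexes.
MAPPINGS = [
    FieldMapping("firewall-logs", "src_ip", "IPAddress"),
    FieldMapping("proxy-logs", "client_ip", "IPAddress"),
    FieldMapping("proxy-logs", "username", "User"),
    FieldMapping("auth-logs", "user", "User"),
]


def relations(mappings):
    """Derive cross-index links: fields in different indexes sharing a concept."""
    by_concept = {}
    for m in mappings:
        by_concept.setdefault(m.concept, []).append(m)

    links = []
    for concept, members in by_concept.items():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if a.index != b.index:  # only links between different indexes
                    links.append(
                        (concept, f"{a.index}.{a.field}", f"{b.index}.{b.field}")
                    )
    return links


if __name__ == "__main__":
    for concept, left, right in relations(MAPPINGS):
        print(f"{left} <-> {right} (via {concept})")
```

Once such links exist, previously disconnected indexes become navigable: a set of records in one index can be followed to the related records in another through the shared concept.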

This data model powers Siren Investigate, a UI which goes beyond standard drilldowns to allow people to relationally connect the data from one dashboard to another via navigation or visual link analysis. This is a critical function in any deep data investigation, ranging from cyber to operational and business questions.


In the figure below, the Elasticsearch cluster – as well as data in other sources – is queried by the Siren unified data model to provide relational pivoting and visual link analysis to solve a possible case of online review fraud:


At the frontend, thanks to the data model and join capabilities, Siren effectively merges search, analytics, relationally connected dashboards, and link analysis to provide a game-changing experience for operational and investigative use cases.



Deployment and enterprise security with Siren

At the deployment level, supercharging Elasticsearch is simply a matter of installing the Siren Federate plugin into new or existing Elasticsearch clusters. With Federate, Elasticsearch gains low-level, high-performance, cross-index join (correlation) capabilities without compromising the performance of its usual operations. Federate is also the engine that enables the platform to expose virtual Elasticsearch indexes, which in practice reflect live data in other backends, such as databases or any infrastructure that responds to SQL.

For those who do not have security installed, Siren includes enterprise-grade access control and alerting that can make use of cross-index and cross-backend join conditions.


In conclusion

Elasticsearch is the most modern, open source incarnation of search engine technology. Today, it provides very exciting capabilities for a number of big data use cases. But is your data complexly interconnected? Are data relationships absolutely critical for your use cases?

If so, it’s now possible to supercharge Elasticsearch with Siren. The result is something extremely innovative: the confluence of advanced link analysis, BI, big data and operational intelligence monitoring, data discovery, and search, all under one roof.



Giovanni Tummarello, Ph.D., is founder and Chief Product Officer at Siren. Siren was born out of the academic work of the team led by Tummarello in his previous capacity as research lead at the National University of Ireland, Galway. In this role he authored about 100 scholarly works on knowledge graphs, semantic technologies, and information retrieval, and spun out technologies (e.g. the top-level “any23” Apache project) and start-up companies that have raised funds and are commercially successful.
