Myths and realities behind security analytics
Security software vendors tend to exaggerate the potential of their tools, to put it mildly. Using real-world attack examples, Niara’s Karthik Krishnan explains why so many enterprise systems are being breached and what cyber security technology is really capable of.
While attending a recent trade show, I realized that much of the misunderstanding about what security analytics can (and cannot) do is perpetuated by security vendors.
At the show, attendees were flooded with signage proclaiming fantastical feats of analytics and machine learning – promises of what each could do to help relieve customers of their security ills:
- User behavior analytics for 99.99 percent false positive reduction.
- User behavior analytics on authentication logs such as Active Directory (AD) and virtual private network (VPN) alone will automatically detect compromised users.
- Automatic, real-time threat detection using machine learning.
- SIEMs promised to aggregate disparate data sources and perform analytics on them, and failed miserably. Therefore, the problem must be in looking at multiple data sources. We will focus on just network traffic, apply machine learning to it and detect breaches in real time.
These claims are laughably incorrect, and dangerous because they sell customers a false bill of goods. Such proclamations create unrealistic expectations of what a solution actually delivers, setting customers up to fail in their security analytics initiatives. To understand why, let’s first look at the world of analytics using a recent real-world attack. The table below walks through the attack, outlining the stages the attacker went through, the purpose of each stage, and the analysis and data sources needed to gain insight into that stage.
Unfortunately for organizations, the multi-stage attack detailed below is increasingly the rule rather than the exception. To explain why so many breaches are happening, in spite of organizations investing heavily in cyber security technology, I’ve outlined four top-line challenges with discovering threats.
Most signals are weak, and generating an alert for each of these weak signals creates a deluge that Security Operations Center (SOC) teams and analysts cannot keep up with.
The promise of real-time detection of multi-stage attacks using machine learning implies that one can identify an ongoing breach by flagging variances at any single stage of the kill chain. However, if alerts are generated for every individual event, you could have thousands of alerts each day, many of which will be false positives.
This problem is called “underfitting”, which occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. It is often a result of an overly simple model, showing high bias and low variance. Your SOC or analysts would be chasing after ghosts through the deluge of alerts.
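To make the alert deluge concrete, here is a minimal sketch of the arithmetic. The event volume, false positive rate and number of malicious events are assumed figures chosen for illustration, not measurements from the attack discussed here, and the function names are hypothetical.

```python
def expected_alerts(events_per_day, false_positive_rate,
                    malicious_events=0, detection_rate=1.0):
    """Expected daily (false, true) alert counts for a single-stage detector."""
    benign_events = events_per_day - malicious_events
    false_alerts = benign_events * false_positive_rate
    true_alerts = malicious_events * detection_rate
    return false_alerts, true_alerts


def alert_precision(false_alerts, true_alerts):
    """Fraction of alerts that point at real malicious activity."""
    total = false_alerts + true_alerts
    return true_alerts / total if total else 0.0


# Assumed figures, for illustration only.
false_alerts, true_alerts = expected_alerts(
    events_per_day=1_000_000,
    false_positive_rate=0.001,   # 0.1 percent of benign events are flagged
    malicious_events=5,
)
print(f"false alerts/day: {false_alerts:.0f}")
print(f"true alerts/day:  {true_alerts:.0f}")
print(f"precision:        {alert_precision(false_alerts, true_alerts):.2%}")
```

Even with a detector that looks accurate on paper, these assumed inputs yield roughly a thousand false alerts a day alongside a handful of true ones, which is the situation that leaves analysts chasing ghosts.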
The alternative to the above is to build a complicated model that takes into account multiple stages by using machine learning to detect variances along the kill chain. This route only creates an alert if there is a combination of command and control activity followed by internal reconnaissance.
Because it is more complicated, such a model often lends itself to “overfitting”, meaning the model or algorithm fits the data it was built on too closely and generalizes poorly to new data. Overfitting often occurs when models are excessively complicated, showing low bias and high variance.
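The combined rule described above, which alerts only when command and control activity is followed by internal reconnaissance, can be sketched as a simple sequence check. This is an illustrative sketch, not any vendor's implementation: the stage labels "c2" and "recon", the seven-day window and the sample events are all assumptions.

```python
from datetime import datetime, timedelta


def correlated_alerts(events, window=timedelta(days=7)):
    """Alert only when a user's recon event follows a c2 event within `window`.

    `events` is an iterable of (timestamp, user, stage) tuples, where the
    stage labels "c2" and "recon" are hypothetical names for this sketch.
    """
    last_c2 = {}   # most recent command-and-control time seen per user
    alerts = []
    for timestamp, user, stage in sorted(events):
        if stage == "c2":
            last_c2[user] = timestamp
        elif stage == "recon" and user in last_c2:
            if timestamp - last_c2[user] <= window:
                alerts.append((user, last_c2[user], timestamp))
    return alerts


# Made-up sample events.
sample_events = [
    (datetime(2016, 3, 1, 9, 0), "alice", "c2"),
    (datetime(2016, 3, 3, 14, 0), "alice", "recon"),   # follows c2 within 7 days
    (datetime(2016, 3, 2, 10, 0), "bob", "recon"),     # recon with no prior c2
    (datetime(2016, 3, 1, 8, 0), "carol", "c2"),
    (datetime(2016, 3, 20, 8, 0), "carol", "recon"),   # outside the window
]
alerts = correlated_alerts(sample_events)
```

Only the first user triggers an alert here. The sketch also shows the fragility of a narrowly tuned rule: an attacker who waits longer than the window, or who skips the reconnaissance stage the rule expects, is never flagged.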
Variable data sources
Understanding what happened in the attack laid out in the table would have required analyzing the victim’s behavior across four data sources: packets, flows, AD logs and Domain Name System (DNS) logs. The claim that meaningful investigations can happen without the range of data sources needed for real visibility is simply not true.
Given these challenges, is an effective solution feasible for the discovery of advanced threats? The simple answer is yes. Analytics can play a pivotal role in shedding light on threats and attacks, and in mitigating the above pain points. But organizations must stop trying to discover these attacks automatically in real time. Doing so means raising alerts for variances seen during every stage of the kill chain, which compounds the alert noise problem for analysts rather than mitigating it.
Gaining insight requires complex tracking and analysis of multiple weak signals applied to diverse data sources and attributed to a specific user over weeks and months, just to comprehend what might have happened and who the compromised user might have been.
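One way to picture this kind of tracking is a per-user score that accumulates weak signals from several data sources over a long lookback window, instead of alerting on each signal. The following is a rough sketch under stated assumptions: the weights, the 60-day window, the ranking rule and the sample signals are all hypothetical, and a real system would be considerably more involved.

```python
from collections import defaultdict
from datetime import date, timedelta


def user_risk_scores(signals, as_of, lookback_days=60):
    """Rank users by weak signals accumulated over a lookback window.

    `signals` is an iterable of (day, user, source, weight) tuples. The
    weights are assumed, illustrative values rather than calibrated scores.
    """
    cutoff = as_of - timedelta(days=lookback_days)
    score = defaultdict(float)
    sources = defaultdict(set)
    for day, user, source, weight in signals:
        if cutoff <= day <= as_of:
            score[user] += weight
            sources[user].add(source)
    # Users seen in more distinct data sources rank first, then by total score.
    rows = [(user, score[user], len(sources[user])) for user in score]
    return sorted(rows, key=lambda row: (row[2], row[1]), reverse=True)


# Made-up signals spread over several weeks and four data sources.
sample_signals = [
    (date(2016, 3, 1), "alice", "dns", 0.2),
    (date(2016, 3, 9), "alice", "flows", 0.3),
    (date(2016, 3, 20), "alice", "ad", 0.2),
    (date(2016, 4, 2), "alice", "packets", 0.4),
    (date(2016, 3, 15), "bob", "dns", 0.5),
]
ranked = user_risk_scores(sample_signals, as_of=date(2016, 4, 10))
```

No single signal in the sample would justify an alert on its own, but the user who shows up across all four sources over several weeks rises to the top of the list an analyst reviews.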
Security analytics is not just about discovering the most insidious threats, but also about using machine learning to assist analysts – whether that be in support of their incident investigations, alert prioritization or compromised user discovery needs.