Getting started

AI (in a box) for IT Ops – The AIOps 101 you’ve been looking for

Dominic Wellington

AIOps holds out the promise of delivering to busy CIOs and sysadmins the sort of AI support they need to make sense of the ever-growing complexity of their IT environments. But how to get started with AI without getting lost?

For some time now it has been almost a cliché for presentations to open with eye-popping statistics about just how Big all of our Big Data have become – and on the surface, the keynote by Moogsoft’s CEO Phil Tee at the recent AIOps Symposium event was no exception. However, the presentation quickly moved on to more sobering insights about how difficult it is to make practical use of the torrents of information that have become commonplace in our modern world of data overload.

It is all too easy for discussions of AI to remain stuck at the level of generalities, but the point of the presentation was to dive into one particular application of the techniques that we group under the umbrella of “AI”. As the complexity of IT infrastructures increases, it becomes ever harder to understand them and to ensure their availability and performance. Compounding the problem, these IT systems are ever more closely interwoven with business processes. Only 20% of those processes were fully digital in 2010, but in 2018 the figure is already somewhere between 50% and 70%. In a very real sense, IT performance is business performance.

AIOps – Artificial Intelligence for IT Operations – bridges these two concerns, bringing AI techniques to bear in order to understand the colossal volumes of information produced by complex modern IT infrastructures, helping to avoid disruption to the people who rely on IT for more and more of their lives.

What Is AIOps?

If you have not yet heard the term “AIOps”, it’s because it is still relatively recent, coined by Gartner only two years ago. Phil Tee was followed onto the stage by Colin Fletcher, the writer of the original Gartner report in which the field was named: “Innovation Insight for Algorithmic IT Operations Platforms”. The Gartner analyst further emphasized the scale of the challenge for IT Operations organizations. AI is not just a useful technique, but a must-have; according to Gartner, by 2022 a full 30% of IT organizations that fail to adopt AI will no longer be operationally viable.

Both Phil Tee and Colin Fletcher agreed that AIOps is the key to addressing data complexity and bridging the gap between the old, reactive model of IT Operations and the proactive approach which is required in our new, ever more complex world. However, few organizations have already adopted these techniques. According to a recent Gartner survey of CIOs whose results were shared at the AIOps Symposium, only 4% have already invested in and deployed AI and machine learning, although a further cumulative 46% are planning to do so.

If Gartner’s timeline is accurate, many more CIOs will need to engage actively with this field in the next few years. The challenges that they are wrestling with can generally be grouped under the headings of Culture, People, and Process, all of which are hangovers from previous generations of IT infrastructure and related operations techniques. In this type of environment, it can be difficult to know where to start with a field such as AI that is still evolving very rapidly.

Making It Easy To Get Started With AIOps

This is where pre-packaged AIOps solutions such as Moogsoft’s come in. Instead of being required to develop or acquire their own in-house data science skills, organizations can rely on vendors to conduct the basic research which is required, and then integrate the resulting products into their own strategies. As proof of this approach, Phil Tee cited Moogsoft’s 42 patents (granted and pending).

The AIOps Symposium was a showcase for some of the latest fruits of the research by Moogsoft’s in-house team of data scientists. A product demonstration kicked off with a dashboard showing problems in red – so far, so routine. As the scenario continued, though, various forms of AI assistance started to appear. Probable root causes of the problem and possible solutions were identified, based on input from operators in past situations.

Unsupervised machine learning was also present, identifying relationships between impacted nodes and their relative importance in the analysis. A graphical display allowed the operator to dive deeper into the data and relationships in real time. In particular, this demonstration highlighted Moogsoft’s new Observe product, which analyses time-series data in real time directly at the source. The particular demo scenario was cloud-based to highlight Observe’s ability to operate across all environments, detecting a correlation between metrics from an AWS cloud. This approach avoids the problem of data lakes, which Phil Tee memorably described as a sort of Ponzi scheme: “the more data you store, the more you need to extract any value”.
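To make the idea of analysing time-series data at the source more concrete, here is a minimal sketch in Python. It is not Moogsoft's algorithm, which the presentation did not detail; it simply assumes a running mean and variance (Welford's method) with a z-score threshold, so that each metric value can be checked as it arrives without keeping any history. The class name, the threshold of three standard deviations, and the warm-up length are all illustrative assumptions.

```python
import math


class StreamingAnomalyDetector:
    """Flags metric values far from the running mean, storing no history.

    Hypothetical illustration only: the statistics are maintained
    incrementally with Welford's method, so memory use is constant.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed: deviations beyond 3 sigma are anomalous
        self.warmup = warmup        # assumed: samples to observe before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, value):
        """Return True if `value` looks anomalous, then fold it into the statistics."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                anomalous = True
        # Welford's incremental update of mean and squared deviations
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return anomalous


if __name__ == "__main__":
    detector = StreamingAnomalyDetector()
    # Made-up CPU-load readings: steady values followed by one spike
    readings = [10.0, 11.0] * 10 + [50.0]
    for reading in readings:
        if detector.update(reading):
            print(f"anomaly detected: {reading}")
```

Because only three numbers are kept per metric, a check like this could run next to the system producing the data, and only the flagged values would need to be forwarded for correlation with other metrics.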

Acting On The AI’s Recommendations

Of course, identifying the problem is only half the job – perhaps less, at least in terms of elapsed time. There can be enormous amounts of friction involved in gathering together all the specialists whose domains have been impacted by a particular issue and then agreeing on actions to be taken. This is where ChatOps comes in – a technique also highlighted by guest presenters from Qualcomm as an easily accessible route to making automation a seamless part of the resolution process.

The demonstration scenario concluded by flagging the causal event of the incident so that it can be highlighted automatically should a similar situation arise again in the future.

The event then shifted gears from the high-concept opening presentations and forward-looking demonstration to concrete results from users such as Ujet, Ericsson, Change Healthcare, and the previously mentioned Qualcomm, as well as partners such as Cisco and Amazon. A few common themes quickly emerged, in particular the critical need for automation, both to process the huge volumes of data hitting IT teams and to act upon any incidents detected. Intelligence is becoming more distributed in the network, and the organization needs to follow suit, enabling operators to enact the right decisions quickly.

There is a lot of confusion around exactly what AI is and what it is good for, so it is refreshing to see some concrete applications in an area like IT Ops, which is perfectly suited to the sort of data-intensive approach which AI and machine learning enable. Extending coverage from anomalies to metrics, and from the core to the edge, makes Moogsoft’s pre-packaged AIOps solution even more useful and accessible to enterprises wishing to continue to operate effectively in the brave new fast-changing decentralized world.



Dominic Wellington is Director of Strategic Architecture for Europe at Moogsoft, helping companies adopt AIOps to streamline their IT Operations and become more agile and responsive to ever-changing demands. He has been involved in IT operations for a number of years, working in fields as diverse as SecOps, cloud computing, and data center automation. Twitter: @dwellington
