Building Your Own AIOps Platform is a Bad Idea
Why does DIY AIOps fail and what is the root cause? In many cases, all the time and effort put into a do-it-yourself project simply winds up being wasted. This article looks at how to safely encourage AIOps exploration and measure ROI from AIOps without the risk of failure.
In the wake of the economic downturn brought on by the COVID-19 pandemic, investments in digital business transformation have accelerated. The applications that drive those processes are not only highly distributed, they also operate at a scale that no IT team can manage using legacy approaches. It can take weeks to discover the root cause of an issue.
Enter AIOps. Machine learning algorithms not only reduce the time it takes to resolve an issue, they also enable IT teams to continuously optimize IT environments at any scale. Many aspects of AIOps, however, are still largely unexplored. Rather than opting for proven platforms, some IT teams are building custom solutions in-house.
I have seen firsthand several large enterprises embark on this risky journey, including one Fortune 500 company that asked a partner to help develop a solution and ultimately deployed a commercial product. Enterprise IT teams go down this road to solve a specific tactical problem, such as alert noise reduction. In many cases, all the time and effort put into a do-it-yourself (DIY) project simply winds up being wasted.
Why DIY AIOps Usually Fails
Given the prevalence of open-source AI tools and frameworks such as TensorFlow, Theano or the Microsoft Cognitive Toolkit (CNTK), it can be tempting to build your own custom AIOps platform. It takes considerable expertise, however, to not only build an AIOps platform but also integrate it into an enterprise and maintain it. Here are the leading reasons why in-house developed AI projects are risky:
- You’ll need a properly constructed data lake: AIOps platforms require access to data residing in multiple technology silos in real time. IT teams that build their own AIOps platforms need to make sure they are gathering all the right log data, metrics and traces alongside data collected from IT service and incident management platforms. These comprehensive data sets are needed to train whatever machine learning framework is in place, a framework that is often selected at random. Invariably, that means building or buying a costly Big Data platform to create a data lake to store all that data. A poorly constructed AIOps platform will be worse than the proverbial disease it is meant to cure because its insights won’t accurately reflect what’s actually occurring in the IT environment. Do you have the funds for this, and experienced data scientists on board to get it right?
- Designing AI-enhanced workflows is unlike other workflows: Getting the data is just the beginning. Determining how the system behaves and affects existing workflows is the next step. IT teams must decide to what degree they want the AIOps platform to passively surface recommendations based on what’s observed versus automatically resolving issues based on defined parameters.
- Deployment is complex: After developing a few AIOps algorithms that produce meaningful results, the next step is to determine how to deploy them in a resilient and performant architecture. What other systems do they need to integrate with, and how will results be monitored and viewed?
- Monitoring user impact is critical: How will end users interact with the algorithm and what is the ideal UI/UX and workflow? How will feedback be provided by end users for improvement and adoption success?
- AIOps support and maintenance is not a project, but a team: Ultimately, an internal IT team would need to build the equivalent of a product, which needs ongoing maintenance and support. The total cost of the custom platform starts to rise as the bulk of the IT team could wind up spending most of their time managing the AIOps platform instead of making continual improvements. Even if the IT team has the expertise required to build an AIOps platform, there’s no guarantee those individuals will always be available to maintain and update it. Very few IT professionals spend their entire career at one organization.
- Keeping pace with marketplace innovation: Finally, AIOps as a field is still relatively nascent and the startup community has hundreds of millions of dollars in VC backing to support R&D. Advances are being made at a rate most internal IT teams can’t keep up with, let alone evaluate and vet on their own.
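The workflow decision raised in the list above, whether the platform should passively surface recommendations or automatically resolve issues based on defined parameters, can be sketched as a simple guardrail check. This is only an illustration: the `Finding` fields, the threshold names and their values are all hypothetical and not drawn from any particular AIOps product.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    issue: str
    confidence: float   # model's confidence in its diagnosis, 0.0 to 1.0 (assumed field)
    blast_radius: int   # number of services the proposed fix would touch (assumed field)


# Hypothetical "defined parameters" an IT team might agree on.
MIN_CONFIDENCE = 0.95
MAX_BLAST_RADIUS = 1


def decide(finding: Finding) -> str:
    """Return 'auto_resolve' only when every guardrail is met, else 'recommend'."""
    if finding.confidence >= MIN_CONFIDENCE and finding.blast_radius <= MAX_BLAST_RADIUS:
        return "auto_resolve"
    return "recommend"


print(decide(Finding("disk full on one host", confidence=0.99, blast_radius=1)))
print(decide(Finding("database failover", confidence=0.99, blast_radius=12)))
```

Even a rule this small has to be agreed on, tested and revisited as the environment changes, which is part of the ongoing ownership cost described above.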
How to Safely Encourage AIOps Exploration
There’s no substitute for knowing where an organization needs to go and how to get there. A commercial AIOps platform incorporates all the best practices that have been defined by legions of IT experts, along with these benefits:
- Faster time to value: You can embark on the AIOps journey much sooner. A commercial AIOps platform will begin surfacing insights in a matter of weeks. It will take an internal IT team months to build an equivalent platform with no guarantee of success. Time is better spent on user adoption and adding and refining use cases for business benefit.
- Seasoned experts: A commercial platform provides immediate access not only to a proven framework but also to AIOps experts who can troubleshoot issues and optimize the platform quickly. There’s almost no AIOps challenge they haven’t seen before.
How to Measure ROI from AIOps
Savvy organizations that invest in AIOps are primarily betting on a better way to manage IT that will enable them to accomplish more as a business. The real value proposition of any AIOps platform is that it enables an existing IT team to do more, not just by eliminating rote tasks but also by making it possible to deploy more applications reliably without adding IT staff. It’s worth remembering that the cost of labor continues to be the single biggest IT expense.
The return on investment from an AIOps platform can be easily calculated by measuring:
- The number of incidents resolved in a given period of time;
- The size of the IT operations/incident management staff before and after an AIOps platform is deployed.
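As a rough sketch, the two measurements above can be combined into a single ROI figure. Everything beyond those two inputs is an assumption made for illustration: the cost per staff member, the platform cost, the sample numbers, and the choice to value the gain as the headcount that would have been needed to handle the new incident volume at the old productivity rate.

```python
from dataclasses import dataclass


@dataclass
class Period:
    incidents_resolved: int  # incidents resolved during the period
    staff: int               # IT operations/incident management headcount


def incidents_per_person(p: Period) -> float:
    return p.incidents_resolved / p.staff


def simple_roi(before: Period, after: Period,
               cost_per_person: float, platform_cost: float) -> float:
    """Return ROI as a fraction: (labor savings - platform cost) / platform cost."""
    # Headcount needed to resolve the new volume at the old per-person rate.
    staff_needed_at_old_rate = after.incidents_resolved / incidents_per_person(before)
    avoided_staff = staff_needed_at_old_rate - after.staff
    savings = avoided_staff * cost_per_person
    return (savings - platform_cost) / platform_cost


# Illustrative numbers only, not measurements from any real deployment.
before = Period(incidents_resolved=1200, staff=20)
after = Period(incidents_resolved=1800, staff=20)
roi = simple_roi(before, after, cost_per_person=100_000, platform_cost=500_000)
print(f"ROI: {roi:.0%}")
```

With these made-up inputs, the same 20 people resolve the volume that would previously have required 30, so the avoided labor cost is twice the assumed platform cost.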
Arguably, however, those savings pale in comparison to the opportunity cost of not using advanced technologies that can increase operational intelligence and deliver new business value in the form of excellent user experiences and high-performing digital services. IT organizations that don’t embrace AIOps will soon find themselves unable to compete with faster, more nimble rivals that have modernized their IT processes. The easiest, most affordable and least risky way to go about this is by deploying a solution that is already proven and in market.