“AIOps will play a key role in enhancing the security of IT infrastructure”
We spoke with Guy Fighel, General Manager & GVP Product Engineering at New Relic about AI and AIOps. What role does artificial intelligence play when it comes to monitoring and observability? What are the best practices when implementing AI within your team?
JAXenter: Hello Guy and thanks for taking the time for this interview. At the Future of AI conference, you spoke about AI, AIOps, and the role of artificial intelligence when it comes to Noise Reduction. So for starters, what exactly is “Noise Reduction”?
Guy Fighel: Today, with the proliferation of IT monitoring tools, the volume of daily alerts an SRE or DevOps team has to deal with is often in the tens of thousands. But the problem is more than just seeing the forest for the trees: The world of SRE and DevOps teams is all about fast responses. The ability to quickly diagnose and resolve a problem can mean thousands of dollars or clicks.
Overwhelming IT noise means that IT Ops and DevOps teams are flooded with false positives (aka ‘symptom’ alerts) on an everyday basis, making the identification of root cause nearly impossible. And, in order to deal with this overwhelming situation, organizations often filter alerts so that only those deemed high-severity (commonly known as P0 or P1 issues) reach the responding team. This creates a blind spot in the organization’s operational visibility, since low-severity alerts are often precursors to the high-severity ones, leading to what everyone hates – alert fatigue. Reducing this ‘noise’ for IT operations teams and helping teams prioritize alerts and find signals through the noise is more important than ever.
JAXenter: The term DevOps is by now, 11 years after it was first mentioned, widely known, as are derivatives like DevSecOps, BizDevOps, and so on. What exactly is AIOps?
Guy Fighel: DevOps is all about improving the way teams work in order to ship software faster, more frequently, and with greater reliability. That means being able to respond quickly when problems occur that may impact customer experience or service level objectives (SLOs).
In the past few years, a new category of technology has emerged that puts AI and machine learning (ML) in the hands of on-call teams so they can prevent more incidents and respond to them faster. Gartner originally coined the term “AIOps” (Artificial Intelligence for IT Operations) to describe this space. AIOps is all about empowering on-call engineers with the help of AI & ML to detect problems earlier, diagnose and understand the root causes faster, and drive automation to fix them.
At New Relic, we believe AIOps capabilities are a key requirement for observability. By providing a connected, real-time view of all telemetry data in one place, observability makes it possible to pinpoint issues faster, understand not only when an issue occurs but why, and get context to quickly analyze and proactively take action on that data.
AIOps gives on-call engineers the ability to more proactively detect anomalies, group and correlate alerts and incidents to reduce noise, and diagnose and respond to incidents faster by enriching incidents with intelligence and context.
JAXenter: How does AIOps help with the aforementioned Noise Reduction?
Guy Fighel: To overcome this IT operational noise, New Relic recently made New Relic AI generally available to all customers. It’s a comprehensive solution that automatically detects anomalies and learns from your incident, alert, and event data, as well as from team feedback, to intelligently suppress alerts teams don’t care about and correlate related incidents, all with minimal configuration, training, or onboarding. Customers already using New Relic AI have reported automatic noise reductions in excess of 80%, along with more streamlined and useful alerts, without needing weeks or months to get up and running and see value.
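Editor's note: to make the idea of correlating related alerts more concrete, here is a minimal Python sketch. It is not New Relic AI's algorithm, which the interview does not describe. It assumes a hypothetical `Alert` record and a simple rule: alerts from the same service that arrive within a fixed time window of the previous one are folded into a single group, so an on-call engineer would see one incident rather than many symptom alerts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    timestamp: float  # seconds since some reference point
    service: str
    message: str


def correlate(alerts: List[Alert], window_seconds: float = 300.0) -> List[List[Alert]]:
    """Group alerts that share a service and arrive close together in time.

    An alert joins the most recent group for its service if it arrives
    within window_seconds of that group's last alert; otherwise it
    starts a new group.
    """
    groups: List[List[Alert]] = []
    latest_group: Dict[str, List[Alert]] = {}  # service -> most recent group

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = latest_group.get(alert.service)
        if group is not None and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            latest_group[alert.service] = group
    return groups


if __name__ == "__main__":
    sample = [
        Alert(0, "checkout", "latency high"),
        Alert(30, "search", "error rate high"),
        Alert(60, "checkout", "error rate high"),
        Alert(120, "checkout", "queue depth high"),
        Alert(1000, "checkout", "latency high"),
    ]
    for group in correlate(sample):
        print(group[0].service, len(group), "alert(s)")
```

A real system would correlate on far richer signals (topology, message similarity, user feedback); the fixed window and single grouping key here are simplifying assumptions.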
JAXenter: What role does artificial intelligence play when it comes to monitoring and observability?
Guy Fighel: As the complexity of operating production systems increases, software teams need faster and easier ways to resolve issues. They need assistance and automation that augments their existing incident management teams and workflows, so they can find and fix problems faster. At New Relic, we believe observability should be a part of the software development life cycle, and it should be part of the culture, just like DevOps.
New Relic utilizes its access to raw monitoring data to fuel ML models and enable an intelligent, context-rich incident response workflow. Our product, New Relic AI, augments the value customers get from monitoring by providing an intelligent feed of incident information alongside telemetry, and by applying AI and ML to analyze and take action on that data, so you can detect, troubleshoot, and respond to problems faster.
Overall, observability allows teams to achieve new breakthroughs in visibility and service levels. By combining observability with AIOps strategies, teams improve uptime and performance, reduce toil, and accelerate their pace of innovation.
JAXenter: How should organizations implement and involve AI in their teams? Are there any best practices?
Guy Fighel: AIOps can dramatically improve the IT organization’s ability to be an effective partner to the business. An IT operations platform with built-in AIOps capabilities like New Relic AI can help IT operations proactively identify potential issues with the services and technology it delivers to the business and correct them before they become problems.
IT organizations should first prepare their workflows and infrastructure for an AI-driven strategy. Streamlined data management and collection is another prerequisite for AI in IT operations: easy access to data enables a team to better identify correlations between application performance and the IT infrastructure that drives it. Finally, when implementing AIOps for teams, carefully manage staff expectations and explain that AI is intended to augment, not eliminate, jobs. Our experience has been that AIOps technology doesn’t replace human work; it augments humans by eliminating mundane toil and enabling people to focus more of their time on higher-value work.
JAXenter: What security risks do you see in that regard?
Guy Fighel: I strongly believe AIOps will play a key role in enhancing the security of IT infrastructure. Through the application of machine learning algorithms, potential risk events will be detected before they occur and in turn, successfully avoided.
By applying AI to security management, IT teams can detect a variety of breaches and violations. Advanced machine learning algorithms can be used to identify unexpected and potentially unauthorized and malicious activity within the infrastructure.
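Editor's note: as a rough illustration of flagging unexpected activity, the sketch below applies a rolling z-score to a series of counts, for example failed logins per minute. This is a deliberately simple statistical stand-in, not the machine learning approach any particular product uses; the function name, the baseline size, and the threshold are all assumptions made for the example.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(values: List[float], baseline_size: int = 10,
                   threshold: float = 3.0) -> List[int]:
    """Return indexes of values that deviate strongly from recent history.

    Each value is compared with the mean and sample standard deviation of
    the baseline_size values immediately before it. A value is flagged when
    it lies more than `threshold` standard deviations from that mean.
    """
    anomalies: List[int] = []
    for i in range(baseline_size, len(values)):
        baseline = values[i - baseline_size:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical failed-login counts per minute, with one obvious spike.
    failed_logins = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 95, 10]
    print(find_anomalies(failed_logins))
```

Note that a spike inflates the baseline for the values that follow it, so a production detector would typically use more robust statistics or a trained model.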
JAXenter: AIOps sounds like no developers are involved. Is this the case?
Guy Fighel: Not at all. One of the outcomes of a DevOps transformation is that developers who write code are key stakeholders in ensuring service availability and reliability. AIOps will augment the role of engineering teams by taking care of the repetitive, time-consuming tasks required in IT roles. The type of human-led work will shift from handling incidents that interrupt the business to forecasting maintenance, and handling analytics and other data.
JAXenter: How will artificial intelligence change the world of software development in the coming years?
Guy Fighel: Most organizations face several challenges on their way to a digital transformation. For organizations moving toward a digital-centric approach, AI adds business value by saving time and effort so your staff can focus instead on innovation. Ultimately, advances in AI will help us deliver a seamless customer experience with predictive analytics, while reducing operational toil and inefficiency.