The value of observability

Trends in observability – What are the big issues for developers?

Ben Newton

Observability can be used to improve how we think about our roles within companies. At the same time, linking observability to a business goal can be an effective way to ensure that other teams and departments care about the data involved. See what developers should know about observability.

Observability involves a lot of moving parts. From the cloud infrastructure that supports applications to containers that are here one minute and gone the next, many of the assets involved are ephemeral. All of them should be tracked to confirm that applications are performing well, that the services involved are being used effectively, and that potential issues are spotted.

However, observability must be more than a few pillars of data such as logs, metrics, and traces. Observability means being able to maintain service levels and deliver an excellent user experience through comprehensive data and judicious use of analytics. If you want to deliver best-in-class customer experiences, you have to address the reliability of the complete application stack and all the processes related to it. That includes your CI/CD software delivery processes, your chosen cloud computing and microservices platforms, and edge technologies such as CDNs that deliver content to end users.

To put this more practically useful definition of observability to work, developers should pay attention to three further areas.


1. Observability across AWS is necessary

At last count, Amazon Web Services (AWS) provided more than 175 services that developers can use, from simple infrastructure through to fully implemented application services and components like databases, analytics, and security. Our research shows that the average enterprise uses more than forty AWS services to achieve its desired functionality. Those services are also typically spread across many separate AWS accounts, both to accommodate different teams' needs and to add a layer of security.

This has a few implications. First, the depth and flexibility of AWS hand developers an extensive toolset and accelerate time to market. Second, that very flexibility and agility create complexity and confusion that, ironically, can end up negating many of the benefits of AWS.

For observability, this complexity presents a huge challenge. For developers using multiple AWS services in their applications, each service needs some work to get the right logs and metrics together, so automating that collection helps a great deal. Consolidating all of the signals in one place also means that developers can see all their services, and how they are performing, across every AWS account and region.
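As a minimal sketch of the consolidation step, the Python below assumes logs and metrics have already been pulled from each account and region (for example from CloudWatch) and normalised into a hypothetical `Signal` record; the collection itself is not shown. The illustrative `consolidate` helper rolls everything up into one per-service view, so a service's errors and footprint are visible whichever account or region they came from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One normalised log line or metric point (hypothetical schema)."""
    account_id: str
    region: str
    service: str  # e.g. "lambda" or "rds"
    kind: str     # "log" or "metric"
    name: str     # log level for logs, metric name for metrics


def consolidate(signals):
    """Build one per-service view across every AWS account and region."""
    view = defaultdict(lambda: {"accounts": set(), "regions": set(), "points": 0, "errors": 0})
    for s in signals:
        entry = view[s.service]
        entry["accounts"].add(s.account_id)
        entry["regions"].add(s.region)
        entry["points"] += 1
        # Count error-level log lines so problem services stand out in the combined view.
        if s.kind == "log" and s.name == "ERROR":
            entry["errors"] += 1
    return dict(view)


if __name__ == "__main__":
    signals = [
        Signal("111111111111", "us-east-1", "lambda", "log", "ERROR"),
        Signal("222222222222", "eu-west-1", "lambda", "log", "INFO"),
        Signal("111111111111", "us-east-1", "rds", "metric", "CPUUtilization"),
    ]
    summary = consolidate(signals)
    print(summary["lambda"]["errors"], sorted(summary["lambda"]["regions"]))
    # -> 1 ['eu-west-1', 'us-east-1']
```

Keying the view by service rather than by account is the design choice that removes the need to hop between per-account consoles.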

This visibility eliminates the time-wasting "swivel-chair" management of switching between consoles, reduces the time needed to find and resolve issues, and provides continuity from pre-production to production, which helps reduce errors and failures.

2. Observability for microservices is growing in importance

More developers want to use microservices. This approach should make development easier over time, because teams can update individual components without having to take down the whole application. Moving from monolithic applications to smaller, more manageable services should make the overall job easier.

However, the sheer number of container images, services and APIs involved in any large microservices application means that application issues are much more difficult to diagnose and harder to fix. This means observability is even more important. Without accurate data on every layer of the microservice – from the cloud service to the containers themselves – it can be difficult to track down complex issues and pinpoint the root cause. Container orchestration tools like Kubernetes provide extensive data, but observability requires stitching that data into a useful picture.

Open source tools like Prometheus, Fluentd, and Fluent Bit provide this data, but each of them tends to show only part of the application rather than the whole thing. That is why it is important to consolidate this data so that it can be used effectively. For example, combining logs, metrics, and traces provides more contextual information, while organising data hierarchically using metadata (such as cluster, namespace, and pod) makes applications easier to understand and troubleshoot. It's only by bringing a mix of data sets together, with the analytics to explore them, that we can understand what is really taking place.
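To illustrate what combining signals through shared metadata can look like, here is a small Python sketch. The records, field names, and the `correlate` and `rollup` helpers are hypothetical; it assumes each log line and span already carries a `trace_id` and that spans carry Kubernetes metadata. Log lines are attached to their span by `trace_id`, the span duration stands in for a latency metric, and problem requests are then rolled up along the cluster, namespace, and pod hierarchy.

```python
from collections import defaultdict

# Hypothetical, already-parsed records; in practice these would arrive through
# separate pipelines (e.g. a log shipper for logs, a tracing backend for spans).
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 950,
     "k8s": {"cluster": "prod", "namespace": "shop", "pod": "checkout-7f9"}},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 40,
     "k8s": {"cluster": "prod", "namespace": "shop", "pod": "checkout-2c1"}},
]
logs = [
    {"trace_id": "t1", "level": "ERROR", "msg": "payment gateway timeout"},
    {"trace_id": "t2", "level": "INFO", "msg": "order placed"},
]


def correlate(spans, logs):
    """Attach each log line to the span that shares its trace_id."""
    logs_by_trace = defaultdict(list)
    for line in logs:
        logs_by_trace[line["trace_id"]].append(line)
    return [{**span, "logs": logs_by_trace.get(span["trace_id"], [])} for span in spans]


def rollup(enriched, slow_ms=500):
    """Count slow or erroring requests per (cluster, namespace, pod) path."""
    counts = defaultdict(int)
    for span in enriched:
        has_error = any(line["level"] == "ERROR" for line in span["logs"])
        if span["duration_ms"] > slow_ms or has_error:
            k = span["k8s"]
            counts[(k["cluster"], k["namespace"], k["pod"])] += 1
    return dict(counts)


if __name__ == "__main__":
    print(rollup(correlate(spans, logs)))
    # -> {('prod', 'shop', 'checkout-7f9'): 1}
```

The shared `trace_id` does the joining and the Kubernetes labels supply the hierarchy; without either piece of metadata, the logs and spans would remain separate partial views.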

3. Getting meta about observability

The prefix meta describes something that refers to itself: metadata, for example, is data about data. The word is also now commonly used on its own to describe books or movies that break the fourth wall and refer to themselves, or to other works. For observability, metadata is the glue that ties all of the tiny pieces together.

Keeping track of the whole software lifecycle means not only stitching together transactions to understand the end-user experience, but also stitching together actions across the entire CI/CD pipeline to understand the life of the code. All of these actions generate telemetry of their own, but we need metadata to link that telemetry together and make it useful.
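A rough sketch of that stitching in Python, assuming (hypothetically) that every tool in the pipeline tags its events with the commit SHA. The event format and the `code_timelines` helper are illustrative only, not any particular CI/CD product's API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events emitted by different tools in the pipeline. The shared
# "sha" field is the metadata that lets us stitch them into one timeline.
events = [
    {"tool": "ci", "stage": "build_passed", "sha": "a1b2c3", "at": "2021-03-01T10:20:00"},
    {"tool": "git", "stage": "commit", "sha": "a1b2c3", "at": "2021-03-01T09:00:00"},
    {"tool": "cd", "stage": "deployed_prod", "sha": "a1b2c3", "at": "2021-03-01T14:05:00"},
    {"tool": "git", "stage": "commit", "sha": "d4e5f6", "at": "2021-03-02T11:00:00"},
]


def code_timelines(events):
    """Group pipeline events by commit SHA and order each group by time."""
    by_sha = defaultdict(list)
    for e in events:
        by_sha[e["sha"]].append((datetime.fromisoformat(e["at"]), e["stage"]))
    # Sorting the (timestamp, stage) pairs puts each commit's stages in order.
    return {sha: [stage for _, stage in sorted(evts)] for sha, evts in by_sha.items()}


if __name__ == "__main__":
    print(code_timelines(events)["a1b2c3"])
    # -> ['commit', 'build_passed', 'deployed_prod']
```

A commit that has not yet progressed, like `d4e5f6` here, simply shows a shorter timeline, which is itself a useful signal about where code is waiting.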

By looking at these actions and comparing them against best practices, we can get an idea of how well our teams are delivering code and collaborating with each other. One such benchmark is the set of four key metrics from DevOps Research and Assessment (DORA): deployment frequency, lead time for changes, change failure rate, and time to restore service. As we work on improving our organisation's agility and its ability to deliver through software, we can also use these measures to check our own progress.
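The DORA metrics can be approximated from the same stitched pipeline data. This Python sketch uses a hypothetical `Deployment` record and simplified definitions; in particular, it measures time to restore from the deployment that caused the failure, whereas DORA measures it from the start of the incident, so treat the numbers as indicative rather than authoritative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical per-deployment record assembled from pipeline metadata."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_failure: bool = False            # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys, period_days):
    """Simplified versions of the four DORA key metrics for one reporting period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.caused_failure]
    # Simplification: restore time is measured from the deployment, not the incident start.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


if __name__ == "__main__":
    t0 = datetime(2021, 3, 1, 9, 0)
    deploys = [
        Deployment(t0, t0 + timedelta(hours=5)),
        Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=3),
                   caused_failure=True, restored_at=t0 + timedelta(days=1, hours=4)),
        Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=1)),
        Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=2)),
    ]
    m = dora_metrics(deploys, period_days=7)
    print(m["change_failure_rate"], m["median_lead_time"], m["median_time_to_restore"])
    # -> 0.25 2:30:00 1:00:00
```

Medians are used for the time-based metrics so that a single unusually slow change does not dominate the picture.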


The impact of observability data on resilience planning

Each of these trends is interesting in its own right. However, when you look at them together, you can see another development taking place. Observability gives us insight into how applications and services are performing, but it can also give us more context on how these applications support wider business objectives. As our organisations operate in a world that is increasingly reliant on digital services, whether because of COVID-19 or not, the data from these applications can become an effective proxy for real-world performance. For example, an increase in web traffic or application demand will usually go hand in hand with more transactions and more business. That increase can be seen and tracked across application components, and it also shows up in revenue.

Similarly, these applications are now how our companies operate. They have to become less fragile, even as the underlying infrastructure becomes more complex. Observability data therefore has a greater purpose than just showing how well our app components perform over time: it can be used to improve resilience to risk and to show where business results could be affected.

Observability can be used to improve how we think about our roles within companies. At the same time, linking observability to a business goal can be an effective way to ensure that other teams and departments care about the data involved. This can then become a two-way conversation: operations teams will want to understand how their decisions affect business results, and how steps taken by developers affect their own performance.

By thinking about observability from a reliability perspective, we can ensure that our applications are better able to handle issues like a cloud outage or service failure. At the same time, we can speak a language that business teams better understand. This should help everyone see the value that observability provides.


Ben Newton

Ben is a veteran of the IT Operations market, with a two-decade career across large and small companies like Loudcloud, BladeLogic, Northrop Grumman, EDS, and BMC. Ben got to do DevOps before DevOps was cool, working with government agencies and major commercial brands to be more agile and move faster. More recently, Ben spent 5 years in product management at Sumo Logic and is now running product marketing for Operations Analytics at Sumo Logic.
