Optimizing your microservices

What Tracing and Monitoring Brings to Your Microservices Strategy

Kevin Crawley

Many organizations are rushing into microservices without really understanding some of the most critical components of a successful deployment. Tracing and monitoring rank high on that list. See how leveraging monitoring and tracing can improve your microservices.

Make no mistake about it: the ever-increasing migration to microservices architecture is warranted and will continue. The agility, flexibility, and scale rewards of replacing bulky monolithic applications with decoupled services that DevOps teams, developers, and engineers can deploy independently are there for the taking. But getting it right isn't always as easy as it seems. Many organizations are rushing into microservices without really understanding some of the most critical components of a successful deployment.

Tracing and monitoring rank high on that list.


Groups of microservices often act as a single application, even when requests are routed through multiple services. If a problem occurs, developers need a way to trace it along that often complex route. Even systems that otherwise work well can hide issues that tracing exposes, such as poor latency. To optimize microservices, developers need a way to trace latency between services and collate service logs while keeping logging overhead to a minimum.
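Collating logs from many services is much easier when every service writes structured records that share a correlation field. The sketch below, which uses only Python's standard `logging` and `json` modules, emits one JSON object per line tagged with the service name and an optional `trace_id`. The field names and the `JsonFormatter` and `make_logger` helpers are hypothetical illustrations, not a prescribed schema; setting the logger level to INFO is one simple way to drop verbose DEBUG output and limit logging overhead.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),  # local time
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
            # Hypothetical correlation field passed via `extra=`; None if absent.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


def make_logger(service: str, stream=sys.stdout) -> logging.Logger:
    """Build a logger for one service that writes JSON lines to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)  # DEBUG records are discarded cheaply
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("orders")
    log.warning("payment declined", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```

Because every line is valid JSON with the same keys, a log aggregator can group entries from different services by `trace_id` to reconstruct what happened to a single request.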

Logging, monitoring, and tracing are not the same thing

The terms logging, monitoring, and tracing are often conflated, but the distinctions matter:

Logging is the first tool operators use to react to – and investigate – service errors or security events. It delivers microservice-level auditing capabilities.

Monitoring gives operators a proactive, metrics-based view of how services respond to requests. These metrics include key infrastructure measures such as CPU, memory, and I/O, as well as runtime metrics from the application such as heap size, thread count, and garbage collection activity.

Tracing enables performance optimization by recording how multiple physical requests across a chain of services together accomplish a single application-level logical request. Tracing captures the exceptions and errors each request encounters, along with timing information and valuable metadata such as response codes and headers. This makes tracing a particularly valuable diagnostic tool.
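To make the monitoring definition more concrete, here is a minimal sketch of an in-process snapshot of the kind of runtime metrics it mentions (thread count, garbage collection activity, heap usage), using only the Python standard library. The `runtime_snapshot` name and its keys are hypothetical. Python has no JVM-style heap size, so `tracemalloc` traced memory stands in for it and reports zero unless tracing was started; host-level CPU, memory, and I/O usually come from an external agent and are not shown.

```python
import gc
import threading
import time
import tracemalloc


def runtime_snapshot() -> dict:
    """Collect a few in-process runtime metrics (illustrative, not exhaustive)."""
    if tracemalloc.is_tracing():
        current, peak = tracemalloc.get_traced_memory()
    else:
        current, peak = 0, 0  # heap tracing is off, so nothing was measured
    return {
        "timestamp": time.time(),
        "threads": threading.active_count(),
        # Objects currently tracked per GC generation.
        "gc_counts": gc.get_count(),
        # Cumulative number of collections per generation since startup.
        "gc_collections": [gen["collections"] for gen in gc.get_stats()],
        "heap_traced_bytes": current,
        "heap_traced_peak_bytes": peak,
    }


if __name__ == "__main__":
    tracemalloc.start()  # tracing adds overhead, so enable it deliberately
    buffers = [bytes(1024) for _ in range(1000)]  # allocate roughly 1 MB to observe
    print(runtime_snapshot())
    tracemalloc.stop()
```

A monitoring agent would typically take such a snapshot on a fixed interval and ship it to a time-series store, where thresholds and dashboards turn it into the proactive view described above.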

While logging is important, let’s focus on monitoring and tracing, which offer particularly high utility across multiple services.


Tracing begins at the point where a request enters an application (often called an endpoint), which generates a unique ID for the request. Each subsequent service in the flow of traffic adds more data to the trace, including the request’s time of arrival and total processing time. With that data, operators can build alerting policies and service level objectives, and fully visualize the call flow using open source technologies like Jaeger or Elastic APM. In many cases, operators also use tracing data to augment their understanding of monitoring data.
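The flow described above (the entry point mints a trace ID, and every later hop records its arrival time and processing time under that ID) can be sketched in a single Python process. This is a simplified model under stated assumptions: `Span`, `traced`, `COLLECTED`, and the gateway, checkout, and inventory services are hypothetical, and none of it is the API of Jaeger or Elastic APM. Real services sit in separate processes and pass the IDs over the network instead of through `contextvars`.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one logical request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # caller's span_id; None at the entry point
    service: str
    start: float              # arrival time, epoch seconds
    duration: float = 0.0     # total processing time, seconds
    error: Optional[str] = None


# Finished spans; a real tracer would export them to a backend instead.
COLLECTED: List[Span] = []
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def traced(service: str) -> Iterator[Span]:
    parent = _current_span.get()
    span = Span(
        # Only the entry point mints a trace ID; nested spans inherit it.
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        service=service,
        start=time.time(),
    )
    token = _current_span.set(span)
    t0 = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span.error = repr(exc)  # record the failure, then re-raise it
        raise
    finally:
        span.duration = time.perf_counter() - t0
        _current_span.reset(token)
        COLLECTED.append(span)


# Hypothetical call chain: gateway -> checkout -> inventory.
def inventory() -> None:
    with traced("inventory"):
        time.sleep(0.01)  # stand-in for real work


def checkout() -> None:
    with traced("checkout"):
        inventory()


def gateway() -> None:
    with traced("gateway"):  # entry point: starts a new trace
        checkout()


if __name__ == "__main__":
    gateway()
    for s in COLLECTED:  # children finish first, so they are listed first
        print(f"{s.service:<10} trace={s.trace_id[:8]} parent={s.parent_id} {s.duration * 1000:.1f} ms")
```

All three spans share one `trace_id`, and the `parent_id` links are what a visualizer such as Jaeger uses to draw the call flow as a tree.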

For the most popular runtimes and frameworks, libraries and tools are available to instrument tracing within a microservices application. Teams can also develop their own solutions to intercept calls, add headers to downstream requests, or otherwise affix metadata to trace traffic.
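For teams that build their own solution, "adding headers to downstream requests" usually means propagating trace context in a widely supported standard format such as the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`, lowercase hex). The sketch below hand-rolls extraction and injection with only the standard library. The helper names `extract`, `inject`, and `downstream_headers` are hypothetical; only version `00` is parsed, header names are assumed to be lowercased already, and every outgoing call is marked as sampled for simplicity.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags.
# This sketch only accepts version 00.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trace_id, parent_span_id) from incoming headers, or None if absent/invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id


def inject(headers: Dict[str, str], trace_id: str, span_id: str, sampled: bool = True) -> Dict[str, str]:
    """Return a copy of `headers` that names this service's span as the downstream parent."""
    out = dict(headers)
    out["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    return out


def downstream_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Continue the caller's trace if one arrived, otherwise start a new trace."""
    context = extract(incoming)
    trace_id = context[0] if context else secrets.token_hex(16)  # 32 hex chars
    span_id = secrets.token_hex(8)  # 16 hex chars identifying this service's work
    return inject({}, trace_id, span_id)


if __name__ == "__main__":
    incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    print(downstream_headers(incoming))  # same trace ID, new parent span ID
```

Each service runs `extract` on the request it receives and attaches the result of `downstream_headers` to every call it makes, so the trace ID survives the whole chain while each hop contributes its own span ID.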


Tracing vs Monitoring vs Both

How developers leverage monitoring and tracing is entirely at their discretion – they’re both powerful methods whether used separately or in tandem.

Many organizations start with monitoring metrics because they are easier to implement. Tracing requires more effort, since large volumes of trace telemetry must be collected, stored, and analyzed, but it also delivers greater breadth and depth of visibility. Aggregated trace data shows teams when and where their services need to be scaled out. Monitoring and tracing both help detect anomalous behavior in individual services, but tracing goes a step further by helping identify the cause of those anomalies. Tracing is also a prerequisite for more complete optimization and end-to-end performance improvements. Open source tools like Traefik and Maesh can simplify the initial implementation of distributed tracing by significantly reducing implementation work and telemetry management overhead.
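One simple way aggregated trace data can point at where to optimize or scale is to compute each service's "self time": a span's duration minus the time spent in its downstream calls, averaged across many traces. The sketch below is a heuristic of my choosing, not how any particular tracing backend analyzes data. The `Span` shape, the function names, and the `SAMPLE` spans are hypothetical, and the subtraction assumes downstream calls run sequentially (parallel fan-out would be over-subtracted).

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float


def self_times(spans: List[Span]) -> Dict[str, List[float]]:
    """Per-service time excluding downstream calls (assumes sequential children)."""
    child_total: Dict[Tuple[str, str], float] = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            # Key by trace as well, since span IDs need only be unique within a trace here.
            child_total[(s.trace_id, s.parent_id)] += s.duration_ms
    by_service: Dict[str, List[float]] = defaultdict(list)
    for s in spans:
        by_service[s.service].append(s.duration_ms - child_total[(s.trace_id, s.span_id)])
    return dict(by_service)


def slowest_service(spans: List[Span]) -> str:
    """Service with the highest mean self time: a first candidate to optimize or scale out."""
    times = self_times(spans)
    return max(times, key=lambda svc: mean(times[svc]))


# Hypothetical spans from two traces of the same gateway -> checkout -> inventory chain.
SAMPLE = [
    Span("t1", "a", None, "gateway", 120.0),
    Span("t1", "b", "a", "checkout", 100.0),
    Span("t1", "c", "b", "inventory", 80.0),
    Span("t2", "a", None, "gateway", 60.0),
    Span("t2", "b", "a", "checkout", 50.0),
    Span("t2", "c", "b", "inventory", 10.0),
]

if __name__ == "__main__":
    for svc, values in self_times(SAMPLE).items():
        print(f"{svc}: mean self time {mean(values):.1f} ms")
    print("slowest:", slowest_service(SAMPLE))  # prints "slowest: inventory"
```

Raw end-to-end latency would blame the gateway, because its span encloses everything else; subtracting child time shifts attention to the service actually doing the slow work, which is the kind of root-cause insight monitoring metrics alone rarely give.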

Monitoring and tracing each pay off when building reliable, performant microservices applications. Monitoring oversees the health of platform services and infrastructure. Tracing enables troubleshooting of bottlenecks and unexpected anomalies. Mature applications need both techniques to manage complex services efficiently.


Kevin Crawley

Kevin Crawley is a Developer Advocate at Containous, a cloud-native networking company behind the open source projects Traefik and Maesh. He is passionate about championing the benefits of Open Source, DevOps, automation, observability, distributed tracing, and control theory.
