10 micro metrics you should use in performance reports
Performance tests are an unavoidable task. But are you measuring the right things? In this article, Ram Lakshmanan explains why the usual macro metrics fall short and presents ten micro metrics that might improve future performance reports.
In a lot of enterprises, performance tests are conducted regularly. As part of these tests, quality assurance teams gather various metrics and publish them in a performance report. Some of the common metrics analyzed in the performance report are CPU utilization, memory utilization, the response time of key transactions or backend systems, and network bandwidth, depending on the organization.
I would like to categorize these metrics as macro metrics. Macro metrics are great; however, they have two main shortcomings:
- Performance problems that are not caught in the test environment
Despite a number of performance tests conducted in the test environment, performance degradation still finds its way to production. Acute performance degradations do occur in the test environment, but the macro metrics mentioned above don't catch them. These are the degradations that can manifest as major performance problems in production. However, the micro-level metrics discussed in the following sections bring visibility to these degradations.
- Macro metrics aren’t helpful for troubleshooting
To a major extent, macro metrics don't help the development team in debugging and troubleshooting problems. If, say, the macro metrics indicate that CPU consumption is high, there is no indication of whether it increased because of heavy Garbage Collection activity, a thread looping problem, or some other coding issue. Similarly, if response time degrades, the macro metrics won't indicate whether the cause is lock contention in the application code or a backend connectivity issue.
Macro metrics should be complemented with micro metrics to address these shortcomings. In this article, I have listed what I consider to be the 10 most important micro metrics. You should consider adding any and all of them to your performance reports.
Memory-related micro metrics
1. Garbage collection pause times
You should measure Garbage Collection pause times, as the entire application freezes during GC pauses. This means that no customer activity will be performed, which is obviously not good. Reducing the number and length of GC pauses directly benefits customers. You should always aim for the lowest possible pause time.
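On a JVM, a rough in-process view of this metric is available through the standard java.lang.management API. The sketch below is only an illustration: the class name GcPauseReport and the helper averageMillis are made up for this example, and getCollectionTime() reports approximate accumulated collection time, which is not necessarily identical to stop-the-world pause time for every collector. GC logs remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: prints collection count and average collection time per collector.
final class GcPauseReport {

    // Average time per collection in milliseconds. Returns 0 when nothing has been
    // collected yet or when the JVM reports a value as undefined (-1).
    static double averageMillis(long totalMillis, long count) {
        if (count <= 0 || totalMillis < 0) {
            return 0.0;
        }
        return (double) totalMillis / count;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if undefined for this collector
            long millis = gc.getCollectionTime();  // approximate accumulated time, -1 if undefined
            System.out.printf("%s: collections=%d, total=%d ms, avg=%.2f ms%n",
                    gc.getName(), count, millis, averageMillis(millis, count));
        }
    }
}
```

Sampling these counters at the start and end of a test run, and reporting the difference, gives a per-run figure rather than a since-startup figure.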
2. Object creation/reclamation rate
The rate at which objects are created heavily influences the CPU utilization. If inefficient data structures or code are used, then more objects will be generated to process the same number of transactions. A high object creation rate translates to frequent Garbage Collection (GC). Frequent GC translates to increased CPU consumption.
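To see how code choices change the amount of memory allocated for the same work, you can compare two implementations directly. The sketch below assumes a HotSpot-based JVM, where the platform ThreadMXBean also implements the com.sun.management.ThreadMXBean extension; on other JVMs the probe simply returns -1. The names AllocationProbe, concatInLoop, and withBuilder are hypothetical.

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch: measures bytes allocated by the current thread while a task runs.
final class AllocationProbe {

    // Holds results so the JIT cannot discard the work as dead code.
    static volatile Object sink;

    // Returns allocated bytes, or -1 if per-thread allocation counters are unavailable.
    static long allocatedBytes(Runnable task) {
        Object bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1; // assumption not met: not a HotSpot-style JVM
        }
        com.sun.management.ThreadMXBean threads = (com.sun.management.ThreadMXBean) bean;
        if (!threads.isThreadAllocatedMemorySupported()
                || !threads.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    // Builds a string by repeated concatenation: every iteration creates a new String.
    static void concatInLoop() {
        String s = "";
        for (int i = 0; i < 2000; i++) {
            s += i;
        }
        sink = s;
    }

    // Builds the same string with a single growing buffer.
    static void withBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append(i);
        }
        sink = sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("concatenation: " + allocatedBytes(AllocationProbe::concatInLoop) + " bytes");
        System.out.println("StringBuilder: " + allocatedBytes(AllocationProbe::withBuilder) + " bytes");
    }
}
```

Both methods produce the same result, but the concatenating version allocates far more memory, which is the kind of difference that shows up as a higher object creation rate.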
3. Garbage collection throughput
Throughput is the percentage of time your application spends processing customer transactions, as opposed to the time it spends on garbage collection activities. You should aim for high throughput, i.e. the application should spend more time processing customer transactions and less time on garbage collection.
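Put as a formula, throughput = (elapsed time - GC time) / elapsed time x 100. The sketch below approximates it from inside a running JVM, assuming the standard java.lang.management API; GcThroughput, throughputPercent, and totalGcMillis are illustrative names, and the collection times reported by the JVM are approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: estimates GC throughput since JVM start.
final class GcThroughput {

    // Percentage of elapsed time that was NOT spent in garbage collection.
    static double throughputPercent(long uptimeMillis, long gcMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        long gc = Math.min(Math.max(gcMillis, 0), uptimeMillis);
        return 100.0 * (uptimeMillis - gc) / uptimeMillis;
    }

    // Sums accumulated collection time over all collectors, skipping undefined (-1) values.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionTime(), 0);
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC throughput: %.3f%%%n", throughputPercent(uptime, totalGcMillis()));
    }
}
```

For example, an application that ran for 1,000 ms and spent 50 ms in garbage collection has a throughput of 95%.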
4. Memory consumption by each generation
In the JVM, Android Runtime, and other platforms, memory is divided into a few internal regions (generations). You need to know both the allocated size and the peak utilization of each region. Under-allocation of internal memory regions will degrade the performance of the application. Over-allocation will increase the bill from your hosting provider.
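On a JVM, allocated and peak sizes per region can be read from the memory pool beans. This is a minimal sketch assuming the standard java.lang.management API; pool names depend on the collector in use, and MemoryPoolReport and peakPercent are illustrative names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Illustrative sketch: prints committed size and peak usage for each JVM memory pool.
final class MemoryPoolReport {

    // Peak usage as a percentage of the currently committed size; 0 if nothing is committed.
    static double peakPercent(long committedBytes, long peakUsedBytes) {
        if (committedBytes <= 0) {
            return 0.0;
        }
        return 100.0 * peakUsedBytes / committedBytes;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage current = pool.getUsage();   // null if the pool is no longer valid
            MemoryUsage peak = pool.getPeakUsage();  // null if the pool is no longer valid
            if (current == null || peak == null) {
                continue;
            }
            System.out.printf("%s: committed=%d KB, peak used=%d KB (%.1f%% of committed)%n",
                    pool.getName(),
                    current.getCommitted() / 1024,
                    peak.getUsed() / 1024,
                    peakPercent(current.getCommitted(), peak.getUsed()));
        }
    }
}
```

A pool whose peak usage stays far below its committed size may be over-allocated, while one that repeatedly runs close to its limit may be under-allocated.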
How can you source the memory-related micro metrics?
All of these memory-related micro metrics can be captured from the Garbage Collection logs. Once you have turned on GC logging, you can upload the logs to a tool like GCeasy.io. It's a free online GC log analyzer that reports all of the above memory-related micro metrics in a visual/graphical format.
Thread-related micro metrics
5. Thread States
Threads can be in one of the following states: NEW, BLOCKED, RUNNABLE, WAITING, TIMED_WAITING, and TERMINATED. The thread count for each state should be reported. If threads are in a BLOCKED state for a prolonged period, the application can become unresponsive. If there are a lot of threads in a RUNNABLE state, the application's CPU consumption will become high. And if application threads are spending more time in WAITING, TIMED_WAITING, or BLOCKED states, response time will degrade.
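As a rough illustration, the same count can be taken from inside a running JVM with Thread.getAllStackTraces(). The class and method names below (ThreadStateReport, countByState) are made up for this sketch; a thread dump gives the equivalent information from outside the process.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch: counts threads by their current state.
final class ThreadStateReport {

    static Map<Thread.State, Integer> countByState(Iterable<Thread> threads) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : threads) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // getAllStackTraces() returns a snapshot of all live threads in this JVM.
        countByState(Thread.getAllStackTraces().keySet())
                .forEach((state, count) -> System.out.println(state + ": " + count));
    }
}
```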
6. Thread groups
A thread group represents a set of threads. Each application has multiple thread groups. You should measure the size of each thread group and report it. An increase in thread group size might indicate a certain type of performance degradation.
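The sketch below shows one way to measure group sizes inside a JVM, assuming that "thread group" means java.lang.ThreadGroup. That is an assumption: an analysis tool may group threads differently, for example by thread name. ThreadGroupReport and countByGroup are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: counts threads per java.lang.ThreadGroup name.
final class ThreadGroupReport {

    static Map<String, Integer> countByGroup(Iterable<Thread> threads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : threads) {
            ThreadGroup group = t.getThreadGroup(); // null once the thread has terminated
            String name = (group == null) ? "(terminated)" : group.getName();
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByGroup(Thread.getAllStackTraces().keySet())
                .forEach((group, count) -> System.out.println(group + ": " + count));
    }
}
```

Comparing these counts between test runs makes an unexpected growth in one group easy to spot.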
7. Daemon vs non-Daemon
There are two types of threads: daemon and non-daemon (i.e. user) threads. You should report the thread count for each type, because the JVM won't terminate while non-daemon threads are still running.
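A minimal sketch of this count, using only Thread.isDaemon(); the names DaemonReport and split are hypothetical.

```java
// Illustrative sketch: splits a set of threads into daemon and non-daemon counts.
final class DaemonReport {

    // Returns a two-element array: {daemon count, non-daemon count}.
    static int[] split(Iterable<Thread> threads) {
        int daemon = 0;
        int user = 0;
        for (Thread t : threads) {
            if (t.isDaemon()) {
                daemon++;
            } else {
                user++;
            }
        }
        return new int[] {daemon, user};
    }

    public static void main(String[] args) {
        int[] counts = split(Thread.getAllStackTraces().keySet());
        System.out.println("daemon: " + counts[0] + ", non-daemon: " + counts[1]);
    }
}
```

An application that fails to shut down cleanly will usually show at least one unexpected non-daemon thread in this count.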
8. Code execution path
Your application's CPU consumption, memory consumption, and response time will differ based on the code execution path. If most of the threads are executing one specific code path, that path should be studied in detail to prevent bottlenecks or inefficiencies.
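One simple approximation of "which code path are most threads in" is to group threads by the method at the top of their stack. The sketch below does this for the current JVM; HotPathReport, topFrame, and countByTopFrame are illustrative names, and a real analysis would normally compare more than the top frame.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: counts how many threads are currently executing each top-of-stack method.
final class HotPathReport {

    // "Class.method" of the innermost frame, or a marker when no Java frames are available.
    static String topFrame(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return "(no Java frames)";
        }
        return stack[0].getClassName() + "." + stack[0].getMethodName();
    }

    static Map<String, Integer> countByTopFrame(Map<Thread, StackTraceElement[]> traces) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : traces.values()) {
            counts.merge(topFrame(stack), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByTopFrame(Thread.getAllStackTraces()).entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue())) // busiest first
                .forEach(e -> System.out.println(e.getValue() + " thread(s) in " + e.getKey()));
    }
}
```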
How can you source these thread-related micro metrics?
Thread-related metrics can be sourced from thread dumps. There are 8 different options for capturing thread dumps; use whichever option is most convenient for you. Once you have captured thread dumps, you can upload them to free online thread dump analysis tools like fastThread.io. This tool provides all of the above thread-related micro metrics.
Network-related micro metrics
9. Outbound connections
In today's world, you will seldom see enterprise applications that don't communicate with other applications. Your application's performance is heavily dependent on the applications with which it communicates. The number of ESTABLISHED connections to each endpoint should be measured. Any variance in the connection count can influence the performance of the application.
10. Inbound connections
The application can get traffic from multiple channels (web, mobile, API) and over multiple protocols (HTTP, HTTPS, JMS, Kafka, and more). You need to measure the number of connections coming from each channel and each protocol, as they also influence the performance of the application.
How can you source network-related micro metrics?
Application Performance Monitoring (APM) tools like New Relic or AppDynamics can report these metrics. You can also configure custom probes in APM tools to report them. On the other hand, if you aren't using APM tools, you can use the 'netstat' utility:
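# 'hostname' is a placeholder. Because -n prints numeric addresses, use the endpoint's IP address here.
netstat -an | grep 'hostname' | grep 'ESTABLISHED'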
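To turn raw netstat output into per-endpoint counts, a small parser is enough. The sketch below assumes that each connection line ends with the local address, the foreign address, and the state, in that order, which is how common netstat implementations print TCP connections; check the column layout on your platform. The class name NetstatSummary and the sample lines in main are made up for illustration. Grouping by foreign address approximates the outbound view, and grouping by local address (and therefore local port) approximates the inbound view.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: counts ESTABLISHED connections per address from `netstat -an` lines.
final class NetstatSummary {

    static Map<String, Integer> establishedByForeignAddress(List<String> lines) {
        return established(lines, 2); // second column from the end
    }

    static Map<String, Integer> establishedByLocalAddress(List<String> lines) {
        return established(lines, 3); // third column from the end
    }

    private static Map<String, Integer> established(List<String> lines, int fromEnd) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, listening sockets, and anything that is not ESTABLISHED.
            if (cols.length < 4 || !"ESTABLISHED".equals(cols[cols.length - 1])) {
                continue;
            }
            counts.merge(cols[cols.length - fromEnd], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; in practice, feed in the real output of `netstat -an`.
        List<String> sample = Arrays.asList(
                "tcp        0      0 10.0.0.5:43210     10.0.1.20:5432     ESTABLISHED",
                "tcp        0      0 10.0.0.5:43211     10.0.1.20:5432     ESTABLISHED",
                "tcp        0      0 10.0.0.5:8080      192.168.1.7:51000  ESTABLISHED",
                "tcp        0      0 0.0.0.0:8080       0.0.0.0:*          LISTEN");
        System.out.println("by foreign address: " + establishedByForeignAddress(sample));
        System.out.println("by local address:   " + establishedByLocalAddress(sample));
    }
}
```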