Garbage day

An introduction to garbage collection

RubenBadaro
gc1

Ruben Badaro provides a crash course in monitoring garbage collection in production systems

Garbage collection is one of the killer features of the Java
Virtual Machine. Saving the developer from having to worry about
memory management, as in a language such as C++, results in a
significant productivity boost. However, that boost isn’t free:
garbage collection has a cost.

As we allocate and de-allocate objects in our programs, the
garbage collector is responsible for doing the cleanup work of
freeing up the memory taken by objects that are no longer needed so
that it can be used for new objects. The way in which the collector
goes about this depends on which collection algorithm we
choose.

There are two main vectors that need to be taken into account
regarding garbage collection characteristics – throughput and
pauses. Throughput is the percentage of total time not spent in
garbage collection, whereas pauses are the times when an
application appears unresponsive because garbage collection is
being carried out.

Depending on the type of application, we might care more about
one or the other, but in most real-world applications pauses are
the common issue. The more memory is used by the JVM in the heap,
the more likely it is that you will incur pauses when running your
garbage collection (either in the young generation or tenured
generation).

Nowadays, most teams end up limiting their JVM memory usage to
4-8GB specifically due to the long pauses that might occur
otherwise. It is therefore imperative that we can measure our
applications and the behavior of garbage collection. More
specifically, it’s important that we have this information on the
actual production applications running.

In order to measure garbage collection, there are 3 main
approaches:

  • Analyze garbage collection logs that the JVM prints;

  • Use JMX to read exposed garbage collection information;

  • Access the JVM instrumentation counters with tools such as
    jstat.

We shall focus on each of these approaches individually and see
how we can use them to measure GC behavior.

Analyzing garbage collection logs

The first thing you need to do is add arguments to your java
call to print out garbage collection information:

  • -Xloggc:~/gc.log

  • -XX:+PrintGCDetails

  • -XX:+PrintGCTimeStamps

  • -XX:+PrintTenuringDistribution

 

An example output (ignoring the tenuring distribution) would
be:

 

0.634:
[GC

[PSYoungGen:
10304K->1660K(11968K)] 10304K->2656K(39296K), 0.0069210 secs]
[Times: user=0.01 sys=0.00, real=0.01 secs]

 

What this tells us is that a minor collection happened 0.634
seconds after the start of the process on the young generation.
Before the collection, there where 10304K objects in the young
generation and after it there was 1660K in the same generation (of
a total size of 11968K of young gen) and it took ~7
milliseconds.

The next number refers to the total size of the heap – there was
a total heap usage of 10304K (means we only had young gen heap
usage before) that was reduced to 2656K (larger than the 1660K in
the young gen, which means we promoted 996K to the old gen). The
total size of the heap (young + old gen) is now 39296K.

An example of a full gc would look like this:

 

4.268:
[Full GC [PSYoungGen: 1649K->0K(42880K)] [ParOldGen:
19757K->18765K(44928K)] 21406K->18765K(87808K) [PSPermGen:
23549K->23537K(47104K)], 0.2047590 secs] [Times: user=0.31
sys=0.00, real=0.20 secs]

 

The structure of this output is the same as before, but here we
can see the three different areas of the heap – young, old and
permanent generation – and how their size was changed.

This data will certainly be valuable when tuning the generation
sizes and different parameters of the JVM but is also very useful
for our goal of monitoring garbage collection behavior in
production.

The impact of writing these logs in production is negligible, so
it’s practical to parse them and, if necessary, send an alert – as,
say, an email to the team or notification the company’s operations
alert system.

A simple example would be to monitor this file and send an email
if a full garbage collection occurs. This is one way it could be
achieved in a Unix system:

 

> tail
-f gc.log | grep –line-buffered “Full GC” | while read line; do
echo “$line” | mail -s “Full garbage collection occurred”
“email@domain.com”; done

 

This will just send the individual old collection lines to the
email. It’s fairly simple, but a complex and fully-fledged solution
isn’t always necessary to get what you need.

There are a few open-source projects that parse the garbage
collection logs, but these formats have two issues: they vary
depending on the collectors being used and can change over time
(and have done so in the past!). Be aware that some of these
projects no longer work with current log formats.

Read Garbage Collection information through JMX 

Another approach to read information about garbage collection is
through Java Management Extensions (JMX).

There are a number of data points exposed through MBeans which
can be read either inside the process, from another process on the
same machine or a remote process. For our specific case, we have
garbage collection beans that we can use (these again depend on the
garbage collection algorithms we are using):

 

Figure 1: Exposing garbage
collection beans with MBeans.

 

Here in jconsole we can see that, besides other information, we
have 2 GC related MBeans: PS MarkSweep and PS Scavenge (In this
case we are using Concurrent Mark Sweep as our garbage collection
algorithm).

We can however go much further than to simply look at this data
in jconsole or similar management GUIs. We can read this
information from our code and deal with what happens as we see
fit.

Listing 1 shows how we can read it.

 

public void monitor(int vmpid) throws Exception {
JMXServiceURL url = new JMXServiceURL(getVMAddress(vmpid));
        JMXConnector connector = JMXConnectorFactory.connect(url);

        final MBeanServerConnection serverConnection = connector.getMBeanServerConnection();
        Set<ObjectName> beanSet = serverConnection.queryNames(new ObjectName("java.lang:type=GarbageCollector,name=PS MarkSweep"), null);

        final ObjectName bean = beanSet.iterator().next();

        GarbageCollectorMXBean gcBean = JMX.newMXBeanProxy(serverConnection,bean, GarbageCollectorMXBean.class);
        System.out.println("collection time: " + gcBean.getCollectionTime());
        System.out.println("collection count: " + gcBean.getCollectionCount());

        GcInfo gcInfo = gcBean.getLastGcInfo();
        Map<String, MemoryUsage> memUsages = gcInfo.getMemoryUsageBeforeGc();
        for (Entry<String, MemoryUsage> memUsage : memUsages.entrySet()) {
                System.out.println(memUsage.getKey() + ": " + memUsage.getValue());
        }

        listenToNotifications(serverConnection, bean);
}

private String getVMAddress(int pid) throws AttachNotSupportedException, IOException {
        String jmxAddressProp = "com.sun.management.jmxremote.localConnectorAddress";
        VirtualMachine vm = VirtualMachine.attach(String.valueOf(pid));
        return vm.getAgentProperties().getProperty(jmxAddressProp);
}

 

Listing 2 shows an example output.

 

collection time: 82037
collection count: 116
PS Survivor Space: init = 1703936(1664K) used = 65536(64K) committed = 32047104(31296K) max = 32047104(31296K)
PS Eden Space: init = 10551296(10304K) used = 0(0K) committed = 69795840(68160K) max = 113049600(110400K)
PS Old Gen: init = 27983872(27328K) used = 239432344(233820K) committed = 357957632(349568K) max = 357957632(349568K)
Code Cache: init = 2555904(2496K) used = 19949568(19482K) committed = 20185088(19712K) max = 50331648(49152K)
PS Perm Gen: init = 21757952(21248K) used = 148450536(144971K) committed = 155058176(151424K) max = 268435456(262144K)

 

We are just printing out the collection time, count and
statistics of the different spaces much as we could see them when
looking at garbage collection logs. However, in this case we have
rich data structures that allow us to have this information
available without needing to parse text.

In this simple example, we are reading garbage collection
information from a process running in our local machine. In order
to do that, all we need is the process process id, which gets
passed in to the monitor() method.

You can see that the identifier we pass in to the queryNames()
method matches the tree we saw in the previous jconsole
screenshot:

 

“java.lang:type=GarbageCollector,name=PS MarkSweep”

 

 

We could here be reading the PS Scavenge collector (for young
generation collections) and you would probably change this code to
be able to read any garbage collector you decide to configure.

On the last line, we are invoking a listenToNotifications
method. This method would look something like Listing
3
.

 

private void listenToNotifications(final MBeanServerConnection serverConnection, final ObjectName bean) throws InstanceNotFoundException, IOException {
        final Queue<Notification> notifications = new LinkedBlockingQueue<Notification>();
        NotificationListener listener = new NotificationListener() {
                @Override
                public void handleNotification(Notification notification, Object ctx) {
                        notifications.add(notification);
                }
        };
        serverConnection.addNotificationListener(bean, listener, null, null);

        while (true) {
                Notification notification = notifications.poll();
                if (notification != null) {
                        process(notification);
                }
        }
}

 

This method allows us to listen for changes in the specific
garbage collector – i.e. be notified each time a collection occurs.
The process method is left unimplemented and in this case it would
gather whatever information is necessary and send it through email,
publish a notification in an internal alerting system, etc.

Access the JVM instrumentation counters

One last but interesting way to access garbage collection
information in production is to access the actual JVM counters.
When the JVM is running, it uses internal data structures to keep
all this information about garbage collection (and much more). This
is done directly in C++ and we don’t have access to it directly
from Java.

However, the JVM stores these data structures in a file that is
normally kept in a temp folder (/tmp in Linux) called
hsperfdata_<user> which contains all the information. The JVM
writes and reads this as a memory mapped file so it is possible to
do access those data structures by simply reading the file. The
advantage of this method is that it barely affects the running
process, since the file is opened with a read-only mode to
guarantee that we don’t affect memory. There is no extra work to be
done by the process, unlike in the JMX example where code needs to
execute to access the information we want.

While actually accomplishing this task is not as simple as the
previous examples, we can get some good pointers from existing
tools. One interesting tool distributed with the JDK is jstat. What
is does in practice is read the hsperfdata file and display the
performance counters on stdout. There is another similar tool
called jstatd, a daemon that allows remote processes (e.g.
VisualVM) to connect and query for data. However, the format for
this communication is not publicized and is not guaranteed to stay
the same.

There are two ways of taking advantage of this method of access:
use jstat and parse its output as we did with the garbage
collection logs, or read the performance counter data structures
directly. The second option would be too complex for the scope of
this article but jstat is fairly simple to use. We can run it like
so:

 

> jstat –gccause –t <pid> <millis_interval>

 

This will get garbage collection information every
millis_interval and print to stout something similar to
Listing 4.

 

Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC

82214.9 9.04 0.00 78.21 68.87 96.27 968 20.731 116 82.039 102.769 Allocation Failure No GC

82215.9 9.04 0.00 78.21 68.87 96.27 968 20.731 116 82.039 102.769 Allocation Failure No GC

This gives you most of the basic counters –survivor space 0, 1,
eden, old and permanent utilization percentage, counts of garbage
collection, time spent in garbage collection and the reason for
garbage collection. The manual page for jstat has a good
explanation of each of these counters and other options that
can be passed in for different information.

 

As with in the garbage collection log, we can parse this output
in a very naïve way to send out emails – but I will leave that up
to you to figure out.

Conclusion

There are a number of ways to monitor garbage collection
behavior in production with different effort and advantages. The
important part is that this behavior gets measured before going to
production and that we keep these monitoring solutions in
production to get vital information about our running systems.

Author Bio: Ruben works as a software
engineer at Morgan Stanley. He was one of the founders of the
Portuguese Java User Group (PT.JUG) and has been working with Java
and other languages for the last 10 years. He has done everything
from working in consulting, startups and banking on highly-scalable
and low-latency systems.

This article previously appeared in JAX
Magazine: Pulling Together. For that issues and others, click
here.

Image courtesy of edenpictures

Author
RubenBadaro
Ruben works as a software engineer at Morgan Stanley. He was one of the founders of the Portuguese Java User Group (PT.JUG) and has been working with Java and other languages for the last 10 years. He has done everything from working in consulting, startups and banking on highly-scalable and low-latency systems.
Comments
comments powered by Disqus