Angular shitstorm: What's your opinion on the controversial plans for Angular 2.0?
Garbage day

An introduction to garbage collection

RubenBadaro
gc1

Ruben Badaro provides a crash course in monitoring garbage collection in production systems

Garbage collection is one of the killer features of the Java Virtual Machine. Saving the developer from having to worry about memory management, as in a language such as C++, results in a significant productivity boost. However, that boost isn’t free: garbage collection has a cost.

As we allocate and de-allocate objects in our programs, the garbage collector is responsible for doing the cleanup work of freeing up the memory taken by objects that are no longer needed so that it can be used for new objects. The way in which the collector goes about this depends on which collection algorithm we choose.

There are two main vectors that need to be taken into account regarding garbage collection characteristics – throughput and pauses. Throughput is the percentage of total time not spent in garbage collection, whereas pauses are the times when an application appears unresponsive because garbage collection is being carried out.

Depending on the type of application, we might care more about one or the other, but in most real-world applications pauses are the common issue. The more memory is used by the JVM in the heap, the more likely it is that you will incur pauses when running your garbage collection (either in the young generation or tenured generation).

Nowadays, most teams end up limiting their JVM memory usage to 4-8GB specifically due to the long pauses that might occur otherwise. It is therefore imperative that we can measure our applications and the behavior of garbage collection. More specifically, it’s important that we have this information on the actual production applications running.

In order to measure garbage collection, there are 3 main approaches:

  • Analyze garbage collection logs that the JVM prints;

  • Use JMX to read exposed garbage collection information;

  • Access the JVM instrumentation counters with tools such as jstat.

We shall focus on each of these approaches individually and see how we can use them to measure GC behavior.

Analyzing garbage collection logs

The first thing you need to do is add arguments to your java call to print out garbage collection information:

  • -Xloggc:~/gc.log

  • -XX:+PrintGCDetails

  • -XX:+PrintGCTimeStamps

  • -XX:+PrintTenuringDistribution

 

An example output (ignoring the tenuring distribution) would be:

 

0.634: [GC

[PSYoungGen: 10304K->1660K(11968K)] 10304K->2656K(39296K), 0.0069210 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

 

What this tells us is that a minor collection happened 0.634 seconds after the start of the process on the young generation. Before the collection, there where 10304K objects in the young generation and after it there was 1660K in the same generation (of a total size of 11968K of young gen) and it took ~7 milliseconds.

The next number refers to the total size of the heap – there was a total heap usage of 10304K (means we only had young gen heap usage before) that was reduced to 2656K (larger than the 1660K in the young gen, which means we promoted 996K to the old gen). The total size of the heap (young + old gen) is now 39296K.

An example of a full gc would look like this:

 

4.268: [Full GC [PSYoungGen: 1649K->0K(42880K)] [ParOldGen: 19757K->18765K(44928K)] 21406K->18765K(87808K) [PSPermGen: 23549K->23537K(47104K)], 0.2047590 secs] [Times: user=0.31 sys=0.00, real=0.20 secs]

 

The structure of this output is the same as before, but here we can see the three different areas of the heap – young, old and permanent generation – and how their size was changed.

This data will certainly be valuable when tuning the generation sizes and different parameters of the JVM but is also very useful for our goal of monitoring garbage collection behavior in production.

The impact of writing these logs in production is negligible, so it’s practical to parse them and, if necessary, send an alert – as, say, an email to the team or notification the company’s operations alert system.

A simple example would be to monitor this file and send an email if a full garbage collection occurs. This is one way it could be achieved in a Unix system:

 

> tail -f gc.log | grep –line-buffered “Full GC” | while read line; do echo “$line” | mail -s “Full garbage collection occurred” “email@domain.com”; done

 

This will just send the individual old collection lines to the email. It’s fairly simple, but a complex and fully-fledged solution isn’t always necessary to get what you need.

There are a few open-source projects that parse the garbage collection logs, but these formats have two issues: they vary depending on the collectors being used and can change over time (and have done so in the past!). Be aware that some of these projects no longer work with current log formats.

Read Garbage Collection information through JMX 

Another approach to read information about garbage collection is through Java Management Extensions (JMX).

There are a number of data points exposed through MBeans which can be read either inside the process, from another process on the same machine or a remote process. For our specific case, we have garbage collection beans that we can use (these again depend on the garbage collection algorithms we are using):

 

Figure 1: Exposing garbage collection beans with MBeans.

 

Here in jconsole we can see that, besides other information, we have 2 GC related MBeans: PS MarkSweep and PS Scavenge (In this case we are using Concurrent Mark Sweep as our garbage collection algorithm).

We can however go much further than to simply look at this data in jconsole or similar management GUIs. We can read this information from our code and deal with what happens as we see fit.

Listing 1 shows how we can read it.

 

public void monitor(int vmpid) throws Exception {
JMXServiceURL url = new JMXServiceURL(getVMAddress(vmpid));
        JMXConnector connector = JMXConnectorFactory.connect(url);

        final MBeanServerConnection serverConnection = connector.getMBeanServerConnection();
        Set<ObjectName> beanSet = serverConnection.queryNames(new ObjectName("java.lang:type=GarbageCollector,name=PS MarkSweep"), null);

        final ObjectName bean = beanSet.iterator().next();

        GarbageCollectorMXBean gcBean = JMX.newMXBeanProxy(serverConnection,bean, GarbageCollectorMXBean.class);
        System.out.println("collection time: " + gcBean.getCollectionTime());
        System.out.println("collection count: " + gcBean.getCollectionCount());

        GcInfo gcInfo = gcBean.getLastGcInfo();
        Map<String, MemoryUsage> memUsages = gcInfo.getMemoryUsageBeforeGc();
        for (Entry<String, MemoryUsage> memUsage : memUsages.entrySet()) {
                System.out.println(memUsage.getKey() + ": " + memUsage.getValue());
        }

        listenToNotifications(serverConnection, bean);
}

private String getVMAddress(int pid) throws AttachNotSupportedException, IOException {
        String jmxAddressProp = "com.sun.management.jmxremote.localConnectorAddress";
        VirtualMachine vm = VirtualMachine.attach(String.valueOf(pid));
        return vm.getAgentProperties().getProperty(jmxAddressProp);
}

 

Listing 2 shows an example output.

 

collection time: 82037
collection count: 116
PS Survivor Space: init = 1703936(1664K) used = 65536(64K) committed = 32047104(31296K) max = 32047104(31296K)
PS Eden Space: init = 10551296(10304K) used = 0(0K) committed = 69795840(68160K) max = 113049600(110400K)
PS Old Gen: init = 27983872(27328K) used = 239432344(233820K) committed = 357957632(349568K) max = 357957632(349568K)
Code Cache: init = 2555904(2496K) used = 19949568(19482K) committed = 20185088(19712K) max = 50331648(49152K)
PS Perm Gen: init = 21757952(21248K) used = 148450536(144971K) committed = 155058176(151424K) max = 268435456(262144K)

 

We are just printing out the collection time, count and statistics of the different spaces much as we could see them when looking at garbage collection logs. However, in this case we have rich data structures that allow us to have this information available without needing to parse text.

In this simple example, we are reading garbage collection information from a process running in our local machine. In order to do that, all we need is the process process id, which gets passed in to the monitor() method.

You can see that the identifier we pass in to the queryNames() method matches the tree we saw in the previous jconsole screenshot:

 

“java.lang:type=GarbageCollector,name=PS MarkSweep”

 

 

We could here be reading the PS Scavenge collector (for young generation collections) and you would probably change this code to be able to read any garbage collector you decide to configure.

On the last line, we are invoking a listenToNotifications method. This method would look something like Listing 3.

 

private void listenToNotifications(final MBeanServerConnection serverConnection, final ObjectName bean) throws InstanceNotFoundException, IOException {
        final Queue<Notification> notifications = new LinkedBlockingQueue<Notification>();
        NotificationListener listener = new NotificationListener() {
                @Override
                public void handleNotification(Notification notification, Object ctx) {
                        notifications.add(notification);
                }
        };
        serverConnection.addNotificationListener(bean, listener, null, null);

        while (true) {
                Notification notification = notifications.poll();
                if (notification != null) {
                        process(notification);
                }
        }
}

 

This method allows us to listen for changes in the specific garbage collector – i.e. be notified each time a collection occurs. The process method is left unimplemented and in this case it would gather whatever information is necessary and send it through email, publish a notification in an internal alerting system, etc.

Access the JVM instrumentation counters

One last but interesting way to access garbage collection information in production is to access the actual JVM counters. When the JVM is running, it uses internal data structures to keep all this information about garbage collection (and much more). This is done directly in C++ and we don’t have access to it directly from Java.

However, the JVM stores these data structures in a file that is normally kept in a temp folder (/tmp in Linux) called hsperfdata_<user> which contains all the information. The JVM writes and reads this as a memory mapped file so it is possible to do access those data structures by simply reading the file. The advantage of this method is that it barely affects the running process, since the file is opened with a read-only mode to guarantee that we don’t affect memory. There is no extra work to be done by the process, unlike in the JMX example where code needs to execute to access the information we want.

While actually accomplishing this task is not as simple as the previous examples, we can get some good pointers from existing tools. One interesting tool distributed with the JDK is jstat. What is does in practice is read the hsperfdata file and display the performance counters on stdout. There is another similar tool called jstatd, a daemon that allows remote processes (e.g. VisualVM) to connect and query for data. However, the format for this communication is not publicized and is not guaranteed to stay the same.

There are two ways of taking advantage of this method of access: use jstat and parse its output as we did with the garbage collection logs, or read the performance counter data structures directly. The second option would be too complex for the scope of this article but jstat is fairly simple to use. We can run it like so:

 

> jstat –gccause –t <pid> <millis_interval>

 

This will get garbage collection information every millis_interval and print to stout something similar to Listing 4.

 

Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC

82214.9 9.04 0.00 78.21 68.87 96.27 968 20.731 116 82.039 102.769 Allocation Failure No GC

82215.9 9.04 0.00 78.21 68.87 96.27 968 20.731 116 82.039 102.769 Allocation Failure No GC

This gives you most of the basic counters –survivor space 0, 1, eden, old and permanent utilization percentage, counts of garbage collection, time spent in garbage collection and the reason for garbage collection. The manual page for jstat has a good explanation of each of these counters and other options that can be passed in for different information.

 

As with in the garbage collection log, we can parse this output in a very naïve way to send out emails – but I will leave that up to you to figure out.

Conclusion

There are a number of ways to monitor garbage collection behavior in production with different effort and advantages. The important part is that this behavior gets measured before going to production and that we keep these monitoring solutions in production to get vital information about our running systems.

Author Bio: Ruben works as a software engineer at Morgan Stanley. He was one of the founders of the Portuguese Java User Group (PT.JUG) and has been working with Java and other languages for the last 10 years. He has done everything from working in consulting, startups and banking on highly-scalable and low-latency systems.

This article previously appeared in JAX Magazine: Pulling Together. For that issues and others, click here.

Image courtesy of edenpictures

Author
RubenBadaro
Ruben works as a software engineer at Morgan Stanley. He was one of the founders of the Portuguese Java User Group (PT.JUG) and has been working with Java and other languages for the last 10 years. He has done everything from working in consulting, startups and banking on highly-scalable and low-latency systems.
Comments
comments powered by Disqus