Java 9’s new garbage collector: What’s changing? What’s staying?
Oracle’s plans to modernise garbage collection in Java 9 have raised many questions. Uwe Schindler describes what exactly the G1 collector means for Java developers.
Oracle sparked much debate in the Java community with JDK Enhancement Proposal (JEP) 248, one of the proposed Java 9 features, which intends to make the Garbage-First (G1) collector the new default garbage collector. But what are the differences between G1 and the current ParallelGC? Is any action needed from future Java 9 users?
The basics of garbage collection
Garbage collection has some drawbacks that can hurt an application, depending on the usage scenario. The collector itself consumes resources and therefore affects the running application. A real-world example makes this easy to see: the more waste has piled up and the larger the city is, the longer it takes the garbage trucks to collect it, haul it away and recycle it.
It’s similar with garbage collection in the Java world. With the larger heap sizes used by modern applications, a lot more time is spent finding and cleaning up Java objects that are no longer referenced. If the whole Java application suddenly stops for several minutes to do this, it becomes a showstopper. Over time, the algorithms for collecting, compacting and defragmenting have been improved so that they do not interfere too much with the main application.
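To get a rough feeling for how much time a running application spends on garbage collection, the standard java.lang.management API can be queried. The following is a minimal sketch, not a profiling tool: the class name GcTimeShare and its helper methods are made up for this illustration, and the reported times are approximate (how concurrent phases are counted depends on the collector).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
public class GcTimeShare {

    /** Fraction of the uptime spent collecting; 0.0 if the uptime is not positive. */
    static double share(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    /** Sums the accumulated collection time of all registered collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            // getCollectionTime() returns -1 if the collector does not report a time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.2f%%)%n",
                gcTime, uptime, 100.0 * share(gcTime, uptime));
    }
}
```

Called periodically inside a long-running application, a growing share would hint that the collector is taking up a noticeable part of the runtime.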
The developer may choose between several garbage collector algorithms. If you do not specify one, the Java runtime chooses a default depending on the platform (ParallelGC on server-class machines). This algorithm works fairly well with smaller heaps, but on large heaps there are phases where it has to stop all threads to do the actual garbage collection. It uses many parallel threads for this, but the application has to stop completely so that it does not interfere with the collector. This is called a stop-the-world (STW) phase. Depending on the heap size and the number of live objects, this can take a long time, up to several minutes. Of course, that always happens exactly when you don’t want it to happen!
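If you are unsure which collector the runtime picked on a given machine, you can ask the JVM itself. The sketch below uses the standard GarbageCollectorMXBean API; the class name ActiveCollectors is invented for this example. The exact bean names are an implementation detail of the JVM: on typical HotSpot builds, ParallelGC shows up with names such as "PS Scavenge" and "PS MarkSweep", and G1 with names such as "G1 Young Generation" and "G1 Old Generation", but do not rely on that.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ActiveCollectors {

    /** Returns the names of the collectors registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The printed names depend on the JVM vendor, version and selected collector.
        for (String name : collectorNames()) {
            System.out.println("Active collector: " + name);
        }
    }
}
```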
The proposed new Java 9 default garbage collector, G1 (Garbage First, G1GC), first became usable for Java programs with Java 7 Update 4 (Java 7 GA shipped with it, but you could not enable it with the default command-line options). G1 splits the processing of the “old generation” (long-lived Java objects) into several phases, not all of which are stop-the-world. As a result, the pauses for the program threads are shorter and the program can proceed between the phases.
Unfortunately, this increases the overhead, because the garbage collection threads have to work alongside the application threads: additional synchronisation is needed so that the Java memory model is not violated. This requires additional memory barriers, inserted by the HotSpot VM for several operations. Of course, this slows down calculations running in hot loops. For most modern web applications, this is still much better than stopping the whole thing for a very long time.
G1 versus ParallelGC
But when reading the discussions on the mailing list, one has to keep the following in mind: G1GC was originally designed as a replacement for the Concurrent Mark Sweep (CMS) collector that is used in many applications today. However, CMS is not the default in Java 7 or Java 8! Therefore, many of the comparisons are somewhat vague, because the default ParallelGC behaves differently and its STW pause times are much longer and therefore outweigh the benefits of G1GC.
One important point in the discussion was raised by Google. Google uses a modified CMS collector in its data centres, which supposedly behaves better than G1GC. The proposal from these folks was to select the already well-proven CMS as the default for Java 9.
G1 and Apache Lucene
Apache Lucene has been running its full test suite for several years on many Oracle JDK versions, across different platform, bitness and garbage collector variants. This was introduced after the problems with the initial Java 7 GA release. Many JVM bugs were discovered this way. Unfortunately, it also uncovered some problems that occurred only when G1GC was used during tests: sometimes the VM died completely, or, worse, the resulting indexes were corrupt because Java wrote corrupted data to disk.
The reason for these problems is the increased complexity of G1GC, which deeply interferes with the structures of the HotSpot VM. Due to the increased parallelism of G1, more memory barriers are needed in the VM code to comply with the Java Memory Model. And this is where Lucene’s problems came from. For this reason, Lucene and Elasticsearch developers currently recommend not using G1GC. This was also part of the mailing list discussion about JEP 248.
On the other hand, Oracle is working hard on G1 to guarantee speed and correctness. Observing the Lucene builds over recent months, the Lucene team noticed that the errors initially seen no longer occurred. This is consistent with Oracle’s statement that G1GC is “ready for production” in Java 8 Update 40. However, one may still feel uneasy about putting it into production, because some of the errors were never understood; they simply no longer occur, and nothing more is known about them. But a lot will happen before the release of Java 9, so hopefully the problems will be understood by then.
In Java 9, it looks like the default garbage collector will be G1GC. Most software out there sets the garbage collector explicitly on the command line anyway; many application servers, for example, set it in their startup scripts, as do both Lucene-based servers, Apache Solr and Elasticsearch. Such products are therefore not automatically affected by a change of the default GC. So everything stays the same! Likewise, software that explicitly uses G1GC today will continue to run!
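Whether a given installation really pins its collector can be checked from inside the JVM by looking at the arguments it was started with. The following sketch assumes that the collector is selected with one of the usual HotSpot switches of the form -XX:+Use...GC (for example -XX:+UseG1GC, -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC). The class name ExplicitGcFlag and the simple prefix/suffix match are assumptions made for this illustration, not an official way to detect the setting.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ExplicitGcFlag {

    /** Returns all arguments that look like a collector switch (-XX:+Use...GC). */
    static List<String> gcSelectionFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            // Heuristic: HotSpot collector switches start with -XX:+Use and end with GC
            if (arg.startsWith("-XX:+Use") && arg.endsWith("GC")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // The arguments passed to the JVM itself, not those passed to main()
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = gcSelectionFlags(jvmArgs);
        if (flags.isEmpty()) {
            System.out.println("No explicit collector flag found; the JVM default applies.");
        } else {
            System.out.println("Collector set explicitly: " + String.join(" ", flags));
        }
    }
}
```

If the first message appears, the application is one of those that would silently move to G1GC once the default changes.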
In short, Java users have no need to worry, because the majority of the software available on the market already sets the garbage collector according to its needs. JEP 248 can therefore be understood as a signal to the makers of such software to look into the issue and perhaps activate G1GC in their products on the currently available Java 8 Update 45. But there is also no reason not to continue using the Concurrent Mark Sweep collector.