7 things you thought you knew about Garbage Collection that are totally wrong
Alex Zhitnitsky is back to chat about Java performance and debunk the stuff that you think you know about garbage collection. Read on for some serious trash-talking.
This post was originally published on the Takipi blog – Java and Scala exception analysis and performance monitoring.
What are the biggest misconceptions about Java Garbage Collection, and what is it really like?
When I was a kid, my parents used to tell me that if I didn’t study well I’d end up as a garbage collector. Little did they know, garbage collection is actually kind of awesome. Maybe that’s why even in the Java world many developers misunderstand GC algorithms: how they work, how GC affects their application, and what you can do about it. That’s why we’ve turned to Haim Yadid, a Java performance tuning expert, and put the Java performance tuning guide on the Takipi blog.
Triggered by the interest in the performance tuning guide, in this follow-up post we decided to gather some of the popular opinions about garbage collection, and show you why they’re totally wrong. Here are the top 7:
1. There’s only one garbage collector
Nope, and 4 isn’t the right answer either. The HotSpot JVM has a total of 4 garbage collectors: Serial, Parallel / Throughput, CMS, and the new kid on the block, G1. But wait, there’s more: there are also non-standard garbage collectors and more adventurous implementations like Shenandoah, as well as collectors that other JVMs use (like C4, the pauseless collector by Azul).
HotSpot’s default is the Parallel / Throughput collector, and often it’s not the best option for your application. For example, the CMS and G1 collectors will cause less frequent GC pauses. But when a pause does come, its duration will most likely be longer than the one caused by the Parallel collector. On the other hand, the Parallel collector usually achieves higher throughput for the same heap size.
Takeaway: Choose the right garbage collector for the job depending on your requirements: Acceptable GC pause frequency and duration.
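Before comparing collectors, it helps to know which one your JVM is actually running. Here is a minimal sketch that asks the running JVM through the standard java.lang.management API. The class name ActiveCollectors is made up for this example, and the names it prints vary between JVM vendors and versions, so treat the output as informational only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class ActiveCollectors {

    // Returns the names of the collectors the running JVM reports.
    // Many collectors show up as two beans (young and old generation).
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```

On HotSpot, the collector itself is picked with a startup flag such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC or -XX:+UseG1GC. Which of these flags is available depends on your JDK version.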
2. Parallel = Concurrent
A Garbage Collection cycle can be either STW (Stop-The-World) and cause a GC pause, or it can be done concurrently without stopping the application. When we go a step further, the GC algorithm itself can be either serial (single threaded) or parallel (multi-threaded).
This is why a concurrent GC is not necessarily a parallel one and, the other way around, a serial GC does not necessarily cause a pause. In the Garbage Collection world, Concurrent and Parallel are two entirely different terms: Concurrent refers to the GC cycle, and Parallel refers to the GC algorithm itself.
Takeaway: Garbage collection is a two-step game: the way a GC cycle is invoked and the way it goes about its business are two different things.
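One way to keep the two terms apart is to treat them as two independent switches. The sketch below is only a toy model: the GcStyle class and its field names are invented for illustration, and it does not describe how any real collector is implemented.

```java
// Toy model: two independent axes, so there are four possible combinations.
final class GcStyle {
    // Axis 1: does the cycle run alongside the application, or stop it?
    final boolean concurrentCycle;
    // Axis 2: does the collector do its work on several threads or just one?
    final boolean parallelAlgorithm;

    GcStyle(boolean concurrentCycle, boolean parallelAlgorithm) {
        this.concurrentCycle = concurrentCycle;
        this.parallelAlgorithm = parallelAlgorithm;
    }

    String describe() {
        String cycle = concurrentCycle ? "concurrent" : "stop-the-world";
        String algorithm = parallelAlgorithm ? "parallel" : "serial";
        return cycle + " cycle, " + algorithm + " algorithm";
    }

    public static void main(String[] args) {
        boolean[] options = {false, true};
        // Print all four combinations of the two axes.
        for (boolean concurrent : options) {
            for (boolean parallel : options) {
                System.out.println(new GcStyle(concurrent, parallel).describe());
            }
        }
    }
}
```

For instance, a stop-the-world cycle that uses several GC threads comes out as "stop-the-world cycle, parallel algorithm": parallel, but not concurrent.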
3. G1 solves all problems
Introduced in Java 7 and having gone through many changes since, the G1 collector is the newest addition to the JVM’s garbage collectors. Its main advantage is that it solves the fragmentation problem that’s common with the CMS collector: GC cycles free chunks of memory from old gen and make it look like Swiss cheese, until a moment comes when the JVM can’t cope and has to stop and handle the fragmentation.
But that’s not the end of the story: other collectors can outperform G1 in certain cases. It all depends on what your requirements are.
Takeaway: There’s no miracle solution to all GC problems. Experimentation is needed to help you choose the right collector for your JVM.
4. Average transaction time is the most important metric to look out for
If you’re only monitoring the average transaction time in your server, then you’re missing out on the outliers. There’s low awareness of how devastating this can be to the users of your system. For example, a transaction that would normally take under 100ms can be hit by a GC pause and take a minute to complete. This can go unnoticed by anyone but the user if you’re only looking at the average transaction time.
Now consider this scenario for 1% or more of your users and you can see how easily it can be overlooked when you’re only looking at the average. For more on latency-related issues and the way to get them right, check out Gil Tene’s blog.
Takeaway: Keep an eye on the outliers and know how your system behaves at the 99th percentile (not that 1%).
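To see how an average hides the outliers, here is a small sketch with made-up numbers: 980 transactions at 80ms and 20 that were hit by a hypothetical one-minute GC pause. The LatencyStats class is invented for this example, and the percentile uses the simple nearest-rank method, which is one of several common definitions.

```java
import java.util.Arrays;

// Hypothetical helper class with illustrative data, not real measurements.
final class LatencyStats {

    static double mean(long[] samplesMs) {
        long sum = 0;
        for (long sample : samplesMs) {
            sum += sample;
        }
        return (double) sum / samplesMs.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // 1000 transactions: 980 fast ones and 20 stalled by a long pause.
    static long[] sampleData() {
        long[] samples = new long[1000];
        Arrays.fill(samples, 0, 980, 80L);        // 80 ms each
        Arrays.fill(samples, 980, 1000, 60_000L); // 60 s each
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = sampleData();
        System.out.println("mean ms: " + mean(samples));
        System.out.println("p50 ms:  " + percentile(samples, 50));
        System.out.println("p99 ms:  " + percentile(samples, 99));
    }
}
```

With this data the mean comes out at about 1.3 seconds, a number no user actually experienced. The median is still 80ms, while the 99th percentile is the full 60 seconds.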
5. Reducing new object allocation rates will improve GC behaviour
We can roughly separate the objects in our system into three groups: long-lived objects, which we usually can’t do much about; mid-lived objects, which cause the biggest issues; and short-lived objects, which usually get allocated and freed quickly, so they’re gone by the next GC cycle.
The mid-lived objects are the ones where focusing on the allocation rate could bring positive results. Concentrating on the short-lived and long-lived objects wouldn’t usually prove effective, and controlling the mid-lived objects is often a very hard task.
Takeaway: It’s not the object allocation rate alone that throttles your servers, it’s the type of objects in play that causes all the trouble.
6. Tuning can solve everything
If your application needs to keep a large state that changes frequently, there isn’t much benefit to gain from tuning the heap of your JVM. Long GC pauses will be inevitable. A solution can come from architectural changes: make sure that a process containing a critical procedure or bottleneck that affects response time does not also hold a large state.
Large state and responsiveness don’t go well together: breaking them down into different processes would be the way to go.
Takeaway: Not all issues can be solved through tuning JVM flags. Sometimes you simply need to go back to the drawing board.
7. GC logs cause a big overhead
This one is simply not true, especially with the default log settings. The data is extremely valuable, and Java 7 introduced hooks to control the size of the log files (rotation flags such as -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles and -XX:GCLogFileSize) and make sure they will not use up all your hard drive. If you’re not collecting GC log data, then you’re missing out on pretty much the only way for you to know how your JVM’s garbage collection behaves in production.
There’s usually a 5% upper bound for acceptable GC overhead. Logging is a tiny price to pay for being able to know what kind of toll GC pauses take on your system and to act on minimizing it.
Takeaway: Use everything in your power to get the most data you can out of your system in production. It’s a whole different world out there.
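GC logs remain the most detailed source, but as a rough first check you can compare time spent in GC against the 5% figure from inside the JVM. The sketch below is a rough approximation under stated assumptions: the GcOverhead class is made up for this example, and getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of any library.
final class GcOverhead {

    // Share of JVM uptime spent in garbage collection, as a percentage.
    static double overheadPercent(long gcTimeMs, long uptimeMs) {
        if (uptimeMs <= 0) {
            return 0.0;
        }
        return 100.0 * gcTimeMs / uptimeMs;
    }

    // Sums the approximate accumulated collection time over all collectors.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = bean.getCollectionTime(); // -1 if the collector does not report it
            if (timeMs > 0) {
                total += timeMs;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        double percent = overheadPercent(totalGcTimeMs(), uptimeMs);
        System.out.printf("GC overhead since JVM start: %.2f%%%n", percent);
    }
}
```

This gives a single number since JVM start, so it says nothing about individual pauses. That detail is exactly what the GC logs are for.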
We hope that these takeaways helped you get a better grasp of how garbage collection really works in Java. Did you recognize some of these issues in your application? Are there more common Garbage Collection mistakes that you see around? Let us know in the comments section below.