Guest post

The leak hunter faces his toughest challenge yet

Nikita Salnikov-Tarnovski

Nikita Salnikov-Tarnovski recounts a nightmare twelve-hour search for the source of an application’s memory leaks.

A week ago I was asked to fix a problematic webapp suffering from
memory leaks. How hard can it be, I thought, considering that I
have both seen and fixed hundreds of leaks over the past year or
so.

But this one proved to be a challenge. 12 hours later I had
discovered no fewer than five leaks in the application and had
managed to fix four of them. I figured it would be an experience
worth sharing.

The application at hand was a simple Java web application with a
few datasources connecting to relational databases, Spring in the
middle to glue stuff together and simple JSP pages rendered to the
end user. No magic whatsoever. Or so I thought. Boy, was I wrong.

First stop: the MySQL driver. Apparently the most common MySQL
driver launches a background thread that cleans up your unused and
unclosed connections. So far so good. But the catch is that the
context classloader of this newly created thread is your web
application classloader. Which means that while this thread is
running and you are trying to undeploy your webapp, its classloader
is left dangling behind, with all the classes loaded in it.
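To see why the cleanup thread ends up pinning the webapp, here is a minimal sketch of the inheritance rule at work: a new thread takes its context classloader from the thread that creates it. Everything in it is a stand-in invented for illustration (the class name, the empty URLClassLoader playing the role of the webapp classloader, the thread name); it is not the driver's actual code.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

final class ContextLoaderInheritanceDemo {

    /**
     * Starts a thread while 'loader' is the caller's context classloader
     * and returns the context classloader the new thread observed.
     */
    static ClassLoader loaderSeenByNewThread(ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        current.setContextClassLoader(loader);
        try {
            // The new thread inherits the creator's context classloader here.
            Thread worker = new Thread(
                    () -> seen.set(Thread.currentThread().getContextClassLoader()),
                    "cleanup-thread-stand-in");
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            current.interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        } finally {
            // Restore whatever the caller had before the experiment.
            current.setContextClassLoader(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a webapp classloader.
        try (URLClassLoader fakeWebappLoader = new URLClassLoader(
                new URL[0], ContextLoaderInheritanceDemo.class.getClassLoader())) {
            ClassLoader seen = loaderSeenByNewThread(fakeWebappLoader);
            System.out.println("inherited webapp loader: " + (seen == fakeWebappLoader));
        }
    }
}
```

The worker here exits immediately. A long-lived thread created the same way would keep that reference for as long as it runs, which is the situation described above.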

Apparently it took from July 2012 to February 2013 to fix this
after the bug was discovered. You can follow the discussion in the
MySQL issue tracker. The solution finally implemented was a
shutdown() method added to the API, which you as a developer should
know to invoke before redeploys. Well, I didn’t. And I bet 99% of
you out there didn’t, either.

There is a good place for such shutdown hooks in your typical Java
web application, namely the contextDestroyed() method of the
ServletContextListener class. This method gets called each and
every time the servlet context is destroyed, which most often
happens during redeploys. Chances are that quite a few developers
are aware this place exists, but how many actually realise the need
to clean up in this particular hook?
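As a rough sketch of what such a cleanup call could look like, the helper below invokes a static shutdown() method by reflection, so it compiles without the driver on the classpath. The class name in MYSQL_CLEANUP_CLASS is my assumption for Connector/J 5.1.x (other driver generations use other packages, so check yours); CleanupThreadStopper and StandInCleanupThread are hypothetical names that exist only for this demonstration. In a real webapp the call would sit inside contextDestroyed().

```java
import java.lang.reflect.Method;

final class CleanupThreadStopper {

    // Assumption: the class name used by Connector/J 5.1.x.
    static final String MYSQL_CLEANUP_CLASS =
            "com.mysql.jdbc.AbandonedConnectionCleanupThread";

    /**
     * Calls the public static no-arg shutdown() method of the named class.
     * Returns false if the class or method is missing or the call fails.
     */
    static boolean shutdownIfPresent(String className, ClassLoader loader) {
        try {
            Class<?> type = Class.forName(className, true, loader);
            Method shutdown = type.getMethod("shutdown");
            shutdown.invoke(null);
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            return false;
        }
    }

    /** Hypothetical stand-in so the demo runs without the real driver. */
    public static final class StandInCleanupThread {
        public static volatile boolean stopped = false;

        public static void shutdown() {
            stopped = true;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = CleanupThreadStopper.class.getClassLoader();
        System.out.println("real driver stopped: "
                + shutdownIfPresent(MYSQL_CLEANUP_CLASS, loader));
        System.out.println("stand-in stopped: "
                + shutdownIfPresent(StandInCleanupThread.class.getName(), loader));
    }
}
```

Reflection is used here only to keep the sketch self-contained. If the driver is a compile-time dependency anyway, a direct call is simpler and fails loudly when the API changes.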

Back to the application, which was still far from being fixed. My
second discovery was also related to context classloaders and
datasources. When you are using com.mysql.jdbc.Driver it registers
itself as a driver in the java.sql.DriverManager class. Again, this
is done with good intentions. After all, this is what your
application uses to choose the right driver when connecting to a
database URL. But as you might guess, there is a catch:
DriverManager is loaded by the bootstrap classloader rather than
your web application’s classloader, so it cannot be unloaded when
redeploying your application, and the driver it still references
keeps your application’s classloader alive as well.

What makes things really peculiar is that there is no general way
to unregister the driver yourself. The reference to the class you
are trying to unregister seems to be deliberately hidden from you.
In this particular case I was lucky: the connection pool used in
the application was able to unregister the driver, provided I
remembered to ask. Looking back at similar cases in my past, this
was the first time I saw such a feature implemented in a connection
pool. Before that, I once had to enumerate all the JDBC drivers
registered with DriverManager to figure out which ones I should
unregister. Not an experience I can recommend to anyone.
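For the curious, the enumeration approach can be sketched roughly as follows: walk the drivers that DriverManager exposes and deregister only those whose class was loaded by a given classloader. JdbcDriverCleanup and StandInDriver are hypothetical names made up for this example; in a webapp you would pass the web application's classloader, and this is a sketch of the idea rather than the exact code I used back then.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

final class JdbcDriverCleanup {

    /** Deregisters drivers loaded by 'loader'; returns their class names. */
    static List<String> deregisterDriversLoadedBy(ClassLoader loader) {
        List<String> removed = new ArrayList<>();
        // Copy the enumeration first so we do not modify it while iterating.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() == loader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed.add(driver.getClass().getName());
                } catch (SQLException e) {
                    // Could not deregister; leave this driver in place.
                }
            }
        }
        return removed;
    }

    /** Hypothetical do-nothing driver, present only so the demo can run. */
    static final class StandInDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() { return Logger.getGlobal(); }
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new StandInDriver());
        ClassLoader loader = JdbcDriverCleanup.class.getClassLoader();
        System.out.println("deregistered: " + deregisterDriversLoadedBy(loader));
    }
}
```

Filtering by classloader matters: drivers loaded by the container itself belong to other applications too and should be left alone.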

This should be it, I thought. Two leaks in the same application is
already more than one can tolerate. Wrong. The third issue staring
right at me from the leak report was sun.awt.AppContext with its
static field mainAppContext. What? I have no idea what this class
is supposed to do, but I was pretty sure that the application at
hand didn’t use AWT in any way. So I started a debugger to find out
who loads this class (and why). Another surprise: it was
com.sun.jmx.trace.Trace.out(). Can you think of a good reason why a
com.sun.jmx class would call a sun.awt class? I certainly can’t.
Nevertheless, that call stack originated from my connection pool,
BoneCP. And there’s absolutely no way to skip the line of code that
leads to this particular memory leak. Solution? The following magic
incantation in my ServletContextListener.contextInitialized():

 

      Thread.currentThread().setContextClassLoader(null);
      // Force the AppContext singleton to be created and initialized
      // without holding a reference to the webapp classloader
      sun.awt.AppContext.getAppContext();

 

But I still wasn’t done: something was still leaking. In this case
I found out that our application was binding this datasource to the
InitialContext() JNDI tree, a good, standardized way to bind your
objects for future discovery. But again, when using this nice thing
you have to clean up after yourself by unbinding the datasource
from the JNDI tree in the very same contextDestroyed() method.

Well, so far we had pretty logical, albeit rare and somewhat
obscure problems, which with some reasoning and google-fu were
quickly fixed. My fifth and last problem was nothing like that. I
still had that application crashing with OutOfMemoryError: PermGen.
Both Plumbr and Eclipse MAT reported that the culprit, the one who
had taken my classloader hostage, was a thread named
com.google.common.base.internal.Finalizer.

“Who the hell is this guy?” – was my last
thought before the darkness engulfed me.

A couple of hours and four coffees later I found myself staring at
three lines:

 

    emf.close();
    emf = null;
    ds = null;

 

It is hard to recollect exactly what happened during the
intervening hours. I have vague memories of WeakReferences,
ReferenceQueues, Finalizers, Reflection and my first time seeing a
PhantomReference in the wild. Even today I still cannot fully
explain why and for what purpose my connection pool used finalizers
tied to Google’s implementation of a reference queue running in a
separate thread.

Nor can I explain why closing javax.persistence.EntityManagerFactory
(named emf in the code above and held in a static reference in one
of the application’s own classes) was not enough, so that I had to
manually null this reference, along with a similar static reference
to the data source used by that factory. I was sure that Java’s GC
could cope with circular references all day long, but it seems that
this magical ring of classes, static references, objects,
finalizers and reference queues was too hard even for it. And so,
again for the first time in my long career, I had to null out a
Java reference.

I am a humble guy and thus cannot claim that I was the most
efficient in finding the cure for all of the above in a mere 12
hours. But I have to admit I have been dealing with memory leaks
almost exclusively for the past three years. And I even had my own
creation, Plumbr, helping me (in fact, four out of five of those
leaks were discovered by Plumbr in 30 minutes or so). But actually
solving those leaks took me more than a full working day on top of
that.

Overall, something is apparently broken in the Java EE and/or
classloader world. It cannot be normal that a developer must
remember all those hooks and configuration tricks, because it
simply isn’t possible. After all, we like to use our heads for
something productive. And, as seen from the workarounds bundled
with two popular servlet containers (Tomcat and Jetty), the problem
is severe. Solving it, however, will require more than simply
alleviating some of the symptoms: it means curing the underlying
design errors.

Photo by Blai Biosca.

Nikita Salnikov-Tarnovski is co-founder of Plumbr, the memory leak detection product, where he now contributes his time as a core developer. Besides his daily technical tasks he is an active blogger and conference speaker (Devoxx, TopConf, JavaDay,