The Cloud and Enterprise Java
Java EE Inside GAE
Cloud computing is a fast-growing field that is defining the roadmap for future IT infrastructure. If you want to live on the bleeding edge, you shouldn't just observe it from the sidelines; you should try it! Enterprise Java, on the other hand, has been around for quite a while and is now in its sixth version. Over time, various expert groups have evolved the specifications, fixing the mistakes of previous versions, simplifying usage despite the underlying complexity, and of course adding interesting new technologies.
Companies, as well as developers themselves, have invested a lot of resources, work and knowledge into mastering Java enterprise technologies. Cloud computing, however, mostly because of its restrictions, works quite differently from what we're used to in the heavyweight Java enterprise world. The potential problem is obvious: do we really have to forget and throw away much of what we have learned and built over the past few years of Java enterprise programming in order to move into the clouds?
Google, having had to develop cloud infrastructure for its own needs and being in an ideal position to offer cloud services to the public, was one of the first to provide a free PaaS offering. The service was named Google App Engine (GAE), and it initially served only Python applications. Thanks to the decent success of this initial service and the obvious popularity of Java as an open source, enterprise-ready language, Google expanded GAE to host Java applications as well.
With Java, the usual practice is to turn good and successful custom frameworks or projects into a specification. The last two Java EE specifications are good examples of this, with JPA, JSF and EJB3, and more recently CDI and BeanValidation. The first three have since received an upgrade, which confirms their wide adoption and good user feedback. CDI is a fresh enterprise specification, based on the best of breed from popular custom application frameworks such as JBoss Seam, Google Guice and the Spring Framework. It allows simple type-safe configuration, which is extremely useful for large projects, and is built to play nicely with today's powerful IDE tools.
Since Google is well aware of best practices, it tries to support existing standards as much as possible. But because cloud processing has its specific restrictions, not everything is that simple. This is the focus of our article: how to keep standard technologies present while still respecting the cloud restrictions. We'll look in more detail at how to introduce CDI and how to integrate it transparently and efficiently with JPA, JSF and BeanValidation, while also replacing the missing EJB3 functionality. We'll also touch upon all of the major GAE restrictions and how to properly tackle them in a Java enterprise environment.
Since today's solutions need to be fully tested, a new approach to testing will be presented as well. This approach allows transparent environment switching, making tests true to Java: run once, test everywhere. And this goes for "our" GAE environment as well.
We won't go into too much detail on what CDI is and how it works; that is a topic for a different article. In this one, we'll show how best to use it in GAE.
One of the major GAE restrictions, if not "the" major one, is a 30-second request/response time limit: if the application doesn't respond within 30 seconds, GAE itself terminates that thread and throws an appropriate exception. On the other hand, if the application is not used for a while, GAE simply shuts down all application instances, meaning that the next request will suffer the overhead of a new instance booting up. This has lately changed a bit with the addition of the "Always On" feature, which unfortunately doesn't come free of charge.
In order for CDI to work, the CDI runtime needs to inspect all potential beans upon initialization, and it spends quite some time determining and validating an application's bean configuration. Considering that the first request might not just be served but might also bootstrap the whole application, one must think hard about initializing beans as lazily as possible, which means reducing initialization-time processing to the minimum. Another thing to consider is filtering out all the classes that are not actually used as beans in your application. In Weld, JBoss's CDI implementation and the reference implementation (RI), we have a few different ways to limit the list of potential bean classes. One way is to define a filtering element in beans.xml; another is to provide an exact bean-classes.txt file listing the beans' class names.
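The idea of initializing beans as lazily as possible can be illustrated with a minimal, framework-free sketch. The `Lazy` holder and the `ReportService` bean below are hypothetical names introduced purely for illustration; they are not part of CDI or Weld, and a real application would more likely rely on its container's own lazy-resolution facilities.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: builds the wrapped value on first use, then reuses it. */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    // Note: a factory that returns null would be invoked again next time.
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}

/** Hypothetical bean whose expensive dependency is not built at boot time. */
final class ReportService {
    static final AtomicInteger INITIALIZATIONS = new AtomicInteger();

    private final Lazy<String> template = new Lazy<>(() -> {
        INITIALIZATIONS.incrementAndGet(); // stands in for slow setup work
        return "Report for %s";
    });

    String render(String user) {
        return String.format(template.get(), user);
    }
}

final class LazyDemo {
    public static void main(String[] args) {
        ReportService service = new ReportService(); // cheap: nothing is built yet
        System.out.println(ReportService.INITIALIZATIONS.get());
        System.out.println(service.render("alice")); // first use triggers the setup
        System.out.println(ReportService.INITIALIZATIONS.get());
    }
}
```

Constructing `ReportService` costs almost nothing, so the first request pays only for the parts of the application it actually touches.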
As with every framework, CDI needs an entry point into your application. In a web application, the easiest way to bootstrap a custom framework is usually via a servlet listener. CDI, or Weld specifically, is no exception. But, and this is an important but, this is only an initial bootstrap that just sets up the CDI BeanManager and the application's beans. Since we're in a web application, our logic is probably accessed through servlets and filters. To actually use CDI with those components, an additional integration mechanism must be introduced.
For "standard" web containers, such as Tomcat and Jetty, Weld already provides full out-of-the-box CDI integration. But since GAE is not pure Jetty but a Jetty-based fork without all of its features, mostly for security reasons, we need to find a different way of integrating with CDI.
This is not as difficult as it might look at first glance. All CDI implementations leave some room for custom extensions, which can easily be used to get hold of the application's BeanManager instance. This does make your application less portable between different CDI implementations, but that can easily be abstracted away if you actually need it. In our Weld case, we use the fact that the BeanManager is also added as a servlet context attribute at boot time. This way we can easily create a CDI-aware delegate of the initial servlet: the non-CDI-aware servlet receives the request and delegates it to a CDI-aware, servlet-like delegate. OK, so we now have our super-lazy CDI beans up and running, ready to serve some requests. Let's have a look at how these beans can actually do something serious, such as interacting with your database. Onward to JPA!
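Before moving on, the delegation described above, where a non-CDI-aware servlet hands each request to a CDI-aware delegate, can be sketched without any container on the classpath. Everything in this sketch is a stand-in and an assumption: `BeanLookup` plays the role of the BeanManager, a plain `Map` plays the role of the servlet context attributes, and the attribute key is invented for the example (the real key depends on the CDI implementation).

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the CDI BeanManager: resolves a bean instance by type. */
interface BeanLookup {
    <T> T lookup(Class<T> type);
}

/** Stand-in for a servlet-like component managed by CDI. */
interface RequestHandler {
    String handle(String request);
}

/** The CDI-aware delegate; in a real application its fields would be injected. */
final class GreetingHandler implements RequestHandler {
    @Override
    public String handle(String request) {
        return "Hello, " + request;
    }
}

/** Trivial lookup that just instantiates the requested type. */
final class ReflectiveBeanLookup implements BeanLookup {
    @Override
    public <T> T lookup(Class<T> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + type.getName(), e);
        }
    }
}

/** Non-CDI-aware entry point: created by the container, not by CDI. */
final class DelegatingServlet {
    // Hypothetical attribute key, invented for this sketch.
    static final String BEAN_LOOKUP_KEY = "example.beanLookup";

    private final Map<String, Object> servletContext;
    private final Class<? extends RequestHandler> delegateType;
    private RequestHandler delegate; // resolved on the first request only

    DelegatingServlet(Map<String, Object> servletContext,
                      Class<? extends RequestHandler> delegateType) {
        this.servletContext = servletContext;
        this.delegateType = delegateType;
    }

    synchronized String service(String request) {
        if (delegate == null) {
            Object attribute = servletContext.get(BEAN_LOOKUP_KEY);
            if (!(attribute instanceof BeanLookup)) {
                throw new IllegalStateException("CDI was not bootstrapped");
            }
            delegate = ((BeanLookup) attribute).lookup(delegateType);
        }
        return delegate.handle(request);
    }
}

final class DelegateDemo {
    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        context.put(DelegatingServlet.BEAN_LOOKUP_KEY, new ReflectiveBeanLookup());
        DelegatingServlet servlet = new DelegatingServlet(context, GreetingHandler.class);
        System.out.println(servlet.service("GAE"));
    }
}
```

Resolving the delegate on the first request rather than at construction keeps the entry point cheap to create, which matches the lazy start-up goal discussed earlier.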
While GAE has its own low-level persistence API, a simple untyped DataStore, Java EE developers are more used to the JPA API, so GAE offers that as well. Not surprisingly, though, there are quite a few reasonable restrictions one must be aware of.
The actual database is not a relational database as we know it, but a highly scalable non-relational BigTable. While we normally have a single relational database instance, GAE has an unknown number of BigTable instances (or nodes), and from a user's point of view it chooses one at random when writing data. In order to make such writes ACID, any "related" data needs to be on the same node, meaning that JPA relationships need to be carefully crafted. There are existing OSS frameworks that help with this relationship problem as much as possible. In our case, we developed our own simple solution on top of standardized JPA, which hides the workaround's implementation details with the help of proxied entities. Each entity instance in use is actually a proxy that intercepts calls to potential relationships and turns them into proper lookups.
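The proxied-entity idea can be sketched with the JDK's own `java.lang.reflect.Proxy`. This is not the solution described above, whose details are not shown in this article; it is a minimal illustration under stated assumptions. The `Book` and `Author` entities, the `AuthorStore` lookup and the `BookProxyFactory` are all hypothetical, and the entity stores only the author's key rather than a direct reference.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical related entity. */
interface Author {
    String getName();
}

/** Hypothetical entity with one relationship. */
interface Book {
    String getTitle();

    Author getAuthor(); // relationship, resolved through a lookup
}

/** Stand-in for a datastore query by key. */
final class AuthorStore {
    private final Map<Long, Author> authors = new HashMap<>();
    int lookups; // counts how many relationship lookups were performed

    void put(long id, String name) {
        authors.put(id, () -> name);
    }

    Author find(long id) {
        lookups++;
        return authors.get(id);
    }
}

final class BookProxyFactory {
    /** Creates a Book that holds only the author's key, not the author itself. */
    static Book create(String title, long authorId, AuthorStore store) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getTitle":
                    return title; // plain property: answered directly
                case "getAuthor":
                    return store.find(authorId); // relationship: turned into a lookup
                default:
                    // toString/equals/hashCode are deliberately left out of this sketch
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Book) Proxy.newProxyInstance(
                Book.class.getClassLoader(), new Class<?>[] {Book.class}, handler);
    }
}

final class ProxyDemo {
    public static void main(String[] args) {
        AuthorStore store = new AuthorStore();
        store.put(1L, "Ales");
        Book book = BookProxyFactory.create("CDI on GAE", 1L, store);
        System.out.println(book.getTitle());
        System.out.println(book.getAuthor().getName());
    }
}
```

Callers keep using `book.getAuthor()` as if it were an ordinary relationship, while the lookup happens only when the relationship is actually accessed.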
As we already know, JPA also includes a second-level cache, which in our case comes in very handy. Instead of doing expensive query lookups, we can easily cache a lot of previously looked-up data, especially considering that cache comes "cheap" in GAE. Thanks to this feature, it also makes a lot of sense to initialize the JPA EntityManagerFactory very lazily: we do not have to instantiate the EntityManagerFactory at all if previously cached data can be used to handle the lookup.
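A cache-first lookup that defers the expensive factory until the first cache miss might look roughly like the following. This is a sketch, not the JPA API: `ExpensiveStore` is a hypothetical stand-in for an EntityManagerFactory, and a plain `Map` stands in for the GAE cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Stand-in for an expensive resource such as an EntityManagerFactory. */
final class ExpensiveStore {
    static int instancesCreated; // counts how often the costly setup ran

    ExpensiveStore() {
        instancesCreated++;
    }

    String query(String key) {
        return "value-of-" + key; // stands in for a datastore query
    }
}

/** Serves lookups from the cache first; builds the store only on a miss. */
final class CacheFirstLookup {
    private final Map<String, String> cache;
    private final Supplier<ExpensiveStore> storeFactory;
    private ExpensiveStore store; // stays null until the first cache miss

    CacheFirstLookup(Map<String, String> cache, Supplier<ExpensiveStore> storeFactory) {
        this.cache = cache;
        this.storeFactory = storeFactory;
    }

    synchronized String find(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit: the store is never touched
        }
        if (store == null) {
            store = storeFactory.get(); // the first miss pays the setup cost
        }
        String loaded = store.query(key);
        cache.put(key, loaded);
        return loaded;
    }
}

final class CacheFirstDemo {
    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("greeting", "hello"); // pretend an earlier request cached this
        CacheFirstLookup lookup = new CacheFirstLookup(cache, ExpensiveStore::new);
        System.out.println(lookup.find("greeting"));
        System.out.println(ExpensiveStore.instancesCreated);
        System.out.println(lookup.find("other"));
        System.out.println(ExpensiveStore.instancesCreated);
    }
}
```

A freshly booted instance that only serves cached data never pays the factory's start-up cost, which matters when the boot happens inside a request.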
JSF, BeanValidation and Cache
In order to make use of JSF2, one must properly set up the initial context parameters and unfortunately even use a few hacks. The parameters need to disable any multi-threaded JSF behavior, while the hack we used got us past forbidden class usage: no InitialContext is allowed. Another thing to be aware of is that GAE already comes with EL 1.0 (the Unified Expression Language library) on its classpath, which due to some strange rule gets used instead of the EL library shipped with the application. This means we're stuck with EL 1.0 features and need a workaround to invoke parameterized bean methods.
Which BeanValidation implementation to use depends on how many validation features we want to get out of BeanValidation. In our case, it turned out we just needed simple bean property validation, which made it much easier and more lightweight to implement a few BeanValidation SPIs on our own and use just those. Since all actual usage is hidden behind the proper BeanValidation API, it is trivial to swap in a more complex implementation if needed.
As we already mentioned, caching in GAE is "cheap" and you have a lot of it. This should encourage you to cache as many things as possible. Of course, you should pay close attention to the cache eviction policy and how to apply it properly across all layers, so as not to leave any stale data lying around. Another thing to keep in mind is that with application version updates the cached data's structure can change, thus breaking the serialization contract. Similarly, the GAE admin UI allows an administrator to change data directly, bypassing the application's caches. For those reasons, one should expose a clear-all-cache operation through the application's admin interface.
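One possible way to handle both concerns is sketched below. Prefixing cache keys with the application version is an assumption of this example, not something the article prescribes; it simply keeps a new version from reading entries written with an older data structure. The `VersionedCache` class is hypothetical and a plain `Map` stands in for the GAE cache. The `clearAll` method is the operation an admin page would call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache wrapper; the backing map stands in for the GAE cache. */
final class VersionedCache {
    private final Map<String, Object> store;
    private final String appVersion;

    VersionedCache(Map<String, Object> store, String appVersion) {
        this.store = store;
        this.appVersion = appVersion;
    }

    /** Entries written by one application version are invisible to another. */
    private String key(String name) {
        return appVersion + ":" + name;
    }

    void put(String name, Object value) {
        store.put(key(name), value);
    }

    Object get(String name) {
        return store.get(key(name));
    }

    /** Meant to be wired to an admin-only action; drops every entry, all versions. */
    void clearAll() {
        store.clear();
    }
}

final class VersionedCacheDemo {
    public static void main(String[] args) {
        Map<String, Object> shared = new ConcurrentHashMap<>();
        VersionedCache v1 = new VersionedCache(shared, "v1");
        VersionedCache v2 = new VersionedCache(shared, "v2");
        v1.put("user", "old-format");
        System.out.println(v2.get("user")); // v2 does not see the v1 entry
        v2.clearAll();
        System.out.println(v1.get("user")); // everything is gone after the clear
    }
}
```

Version-prefixed keys leave old entries to be evicted over time, while the clear-all operation covers the case where data was changed outside the application.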
Another great thing about GAE is its ease of use, even in a local environment. Setting up testing for your application shouldn't be too hard. As with the other technologies we used, our testing framework should also be able to support runtime environment changes. At JBoss we developed two state-of-the-art testing frameworks to help you achieve this easily: Arquillian and ShrinkWrap. The ShrinkWrap project abstracts the actual deployment bits, whereas Arquillian abstracts the actual runtime container, that is, the environment. In order to test your applications in GAE, we just needed to code a proper Arquillian container implementation capable of running GAE in embedded mode. This way the testing code is environment agnostic, and the actual environment is determined by the single Arquillian container implementation on the test's classpath.
We can see that developing applications for GAE doesn't mean we need to abandon previously learned JavaEE technologies. However, we do need to be even more careful when using these technologies, evaluating every use case against existing and potential GAE restrictions. We can also see the benefits of using standardized APIs and good frameworks which hide away the environment dependencies.
At the JBoss Weld project we encourage users to share any feedback they might have on the existing GAE experience. We take every suggestion, bug fix, patch or criticism into consideration as we work to make CDI an enjoyable platform to work with on GAE.
Ales Justin will deliver his 'JavaEE on Google App Engine: CDI to the Rescue!' session at JAXconf with even more info on using Java EE specs within GAE's restrictive sandbox, while still benefitting from the scalable environment it provides and maintaining portability to other Java EE containers. JAXconf will run from June 20th-23rd, 2011, in San Jose, California. For more information on the conference, please visit the JAXconf website.