JSR 107

Caching: An introduction to the Caching API (Part 1)

Anatole Tresch

In the first half of his look at the Caching API, Anatole Tresch explains how to access a cache as well as cache events. While the API might be classified as simple and easy to use, there are some elements that could confuse users or even lead to possible issues.

This article gives a short introduction to the topic of caching. It starts with some general background and works its way up to the basics of JSR 107 (Java Caching API, [3]), which is planned to be included in Java EE 8.

General background

A long time ago, application performance issues could often be handled easily. They were solved more or less “automatically” thanks to Moore’s Law [1]. According to the popular reading of it, computing power doubles on average about every 18 months. So all you had to do was drink a coffee, wait a bit, then go to the local hardware shop and replace your hardware. Bingo! Problem solved. Unfortunately, hardware manufacturers have hit the physical limits of speed.

In this regard, Amdahl stated [2] that additional processing power does not add up linearly in your final throughput, as simply summing the power of all your processors would suggest. So more parallel cores per processor, more processors per server and more servers in your computing cluster do not automatically solve your performance issues. The problem is that many software systems written in the past have larger or smaller portions that perform their work in a serialised way, meaning that exactly one task is executed at a time while the others are blocked until the current task has finished. Typical examples are databases with multiple requests operating on the same data. Without serialisation, inconsistencies can occur at runtime (e.g. reading data that has not yet been written). With so-called isolation levels, relational databases and the JDBC standard let you control this to some extent, but in the end it is still one execution thread somehow synchronised with its IO subsystems.
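Amdahl's observation can be made concrete with the usual textbook form of his law: if a fraction p of the work can run in parallel on n processors, the speedup is 1 / ((1 - p) + p / n). The following is a minimal sketch of that arithmetic; the class name, the method name and the value p = 0.9 are assumptions chosen for illustration and do not come from the article.

```java
// Sketch of Amdahl's law: speedup = 1 / ((1 - p) + p / n).
// AmdahlSketch and speedup(...) are illustrative names.
final class AmdahlSketch {

    /**
     * @param p fraction of the work that can run in parallel (0.0 to 1.0)
     * @param n number of processors (at least 1)
     * @return the theoretical speedup compared with a single processor
     */
    static double speedup(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Assumed example: 90% of the work is parallelisable.
        for (int n : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("n=%d speedup=%.2f%n", n, speedup(0.9, n));
        }
    }
}
```

With 10% of the work serialised, the speedup stays below 10 no matter how many processors are added, which is exactly why more hardware alone does not fix a serialised backend.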

So you have to actively do something in your software to benefit from the new processing architecture. Especially when your software was not written with parallelism in mind, you might have to rewrite portions of it. But even rewriting all your code may not help, since your backends (e.g. your databases or remote services) are still not performant enough. You are still stuck with serialisation. This is where caching may help.

But hold on for a moment, is it always useful to add caching features to your applications? My recommendations are:

  • Don’t do it! In many cases there are other design flaws that render your application slow. Try to tackle them first before adding caching.
  • If you really think you want to use it, think twice about whether there are alternative solutions that lead to the required performance.
  • OK, so you are sure caching is the only option to solve your issues. Think about the consequences and about the locations in your code where you want to add caching mechanisms. Think about the type of caching for each functionality that should be cached: local only, clustered or distributed. If you have multiple instances serving client requests, local caches may lead to inconsistent responses, depending on which of your instances answers a request. Shared caching may consume significant resources for synchronising its state. Distributed caching with lots of cache misses may render your system even slower than it was before.
  • That said, another important recommendation is “Measure, don’t trust”. Ensure caching has the performance effects that you expect. If not, don’t do it, since it adds complexity without gains.
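One simple way to follow the “Measure, don’t trust” advice is to count how many requests are really answered from the cache. The sketch below wraps an arbitrary lookup function in a local, JDK-only cache and records requests and backend calls. MeasuredLookup and its methods are hypothetical names for illustration; this is not part of JSR 107, which offers its own statistics support.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical helper: a local cache in front of a backend lookup that
// counts how often the backend still has to be called.
final class MeasuredLookup<K, V> {
    private final Function<K, V> backend;
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong backendCalls = new AtomicLong();

    MeasuredLookup(Function<K, V> backend) {
        this.backend = backend;
    }

    V get(K key) {
        requests.incrementAndGet();
        // The mapping function only runs on a cache miss.
        // Note: a null result from the backend is not cached.
        return cache.computeIfAbsent(key, k -> {
            backendCalls.incrementAndGet();
            return backend.apply(k);
        });
    }

    long requests() {
        return requests.get();
    }

    long backendCalls() {
        return backendCalls.get();
    }

    /** Share of requests served without calling the backend. */
    double hitRatio() {
        long r = requests.get();
        return r == 0 ? 0.0 : (double) (r - backendCalls.get()) / r;
    }

    public static void main(String[] args) {
        MeasuredLookup<Integer, String> lookup = new MeasuredLookup<>(id -> "customer-" + id);
        for (int id : new int[] {1, 1, 2, 1}) {
            lookup.get(id);
        }
        System.out.println("hit ratio: " + lookup.hitRatio());
    }
}
```

If the hit ratio under a realistic load stays low, the cache mostly adds memory use and complexity, which is the situation the recommendation warns about.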

As mentioned earlier there are different flavours of caches:

  • Local Caches are stored only in local memory or on local storage devices and are NOT shared with any other applications.
  • Clustered Caches are aware of the other cache instances in a cluster and support synchronisation with their peer instances. Typically the cache contents are fully mirrored.
  • Distributed Caches distribute the cache contents across the cluster to maximise retrieval and memory efficiency. Data redundancy also prevents (to some extent) cache data loss in case of failures.

In real-world applications all these flavours can co-exist. The performance and consistency requirements, as well as the use cases, determine which caching flavour fits a given functionality best. The good thing is that, from a user’s perspective, all these different flavours of caches can still be accessed using the same API. So let’s have a deeper look at the Java Caching API (JSR 107).

What is a Cache?

In general, a cache is an easily accessible data structure that allows thread-safe access to in-memory data. Given that, a cache can basically be modelled as follows:

public interface Cache<K, V> extends Iterable<Cache.Entry<K, V>>, Closeable {
  String getName();
  V get(K key);
  Map<K, V> getAll(Set<? extends K> keys);
  boolean containsKey(K key);
  void put(K key, V value);
  void putAll(Map<? extends K, ? extends V> map);
  boolean putIfAbsent(K key, V value);
  boolean remove(K key);
  void removeAll(Set<? extends K> keys);
  void clear();
  ...
}

Not accidentally, these signatures look very similar to java.util.Map. The one method shown that Map does not have is getName(), which makes it easy to tell different caches apart (note that JSR 107 defines some restrictions and naming conventions on cache names, not further discussed here). Nevertheless, there are also some subtle differences compared to Map: put, putAll and putIfAbsent do not return any previously stored item. One of the reasons is that with a distributed cache the previous value simply may not be known (or easily accessible), because it is not present locally.
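To make the difference to java.util.Map tangible, here is a minimal, JDK-only sketch of a local cache that follows a subset of the signatures modelled above. SimpleCache is a hypothetical class written for this illustration; it is neither the javax.cache.Cache interface nor a JSR 107 implementation, and it ignores expiry, events and distribution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical local cache mirroring a subset of the modelled signatures.
final class SimpleCache<K, V> {
    private final String name;
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();

    SimpleCache(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    V get(K key) {
        return store.get(key);
    }

    boolean containsKey(K key) {
        return store.containsKey(key);
    }

    // Unlike Map.put, the previous value is deliberately not returned.
    void put(K key, V value) {
        store.put(key, value);
    }

    // Returns true only if the key was absent and the value was stored.
    boolean putIfAbsent(K key, V value) {
        return store.putIfAbsent(key, value) == null;
    }

    // Returns true only if an entry was actually removed.
    boolean remove(K key) {
        return store.remove(key) != null;
    }

    void clear() {
        store.clear();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v1");
        String previous = map.put("k", "v2"); // Map hands back the old value
        System.out.println("Map.put returned: " + previous);

        SimpleCache<String, String> cache = new SimpleCache<>("demo");
        cache.put("k", "v1");
        cache.put("k", "v2"); // nothing is returned here
        System.out.println(cache.getName() + " now holds: " + cache.get("k"));
    }
}
```

Because put returns nothing and putIfAbsent and remove return only a boolean, an implementation never has to ship an old value back to the caller, which matters once the data lives on another node.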

Accessing a Cache

But how can we create and access a cache? The simplest way is to use the Caching singleton by calling:

Cache<String, String> cache = Caching.getCache("myCache",
                                               String.class, String.class);

This will use the default CachingProvider and CacheManager. Nevertheless, for many use cases you will have to use the more complicated parts of the API. For example, if you want to create a new Cache instance and configure it programmatically, you might write the following code:

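import java.util.concurrent.TimeUnit;
import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.expiry.AccessedExpiryPolicy;
import javax.cache.expiry.Duration;

// Obtain the default CacheManager from the default CachingProvider
CacheManager manager = Caching.getCachingProvider().getCacheManager();

MutableConfiguration<String, String> config = new MutableConfiguration<>();
config.setTypes(String.class, String.class)
      .setStoreByValue(false)
      .setStatisticsEnabled(true)
      // entries expire one hour after creation or last access
      .setExpiryPolicyFactory(FactoryBuilder.factoryOf(
         new AccessedExpiryPolicy(new Duration(TimeUnit.HOURS, 1))));

Cache<String, String> cache = manager.createCache("helloCache", config);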

So let’s take a closer look at the CacheManager interface, which basically provides methods to access and create caches:

public interface CacheManager{
   CachingProvider getCachingProvider();
   <K, V, C extends Configuration<K, V>> Cache<K, V> createCache(
                 String name, C configuration)
      throws IllegalArgumentException;

   <K, V> Cache<K, V> getCache(String name,
                               Class<K> keyType, Class<V> valueType);
   <K, V> Cache<K, V> getCache(String name);
   Iterable<String> getCacheNames();

   void destroyCache(String cacheName);
   void enableManagement(String cacheName, boolean enable);
   void enableStatistics(String cacheName, boolean enable);

   URI getURI();
   ClassLoader getClassLoader();
   Properties getProperties();

   void close();
   boolean isClosed();
   <T> T unwrap(Class<T> cacheImplClass);
}

At runtime, multiple CacheManagers can co-exist. CacheManagers can be obtained from a CachingProvider, which is the SPI defined by the JSR. For extended functionality, the manager instance can be unwrapped to one of the concrete implementation types.

Finally, CachingProviders can be accessed from the Caching singleton:

CachingProvider prov = Caching.getCachingProvider();

This will return the default CachingProvider, which can be configured by setting the javax.cache.spi.CachingProvider system property. Alternatively, if more than one CachingProvider is registered, you can select one using:

  • Its fully qualified class name
  • Its ClassLoader
  • Or both

So let’s have a quick look at the Caching singleton:

public final class Caching {

  private Caching() { }

  public static ClassLoader getDefaultClassLoader();
  public static void setDefaultClassLoader(ClassLoader classLoader);
  public static CachingProvider getCachingProvider();
  public static CachingProvider getCachingProvider(
                                       ClassLoader classLoader);
  public static Iterable<CachingProvider> getCachingProviders();
  public static Iterable<CachingProvider> getCachingProviders(
                                       ClassLoader classLoader);
  public static CachingProvider getCachingProvider(
                                       String fullyQualifiedClassName);
  public static CachingProvider getCachingProvider(
                                       String fullyQualifiedClassName,
                                       ClassLoader classLoader);
  public static <K, V> Cache<K, V> getCache(String cacheName,
                                       Class<K> keyType,
                                       Class<V> valueType);
}

Honestly, many of you may say this class looks weird, because clients using the Caching API can either rely only on defaults or must explicitly pass implementation class names to identify the target provider! Also, setting a (possibly) globally shared default ClassLoader may lead to unpredictable side effects within a multi-ClassLoader runtime such as Java EE or OSGi. Additionally, passing Strings representing fully qualified class names to identify CachingProviders is, in my opinion, unnecessary: it might be better to allow providers to have arbitrary unique names, which would make them easier to mock. Unfortunately, all this behaviour of the Caching singleton is hard-coded into the JSR’s API. There is no SPI backing the Caching singleton that would give users control of what is happening inside.

Finally, once you have successfully obtained a CachingProvider, you will have to obtain a CacheManager from it using one of the following methods:

CacheManager getCacheManager(
                        URI uri, ClassLoader cl, Properties properties);
CacheManager getCacheManager(URI uri, ClassLoader cl);
CacheManager getCacheManager();

Here you can rely on the defaults for URI, ClassLoader and Properties, or pass the values matching your use case. The exact behaviour and the valid URIs are vendor-specific.

Cache Events

When operations are performed on a cache, corresponding CacheEntryEvents are emitted, with types such as CREATED, UPDATED, REMOVED or EXPIRED. They can be consumed by implementing any of the four child interfaces of CacheEntryListener.

Such listeners can be registered by passing an instance of CacheEntryListenerConfiguration. This interface also lets you set additional listener properties and an optional CacheEntryEventFilter. A cache listener configuration can be passed when a cache is created, or later by calling Cache.registerCacheEntryListener. For convenience, JSR 107 provides a MutableCacheEntryListenerConfiguration class, which already implements CacheEntryListenerConfiguration.
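The interplay of events, listeners and an optional filter can be illustrated with a small JDK-only model. Everything in it (EventingCacheSketch, Event, EventType, register) is hypothetical and only mimics the idea; it does not use the javax.cache event types, it delivers events synchronously on the calling thread, and it leaves out EXPIRED because the sketch has no expiry.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, JDK-only model of cache entry events; NOT the javax.cache types.
final class EventingCacheSketch {

    enum EventType { CREATED, UPDATED, REMOVED }

    /** Immutable description of one change to a cache entry. */
    static final class Event {
        final EventType type;
        final String key;
        final String value;

        Event(EventType type, String key, String value) {
            this.type = type;
            this.key = key;
            this.value = value;
        }
    }

    /** A listener together with the filter that decides which events it sees. */
    private static final class Registration {
        final Predicate<Event> filter;
        final Consumer<Event> listener;

        Registration(Predicate<Event> filter, Consumer<Event> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final List<Registration> registrations = new CopyOnWriteArrayList<>();

    void register(Predicate<Event> filter, Consumer<Event> listener) {
        registrations.add(new Registration(filter, listener));
    }

    void put(String key, String value) {
        String previous = store.put(key, value);
        EventType type = previous == null ? EventType.CREATED : EventType.UPDATED;
        fire(new Event(type, key, value));
    }

    boolean remove(String key) {
        String previous = store.remove(key);
        if (previous != null) {
            fire(new Event(EventType.REMOVED, key, previous));
        }
        return previous != null;
    }

    // Synchronous delivery: listeners run on the calling thread.
    private void fire(Event event) {
        for (Registration r : registrations) {
            if (r.filter.test(event)) {
                r.listener.accept(event);
            }
        }
    }

    public static void main(String[] args) {
        EventingCacheSketch cache = new EventingCacheSketch();
        cache.register(e -> e.type == EventType.CREATED,
                e -> System.out.println("created: " + e.key));
        cache.register(e -> true,
                e -> System.out.println(e.type + " " + e.key + "=" + e.value));
        cache.put("a", "1");
        cache.put("a", "2");
        cache.remove("a");
    }
}
```

The filter plays the role that the article assigns to CacheEntryEventFilter: it lets a listener subscribe to a subset of events without inspecting every change itself.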

Summary and references

This post has shown the basic parts of the JSR 107 API. The API is relatively simple and easy to use. Nevertheless, there are some aspects that in my opinion may confuse users or even lead to possible issues. But there is still a lot more to uncover, so stay tuned for the next post on this topic.

[1] Moore’s Law: G. E. Moore: Cramming more components onto integrated circuits. In: Electronics. 38, No. 8, 1965, pp. 114–117.

[2] Amdahl’s Law: Gene Amdahl: Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. In: AFIPS Conference Proceedings. 30, 1967, pp. 483–485.

[3] JSR 107: https://jcp.org/en/jsr/detail?id=107

Author

Anatole Tresch

Anatole Tresch studied economics and computer science and spent several years as a managing partner and consultant. He previously worked as a Technical Architect and Coordinator at Credit Suisse. Currently, Anatole is Principal Consultant at Trivadis AG, the specification lead of JSR 354 (Java Money & Currency) and a PPMC member of Apache Tamaya. You can find him on Twitter via @atsticks.

