A la node

Processing on the Grid


Why do traditional cache semantics sometimes struggles to scale, and what can we do about it?

If you ever have the luxury of designing a brand new Java
application there are many, new, exciting and unfamiliar
technologies to choose from. All the flavours of NoSQL stores; Data
Grids; PAAS and IAAS; JEE7; REST; WebSockets; an alphabet soup of
opportunity combined with many programming frameworks both on the
server side and client side adds up to a tyranny of

However, if like me, you have to architect large scale,
server-side, Java applications that support many thousands of users
then there are a number of requirements that remain constant. The
application you design must be high-performance, highly available,
scalable and reliable.

It doesn’t matter how fancy your new lovingly crafted Javascript
Web2.0 user interface is, if it is slow or simply not available
nobody is going to use it. In this article (originally published in
JAX Magazine) I will
try and demystify one of your choices, the Java Data Grid and show
how this technology can meet those constant non-functional
requirements while at the same time taking advantage of the latest
trends in hardware.

Latency: The Performance

When building large scale Java applications the most likely
cause of performance problems in your application is latency.
Latency is defined as the time delay between requesting an
operation, like retrieving some data to process, and the operation
occurring. Typical causes of latency in a distributed Java
application are;

  • IO latency pulling data from disk

  • IO latency pulling data across the network

  • Resource contention for example a distributed lock

  • Garbage Collection pauses

For example typical ping times across a network range from; 57µs
on a local machine; 300 µs on a local LAN segment through to 100ms
from London to New York. When these ping times are combined with
typical network data transfer rates; 25MB – 30MB/s for 1Gb
Ethernet; 250MB/s – 350MB/s for 10Gb Ethernet a careful trade-off
between operation frequency and data granularity must be made to
achieve acceptable performance. Ie. if you have 100MB of data to
process the decision between making 100 calls across the network
each retrieving 1MB, or 1 call retrieving the full 100MB will
depend on the network topology. Network latency is normally the
cause of the developer cry, “It was fast on my machine!”

Latency due to disk IO is also a problem, a typical SSD when
combined with a SATA 3.0 interface can only deliver data at a
sustained data rate of 500-600MB/s so if you have Gigabytes of data
to process disk latency will impact your application

The hardware component with the lowest latency is memory,
typical main memory bandwidth, ignoring Cache hits, is around 3-5
GB/s and scales with the number of CPUs. If you have 2 processors
you will get 10GB/s and with 4 CPUs 20GB/s etc. John McCalpin at
Virginia maintains a memory benchmark called STREAM (http://www.cs.virginia.edu/stream/)
which measures the memory throughput of many computers with some
achieving TB/s with large numbers of CPUs.

In conclusion:

Memory is FAST: And therefore, for
high performance, you should process data in memory.

Network is SLOW: Therefore for
high performance minimise network data transfer.

The question then becomes is it feasible to process many
Gigabytes of data in memory? With the costs of memory dropping it
is now possible to buy single servers with 1TB of memory for only a
few £30K – £40K and the latest SPARC servers are shipping
supporting 32TB of RAM so Big Memory is here.

The other fundamental shift in hardware at the moment is the
processing power of single hardware threads is starting to reach a
plateau with manufactures moving more into providing CPUs with many
cores and many hardware threads. This trend forces us to design our
Java applications in a fashion that can utilise the large number of
hardware threads appearing in modern chips.

Parallel is the Future: For
maximum performance and scalability you must support many hardware


Figure 1: Java
Data Grids

The key benefits of the partitioned key space in a Data Grid
when compared to fully replicated clustered Cache are that the more
JVMs you add the more data you can store and access times for
individual keys are independent of the number of JVMs in the

For example, if we have 20 JVM nodes in our Grid each with 4GB
of free heap available for the storage of objects then we can
store, when taking into account duplicates, 40GB of Java objects.
If we add a further 20 JVM nodes then we can store 80GB. Access
times are constant to read/write objects as the grid will go
directly to the JVM which owns the primary key space for the object
we require.

JSR 107 defines a standards based API to data grids which is
very similar to the java.util.Map API as shown

Listing 1

   public static void main( String[] args )
        CacheManager CacheManager = Caching.getCachingProvider().getCacheManager();
        MutableConfiguration<String, String> config = new MutableConfiguration<String, String>();
        Cache Cache = CacheManager.getCache("C2B2");
        Cache.put("Key", "Value");

Many Data Grids also make use of Java NIO to store Java objects
“off heap” in Java NIO buffers. This has the advantage that we can
increase the memory available for storage without increasing the
latency from garbage collection pause times.

Parallel Processing on the

The problem arises when we store many 10s of GB of Java objects
across the Grid in many JVMs and then want to run some processing
across the data set. For example, we may store objects representing
hotels and their availability on dates. What happens when we want
to run a query like “find all the hotels in Paris with availability
on Valentines day 2015”? If we follow the simple Map API approach
we would need to run code like that shown in Listing 2.

Listing 2

  public static void main( String[] args )
        CacheManager CacheManager = Caching.getCachingProvider().getCacheManager();
        MutableConfiguration<String, String> config = new MutableConfiguration<String, String>();
        Cache hotelCache = CacheManager.getCache("ParisHotels");
Date valentinesDay = new Date(2015,2,14); // I know it is deprecated
for (String hotelName : hotelNames ) {
Hotel hotel = (Hotel)hotelCache.get(hotelName);
if (hotel.isAvailable(valentinesDay)){
System.out.println(“Hotel is available” + hotel);

However the problem with this approach, when accessing a Data
Grid, is that the objects are distributed according to their keys
across a large number of JVMs and every “get” call needs to
serialize the object over the network to the requesting JVM. Using
the listing above this could pull 10s of GB of data over the
network which as we saw earlier is slow.

Thankfully most Java Data Grid products allow you to turn the
processing on its head and instead of pulling the data over to the
code to process they send the code to each of the Grid JVMs hosting
the data and execute it in parallel in the local JVMs. As typically
the code is very small in size only a few KB of data needs to be
sent across the network.

Processing is run in parallel across all the JVMs making use of
all the CPU cores in parallel. Example code, which runs the Paris
query across the Grid, for Oracle Coherence, a popular Data Grid
product is shown in Listing 3 and 4. Listing 3 shows the code for a
Coherence EntryProcessor which is the code that will be serialized
across all the nodes in the data grid.

This EntryProcessor will check each hotel as before to see if
there is availability for Valentine’s day but unlike in Listing 2
it will do so in each JVM on local in-memory data. JSR107 also has
the concept of an EntryProcessor so the approach is common to all
Data Grid products.

Listing 3

Public class HotelSearch implements EntryProcessor {

HotelSearch(Date availability) {
this.availability = availability;

Map processAll(Set hotels) {
Map mapResults = new ListMap();
for (Entry entry : hotels) {
Hotel hotel = (Hotel)entry.getValue();
if (hotel.isAvailable(this.availability)) {




Listing 4 shows the Oracle Coherence code needed
to send this processor across the Data Grid to execute in parallel
in all the grid JVMs.

Listing 4

   public static void main( String[] args )
        NamedCache hotelCache = CacheFactory.getCache("ParisHotels");
Date valentinesDay = new Date(2015,2,14); // I know it is deprecated
Map results = hotelCache.processAll((Filter)null, new HotelSearch(valentinesDay);

Processing data using EntryProcessors as shown in Listings 3 and
4 will result in much greater performance on a Data Grid than
access via the simple Cache API. As only a small amount of data
will be sent across the network and all CPU cores across all the
JVMs will be used to process the search.

Fast Data: Parallel Processing on the

As we’ve seen, using a Data Grid in your next application will
enable you to store large volumes of Java objects in memory for
high performance access in a highly available fashion. This will
also give you large scale parallel processing capabilities that
utilise all the CPU cores in the Grid to crunch through processing
Java objects in parallel. Take a look at Data Grids next time you
have a latency problem or you have the luxury of designing a brand
new Java application.

Steve Millidge is the director and founder of C2B2 Consulting Limited, he has used Java extensively since pre1.0 and has been a field based professional service consultant for over 10 years. Through C2B2 he now focuses on the configuration of JEE and SOA infrastructure for maximum Scalability, Performance, Availability, Recoverability, Manageability and Security. Having worked for and on behalf of Oracle, BEA and Red Hat professional services he has extensive experience of deploying large scale production systems. Steve has spoken at a number of events including Java One, Jax London, UK Oracle User Group Conference, The Server SOA, Cloud & Service Technology Symposium, JBoss World; he is the main organiser of the London JBoss User Group and regularly presents brown bag technical sessions for C2B2's customer base.
comments powered by Disqus