Achieving Low Latency in the Public Cloud
As all online business becomes more competitive – within the same industry and in the broader realm of customer experience in web and mobile apps – low latency will continue to be a priority. So as a business, how can you ensure the highest performance from your systems when moving to the cloud?
More businesses are moving their technology stacks to the public cloud, a sign that the common concerns of the past are inhibiting them less. The advantages of the cloud are clear, and solutions are in place to address concerns like security, reduced control, fewer configuration options, and lower performance. Even so, cloud application performance remains a top-of-mind priority for businesses that view low latency as a source of competitive advantage. This stance is consistent with the performance aspirations for on-premises deployments, as businesses continually seek to get the most out of their technologies no matter where they are deployed.
This continued prioritization of performance is not surprising considering our technology-oriented world is moving faster and faster, and consumer expectations are universally becoming more demanding. All digital interfaces are essentially in competition with each other. For example, the experiences customers enjoy with banking apps force makers of health and insurance apps to up their game. And as more users adopt online interactions, businesses are challenged to handle the increased load while improving the experience.
Consider the level at which businesses think about performance, especially around low latency.
Card processing companies are performing all transaction approval calculations, including complex fraud detection algorithms, in less than 10 milliseconds. Mobile banking teams are setting internal objectives to reduce data access latency from an average of 500 milliseconds to a maximum of 50 milliseconds. Businesses are measuring performance with very narrow windows of time, representative of the extreme levels of performance required to address the high responsiveness expectations today.
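The targets above mix two different statistics: an average and a maximum. The following minimal Python sketch, using made-up sample values, shows why that distinction matters when you set a latency objective. The function name and the nearest-rank percentile method are illustrative choices, not something the companies mentioned are known to use.

```python
import math
import statistics

def summarize_latencies(samples_ms):
    """Return the mean, 99th percentile (nearest-rank), and maximum latency."""
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: the smallest value with at least 99% of samples at or below it.
    rank = math.ceil(len(ordered) * 99 / 100)
    return {
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }

# Hypothetical measurements: 98 fast requests and 2 slow outliers.
samples = [10.0] * 98 + [400.0, 900.0]
summary = summarize_latencies(samples)
print(summary)
```

In this made-up sample the mean stays low even though a few requests take hundreds of milliseconds, which is why an objective stated as a maximum (or a high percentile) is much stricter than one stated as an average.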
Ensuring High Performance Once You Move to the Cloud
So as a business, how can you ensure the highest performance from your systems when moving to the cloud? It’s not always easy to translate on-premises performance to public cloud performance, even if you’re familiar with the underlying hardware that runs the cloud instances. You may need to do some testing to better understand your systems’ performance characteristics in the cloud. With that understanding, you can choose the appropriate cloud instance sizes. But that’s only a first step in your quest for cloud performance.
You can continue improving performance with several other techniques. These include application or system tuning, using faster hardware resources, and adding new techniques like in-memory processing. So what’s involved with each of these approaches, and which is the best fit for your business?
Tuning Your Current System
Tuning your system to maximize performance sounds easy enough. You roll up your sleeves and attack the problem with application performance monitoring (APM) and other profiling tools to identify the bottlenecks. Then you make adjustments to your code to minimize those bottlenecks. This is a time-consuming process, though, and only gets you so far. Some bottlenecks have no obvious improvement path. Activities like network or disk accesses are limited by physics, and often the only way to minimize the associated latencies is to reduce the number of those accesses. But this isn’t necessarily easy to do from an application coding standpoint.
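One common way to reduce accesses is to batch them. The sketch below is only an illustration: `CountingStore` is a hypothetical stand-in for a remote data store that counts round trips rather than simulating real network delay, and `get_many` is an assumed batch operation that a given store may or may not offer.

```python
class CountingStore:
    """Hypothetical remote store that counts round trips instead of sleeping."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1          # one network round trip per key
        return self._data.get(key)

    def get_many(self, keys):
        self.round_trips += 1          # one round trip for the whole batch
        return {key: self._data.get(key) for key in keys}

def load_one_by_one(store, keys):
    return {key: store.get(key) for key in keys}

def load_batched(store, keys):
    return store.get_many(keys)

keys = ["user:1", "user:2", "user:3", "user:4", "user:5"]
data = {key: key.upper() for key in keys}

chatty = CountingStore(data)
result_chatty = load_one_by_one(chatty, keys)

batched = CountingStore(data)
result_batched = load_batched(batched, keys)

print(chatty.round_trips, batched.round_trips)
```

Both calls return the same data, but the batched version pays the network cost once instead of once per key.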
Utilize Faster Hardware
Another approach to improving performance is to use faster hardware or simply add more instances. This often improves throughput, but you still won’t reach the optimal level of performance because network and storage media accesses remain the limiting factors. Moreover, for workloads where the data accesses are largely predictable, this approach is excessive: it leads to overprovisioning and therefore higher cost. So, in cases where the data accesses are predictable – that is, where a subset of your data set is accessed most frequently – an in-memory technology might be the right approach to focus on.
Add a Completely New Technology Like In-Memory Processing
The use of in-memory technologies is an increasingly popular way to efficiently add performance in the cloud. These technologies are especially good at managing volatile memory on behalf of applications to speed up data accesses dramatically. While in-memory use has historically been associated with high costs, the decreasing cost of RAM, along with innovations like Intel Optane, is making in-memory performance very affordable today.
Pure cloud-based caches go a long way toward putting in-memory technologies to work, because they speed up access to data. These “cache-as-a-service” deployments run as central clusters used by multiple teams to store the most frequently accessed data. Centrally managed caches are more efficient than having separate development teams build individual caches for their own applications. As an additional benefit, data sets that are common across development teams can be shared within the cache, assuming proper security controls are in place. A multi-tenant architecture also gives you better resource utilization when applications have differing workload requirements: as some applications reduce their load, resources in the cache are freed for other applications.
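To make the idea of keeping only the most frequently accessed data in memory concrete, here is a minimal single-process sketch of a read-through cache with least-recently-used (LRU) eviction. It is not how any particular cache service is implemented; the `loader` callable and the tiny `backing_store` dictionary are hypothetical stand-ins for a slower database.

```python
from collections import OrderedDict

class LRUCache:
    """Small read-through cache that evicts the least recently used entry."""

    def __init__(self, capacity, loader):
        self._capacity = capacity
        self._loader = loader            # hypothetical slow lookup (database, remote API)
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._loader(key)            # fall through to the slow store
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value

    def __contains__(self, key):
        return key in self._items

backing_store = {"a": 1, "b": 2, "c": 3}
cache = LRUCache(capacity=2, loader=backing_store.__getitem__)
for key in ["a", "b", "a", "c"]:
    cache.get(key)

print(cache.hits, cache.misses)
```

With room for only two entries, the repeated read of "a" is served from memory, and loading "c" pushes out "b", the entry that was used least recently.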
Sometimes this deployment is more than a cache and serves as a database where data is stored long-term for ongoing access. The implementation then becomes a database-as-a-service use case. Such deployments are useful for reading and writing state for serverless computing and cloud-based microservices. Since serverless computing frameworks charge for execution time, reducing the overall window by shrinking data access time can help save costs.
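The cost effect can be estimated with simple arithmetic. In the sketch below every number is hypothetical, including the price per GB-second, and the billing model is deliberately simplified: it assumes cost is proportional to duration and memory, and ignores per-request fees and any rounding of billed duration that a real platform may apply.

```python
def billed_ms(compute_ms, data_accesses, access_ms):
    """Duration of one invocation: compute time plus time spent waiting on data."""
    return compute_ms + data_accesses * access_ms

def monthly_cost(invocations, duration_ms, memory_gb, price_per_gb_second):
    """Simplified duration-based cost model (assumption, not a real price list)."""
    return invocations * (duration_ms / 1000) * memory_gb * price_per_gb_second

PRICE_PER_GB_SECOND = 0.00002   # hypothetical price
INVOCATIONS = 10_000_000        # hypothetical monthly volume
MEMORY_GB = 0.5                 # hypothetical function size

# Four data accesses per invocation at 50 ms each versus 1 ms each.
slow_ms = billed_ms(compute_ms=20, data_accesses=4, access_ms=50)   # 220 ms
fast_ms = billed_ms(compute_ms=20, data_accesses=4, access_ms=1)    # 24 ms

slow_cost = monthly_cost(INVOCATIONS, slow_ms, MEMORY_GB, PRICE_PER_GB_SECOND)
fast_cost = monthly_cost(INVOCATIONS, fast_ms, MEMORY_GB, PRICE_PER_GB_SECOND)
print(f"slow: {slow_cost:.2f}  fast: {fast_cost:.2f}")
```

Under these assumptions the bill scales with the duration of each invocation, so when data access dominates that duration, faster access translates almost directly into lower cost.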
In-memory technologies are about more than just caching and data storage, though. While both in-memory databases (IMDBs) and in-memory data grids (IMDGs) are used as the foundation for public cloud caches, IMDGs go further for cloud application performance. Beyond the caching role that IMDBs also fill well, IMDGs offer a full distributed computing environment by supporting a framework for delivering code to the instances where the data resides. This strategy, known as “data locality,” minimizes network latency, since small amounts of code are moved to the data rather than large chunks of data being moved to the code. By leveraging the data locality capabilities of an IMDG, you have a simplified means to run parallelized applications in a distributed manner, which reduces network traffic.
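The difference between moving data and moving code can be illustrated with a small in-process simulation. This is a sketch only: the `Node` class is a hypothetical stand-in for a grid member holding one partition, and counting items "transferred" is a rough proxy for network traffic, not a measurement of any real IMDG.

```python
class Node:
    """Hypothetical grid member holding one partition of the data."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def fetch_all(self):
        return dict(self.entries)      # ships every entry to the caller

    def run(self, task):
        return task(self.entries)      # runs the task next to the data

def total_by_moving_data(nodes):
    """Pull every entry to the client, then aggregate there."""
    total, transferred = 0, 0
    for node in nodes:
        entries = node.fetch_all()
        transferred += len(entries)
        total += sum(entries.values())
    return total, transferred

def total_by_moving_code(nodes):
    """Send a small function to each node and collect one partial result per node."""
    partials = [node.run(lambda entries: sum(entries.values())) for node in nodes]
    return sum(partials), len(partials)

nodes = [
    Node({f"order:{i}": i for i in range(0, 1000)}),
    Node({f"order:{i}": i for i in range(1000, 2000)}),
    Node({f"order:{i}": i for i in range(2000, 3000)}),
]

moved_total, moved_items = total_by_moving_data(nodes)
local_total, local_items = total_by_moving_code(nodes)
print(moved_items, local_items)
```

Both approaches compute the same total, but the second returns one partial result per node instead of every entry, and the per-node work could also run in parallel.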
Network traffic can be reduced further in IMDGs with a capability known as “near-cache.” This feature is like a cache within a cache: when an application accesses data that resides on a different node, the data is copied and cached locally to avoid future network accesses.
Reconsider Where You Are Distributing
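A near-cache can be pictured as a local dictionary placed in front of remote lookups. The sketch below is a simplified illustration, not the API of any IMDG product: `remote_get` is a hypothetical callable standing in for a network fetch, and the manual `invalidate` method is an assumption added to show that local copies need some way to be refreshed when the remote value changes.

```python
class NearCache:
    """Keeps local copies of remotely owned entries to avoid repeat network calls."""

    def __init__(self, remote_get):
        self._remote_get = remote_get   # hypothetical network fetch
        self._local = {}
        self.remote_calls = 0

    def get(self, key):
        if key not in self._local:
            self.remote_calls += 1      # only the first access goes over the network
            self._local[key] = self._remote_get(key)
        return self._local[key]

    def invalidate(self, key):
        self._local.pop(key, None)      # the next get() fetches a fresh copy

remote_data = {"price:EURUSD": 1.08}    # hypothetical entry owned by another node
near = NearCache(remote_data.get)

first = near.get("price:EURUSD")
second = near.get("price:EURUSD")
calls_before_invalidate = near.remote_calls

near.invalidate("price:EURUSD")
third = near.get("price:EURUSD")
print(calls_before_invalidate, near.remote_calls)
```

The trade-off is freshness: a local copy is fast to read but can go stale, so this approach fits data that is read far more often than it changes.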
Another way to reduce latency is with geographic distribution, and since common data is often needed across different sites, an active-active replication capability puts the data closer to end-users. The separate sites are closely synchronized so that a global user base can connect to the nearest site and avoid the unnecessary latency of accessing a distant, central site. The costs of maintaining active-active systems are far less than in the past, making such an implementation more pragmatic today.
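Connecting each user to the nearest site can be as simple as choosing the site with the lowest measured round-trip time. The following sketch assumes such measurements are already available; the site names and timings are invented for illustration, and real deployments often rely on DNS-based or load-balancer routing instead of client-side selection.

```python
def nearest_site(round_trip_ms):
    """Return the site name with the lowest measured round-trip time."""
    return min(round_trip_ms, key=round_trip_ms.get)

# Hypothetical round-trip times, in milliseconds, measured from one client.
measured = {"us-east": 12.0, "eu-west": 95.0, "ap-south": 210.0}
chosen = nearest_site(measured)
print(chosen)
```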
As all online business becomes more competitive – within the same industry and in the broader realm of customer experience in web and mobile apps – low latency will continue to be a priority. If you are responsible for back-end systems that drive key business goals for your company, and in-memory solutions in the cloud are not yet on your radar, now is a good time to investigate. Start with a search on “in-memory processing” and research the solutions that best fit your needs.