The future of cloud computing
The cloud has changed everything. And yet the cloud revolution at the heart of IT is only getting started. As data becomes more and more important, we’re beginning to realise how central a role the database will play in future.
Cloud computing engines today allow businesses to easily extend their IT infrastructure at any time. This means that you can rent servers with only a few clicks, and various software stacks including web servers, middleware and databases can be installed and run on these server instances with little-to-no effort. With data continuing to accumulate at a rapid pace, the database is becoming a large part of this infrastructure. By leveraging conventional cloud computing, every business can run its own database stack in the cloud the same way as if it were on-premise.
There’s still a huge amount of potential to accelerate speed and efficiency by using a multi-tenant database. For multi-tenant distributed databases, a certain number of servers in a cloud footprint are set aside for managing databases, but these resources are shared by many users. This opens up the possibility of improving the speed and efficiency of IT infrastructure within organizations. A combined database footprint has massive resources and the ability to parallelize a much wider range of requests than users with their own dedicated servers. Such a setup allows faster run times and avoids the painful sizing and provisioning process associated with on-premise infrastructure and traditional cloud computing. So what should businesses look for when selecting a database solution? A multi-tenant database solution is worth considering, given that it can help overcome the following challenges.
I – Failure tolerance of distributed systems
By design, distributed systems with state replication are resistant to most forms of single-machine failure. Guarding against single-machine hardware failures is relatively straightforward: in a distributed database design, every database is hosted on multiple machines that replicate each partition several times. In the case of a server failure, the system routes traffic to healthy replicas, ensuring that the data is still served from elsewhere and availability stays high. However, making distributed systems tolerant of software failures is much more difficult, because such failures share a common cause. The power of distributed systems comes from parallelism, but this also means that the same code is executed on every server participating in fulfilling a request. If working on a particular request causes a fatal failure that degrades the operation of the system or even crashes it, the entire cluster is affected at once.
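As a rough illustration of this failover behaviour, the sketch below (all names hypothetical, not any particular database's API) keeps a map of which nodes hold a replica of each partition and simply routes requests around nodes marked as failed:

```python
class PartitionRouter:
    """Illustrative sketch: each partition is replicated on several nodes;
    when a node fails, traffic is routed to a healthy replica instead."""

    def __init__(self, replica_map):
        # replica_map: partition name -> list of node names holding a copy
        self.replica_map = replica_map
        self.down = set()

    def mark_failed(self, node):
        # Called by whatever failure detector the cluster runs.
        self.down.add(node)

    def route(self, partition):
        # Return the first healthy replica for this partition.
        for node in self.replica_map[partition]:
            if node not in self.down:
                return node
        raise RuntimeError("all replicas of %s unavailable" % partition)
```

With three replicas per partition, losing one machine leaves two healthy copies, so reads and writes continue without interruption.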
Sophisticated methods are necessary to guard against such correlated failures, which may be rare but have devastating effects. One method involves trying each query on a few isolated computational nodes before sending it down to the entire cluster with massive parallelism. If failures are observed in the sandbox, the suspicious request is immediately quarantined and isolated from the rest of the system.
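A minimal sketch of this sandboxing idea, with hypothetical names throughout: each query is first tried on a couple of isolated canary nodes, and is only fanned out to the full cluster if it survives there:

```python
import random

class SandboxedCluster:
    """Illustrative sketch: run each query on a few canary nodes first,
    and quarantine it if it crashes there, before it can take down the
    whole cluster in parallel."""

    def __init__(self, nodes, canary_count=2):
        self.nodes = nodes
        self.canary_count = canary_count
        self.quarantined = set()   # queries observed to cause failures

    def execute(self, query, run_on_node):
        if query in self.quarantined:
            raise RuntimeError("query is quarantined")
        # Phase 1: try the query on a few isolated canary nodes.
        canaries = random.sample(self.nodes, self.canary_count)
        for node in canaries:
            try:
                run_on_node(node, query)
            except Exception:
                # A crash in the sandbox quarantines the query; the rest
                # of the cluster never sees it.
                self.quarantined.add(query)
                raise RuntimeError("query is quarantined")
        # Phase 2: safe so far; fan out to the full cluster in parallel.
        return [run_on_node(node, query) for node in self.nodes]
```

The trade-off is a small extra latency on the canary pass in exchange for containing a fatal bug to a handful of nodes instead of the whole fleet.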
II – Performance guarantees in a multi-tenant environment
Another common problem that often manifests itself in public clouds is the “noisy neighbour” issue. When many users share computational resources, it is important to ensure that they are prioritized and isolated properly, so that sudden changes in the behaviour of one user do not have an adverse impact on another. A common approach for computing engines has been isolation of resources into containers. This gives each user a fixed-size box that it cannot break out of – providing a level of isolation – but it is not flexible enough to give users extra resources exactly when they need them. Effective workload scheduling, low-level resource prioritization and isolation are key techniques for achieving predictable performance.
A multi-tenant database software stack actually provides more opportunities to share and prioritize resources dynamically while providing performance guarantees. This is possible because the database software can manage access to critical resources, such as a CPU core or a spinning disk, through a queue of the requests accessing that resource. The provisioning process ensures that there are enough aggregated resources in the cluster. However, if a user behaves unpredictably, the software stack can control the queues to make sure that only the offender is affected, while other users whose resource-usage patterns are unchanged remain unaffected. Additionally, prioritization within the request queues – choosing which request to serve next – can optimise end users’ latency metrics.
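The queueing idea can be sketched as a per-tenant scheduler sitting in front of a shared resource. The round-robin policy here is one simple choice among many possible prioritization schemes, and all names are illustrative:

```python
from collections import deque

class TenantScheduler:
    """Illustrative sketch of per-tenant request queues in front of a
    shared resource (e.g. a CPU core or a disk). A tenant that floods
    its own queue only lengthens its own backlog; other tenants keep
    getting served at their usual rate."""

    def __init__(self):
        self.queues = {}       # tenant -> deque of pending requests
        self.order = deque()   # round-robin order over tenants

    def submit(self, tenant, request):
        if tenant not in self.queues:
            self.queues[tenant] = deque()
            self.order.append(tenant)
        self.queues[tenant].append(request)

    def next_request(self):
        # Round-robin over tenants: pick the next tenant with pending
        # work, so a noisy tenant cannot starve the others.
        for _ in range(len(self.order)):
            tenant = self.order[0]
            self.order.rotate(-1)
            if self.queues[tenant]:
                return tenant, self.queues[tenant].popleft()
        return None   # all queues drained
```

Even if tenant A submits many more requests than tenant B, B's next request is still served after at most one request from each other tenant.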
III – ACID-compliant transactions: A NoSQL challenge
Another obstacle for massively parallel distributed systems has been consistency guarantees. For NoSQL distributed databases, ensuring transactional consistency and ACID properties has been a real problem. This is because, in a distributed database, many nodes have to be involved in processing a transaction, and it is not obvious how to act in cases of failure. Moreover, the state of the cluster has to be synchronized to ensure consistency, which carries high overheads in a highly distributed environment.
Instead of compromising on performance or consistency, investment needs to be made in making database software scale while preserving consistency. For example, transactional consistency can be managed through the use of a transaction log, which can, in turn, be distributed and replicated for high throughput and durability.
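A toy sketch of the replicated-log idea (hypothetical interface, not any real database's API): a transaction counts as committed only once a majority of replica logs have accepted the append, so the committed order survives the failure of individual replicas:

```python
class ReplicatedLog:
    """Illustrative sketch: transactional consistency via a replicated
    transaction log. A write commits only when a majority of replicas
    have durably appended it; a minority of failed replicas does not
    block progress or lose committed data."""

    def __init__(self, replicas):
        # replicas: per-replica log objects exposing append(txn)
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def commit(self, txn):
        acks = 0
        for log in self.replicas:
            try:
                log.append(txn)      # durable append on that replica
                acks += 1
            except IOError:
                pass                 # replica down; try the others
        if acks < self.majority:
            raise RuntimeError("transaction aborted: no majority")
        return True
```

Real systems order concurrent appends through a consensus protocol rather than a simple loop, but the invariant is the same: a majority of durable copies before acknowledging the commit.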
Distributed databases can serve as a solid foundation for distributed computing that is massively parallel and instantly scalable. In this respect, NoSQL technologies and their community can leverage this trend to contribute to the architecture of a “future computer.” By understanding the benefits of a multi-tenant system and adopting the appropriate solutions, organizations can experience instant scalability and massive parallelism within their own data infrastructures.