What Does AWS Outage Mean for Cloud?

AWS Downtime Sparks Cloud Debate

Jessica Thornsby

Amazon has suffered a disruption in its EC2 service, sparking furious debate in the community.

On Thursday, April 21st at 1:41 AM PDT, Amazon suffered a
disruption in its EC2 service, which took down a string of
websites, including Hootsuite, Reddit, Foursquare, Heroku, Quora,
and more.

Some have theorised that the outage was caused by an “auto-immune disease” in which Amazon’s automated
processes began re-mirroring a large number of EBS (Elastic Block
Store) volumes. This could have significantly degraded EBS and RDS
performance and availability, and affected more than one
availability zone. Amazon have not yet posted a root cause
analysis.

This outage has caused some to question the future of cloud
computing, although the Technical Director of Atalanta Systems,
Stephen Nelson-Smith, has argued that we can benefit
from this outage by focusing on what we can learn from the event
and becoming better prepared for future disruptions. He proposes the
following:

  1. Expect, and prepare for, downtime. Nelson-Smith advises Amazon
    Web Services (AWS) users to make use of autoscaling groups, to
    deploy in more than two availability zones, and include headroom
    for load spikes. He also suggests having a written plan to follow
    in case of downtime.
  2. Think about how you use EBS. He warns against expecting EBS to
    behave in the same way as a NetApp, and expecting to use EBS
    effectively if the network is saturated. He also points out that
    EBS snapshots should not be used as a backup.
  3. Consider working towards a vendor-neutral architecture.
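
To make the first recommendation concrete, the arithmetic behind "multiple availability zones plus headroom" can be sketched roughly as follows. This is only an illustration, not Nelson-Smith's own method: the function name, the 25% headroom default and the load figures are all assumptions, and it models the loss of a single zone only.

```python
import math

def instances_per_zone(peak_load, capacity_per_instance, zones, headroom=0.25):
    """Instances to run in each zone so that the remaining zones can still
    carry peak load plus headroom after one zone is lost.

    All parameters are hypothetical inputs chosen for illustration.
    """
    if zones < 2:
        raise ValueError("need at least two zones to survive losing one")
    # Load to plan for: the expected peak plus a safety margin for spikes.
    required = peak_load * (1 + headroom)
    # Total instances needed to serve that load.
    total_needed = math.ceil(required / capacity_per_instance)
    # Spread that total over the zones that remain after one fails.
    return math.ceil(total_needed / (zones - 1))

# Example: 9,000 requests/s at peak, 500 requests/s per instance, three zones.
print(instances_per_zone(peak_load=9000, capacity_per_instance=500, zones=3))  # 12
```

With these made-up numbers, each of the three zones would run 12 instances, so any two of them could absorb the planned load on their own.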

“Outages are part of life – get used to it,” he summarises,
urging the community not to shy away from cloud computing following
the Amazon outage:

“One, albeit major, outage in one region of one cloud vendor
doesn’t mean the cloud was a big con, a waste of time, a marketing
person’s wet dream. The emperor isn’t naked, and the nay-sayers are
simply enjoying their day of ‘I told you so’. The cloud is here to
stay, and brings with it huge benefits to the IT industry. However,
it does require a different approach to building systems. The cloud
is not dead – it’s still great.”

George Reese takes this argument one step
further, and claims that the outage exposes the strong points of
cloud computing: namely, that it puts the developer in control of
application availability. He states that the outage wasn’t Amazon’s
fault; those whose systems failed either deemed an outage an
acceptable risk, or failed to design for Amazon’s cloud computing
model. Reese highlights Netflix as one AWS customer who managed to
keep going throughout the outage. “Try doing that in your private
IT infrastructure with the complete loss of a data center,” he
says.

FIRMS, from Software-as-a-Service development company Lecere, also
reportedly managed to keep going throughout the outage, by
diverting its cloud-based system to Amazon Web Services’ west coast
service.

However, others, such as Klint Finley, place the blame on AWS, and not on
customers using EBS: “AWS has been offering the EBS service since
2008. It’s not considered a ‘beta’ product. Why shouldn’t customers
be able to rely on it?” he argues.

Amazon have yet to post a statement regarding the
disruption.
