Summary of the AWS Service Event in the US East Region
...from Amazon <http://aws.amazon.com/message/67457/>
The event was triggered during a large-scale electrical storm that swept through the Northern Virginia area.
Though the resources in this datacenter, including Elastic Compute Cloud (EC2) instances, Elastic Block Store (EBS) storage volumes, Relational Database Service (RDS) instances, and Elastic Load Balancer (ELB) instances, represent a single-digit percentage of the total resources in the US East-1 Region, there was significant impact to many customers. The impact manifested in two forms. The first was the unavailability of instances and volumes running in the affected datacenter. This kind of impact was limited to the affected Availability Zone. Other Availability Zones in the US East-1 Region continued functioning normally. The second form of impact was degradation of service “control planes” which allow customers to take action and create, remove, or change resources across the Region. While control planes aren’t required for the ongoing use of resources, they are particularly useful in outages where customers are trying to react to the loss of resources in one Availability Zone by moving to another.
Systems Affected
Timeline - June 29-30, 2012
Time (PDT) | System | Event
8:04 pm | all | Servers began losing power
8:21 pm | | Amazon status update: We are investigating connectivity issues for a number of instances in the US-EAST-1 Region
9:10 pm | Control plane | Control plane functionality was restored for the Region
10 pm | RDS | A large number of the affected Single-AZ RDS instances had been brought online
11 pm | | The remaining Multi-AZ instances were processed when EBS volume recovery completed for their storage volumes
11:15 pm to just after midnight | EC2 | Instances came back online
2:45 am | EBS | 90% of outstanding volumes had been turned over to customers
Note: Amazon seems strangely ambiguous about the timing of the ELB outage.
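To put the timeline in perspective, it can help to work out how long after the initial power loss each recovery milestone landed. The following is only a rough sketch: the timestamps are copied from the timeline above, the calendar dates are an assumption (evening entries are treated as June 29 and the 2:45 am entry as June 30), and the names `POWER_LOSS`, `MILESTONES`, `elapsed`, and `report` are made up for illustration.

```python
from datetime import datetime

# Times are taken from the timeline above (PDT). Dates are assumed:
# evening entries fall on June 29, 2012 and the 2:45 am entry on June 30.
POWER_LOSS = datetime(2012, 6, 29, 20, 4)

MILESTONES = [
    ("Control plane restored", datetime(2012, 6, 29, 21, 10)),
    ("Many Single-AZ RDS instances online", datetime(2012, 6, 29, 22, 0)),
    ("90% of outstanding EBS volumes returned", datetime(2012, 6, 30, 2, 45)),
]


def elapsed(start, end):
    """Return the gap between two datetimes as an 'XhYYm' string."""
    minutes = int((end - start).total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


def report():
    """Pair each milestone label with its delay since the power loss."""
    return [(label, elapsed(POWER_LOSS, when)) for label, when in MILESTONES]


if __name__ == "__main__":
    for label, gap in report():
        print(f"{gap}  {label}")
```

Under these assumptions, the EBS milestone falls 6 hours 41 minutes after servers began losing power.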
Summary of Amazon control plane issues
Why is this important?
Details:
Some AWS Hosted Companies Affected
More on Netflix
Why did Netflix go out?
What went right for Netflix?
Sources used
Amazon Services Dashboard