An EBS node in an EBS cluster is connected to two networks. The primary network carries the traffic to and from the EBS volumes. The secondary network is used to replicate the EBS volumes on one EBS node to a different EBS node.
Amazon wanted to upgrade the capacity of the primary network. The standard procedure for this is to shift the traffic to a redundant router, but this step was executed incorrectly. As a result, the traffic was routed not onto the primary network but onto the secondary network, which has less capacity.
All this traffic saturated the secondary network, and the EBS volumes became "stuck".
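To make the saturation concrete, here is a minimal sketch of what happens when traffic is shifted onto a network with less capacity. The `Network` class, the `shift_traffic` helper and all the numbers are hypothetical and chosen only for illustration; they are not Amazon's real figures or tooling.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    capacity: float  # arbitrary units, hypothetical value
    load: float = 0.0

    def utilization(self) -> float:
        # A value above 1.0 means more traffic is offered than the network can carry.
        return self.load / self.capacity


def shift_traffic(source: Network, target: Network) -> None:
    """Move all traffic from source onto target (the step that went wrong)."""
    target.load += source.load
    source.load = 0.0


# Hypothetical numbers: the secondary network is much smaller than the primary.
primary = Network("primary", capacity=100.0, load=60.0)
secondary = Network("secondary", capacity=20.0, load=10.0)

shift_traffic(primary, secondary)
print(secondary.utilization())  # well above 1.0: the secondary network is saturated
```

With these made-up values the secondary network is asked to carry several times its capacity, which is the situation in which nodes can no longer reach their replicas.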
Once the traffic was routed correctly again, all the affected EBS volumes tried to re-mirror at the same time. As part of the re-mirroring process, an EBS volume searches the cluster for free space to re-mirror to.
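The effect of many volumes searching for free space at once can be sketched with a toy model. Everything here is an assumption made for illustration: the `remirror_round` function, the idea of "slots" of free space per node, and the linear scan over nodes are not taken from Amazon's implementation.

```python
def remirror_round(free_slots: list[int], volumes_waiting: int) -> tuple[int, int]:
    """Simulate one round of re-mirroring attempts.

    Each waiting volume scans the nodes in order and claims the first free slot.
    Returns (volumes_still_waiting, search_requests_issued).
    """
    requests = 0
    still_waiting = 0
    for _ in range(volumes_waiting):
        placed = False
        for node in range(len(free_slots)):
            requests += 1  # every node that is asked costs one request
            if free_slots[node] > 0:
                free_slots[node] -= 1
                placed = True
                break
        if not placed:
            still_waiting += 1  # no space anywhere: the volume will retry later
    return still_waiting, requests


# Hypothetical cluster: three nodes with 2, 1 and 0 free slots, five volumes waiting.
free_slots = [2, 1, 0]
result = remirror_round(free_slots, volumes_waiting=5)
print(result)  # (2, 10): two volumes found no space, ten search requests were issued
```

In this toy model, once the free space is used up, every remaining volume scans the whole cluster and finds nothing, so the number of search requests grows while no progress is made. Adding capacity is what lets those volumes finish.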
The EBS cluster couldn't handle this load, and new capacity had to be added to the cluster.
Amazon offers affected customers a 10-day credit equal to 100% of their usage of EBS volumes, EC2 instances and RDS database instances.
This credit will be automatically applied to the next bill.