AWS is down due to an electrical storm in the US

[+] zacharyvoase|13 years ago|reply

By what stretch of the imagination is this icon suitable for representing a total loss of availability due to a power outage? http://status.aws.amazon.com/images/status2.gif

Is this not a 'service disruption' situation? At the bottom of the page, the yellow icon is associated with 'performance issues'.

If there's one thing that's shocked me about AWS, it's the total failure to acknowledge the severity of service disruptions. Like the above case, or the fact that a 3-hour loss of connectivity is displayed on the service history as a green tick with a small 'i' box: http://oi46.tinypic.com/x5qtch.jpg

[+] flyt|13 years ago|reply

Or that there is absolutely no way to deep link to an ongoing outage, and users must reload, then expand the link every single time, or subscribe to an RSS feed.

AWS needs to blatantly copy Heroku's status system, which is worlds better for people needing fast updates on their infrastructure.

https://status.heroku.com/ vs http://status.aws.amazon.com/

[+] jdub|13 years ago|reply

They're reporting a power issue with a single US East availability zone. There are four EC2 availability zones in US East. Strange that they would cite orange as "performance issues", but it's certainly more appropriate than suggesting a complete service disruption.

[+] hristov|13 years ago|reply

It is obvious we are dealing with the imagination of a marketing exec here. And that is a sick cynical place.

[+] datasage|13 years ago|reply

They do have a red icon if an event impacts an entire region. If a customer is correctly utilizing multiple availability zones, a failure in one zone should only impact the customer until they can fail over (Should be within minutes if they are automated).

[+] astrodust|13 years ago|reply

Clearly Amazon hired Baghdad Bob as their PR guy when he was looking for a new gig.

Apart from Apple's legendary secrecy, Amazon's EC2 is a solid #2 in terms of impenetrability.

[+] underwater|13 years ago|reply

It's a terrible choice, but seems inspired by the triangle shape used for warning signs.

[+] rwl4|13 years ago|reply

Hey! At least Amazon.com is up!

[+] zvrba|13 years ago|reply

An upwards triangle is used as a danger/warning traffic sign.

[+] 1SaltwaterC|13 years ago|reply

We had 0 downtime. The only thing that's screwed up is a read replica of a multi-AZ MySQL on RDS deployment. Amazon did not send any notification. Kinda annoying.

[+] paulsutter|13 years ago|reply

AWS is not down. Only US-East. If your app is down, it's only because you don't care about the availability of your service.

It's pointless to complain. We've all seen before that Amazon can't keep whole regions up. If you rely on a region being up, you will have downtime and it's your fault.

[+] haberman|13 years ago|reply

According to the AWS status page, only one availability zone within US-East is down, not the whole US-East region. Running a highly-available service exclusively from US-East is a reasonable strategy as long as you're spread across multiple availability zones.

I'm not an AWS customer, just reading their docs; please correct me if I'm wrong about any of this.

[+] beedogs|13 years ago|reply

> If your app is down, it's only because you don't care about the availability of your service.

That is absolutely absurd. At what point did the common-sense solution to "unacceptable downtime on AWS" become "buy two of everything"?

[+] oconnore|13 years ago|reply

Robust systems aren't hip. Get back to work and ship, ship, ship.

[+] sehugg|13 years ago|reply

Be careful. Nothing is as it seems right now. Do not trust any API output, nor should you do any API operations that are non-recoverable. Things are up that are reported down and vice-versa.

Wait for the dust to settle. We're all just going to be a bunch of Fonzies here.

EDIT: Looks like API access has been restored, so I'm cautiously optimistic about things working. Note though that some instances may have rebooted or be otherwise impacted so check your error logs.

EDIT2: Nope, ELB is still hosed. Continue to be skeptical.

[+] adrianpike|13 years ago|reply

There's another comment thread going over at http://news.ycombinator.com/item?id=4180339, if, like me, you got extremely lucky and picked today's lucky availability zones, and have time to read HN instead of scrambling to get things back up.

Good luck, friends.

[+] philip1209|13 years ago|reply

I commented along the same lines during the last AWS/Heroku outage, but Rackspace is still giving me amazing value and uptime, and every time I try to move away (as I did this week with my latest project, on Heroku) I get hit with a massive service disruption that pushes me back to Rackspace.

[+] shawnps|13 years ago|reply

Hi, I work at Rackspace. If you don't mind me asking, what makes you initially want to move away?

[+] bconway|13 years ago|reply

If you're interested in information on the storms themselves and the destruction they caused in West Virginia, there's good coverage here: http://www.foxnews.com/weather/2012/06/30/state-emergency-de...

[+] morsch|13 years ago|reply

Gov. Earl Ray Tomblin in a statement: With temperatures near 100 degrees expected this weekend, it's critical that we get people's power back on as soon as possible.

So let me get this straight: the critical issue with not having electricity after a huge storm is that the A/C isn't working? And 100F/38C isn't even that hot, right?

[+] RegEx|13 years ago|reply

The status page seems to really underplay the severity of the situation. Netflix and Heroku are down, yet these are just side effects of 'performance issues' instead of a 'service disruption'. I wonder what it would take to cross that threshold.

[+] adrianpike|13 years ago|reply

AWS has historically been both slow to update and heavily optimistic with their status page.

When I got the frantic texts when EC2 first dropped offline, sure enough, the AWS status page was all green, but twitter was alight with people talking about it.

I suspect a service disruption would have to be Godzilla.

[+] 16s|13 years ago|reply

Must be the same storm that took several of my trees down (east coast Virginia USA) last night. It was a violent storm. 90 MPH winds. Made 80 foot tall oaks bend like straws and they were almost touching the ground. I spent the morning running the chainsaw just to clear the downed trees from the driveway.

AEP (local power company) says about 65% of customers in this area are w/o power. May be days before it's fully restored. Hope no one from the HN community got hurt.

Edit: I posted this from a computer in town. No power at my place so I can't respond to follow-up posts.

[+] kryptiskt|13 years ago|reply

According to Colin Percival on Twitter[1][2], the US East-1 AZ has more IP addresses, and thus probably other resources, than the rest of AWS put together. It casts comments about "limited to one availability zone" into some relief.

[1] https://twitter.com/cperciva/status/219067641023840257 [2] https://twitter.com/cperciva/status/219067963356098561

[+] rabbitfang|13 years ago|reply

> the US East-1 AZ

us-east-1 is a region, containing multiple AZ's.

[+] codex|13 years ago|reply

Pardon my rant, but I am frustrated. It seems there is always an excuse with Amazon cloud. Is Google similarly disabled?

[+] kroo|13 years ago|reply

Nope.

[+] batista|13 years ago|reply

If you mean GAE, it's even worse...

[+] streeter|13 years ago|reply

You can use the EC2 API and ec2-describe-availability-zones to find out which availability zone is having issues: http://alestic.com/2012/06/ec2-outage-availability-zone
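
For anyone scripting this instead of running the command line tool, here is a minimal Python sketch of the same check. It assumes the response has the shape returned by boto3's `describe_availability_zones` (a list under `AvailabilityZones` with `ZoneName`, `State` and optional `Messages`). The helper names `zones_with_issues` and `fetch_zones` and the sample data are made up for illustration, and during an outage the API output itself may not be trustworthy.

```python
def zones_with_issues(response):
    """Return (zone_name, state, messages) for every zone not reported as 'available'."""
    flagged = []
    for zone in response.get("AvailabilityZones", []):
        if zone.get("State") != "available":
            messages = [m.get("Message", "") for m in zone.get("Messages", [])]
            flagged.append((zone["ZoneName"], zone["State"], messages))
    return flagged


def fetch_zones(region="us-east-1"):
    """Live lookup. Needs the third-party boto3 package and AWS credentials."""
    import boto3  # imported here so the rest of the sketch runs without it
    return boto3.client("ec2", region_name=region).describe_availability_zones()


# Hypothetical sample response, not real outage data.
sample = {
    "AvailabilityZones": [
        {"ZoneName": "us-east-1a", "State": "available", "Messages": []},
        {"ZoneName": "us-east-1b", "State": "impaired",
         "Messages": [{"Message": "power issue (made-up message)"}]},
    ]
}

print(zones_with_issues(sample))
```

Keep in mind that zone names are mapped per account, so the zone one account sees as impaired may carry a different letter in someone else's account.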

[+] lsb|13 years ago|reply

Interestingly, this is a great time to see which of your favorite websites are rock-solid and which are kind of shaky.

I've been thinking about building a site with a Parse backend, and they're up, which is good to discover.

[+] jared314|13 years ago|reply

It's like looking for a house in the rain, so you can see where the water drains.

[+] myearwood|13 years ago|reply

[deleted]

[+] dakrisht|13 years ago|reply

Is this the same EC2 zone that went out just 3-4 days ago??

This is the second, or I believe third, power outage/loss of service for AWS in the past 10 days, if I'm not mistaken.

This is wild. I wonder what's going on at Amazon and if they're capable of handling this much usage in addition to having power issues, etc.

Instagram and Netflix servers are down from what I hear and have been down for a few hours. Now it makes sense that they're being hosted on AWS.

[+] hendler|13 years ago|reply

If you had a load balancer spread across availability zones (not regions), you'd still be up. US-EAST didn't all go down, just one AZ.

[+] genwin|13 years ago|reply

But many people are saying that despite paying for multi-AZ for RDS, they were still down. Do you think they didn't also load-balance across AZs for their webservers?

[+] gee_totes|13 years ago|reply

Do we know this is due to an electrical storm? Today had a leap second as well (The minute of midnight, June 30th lasted a second longer than normal).

[+] dfc|13 years ago|reply

The leap second has not happened yet[1]:

                                   UTC TIME STEP
                            on the 1st of July 2012
                      
  A positive leap second will be introduced at the end of June 2012.
  The sequence of dates of the UTC second markers will be:		
		
                          2012 June 30,     23h 59m 59s
                          2012 June 30,     23h 59m 60s
                          2012 July  1,      0h  0m  0s

[1] http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.43
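
To put those UTC markers into US local time, here is a rough Python sketch. The fixed offsets for PDT (UTC-7) and EDT (UTC-4) are my assumption for summer 2012, and the helper names are invented for this example. It also shows that Python's `datetime` cannot represent the 23h 59m 60s marker at all.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumed fixed offsets for US daylight time (no tz database lookup).
ZONES = {
    "PDT": timezone(timedelta(hours=-7), "PDT"),
    "EDT": timezone(timedelta(hours=-4), "EDT"),
}

# Last ordinary second before the leap second, per the bulletin.
before_leap = datetime(2012, 6, 30, 23, 59, 59, tzinfo=UTC)


def local_times(moment):
    """Format a UTC moment in each of the assumed local zones."""
    return {
        name: moment.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
        for name, tz in ZONES.items()
    }


def can_represent_leap_second():
    """datetime only accepts seconds 0..59, so 23:59:60 raises ValueError."""
    try:
        datetime(2012, 6, 30, 23, 59, 60, tzinfo=UTC)
    except ValueError:
        return False
    return True


print(local_times(before_leap))      # the second just before the leap second
print(can_represent_leap_second())   # False
```

So the leap second lands in the late afternoon Pacific time and the early evening Eastern time on June 30.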

[+] dangrossman|13 years ago|reply

This is the kind of weather conditions that spawn very electrically active storms. I don't doubt they could cause the issues. Last night was probably the most electrically active storm I've ever seen up here -- virtually non-stop lightning strikes for an hour or two, and there's another just like it over Virginia right now.

http://i.imgur.com/d5pEP.png

[+] jaredbeck|13 years ago|reply

8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. Amazon Elastic Compute Cloud (N. Virginia)

[+] molecule|13 years ago|reply

reading the linked page, "AWS is down" means "some N. Virginia AWS services are down"

[+] Aloisius|13 years ago|reply

Just pay the extra money and get off US-East people.

[+] rmc|13 years ago|reply

EU-West had similar levels of outage last year due to a lightning strike. Twas out for several hours and took a few days for everything to be back to normal.

[+] maybird|13 years ago|reply

Acts of God can happen anywhere.

[+] danryan|13 years ago|reply

us-west-2 is the same price as us-east-1. Price is no excuse anymore.

[+] dustingetz|13 years ago|reply

how is it that amazon.com itself is never, ever, impacted?

edit: so basically, the businesses suffering outages (heroku, netflix, etc) don't value uptime to the same extent that amazon does. they got what they paid for.

[+] sofuture|13 years ago|reply

Amazon.com does not run on the same EC2 that you and I use. It runs on a nearly identical system that is isolated and private to Amazon. I wouldn't be surprised if they were in entirely different physical locations.

[+] dkulchenko|13 years ago|reply

This outage only affected us-east-1. Considering an Amazon.com outage would cost them $51k in lost sales every minute, I seriously doubt they put all their eggs (servers, that is) in one basket.

[+] marcuspovey|13 years ago|reply

Cloud taken out by a cloud.

[+] suninwinter|13 years ago|reply

It looks like this is affecting iTunes Match, possibly. I have two tracks just sitting there, waiting to upload and running lsof -i shows iTunes with a connection to an AWS machine.

163 comments