AWS is down due to an electrical storm in the US

239 points| aritraghosh007 | 13 years ago |status.aws.amazon.com | reply

163 comments

[+] zacharyvoase|13 years ago|reply
By what stretch of the imagination is this icon suitable for representing a total loss of availability due to a power outage?: http://status.aws.amazon.com/images/status2.gif

Is this not a 'service disruption' situation? At the bottom of the page, the yellow icon is associated with 'performance issues'.

If there's one thing that's shocked me about AWS, it's the total failure to acknowledge the severity of service disruptions. Like the above case, or the fact that a 3-hour loss of connectivity is displayed on the service history as a green tick with a small 'i' box: http://oi46.tinypic.com/x5qtch.jpg

[+] flyt|13 years ago|reply
Or that there is absolutely no way to deep link to an ongoing outage, and users must reload, then expand the link every single time, or subscribe to an RSS feed.

AWS needs to blatantly copy Heroku's status system, which is worlds better for people needing fast updates on their infrastructure.

https://status.heroku.com/ vs http://status.aws.amazon.com/

[+] jdub|13 years ago|reply
They're reporting a power issue with a single US East availability zone. There are four EC2 availability zones in US East. Strange that they would cite orange as "performance issues", but it's certainly more appropriate than suggesting a complete service disruption.
[+] hristov|13 years ago|reply
It is obvious we are dealing with the imagination of a marketing exec here. And that is a sick cynical place.
[+] datasage|13 years ago|reply
They do have a red icon if an event impacts an entire region. If a customer is correctly utilizing multiple availability zones, a failure in one zone should only impact the customer until they can fail over (Should be within minutes if they are automated).
[+] astrodust|13 years ago|reply
Clearly Amazon hired Baghdad Bob as their PR guy when he was looking for a new gig.

Apart from Apple's legendary secrecy, Amazon's EC2 is a solid #2 in terms of impenetrability.

[+] underwater|13 years ago|reply
It's a terrible choice, but seems inspired by the triangle shape used for warning signs.
[+] rwl4|13 years ago|reply
Hey! At least Amazon.com is up!
[+] zvrba|13 years ago|reply
An upwards triangle is used as a danger/warning traffic sign.
[+] 1SaltwaterC|13 years ago|reply
We had 0 downtime. The only thing that's screwed up is a read replica of a multi-AZ MySQL on RDS deployment. Amazon did not send any notification. Kinda annoying.
[+] paulsutter|13 years ago|reply
AWS is not down. Only US-East. If your app is down, it's only because you don't care about the availability of your service.

It's pointless to complain. We've all seen before that Amazon can't keep whole regions up. If you rely on a region being up, you will have downtime and it's your fault.

[+] haberman|13 years ago|reply
According to the AWS status page, only one availability zone within US-East is down, not the whole US-East region. Running a highly-available service exclusively from US-East is a reasonable strategy as long as you're spread across multiple availability zones.

I'm not an AWS customer, just reading their docs; please correct me if I'm wrong about any of this.

[+] beedogs|13 years ago|reply
> If your app is down, it's only because you don't care about the availability of your service.

That is absolutely absurd. At what point did the common-sense solution to "unacceptable downtime on AWS" become "buy two of everything"?

[+] oconnore|13 years ago|reply
Robust systems aren't hip. Get back to work and ship, ship, ship.
[+] sehugg|13 years ago|reply
Be careful. Nothing is as it seems right now. Do not trust any API output, nor should you do any API operations that are non-recoverable. Things are up that are reported down and vice-versa.

Wait for the dust to settle. We're all just going to be a bunch of Fonzies here.

EDIT: Looks like API access has been restored, so I'm cautiously optimistic about things working. Note though that some instances may have rebooted or be otherwise impacted so check your error logs.

EDIT2: Nope, ELB is still hosed. Continue to be skeptical.

[+] adrianpike|13 years ago|reply
There's another comment thread going over at http://news.ycombinator.com/item?id=4180339 if, like me, you got extremely lucky and picked today's lucky availability zones, and have time to read HN instead of scrambling to get things back up.

Good luck, friends.

[+] philip1209|13 years ago|reply
I commented along the same lines during the last AWS/Heroku outage, but Rackspace is still giving me amazing value and uptime, and every time I try to move away (as I did this week with my latest project, on Heroku) I get hit with a massive service disruption that pushes me back to Rackspace.
[+] shawnps|13 years ago|reply
Hi, I work at Rackspace. If you don't mind me asking, what makes you initially want to move away?
[+] bconway|13 years ago|reply
If you're interested in information on the storms themselves and the destruction they caused in West Virginia, there's good coverage here: http://www.foxnews.com/weather/2012/06/30/state-emergency-de...
[+] morsch|13 years ago|reply
Gov. Earl Ray Tomblin in a statement: With temperatures near 100 degrees expected this weekend, it's critical that we get people's power back on as soon as possible.

So let me get this straight: the critical issue with not having electricity after a huge storm is that the A/C isn't working? And 100F/38C isn't even that hot, right?

[+] RegEx|13 years ago|reply
The status page seems to really underplay the severity of the situation. Netflix and Heroku are down, yet these are just side effects of 'performance issues' instead of a 'service disruption'. I wonder what it would take to cross that threshold.
[+] adrianpike|13 years ago|reply
AWS has historically been both slow to update and heavily optimistic with their status page.

When I got the frantic texts when EC2 first dropped offline, sure enough, the AWS status page was all green, but Twitter was alight with people talking about it.

I suspect a service disruption would have to be Godzilla.

[+] 16s|13 years ago|reply
Must be the same storm that took several of my trees down (east coast Virginia USA) last night. It was a violent storm. 90 MPH winds. Made 80 foot tall oaks bend like straws and they were almost touching the ground. I spent the morning running the chainsaw just to clear the downed trees from the driveway.

AEP (local power company) says about 65% of customers in this area are w/o power. May be days before it's fully restored. Hope no one from the HN community got hurt.

Edit: I posted this from a computer in town. No power at my place so I can't respond to follow-up posts.

[+] codex|13 years ago|reply
Pardon my rant, but I am frustrated. It seems there is always an excuse with Amazon cloud. Is Google similarly disabled?
[+] kroo|13 years ago|reply
Nope.
[+] batista|13 years ago|reply
If you mean GAE, it's even worse...
[+] lsb|13 years ago|reply
Interestingly, this is a great time to see which of your favorite websites are rock-solid and which are kind of shaky.

I've been thinking about building a site with a Parse backend, and they're up, which is good to discover.

[+] jared314|13 years ago|reply
It's like looking for a house in the rain, so you can see where the water drains.
[+] dakrisht|13 years ago|reply
Is this the same EC2 zone that went out just 3-4 days ago??

Second or, I believe, third power outage/loss of service for AWS in the past 10 days, if I'm not mistaken.

This is wild. I wonder what's going on at Amazon and if they're capable of handling this much usage in addition to having power issues, etc.

Instagram and Netflix servers are down from what I hear and have been down for a few hours. Now it makes sense that they're being hosted on AWS.

[+] hendler|13 years ago|reply
If you have a load balancer and you balanced across availability zones (not regions), you'd still be up. US-EAST didn't all go down, just one AZ.
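To make the idea concrete, here is a toy sketch of balancing across availability zones. It is not how ELB works internally; the class name, zone names, and host names are all made up for illustration, and marking a zone down stands in for whatever health checking a real setup would use.

```python
from itertools import cycle


class ZoneAwareBalancer:
    """Toy round-robin balancer that skips backends in zones marked down."""

    def __init__(self, backends):
        # backends: list of (zone, host) pairs; all names are hypothetical.
        self._backends = list(backends)
        self._down_zones = set()
        self._ring = cycle(self._backends)

    def mark_zone_down(self, zone):
        self._down_zones.add(zone)

    def mark_zone_up(self, zone):
        self._down_zones.discard(zone)

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            zone, host = next(self._ring)
            if zone not in self._down_zones:
                return host
        raise RuntimeError("no healthy availability zone left")


balancer = ZoneAwareBalancer([
    ("us-east-1a", "web-1"),
    ("us-east-1b", "web-2"),
    ("us-east-1c", "web-3"),
])
balancer.mark_zone_down("us-east-1a")  # simulate one AZ losing power
picks = [balancer.next_backend() for _ in range(4)]
```

With one zone marked down, traffic keeps flowing to the hosts in the other two. If every zone is marked down, the sketch raises instead of returning a dead host.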
[+] genwin|13 years ago|reply
But many people are saying that despite paying for multi-AZ for RDS, they were still down. Do you think they didn't also load-balance across AZs for their webservers?
[+] gee_totes|13 years ago|reply
Do we know this is due to an electrical storm? Today had a leap second as well (The minute of midnight, June 30th lasted a second longer than normal).
[+] dfc|13 years ago|reply
The leap second has not happened yet[1]:

                                   UTC TIME STEP
                            on the 1st of July 2012

  A positive leap second will be introduced at the end of June 2012.
  The sequence of dates of the UTC second markers will be:

                          2012 June 30,     23h 59m 59s
                          2012 June 30,     23h 59m 60s
                          2012 July  1,      0h  0m  0s

[1] http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.43
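As a side note on why the 23h 59m 60s marker can trip up software: a minimal Python sketch, using only the standard library, showing that `datetime` has no way to represent that second. This only illustrates the timestamp sequence in the bulletin; it says nothing about what caused the outage.

```python
from datetime import datetime, timedelta, timezone

# The last ordinary UTC second before the leap second listed in the bulletin.
before = datetime(2012, 6, 30, 23, 59, 59, tzinfo=timezone.utc)

# datetime only accepts seconds in the range 0..59, so 23:59:60 is rejected.
try:
    datetime(2012, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
    leap_representable = True
except ValueError:
    leap_representable = False

# Adding one second jumps straight to July 1, skipping 23:59:60 entirely.
after = before + timedelta(seconds=1)
```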
[+] dangrossman|13 years ago|reply
These are the kind of weather conditions that spawn very electrically active storms. I don't doubt they could cause the issues. Last night was probably the most electrically active storm I've ever seen up here -- virtually non-stop lightning strikes for an hour or two, and there's another just like it over Virginia right now.

http://i.imgur.com/d5pEP.png

[+] jaredbeck|13 years ago|reply
8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. Amazon Elastic Compute Cloud (N. Virginia)
[+] molecule|13 years ago|reply
reading the linked page, "AWS is down" means "some N. Virginia AWS services are down"
[+] Aloisius|13 years ago|reply
Just pay the extra money and get off US-East, people.
[+] rmc|13 years ago|reply
EU-West had similar levels of outage last year due to a lightning strike. Twas out for several hours and took a few days for everything to be back to normal.
[+] maybird|13 years ago|reply
Acts of God can happen anywhere.
[+] danryan|13 years ago|reply
us-west-2 is the same price as us-east-1. Price is no excuse anymore.
[+] dustingetz|13 years ago|reply
how is it that amazon.com itself is never, ever, impacted?

edit: so basically, the businesses suffering outages (heroku, netflix, etc) don't value uptime to the same extent that amazon does. they got what they paid for.

[+] sofuture|13 years ago|reply
Amazon.com does not run on the same EC2 that you and I use. It runs on a nearly identical system that is isolated and private to Amazon. I wouldn't be surprised if they were in entirely different physical locations.
[+] dkulchenko|13 years ago|reply
This outage only affected us-east-1. Considering an Amazon.com outage would cost them $51k in lost sales every minute, I seriously doubt they put all their eggs (servers, that is) in one basket.
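For scale, a back-of-the-envelope sketch of what that per-minute figure adds up to. The $51k number is taken from the comment as-is and not verified; the outage durations are hypothetical.

```python
# Per-minute lost-sales figure quoted in the comment (unverified).
LOST_SALES_PER_MINUTE_USD = 51_000


def lost_sales(minutes):
    """Lost sales in USD for an outage of the given length in minutes."""
    return LOST_SALES_PER_MINUTE_USD * minutes


one_hour = lost_sales(60)      # 3,060,000 USD
three_hours = lost_sales(180)  # 9,180,000 USD
```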
[+] suninwinter|13 years ago|reply
It looks like this is affecting iTunes Match, possibly. I have two tracks just sitting there, waiting to upload and running lsof -i shows iTunes with a connection to an AWS machine.