Is this not a 'service disruption' situation? At the bottom of the page, the yellow icon is associated with 'performance issues'.
If there's one thing that's shocked me about AWS, it's the total failure to acknowledge the severity of service disruptions. Like the above case, or the fact that a 3-hour loss of connectivity is displayed on the service history as a green tick with a small 'i' box: http://oi46.tinypic.com/x5qtch.jpg
Or that there is absolutely no way to deep link to an ongoing outage, and users must reload, then expand the link every single time, or subscribe to an RSS feed.
AWS needs to blatantly copy Heroku's status system, which is worlds better for people needing fast updates on their infrastructure.
They're reporting a power issue with a single US East availability zone. There are four EC2 availability zones in US East. Strange that they would cite orange as "performance issues", but it's certainly more appropriate than suggesting a complete service disruption.
They do have a red icon if an event impacts an entire region. If a customer is correctly utilizing multiple availability zones, a failure in one zone should only impact the customer until they can fail over (which should take only minutes if failover is automated).
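To make the failover point concrete, here is a minimal sketch in Python of routing requests only to zones that currently pass a health check. The zone names, the `healthy_zones` and `pick_zone` helpers, and the health map are all hypothetical. None of this is an AWS API. It only illustrates the idea that losing one zone should leave the remaining zones serving traffic.

```python
from typing import Dict, List


def healthy_zones(health: Dict[str, bool]) -> List[str]:
    """Return the zones currently reported healthy, in sorted order."""
    return sorted(zone for zone, ok in health.items() if ok)


def pick_zone(health: Dict[str, bool], request_id: int) -> str:
    """Spread requests across healthy zones; raise if none are left."""
    zones = healthy_zones(health)
    if not zones:
        raise RuntimeError("no healthy availability zone to fail over to")
    return zones[request_id % len(zones)]


# Hypothetical zone names, with one zone having lost power.
health = {"zone-a": True, "zone-b": False, "zone-c": True, "zone-d": True}
assignments = [pick_zone(health, i) for i in range(6)]
```

In this sketch the failed zone simply drops out of the rotation. A real deployment would also need the health map itself to be trustworthy, which was part of the problem during this event.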
We had 0 downtime. The only thing that's screwed up is a read replica of a multi-AZ MySQL on RDS deployment. Amazon did not send any notification. Kinda annoying.
AWS is not down. Only US-East. If your app is down, it's only because you don't care about the availability of your service.
It's pointless to complain. We've all seen before that Amazon can't keep whole regions up. If you rely on a region being up, you will have downtime and it's your fault.
According to the AWS status page, only one availability zone within US-East is down, not the whole US-East region. Running a highly-available service exclusively from US-East is a reasonable strategy as long as you're spread across multiple availability zones.
I'm not an AWS customer, just reading their docs; please correct me if I'm wrong about any of this.
Be careful. Nothing is as it seems right now. Do not trust any API output, nor should you do any API operations that are non-recoverable. Things are up that are reported down and vice-versa.
Wait for the dust to settle. We're all just going to be a bunch of Fonzies here.
EDIT: Looks like API access has been restored, so I'm cautiously optimistic about things working. Note though that some instances may have rebooted or be otherwise impacted so check your error logs.
EDIT2: Nope, ELB is still hosed. Continue to be skeptical.
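One way to act on the "don't trust the API output" advice is to wait until several consecutive readings agree before doing anything irreversible. This is a rough sketch, assuming you are already polling some status value yourself. The `is_settled` helper and the status strings are made up for illustration and are not part of any AWS SDK.

```python
from typing import Sequence


def is_settled(readings: Sequence[str], required: int = 3) -> bool:
    """True only if the last `required` readings exist and all agree."""
    if required < 1 or len(readings) < required:
        return False
    tail = readings[-required:]
    return all(state == tail[0] for state in tail)


# Hypothetical status strings polled over time for one instance.
flapping = ["running", "stopped", "running", "running"]
steady = ["stopped", "running", "running", "running"]

flapping_settled = is_settled(flapping)  # last three readings disagree
steady_settled = is_settled(steady)      # last three readings agree
```

The threshold of three is arbitrary. The point is only to avoid acting on a single reading while the control plane is still flapping.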
There's another comment thread going over at http://news.ycombinator.com/item?id=4180339, if, like me, you got extremely lucky and picked today's lucky availability zones, and have time to read HN instead of scrambling to get things back up.
I commented along the same lines during the last AWS/Heroku outage, but Rackspace is still giving me amazing value and uptime, and every time I try to move away (as I did this week with my latest project, on Heroku) I get hit with a massive service disruption that pushes me back to Rackspace.
Gov. Earl Ray Tomblin in a statement: With temperatures near 100 degrees expected this weekend, it's critical that we get people's power back on as soon as possible.
So let me get this straight: the critical issue with not having electricity after a huge storm is that the A/C isn't working? And 100F/38C isn't even that hot, right?
The status page seems to really underplay the severity of the situation. Netflix and Heroku are down, yet these are just side effects of 'performance issues' instead of a 'service disruption'. I wonder what it would take to cross that threshold.
AWS has historically been both slow to update and heavily optimistic with their status page.
When I got the frantic texts when EC2 first dropped offline, sure enough, the AWS status page was all green, but twitter was alight with people talking about it.
I suspect a service disruption would have to be Godzilla.
Must be the same storm that took several of my trees down (east coast Virginia USA) last night. It was a violent storm. 90 MPH winds. Made 80 foot tall oaks bend like straws and they were almost touching the ground. I spent the morning running the chainsaw just to clear the downed trees from the driveway.
AEP (local power company) says about 65% of customers in this area are w/o power. May be days before it's fully restored. Hope no one from the HN community got hurt.
Edit: I posted this from a computer in town. No power at my place so I can't respond to follow-up posts.
According to Colin Percival on Twitter[1][2], the US East-1 AZ has more IP addresses, and thus probably other resources, than the rest of AWS put together. It casts comments about "limited to one availability zone" into some relief.
If you have a load balancer and have balanced across availability zones (not regions), you'd still be up. US-East didn't all go down, just one AZ.
But many people are saying that despite paying for multi-AZ for RDS, they were still down. Do you think they didn't also load-balance across AZs for their webservers?
UTC TIME STEP
on the 1st of July 2012
A positive leap second will be introduced at the end of June 2012.
The sequence of dates of the UTC second markers will be:
2012 June 30, 23h 59m 59s
2012 June 30, 23h 59m 60s
2012 July 1, 0h 0m 0s
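For anyone wondering how that sequence looks to software: a small Python sketch, assuming you only need to label the three second markers quoted above. Python's `datetime` rejects a seconds value of 60, so the leap second is written out by hand here. The `markers_around_leap` helper is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def markers_around_leap(last_normal_second: datetime) -> List[str]:
    """Label the three UTC second markers around a positive leap second.

    datetime cannot hold second 60, so the leap second is built as a string.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    before = last_normal_second.strftime(fmt)
    leap = last_normal_second.strftime("%Y-%m-%d %H:%M:") + "60"
    after = (last_normal_second + timedelta(seconds=1)).strftime(fmt)
    return [before, leap, after]


sequence = markers_around_leap(datetime(2012, 6, 30, 23, 59, 59))
```

Code that assumes every minute has exactly 60 seconds never sees the middle marker, which is the kind of assumption a leap second can break.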
This is the kind of weather conditions that spawn very electrically active storms. I don't doubt they could cause the issues. Last night was probably the most electrically active storm I've ever seen up here -- virtually non-stop lightning strikes for an hour or two, and there's another just like it over Virginia right now.
8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. Amazon Elastic Compute Cloud (N. Virginia)
EU-West had similar levels of outage last year due to a lightning strike. It was out for several hours and took a few days for everything to be back to normal.
how is it that amazon.com itself is never, ever, impacted?
edit: so basically, the businesses suffering outages (heroku, netflix, etc) don't value uptime to the same extent that amazon does. they got what they paid for.
Amazon.com does not run on the same EC2 that you and I use. It runs on a nearly identical system that is isolated and private to Amazon. I wouldn't be surprised if they were in entirely different physical locations.
This outage only affected us-east-1. Considering an Amazon.com outage would cost them $51k in lost sales every minute, I seriously doubt they put all their eggs (servers, that is) in one basket.
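Back-of-the-envelope, taking the $51k per minute figure above at face value: a quick Python sketch of what an outage of a given length would cost in lost sales. The durations are made-up examples, not reported outage lengths, and `outage_cost_usd` is just an illustrative helper.

```python
LOST_SALES_PER_MINUTE_USD = 51_000  # figure quoted in the comment above


def outage_cost_usd(minutes: int) -> int:
    """Lost sales for an outage lasting the given number of minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * LOST_SALES_PER_MINUTE_USD


ten_minutes = outage_cost_usd(10)    # 510,000
three_hours = outage_cost_usd(180)   # 9,180,000
```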
It looks like this is affecting iTunes Match, possibly. I have two tracks just sitting there, waiting to upload and running lsof -i shows iTunes with a connection to an AWS machine.
zacharyvoase | 13 years ago
flyt | 13 years ago
AWS needs to blatantly copy Heroku's status system, which is worlds better for people needing fast updates on their infrastructure.
https://status.heroku.com/ vs http://status.aws.amazon.com/
jdub | 13 years ago
hristov | 13 years ago
datasage | 13 years ago
astrodust | 13 years ago
Apart from Apple's legendary secrecy, Amazon's EC2 is a solid #2 in terms of impenetrability.
underwater | 13 years ago
rwl4 | 13 years ago
zvrba | 13 years ago
1SaltwaterC | 13 years ago
paulsutter | 13 years ago
haberman | 13 years ago
beedogs | 13 years ago
That is absolutely absurd. At what point did the common-sense solution to "unacceptable downtime on AWS" become "buy two of everything"?
oconnore | 13 years ago
sehugg | 13 years ago
adrianpike | 13 years ago
Good luck, friends.
philip1209 | 13 years ago
shawnps | 13 years ago
bconway | 13 years ago
morsch | 13 years ago
RegEx | 13 years ago
adrianpike | 13 years ago
16s | 13 years ago
kryptiskt | 13 years ago
[1] https://twitter.com/cperciva/status/219067641023840257 [2] https://twitter.com/cperciva/status/219067963356098561
rabbitfang | 13 years ago
us-east-1 is a region, containing multiple AZ's.
codex | 13 years ago
kroo | 13 years ago
batista | 13 years ago
streeter | 13 years ago
lsb | 13 years ago
I've been thinking about building a site with a Parse backend, and they're up, which is good to discover.
jared314 | 13 years ago
myearwood | 13 years ago
[deleted]
dakrisht | 13 years ago
Second, or I believe third, power outage/loss of service for AWS in the past 10 days, if I'm not mistaken.
This is wild. I wonder what's going on at Amazon and if they're capable of handling this much usage in addition to having power issues, etc.
Instagram and Netflix servers are down from what I hear and have been down for a few hours. Now it makes sense that they're being hosted on AWS.
hendler | 13 years ago
genwin | 13 years ago
gee_totes | 13 years ago
dfc | 13 years ago
dangrossman | 13 years ago
http://i.imgur.com/d5pEP.png
jaredbeck | 13 years ago
molecule | 13 years ago
Aloisius | 13 years ago
rmc | 13 years ago
maybird | 13 years ago
danryan | 13 years ago
dustingetz | 13 years ago
sofuture | 13 years ago
dkulchenko | 13 years ago
marcuspovey | 13 years ago
suninwinter | 13 years ago