ipmb|4 years ago
> 8:22 AM PST We are investigating increased error rates for the AWS Management Console.
> 8:26 AM PST We are experiencing API and console issues in the US-EAST-1 Region. We have identified root cause and we are actively working towards recovery. This issue is affecting the global console landing page, which is also hosted in US-EAST-1. Customers may be able to access region-specific consoles going to https://console.aws.amazon.com/. So, to access the US-WEST-2 console, try https://us-west-2.console.aws.amazon.com/
jesboat|4 years ago
Even this little tidbit is a bit of a wtf for me. Why do they consider it ok to have anything hosted in a single region?
At a different (unnamed) FAANG, we considered it unacceptable to have anything depend on a single region. Even the dinky little volunteer-run thing which ran https://internal.site.example/~someEngineer was expected to be multi-region, and was, because there was enough infrastructure for making things multi-region that it was usually pretty easy.
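The infrastructure that makes multi-region "pretty easy" often boils down to retrying against another region's endpoint. A minimal client-side sketch of that idea (the `fetch` callable and the endpoint URLs are hypothetical, not the unnamed FAANG's actual tooling):

```python
def fetch_with_failover(fetch, endpoints):
    """Try each regional endpoint in order, returning the first success.

    `fetch` is any callable that takes a URL and either returns a
    response or raises; `endpoints` is an ordered list of regional URLs.
    """
    last_error = None
    for url in endpoints:
        try:
            return fetch(url)
        except Exception as exc:  # deliberately broad: any regional failure triggers failover
            last_error = exc
    raise last_error
```

Real deployments usually push this below the client (DNS failover, replicated state), but the shape is the same: no single region sits on the critical path.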
sangnoir|4 years ago
I'm guessing Google, on the basis of the recently published (to the public) "I just want to serve 5TB"[1] video. If it isn't Google, then the broccoli man video is still a cogent reminder that unyielding multi-region rigor comes with costs.
1. https://www.youtube.com/watch?v=3t6L-FlfeaI
hericium|4 years ago
They're cheap. HA is something their customers pay extra for, not something Amazon holds itself to, and Amazon often lies during major outages. They would lose money on HA and they would lose money on acknowledging downtime. They will lie as long as they benefit from it.
balls187|4 years ago
How long before Meta takes over for Facebook?
jabiko|4 years ago
IMHO it should mean that the rate of errors is increased but the service is still able to serve a substantial amount of traffic. If the error rate is greater than, say, 90%, that's not an increased error rate, that's an outage.
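The distinction above can be made concrete with a toy threshold (the 90% cut-off is the commenter's illustration, not any official SLA definition):

```python
def classify_status(errors: int, total: int, outage_threshold: float = 0.9) -> str:
    """Toy status classifier; thresholds are illustrative, not AWS's."""
    if total == 0:
        return "no traffic"
    rate = errors / total
    if rate >= outage_threshold:
        return "outage"  # effectively down, whatever the status page says
    if rate > 0.0:
        return "increased error rate"  # degraded, but serving most traffic
    return "healthy"
```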
packetslave|4 years ago
STS at least has recently started supporting regional endpoints, but most things involving users, groups, roles, and authentication are completely dependent on us-east-1.
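For SDK and CLI users, opting into the regional STS endpoints mentioned above is a one-line config change (profile name is illustrative):

```ini
# ~/.aws/config — route STS calls to the caller's regional endpoint
# instead of the legacy global endpoint (sts.amazonaws.com, hosted in us-east-1)
[default]
sts_regional_endpoints = regional
```

The same switch is available as the `AWS_STS_REGIONAL_ENDPOINTS` environment variable. It only covers STS, though; as the parent notes, IAM itself (users, groups, roles) remains a global, us-east-1-backed service.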
dang|4 years ago
There are also various media articles but I can't tell which ones have significant new information beyond "outage".
stephenr|4 years ago
I opened it again just now (maybe 10 minutes later) and it now shows DynamoDB has issues.
If past incidents are anything to go by, it's going to get worse before it gets better. Rube Goldberg machines aren't known for their resilience to internal faults.
JPKab|4 years ago
Sagemaker is not working, I can't get to my work (notebook instance is frozen upon launch, with zero way to stop it or restart it) and Sagemaker Studio is also broken right now.
The length of this outage has blown my mind.
wahern|4 years ago
Rather, you use AWS because when it is down, it's down for everybody else as well. (Or at least they can nod their head in sympathy for the transient flakiness everybody experiences.) Then it comes back up and everybody forgets about the outage like it was just background noise. This is what's meant by "nobody ever got fired for buying (IBM|Microsoft)". The point is that when those products failed, you wouldn't get blamed for making that choice; in their time they were the one choice everybody excused even when it was an objectively poor choice.
As for me, I prefer hosting all my own stuff. My e-mail uptime is better than GMail's, for example. However, when it is down or mail does bounce, I can't pass the buck.
Frost1x|4 years ago
I've broken things before and been aware of it, but didn't acknowledge them until I was confident I could fix them. It allows you to maintain an image of expertise to those outside who care about the broken things but aren't savvy to what or why it's broken. Meanwhile you spent hours, days, weeks addressing the issue and suddenly pull a magic solution out of your hat to look like someone impossible to replace. Sometimes you can break and fix things without anyone even knowing which is very valuable if breaking something had some real risk to you.
czbond|4 years ago
> Dev1: Pushing code for branch "master" to "AWS API".
> <slackbot> Your deploy finished in 4 minutes
> Dev2: I can't reach the API in east-1
> Dev1: Works from my computer
giorgioz|4 years ago
Our backend is failing; it's on us-east-1 using AWS Lambda, API Gateway, and S3.