> People don't believe me when I say how much DNS matters.
That's weird to me. I have been working in sysadmin/DevOps for over a decade, but it did not take me very long to learn that DNS outages cause massive problems.
Amazing that Downdetector manages to stay up during these kinds of outages. I noticed it has been a little slow, but they really have done a good job keeping it up even though large portions of the internet are down right now.
It's interesting that they report an AWS outage but there don't seem to be any issues there. Looks like their methodology is a bit too reliant on those speculative tweets from the first 5 minutes of all these sites going down. https://downdetector.com/status/aws-amazon-web-services/
> So many websites are down, are AWS servers down or something?
> Amazon web services is down which is affecting a lot of company web sites and services. Not sure what is going on.
> Miss us? @aldotcom and a whole bunch of other folks have been knocked off the internet by what appears to be an AWS attack/system failure. We'll be back. ?
Just got booted out of Netflix on the PS4 because the console could no longer connect to Sony's license server. Netflix was working just fine by the way.
Ah, that's what's going on. Happened to me as well; I just assumed that Sony was neglecting PS4 performance with its new system while bogging it down with bloatware.
What's frustrating is that DNS is returning an address instead of just failing, and so macOS is caching that value (though it might be Cloudflare doing the caching).
I wonder if this is why LastPass is down. It has completely locked me out of my vault. You'd think it'd continue to work offline in a case like this. :/
It is relatively easy to make DNS highly redundant: just put multiple DNS servers in data centers which are as independent as possible (different geo locations, different ISPs). You can also use different DNS software and different OSes (say, BSD+Linux) to exclude correlated bugs. Root DNS servers AFAIK use different software for this reason.
Problems start when you want to make frequent changes easily and introduce complex software to manage DNS zones (and complexity usually comes with bugs).
The problem isn't DNS though, is it? The problem is that people don't necessarily use the redundancy DNS offers?
The whole reason it takes a domain up to 24h to fully work with DNS is that the information propagates to other DNS servers, which keeps it from being a centralized service.
It's an interesting question, as it's always been solved on the server side. All of the current problem is client side. That is, client resolvers that aren't using diverse providers, and only do things like round-robin with long timeouts.
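To make that client-side idea concrete, here's a minimal sketch of trying diverse providers and failing over quickly. The resolver callables here are hypothetical stand-ins, not real DNS queries:

```python
def resolve_with_fallback(name, resolvers):
    """Try each resolver in turn; return the first non-empty answer.

    Each resolver is a callable taking a hostname and returning a list
    of IP strings, or raising on failure/timeout. The point is diversity:
    several independent providers, each given only a short timeout.
    """
    errors = []
    for resolver in resolvers:
        try:
            answer = resolver(name)
            if answer:  # treat an empty answer as a failure too
                return answer
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all resolvers failed for {name}: {errors}")

# Hypothetical stand-ins; a real setup might wrap short-timeout
# queries to e.g. 1.1.1.1, 8.8.8.8 and 9.9.9.9.
def flaky_provider(name):
    raise TimeoutError("no response within 1s")

def healthy_provider(name):
    return ["192.0.2.10"]  # documentation (TEST-NET-1) address

print(resolve_with_fallback("example.com", [flaky_provider, healthy_provider]))
# -> ['192.0.2.10']
```

The ordering and timeout policy are the judgment calls; the structure just ensures one provider's outage doesn't take the client down with it.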
So here's a weird question: Supposing companies multi-home for DNS, or whatever other essential service, via multiple service providers.
Whatever multi-home means, why can't there just be one service provider that does that? And are we sure that these service providers aren't already doing that as best we might hope for? (For instance, Amazon already has multiple zones, etc.)
I suppose the one thing this can't protect against is some sort of political (broadly defined) threat related to the company itself.
Using multiple providers for mostly static DNS is easy: pick one as primary and AXFR the zone to the others, with NOTIFYs and whatever. Or it's not too hard to keep a zone file in source control and sync it to the providers.
Using multiple providers for fancy DNS, like only serving IPs that pass healthchecks or geotargeting users to datacenters, gets pretty hard, because the different providers have similar capabilities but no uniform interface, so you've either got to do it manually, or you have to build out your own abstraction that is probably limiting.
If possible, insourcing DNS makes the most sense to me, because if you can't keep your service online, it's not the worst if your DNS is offline; and if you can keep your service online, you probably won't mess up your DNS too badly.
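The "zone file in source control, synced to the providers" approach above can be sketched roughly as follows. The provider push callables are hypothetical stand-ins for each vendor's real API:

```python
def sync_zone(zone_text, providers):
    """Push one source-of-truth zone to every provider, independently.

    One provider failing must not stop the push to the others --
    that independence is the whole point of multi-provider DNS.
    """
    results = {}
    for name, push in providers.items():
        try:
            push(zone_text)
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results

zone = "www 300 IN A 192.0.2.10\n"

# Hypothetical provider clients for illustration.
pushed = {}

def push_provider_a(zone_text):
    pushed["provider-a"] = zone_text  # pretend the API call succeeded

def push_provider_b(zone_text):
    raise TimeoutError("API down")  # simulate one vendor's outage

providers = {"provider-a": push_provider_a, "provider-b": push_provider_b}
print(sync_zone(zone, providers))
# -> {'provider-a': 'ok', 'provider-b': 'failed: API down'}
```

In practice each push function would call that vendor's zone API; the hard part the comment describes is that no two vendors expose the fancy features the same way.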
Most CDNs offer huge incentives for sending them more traffic; a lot of the time you end up contractually obligated to handle X requests and Y gigabytes of traffic per month. But personally I believe you should never have a single provider for anything - particularly when it's acceptable for a company to cut you off with no warning or recourse.
So many sites being reported as down, but change your DNS to something else (e.g. Google 8.8.8.8 and 8.8.4.4) and, after flushing your DNS cache, the sites are available. I was unable to get to ups.com or newegg.com (why yes, I am expecting a new toy), but after switching DNS and flushing DNS cache, I was able to get to both.
Specifically, 1.1.1.1 provided bad addresses (as opposed to no addresses), and removing 1.1.1.1 fixed my problem. By then it had returned a bunch of bad addresses and I had to flush my DNS cache.
I am surprised financial institutions don't have any regulation for redundancy. The one that stuck out to me is the Navy Federal Credit Union website being down. I have not had any issues logging into mobile though for some of the reported sites.
> financial institutions don't have any regulation for redundancy
As CTO of a bank, I wasn’t aware of this. So either we wasted a ton of money and time constantly upgrading redundancy and business continuity technologies to satisfy our regulators… or this statement could be mistaken.
I'm not sure how easy it would be to regulate. But yeah. I've got a few short term trades in my brokerage account, and outages really throw a wrench into those.
Because the way Downdetector works is that it basically just counts how many people are searching for or visiting its "<site> down" page, and if that's much higher than typical it flags the site as down.
So if everyone searched "is google down" and visited the link on downdetector that was returned in the search, that would add to the downdetector count for that site.
Downdetector doesn't actually know if the site is up or down.
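Downdetector's actual methodology isn't public, but the counting idea described above can be sketched as a simple spike test. The threshold and floor values here are made up for illustration:

```python
def looks_down(reports_last_hour, hourly_baseline, threshold=3.0, floor=20):
    """Flag a site as 'down' when user reports spike well above baseline.

    The floor avoids flagging tiny sites, where a handful of reports
    would otherwise look like a huge multiple of a near-zero baseline.
    """
    return reports_last_hour >= floor and reports_last_hour > threshold * hourly_baseline

print(looks_down(450, 30))  # big spike over baseline -> True
print(looks_down(25, 30))   # normal chatter -> False
```

Which is exactly why it can "detect" an AWS outage that isn't happening: it measures what people are searching for, not whether the service answers.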
Was just browsing a website where the first page of a query worked, but visiting page 2 of the results was returning a DNS error. Was curious how and why only part of the site was down, but it looks like this was the problem as now the whole site is down.
What role does Akamai Edge DNS play in normal internet traffic? DNS responses usually get cached, if I understand correctly. And it is usually possible to change your DNS server to e.g. Google's and circumvent the outage. Does Akamai Edge DNS play a role on the server side?
If you use a CDN to front your traffic, you need the CNAME for www (or whatever) to point at their DNS infrastructure, so they can return whichever POP is closest to serve your traffic.
e.g. dig @1.1.1.1 www.nvidia.com +trace
... various things from the root ...
www.nvidia.com. 7200 IN CNAME www.nvidia.com.edgekey.net.
;; Received 83 bytes from 208.94.148.13#53(ns5.dnsmadeeasy.com) in 35 ms
So the main DNS is fine, but it'll never get an A record because the last link in the chain is toast -- edgekey being Akamai in this case, but all CDNs do this so they can route traffic. Normally this is a good thing: they can shift traffic within 30 seconds on their side. Unfortunately, it also means it would take nvidia two hours (that 7200-second TTL on the CNAME) to point away from Akamai.
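A toy illustration of why a healthy zone plus a dead CDN nameserver still yields no answer: resolution chases the CNAME chain and dies at the missing link. The record data here is a stand-in, not a live lookup:

```python
def resolve_a(name, records, max_hops=8):
    """Follow CNAMEs until an A record; fail if any link has no answer."""
    for _ in range(max_hops):
        record = records.get(name)
        if record is None:
            raise LookupError(f"no answer for {name} (broken chain)")
        rtype, value = record
        if rtype == "A":
            return value
        name = value  # CNAME: keep chasing the chain
    raise LookupError("CNAME chain too long")

records = {
    # The registrar-side zone answers fine and hands out the CNAME...
    "www.nvidia.com.": ("CNAME", "www.nvidia.com.edgekey.net."),
    # ...but the edgekey target is intentionally absent here: the CDN's
    # authoritative servers are the ones that stopped answering.
}

try:
    resolve_a("www.nvidia.com.", records)
except LookupError as e:
    print(e)  # -> no answer for www.nvidia.com.edgekey.net. (broken chain)
```

Every hop in the chain has to answer; redundancy at the apex buys nothing if the final delegation is down.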
The trend these days is DNS TTLs of 60-300 seconds, to allow "cloud agility" or something, so sites are exposed to a much larger risk from authoritative nameservers going down.
Well it's been an hour now since I first noticed the effects and their service status still has no useful information or ETA for a fix. It's just an "emerging issue".
Strange thing about the duration of this outage... From logs I have, it seems to have lasted exactly one hour, from 15:38 to 16:38. Their Twitter account also said "disruption lasted up to an hour", though they incorrectly said it started at 15:46 (did it take 8 minutes for their monitoring to notice?).
That makes me think that whatever the fix was, it had to wait for some one-hour cache to expire before it took effect. I'm very interested to find out what the cache issue was, more so than what the original bug was.
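One plausible reading (speculation, like the comment itself) is a resolver or application cache holding the bad answer for its full TTL. A minimal TTL cache shows the mechanics: once a bad record is stored, a fix upstream is invisible until the entry expires:

```python
class TTLCache:
    """Minimal positive-answer cache keyed by name, with per-entry TTL."""

    def __init__(self):
        self._store = {}

    def put(self, name, value, ttl, now):
        self._store[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self._store.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._store[name]  # expired: force a fresh lookup
            return None
        return value

cache = TTLCache()
cache.put("www.example.com", "bad-answer", ttl=3600, now=0)
print(cache.get("www.example.com", now=1800))  # -> bad-answer (still cached)
print(cache.get("www.example.com", now=3601))  # -> None (expired, re-resolve)
```

With a one-hour TTL on the poisoned answer, a fix deployed minutes after the start would still look like an hour-long outage to anyone who had cached it.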
Yes, was trying to do the same. Getting this 2nd jab has been a nightmare. Places listed as walk-in having Moderna don't, and they ran out of it when I went to get my scheduled jab. Ringing 119 just ends up in a dead line, and then this outage. Fun.
With all due respect, having also run auth DNS servers in the 90s, and seen the inside of Akamai’s CDN/DNS setup more recently, it isn’t remotely at the same level of scale or sophistication.
DNS is designed to be fault tolerant. Such a design, however, is often not leveraged correctly; the implementation of DNS can be and frequently is subject to SPOFs.
geocrasher|4 years ago
https://soundcloud.com/ryan-flowers-916961339/dns-to-the-tun...
wpasc|4 years ago
Love this.
dbsmith83|4 years ago
So many sites down... and unfortunately not one of them is Twitter
dheera|4 years ago
Basically soft-invalidate your local DNS cache, but bring entries back from the cache graveyard if DNS is down.
1f60c|4 years ago
Please keep comments like this off HN
sakisv|4 years ago
GOV.UK, for example, uses both AWS and GCP for DNS.
jakeschaeffer|4 years ago
https://namebase.io is a "registrar" for it.
topranks|4 years ago
What’s the single point of failure?
foobarbazetc|4 years ago
I wonder how much they spend on multi-AZ redundant architectures...
toomuchtodo|4 years ago
(a component of my consulting work is reporting to financial regulators for institutions)
carlsborg|4 years ago
So for example:
The apex domain for nvidia resolved fine:
dig @1.1.1.1 nvidia.com => status: NOERROR, nameservers are ns6.dnsmadeeasy.com
But the website didn't: dig @1.1.1.1 www.nvidia.com => status: SERVFAIL
www.nvidia.com resolved to a CNAME on the Akamai nameservers, which were the ones having the problem:
dig @1.1.1.1 www.nvidia.com NS => CNAME e33907.a.akamaiedge.net.
NeckBeardPrince|4 years ago
Clearly a big one.
SandroG|4 years ago
Multiple websites including DraftKings, Airbnb, FedEx, Delta and others appear to be experiencing issues.
https://www.bloomberg.com/news/articles/2021-07-22/multiple-...
swarnie_|4 years ago
This time I think /r/sysadmin pegged the issue first. Great sub.
penultimatebro|4 years ago
It’s just a completely random DNS outage, nothing more.
Eikon|4 years ago
https://www.apple.com/go/
remram|4 years ago
archive.org seems to indicate there was never anything there...
rvz|4 years ago
EDIT: So HN can't even take a joke after this? [0]
[0] https://news.ycombinator.com/item?id=27893482
gianpaj|4 years ago
How am I going to sell my AMC stock...