Out of curiosity, why don't caching DNS resolvers, such as the one I run on my home network, provide an option to retain last-known-good resolutions beyond the authority-provided TTL? In such a configuration, after the TTL expires, the resolver would attempt a refresh from the authority/upstream provider; if that attempt failed, it would fail more gracefully by returning the last-known-good resolution (perhaps with a flag). This behavior would continue until an administrator-specified, and potentially quite generous, maximum TTL expired, after which nodes would finally see resolution fail outright.
Ideally, then, the local resolvers of the nodes and/or the UIs of applications could detect the last-known-good flag on resolution and present a UI to users ("DNS authority for this domain is unresponsive; you are visiting a last-known-good IP provided by a resolution from 8 hours ago."). But that would be a nicety, and not strictly necessary.
Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.
It'd be nice to have a "backup TTL" included, to allow sites to specify whether, and for how long, they want such caching behavior.
Also, that cache would need to only kick in when the server was unreachable or produced SERVFAIL, not when it returned a negative result. Negative results returned by the authoritative server are correct, and should not result in the recursive resolver returning anything other than a negative result.
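The policy described above ("serve stale only when the upstream fails, never in place of a negative answer") is easy to model. Below is a minimal, hypothetical Python sketch; the class name, the `upstream` callable, and the return shape are all illustrative, not any real resolver's API:

```python
import time

NOERROR, NXDOMAIN, SERVFAIL = "NOERROR", "NXDOMAIN", "SERVFAIL"

class ServeStaleCache:
    """Toy model of a serve-stale resolver cache.

    While an entry is fresh it is served from cache. Once it expires we
    re-query upstream; only if that query *fails* (timeout or SERVFAIL)
    do we fall back to the stale answer, for at most max_stale seconds
    past expiry. Authoritative negative answers (NXDOMAIN) are cached
    and returned as-is: they are correct data, not a failure.
    """

    def __init__(self, upstream, max_stale=86400):
        self.upstream = upstream   # callable: name -> (rcode, answer, ttl)
        self.max_stale = max_stale
        self.cache = {}            # name -> (rcode, answer, expires_at)

    def resolve(self, name, now=None):
        """Return (rcode, answer, is_stale)."""
        now = time.time() if now is None else now
        entry = self.cache.get(name)
        if entry and now < entry[2]:
            return entry[0], entry[1], False      # fresh cache hit

        try:
            rcode, answer, ttl = self.upstream(name)
        except OSError:                            # upstream unreachable
            rcode = SERVFAIL
        if rcode != SERVFAIL:                      # NOERROR and NXDOMAIN both count as answers
            self.cache[name] = (rcode, answer, now + ttl)
            return rcode, answer, False

        # Upstream failed: serve the last-known-good answer if still in grace.
        if entry and now < entry[2] + self.max_stale:
            return entry[0], entry[1], True        # True flags a stale answer
        return SERVFAIL, None, False
```

This is roughly the behavior that was later standardized for DNS as "serve-stale" in RFC 8767, and that options like Unbound's `serve-expired` implement.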
I've been thinking of adding this exact feature to the DNS framework I've been working on (if GitHub were resolving): https://github.com/bluejekyll/trust-dns
> Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.
Because you would keep old DNS records around forever if a server went away for good. So you need a timeout for that anyway.
HTTP has a good solution for this: RFC 5861's stale-if-error=someTimeInSeconds Cache-Control extension lets the server specify, in addition to the TTL, how long every cache is allowed to continue serving stale data while the origin is unreachable. Probably a good idea to include such a mechanism in DNS, too.
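For reference, extracting that RFC 5861 grace period from a Cache-Control header value is a one-liner; a minimal sketch (the function name is mine):

```python
import re

def stale_if_error_budget(cache_control: str) -> int:
    """Seconds a cache may keep serving stale data on origin errors (0 if unset).

    Parses the RFC 5861 stale-if-error directive out of a Cache-Control
    header value such as "max-age=300, stale-if-error=86400".
    """
    m = re.search(r"(?:^|[,\s])stale-if-error\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(m.group(1)) if m else 0
```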
I seem to remember that DNS has generally been reliable (until recently, I guess), so probably nobody ever thought that would be necessary.
You could write a cron script that generates a date-stamped hosts file from a list of your most-used domain names, and simply use that on your machine(s) if your DNS ever goes down. That's basically a very simple local DNS cache.
If you feel like living dangerously, have it update /etc/hosts directly.
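A minimal sketch of that cron script in Python; the domain list and output filename are placeholders, and the resolver is an injectable parameter so the formatting logic can be exercised without network access:

```python
#!/usr/bin/env python3
"""Date-stamped hosts-file snapshot, per the cron idea above."""
import socket
from datetime import date

DOMAINS = ["github.com", "news.ycombinator.com"]   # your top-used names

def hosts_snapshot(domains, resolve=socket.gethostbyname):
    lines = []
    for name in domains:
        try:
            lines.append(f"{resolve(name)}\t{name}")
        except OSError:
            pass           # skip names that fail to resolve right now
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Writes e.g. hosts-20161021; symlink or copy over /etc/hosts as needed.
    with open(f"hosts-{date.today():%Y%m%d}", "w") as out:
        out.write(hosts_snapshot(DOMAINS))
```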
I think a problem you might be overlooking is that DNS lookups aren't just failing; they are also very slow while a DDoS attack on the authoritative servers is underway. This introduces a latency shock to the system, which causes cascading failures.
Everything will break the moment one of the websites you access makes a server-side request to another service (think logging services, server clusters, database servers, etc.); they all reference either IPs or, more likely, domains.
I wanted to provide an update on the PagerDuty service. At this time we have been able to restore the service by migrating to our secondary DNS provider. If you are still experiencing issues reaching any pagerduty.com addresses, please flush your DNS cache. This should restore your access to the service. We are actively monitoring our service and are working to resolve any outstanding issues. We sincerely apologize for the inconvenience and thank our customers for their support and patience. Real-time updates on all incidents can be found on our status page and on Twitter at @pagerdutyops and @pagerduty. In case of outages with our regular communications channels, we will update you via email directly.
In addition you can reach out to our customer support team at [email protected] or +1 (844) 700-3889.
Tim Armandpour, SVP of Product Development, PagerDuty
I had the privilege of being on-call during this entire fiasco today, and I have to say I was really disappointed. It's surprising how broken your entire service was when DNS went down: I couldn't acknowledge anything, my secondary on-call was getting paged because it looked like I wasn't responding, and I was getting phone calls for alerts that weren't even showing up in the web client. Overall, it caused chaos.
I appreciate the update, but your service has been unavailable for hours already. This is unacceptable for a service whose core value is to ensure that we know about any incidents.
Sorry if this sounds dickish, but renting 3 servers @ $75 apiece from 3 different dedicated server companies in the USA, putting TinyDNS on them, and using them as backup servers, would have solved your problems hours ago.
Even a single quad-core server with 4 GB of RAM running TinyDNS could serve 10K queries per second, extrapolating (and assuming hardware improvements) from this 2001 test, which showed nearly 4K/second on 700 MHz PIII CPUs: https://lists.isc.org/pipermail/bind-users/2001-June/029457....
EDIT to add: and lengthening TTLs temporarily would mean that those 10K queries would quickly lessen the outage, since each answer might last for 12 hours; and large ISPs like Comcast would cache the answers for all their customers, so a single successful query delivered to Comcast would have (some amount of) multiplier effect.
"Challenges" is exactly the sort of Dilbertesque euphemism that you should never say in a situation like this.
Calling it a "challenge" implies that there is some difficult, but possible, action that the customer could take to resolve the issue. Since that is not the case, this means either you don't understand what's going on, or you're subtly mocking your customers inadvertently.
Try less to make things sound nice and MBAish, and try more to just communicate honestly and directly using simple language.
Running multiple DNS providers is not actually that difficult and certainly not cost prohibitive. I am sure after this, we will see lots of companies adding multiple DNS providers and switching to AWS Route53 (which has always been solid for me).
The PagerDuty outage is the real low point of this whole situation. The email alerts from PagerDuty that should have flagged the outage in the first place were only delivered hours later, after the whole mess had cleared.
I'm a GitHub employee and want to let everyone know we're aware of the problems this incident is causing and are actively working to mitigate the impact.
"A global event is affecting an upstream DNS provider. GitHub services may be intermittently available at this time." is the content from our latest status update on Twitter (https://twitter.com/githubstatus/status/789452827269664769). Reposted here since some people are having problems resolving Twitter domains as well.
I'm curious why you don't host your status page on a different domain/provider. When checking this morning why GitHub was down, I couldn't reach the status page either.
If this is consistently a problem, why doesn't GitHub have fallback domains (on different TLDs) that use different DNS providers? Or even just code the site to work with static IPs. I tried GitHub's IP and it didn't load, but that could be for an unrelated reason.
Another status update from GitHub: "We have migrated to an unaffected DNS provider. Some users may experience problems with cached results as the change propagates."
We're maintaining yellow status for the foreseeable future while the changes to our NS records propagate. If you have the ability to flush caches for your resolver, this may help restore access.
pornhub.com:
Name Server: ns1.p44.dynect.net
Name Server: ns2.p44.dynect.net
Name Server: ns3.p44.dynect.net
Name Server: ns4.p44.dynect.net
Name Server: sdns3.ultradns.biz
Name Server: sdns3.ultradns.com
Name Server: sdns3.ultradns.net
Name Server: sdns3.ultradns.org
ultradns.biz:
Name Server: PDNS196.ULTRADNS.ORG
Name Server: ARI.ALPHA.ARIDNS.NET.AU
Name Server: ARI.BETA.ARIDNS.NET.AU
Name Server: ARI.GAMMA.ARIDNS.NET.AU
Name Server: ARI.DELTA.ARIDNS.NET.AU
Name Server: PDNS196.ULTRADNS.NET
Name Server: PDNS196.ULTRADNS.COM
Name Server: PDNS196.ULTRADNS.BIZ
Name Server: PDNS196.ULTRADNS.INFO
Name Server: PDNS196.ULTRADNS.CO.UK
github.com:
Name Server: ns2.p16.dynect.net
Name Server: ns-1283.awsdns-32.org.
Name Server: ns-1707.awsdns-21.co.uk.
Name Server: ns-421.awsdns-52.com.
Name Server: ns1.p16.dynect.net
Name Server: ns4.p16.dynect.net
Name Server: ns3.p16.dynect.net
Name Server: ns-520.awsdns-01.net.
Journalist and security researcher Brian Krebs believes this DDoS is payback for research into questionable "DDoS mitigation services" that he and Dyn's Doug Madory conducted; Doug presented the results at NANOG just yesterday. Read more: https://krebsonsecurity.com/2016/10/ddos-on-dyn-impacts-twit...
I'm wondering, from a regulatory perspective, what might be done to mitigate DDoS attacks in the future?
From comments made on this and other similar posts in the past, I've gathered the following:
1) Malicious traffic often uses a spoofed IP address, which is detectable by ISPs. What if ISPs were not allowed to forward such traffic?
2) There is no way for a service to exert back pressure. What if there was? e.g. send a response indicating the request was malicious (or simply unwanted due to current traffic levels), and a router along the way would refuse to send follow up requests for some time. There is HTTP status code 429, but that is entirely dependent on a well-behaved client. I'm talking about something at the packet level, enforced by every hop along the way.
3) I believe a substantial portion of the traffic is suspected to come from compromised IoT devices. What if IoT devices were required to continually pass some sort of health check before being allowed to make other HTTP requests? This could be enforced at the hardware/firmware level (much harder to change with malware): say, by sending a signature of the currently running binary (or binaries) to a remote server that gives a thumbs up/down.
Although I don't like to recommend Google products, they provide a public DNS-over-HTTPS interface that should be useful for people who want to add specific entries to their /etc/hosts files: https://dns.google.com/query?name=github.com&type=A&dnssec=t...
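As a sketch, Google's JSON resolve endpoint (currently https://dns.google/resolve) returns answers that can be turned into hosts-file lines; the parsing below assumes that response shape (`type` 1 = A record), and the IP in the test is a documentation address, not GitHub's real one:

```python
import json
from urllib.request import urlopen

def hosts_lines(doh_json, name):
    """Extract /etc/hosts lines from a Google DoH JSON answer (A records only)."""
    answers = json.loads(doh_json).get("Answer", [])
    return [f"{a['data']}\t{name}" for a in answers if a.get("type") == 1]

if __name__ == "__main__":
    name = "github.com"
    with urlopen(f"https://dns.google/resolve?name={name}&type=A") as resp:
        print("\n".join(hosts_lines(resp.read().decode(), name)))
```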
"digikey.com", the big electronic part distributor, is currently inaccessible. DNS lookups are failing with SERVFAIL. Even the Google DNS server (8.8.8.8) can't resolve that domain. Their DNS servers are "ns1.p10.dynect.net" through "ns4.p10.dynect.net", so it's a Dyn problem.
This will cause supply-chain disruption for manufacturers using DigiKey for just-in-time supply.
(justdownforme.com says the site is down, but downforeveryoneorjustme.com says it's up. They're probably caching DNS locally.)
If you're having issues with people accessing your running Heroku apps, it's likely because you're running your DNS through herokussl.com (with their SSL endpoint product) which is hosted on Dyn.
If you can update your DNS to CNAME directly to the ELB behind it, it should at least make your site accessible.
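For example, that change is a single CNAME swap; both hostnames below are made up for illustration:

```
; before: www pointed at the Dyn-hosted Heroku SSL endpoint
; www.example.com.  300  IN  CNAME  tokyo-1234.herokussl.com.
; after: point straight at the ELB behind it
www.example.com.    300  IN  CNAME  my-app-elb-1234567890.us-east-1.elb.amazonaws.com.
```

One caveat: a CNAME can only be used on a subdomain like www, not at the zone apex.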
Just to be clear, this is a DDoS against Dynect's NS hosts, right?
I'm confused because of the use of "dyn dns", which to me means dns for hosts that don't have static ip addresses.
I'm actually surprised that so many big-name sites rely on Dynect (which I hadn't heard of), and more importantly that they don't seem to list another provider's NS hosts as their 2nd through 4th entries.
davidu | 9 years ago:
It's called SmartCache.
bluejekyll | 9 years ago:
If you have any feedback, I'd love to hear it.
DivineTraube | 9 years ago:
https://tools.ietf.org/html/rfc5861
jasimp | 9 years ago:
I don't want to say much more, since it's my job and I don't want to give too much away.
EDIT: https://www.google.com/patents/US8583801
jedisct1 | 9 years ago:
https://github.com/jedisct1/edgedns
scrollaway | 9 years ago:
https://www.schneier.com/blog/archives/2016/09/someone_is_le...
Edit: And to be clear: I don't mean to imply there's any connection :)
jssjr | 9 years ago:
Latest status message: https://twitter.com/githubstatus/status/789565863649304576
dEnigma | 9 years ago:
1. Tried to download the "Unknown Horizons" binary (a game featured recently on Hacker News); the GitHub link doesn't work.
2. Think "OK, might be an old link", google their GitHub repository; GitHub appears down.
3. Try accessing the GitHub status website; it's down.
4. Intrigued, try to visit the GitHub status Twitter account; Twitter is down.
Really weird experience; normally, during an attack, at least the second news source I try about a downed website works.
foobarbecue | 9 years ago:
"Popular tech site Hacker News reported many other sites were affected including Etsy, Spotify, Github, Soundcloud, and Heroku." -- http://fortune.com/2016/10/21/internet-outages/
chromaton | 9 years ago:
$ dig @8.8.8.8 www.paypal.com
; <<>> DiG 9.8.1-P1 <<>> @8.8.8.8 www.paypal.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 17925
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.paypal.com.    IN    A

;; Query time: 29 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Oct 21 12:35:33 2016
;; MSG SIZE  rcvd: 32
jtmarmon | 9 years ago:
So far: Twitter, Etsy, SoundCloud, Spotify, GitHub, PagerDuty... crazy that this can even happen.
Animats | 9 years ago:
This is worth reading. It has links to copies of the code and names the known control servers. Quite a bit is known now about how this thing works.
The bots talk to control servers and report servers. The attacker appears to communicate with the report servers over Tor.
[1] http://blog.level3.com/security/grinch-stole-iot/