It's remarkable that the ordinary DNS lookup function in glibc doesn't work if the records aren't in the right order. It's amazing to me we went 20+ years without that causing more problems. My guess is most people publishing DNS records just sort of knew that the order mattered in practice, maybe figuring it out in early testing.
pixl97|1 month ago
CNAMEs are a huge pain in the ass (as noted by DJB: https://cr.yp.to/djbdns/notes.html).
silverwind|1 month ago
immibis|1 month ago
skywhopper|1 month ago
jeroenhd|1 month ago
If a small business or cloud app can't resolve a domain because the domain is doing something different, it's much easier to blame DNS, use another DNS server, and move on. Or maybe just go "some Linuxes can't reach my website, oh well, sucks for the 1-3%".
Cloudflare is large enough that they caused issues for millions of devices all at once, so they had to investigate.
What's unclear to me is whether they bothered to send patches to the broken open-source DNS resolvers so this doesn't break again in the future.
iainmerrick|1 month ago
Based on what we have learned during this incident, we have reverted the CNAME re-ordering and do not intend to change the order in the future.
To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF.
That is, explicitly documenting the "broken" behaviour as permitted.
fweimer|1 month ago
Dylan16807|1 month ago
Parsing the answer section in a single pass requires more finesse, but does it need fancier data structures than a string to string map? And failing that you can loop upon CNAME. I wouldn't call a depth limit like 20 "a rather low limit on the number of CNAMEs in a response", and max 20 passes through a max 64KB answer section is plenty fast.
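To make the string-to-string map idea concrete, here is a minimal Python sketch. It is not glibc's code or any real resolver's API: the function name `resolve_from_answers`, the `(owner, type, rdata)` tuple layout, and the constant `MAX_CNAME_DEPTH` are all made up for illustration, and it assumes the answer section has already been decoded into text.

```python
from collections import defaultdict

# Hypothetical depth limit, taken from the "20" suggested above.
MAX_CNAME_DEPTH = 20


def resolve_from_answers(qname, answers, max_depth=MAX_CNAME_DEPTH):
    """Return the addresses for qname, following CNAMEs in any record order.

    `answers` is an assumed pre-decoded answer section: an iterable of
    (owner_name, record_type, rdata) string tuples.
    """
    cnames = {}                    # alias -> target (the string-to-string map)
    addresses = defaultdict(list)  # owner name -> list of A/AAAA rdata

    # Single pass over the answer section; record order does not matter here.
    for owner, rtype, rdata in answers:
        owner = owner.lower().rstrip(".")
        if rtype == "CNAME":
            cnames[owner] = rdata.lower().rstrip(".")
        elif rtype in ("A", "AAAA"):
            addresses[owner].append(rdata)

    # Follow the chain with cheap map lookups, bounded by the depth limit.
    name = qname.lower().rstrip(".")
    for _ in range(max_depth + 1):
        if name in addresses:
            return addresses[name]
        if name not in cnames:
            return []  # chain ends without an address record
        name = cnames[name]
    raise ValueError("CNAME chain too long or looping")
```

Since the map is built before the chain is followed, it makes no difference whether the CNAME comes before or after the address record, and a CNAME loop just runs into the depth limit instead of spinning forever.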