DNS is something you rarely change, but mistakes are costly: a bad record can bring down an entire domain and keep it down until the TTL expires.
If you set your TTL to an hour, the cost of any DNS mistake goes up sharply: a problem you fix immediately still turns into an hour-long outage, and a problem that takes several attempts to fix turns into an hour of downtime per iteration.
Setting a low TTL costs an extra packet and round-trip per connection; that's too cheap to meter [1].
When I first started administering servers I set TTL high to try to be a good netizen. Then after several instances of having to wait a long time for DNS to update, I started setting TTL low. Theoretically it causes more friction and resource usage but in practice it really hasn't been noticeable to me.
[1] For the vast majority of companies / applications. I wouldn't be surprised to learn someone somewhere has some "weird" application where high TTL is critical to their functionality or unit economics but I would be very surprised if such applications were relevant to more than 5% of websites.
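To put rough numbers on the downtime argument, here's a back-of-the-envelope sketch (the TTL values and attempt counts are made up for illustration):

```python
# Worst-case extra downtime from cached DNS after a bad change is roughly
# (number of fix attempts) x (TTL), since each attempted fix can only be
# validated once the previously cached answer has expired everywhere.
def worst_case_downtime(ttl_seconds: int, fix_attempts: int) -> int:
    return ttl_seconds * fix_attempts

for ttl in (60, 300, 3600):          # 1 min, 5 min, 1 hour
    for attempts in (1, 3):
        hours = worst_case_downtime(ttl, attempts) / 3600
        print(f"TTL {ttl:>4}s, {attempts} attempt(s): ~{hours:.2f}h of downtime")
```

With a 60-second TTL even three botched attempts cost a few minutes; with a one-hour TTL the same mistakes cost three hours.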
The big thing that articles like this miss completely is that we are no longer in the brief HTTP/1.0 era (1996) where every request is a new TCP connection (and therefore possibly a new DNS query).
In the HTTP/1.1 (1997) and HTTP/2 eras, the TCP connection is made once and then stays open for multiple requests (persistent connections, the default in HTTP/1.1). This greatly reduces the number of DNS lookups per HTTP request.
If the web server is configured for a sufficiently long Keep-Alive idle period, then this period is far more relevant than a short DNS TTL.
If the server dies or disconnects in the middle of a Keep-Alive, the client/browser will open a new connection, and at this point, a short DNS TTL can make sense.
(I have not investigated how this works with QUIC HTTP/3 over UDP: how often does the client/browser do a DNS lookup? But my suspicion is that it also does a DNS query only on the initial connection and then sends UDP packets to the same resolved IP address for the life of that connection, and so it behaves exactly like the TCP Keep-Alive case.)
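A small self-contained sketch of that keep-alive behavior, using only the Python standard library against a local server (no real DNS involved; the point is simply that one connection, and hence at most one name resolution, serves many requests):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent (keep-alive) connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection = one connect() call = at most one name resolution.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
sockets = []
for _ in range(3):
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    sockets.append(conn.sock)  # same socket each time => connection was reused

print("all requests reused one TCP connection:", len(set(map(id, sockets))) == 1)
conn.close()
server.shutdown()
```

If you drop `protocol_version = "HTTP/1.1"` the handler falls back to HTTP/1.0 semantics, the server closes the socket after each response, and the client has to reconnect (and, against a real hostname, potentially re-resolve) every time.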
> patched an Encrypted DNS Server to store the original TTL of a response, defined as the minimum TTL of its records, for each incoming query
The article seems to be based on capturing live DNS data from a real network. While it's true that persistent connections reduce the number of DNS lookups, the article's measurements would already account for that, unless their network is somehow only using HTTP/1.0.
I agree that a low TTL could help during an outage if you actually wanted to move your workload somewhere else (I didn't see that mentioned in the article), but I've never actually seen it done. Setting TTL extremely low for some extreme DR scenario smells like an anti-pattern to me.
Consider the counterpoint: a high TTL can keep your service reachable if the DNS server crashes or loses connectivity.
i was taught this as a matter of professional courtesy in my first job working for an ISP that did DNS hosting and ran its own DNS servers (15+ years ago). if you have a cutover scheduled, lower the TTL at $cutover_time - $current_ttl. then bring the TTL back up within a day or two in order to minimize DNS chatter. simple!
of course, as internet speeds increase and resources are cheaper to abuse, people lose sight of the downstream impacts of impatience and poor planning.
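That scheduling rule can be written down explicitly (a hypothetical helper; times are Unix timestamps and all the names are made up):

```python
def ttl_change_plan(cutover_time: int, current_ttl: int, low_ttl: int = 60,
                    restore_after: int = 2 * 86400) -> dict:
    """When to lower and restore the TTL around a scheduled cutover.

    The TTL must be lowered at least one full current-TTL period before the
    cutover, so every cached copy of the old long-TTL answer has expired by
    the time the change lands.
    """
    return {
        "lower_ttl_at": cutover_time - current_ttl,   # $cutover_time - $current_ttl
        "new_ttl": low_ttl,
        "restore_ttl_at": cutover_time + restore_after,  # back up a day or two later
    }

plan = ttl_change_plan(cutover_time=1_700_000_000, current_ttl=86400)
print(plan)
```

With a one-day current TTL, the lowering has to happen a full day before the cutover; lowering it an hour beforehand would still leave day-old answers in caches.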
I usually set mine to between an hour and a day, unless I'm planning to update/change them "soon" ... though I've been meaning to go from a /29 to /28 on my main server for a while, just been putting off switching all the domains/addresses over.
Maybe this weekend I'll finally get the energy up to just do it.
I guess I'm not sure I understand the solution. I use a low value (15 minutes maybe?) because I don't have a static IP and I don't want that to cause issues. It's just me connecting to my home server, so I'm not adding noticeable traffic like a real company would, but what am I supposed to do? Is there a way for me to send an update such that all online caches get updated without needing to wait for them to time out?
For a private server with not many users this is mostly irrelevant. Use low ttl if you want to, since you're putting basically 0 load on the DNS system.
> such that all online caches get updated
There's no such thing. Apart from millions of dedicated caching servers, each end device has its own cache. You can't invalidate DNS entries at that scale.
I used to get more excited about this, but even when browsers don't do a DNS prefetch (or even a full preload), lookup latency is usually so low on the list of performance-impacting design decisions that it's unlikely to ever outweigh even the slightest advantages (or be worth correcting misperceived advantages) until we all switch to writing really, really optimized web solutions.
That's not how TTL works. Or do you mean propagation after changing an existing RR?
It's "common" to lower a TTL in preparation for a change to an existing RR, but you need to lower it at least one full current-TTL period before the change. Keeping the TTL low after the change isn't beneficial unless you're planning for the possibility of reverting it.
A low TTL on a new record will not speed propagation. Resolvers either have the new record cached or they don't. If it's cached, the TTL doesn't matter because the resolver already has the record (it has propagated). If it isn't cached, the resolver doesn't know the TTL yet, so it doesn't matter whether it's 1 second or 1 month.
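A toy resolver cache makes the point concrete (a simplified sketch that ignores negative caching and real resolver behavior; all names and addresses are illustrative):

```python
import time

class ToyResolverCache:
    """Minimal positive cache: a record's TTL is only learned when it is fetched."""

    def __init__(self):
        self._cache = {}  # name -> (answer, expires_at)

    def lookup(self, name, authoritative):
        entry = self._cache.get(name)
        if entry and entry[1] > time.monotonic():
            return entry[0], "hit"
        # Cache miss: we go to the authoritative server regardless of what TTL
        # it is about to hand us -- a brand-new record is fetched just as fast
        # whether its TTL is 1 second or 1 month.
        answer, ttl = authoritative[name]
        self._cache[name] = (answer, time.monotonic() + ttl)
        return answer, "miss"

auth = {"new.example.com": ("192.0.2.1", 1)}      # new record, TTL 1s
cache = ToyResolverCache()
print(cache.lookup("new.example.com", auth))      # first query: always a miss
print(cache.lookup("new.example.com", auth))      # within the TTL: a hit
```

The first lookup is a miss no matter what TTL the record carries; the TTL only controls how long the answer lingers afterwards.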
Maybe, but I don't think TTL matters for speed of initial propagation. I do set it low when I first configure a website so I don't have to wait hours to correct a mistake I might not have noticed.
I have mine set low on some records because I want to be able to change the IP associated with specific RTMP endpoints if a provider goes down. The client software doesn't use multiple A records even if I provide them, so I can't use that approach; and I don't always have remote admin access to the systems in question so I can't just use straight IPs or a hostfile.
Because unless your TTL is exceptionally long, you will almost always have a sufficient supply of new users to balance. You almost never need to move old users to a new target for balancing reasons; the natural churn of users over time is enough to deal with that.
Failover is different and more of a concern, especially if the client doesn't respect multiple returned IPs.
Why do you need a low TTL for those? You can put multiple IPs in your A/AAAA records for very basic load balancing. And DNS is a pretty bad idea for any kind of failover: you can set a very low TTL, but resolvers may simply enforce a larger minimum.
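For reference, round-robin A records are just multiple records at the same name (hypothetical zone fragment; addresses are from the 192.0.2.0/24 documentation range):

```
; two A records for the same name -- resolvers rotate the order of the
; answers, giving crude client-side load balancing
www   300   IN   A   192.0.2.10
www   300   IN   A   192.0.2.11
```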
Perhaps because most these days are using Anycast [1] for failover. It's faster and not subject to the oddities that come with every application having its own interpretation of the DNS RFCs (most notably Java and all its workarounds, which people may or may not be using), plus all the assorted recursive cache servers with their own quirks. That makes Anycast the more reliable and predictable choice.
[1] - https://en.wikipedia.org/wiki/Anycast
Relatively simple inside a network range you control, but I have no idea how that works across different networks in geographically redundant setups.
And a similar version of the same blog post on a personal blog in 2019 https://news.ycombinator.com/item?id=21436448 (thanks to ChrisArchitect for noting this in the only comment on a copy from 2024).