fotta | 1 year ago

> Golang HTTP2 clients will reuse the first server they can connect to over and over and the DNS is never re-resolved.

I’m not a DNS expert but shouldn’t it re-resolve when the TTL expires?

__turbobrew__|1 year ago

You nerd sniped me. The guts of how http2 deals with this in golang are in transport.go : https://github.com/golang/go/blob/master/src/net/http/transp...

If I’m reading the code right, round trips (HTTP requests) go through queueForIdleConn, which picks up any pre-existing connections to a host. The only times these connections are cleaned up (in HTTP2) are when keepalives are turned off and the connection has been idle for too long, OR the connection breaks in some way, OR the max number of connections is hit and LRU cache evictions take place.

Furthermore, the golang dnsclient doesn’t even expose record TTLs to callers so how could the HTTP2 transport know when an entry is stale? https://github.com/golang/go/blob/master/src/net/dnsclient_u...

toast0|1 year ago

It should, but like the sibling, I haven't seen what Go does. I've seen it happen elsewhere. Exchange used to cache any answer it got until it restarted. Java has had that behavior from time to time if you're not careful as well.

Querying DNS can be expensive, so it makes sense to build a cache to avoid querying again when you don't need to, but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL, so people just assume forever is a good TTL. Especially for a persistent (http) connection, it kind of makes sense to never query DNS again while you already have a working connection that you made with that name, and if it's TLS, it's quite possible that you don't check if the certificate has expired while you're connected or if you do a session resumption.

But innocent things like this add up to make operating services tricky. Many times, if you start refusing connections, clients figure it out, but sometimes the caches still don't get cleared.

fotta|1 year ago

> but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL

Oh wow I didn’t know this but I looked it up and you’re right. Interesting.

hypeatei|1 year ago

I've seen DNS only be refreshed when restarting on embedded devices I work with too. They use a proprietary HTTP library...

loevborg|1 year ago

I don't know about Golang but I swear I've seen this before as well - clients holding on to an old IP address without ever re-resolving the domain name. It makes me wary of using DNS for load balancing or blue-green deployments. I feel like I can't trust DNS clients.

wink|1 year ago

It's been 8-10 years but when I was serving tracking pixels we were astonished how long we still got requests from residential IPs for whole hostnames we had deprecated. That means I would not trust DNS caching anyway. I'm not talking days here, but months, with a TTL set to mere days.

ignoramous|1 year ago

Some reasons to connect to the same IP: TCP Fast Open, TLS session resumption, connection pools, residual censorship.

kkielhofner|1 year ago

TTL isn't universally respected. Consider the following path:

Your machine -> Local router -> Configured upstream DNS Server (ISP/CF/Quad8/etc) -> ? -> Authoritative DNS Server

Any one of those layers can override/mess with/cache in a variety of ways including TTL. This is why Cloudflare and a variety of other providers use IP anycast. They accepted DNS for what it is and worked around it.

Not only is the IP always the IP, the "global" BGP routing table actually universally and consistently updates much faster than DNS. Then whatever routers, machines, etc downstream from that don't matter.

__turbobrew__|1 year ago

I read through the golang code once due to coming across this issue with kubernetes clients which use the standard golang http client under the hood.

I would need to re-read the code to refresh my memory.

pvtmert|1 year ago

not an expert but overall; unless connection closes for any reason, resolution does not happen.

also, java historically had -1 ttl (i.e. infinite) by default, causing a lot of headaches with ephemeral/container services.