top | item 45235651

(no title)

eklitzke | 5 months ago

A few reasons, I think.

The first is that getaddrinfo is specified by POSIX, and the POSIX evolve very conservatively and at a glacial pace.

The second reason is that specifying a timeout breaks symmetry with a lot of other functions in Unix/C, both system calls and libc calls. For example, you can't specify a timeout when opening a file, reading from a file, or closing a file, which are all potentially blocking operations. There are ways to do these things in a non-blocking manner with timeouts using aio or io_uring, but those are already relatively complicated APIs for just using simple system calls, and getaddrinfo is much more complicated.

The last reason is that if you use the sockets APIs directly it's not that hard to write a non-blocking DNS resolver (c-ares is one example). The thing is though that if you write your own resolver you have to consider how to do caching, it won't work with NSS on Linux, etc. You can implement these things (systemd-resolved does it, and works with NSS) but they are a lot of work to do properly.

discuss

jstimpfle|5 months ago

> For example, you can't specify a timeout when opening a file, reading from a file, or closing a file, which are all potentially blocking operations.

No they're not. Not really, unless you consider disk access and interacting with the page cache/inode cache inside the kernel to be blocking. But if you do that, you should probably also consider scheduling and really any CPU instruction to be blocking. (If the system is too loaded, anything can be slow).

To be fair, network requests can be considered non-blocking in a similar way, but they're depending on other systems that you generally can't control or inspect. In practice you'll see network timeouts. Note that you (at least normally -- there might be tricky exceptions) won't see EINTR from read() to a filesystem file. But you can see EINTR for network sockets. The difference is that, in Unix terminology, disks are not considered "slow devices".

jcelerier|5 months ago

I'd consider "blocking" anything that given same inputs, state and cpu frequency, may take variable time. That means pretty much every system call and entering the system scheduler, doing something that leads to a page fault, etc. Pretty much only pure math in total functions and function calls to paged functions are acceptable.

surajrmal|5 months ago

In a data center, networks can have lower latency than disk (even ssd). Now the real place this all falls on its head is page faults. That there are definitely places where you need to have granular control over what can and cannot stall a thread from making progress.

jcalvinowens|5 months ago

> No they're not. Not really, unless you consider disk access and interacting with the page cache/inode cache inside the kernel to be blocking.

The important point is that the kernel takes locks during all those operations, and will wait an unbounded amount of time if those locks are contended.

So really and truly, yes, any synchronous syscall can schedule out for an arbitrary amount of time, no matter what you do.

It's sort of semantic. The word "block" isn't a synonym for "sleep", it has a specific meaning in POSIX. In that meaning, you're correct, they never "block". But in the generic way most people use the term "block", they absolutely do.

Joker_vD|5 months ago

> disks are not considered "slow devices".

And neither are the tapes. But the pipes, apparently, are.

Well, unfortunately, disk^H^H^H^H large persistent storage I/O is actually slow, or people wouldn't have been writing thread-pools to make it look asynchrnous, or sometimes even process-pools to convert disk I/O to pipe I/O, for the last two decades.