jforberg | 6 years ago

This discussion seems to ignore the fact that being blocked in a "long-running system call" is the normal state for many (most?) Unix services.

If you look at `ps ax` on your system, you'll likely see about a hundred processes. But if you look at `top`, you'll see only a handful of processes with non-zero CPU usage. Why? Because most processes are just waiting (in a system call) for something to do. A web server is blocked in a select(), poll() or epoll_wait() call waiting for a connection. Your shell is blocked in a read() call waiting for you to type something. This is just the normal way that a main loop is implemented on Unix.

When you kill one of these processes, it needs a way to break out of its loop. With the EINTR approach, it gets a chance to break and exit.

I'm far from convinced that a "majority" of services want to just catch signals and carry on.

haberman|6 years ago

Correct me if I'm wrong, but I think if you kill a process normally, it will invoke a signal handler which will exit the process (maybe writing a core file first) without ever returning to normal program flow.

Recognizing EINTR at the program level isn't required for this kind of shutdown. I think the system call will only return if the signal is ignored or if the signal handler returns, but you would only do this if you thought you had recovered from the error.
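A minimal sketch of that first case, assuming a POSIX system: a child with the default disposition blocks in pause(), the parent signals it, and the wait status shows the child was terminated by the signal rather than resuming. The function name `kill_blocked_child` and the exit code 42 are made up for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks in pause(), send it
 * `signo`, and return the signal number that terminated it.
 * Returns -1 if the child was not killed by a signal or on error. */
int kill_blocked_child(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure the default action is in place, then block. */
        signal(signo, SIG_DFL);
        pause();
        /* Only reached if pause() returned, i.e. a handler ran and
         * returned. With the default action this line never executes. */
        _exit(42);
    }

    /* Parent: deliver the signal and collect the child's status. */
    if (kill(pid, signo) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `kill_blocked_child(SIGTERM)` should report SIGTERM as the terminating signal: the child never sees EINTR because control never returns to it.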

jforberg|6 years ago

This is true, but if anything it reinforces my point that continuing past a signal is the exception, not the rule.

In the general case, services are not at liberty to just exit(); they need to perform some kind of active cleanup before exiting. So the signal handler would set an "exit flag" somewhere, and the EINTR would be an indication for the main loop to check this flag before continuing.

The only common case I can think of for continuing past a signal is SIGHUP, which some services interpret as a command to re-read their configuration file. In this case, you are essentially doing a shutdown and startup sequence anyway, only in a possibly more efficient way. E.g. in the case of a web server, if you were previously listening on port N, there's no reason to believe that the new config file won't ask you to listen on port M instead. So you will be closing down most connections anyway, and catching SIGHUP is mostly an optimisation, as exiting and restarting would have a similar effect.