top | item 5089014

Why doesn't `kill -9` always work?

126 points | fool | 13 years ago | noah.org | reply

52 comments

[+] agwa|13 years ago|reply
Do not mount NFS with "soft" unless you really know what you're doing. NFS's behavior of hanging is not "stupidity" - it's actually one of the best things about NFS. Applications do not deal well with failed reads/writes. If there's a brief network interruption or the NFS server goes down, it's WAY safer to make applications hang until the server comes back. Since NFS is a stateless protocol, when the server comes back, I/O resumes as if nothing ever happened. This helps make using NFS feel more like using a local filesystem; otherwise it becomes a very leaky abstraction.

Not being able to kill processes stuck on NFS I/O is annoying though, so you can mount with the "intr" option and that makes such processes killable. However, since Linux 2.6.25, you don't even need this and SIGKILL can always kill applications stuck in NFS I/O.
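
For reference, a sketch of the mount options being discussed, as fstab entries (option names are from nfs(5); the server name and paths are placeholders):

```
# "hard" (the default): hung I/O waits for the server to return; "intr"
# makes hung processes killable (redundant since Linux 2.6.25, where
# SIGKILL always works on processes stuck in NFS I/O).
nfsserver:/export  /mnt/nfs  nfs  hard,intr  0 0

# "soft": I/O fails with an error after retrans retries of timeo/10
# seconds each -- only safe if every application handles failed I/O.
# nfsserver:/export  /mnt/nfs  nfs  soft,timeo=100,retrans=3  0 0
```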

[+] dredmorbius|13 years ago|reply
Expecting NFS (or any other remote filesystem) to behave as if it were local is a fundamental error.

Time, and speed of light, ultimately matter. If you need assurance, find a way of getting reliability in your system through redundancy and locality. Distinguish between "task has been delegated" and "task has been confirmed completed". Down any other path runs pain, and anyone who tells you otherwise is selling something.

You're going to have to compromise: whole systems (or clusters) going titsup because your NFS heads had a fart, or lost commits. Neither is very attractive when shit's on the line.

Your database relying on NFS is a fundamental error you'll have to design around.

[+] ambrop7|13 years ago|reply
I consider any uninterruptible sleep in the kernel a bug. There's no technical reason a process waiting for a resource (e.g. disk I/O) couldn't be killed on the spot, leaving the resource to be cleaned up on its own. If it can't be, that just means it hasn't been implemented in the kernel.
[+] dfox|13 years ago|reply
It's a bug motivated by compatibility. In the original '70s implementations of Unix, filesystem I/O mostly amounted to a busy wait in the kernel, so it was not interruptible because that was simply not possible, and applications came to rely on this behavior. On UNIX, a signal received during a system call generally causes the kernel to abort whatever it was doing and requires the application to deal with that situation and restart the operation. Implementations of stdio in libc generally do the right thing, but most applications that do filesystem I/O directly do not (and a surprisingly large number of commonly used network services behave erratically when a network write(2) is interrupted by a signal). Even applications that handle -EINTR from all I/O still have places where it is not handled (allowing interruptible disk I/O would cause things like stat(2) to return EINTR).

Allowing SIGKILL to work but not any other signal is an ugly special case, and while generally reasonable it is still a special case. It is relevant for things like NFS (the modern Linux NFS client lets you disable this behavior) and broken hardware (where trying to recover the situation with anything other than a kernel-level debugger is mostly meaningless, power cycling being the real solution when you can do that; incidentally, we currently have a similar issue on one backend server where power cycling is not an option).

[+] Trufa|13 years ago|reply
That site is trying to murder my eyes!

Go here http://www.readability.com/articles/zcqkmihi and switch to Readability view!

[+] glenstein|13 years ago|reply
Another (which I originally found because of a comment here a few years ago) is http://viewtext.org/.

I like this one because I can add it as a custom search engine in Opera without adding a browser extension or leaving the page.

[+] a_bonobo|13 years ago|reply
Can anyone explain the "Why is a process wedged?" part? I do understand that piping from /dev/random to /dev/null is going to run forever, but I do not understand the gdb-output, nor what that has to do with the rest of the text.
[+] dllthomas|13 years ago|reply
Not sure just how much you do/don't understand, or how much others will/won't understand, so I'll run through line by line:

    PID=$!
grabs the PID of the process you just spawned (in the background, with &) into shell variable PID

    CMDLINE="!-2"
grabs the full line you just ran (before the line storing PID) with shell history expansion

    CMD=${CMDLINE%% *}
expands the CMDLINE variable, stripping everything from the first space onward (so CMD now holds "cat"), via bash parameter expansion

    WCHAN=$(cat /proc/${PID}/wchan)
grabs the name of the currently executing syscall for the process (at least, according to http://www.lindevdoc.org/wiki/proc/pid/wchan)

    echo "command: ${CMD}, pid: ${PID}, wchan: ${WCHAN}"
prints the info we've grabbed

    strace -p ${PID}
connects a trace to the process to see what it's doing

    gdb ${CMD} ${PID}
connects to the process (gdb needs program name and can be given a pid to connect to)

    (gdb) disassemble
prints the actual (assembler) code being run. In this case, I think all we get from the output is that it's in fact in the middle of some syscall - you'd have to check registers and syscall tables to determine which.

As others have mentioned, much of this is less useful than implied in the face of an actual wedged process.
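
For completeness, the steps above can be combined into a small non-interactive sketch. History expansion like `!-2` only works in an interactive shell, so the command name is hardcoded here, and a plain `sleep` stands in for the article's pipeline:

```shell
#!/bin/sh
# Stand-in for the wedged process (the article uses a /dev/random pipe).
sleep 30 &
PID=$!
CMD=sleep    # "!-2" history expansion is interactive-only, so hardcode it

# Name of the kernel function the process is blocked in; not every kernel
# exposes this, so fall back gracefully.
WCHAN=$(cat /proc/${PID}/wchan 2>/dev/null || echo unavailable)

echo "command: ${CMD}, pid: ${PID}, wchan: ${WCHAN}"

# From here the article attaches interactively:
#   strace -p ${PID}
#   gdb ${CMD} ${PID}
kill "${PID}"
```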

Tangentially, using gdb to attach to running processes is a very powerful technique - I've been able to get line numbers out of running bash scripts.

[+] jsnell|13 years ago|reply
So that whole section doesn't make a lot of sense to me. Running strace on an unkillable process tends to produce no output (if the process were actually making new system calls, it wouldn't be wedged). And worse, my experience is that attaching to a wedged process with ptrace() usually does nothing at all except also hang the attaching process. This also applies to gdb.

And finally, even if attaching worked, getting a disassembly of the current PC (in this case the syscall trampoline) would tell nothing useful about what's going on.

[+] nonane|13 years ago|reply
I think one of the reasons the kernel can't kill a process is that one of the process's threads is blocked inside a kernel call (not completely sure about this). He's using ps's 'wchan' option to get the address of the kernel function that the process is currently blocked or sleeping in. After he gets the wchan address, he uses gdb to map the address to a function name.

Taken from: http://unixhelp.ed.ac.uk/CGI/man-cgi?ps

    nwchan  WCHAN  address of the kernel function where the process is
                   sleeping (use wchan if you want the kernel function
                   name). Running tasks will display a dash ('-') in
                   this column.
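
For example, using the `wchan` output keyword directly (the exact symbol shown varies by kernel version):

```shell
# Show the kernel function a sleeping process is blocked in.
sleep 30 &
PID=$!
ps -o pid,wchan,comm -p "${PID}"
kill "${PID}"
```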

[+] mbell|13 years ago|reply
I believe the author is using that as an example of a command that will 'wedge' if you tried to kill -9 it.

The rest of it is an example of how to dump the cause of a process that is wedged.

"/proc/{pid}/wchan" contains the name of the kernel function the process is currently blocked in.

Basically you should be able to use everything but the first line of that section as a shell script (with some modifications) to determine the cause of a wedged process.

[+] nikster|13 years ago|reply
Much simpler answer: Bugs.

If kill -9 does not work, it's a bug. The kernel needs to be able to end processes no matter what the process is doing; by definition, this shouldn't depend on how the misbehaving process was implemented. I imagine practical considerations are keeping these bugs in there, e.g. the effort of making all processes killable would stand in no relation to the gains - it's hard to do, and rare to occur.
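
The contract being described is easy to demonstrate for an ordinary (non-D-state) process: catchable signals like SIGTERM can be ignored, but SIGKILL is handled entirely in the kernel and never reaches the process. A small sketch:

```shell
# Start a shell that ignores SIGTERM (and SIGINT).
sh -c 'trap "" TERM INT; sleep 60' &
PID=$!
sleep 1                       # give it time to install the traps

kill -TERM "${PID}"           # catchable signal: the process ignores it
sleep 1
kill -0 "${PID}" && echo "survived SIGTERM"

kill -9 "${PID}"              # SIGKILL: delivered by the kernel, unblockable
wait "${PID}" 2>/dev/null     # reap it (exit status 137 = killed by signal 9)
kill -0 "${PID}" 2>/dev/null || echo "gone after SIGKILL"
```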

[+] darwinGod|13 years ago|reply
The number of times I have spent 10 minutes staring at the output of 'pgrep processname' , when I had attached gdb to the process in another terminal session... Urgh!! :-/
[+] JimmaDaRustla|13 years ago|reply
I always thought kill -9 won't always work because the process currently holds a system resource, like the disk or something.
[+] dfox|13 years ago|reply
That is an almost correct understanding. Processes that are waiting on things like disk I/O do not respond to any signals, not even KILL.
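
Those processes show up in ps with state `D` (uninterruptible sleep). A quick way to list them (on a healthy system the list below the header is usually empty):

```shell
# List processes stuck in uninterruptible (D) sleep -- the ones that
# won't respond to kill -9 -- along with the kernel function they wait in.
ps -eo pid,stat,wchan,comm | awk 'NR == 1 || $2 ~ /^D/'
```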
[+] liotier|13 years ago|reply

  That is not dead which can eternal lie
  And with strange aeons even death may die
[+] ucee054|13 years ago|reply
ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn
[+] kaeso|13 years ago|reply
> ps Haxwwo pid,command | grep "rpciod" | grep -v grep

pgrep(1) is there for a reason.
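
For example (setting aside ps's `H` thread listing, which pgrep doesn't replicate; a `sleep` stands in for rpciod here, since rpciod may not be running):

```shell
# Instead of: ps Haxwwo pid,command | grep "rpciod" | grep -v grep
sleep 300 &                  # stand-in target for the match
PID=$!
pgrep -l sleep               # prints matching PIDs with process names
kill "${PID}"
```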

[+] drivebyacct2|13 years ago|reply
sshfs used to have this problem and it was enough to bring nautilus and a lot of other applications to their knees as they tried to stat() my homedir and failed on the hung sshfs mount point.
[+] pyre|13 years ago|reply
Same for the cifs.kext (or was it smbfs.kext) in early versions of OSX. Putting your laptop to sleep with a mounted Samba share was enough to slowly grind the system to a halt when you woke it up.