Do not mount NFS with "soft" unless you really know what you're doing. NFS's tendency to hang is not "stupidity" - it's actually one of the best things about NFS. Applications do not deal well with failed reads and writes. If there's a brief network interruption or the NFS server goes down, it's WAY safer to make applications hang until the server comes back. Since NFS is a stateless protocol, when the server comes back, I/O resumes as if nothing ever happened. This helps make using NFS feel more like using a local filesystem; otherwise it becomes a very leaky abstraction.
Not being able to kill processes stuck on NFS I/O is annoying though, so you can mount with the "intr" option and that makes such processes killable. However, since Linux 2.6.25, you don't even need this and SIGKILL can always kill applications stuck in NFS I/O.
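For reference, a hard mount with the options discussed here looks something like this in /etc/fstab - a sketch only, with the server name, export path, and timeout values made up:

```
# hard  = block (rather than error) while the server is unreachable
# intr  = make NFS waits interruptible by signals; a no-op since Linux
#         2.6.25, where SIGKILL always works regardless
# soft  = the risky alternative: give up after "retrans" retries and
#         return an error most applications won't handle well
server:/export  /mnt/data  nfs  hard,intr,timeo=600,retrans=2  0  0
```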
Expecting NFS (or any other remote filesystem) to behave as if it were local is a fundamental error.
Time, and speed of light, ultimately matter. If you need assurance, find a way of getting reliability in your system through redundancy and locality. Distinguish between "task has been delegated" and "task has been confirmed completed". Down any other path runs pain, and anyone who tells you otherwise is selling something.
You're going to have to compromise: whole systems (or clusters) going titsup because your NFS heads had a fart, or lost commits. Neither is very attractive when shit's on the line.
Your database relying on NFS is a fundamental error you'll have to design around.
This would be a good time to re-read Waldo's "A Note on Distributed Computing", which points out how remote filesystems will never act like local filesystems. http://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf
I consider any uninterruptible sleep in the kernel a bug. There's no technical reason a process waiting for a resource (e.g. disk I/O) couldn't be killed on the spot, leaving the resource on its own. If it can't be, it just means it hasn't been implemented in the kernel.
It's a bug motivated by compatibility. In the original 1970s implementations of Unix, filesystem I/O mostly led to a busy wait in the kernel, so it was not interruptible simply because that was not possible - and applications came to rely on this behavior. On Unix, a signal received during a system call generally causes the kernel to abort whatever it was doing and requires the application to deal with that situation and restart the operation. Implementations of stdio in libc generally do the right thing, but most applications that do filesystem I/O directly do not (and a surprisingly large number of commonly used network services behave erratically when a network write(2) is interrupted by a signal). Even applications that handle -EINTR from all I/O still have places where it is not handled (allowing interruptible disk I/O would cause things like stat(2) to return EINTR).
Allowing SIGKILL to work but no other signal is an ugly special case. While generally reasonable, it is still a special case, relevant mainly for things like NFS (the modern Linux NFS client lets you disable this behavior) and broken hardware - and in the hardware case, trying to recover the situation with anything other than a kernel-level debugger is mostly meaningless; power cycling is the real solution when you can do it. Incidentally, we currently have a similar issue on one backend server where power-cycling is not an option.
Can anyone explain the "Why is a process wedged?" part? I do understand that piping from /dev/random to /dev/null is going to run forever, but I do not understand the gdb output, nor what that has to do with the rest of the text.
It attaches a debugger to the process to see what it's doing:
gdb ${CMD} ${PID}
connects to the process (gdb needs program name and can be given a pid to connect to)
(gdb) disassemble
prints the actual (assembler) code being run. In this case, I think all we get from the output is that it's in fact in the middle of some syscall - you'd have to check registers and syscall tables to determine which.
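If gdb can't tell you more than "it's in some syscall", newer kernels expose that directly. A sketch - the pid here is this shell itself, purely for illustration; substitute the wedged process's pid:

```shell
# /proc/<pid>/syscall holds the number of the syscall the task is blocked
# in, its arguments, and the stack/program counter ("running" if the task
# is not blocked in a syscall).
pid=$$
cat "/proc/$pid/syscall"
# To turn the number into a name, a translator like `ausyscall` (from the
# audit userspace tools, if installed) can do the lookup.
```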
As others have mentioned, much of this is less useful than implied in the face of an actual wedged process.
Tangentially, using gdb to attach to running processes is a very powerful technique - I've been able to get line numbers out of running bash scripts.
So that whole section doesn't make a lot of sense to me. Running strace on an unkillable process tends to produce no output (if the process were actually making new system calls, it wouldn't be wedged). And worse, my experience is that attaching to a wedged process with ptrace() usually does nothing at all except hang the attaching process as well. This also applies to gdb.
And finally, even if attaching worked, getting a disassembly of the current PC (in this case the syscall trampoline) would tell nothing useful about what's going on.
I think one of the reasons the kernel can't kill a process is that one of the process's threads is blocked inside a kernel call (not completely sure about this). He's using ps's 'wchan' option to get the address of the kernel function that the process is currently blocked or sleeping in. After he gets the wchan address, he uses gdb to map the address to a function name.
From the ps(1) man page (http://unixhelp.ed.ac.uk/CGI/man-cgi?ps):

    nwchan  WCHAN  address of the kernel function where the process is
                   sleeping (use wchan if you want the kernel function name).
                   Running tasks will display a dash ('-') in this column.
I believe the author is using that as an example of a command that will 'wedge' if you tried to kill -9 it.
The rest of it is an example of how to dump the cause of a process that is wedged.
"/proc/{pid}/wchan" contains the name of the current syscall a process is executing.
Basically you should be able to use everything but the first line of that section as a shell script(with some modifications) to determine the cause of a wedged process.
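Putting those pieces together, a minimal sketch (run against this shell's own pid purely for illustration; substitute the wedged pid):

```shell
pid=$$                                   # the process to inspect
# State "D" means uninterruptible sleep; wchan is the kernel function the
# process is blocked in ("-" for running tasks).
ps -o pid,state,wchan:30,comm -p "$pid"
# The same symbol, straight from procfs:
cat "/proc/$pid/wchan"; echo
```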
If kill -9 does not work, it's a bug. The kernel needs to be able to end processes no matter what the process is doing; by definition this should not depend on how the misbehaving process was implemented. I imagine practical considerations are keeping these bugs in there - e.g., the effort of making all processes killable would stand in no relation to the gains: it's hard to do, and rare to occur.
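The usual way to check whether you're in this situation - a quick sketch listing any processes stuck in uninterruptible sleep:

```shell
# "D" state = uninterruptible sleep. These are the processes kill -9
# appears to ignore: the signal is delivered, but only acted on once the
# process leaves the kernel.
ps -eo pid,state,wchan:30,comm | awk 'NR == 1 || $2 ~ /^D/'
```

On a healthy machine this usually prints only the header line.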
The number of times I have spent 10 minutes staring at the output of 'pgrep processname' , when I had attached gdb to the process in another terminal session... Urgh!! :-/
sshfs used to have this problem and it was enough to bring nautilus and a lot of other applications to their knees as they tried to stat() my homedir and failed on the hung sshfs mount point.
Same for the cifs.kext (or was it smbfs.kext) in early versions of OS X. Putting your laptop to sleep with a mounted Samba share was enough to slowly grind the system to a halt when you woke it up.
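A defensive pattern for scripts that must touch possibly-hung network mounts - a sketch assuming GNU coreutils' `timeout` and a made-up mount point. Note the caveat: this catches soft hangs, but a child truly stuck in uninterruptible sleep can outlive even the escalated SIGKILL:

```shell
check_mount() {
    # stat in a child process with a deadline, so the caller can't wedge
    # on a dead mount; --kill-after escalates to SIGKILL if SIGTERM is
    # ignored.
    if timeout --kill-after=2 5 stat "$1" >/dev/null 2>&1; then
        echo "$1: responsive"
    else
        echo "$1: hung or missing"
    fi
}
check_mount /mnt/possibly-hung    # hypothetical mount point
```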
Go here http://www.readability.com/articles/zcqkmihi and switch to Readability view!
I like this one because I can add it as a custom search engine in Opera without adding a browser extension or leaving the page.
I've setup a bookmarklet using it; greatest thing since sliced bread.
http://www.youtube.com/watch?v=Fow7iUaKrq4
pgrep(1) is there for a reason.