the process that won't die
doublec16
03-04-2004, 09:26 PM
I wrote a program that calculates some vectors and plots them using PGPLOT. I'm not sure whether it matters what the program is, but it worked just fine with one set of parameters. When I changed the numbers, it seems to have gone into some kind of endless loop (though it shouldn't have; my code isn't that bad). The PostScript plot the program was creating grew to almost 900 MB and filled the partition, which might have been the cause of the hang. I have tried numerous times to kill -9 it as both the originating user and root, and I have deleted the .ps file, but the program is still running according to top. df shows that even though the file no longer appears in the directory listing, it is still using up all the space on the partition.
How can I end this &*^*%&^% process and clear up the disk space it is tying up? I prefer not to reboot as I am logged in remotely.
Thanks in advance for any suggestions.
bwkaz
03-04-2004, 11:14 PM
If a process has the file open, then deleting the file won't have any effect. This is due to Unix filesystem conventions -- and it's also the reason you can upgrade things like your web server software while that web server is running. Unlinks remove the filename from the directory, but the actual blocks stay allocated until the last file handle gets closed.
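A minimal C sketch of that behaviour, assuming Linux/POSIX (the function name read_after_unlink and the /tmp path are hypothetical): the name disappears as soon as unlink() returns and fstat() reports a link count of 0, yet the data can still be read through the open descriptor until close().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates `path`, writes `msg`, deletes the name while the descriptor is
 * still open, then reads the data back through that descriptor into `out`.
 * Stores the post-unlink link count in *nlink_after.
 * Returns the number of bytes read back, or -1 on error. */
long read_after_unlink(const char *path, const char *msg,
                       char *out, size_t outsz, long *nlink_after)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }
    /* The name is gone now, but the inode and its blocks are not:
       this descriptor still references them. */

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *nlink_after = (long)st.st_nlink;          /* 0: no directory entries left */

    ssize_t n = pread(fd, out, outsz - 1, 0);  /* the data is still readable */
    close(fd);                                 /* last reference: blocks freed */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Called as read_after_unlink("/tmp/unlink_demo.txt", "hello", buf, sizeof buf, &nlink), it should return 5 with nlink set to 0. This is why df kept counting the deleted plot file: its blocks come back only when the process holding it open closes the descriptor or exits.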
I'm not sure which program did what, or which program you tried to kill, though. I'm going to assume that your program uses either system() or fork()/exec() to run pgplot (since I'm assuming that pgplot is a separate program and not a library).
If you kill your process while pgplot is also running, that kill signal will not be sent to pgplot; you need to kill it separately (as root, with kill -9). If instead your program is still running but you've killed pgplot, then your program needs to call wait() for each child that it spawns (system() does this for you, but with fork()/exec() you have to do it manually). Otherwise, the dead pgplot process will hang around in the output of ps in a "zombie" state. A zombie has already closed its file handles and freed its memory, though; all that remains is its process table entry (PID and exit status), which the parent collects with wait().
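For the fork()/exec() case, a minimal C sketch of what the parent has to do (run_and_reap is a hypothetical name): it collects the child with waitpid(), which is essentially what system() does for you, so no zombie is left behind.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `prog` with `argv` in a child process and reaps it with waitpid(),
 * so the child never lingers as a zombie. Returns the child's exit status,
 * or -1 if it could not be started or did not exit normally. */
int run_and_reap(const char *prog, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execvp(prog, argv);        /* only returns on failure */
        _exit(127);                /* same code the shell uses for "not found" */
    }

    int status;
    /* Retry if a signal interrupts the wait before the child is collected. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv = {"true", NULL}, run_and_reap("true", argv) should return 0; a program that can't be found gives 127.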
Anyway, you can't kill zombie processes; all you can do is wait() for them in their parent, or kill off their parent. When the parent dies, the zombie is reparented to init (PID 1), which wait()s for it, and the kernel then removes it for good.
The SIGKILL that "kill -9" sends cannot be ignored and cannot be caught. So if you sent that signal successfully, the process you sent it to should die as soon as the kernel delivers it. The exception is a process stuck inside the kernel, for example in uninterruptible sleep (state "D" in ps), which won't act on the signal until that kernel operation finishes. If it still shows up in ps' output, then it's either zombied, stuck like that, or the signal couldn't be sent. The signal might not be sendable if you're out of disk space and also memory+swap, because the kernel eventually won't be able to swap anything else out, so it won't be able to allocate memory to start up "kill". If you get into that state, all you can do is reboot, but I really doubt that you're in that state.
Also check the PID of the process before and after killing it. If the PID changes, then you have a process that's basically forkbombing on you, and killall might be a good way to get rid of it (killall's argument is the process name).
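killall works by process name rather than PID. A rough Linux-only C sketch of the idea (kill_by_name is hypothetical and not the real killall source): walk the numeric entries in /proc, compare each /proc/<pid>/comm with the name, and call kill() on the matches. Unlike killall, it doesn't skip its own PID, and the kernel truncates comm to 15 characters.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Sends `sig` to every process whose /proc/<pid>/comm equals `name`
 * (Linux-specific). Returns how many kill() calls succeeded, or -1 if
 * /proc can't be opened. */
int kill_by_name(const char *name, int sig)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                      /* only numeric entries are PIDs */

        char path[300], comm[64];
        snprintf(path, sizeof path, "/proc/%s/comm", e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                      /* the process may have exited */

        if (fgets(comm, sizeof comm, f)) {
            comm[strcspn(comm, "\n")] = '\0';
            if (strcmp(comm, name) == 0 &&
                kill((pid_t)atoi(e->d_name), sig) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Passing 0 as the signal only checks which matching processes you're allowed to signal, without sending anything, which is a safe way to see what would be hit.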
Or, there are options you can give to fuser to kill off processes that have a certain filename opened. That might be worth trying too.
doublec16
03-05-2004, 01:34 AM
The program calls the pgplot library. There are no fork calls in the program, just regular subroutine calls. The program is the only process running AFAIK.
It is taking up 100% of my CPU. Killing (kill -9 or killall) it leaves it with the same process number, so basically killing doesn't do anything at all. If it is a zombie process why is it taking up all the CPU?
I have never heard of fuser. What is that?
Thanks.
doublec16
03-05-2004, 07:37 PM
Well it's almost 24 hours later and the program is still running and making the CPU run at 100%. Is my only option to reboot?
bwkaz
03-05-2004, 08:03 PM
fuser gives you the process ID of any process that has a specified filename open, or (with the -k option) allows you to automatically kill that (or those) process(es). Check out its manpage.
But basically, fuser -KILL -k /path/to/file should kill the process (and anything else that's holding that file open still).
There has to be a way to kill it (because again, SIGKILL can't be caught or ignored, and its default action is to terminate the process), unless maybe it hosed up a kernel thread. If that's the case, then a reboot would be necessary, yeah. Or maybe your kernel's signal delivery is broken somehow.
doublec16
03-05-2004, 11:31 PM
I did check out the fuser manpage but there isn't one. I don't seem to have that program on my computer. Where can I get it for RH8? Thanks.
bwkaz
03-06-2004, 12:43 AM
On my LFS, it's installed by the psmisc package.
hammer123
03-10-2004, 05:47 AM
It's probably /sbin/fuser or /usr/sbin/fuser (the man page may be installed too).
Otherwise: locate fuser
Then: fuser -k /foopath/foobar
ChryZKoiD
03-10-2004, 07:29 AM
Might be that I'm sidetracked here, but I can't see how that effect is even possible. The only way it could happen is if you kill a child process and the parent process keeps reviving or restarting it. So I'd simply run "ps auxf", find the parent process (or grandparent, or whatever), and kill that one instead with "kill -9 process_id".
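A Linux-specific C sketch of following a process up through its parents, the chain that ps auxf draws as a tree (parent_of and print_ancestry are hypothetical names): each step reads the PPid: line from /proc/<pid>/status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid()/getppid() for callers */

/* Returns the parent PID of `pid` from the "PPid:" line of
 * /proc/<pid>/status (Linux-specific), or -1 if it can't be read. */
pid_t parent_of(pid_t pid)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    pid_t ppid = -1;
    while (fgets(line, sizeof line, f)) {
        int v;
        if (sscanf(line, "PPid: %d", &v) == 1) {
            ppid = (pid_t)v;
            break;
        }
    }
    fclose(f);
    return ppid;
}

/* Prints `pid` and each of its ancestors, stopping at PID 1 (init)
 * or when a parent can't be determined. */
void print_ancestry(pid_t pid)
{
    while (pid > 0) {
        printf("%d\n", (int)pid);
        if (pid == 1)
            break;
        pid = parent_of(pid);
    }
}
```

print_ancestry(getpid()) prints the calling process, then its parent, and so on up to PID 1.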
binaryDigit
03-10-2004, 02:35 PM
that's the problem. the parent process is dying before the child process.
when that happens the child process is orphaned and gets reparented to init (PID 1), so you won't find its original parent in ps auxf.
you can't just kill the parent process because the parent process is already dead.
bwkaz explained this very well in his post.
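What happens when a parent exits before its child can be checked with a short C sketch (orphan_demo is a hypothetical name): a middle process forks a grandchild and exits immediately, and the grandchild watches getppid() change from the middle process's PID to its new parent. On a typical Linux system the new parent is init (PID 1), or a "subreaper" process if one has been set up, as in some containers.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reports the parent PID a grandchild sees before and after its own parent
 * (the "middle" process) exits. Returns 0 on success, -1 on failure. */
int orphan_demo(pid_t *old_parent, pid_t *new_parent)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (middle == 0) {
        pid_t me = getpid();               /* the middle process's PID */
        if (fork() == 0) {
            /* Grandchild: poll until the middle process is gone and the
               kernel has assigned a new parent. */
            struct timespec ms = { 0, 1000000 };   /* 1 ms */
            while (getppid() == me)
                nanosleep(&ms, NULL);
            pid_t pair[2] = { me, getppid() };
            if (write(fds[1], pair, sizeof pair) < 0)
                _exit(1);
            _exit(0);
        }
        _exit(0);                          /* middle exits, orphaning the grandchild */
    }

    close(fds[1]);                         /* so read() sees EOF if nothing is written */
    waitpid(middle, NULL, 0);              /* reap the middle process */

    pid_t pair[2];
    ssize_t n = read(fds[0], pair, sizeof pair);
    close(fds[0]);
    if (n != (ssize_t)sizeof pair)
        return -1;
    *old_parent = pair[0];
    *new_parent = pair[1];
    return 0;
}
```

The orphan keeps running normally; it only becomes a zombie if it exits before its new parent has reaped it.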