----- Begin Included Message -----
>From firstname.lastname@example.org Thu Apr 15 20:51:10 1993
From: "Andrew S. Rogers" <email@example.com>
Reply-To: "Andrew S. Rogers" <firstname.lastname@example.org>
Subject: Unstoppable or non-existent processes.
Our lab contains a mix of IPX's, Sparc2s, IPC's, HPs and, now, some
xterms. Most of the Suns are running 4.1.2 with OpenWindows 3.
The problem, which was mentioned briefly in a previous letter to this group,
is processes that fail to exit and then use up space on the root partition.
Some of these seem related to sessions with the xterm where the window
manager didn't exit, but in many cases there are logins to the machine
that hang on for days with no identifiable process attached to them.
I have tried using fuser to identify processes and am trying to get lsof
to run on the 4.1.2 machines (it works on the few running 4.1.3).
Yesterday, for example, one of our file servers showed 97% full on /.
I could only account for about 60% of that with actual files. I cleared
/tmp and linked it to a file on a different partition, did the same
with /var/tmp, both of which had little effect. I killed all the
ghost processes that I could find. That reduced the % full from about 94
to 86. I spent about two hours using du to search for any hidden files.
Then I rebooted. The root partition came back up about 46% full and
has stayed that way today. This seems to happen periodically, however,
and rebooting is a very poor choice because that particular lab is
run in a production mode with programs frequently running 24 hours a day.
If anyone knows of a way to identify and kill the offending processes, or
to free the space on / without rebooting, I would appreciate knowing about it.
Of course, I will summarize the responses.
Andrew S. Rogers
1166h LeFrak Hall
Department of Geography
University of Maryland
College Park MD 20742
----- End Included Message -----
I haven't been able to really test any of these suggestions because, of
course, the problem hasn't recurred since then, but here are the three
responses I got.
1. Run fsck on the root partition specifically to look for inodes with a
   count of 0. Use pstat -i to find out more about that inode.
2. Try pstat -f.
3. Look at new software to see if someone has created a program that appears
   to exit nicely but leaves a "tail" that hangs on to the disk space.
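
For anyone wondering how / can fill up without du finding the files, the
likely mechanism behind these suggestions is a file that was unlinked while
a process still had it open. The following is only a minimal Python sketch
of that behaviour, not something taken from the responses; the function name
unlinked_file_demo and the 1 MB size are made up for illustration, and it
assumes a local POSIX filesystem (over NFS the client may rename the file
instead of dropping the link count to 0).

```python
import os
import tempfile


def unlinked_file_demo(size=1 << 20):
    """Create a file, unlink it while it is still open, and report its state.

    Hypothetical helper for illustration only. Returns a dict with:
      visible - whether the path still exists in the directory tree
      nlink   - link count of the inode as seen through the open descriptor
      size    - bytes still held by the inode
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        # Remove the only directory entry; the open descriptor keeps the
        # inode (and its disk blocks) alive until the file is closed.
        os.unlink(path)
        st = os.fstat(f.fileno())
        return {
            "visible": os.path.exists(path),
            "nlink": st.st_nlink,
            "size": st.st_size,
        }


if __name__ == "__main__":
    print(unlinked_file_demo())
```

While the descriptor is open, du cannot see the file because it has no name,
but df still counts its blocks. The space comes back only when the holding
process closes the file or exits, which would be consistent with the reboot
freeing the root partition.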