Thanks to all for the help!  
The bottom line:
================
The /tmp filesystem was corrupt.  An fsck fixed it.
Original inquiry:
=================
Hi there!
 
I'd appreciate your wisdom on how to maintain /tmp.  I'm
currently experiencing some weirdness with /tmp on a Sun 4/630
with Sun O/S 4.1.2 as follows:
/tmp is a separate 40MB disk partition. From df: 
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd0a              15671    8846    5258    63%    /
/dev/sd0g             151983  117420   19365    86%    /usr
/dev/sd0d              40879   29906    6886    81%    /tmp
/dev/sd0h             795414  305228  410645    43%    /home
/var/tmp is symbolic linked to /tmp.
The problem is that 30MB, or 81%, of /tmp space is in use.  There aren't
any files using that much space in /tmp.  The files that are there add
up to 3MB or so altogether.  That mere 6MB of /tmp space available is
causing some problems, and I can't free up any more.
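This sketch is an editorial addition, not part of the original post. One way to put numbers on that gap is to total up the space used by every file you can still see, as du(1) would, and compare it with df's "used" column. It assumes a POSIX system that provides nftw(3). sum_visible() and its counters are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Totals for the files the walk can see (hypothetical names). */
static long long visible_bytes;   /* sum of st_size, the apparent length     */
static long long visible_blocks;  /* sum of st_blocks, usually 512-byte units */
static long visible_files;        /* number of regular files counted          */

/* nftw() callback: add each regular file's size to the totals. */
static int add_entry(const char *path, const struct stat *sb,
                     int type, struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    if (type == FTW_F && S_ISREG(sb->st_mode)) {
        visible_bytes += (long long)sb->st_size;
        visible_blocks += (long long)sb->st_blocks;
        visible_files++;
    }
    return 0;                     /* 0 tells nftw() to keep walking */
}

/* Walk 'dir' like "find dir -xdev": FTW_MOUNT stays on one filesystem,
   FTW_PHYS does not follow symbolic links. Returns 0, or -1 on error. */
int sum_visible(const char *dir)
{
    visible_bytes = 0;
    visible_blocks = 0;
    visible_files = 0;
    return nftw(dir, add_entry, 16, FTW_PHYS | FTW_MOUNT);
}
```

After sum_visible("/tmp") returns 0, visible_blocks / 2 is roughly the kilobytes du would report, assuming 512-byte st_blocks units. A large gap between that and df's "used" figure means space is held by something that has no name in the directory tree. That could be an open unlinked file or, as fsck eventually showed here, an unattached inode.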
This system has been up for 38 days since the last reboot. It hosts
24 Xterminals and has had as many as 118 active logins at times,
with hundreds of X user files in /tmp.  I clean up /tmp with a job
run from cron every night:
17 1 * * * cd /tmp;find . -mtime +15 -xdev -exec rm -rf {} \; >/dev/null 2>&1
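The age test in that cron job can be sketched in C as follows, assuming POSIX stat(2). older_than_days() is a hypothetical helper. It follows the POSIX rule for find's -mtime +n: the time since the last modification, divided by 86400 with the remainder discarded, must be greater than n. So +15 only matches files untouched for at least 16 full days.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Roughly what "find -mtime +days" tests. 'now' is passed in so the check
   is repeatable. Returns 1 if the file is old enough, 0 if not, and -1 if
   'path' cannot be stat'ed. */
int older_than_days(const char *path, int days, time_t now)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;
    /* whole days since last modification; the remainder is discarded */
    long long age = (long long)difftime(now, sb.st_mtime);
    return age / 86400 > days;
}
```

In a real job you would call older_than_days(name, 15, time(NULL)). Note that a timestamp alone cannot tell you whether a program still has the file open.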
I intend that the mtime +15 will avoid rm'ing files that are being held
open by an active program.  "atime" won't work, because the files are
"accessed" each night by the tape backup job.
I suspect that I've got a ton of orphaned, un-freed up inodes in /tmp. 
Some program is using the space by opening a temporary file and then
unlinking the file while it's open for reading or is otherwise exiting
without properly freeing up the file's inode allocation. I don't know
if any programs we're running are unlinking open files.  If anybody has
seen this behaviour in standard Sun programs like calendar manager,
textedit, or mailtool I'd appreciate the tip.
I assume that to fix this I'm forced to fsck and/or newfs /tmp.
My problem is that this is a critical system that I can't shut down
without begging permission for weeks. I'd like a more permanent solution.
Any and all advice is appreciated. Thanks in advance.
The fix
=======
When I booted in single user mode df still showed that /tmp was 68% full.  
An fsck found a 30MB unattached file owned by root.  I have no idea
where it came from.  After fsck fixed it, /tmp showed 0% in use.
Today it looks like this:
$df /tmp
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd0d              40879     628   36164     2%    /tmp
$uptime
  6:08pm  up 5 days,  5:05,  52 users,  load average: 1.06, 1.02, 0.98
What I failed to mention in the original inquiry was that /tmp never
went above 30% full or so before this problem.  That's why I was
worried about the sudden lack of space in /tmp.
A change
========
I commented out the rm job run from cron.  Perry Hutchison suggested
I didn't need the mtime in the find since I never back up /tmp
(faulty logic on my part).  I removed the job to avoid making more
orphan files unnecessarily.  /tmp gets cleared on every reboot.
I may be able to reboot more often now.  In any case, I'll 
clean /tmp manually.
Another change
==================
I removed the link from /var/tmp to /tmp on advice from Hal Stern
and Mike Raffety.  Some programs depend on files being in /var/tmp
after a reboot.  /tmp is cleared on each reboot.
I will make a larger /var in a separate disk partition in
the future.  I have done that on newer Suns here. I could never
bring this one down long enough to repartition and restore the disk. 
A change in the future
======================
Several replies suggested using the tmpfs system for /tmp.  I'm going
to try that, once we get some more RAM and disk for swap space.
About unlinked open files
=========================
Several replies suggested that some Sun programs do indeed open files
and then unlink them, tying up the /tmp space without showing up in
an ls. Mailtool and /usr/ucb/mail do it for security. Audiotool does
it.  Shelltool and Commandtool with scroll bar enabled will use
space in /tmp without creating a file.
Any space tied up this way will be freed up when the program exits or
closes the file.  Only running programs can hold the files open.
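Here is a small C sketch of the mechanism, assuming POSIX open/unlink/fstat. orphan_demo() and struct orphan_info are hypothetical names. After unlink() the file's link count drops to 0 and ls can no longer find it. fstat() on the open descriptor still reports the full size, and the blocks go back to the free list only at the last close().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* What fstat() reports about a file after its last name is removed. */
struct orphan_info {
    nlink_t links;   /* directory entries still pointing at the inode */
    off_t size;      /* bytes still held by the inode                 */
};

/* Create 'path', write 'len' zero bytes, unlink it while it is still open,
   record what fstat() says, then close it. Returns 0, or -1 on error. */
int orphan_demo(const char *path, size_t len, struct orphan_info *out)
{
    char buf[4096] = {0};
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    for (size_t done = 0; done < len; ) {
        size_t chunk = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        done += (size_t)n;
    }
    if (unlink(path) != 0) {        /* the name disappears from ls ...       */
        close(fd);
        return -1;
    }
    struct stat sb;
    if (fstat(fd, &sb) != 0) {
        close(fd);
        return -1;
    }
    out->links = sb.st_nlink;       /* ... so the link count is now 0 ...     */
    out->size = sb.st_size;         /* ... yet the data is still allocated    */
    close(fd);                      /* last close: the kernel frees it here   */
    return 0;
}
```

For example, orphan_demo("/tmp/orphan.test", 1 << 20, &info) leaves info.links at 0 and info.size at 1048576.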
Some great utilities
====================
Several suggested using fuser to see what process is using a file.
It looks to me like fuser wants a file name. I'm still not sure that
fuser will show what's using space if there is no file there.
Several suggested using lsof.  It looks like lsof WILL show what process
is using disk space even if there is no file.  I found lsof on ftp.uu.net
in /usenet/comp.sources.unix/volume25. It's in the same directory on
gatekeeper.dec.com.
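On SunOS 4.1.2, lsof is the tool for this job. For comparison only: on a present-day Linux system the kernel exposes each process's descriptors under /proc/<pid>/fd. This is an assumption about Linux; SunOS 4 has no /proc. An open file whose name has been removed shows a link target ending in " (deleted)". Below is a minimal C sketch of that check. list_deleted_fds() is a hypothetical name. The match is a plain string comparison, so a file genuinely named "... (deleted)" would fool it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Linux-only: print the descriptors of process 'pid' (a number, or "self")
   whose /proc link target ends in " (deleted)", i.e. open files that no
   longer have a name. Returns how many were found, or -1 if
   /proc/<pid>/fd cannot be opened (no such process, or no permission). */
int list_deleted_fds(const char *pid)
{
    static const char tag[] = " (deleted)";
    const size_t taglen = sizeof tag - 1;
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%s/fd", pid);

    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        char link[512], target[4096];
        snprintf(link, sizeof link, "%s/%s", dirpath, e->d_name);
        ssize_t n = readlink(link, target, sizeof target - 1);
        if (n < 0)
            continue;                       /* fd went away meanwhile */
        target[n] = '\0';                   /* readlink does not terminate */
        if ((size_t)n >= taglen && strcmp(target + n - taglen, tag) == 0) {
            printf("fd %s -> %s\n", e->d_name, target);
            count++;
        }
    }
    closedir(d);
    return count;
}
```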
A wish may come true
====================
Mike Raffety suggests that you should reboot once a week.  That was
impossible around here.  Then we discovered that our production
crew no longer works on Saturday mornings.  I may be able to do things
other growing SysAdmins can do, like routine shutdowns.  
This 4/630 had an uptime of 40 days when rebooted.  It's been up as
long as 90 days continuously.  BTW, our Novell server has been up
for 267 days.
My job has been to change the tires on a moving semi.
Thanks to:
==========
peters@nms.otc.com.au (Peter Samuel)
heas@chpc.org (Heas)
bill@saloft.att.com (Bill Shorter)
moll@informatik.uni-bonn.de (Wolfgang Moll)
Steve_Kilbane@gec-epl.co.uk (Steve Kilbane)
a.talbot@lsi-logic.co.uk (Allen Talbot)
jdavis@cs.arizona.edu (Jim Davis)
stern@sunne.East.Sun.Edu (Hal Stern)
strombrg@uci.edu (Dan Stromberg)
miker@il.us.swissbank.com (Mike Raffety)
dzambon@afit.af.mil (Dan Zambon)
perryh@pluto.rain.com (Perry Hutchison)
glenn@uniq.com.au (Glenn Satchell)
mike@trdlnk.chi.il.us (Michael Sullivan)
The replies:
============
From: peters@nms.otc.com.au (Peter Samuel)
-------
The alternative to using 'real' disk real estate for /tmp is to use
the swap space as /tmp.
To do this, follow this procedure:
1) If your kernel doesn't support TMPFS, uncomment the following line
from your kernel config file and rebuild the kernel
    options TMPFS           # tmp (anonymous memory) file system
2) Add the following line to /etc/fstab
    swap		/tmp		tmp rw 0 0
If you want to add your 40Mb partition as more swap space, add the
following line to /etc/fstab
    /dev/sd0d		swap		swap rw 0 0
If you already have heaps of swap you may want to consider using the
40Mb partition as /var and removing the symbolic link from /var/tmp/
to /tmp.
3) Uncomment the following line in /etc/rc.local
    mount /tmp
If you want to use this new /tmp now either
a) reboot if you had to make a new kernel
    OR
b) mount /tmp
This will mean that everything currently in /tmp will be hidden by the
TMPFS file system. (You can get at them by dumping /dev/sd0d).
Next time you boot you'll have a /tmp file system equal to
approximately the size of your swap space. As processes use swap space
the available space left for /tmp will decrease, but as swapping should
only be a transitory phenomenon, this shouldn't be a problem.
The advantages of this are that accessing /tmp will be much faster and
you'll have much more space. Adding space to /tmp is simply a matter of
adding swap space - see mkfile(8) and swapon(8).
The disadvantage is that /tmp is guaranteed to be temporary. Whereas
before the boot procedure cleared all the files out of /tmp and left
subdirectories in /tmp alone, running TMPFS means that if your machine
reboots EVERYTHING in /tmp disappears. This shouldn't be too much of a
problem for you.
Regards
Peter
----------
Peter Samuel                    Email:  peters@nms.otc.com.au
Telstra - OTC Australia         Phone:  +61 2 339 3953 Fax: +61 2 339 3688
Computer and Network Services   Snail:  GPO Box 7000, Sydney 2001, Australia
From: Heas <heas@chpc.org>
--------
have you looked at: df -i /tmp
that'll show you the inode info for the partition in the same way a normal df
does for blocks.  try fsck -N /dev/rsd??.  the -N should force it to check 
but not write so you can *see* how dirty it is.  I would imagine that your 
fsys is just out of inodes.
if you find this to be true (lack of inodes), get downtime and mkfs and play
with the cyl/grp and/or blk size to get more inodes.
-heas
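The numbers that df and df -i print can also be read from a program. This sketch assumes the POSIX statvfs(3) interface. As far as I know, SunOS 4.1.2 itself offers the older statfs(2) instead. struct fs_usage and fs_usage_of() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* The numbers behind "df" and "df -i" for the filesystem holding 'path'. */
struct fs_usage {
    unsigned long long total_kb, avail_kb;   /* data blocks, in kilobytes */
    unsigned long long inodes, inodes_free;  /* inode (file slot) counts  */
};

/* Fill 'u' for the filesystem containing 'path'. Returns 0, or -1 on error. */
int fs_usage_of(const char *path, struct fs_usage *u)
{
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;
    /* f_blocks and f_bavail are counted in units of f_frsize. */
    unsigned long long unit = vfs.f_frsize ? vfs.f_frsize : vfs.f_bsize;
    u->total_kb = (unsigned long long)vfs.f_blocks * unit / 1024;
    u->avail_kb = (unsigned long long)vfs.f_bavail * unit / 1024;
    u->inodes = vfs.f_files;
    u->inodes_free = vfs.f_ffree;
    return 0;
}
```

If avail_kb looks healthy but inodes_free is near zero, the filesystem has run out of inodes, which is the case Heas describes. In this thread the culprit turned out to be something else: a 30MB unattached file, found by fsck.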
From aloft.att.com!bill Tue Jun 15 06:25:33 1993
-------
If you open a file, then delete it without closing it, the blocks 
are still tied up.  That is one way to mysteriously lose blocks in
a filesystem.
Here's an experiment that you can run to show how blocks can
be missing from tmp, yet not be shown as allocated to a file
by an ls -l.
Read five minutes worth of sound into audiotool.  At some
point, you will run out of space in tmp.  Now look for the
file holding the space.  Not there.
Well, actually it was there, for a moment.  It was opened,
then deleted.  Its name appeared in tmp, then disappeared.
But its blocks were still held captive because the file wasn't
closed yet.  Quit audiotool, and the missing space is magically
back.
(I may be wrong about audiotool and /tmp.  It may be /var/tmp -
I don't recall for sure.)  
--- Mike, I don't know how common that practice is. I have heard that it is considered bad practice, but it is typical of "gurus". If opening then deleting a file without closing it should be considered "guru practice", then expect a lot of people to be doing so.
The audiotool thing drove me crazy for a couple of days.
Bill Shorter bill@aloft.att.com
---------
From: moll@informatik.uni-bonn.de (Wolfgang Moll)
---------
Your problem is most probably caused by shelltool or cmdtool applications with scrollbar enabled.
The workspace is allocated from /tmp. Neither ls nor find will show any files.
You can verify this if you quit all these windows - you should get quite a lot more available space on /tmp
Regards,
Wolfgang Moll
Computer Science Department
University of Bonn
From: Steve_Kilbane@gec-epl.co.uk
------------
well, you've got a good problem:-). you're almost certainly right about it being caused by opened, unlinked files. you should be able to get fuser off the distribution medium (it should be /usr/etc/fuser), and you can use that to check which processes have files open in /tmp. Then kill them.
> The problem is that there are not many file names to check in /tmp.
> It's the processes that are not leaving file names in tmp that are
> using most of the space.
[ sorry this is so late; i've been away ]
fuser will also tell you what processes have files open on a device, i believe, which should solve the problem if it reoccurs.
-- <Steve_Kilbane@gec-epl.co.uk>
From: Allen Talbot
---------
I have experienced tmp and slash filling up and not reducing even though I would delete the problem files... The problem is that the data doesn't get removed but hangs around on the disk until reboot. The way I have found around this is to:
cat /dev/null > filename
Allen Talbot +-----+ INTERNET : a.talbot@lsi-logic.co.uk
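Here is why the truncation trick helps. rm only removes a name, while truncating the file releases its data blocks even if a process still has it open. The C sketch below has the same effect as "cat /dev/null > filename", assuming POSIX open(2) with O_TRUNC. empty_file() is a hypothetical name. Unlike the shell, it does not create a missing file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same effect as the shell's "cat /dev/null > filename": open the existing
   file with O_TRUNC, which cuts it to zero length and releases its data
   blocks even if another process still has it open. Unlike rm, the name
   and the inode stay. Returns 0, or -1 on error. */
int empty_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_TRUNC);   /* no O_CREAT on purpose */
    if (fd < 0)
        return -1;
    return close(fd);
}
```

One caveat: a process that is still writing without O_APPEND keeps its old file offset. Its next write leaves a file of the old length with a hole at the front.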
From: "Jim Davis" <jdavis@cs.arizona.edu>
-----------
In article <1vja16$gu2@optima.cs.arizona.edu> you write:
:I suspect that I've got a ton of orphaned, un-freed up inodes in /tmp.
:Some program is using the space by opening a temporary file and then
:unlinking the file while it's open for reading or is otherwise exiting
:without properly freeing up the file's inode allocation. I don't know
:if any programs we're running are unlinking open files.
That sounds likely; have you tried using fuser or lsof to look for processes with open files in /tmp?
You might consider mounting /tmp on swap (with the tmpfs facility). Then you can empty /tmp simply by unmounting and remounting it.
---
Maybe this example will make things clearer. I wrote a little program that copies /etc/hosts to a file in /tmp and then unlinks that file. The disk space is still allocated until the program exits, however (as your guru says). Note that lsof does pick up the disk space, even though the unlink command has removed that filename:
    wolf; cat mystery.c
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    int main(void)
    {
        int ifd, ofd, n;
        char buf[1024];
        if ((ifd = open("/etc/hosts", O_RDONLY)) < 0) {
            perror("/etc/hosts");
            exit(1);
        }
        /* the third argument is the new file's permission mode */
        if ((ofd = open("/tmp/notthere", O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0) {
            perror("/tmp/notthere");
            exit(1);
        }
        /* drop the name at once; the open descriptor keeps the inode alive */
        if (unlink("/tmp/notthere") < 0) {
            perror("unlink");
            exit(1);
        }
        while ((n = read(ifd, buf, sizeof(buf))) > 0)
            write(ofd, buf, n);
        close(ifd);
        pause();    /* now /tmp/notthere is gone, but its disk space lives on */
        exit(0);
    }
    wolf; wc -c /etc/hosts
    417803 /etc/hosts
    ^^^^^^
    wolf; ./mystery &
    13020
    wolf; ./lsof /tmp
    COMMAND    PID USER   FD   TYPE  DEVICE SIZE/OFF INODE/NAME
    rc       11711 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    mystery  13020 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    mystery  13020 jdavis 4u   VREG 135,  0   417803    65 /tmp (swap)
                                              ^^^^^^
    lsof     13021 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    pine     11712 jdavis 4uW  VREG 135,  0        0     5 /tmp (swap)
    trn      11843 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    trn      11843 jdavis 5u   VREG 135,  0   107163    53 /tmp (swap)
    trn      11843 jdavis 6u   VREG 135,  0        0    54 /tmp (swap)
    script   13007 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    rc       11726 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    rc       13009 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    script   13008 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    script   13008 jdavis 3w   VREG 135,  0        0     8 /tmp (swap)
    wolf; kill -HUP 13020
    wolf; ./lsof /tmp
    COMMAND    PID USER   FD   TYPE  DEVICE SIZE/OFF INODE/NAME
    rc       11711 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    lsof     13240 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    pine     11712 jdavis 4uW  VREG 135,  0        0     5 /tmp (swap)
    trn      11843 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    trn      11843 jdavis 5u   VREG 135,  0   107163    53 /tmp (swap)
    trn      11843 jdavis 6u   VREG 135,  0        0    54 /tmp (swap)
    script   13007 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    rc       11726 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    rc       13009 jdavis cwd  VDIR 135,  0     1080     9 /tmp (swap)
    script   13008 jdavis cwd  VDIR 135,  0      352     2 /tmp (swap)
    script   13008 jdavis 3w   VREG 135,  0        0     8 /tmp (swap)
--
Jim Davis              | "Ah no, you didn't delete it - I did."
jdavis@cs.arizona.edu  |   -- BOfH
From: stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
----------
don't link /var/tmp to /tmp. vi (and others) use /var/tmp for working files, and this is consuming lots of space.
--hal
From: strombrg@hydra.acs.uci.edu
------------
You might try "fuser". Note that there is a patch to fuser available. There is also "ofiles" on the net, though I've not tried it.
Dan Stromberg - OAC/DCS strombrg@uci.edu
From: Mike Raffety <miker@il.us.swissbank.com>
----------
> I suspect that I've got a ton of orphaned, un-freed up inodes in /tmp.
> Some program is using the space by opening a temporary file and then
> unlinking the file while it's open for reading or is otherwise exiting
> without properly freeing up the file's inode allocation. I don't know
If a process exits, any deleted but still open files are gone. You've got a lot of files still being held open by active processes. No need to fsck or newfs; just reboot (or kill off all the user processes).
You should schedule regular weekly reboots, say, early Sunday mornings. This will avoid the problem.
BTW, making /var/tmp a symbolic link to /tmp is a bad move; it breaks a number of fault-recovery mechanisms, like vi/expreserve. /tmp is cleaned out on reboot, /var/tmp is NOT.
From: perryh@pluto.rain.com (Perry Hutchison)
-------
> I clean up /tmp with job run from cron every night:
>
> 17 1 * * * cd /tmp;find . -mtime +15 -xdev -exec rm -rf {} \; >/dev/null 2>&1
>
> I intend that the mtime +15 will avoid rm'ing files that are being held
> open by a active program.  "atime" won't work, because the files are
> "accessed" each night by the tape backup job.
I'm not sure that either atime or mtime will do what you want. They may be updated only when a file is opened or closed, rather than every time a program reads from or writes to an open file.
BTW why is the tape backup job "accessing" these files? I would think you might not need to back up /tmp when you're going to clean it out every night anyway; and if you're using dump(8), it reads the special file directly, so there should be no effect on the atimes of the files being backed up (although the atime of the special file's own entry in /dev will be set). The filesystem does not even have to be mounted during dump.
If you're using a file-oriented backup such as tar(1), you might consider putting /tmp in an exclude list, which would avoid having its files accessed.
> I suspect that I've got a ton of orphaned, un-freed up inodes in /tmp.
> Some program is using the space by opening a temporary file and then
> unlinking the file while it's open for reading or is otherwise exiting
> without properly freeing up the file's inode allocation.
Quite a few programs open files in /tmp and then remove the directory entry, precisely so that they will NOT leave junk lying around if they terminate unexpectedly. The inode and associated data blocks remain allocated until the file is closed (or the last process using it exits), and are then freed automatically.
I believe /usr/ucb/mail, which is used by at least the SunView version of mailtool, does this. Don't know about cm or textedit. However, as noted above it is a way of preventing problems, not causing them.
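The deliberate version of the idiom Perry describes looks roughly like this in C. It assumes POSIX mkstemp(3) and unlink(2). anonymous_scratch() and the "scratchXXXXXX" name pattern are hypothetical. The file has a name only for an instant, so nothing is left lying around if the program is killed. The space comes back when the descriptor is closed or the process exits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a scratch file in 'dir' and remove its name at once, so the file
   can only be reached through the returned descriptor. Returns the fd,
   or -1 on error. */
int anonymous_scratch(const char *dir)
{
    char path[4096];
    /* mkstemp() replaces the trailing XXXXXX with a unique suffix */
    if (snprintf(path, sizeof path, "%s/scratchXXXXXX", dir) >= (int)sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {        /* name gone; inode lives until close */
        close(fd);
        return -1;
    }
    return fd;
}
```

On many systems, standard C's tmpfile() gives a similar nameless scratch file.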
> I assume that that to fix this I'm forced to fsck and/or newfs /tmp.
Just shutting down and restarting will most likely fix it, by getting rid of the processes which are holding files open in /tmp. You could also do a ps and look for processes which have been around for an excessive length of time, although correlating processes with open unlinked /tmp files is not easily accomplished using only the tools delivered with SunOS.
> My problem is that this is a critical system that I can't shut down
> without begging permission for weeks. I'd like a more permanent solution.
My guess is that, with 24 Xterminals and over 100 active logins, you simply need a larger /tmp.
Maybe you could shut down over a weekend to repartition, reducing /home by 30-40 Mb and adding that space to /tmp. Unfortunately this would take a while -- you would have to back up and restore all of /home, and probably /usr as well depending on how the partitions are currently laid out.
An alternative, requiring only about a 5 minute shutdown, would be to add another disk -- you can format and partition it after restarting, and then mount it over the top of the current /tmp without needing a second reboot. New files in /tmp will be created on the new disk, but any already-open ones in the old /tmp will still be accessible until closed. This should ideally be done at the same time when you are doing the nightly cleanout of /tmp -- the mount could be added to the crontab to run directly after the find on a single night. The next time a reboot happened (planned or unplanned), the old /tmp partition would become available and you could consider migrating /var to it, thereby ultimately allowing /var/tmp to have its own space instead of having to be a symlink to /tmp.
-------
From: glenn@uniq.com.au (Glenn Satchell - Uniq Professional Services)
-------
I think that your assumption is correct, i.e. that there are programmes that open a file, unlink it, and then attempt to close it at some later stage. I know that mail and mailtool do this as a security feature. If the file's not there, then you've got to find the disk blocks to read. I think the only real solution is to reboot, as this will close all the files, and then fsck the filesystem.
regards,
--
Glenn Satchell glenn@uniq.com.au                     | "When I die I want to go
Uniq Professional Services Pty Ltd ACN 056 279 335   | peacefully in my sleep
PO Box 70, Paddington, NSW 2021, (Sydney) Australia  | like my Grandfather,
Phone 02 360 7434 Pager 016 287 000 Fax 02 331 2572  | not screaming like the
"Sun Accredited System Consultants"                  | passengers in his car."
From: mike@trdlnk.chi.il.us (Michael Sullivan)
--------
Your analysis sounds reasonable to me. One way to identify processes that have files open in /tmp would be to use the freely distributable program lsof, which stands for "list open files". I ftp-ed version 2.16 from purdue's ftp site (I think it's ftp.cc.purdue.edu, but that might not be quite right; if not, check with archie). It only lists the mount point name of local files, since the kernel doesn't keep track of the name by which files were opened, but that should be good enough since your /tmp is a separate file system.
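A tool like lsof can name the filesystem but not the file because an open descriptor carries only its device and inode numbers. Below is a C sketch of matching a descriptor to a mount point by device number alone, assuming POSIX fstat/stat. fd_on_filesystem() is a hypothetical name. It illustrates the idea; it is not how lsof is actually written.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if open descriptor 'fd' refers to a file on the same filesystem
   as 'mount_point', 0 if not, -1 on error. Only the device numbers are
   compared, so the answer names a filesystem, never a file. */
int fd_on_filesystem(int fd, const char *mount_point)
{
    struct stat f, m;
    if (fstat(fd, &f) != 0 || stat(mount_point, &m) != 0)
        return -1;
    return f.st_dev == m.st_dev;
}
```

This works for unlinked files too, which is exactly the case in question. It is only precise because /tmp here is a separate filesystem, as Michael notes.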
I haven't seen that behavior with any standard Sun programs.
>I assume that that to fix this I'm forced to fsck and/or newfs /tmp.
>My problem is that this is a critical system that I can't shut down
>without begging permission for weeks. I'd like a more permanent solution.
Simply killing the programs holding the unlinked files open should free up the space.
--
Michael Sullivan             | email: mike@trdlnk.chi.il.us
TradeLink Corp.              | voice: +1 312 408 2599
175 W. Jackson, Suite A1235  | fax: +1 312 939 2531
Chicago, Illinois 60604 USA  |
-----------------------------------------------------------------------
--
Mike Andrews               |VideOcart, Inc.
mandrews@hq.videocart.com  |Chicago, IL USA
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-=-=-=-=-=
Think you've got problems?  How many SHOPPING CARTS are on YOUR WAN?
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:07:57 CDT