First off, many thanks to:
Stephen Harris, Jon Wright, Unixboy,
Oliver Hemming, Scott D. MacKay, Bruce Rossiter,
Bismark Espinoza, and Chris Marble.
Almost all of them redirected my attention away from the filesystem and
disksuite and toward swap and the way it is used. The culprit was
*not* disksuite or the fact that it is mirrored, but a process
tying up swap, as many of the above people suggested. Surprisingly,
running lsof showed *nothing*, but running top showed that the Legato
front-end/GUI "nwadmin" was hogging all of swap.
What had thrown me was that I did not realize that the *total reported
size* of /tmp (swap + RAM) would *decrease* as swap was being used. Could
someone explain this to me? There are already fields for "used",
so why shouldn't the total remain the total?
Anyhow, the solution/summary is that killing the errant process caused
all of swap to reappear. I did this on two of our 8 statewide backup
servers where /tmp had shrunk dangerously low. The short-term
solution will be to not run nwadmin for weeks on end. The operators
tend to open the app and leave it open, and that is what has been
slowly chewing up /tmp. If anyone has any tips on how to
run nwadmin without it behaving like this, please let me know. I
imagine this is a bug in Legato, and our versions may not all be
patched, so I will check that first before getting bent out of shape.
Thanks again for all your help and suggestions,
Adam
Here is the output from before and after killing just the nwadmin
process (this may not be the same system I first posted about, as it
happens on various systems, but the problem and fix are the same):
bkup% sudo top -b
Password:
last pid: 7713; load averages: 0.11, 0.05, 0.04 09:58:08
48 processes: 47 sleeping, 1 on cpu
Memory: 57M real, 664K free, 125M swap, 49M free swap
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 7713 root      -7    0 1376K 1184K cpu     0:00  1.46%  1.46% top
 7698 root      33    0 1752K 1424K sleep   0:00  0.97%  0.97% sshd1
23138 root      33    0 4120K 2872K sleep 283:36  0.93%  0.93% nsrd
23068 root      24    0   98M 5840K sleep 130:27  0.45%  0.45% nwadmin
 7701 adam      20    0 1552K 1176K sleep   0:00  0.10%  0.10% ksh
 8181 root      15    0 1664K  832K sleep   1:55  0.01%  0.01% sshd1
bkup% df -k /tmp
Filesystem            kbytes    used   avail capacity  Mounted on
swap                   49880    1272   48608     3%    /tmp
bkup% sudo kill 23068
bkup% df -k /tmp
Filesystem            kbytes    used   avail capacity  Mounted on
swap                  145552    1272  144280     1%    /tmp
bkup% sudo top -b
last pid: 7722; load averages: 0.00, 0.03, 0.04 10:01:04
47 processes: 46 sleeping, 1 on cpu
Memory: 57M real, 1072K free, 32M swap, 143M free swap
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 7722 root     -17    0 1376K 1184K cpu     0:00  1.41%  1.41% top
23138 root      33    0 4120K 2872K sleep 283:38  0.10%  0.10% nsrd
etcetera...
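The two df outputs above are consistent with the "kbytes" column for /tmp being computed as used + avail rather than being a fixed size. My understanding (an assumption, not something confirmed in this thread) is that tmpfs reports its size as its own usage plus whatever swap is currently available, so the total shrinks whenever another process reserves swap. A minimal sketch of the arithmetic, using only the figures pasted above; tmpfs_total is a hypothetical helper name, not a Solaris command, and the arithmetic syntax assumes a POSIX shell such as ksh or bash:

```shell
# Sketch: the "kbytes" column df reports for /tmp equals used + avail.
# tmpfs_total is a hypothetical helper, not a system command.
tmpfs_total() {
    # $1 = used (KB), $2 = avail (KB)
    echo $(( $1 + $2 ))
}

# Figures copied from the two "df -k /tmp" outputs above.
before=$(tmpfs_total 1272 48608)     # nwadmin still running
after=$(tmpfs_total 1272 144280)     # nwadmin killed

echo "before kill: ${before} KB"
echo "after kill:  ${after} KB"
echo "regained:    $(( after - before )) KB"
```

The regained 95672 KB is roughly 93 MB, which lines up with top showing swap in use dropping from 125M to 32M once nwadmin was gone.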
Here is the swap -l and swap -s output some of you suggested (it is
from a different server than the one above, but the problem is the
same; sorry, it's the only swap output I have handy, and it is current
and shows the problem of reduced swap):
bkup% swap -l
swapfile             dev  swaplo blocks   free
/dev/md/dsk/d1      85,1      16 267504    432
bkup% swap -s
total: 162256k bytes allocated + 4080k reserved = 166336k used, 12416k available
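To make the swap -s line easier to read, here is a small sketch of how its fields relate: allocated + reserved gives "used", and used + available gives the total virtual swap at that moment. The variable names are mine, the values are copied from the output above, and the arithmetic syntax assumes a POSIX shell such as ksh or bash:

```shell
# Figures copied from the "swap -s" output above (all in KB).
allocated=162256
reserved=4080
available=12416

# used = allocated + reserved, as printed by swap -s itself.
used=$(( allocated + reserved ))
# Total virtual swap at that moment = used + available.
total=$(( used + available ))
# Integer percentage of virtual swap in use (rounded down).
pct_used=$(( used * 100 / total ))

echo "used:   ${used} KB"
echo "total:  ${total} KB"
echo "in use: ${pct_used}%"
```

On this server that works out to about 93% of virtual swap in use, which matches the very small "free" figure in swap -l.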
email: regnis@worldnet.att.net