(SUMMARY) Root filesystem full, reprise

From: adam r. christopher (adam@du.edu)
Date: Thu Aug 20 1998 - 13:05:23 CDT


SUMMARY

Thanks, everyone, for the advice. This is a long one, so you may want to
just file it.

The best way to find huge files is, by far, du -sk * run from a suspect
directory. Also, be sure to empty a log file by copying /dev/null over it,
e.g. cp /dev/null filename or cat /dev/null > filename; don't just delete
it, because a process that still has the file open keeps its disk space
allocated until it closes the file. Then HUP the processes that use the
files. Copying /dev/null is a good idea because not all log files are
automatically recreated on HUP or even reboot. Read and love the du manpage.
I was also pointed to the freeware utility lsof, which lists the files each
process has open.
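A minimal sketch of the du -sk and "copy /dev/null" advice, assuming a POSIX sh. The function names biggest and truncate_log are hypothetical helpers, not standard commands:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the advice above; not standard commands.

# biggest DIR [COUNT]: show the COUNT largest entries under DIR, in KB, biggest first.
biggest() {
    dir=$1
    count=${2:-10}
    # du -sk prints "<kbytes><TAB><name>" for each entry; sort numerically, largest first.
    (cd "$dir" && du -sk *) | sort -rn | head -n "$count"
}

# truncate_log FILE: empty FILE in place instead of removing it, so a process
# that still holds it open does not keep the old blocks allocated.
truncate_log() {
    cat /dev/null > "$1"
}
```

For example, biggest /var 15 to survey a suspect directory, then truncate_log /var/adm/messages followed by kill -HUP on syslogd's PID (assumed here to be recorded in /etc/syslog.pid, as on Solaris).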

A lot of people warned me to check the /devices directory for huge files,
because people often run backups to /dev/deviceO (letter O) instead of
/dev/device0 (zero), which quietly creates a regular file on the root
filesystem. This wasn't my situation, but it's a good thing to look out for.
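One way to hunt for that kind of stray file is to look for regular files (not device nodes or symlinks) under the device directories. A sketch, assuming a find that supports -type f and -size; stray_dev_files and the 1 MB default are hypothetical choices:

```shell
#!/bin/sh
# stray_dev_files DIR [BLOCKS]: list regular files under DIR larger than BLOCKS.
# find -size counts 512-byte blocks, so the default of 2048 means 1 MB.
stray_dev_files() {
    dir=$1
    blocks=${2:-2048}
    # Device entries are block/char nodes or symlinks; a big regular file is suspicious.
    find "$dir" -type f -size +"$blocks" -print
}
```

Run it as stray_dev_files /dev and stray_dev_files /devices; anything it prints deserves a closer look before deleting.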

I eventually found that the Solstice Backup server was creating huge files
in its /nsr/index and /nsr/mm directories. I then copied them with cp -rp to
~adam/nsr_save, made a tar archive with tar -cf - . and ftp'd the data to
another box. Then I ran nsr_shutdown -a to stop the backup server, edited
/etc/vfstab to create another mount point (in my case, I also ran newfs to
create the UFS filesystem and fsck to check its state), mounted it with
mount /dev/dsk/c*t*d*s* /nsr, and copied the data back. /etc/mnttab shows
what is currently mounted.
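A sketch of the copy, vfstab, and mnttab pieces of that procedure, assuming a POSIX sh. The helper names are hypothetical, and the slice c0t0d0s6 in the usage note is only a placeholder for whichever slice you free up:

```shell
#!/bin/sh
# Hypothetical helpers for moving /nsr onto its own slice; stop the backup
# server (nsr_shutdown -a) before copying anything.

# copy_tree SRC DST: copy a directory tree with a tar pipe, preserving permissions.
copy_tree() {
    (cd "$1" && tar cf - .) | (cd "$2" && tar xpf -)
}

# vfstab_entry SLICE MOUNTPOINT: print an /etc/vfstab line for a UFS slice.
# Fields: device to mount, device to fsck, mount point, FS type,
# fsck pass, mount at boot, mount options.
vfstab_entry() {
    printf '/dev/dsk/%s\t/dev/rdsk/%s\t%s\tufs\t2\tyes\t-\n' "$1" "$1" "$2"
}

# is_mounted MOUNTPOINT [TABLE]: succeed if MOUNTPOINT appears as the second
# field of the mount table (/etc/mnttab by default). Old Solaris awk may lack
# -v; nawk can be used there instead.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' "${2:-/etc/mnttab}"
}
```

For example: vfstab_entry c0t0d0s6 /nsr to produce the line for /etc/vfstab (check the slice name first), newfs and fsck the raw device, mount /nsr, copy_tree ~adam/nsr_save /nsr, and finally is_mounted /nsr to confirm.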

Then I rebooted to make sure all was well. It isn't; the partition isn't
large enough ;), but at least I know where to look (/ went down to 23%!).

The backup server can also be stopped and started via
/etc/rc2.d/S95networker stop/start and /etc/init.d/networker stop/start,
but nsr_shutdown -a seemed the most comprehensive. If, by the way, you can't
run any of the Solstice Backup commands, put /usr/sbin/nsr in your path.
There are many gems in this directory; the GUI just sucks.
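A small sketch for the PATH tip, assuming a Bourne-style shell profile; add_to_path is a hypothetical helper that avoids adding the directory twice:

```shell
#!/bin/sh
# add_to_path DIR: append DIR to PATH unless it is already listed (hypothetical helper).
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                            # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

add_to_path /usr/sbin/nsr   # where the Solstice Backup commands live on this setup
```

After that, nsr_shutdown -a stops everything and /etc/init.d/networker start brings it back.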

Moral of the story: if you're going to use Solstice Backup, give it a huge
partition and keep an eye on the contents of /nsr/*.
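A sketch of "keeping an eye on it" that could run from cron, assuming df -k prints the capacity percentage in the second-to-last column, as both Solaris and GNU df do. capacity_of, check_fs, and the thresholds are hypothetical:

```shell
#!/bin/sh
# capacity_of MOUNT: print the capacity percentage df reports, without the % sign.
capacity_of() {
    # Use the second-to-last field of the last line, in case df splits a long
    # device name onto its own line.
    df -k "$1" | awk 'NR > 1 { cap = $(NF - 1) } END { sub("%", "", cap); print cap }'
}

# check_fs MOUNT LIMIT: warn on stderr and return 1 if MOUNT is at or above LIMIT percent.
check_fs() {
    cap=$(capacity_of "$1")
    if [ "$cap" -ge "$2" ]; then
        echo "WARNING: $1 is ${cap}% full" >&2
        return 1
    fi
    return 0
}
```

For example, check_fs /nsr 90 || du -sk /nsr/* would report which part of /nsr is growing whenever it crosses 90%.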

Oh, and it was suggested that I just create a link from /nsr to another
directory. This is easy, but not as fun as bullying the db around the
network, killing pids, and playing with mount points and partitions.
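For the record, the "easy" symlink alternative might look like this sketch, assuming a POSIX sh and that the backup server is already stopped. relocate_with_link is hypothetical, and it keeps the original tree as SRC.old until you are sure the copy is good:

```shell
#!/bin/sh
# relocate_with_link SRC DST: copy the tree at SRC to DST, rename SRC aside,
# and leave a symbolic link SRC -> DST (hypothetical helper).
relocate_with_link() {
    src=$1      # e.g. /nsr
    dst=$2      # e.g. a directory on a roomier filesystem
    mkdir -p "$dst" || return 1
    # Copy with a tar pipe so permissions are preserved across filesystems.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -) || return 1
    mv "$src" "$src.old" || return 1    # keep the original until the copy is verified
    ln -s "$dst" "$src"
}
```

Once the backup server runs cleanly through the link, SRC.old can be removed to actually free the space on /.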

Thanks go to Frank Fiamingo, Steve Boronski, Mark Lundy, and Craig
Ledbetter, and to the 25 others that replied!!!

 -adam

On Wed, Aug 19, 1998 at 08:12:52AM -0600, adam r. christopher wrote:

ORIGINAL QUESTION

> Hello Gurus,
>
> Sunny Chen solicited your advice earlier this week because her root
> filesystem was full - while the timing for this solution was perfect,
> the problem still exists for me. I have three Solaris 2.5 servers and /
> is full on two of them. No core files exist, my /usr and /tmp
> directories are on separate partitions, there's nothing in lost+found,
> and I moved or removed the following files:
>
> /var/log/syslog*
> /var/adm/messages*
> /var/adm/wtmpx
> /var/adm/wtmp
> /var/adm/utmpx
> /var/adm/utmp
> /var/preserve/*
>
> and / is still at 100%. The boxes are about four months old and have
> uptimes of around thirty days. One is not used at all at this point; the
> other serves a flat-file database to a modem pool (logs get written to
> /usr/local/...). Each has only five users. Here's the df -k output for
> each box:
>
> Filesystem            kbytes    used   avail capacity  Mounted on
> /dev/dsk/c0t0d0s0     334448  325185       0   100%    /
> /dev/dsk/c0t0d0s7    1961791  945197  820424    54%    /usr
> /proc                      0       0       0     0%    /proc
> fd                         0       0       0     0%    /dev/fd
> /dev/dsk/c0t0d0s4     491977  230266  212521    53%    /qip
> /dev/dsk/c0t0d0s5     245980      28  221362     1%    /backup
> /dev/dsk/c0t0d0s6     245980       9  221381     1%    /undefined
> swap                  206056    6648  199408     4%    /tmp
> /dev/dsk/c0t0d0s3     961257  673070  192067    78%    /old-usr
>
> and
>
> Filesystem         1024-blocks    Used  Available Capacity  Mounted on
> /dev/dsk/c0t0d0s0       576656  543076          0     100%  /
> /dev/dsk/c0t0d0s1       816975  554806     180479      76%  /usr
> /proc                        0       0          0       0%  /proc
> fd                           0       0          0       0%  /dev/fd
> /dev/dsk/c0t0d0s2       480620       9     432551       1%  /backup
> /dev/dsk/c0t0d0s7       432839       9     389550       1%  /http
> /dev/dsk/c0t0d0s3       480620       9     432551       1%  /undefined
> /dev/dsk/c0t0d0s4       480620       9     432551       1%  /undefined2
> /dev/dsk/c0t0d0s5       480620       9     432551       1%  /undefined3
> swap                    237560    1304     236256       1%  /tmp
>
> Where do I go from here? Thank you very much for your help with this,
> I'll update and/or summarize.

UPDATE

> I've received many responses to my original question (see above), thanks
> to everyone! This list is a godsend. I've received many great tips; I'll
> summarize them as soon as I get this figured out. For now, it's important
> for me to mention that I have rebooted and the files in /var/adm/* and
> the like were not the culprit. Neither was there an errant /dev file.
>
> My current state is this: I've isolated the culprit (I think) to the
> Solstice Backup utility's save set database. The result of a
> du -sk /nsr/* is:
>
> 153 /nsr/cores
> 255821 /nsr/index
> 3 /nsr/logs
> 290 /nsr/mm
> 1 /nsr/rap
> 17 /nsr/res
> 1 /nsr/tmp
>
> from /nsr/index it is:
> 255820 hostname
>
> from hostname:
> 2 README
> 255816 db
>
> from /nsr/mm
> 288 mmvolume
>
> Does anyone have any advice as to how I might change /undefined in
> /etc/vfstab to /nsr and then move the /nsr directory to its own mount
> point without breaking my backup setup? Is there a better way? I've
> already deleted volumes and compressed the database with nsrck and
> nsrmm. No joy. My policy is to browse and retain for seven days, but this
> still leaves me with that huge db file. Hell, am I even on the right track?
>
> I once again thank everyone for their responses! I owe you a Molson.
>
> -adam
> unix/nt sysadmin
> university of denver
>



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:12:46 CDT