Original question:
>We have an S4 (64MB) serving two diskless S4's (32MB each) and two diskless
>S20MPs (64 and 128MB). The clients have a local swap disk (512Meg internal). All
>systems run SunOS 5.4 (full install), 101945-32 and other recommended patches.
>The problem is that the clients have serious performance problems. It can take
>weeks, but there comes a moment when the response becomes extremely slow. From
>then on you can hear the local swap disk working nonstop, indicating
>memory problems. A ps -el shows nothing unusual (it takes minutes before the
>output is on the screen). It just seems that there is less RAM available than
>installed...
>I asked Sun, but they have not been able to help so far. Rebooting the system
>is my only solution.
>I suspect a memory leak somewhere. What is the best way to identify the
>responsible process? (Maybe it is the kernel.)
>How can you tell how much physical memory (RAM) is unused?
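For the last question, sysconf(3C) reports both installed and currently available physical pages on systems that provide the _SC_PHYS_PAGES and _SC_AVPHYS_PAGES names (Solaris and Linux do; not every system does). A minimal Python sketch under that assumption; physical_memory_bytes is a hypothetical helper name. Free memory alone only shows that RAM is short, not which process or kernel cache is holding it.

```python
import os


def physical_memory_bytes():
    """Return (installed, currently_free) physical memory in bytes.

    Assumes the platform exposes SC_PHYS_PAGES and SC_AVPHYS_PAGES
    through sysconf; os.sysconf raises ValueError if a name is unknown.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    total_pages = os.sysconf("SC_PHYS_PAGES")
    free_pages = os.sysconf("SC_AVPHYS_PAGES")
    return total_pages * page_size, free_pages * page_size


if __name__ == "__main__":
    total, free = physical_memory_bytes()
    print(f"installed: {total // 2**20} MB, currently free: {free // 2**20} MB")
```

Sampling this periodically (for example from cron) and watching the free figure shrink over weeks would show the trend described above, but not its cause.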
It turned out to be caused by the bugs
1176873 libvolmgt has a "file descriptor leak"
1177560 volume driver leaks memory
which were fixed in patch 101907-03 (we had 101907-02 installed). The
current revision is 101907-09.
It may be worth noting that the leak only occurs (in our case at least)
when the volume manager is loaded (which is the default) and the file
manager is used, and only if there are NO devices present that interact
with the volume manager, such as CD-ROM drives and floppy drives. This
explains why some of our machines had problems and others did not.
Casper Dik pointed out a method to see if the kernel might have a memory leak:
lauden% crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> kmastat
                      buf    buf    buf    memory  #allocations
cache name           size  avail  total    in use   succeed  fail
------------------  -----  -----  -----  --------  --------  ----
kmem_slab_cache        32    104    381     12288       827     0
kmem_bufctl_cache      12    225   1524     24576      3739     0
kmem_alloc_8            8     45   1524     12288     41958     0
kmem_alloc_16          16    165   1016     16384    297024     0
kmem_alloc_320        320      4  67308  22974464     67831     0
---- lots of lines ----
Total                   -      -      -  27029504  60085873     3
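This means that the kmem_alloc_320 cache has grown to 67308 buffers of
320 bytes, only 4 of them free, taking 22974464 of the 27029504 bytes
(over 80%) in use by the kernel memory allocator. No wonder the system had
performance problems! After applying 101907-09 we get
kmem_alloc_320        320      6     12      4096        92     0
and these numbers have remained constant for days.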
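The reading of the kmastat table above (one cache holding most of the kernel allocator's memory, and a leak showing up as a cache whose buffer count keeps growing) can be automated. Below is a minimal Python sketch, assuming the seven-column row layout shown above; the function names are hypothetical, and SAMPLE contains only the rows quoted in this message, not a full kmastat listing.

```python
# Sketch: parse `kmastat` output from crash(1M) and flag suspicious caches.
# Row layout (assumed from the output above): cache name, buf size,
# buf avail, buf total, memory in use, #allocations succeed, fail.

SAMPLE = """\
kmem_slab_cache        32    104    381     12288       827     0
kmem_bufctl_cache      12    225   1524     24576      3739     0
kmem_alloc_8            8     45   1524     12288     41958     0
kmem_alloc_16          16    165   1016     16384    297024     0
kmem_alloc_320        320      4  67308  22974464     67831     0
Total                   -      -      -  27029504  60085873     3
"""


def parse_kmastat(text):
    """Return ({cache_name: stats}, total_bytes_in_use) from kmastat text."""
    caches, total_in_use = {}, None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # headers and "lots of lines" markers
        if fields[0] == "Total":
            total_in_use = int(fields[4])
        elif fields[1].isdigit():  # skips the "-----" separator row
            size, avail, total, in_use, ok, fail = map(int, fields[1:])
            caches[fields[0]] = {
                "buf_size": size, "buf_avail": avail, "buf_total": total,
                "in_use": in_use, "succeed": ok, "fail": fail,
            }
    return caches, total_in_use


def dominant_caches(caches, total_in_use, threshold=0.5):
    """Caches using at least `threshold` of all allocator memory, largest first."""
    shares = ((name, c["in_use"] / total_in_use) for name, c in caches.items())
    return sorted((item for item in shares if item[1] >= threshold),
                  key=lambda item: item[1], reverse=True)


def growing_caches(earlier, later):
    """Names of caches whose total buffer count grew between two snapshots."""
    return sorted(name for name, c in later.items()
                  if name in earlier and c["buf_total"] > earlier[name]["buf_total"])


if __name__ == "__main__":
    caches, total = parse_kmastat(SAMPLE)
    for name, share in dominant_caches(caches, total):
        print(f"{name}: {share:.1%} of kmem memory in use")
```

On SAMPLE this prints `kmem_alloc_320: 85.0% of kmem memory in use`. To watch for a leak rather than a single snapshot, save kmastat output at intervals (days apart, as was done here after patching) and pass consecutive parses to growing_caches; a cache that appears there every time is the one to investigate.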
Thanks to all who responded!
Peter
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:51 CDT