My original posting was:
> We have occasionally had processes lock on disk access -
> a couple of times it was nemacs, a couple of times it was
> an NFS disk read. Each time the STAT field of ps was 'D'
> (Process in disk (or other short term) waits). But the
> process never seems to come out of it. It is also unkillable.
> So far, the only way I've found to get rid of such processes
> is to reboot. Is there any other solution?
>
> Vital statistics:
> System: Sun4/370 server + several sparcstation1's
> O/S: SunOS 4.0.3c
>
Well, just as I feared, the only way to get rid of the process
seems to be to reboot. Someone suggested trying to kill the
rpc.lockd's on both machines. Haven't had a chance to try it yet.
I think I did get the cause of the problem, though.
from jstewart@rodan.acs.syr.edu:
> Oftentimes, especially with emacs flavours, we've found it is because
> the partition is full, or the account using it is full. If that's so,
> then freeing up space works very well.
and from stern@sunne.east.sun.com:
> there are many NFS bugs in 4.0.3[c] that cause processes
> to hang -- most of them have to do with the NFS client code
> going to sleep waiting for a page that was already freed up.
>
from jan@eik.ii.uib.no:
> - take much greater care to keep filesystems < 90% full.
> (this may be worth checking. I cannot remember seeing the
> nfsd's run into disk-wait except when there were very full
> filesystems.)
It is very probable that the problem is related to the disk being full.
I think the locked processes coincided with the disk going above 98%
full.
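
For anyone who wants to watch for this, here is a rough sketch of a
check against jan's 90% figure. It assumes a present-day Unix host
with Python 3 (it is not something that would run on SunOS 4.0.3c),
and the mount points listed are placeholders, not our real ones.

```python
import os

# Hypothetical mount points; substitute the filesystems you care about.
MOUNT_POINTS = ["/", "/usr", "/home"]


def percent_full(path):
    """Return how full the filesystem holding `path` is, in percent.

    Counts only the space available to ordinary users (f_bavail),
    which is roughly how df computes its capacity column.
    """
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    usable = used + st.f_bavail
    if usable == 0:
        return 0.0
    return 100.0 * used / usable


def over_threshold(usage, limit=90.0):
    """Given {mount_point: percent_full}, return the (mount, percent)
    pairs at or above `limit`, fullest first."""
    flagged = [(m, p) for m, p in usage.items() if p >= limit]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    usage = {}
    for mount in MOUNT_POINTS:
        try:
            usage[mount] = percent_full(mount)
        except OSError:
            continue  # mount point does not exist on this host
    for mount, pct in over_threshold(usage):
        print("%s is %.1f%% full" % (mount, pct))
```

Run from cron, something like this would at least give a warning
before a partition gets to the point where processes start hanging.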
Solutions suggested are:
  - Upgrading to 4.1.1
  - Getting the "NFS Jumbo Patch for 4.0.3" from Sun, which
    includes about 17 different bug fixes for this
    and related problems.
  - Getting the lockd-patch
  - Freeing up disk space.
I'm looking into these options now.
Upgrading to 4.1.1 may help, although someone seems to be having
similar problems with it:
> Hi Sandra. We just installed three new Sparc2 fileservers running 4.1.1
> and have begun to experience the same problem. Occasionally, NFS reads
> will cause the nfsd processes to go into disk wait; one by one, all 8
> of our nfsd's succumb. Our installation is about as vanilla as it
> comes - pre-installed 4.1.1B. I don't have any solutions for you,
> except to say that we haven't seen the problem in a couple of days...
> We never experienced any of these troubles before we started playing with
> automount -- possibly connected? I don't know.
Thanks to the following people for responding:
holle@asc.slb.com
jan@eik.ii.uib.no
jstewart@rodan.acs.syr.edu
jnapier@ucsd.edu
stern@sunne.east.sun.com
oconnor!sbcoc.com!miker@oddjob.uchicago.edu
sheryl@gwusun.gwu.edu
tyen@mundo.eco.utexas.edu
kevin.sheehan@fourx.aus.sun.com
cdr@acc.stolaf.edu
brett@den.mmc.com
hermit@pcs.cnc.edu
kirk@zabriskie.berkeley.edu
sdb%hotmomma@uunet.uu.net
Thanks, sandra
email: sandra@digital.co.jp (We are not DEC!)