SUMMARY: Sun Managers - nfsd's spinning in circles chewing CPU time

From: Alan G. Arndt (aga@Comtech.COM)
Date: Mon Dec 13 1993 - 19:17:51 CST


I must apologize for taking SO LONG to send out a summary but it took
us some time to upgrade and then I wanted to be sure it solved the
problem.

Anyhow, the original text of my first message is included below. The
quick summary is that we upgraded from SunOS 4.1.1 to 4.1.3 and the problem
has been solved. I have no idea what actually solved the problem, and at
this point I don't care, as long as it doesn't happen again. The main
disappointment with the upgrade is that in some circumstances 4.1.3 seems
to be slower for NFS requests than 4.1.1 was. The difference is not large,
and it may only be slower response time for a single PC client while the
overall throughput is actually higher; I have no way of telling.

Most of the responses recommended upgrading, so that is what we did.

Thanks to all (not that many actually) who assisted.

Alan Arndt Comtech Labs
415-813-4500 900 Hansen Way
aga@Comtech.com Palo Alto, CA 94304

>
> Recently we have had a terrible problem with the nfsd's on our server
> getting into a state where all they seem to do is run, accomplishing
> next to nothing.
>
> The System:
> Sun 1+, SunOS 4.1.1, a few minor patches.
> I agree it isn't much of a server, but it's what we have.
>
> Under normal circumstances it works OK for the 30+ PCs that pelt
> it. It will easily handle 100-200 NFS read/write operations a
> second.
>
> In its dead state all the nfsd's are continually running, using up
> 100% of the CPU time. They are only servicing 6-20 NFS read/write
> operations a second. No single event seems to throw the nfsd's into
> this state, and they can stay that way for quite some time (sometimes
> over half an hour). At least the last few times, it appeared that if
> the network was disconnected for a few seconds, the nfsd's recovered
> and worked fine. The server itself is incredibly slow during these
> times: even echoing characters can take 30 seconds, and running a
> command can take several minutes. When the cable is disconnected,
> everything goes back to normal.
>
> So from this we have definitely determined that the network is
> causing the problems. The network itself is not dead; we can
> transfer 600-800 KB/sec between other machines on the net. The nfsd's
> are processing some requests, just at about 1/10 to 1/50 of the rate
> they should. Also, during this dead state the disk is barely being
> accessed, so the server isn't stuck waiting on the disk.
>
> As far as I know we haven't changed anything that would affect
> this, but as always that could be totally wrong. I went as far as
> comparing the kernel and the nfsd binary with old copies to see
> whether they had somehow been corrupted; no luck, they are fine.
>
> I have not noticed any weird messages in the system logs.
>
> So does anyone have any clue? Is there some patch around that might
> solve this? I really have no idea why it started recently.
>
> Thanks a lot for any help you may provide,
> Alan Arndt
> aga@Comtech.com
