SUMMARY: NFS (UDP) Loading of MP690

From: Stephen Miller (stevem@csdc02.orl.mmc.com)
Date: Fri Jul 23 1993 - 22:16:34 CDT

Hey All:

I'm posting this summary quite late; my apologies. I got a lot of enlightening
responses, which cured the problem. It was actually quite trivial, with the
proper tools.

Original problem:

> HELP! My main server, a MP690/140, has developed a serious problem over the
> last couple of days. I've just rebooted the machine and, as I'm here late
> at night, the network is stable, with little activity. Yet, the machine has
> about a 20% load. I think I recall this should be around 2%.
>
> During the day, with the network lightly loaded, I get the following amd
> errors, repeated many times; and from all the various workstations.
> As the machine hosts NIS and AMD, this problem is rendering my
> network unusable!
>
> NFS server amd:141 not responding still trying
> NFS server amd:141 ok

The first correct response (with many others to follow) was from
mikem@ll.mit.edu (Michael J Maciolek):

*It appears that you have a rogue client that's making an abnormally large number
*of NFS requests; could be some process stuck in a loop somewhere on another
*client that's flooding you with 'stat' requests or some such.
*
*Step one is to use a network monitoring tool...

Right on! I already had top, but that didn't really uncover the problem. It
just told me that there were many nfsd's running, hogging the CPU. What I
lacked was a P/D (public-domain) tool called nfswatch, as
chuck-strickland@orl.mmc.com pointed out. Thanks Chuck!

A particular user had 11 processes running on several machines. These were put
in the background with a nice level of 10. As I quickly learned, this doesn't
always guarantee safety. The process was designed to do very little I/O. It
would do a scanf, perform its function, flush, and then repeat. The problem
was, when the process finished its work, it didn't exit. Worse yet, it kept
doing scanf, flush, over and over, while only checking for a 0 condition - NOT
the EOF! That was it!
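
For anyone who hasn't been bitten by this one: I don't have the user's source
to post, so what follows is only a minimal sketch of the fix, assuming the
program read integers with a "%d" format. The names process_values, in and out
are made up for illustration, and the doubling is a stand-in for the real work.
The broken pattern was roughly "do { scanf(...); work; fflush(stdout); } while
(value != 0);". At end of input scanf returns EOF at once and leaves value
untouched, so that loop never ends. Checking scanf's return value stops it:

```c
#include <stdio.h>

/*
 * Hypothetical worker loop (not the user's actual code).
 * Reads integers from `in` until a 0 sentinel, end of input, or bad input.
 * Returns the number of values processed.
 */
int process_values(FILE *in, FILE *out)
{
    int value;
    int processed = 0;

    for (;;) {
        int rc = fscanf(in, "%d", &value);

        if (rc == EOF)      /* end of input or read error: stop */
            break;
        if (rc != 1)        /* non-numeric input: stop rather than spin */
            break;
        if (value == 0)     /* the original (and only) exit condition */
            break;

        fprintf(out, "%d\n", value * 2);   /* stand-in for the real work */
        fflush(out);
        processed++;
    }
    return processed;
}
```

Called as process_values(stdin, stdout), this returns as soon as the input runs
dry, instead of hammering the server with reads that can never succeed.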

It turns out that I could have used etherfind -i le0 dst 192.149.52.1 to learn
what was flooding this server, but nfswatch 4.0 is really trick!

Shelley L. Shostak <742123506@duke.cs.duke.edu> reports that nfswatch could be
found from ecn.purdue.edu.

P.S. I'm using a new mail tool, zmail, so if this comes out "funny",
my apologies!

Many thanks to:

"Malcolm C. Strickland" <chucks@orl.mmc.com>
mikem@ll.mit.edu (Michael J Maciolek)
Mark Herberger <mherberg@eve044.cpd.ford.com>
tommy@boole.uucp
"Ric Anderson" <ric@cs.arizona.du>
root@ewi.ch Christoph Rothlin
rwolf@dretor.dciem.dnd.ca
kowal@ide.com
perryh@pluto.rain.com (Perry Hutchison)
Arie Bikker <aribi@geo.vu.nl>
<742123506@duke.cs.duke.edu> Shelley L. Shostak
Jeff Mallory <jeff@access.digex.net>

-- 
--
_/_/_/  Stephen Miller
/_/_/ stevem@csdc02.orl.mmc.com
_/_/ Martin Marietta Information Systems 
  / Orlando, Florida 32811-0385
 / Voice: 407/826-1348 Fax: x6230 
--------------------------------------------------------------------------------

This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:08:03 CDT