On 4 Oct. 1990 I wrote to this list, requesting help with two problems:
(1) NFS daemons on a Sun 4/280 server consumed abnormal amounts of CPU time.
(2) Same server had numerous ``ie0: no carrier'' messages.
I received a number of responses, the gist of which seemed to be as follows.
(1) Abnormal NFS activity.
This can be caused by a process which goes into a ``paging'' loop over
NFS. This happens when an executable image is recompiled while the
previous version of the executable is still running (typically in the
background). Look for a huge value in the PAGEIN field displayed by
``ps -gavx'', for example. This appeared to be the cause of my problem.
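The PAGEIN check described above can be sketched in a few lines of Python. This is only an illustration, not something any respondent supplied. It assumes that the ``ps'' output has a header line containing PID and PAGEIN columns and that COMMAND is the last column. The function name, the threshold, and the sample listing are all made up for the example.

```python
def find_heavy_pagers(ps_output, threshold=10000):
    """Return (pid, pagein, command) for rows whose PAGEIN exceeds threshold.

    Assumes whitespace-separated columns named by the first non-blank line,
    with the command as the final column. The threshold is arbitrary.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    pagein_col = header.index("PAGEIN")
    hits = []
    for line in lines[1:]:
        # Split into at most len(header) fields so that a command
        # containing spaces stays in one piece.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # row does not match the assumed layout
        try:
            pid = int(fields[pid_col])
            pagein = int(fields[pagein_col])
        except ValueError:
            continue  # non-numeric field; skip rather than guess
        if pagein > threshold:
            hits.append((pid, pagein, fields[-1]))
    return hits


# Made-up listing for illustration only; real column layout may differ.
SAMPLE = """\
  PID TT STAT  TIME SL RE PAGEIN SIZE  RSS   LIM %CPU %MEM COMMAND
  101 co S     0:01  5 99     12  200  120    xx  0.0  0.5 csh
  202 p0 R   120:44  0  9 987654 9000  600    xx 85.0  2.1 a.out -big run
"""

if __name__ == "__main__":
    for pid, pagein, command in find_heavy_pagers(SAMPLE):
        print(pid, pagein, command)
```

In practice one would feed it the captured text of ``ps -gavx'' and then inspect (or kill) the processes it reports.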
It was also mentioned that there are various utilities which help with
NFS problems, including Sun's ``traffic'' (a Sunview utility, rumored
to be somewhat undependable), and NFSwatch, written by Dave Curry at
SRI. I was told that NFSwatch is available for anonymous ftp from
syn-gate-gw.synoptics.com, in either ~ftp/pub or ~ftp/tmp. I was unable
to reach that machine, so I can't comment on the utility.
(2) ie0: no carrier
Someone pointed out that this error message is mentioned in the manual
entry for ``ie''. Most people noted that this can be caused by flaky
connections, typically but *not necessarily* on the machine where the
error is reported. Several people recommended using some kind of a network
analyzer, which I think is a good idea, but which I haven't yet done.
Other people mentioned that this message can be caused simply by an
overloaded network.
Below I've listed the individuals who responded, along with a very short
description of each response. Please contact me if you want the unexpurgated
version of any of these messages. Also, if you responded, and you feel I've
badly misrepresented your position, please send me a note.
(1) Steven Blair (sblair@synoptics.com)
Get NFSwatch from syn-gate-gw.synoptics.com.
(2) Ron Vasey (uunet!mcc.com!vasey)
Maybe a leak in the network caused by poor connections or grounding
problems; might want to do TDR testing.
(3) Laura Pearlman (pearlman@rand.org)
UFS uses lots of the server's memory; problem is related to file
caching in SunOS; use vmstat and look for large ``sr'' number.
File reference count goes to zero after *each* NFS write; SunOS
will examine every page in cache; bad if large files; use vmstat
and look for large ``at'' (~10,000).
SunOS is very inefficient in file lookups if any single component
of the pathname is more than 15 characters in length.
(4) Bill Eshelman (wde@agen.ufl.edu)
DB-15 connector prongs are spaced wrong, preventing full contact.
(5) Daniel Trinkle (trinkle@cs.purdue.edu)
Could be loose transceiver cable or just a busy network. Network
General Sniffer does good job in such cases.
Can also try ``traffic'', ``emon'' (a program used at Purdue
and elsewhere), or ``nfswatch'' to monitor network via software.
(6) Roland Schneider (sch@eeserv.ee.umanitoba.ca)
Probably caused by excessive paging; look at ``ps -gavx'', find
large PAGEIN, then ``kill -9''. Caused by overwriting an executable
while it's running.
(7) Joe Pruett (tessi!joey@nosun.West.Sun.COM)
Check for cable problem with ``netstat -i''; look for error count
greater than 1%.
(8) Ed Morin (mdisea!fh20c!edm@uunet.UU.NET)
Transceiver cables don't fit well on Sun servers; remove washers
under binding posts on the cable for a better fit.
(9) Bala Vasireddi (bala@sysopsys.com)
Problem may be caused by recompiling executable while previous version
is still running. Use ``traffic'', with ``src'' and ``dest'' in
different windows; then run ``ps -aux'' and compare MEM and RSS cols.
RSS will be small (600k) compared to MEM (10MB).
(10) Colin Alison (colin@cs.st-andrews.ac.uk)
The ``no carrier'' message can be caused by aggressive network traffic.
They stopped running rwhod to get rid of it.
(11) Alastair Young (alastair@es2.co.uk)
``ie0: no carrier'' can be indicative of dodgy hardware causing
dropped packets. Analyze using a network monitor/reflectometer.
(12) Ron Madurski (ron@DRD.Com)
Make sure there are no gettys trying to get a non-existent terminal
to log in. Try to isolate section of network that's giving errors;
maybe use a Lanalyzer.
(13) Debbie Eckel (deckel@relay.nswc.navy.mil)
She has similar problems. Points out that man page for ie describes
the problem.
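The ``netstat -i'' check suggested in response (7) can be sketched as follows. This is a rough illustration under stated assumptions: that the output has a header line with Name, Ipkts, Ierrs, Opkts and Oerrs columns, and that "error count greater than 1%" means errors as a fraction of packets in each direction. The function name and the sample listing are invented for the example.

```python
def flag_error_rates(netstat_output, limit=0.01):
    """Return (interface, input_rate, output_rate) where either rate > limit.

    Rates are Ierrs/Ipkts and Oerrs/Opkts. This reading of the "1%" rule
    is an assumption, not something stated in the original response.
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    wanted = ("Name", "Ipkts", "Ierrs", "Opkts", "Oerrs")
    col = {name: header.index(name) for name in wanted}
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # row does not match the assumed layout
        try:
            ipkts = int(fields[col["Ipkts"]])
            ierrs = int(fields[col["Ierrs"]])
            opkts = int(fields[col["Opkts"]])
            oerrs = int(fields[col["Oerrs"]])
        except ValueError:
            continue  # non-numeric counter; skip rather than guess
        in_rate = ierrs / ipkts if ipkts else 0.0
        out_rate = oerrs / opkts if opkts else 0.0
        if in_rate > limit or out_rate > limit:
            flagged.append((fields[col["Name"]], in_rate, out_rate))
    return flagged


# Made-up listing for illustration only; real column layout may differ.
SAMPLE_NETSTAT = """\
Name  Mtu  Net/Dest   Address    Ipkts   Ierrs  Opkts   Oerrs  Collis Queue
ie0   1500 net-a      server     200000  5000   180000  90     1200   0
lo0   1536 loopback   localhost  4000    0      4000    0      0      0
"""

if __name__ == "__main__":
    for name, in_rate, out_rate in flag_error_rates(SAMPLE_NETSTAT):
        print(name, "in: %.2f%%" % (100 * in_rate), "out: %.2f%%" % (100 * out_rate))
```

An interface that shows up here is only a hint; as several responses note, the bad connection is not necessarily on the machine reporting the errors.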
In addition, there were a few other people who responded that they had
experienced similar problems. I hope that this summary has answered some of
their questions.
- Mike
-----------
Mike Hannon mike@ucdhep (Bitnet)
ucdhep::mike (HEPnet) 42385::mike (HEPnet)
mike@ucdphy.ucdavis.edu (Internet) 916-752-4966 (Telephone)