[I know I misspelled "boot". I got more responses about that than
helpful information. I am leaving it that way so those who match
requests and summaries can find this. -tep]
The problem turned out to be hardware, but not the 3/50. I replaced:
   - the transceiver (with a new tap)
   - the computer, first with another 3/50, then with a SPARCstation IPC
The symptoms came down to the fact that no system tapped in at that
location could receive a packet from any machine more than about 300
feet away on the cable! Machines on either side of the "ethernet
triangle" could talk to each other with no problems!
It seems that on the morning the problems started, the building
and/or the net was hit by lightning. It blew three power supplies and the
Ethernet interface on a system three buildings away (but on the same
Ethernet cable). They replaced the supplies and the Ethernet interface
board on that system, but it still couldn't see the net.
When they lifted the false floor, they could *smell* the remains of
that system's transceiver!
When they replaced that transceiver, all of the problems in our
building went away.
Boy, am I confused!
----------------Original Article:------------------
Sender: eecs.nwu.edu!sun-managers-relay@ucsd.EDU
From: tots!tots.Logicon.COM!tep@ucsd.EDU
Date: Wed, 20 Mar 91 15:16:11 PST
Reply-To: ucsd!tots.logicon.com!tep
X-Organization: Logicon, Inc., San Diego, California
OK, it's been a long day, and I'm still stuck.
Environment: one Sun 3/180 server, four 3/50 clients, SunOS 3.5.
I came in this morning and my workstation (galt) was screenblanked and
did not respond to anything (including L1-a). It behaved as though the
server was down, but I checked the server (it was up) before I rebooted galt.
The other three clients are fine, although I have *not* tried to
reboot them.
When trying to boot, galt never got any response to his RARP request.
I watched with etherfind -rarp on the server: I saw the RARP requests
from galt, but no responses to them.
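For reference, the request etherfind was showing is a broadcast Ethernet frame (type 0x8035, per RFC 903) in which the client gives its own Ethernet address and asks a server to supply its IP address. The Python sketch below only illustrates that frame layout; it is not SunOS code, and the function name build_rarp_request and the example MAC address are hypothetical.

```python
import struct

ETHERTYPE_RARP = 0x8035   # Ethernet type assigned to RARP (RFC 903)
BROADCAST = b"\xff" * 6   # RARP requests go to the Ethernet broadcast address

def build_rarp_request(client_mac: bytes) -> bytes:
    """Hypothetical helper: build a RARP 'request reverse' (opcode 3) frame."""
    if len(client_mac) != 6:
        raise ValueError("Ethernet address must be 6 bytes")
    # Ethernet header: destination, source, type
    eth_header = BROADCAST + client_mac + struct.pack("!H", ETHERTYPE_RARP)
    rarp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                        # hardware type: Ethernet
        0x0800,                   # protocol type: IP
        6,                        # hardware address length
        4,                        # protocol address length
        3,                        # opcode 3 = request reverse
        client_mac, b"\x00" * 4,  # sender: own Ethernet address, IP unknown
        client_mac, b"\x00" * 4,  # target: own Ethernet address, IP for server to fill in
    )
    return eth_header + rarp_body

# Example with a made-up Sun-style Ethernet address
frame = build_rarp_request(bytes.fromhex("080020010203"))
print(len(frame), frame.hex())
```

A server running rarpd would answer with opcode 4 (reply reverse), carrying the client's IP address in the target protocol address field; in galt's case, only the opcode 3 requests ever appeared on the wire.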
The portmapper, ypserver, ypbind, rarpd, inetd, rpc.lockd, rpc.statd,
etc. were all running on the server. The /etc/services, /etc/servers,
and /etc/rpc files are all over 1 month old.
I have rebooted the server, and I have tried swapping in an older inetd
from one of the other servers and then restoring the original (no effect).
I checked /tftpboot; the directory has not been modified since before the
clients' last successful boots. It was last changed three months
ago, and the clients have all booted in the last three days. The in.tftpd
and ndboot.* files all show the distribution date (Nov 87).
The /etc/nd.local file is also three months old. I can mount the
client's root on the server, and it fsck'ed OK.
I started a rarpd on one of the other clients, and now the poor galt
machine knows its internet address, but the server fails to respond to
its tftp boot requests! I now have 9 in.tftpd daemons on the server.
Apparently each tftp daemon gets started but never responds, and
another one gets started when the client times out and retries.
ypcat of the ethers and hosts maps shows everything A-OK. The ethers
file changed 2 months ago, and some blank lines were removed from the
hosts table this morning (restoring yesterday's hosts and ethers files
and then remaking the YP maps made no difference).
The server has old disks and crashed recently with no apparent damage.
We see occasional "disk sequencer error" messages.
The server was re-booted this morning to install a new kernel (more
text table entries). I have the same problems with the old and new kernels.
What has happened to the server that has caused it to lose the ability
to boot this client? I am afraid to take the other clients down, as I
doubt that they would reboot.
I can't find any configuration errors; is it possible that some
critical piece of software has become corrupt on disk? What could
make both rarpd and tftp booting fail, but in different ways? Remember
that inetd *is* starting tftpd, but tftpd never seems to respond;
it just hangs.
*Sigh* The network *might* be the computer :-(
--------------End Original Article--------------------
Tom Perrine (tep) |Internet: tep@tots.Logicon.COM
Logicon - T&TSD | UUCP: sun!suntan!tots!tep
P.O. Box 85158 |GENIE: T.PERRINE
San Diego CA 92138 |Voice: +1 619 455 1330
"Harried: with preschoolers" | FAX: +1 619 552 0729