I wrote:
Our configuration: Sun3s (3/280s and 3/180s) running 3.2 on one subnet,
sun4s (mostly IPCs) running 4.1.1b on the other subnet. The sun4s have
some Archive DAT drives on them. Routing between the two nets is via
two gateways, one an IPC, one a sun3/180.
Until a few days ago we were successfully using dump to do backups on the
sun3s on net A, writing to DATs on sun4s on net B. Suddenly, during the
dump of the first filesystem in the list (in phase IV, after some 70% of
the dump is done), I get "Lost connection to remote host", and dump dies.
I have tried the following to fix this problem:
1. Changed tape host on net B to a different sun4.
2. Changed ethernet route between nets A and B to the sun3 gateway host
(it was routing via the sun4 gateway host).
3. Changed dump host on the net A side from one sun3 to another.
None of these attempts changed the behaviour at all. Any suggestions are
appreciated; I haven't gotten a successful dump in three days, and I'm
getting a little worried. Thanks!
*lindy*
It seemed to boil down to a flaky ethernet board; a reboot fixed it for
now.
I got several helpful replies. Thanks for the suggestions!
Here they are:
Date: Thu, 14 Nov 91 16:19:29 GMT
From: eeimkey@eeiua.ericsson.se (Martin Kelly)
To: lindy@olsen.ch
Subject: Re: dumps across subnets just broke -- help!
There are 5 types of lost-connection error:
#define ENETRESET 52 /* Network dropped connection on reset */
#define ECONNABORTED 53 /* Software caused connection abort */
#define ECONNRESET 54 /* Connection reset by peer */
#define ETIMEDOUT 60 /* Connection timed out */
#define ECONNREFUSED 61 /* Connection refused */
Echo the status value to see which type of connection error you are
getting: maybe it's an ETIMEDOUT, which may indicate that some part of
the network is unreachable. Also, you could switch the order of the
filesystems that you dump. Shouldn't you be using rdump, with the
relevant file servers listed in the dumping server's .rhosts file?
We do not have either 3/280s or a DAT system, so that's all I can
say, I'm afraid.
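A minimal sketch of the lookup Martin suggests, assuming the status value you echo is one of the errno numbers quoted above. The numbers are copied from the #define lines in his message; other systems may number them differently, and the names SUNOS_CONNECTION_ERRORS and describe_status are made up for this illustration.

```python
# errno values copied from the #define lines quoted in the message above.
# Assumption: the status value reported is one of these errno numbers.
SUNOS_CONNECTION_ERRORS = {
    52: ("ENETRESET", "Network dropped connection on reset"),
    53: ("ECONNABORTED", "Software caused connection abort"),
    54: ("ECONNRESET", "Connection reset by peer"),
    60: ("ETIMEDOUT", "Connection timed out"),
    61: ("ECONNREFUSED", "Connection refused"),
}


def describe_status(status):
    """Return 'NAME: description' for a listed status, else a fallback string."""
    entry = SUNOS_CONNECTION_ERRORS.get(status)
    if entry is None:
        return "unknown status %d" % status
    name, text = entry
    return "%s: %s" % (name, text)


if __name__ == "__main__":
    # 1 is included only to show the fallback for an unlisted value.
    for code in (54, 60, 1):
        print(code, describe_status(code))
```

An ETIMEDOUT (60) would point toward the network path, which fits the unreachable-network case Martin mentions.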
From: David Fetrow <fetrow@orac.biostat.washington.edu>
Subject: Re: dumps across subnets just broke -- help!
To: lindy@olsen.ch (Linda Foster)
Is there anyone else involved in your network? Around here we sometimes
get upwards of 12 hours notice that the routers are being changed.
--
-dave fetrow@orac.biostat.washington.edu
From: stu%ccse@hub.ucsb.edu
To: lindy@olsen.ch (Linda Foster)
Subject: Re: dumps across subnets just broke -- help!
After you get the routing problem resolved, remember to look for any orphaned processes left on the remote and local systems when your network lost its connections.
------------------------------------------------------------------------------
% ps -aux | egrep "rdump|rmt|rsh|perl"
operator 26624  0.0  2.1   56  140 ?  S    05:33   0:00 /etc/rmt
operator 26618  0.0  0.0   28    0 ?  IW   05:33   0:00 sh -c rsh sys1 '/etc
operator 26619  0.0  2.9   36  200 ?  S    05:33   0:00 rsh sys1 /etc/rdump
operator 25707  0.0  0.0 1836    0 ?  IW   04:00   0:03 /usr/local/bin/perl -s
operator 26621  0.0  0.0   72    0 ?  IW   05:33   0:00 csh -c /etc/rmt
operator 26631  0.0  2.0   56  136 ?  S    05:34   0:00 /etc/rmt
------------------------------------------------------------------------------
Good luck. Hope that helps.
Stu Swartz
Computer Facilities Manager
Center for Computational Sciences and Engineering (CCSE)
University of California, Santa Barbara
3111 Engineering I
Santa Barbara, CA 93106
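The check for leftover processes that Stu describes could be scripted roughly as follows. This is only a sketch: it assumes ps -aux prints the 11 columns seen in his listing, reuses his egrep pattern, and the function name find_leftovers and the inetd sample line are hypothetical.

```python
import re

# Command names taken from the egrep pattern in Stu's message.
PATTERN = re.compile(r"rdump|rmt|rsh|perl")


def find_leftovers(ps_lines):
    """Return (pid, command) pairs for ps -aux lines whose command matches.

    Assumes 11 whitespace-separated columns: user, pid, %cpu, %mem, sz,
    rss, tt, stat, start, time, command. Lines whose start column holds
    a date with a space in it will not split correctly.
    """
    found = []
    for line in ps_lines:
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unexpected
        command = fields[10]
        if PATTERN.search(command):
            found.append((int(fields[1]), command))
    return found


# Two lines from the listing above plus one hypothetical unrelated process.
SAMPLE = [
    "operator 26624 0.0 2.1 56 140 ? S 05:33 0:00 /etc/rmt",
    "operator 26619 0.0 2.9 36 200 ? S 05:33 0:00 rsh sys1 /etc/rdump",
    "root 120 0.0 0.0 52 0 ? IW 04:00 0:01 /usr/etc/inetd",
]

if __name__ == "__main__":
    for pid, command in find_leftovers(SAMPLE):
        print(pid, command)
```

Whether a matching process is actually orphaned still needs a human look; the script only narrows the list.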
From: wallen@cogsci.UCSD.EDU (Mark R. Wallen)
To: lindy@olsen.ch
Subject: Re: dumps across subnets just broke -- help!
I had a similar problem on a single cable. Remotely dumping to a Sun 4/280 tape host suddenly seemed to take forever, and eventually for some hosts (DECstation 3100s), the dump would finally time out and fail. My problem turned out to be a bad transceiver cable--it was making only marginal contact at one end. A new/different cable fixed the problem.
What changed 3 days ago when you started having the problem (or perhaps over the weekend)? Anyone bump cables?
Mark Wallen
Cogsci, UCSD
From: feigin@inf.ethz.ch
To: " (Linda Foster)" <lindy@olsen.ch>
Subject: Re: dumps across subnets just broke -- help!
Are you running Sun's routed ? Very bad news indeed, especially when you are running parallel gateways to the same network. It doesn't work very well, if at all. The effects of running in parallel with SunOS 3.2 and 4.1.1 may also be part of the problem....
Try bringing down one of the interfaces on one of the gateways, killing routed on it, flushing the routing tables, and then starting routed in quiet mode (caveat: Sun has a funny idea as to what 'quiet' means) on that machine. Then retry your dumps.
If you're going to be running parallel gateways in production mode, dump routed and install gated; it works MUCH better in this type of situation....
Hope this helps...
/AWF