SUMMARY: Problem with Solaris 2 backup to a remote tape unit.

From: Jeff Graves (jgraves@eng.auburn.edu)
Date: Thu Aug 12 1993 - 04:04:29 CDT


Sun Managers:

ORIGINAL QUESTION:

   A few days ago we changed one of our servers from an ELC running SunOS 4.1.3 to
a Classic running Solaris 2.2. This machine is also the tape host for an Exabyte
used primarily for backups. Since the change, I have not been able to run rdump/ufsdump
specifying this machine as the remote tape host, except when running rdump/ufsdump from
root which we don't want to do. If I run rdump/ufsdump specifying the backup file as
local rather than remote, then this problem does not occur. If I run dump to a disk
file instead of the tape drive, I get the same results. The following output listing
illustrates the problem:

armstrong.eng.auburn.edu{jgraves}1: /usr/sbin/ufsdump 5ucsf 23000 /dev/rmt/0cn /
.
.
  DUMP: Dumping /dev/rdsk/c0t3d0s0 (/) to /dev/rmt/0cn
.
.
.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: level 5 dump on Fri Aug 6 15:46:43 1993
  DUMP: 19518 blocks (9.53MB) on 1 volume
  DUMP: DUMP IS DONE
armstrong.eng.auburn.edu{jgraves}2: /usr/sbin/ufsdump 5ucsf 23000 armstrong:/dev/rmt/0cn /
.
.
  DUMP: Dumping /dev/rdsk/c0t3d0s0 (/) to /dev/rmt/0cn on host armstrong
.
.
.
  DUMP: Protocol to remote tape server botched (in rmtgets).
rdump: Lost connection to remote host.
  DUMP: Bad return code from dump: 1
armstrong.eng.auburn.edu{jgraves}3: /bin/su
Password:
# /usr/sbin/ufsdump 5ucsf 23000 armstrong:/dev/rmt/0cn /
.
.
  DUMP: Dumping /dev/rdsk/c0t3d0s0 (/) to /dev/rmt/0cn on host armstrong
.
.
.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: level 5 dump on Fri Aug 6 17:41:25 1993
  DUMP: 19524 blocks (9.53MB) on 1 volume
  DUMP: DUMP IS DONE
#

[the remaining part of the listing deleted]

Any ideas, hints, RTFM pointers, etc, will be greatly appreciated.

Jeff Graves
jgraves@eng.auburn.edu

THANKS TO:

From: trinkle@cs.purdue.edu (Daniel Trinkle)
From: "Richard J. Niziak" <rickn@copley.com>
From: jandoo@thijssen.nl (Jan van Doorn)
From: Kelly G. Price <Kelly.Price>
From: poffen@sj.ate.slb.com (Russ Poffenberger)
From: Ian_MacPhedran@engr.usask.ca (Ian MacPhedran)
From: sdo@phoebus.larc.nasa.gov (Sharon O. Beskenis)
From: vasey@issi.com
From: hoogs@alc.com
From: hkatz@nucmed.NYU.EDU (Henry Katz)
From: Tom Conroy <trc@NSD.3Com.COM>
From: Ron Russell <rrussell@ag.auburn.edu>

SOLUTION:

   The problem was in my .cshrc file, as almost everyone pointed out. We
use the Modules package to manage applications in a user's environment, and
we have separate sets of modules for Solaris 1 and Solaris 2. A few modules
have not yet been defined for Solaris 2, one of which was in my module load
list. When I would log into or use a remote command to a Solaris 2 machine,
Modules would give an error message for the missing module, and this error
message would cause the remote dump (probably rmt) to choke. Removing the
offending module from my module list fixed the problem. I did not have this
problem when armstrong, the tape host, was running Solaris 1, however. I do
have the line if ($?USER == 0 || $?prompt == 0) exit in my .cshrc but
it is after the module load command. My thanks to everyone who responded,
and yes, next time I will read the FAQ first.
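
The ordering problem described above (a command that can print, placed ahead of the non-interactive exit test) can be caught with a rough check. The following Python sketch is an illustration only and is not something used in the original thread. The function name noisy_before_guard and the NOISY command list are assumptions made for this example, and the test is a simple heuristic on the first word of each line, not a csh parser.

```python
import re
from typing import List

# Hypothetical list of commands that commonly write to stdout/stderr or
# touch the terminal; "module" is included because of the case above.
NOISY = {"echo", "stty", "biff", "date", "module"}

# Matches a guard such as:  if ( ! $?prompt ) exit
# or:                       if ($?USER == 0 || $?prompt == 0) exit
GUARD = re.compile(r"\$\?prompt\b.*\bexit\b")


def noisy_before_guard(cshrc_text: str) -> List[str]:
    """Return the noisy-looking lines that appear before the prompt guard.

    If the file has no guard at all, every noisy-looking line is returned.
    """
    flagged = []
    for raw in cshrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if GUARD.search(line):
            break  # everything after the guard is safe for remote commands
        if line.split()[0] in NOISY:
            flagged.append(line)
    return flagged
```

Run against a .cshrc laid out like the one described here, it would report the module load line, because that line precedes the exit test.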

Jeff Graves
jgraves@eng.auburn.edu

RESPONSES:

From: trinkle@cs.purdue.edu (Daniel Trinkle)

     My guess is that you have something in the remote user's .cshrc
file that generates output that ufsdump sees and assumes is coming
from the remote rmt process. I will assume you are doing the backup
as jgraves. I would suggest trying

armstrong.eng.auburn.edu{jgraves}1: rsh armstrong echo yes

     If you see anything other than "yes", you know something is
wrong. You probably don't see this for root because root's login
shell is /bin/sh, and .profile does not get sourced except for login
shells (rsh is not a login shell).
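
The same check can be scripted. What follows is a minimal Python sketch, not part of the original reply. The names check_remote_output and remote_echo are made up for illustration, and remote_echo assumes an rsh client is available on the PATH of the machine running it.

```python
import subprocess
from typing import Tuple

MARKER = b"yes\n"  # what "echo yes" should produce, and nothing more


def check_remote_output(received: bytes, expected: bytes = MARKER) -> Tuple[bool, bytes]:
    """Compare the remote output with the expected marker.

    Returns (ok, extra). ok is True only when the output is exactly the
    marker. extra holds the unexpected bytes, with the trailing marker
    stripped when it is present.
    """
    if received == expected:
        return True, b""
    if received.endswith(expected):
        # Something was printed before the command itself ran,
        # most likely by the remote shell's startup files.
        return False, received[: -len(expected)]
    return False, received


def remote_echo(host: str) -> bytes:
    """Run `rsh HOST echo yes` and return stdout and stderr combined."""
    result = subprocess.run(
        ["rsh", host, "echo", "yes"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    )
    return result.stdout
```

Calling check_remote_output(remote_echo("armstrong")) as the user who runs the backup would show whether the startup files on the tape host are silent.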

     To solve this, put the following close to the beginning of
jgraves' .cshrc, possibly after setting PATH and umask. You must have
it before any stty commands, or anything else that might generate
output.

if ( ! $?prompt ) exit

Daniel Trinkle trinkle@cs.purdue.edu
Dept. of Computer Sciences {backbone}!purdue!trinkle
Purdue University 317-494-7844
West Lafayette, IN 47907-1398

From: "Richard J. Niziak" <rickn@copley.com>

Got this from the SUN-FAQ:

29) My rdump is failing with a "Protocol botched" message. What do I do?

        The problem produces output like the following:

          DUMP: Date of this level 0 dump: Wed Jan 6 08:50:01 1993
          DUMP: Date of last level 0 dump: the epoch
          DUMP: Dumping /dev/rsd0a (/) to /dev/nrst8 on host foo
          DUMP: mapping (Pass I) [regular files]
          DUMP: mapping (Pass II) [directories]
          DUMP: estimated 8232 blocks (4.02MB) on 0.00 tape(s).
          DUMP: Protocol to remote tape server botched (in rmtgets).
         rdump: Lost connection to remote host.
          DUMP: Bad return code from dump: 1

        This occurs when something in .cshrc on the remote machine prints
        something to stdout or stderr (eg. stty, echo). The rdump command
        doesn't expect this, and chokes. Other commands which use the rsh
        protocol (eg. rdist, rtar) may also be affected.

        The way to get around this is to add the following line near the
        beginning of .cshrc, before any command that might send something
        to stdout or stderr:

        if ( ! $?prompt ) exit

        This causes .cshrc to exit when prompt isn't set, which
        distinguishes between remote commands (eg. rdump, rsh) where these
        variables are not set, and interactive sessions (eg. rlogin) where
        they are.
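
To see why stray output is fatal, it helps to look at what the dump side expects to read. The sketch below is an illustration in Python, not the actual dump or rmt source. It assumes the reply format described in the rmt man page: "A<number>" followed by a newline on success, or "E<errno>", a newline, and an error message line on failure. The function name parse_rmt_reply is hypothetical.

```python
def parse_rmt_reply(stream: bytes) -> int:
    """Parse one rmt-style reply from the front of `stream`.

    Returns the number from an "A<number>\\n" acknowledgement.
    Raises OSError for an "E<errno>\\n<message>\\n" error reply, and
    ValueError when the bytes do not look like an rmt reply at all.
    """
    line, newline, rest = stream.partition(b"\n")
    if not newline:
        raise ValueError("protocol botched: incomplete reply")
    code, number = line[:1], line[1:]
    if code not in (b"A", b"E") or not number.isdigit():
        # Anything printed by .cshrc ends up here, because it arrives
        # ahead of the real reply on the same connection.
        raise ValueError("protocol botched: unexpected reply %r" % line)
    if code == b"E":
        message = rest.partition(b"\n")[0].decode("ascii", "replace")
        raise OSError(int(number), message)
    return int(number)
```

A clean reply such as b"A0\n" parses normally, while the same reply preceded by a line of text from a startup file is rejected before the acknowledgement is ever seen.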

##########################################################################
                                +
Richard J. Niziak +
Systems Engineer + e:mail -> rickn@copley.com
Copley Systems + land mail -> Copley Systems, Inc
                                + 165 University Ave
                                 + Westwood, MA 02090
                                + voice mail -> (617)320-8300 x305
                                +
##########################################################################

From: jandoo@thijssen.nl (Jan van Doorn)

Hi,

Did you check the FAQ:

29) My rdump is failing with a "Protocol botched" message. What do I do?

        The problem produces output like the following:

          DUMP: Date of this level 0 dump: Wed Jan 6 08:50:01 1993
          DUMP: Date of last level 0 dump: the epoch
          DUMP: Dumping /dev/rsd0a (/) to /dev/nrst8 on host foo
          DUMP: mapping (Pass I) [regular files]
          DUMP: mapping (Pass II) [directories]
          DUMP: estimated 8232 blocks (4.02MB) on 0.00 tape(s).
          DUMP: Protocol to remote tape server botched (in rmtgets).
         rdump: Lost connection to remote host.
          DUMP: Bad return code from dump: 1

        This occurs when something in .cshrc on the remote machine prints
        something to stdout or stderr (eg. stty, echo). The rdump command
        doesn't expect this, and chokes. Other commands which use the rsh
        protocol (eg. rdist, rtar) may also be affected.

        The way to get around this is to add the following line near the
        beginning of .cshrc, before any command that might send something
        to stdout or stderr:

        if ( ! $?prompt ) exit

        This causes .cshrc to exit when prompt isn't set, which
        distinguishes between remote commands (eg. rdump, rsh) where these
        variables are not set, and interactive sessions (eg. rlogin) where
        they are.

Looks awfully similar to me!
Good luck!

-- 
Jan van Doorn, 
Thijssen BV, Veenendaal, Holland.
+31 8385 35111, jandoo@thijssen.nl.
 

From: Kelly G. Price <Kelly.Price>

Hi. Could this problem be caused by your .cshrc? Specifically, does anything in your .cshrc write to stdout before the "if ($?USER == 0 || $?prompt == 0) exit" line is executed?

Kelly

From: poffen@sj.ate.slb.com (Russ Poffenberger)

The errors you listed are usually caused by performing operations in the login's .cshrc. An example would be 'biff y', which tries to grab or modify the terminal characteristics. This confuses the remote dump protocol.

Russ Poffenberger DOMAIN: poffen@sj.ate.slb.com Schlumberger Technologies ATE UUCP: {uunet,decwrl,amdahl}!sjsca4!poffen 1601 Technology Drive CIS: 72401,276 San Jose, Ca. 95110 Voice: (408)437-5254 FAX: (408)437-5246

From: Ian_MacPhedran@engr.usask.ca (Ian MacPhedran)

From the FAQ for this list:

29) My rdump is failing with a "Protocol botched" message. What do I do?

The problem produces output like the following:

  DUMP: Date of this level 0 dump: Wed Jan 6 08:50:01 1993
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd0a (/) to /dev/nrst8 on host foo
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 8232 blocks (4.02MB) on 0.00 tape(s).
  DUMP: Protocol to remote tape server botched (in rmtgets).
 rdump: Lost connection to remote host.
  DUMP: Bad return code from dump: 1

This occurs when something in .cshrc on the remote machine prints something to stdout or stderr (eg. stty, echo). The rdump command doesn't expect this, and chokes. Other commands which use the rsh protocol (eg. rdist, rtar) may also be affected.

The way to get around this is to add the following line to the beginning of .cshrc, before any command that might send something to stdout or stderr:

if ( ! $?USER || ! $?prompt ) exit

This causes .cshrc to exit when USER or prompt isn't set, which distinguishes between remote commands (eg. rdump, rsh) where these variables are not set, and interactive sessions (eg. rlogin) where they are.

From: sdo@phoebus.larc.nasa.gov (Sharon O. Beskenis)

Hi Jeff! I have seen this before. This occurs when your startup files execute commands that require an interactive shell and none exists, as with rsh, rdump, ufsdump to a remote host, etc. We do the following in our .cshrc files:

if ($?prompt) then
    set prompt="`hostname`> "
    set notify
    stty dec
    stty erase ^H
    /usr/games/fortune
    date
endif

The ($?prompt) construct is used to test for an interactive shell so that rlogin works as expected whereas rdump will not try to set terminal characteristics that are non-existent or send date output to your process on the host invoking the remote dump. I hope this helps.

Sharon Beskenis
Systems Manager
Lockheed Engineering & Sciences Co.
NASA Langley Research Center
MS 478
Hampton, VA 23681-0001
(804)864-1703

From: vasey@issi.com

> Since the change, I have not been able to run rdump/ufsdump specifying
> this machine as the remote tape host, except when running rdump/ufsdump
> from root which we don't want to do.

Under 4.1.2 you cannot dump regular filesystems or directories except as a member of the "operator" group, since rdump needs read access to the inode list, and this was granted via the /dev device entries. Although the 4.1.2 dump fails earlier in the process with a different message when this happens, it could be a related problem for your version. (Sorry, no 2.2 here until late this year.) (Who said Unix error messages needed to be meaningful, anyway? ;^)

++ Ron vasey@issi.com International Software Systems Peace! ++ 1+512+338-5724 9430 Research, Austin TX 78759 <><

From: hoogs@alc.com

longshot:

check your dump user's group. this really shouldn't cause the problem, though, since dump is starting up. i have not been able to get our solaris box working within our backup scheme since svr4 and bsd don't use the same gid for 'operator'/'sys'.

-Tim

From: hkatz@nucmed.NYU.EDU (Henry Katz)

Jeff,

You've already checked the .rhosts, /etc/hosts and /etc/hosts.equiv for correctness. Also check the .cshrc on the remote host to see if it has any interactive portion with the console.

Henry Katz hkatz@nucmed.med.nyu.edu

From: Tom Conroy <trc@NSD.3Com.COM>

Hi Jeff:

Let me summarize your testing:

1 - dump from Solaris machine to local tape - successful
2 - remote dump of Solaris machine to local tape - unsuccessful
3 - remote dump of Solaris machine to local tape *from sh* - successful
4 - dump from Solaris machine to local file - successful
5 - remote dump of Solaris machine to local file - unsuccessful
6 - remote dump of Solaris machine to local file (w/FQDN) - unsuccessful

Observations:

Using local dump rather than remote dump works (hostname:filename forces remote dump).

Using remote dump always fails *except* with bourne shell. Weird. The remote dump will use the shell initialization files (.cshrc or .profile) appropriate to the shell used on the remote machine. The shell that you are running 'ufsdump' from should be irrelevant unless you changed the default shell for root ...

Anyway, the problem is somewhere in your root .cshrc or .login file. If there is anything sent to standard output by these files, remote dumps will fail. To fix, remove whatever is in the file that is echoing something.

Good Luck,

trc

Tom Conroy
trc@NSD.3Com.COM
NSD Engineering SysAdm Group
3Com Corporation
Santa Clara, CA

From: Ron Russell <rrussell@ag.auburn.edu>

I would suggest that you look in the newsgroup:comp.unix.solaris for the FAQ. In this FAQ is a listing of commonly experienced problems.

The "protocol botched" error is an FAQ. I was out most of the day and thus did not see your message till late. There are hints as to why this happens and it does not seem to be SOL related.

What I think would work as a front-line test is to try this as another user that does not have the same startup files.

In other words, RTFFAQ ala sun-mgrs.... :-{

I am head-deep in awk-scripts so I cannot test this and I would not dare from home. Please send me a summary as I am curious to hear if this differs in SOL2 from SOL1.

Regards, Ron

Ronald C. Russell ronald.c.russell@ag.auburn.edu Network Mangler Voice: (205) 844-3213 College of Ag. FAX: (205) 844-4814 Auburn University Auburn AL, 36849 ##################################################

From: Cron report <news@ms.uky.edu>

This site has received a news article entitled

Problem with Solaris 2 backup to a remote tape unit.

apparently posted by you to a questionable group or groups:

comp.sys.sun.managers is not a real USENET group. If your site treats it as valid, this is most likely due to an old accident, prank, or software bug. Though some other sites may also recognize the name, the group is not official and doesn't have a large audience. You may want to look at comp.sys.sun.admin instead.

An official list of USENET groups, as well as a partial list of bogus groups, is regularly posted to news.announce.newgroups and news.groups. You may wish to take a look at them.

This is an automatic letter; no reply is necessary. If you wish to discuss this message, please contact kherron@ms.uky.edu, Kenneth Herron.


