SUMMARY: Problems with dump utility

From: rwolf@dretor.dciem.dnd.ca
Date: Thu Jan 27 1994 - 13:53:33 CST


--------->>>>>>>>>> Original Question >>>>>>>>>>>>>>------------
Hello Sun-managers

    I am running SunOS 4.1.3 with many patches on a Sun 4/670MP.

    For several years, I have been using the 'dump' utility to do backups.
    We have always done a level 0 dump on a daily basis to an Exabyte 8mm
    tape with hardly any trouble.
    /etc/rdump 0bsuf 126 60000 ...

    This week I have been doing level 1 dumps since the tape is getting full.
    /etc/rdump 1bsuf 126 60000 ...

    I am getting all kinds of errors like:
    DUMP: /dev/nrst9: Device busy
    DUMP: fopen on /dev/tty fails
    DUMP: The ENTIRE dump is aborted.

    This has happened on several different tapes, and I have power-cycled
    the tape unit. The tape unit is cleaned twice a month.

    The backups are done in the evening with the machine in multi-user mode
    via cron. I know the man page says the machine should be in single-user
    mode, but that is a lot of crap since I have been doing level 0 dumps
    with hardly any trouble. Besides how in the world do you make 30+ machines
    go into single-user mode at specific times at night whenever a backup
    needs to be done and then back into multi-user mode later on?

    Is there a patch for dump? Do level 1 dumps have more problems than
    level 0? Thanks, and yes I will summarize.

--------->>>>>>>>>> Solution >>>>>>>>>>>>>>------------

It turns out that the dump utility was not the culprit. The key step was
adding sleep commands after each dump command. Normally you should not
have to do this, but it was what led to the solution.

The real cause was a flaky SCSI interface on the tape unit itself. Putting
5-minute sleeps between dumps reduced the problems and was what convinced
us that this was a hardware problem.
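
For readers who want to try the same workaround, here is a minimal sketch of
pausing after each dump. The names run_dumps and dump_one are hypothetical and
are not taken from the original backup script. dump_one is a stand-in that
only prints what it would do; replace its body with your own rdump command.

```shell
#!/bin/sh
# Sketch only: pause after every dump so the tape unit can settle.
# run_dumps and dump_one are hypothetical names, not from the original script.

# Stand-in for the real dump command. Replace the body with your own
# rdump invocation; here it only reports what it would do.
dump_one() {
    echo "would dump $1"
}

# usage: run_dumps PAUSE_SECONDS FILESYSTEM...
# Attempts every filesystem and returns non-zero if any dump failed.
run_dumps() {
    pause=$1
    shift
    status=0
    for fs in "$@"; do
        if dump_one "$fs"; then
            echo "ok: $fs"
        else
            echo "FAILED: $fs" >&2
            status=1
        fi
        # Give the drive time to finish before the next dump starts.
        sleep "$pause"
    done
    return $status
}
```

Called as "run_dumps 300 / /usr" it would leave five minutes after each dump,
as described above. The filesystem names are examples only.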

Once we had the tape unit replaced, everything started working again. I now
have this great backup script that is twice the size and filled with all
kinds of debugging code, so that the next time it fails I can prove that it
is a hardware problem.

Thanks to all those who took the time to reply.

Robert J Wolf, Sun System Admin. DCIEM, CFB Toronto
rwolf@dciem.dnd.ca PO Box 2000 1133 Sheppard Avenue West
                                    North York (Toronto), Ont., Canada M3M 3B9
Internet: 192.16.207.3 Phone: (416)635-2073 FAX: (416)635-2104
"Capitalism with environmental ethics will benefit the entire world."

--------->>>>>>>>>> Original Replies >>>>>>>>>>>>>>------------
>From edguer@alpha.CES.CWRU.Edu Thu Dec 23 18:31:39 1993
Subject: Re: PROBLEMS with level 1 dump
To: rwolf@dretor.dciem.dnd.ca
X-Mailer: ELM [version 2.3 PL11]
Content-Length: 1607
X-Lines: 31
Status: RO

The "device busy" probably means that the previous tape operation has not
finished. The fopen fails because dump is trying to send you an error
message and get some input, and cannot because there is no controlling
terminal. I would suggest putting a "sleep 30" between each of your
dumps to allow the tape drive time to cycle.

Yes, going to single-user mode for the dumps can be done. It was described
on Sun-Spots; check the archives. Personally, I find it too much bother and
I cannot afford the downtime even at night.

In essence, you modify the rc.* files [depending on when during the boot
process you want it done] to look for a specific file [just as the SunOS
4.1.3 rc.boot looks for /etc/.UNCONFIGURED] and perform the appropriate
actions if the file exists [note - this can be a security risk]. Then you
rsh a shutdown [to try to keep things in synch - otherwise use a time
synch protocol like NTPv3 and cron] and after the client has finished the
dump it completes the boot process and goes multi-user.

Aydin

>From ian@fmlrnd.co.uk Fri Dec 24 04:57:52 1993
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 500
X-Lines: 18
Status: RO

Hi Robert,

Just one question. Do you run dump as root (UID=0)?

If not then that is your problem. Only root can access the filesystem
properly for incremental dumps.

(I believe there is another user that can be used which has a UID=5
and GID=15 but I have not tried it for incrementals).

Hope this helps,
Ian

Ian Camm e-mail:i.camm@fmlrnd.co.uk
Systems Administrator Tel:+44 61 230 6262
Computer Services Group Fax:+44 61 230 6276
Fujitsu Microelectronics Limited
Manchester, England

>From heas@chpc.org Fri Dec 24 10:48:47 1993
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 357
X-Lines: 9
Status: RO

> DUMP: /dev/nrst9: Device busy
> DUMP: fopen on /dev/tty fails

        Has this machine been rebooted? This doesn't look like a tape/drive
problem, but more like there is some process (defunct maybe?) that has the
device open. Try fuser on /dev/nrst9 and /dev/tty. I would also run
ps -auxww and look for rmt's or tar's or dump's, etc. that may be hanging
around.

-heas

>From leafusa!orac.HQ.Ileaf.COM!stuart@ileaf.com Fri Dec 24 14:05:39 1993
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 570
X-Lines: 12
Status: RO

It looks to me as though you're using the right dump syntax, but that
you're trying to use the tape drive while another process has it
allocated. Do a "ps ax" on the machine with the drive & see if there
are any dump, mt, or rmt commands hanging around. You can then poke
around & see if you have a stuck dump somewhere.

                Stuart

-- 
Stuart Freedman                           Interleaf, Inc.
stuart@ileaf.com, uunet!leafusa!stuart    Prospect Place
also postmaster@ileaf.com                 9 Hillside Ave.
+1(617)290-0710 or 290-4990,1-1708        Waltham, MA 02154

>From ndd@sunbar.mc.duke.edu Tue Dec 28 10:29:15 1993
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Newsgroups: duke.sun-managers
X-Newsreader: TIN [version 1.2 PL2]
Content-Length: 1201
X-Lines: 32
Status: RO

The fact that it is level 1 should be irrelevant. Have you power-cycled the
machine that the tape drive is on? I've had this happen when a dump failed,
perhaps by running out of tape, and a flag in the kernel got stuck on.
Rebooting the dumphost always fixed it.

-- 
Ned Danieley (ndd@sunbar.mc.duke.edu)
Basic Arrhythmia Laboratory
Box 3140, Duke University Medical Center
Durham, NC 27710
(919) 660-5111 or 660-5100

>From symanski@gold.nosc.mil Wed Dec 29 10:31:19 1993
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 363
X-Lines: 16
Status: RO

I have seen similar errors for level 0 dumps.

They seem to be intermittent. I usually dump about 6 pm.

Just restarting the dump usually works.

I don't think there is a definitive answer to this problem.

I have seen much similar mail on this subject.

I have seen answers ranging from power cables to lingering rmt processes..?

Please send me any solutions.

--- jjs

>From Larry.Belvin@analog.com Mon Jan 3 10:22:27 1994
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 832
X-Lines: 19
Status: RO

Robert:

Sorry for the late response, but our division shut down for the holidays.
We have the same configuration as you: a 670 MP running SunOS 4.1.3.
However, we did not install any patches for 4.1.3. We also run our backups
in the evening with the machine in multi-user mode via cron. We run level 0
on Monday, level 1 on Tuesday, level 2 on Wednesday, level 3 on Thursday,
and level 5 on Friday. We have been doing this for 2+ years with no
problems at all. I haven't seen the error that you've been getting. In my
experience, level 1 dumps do not have any more problems than level 0 dumps.
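
A weekday schedule like this one can be kept in a single shell function that
a cron-driven script calls. This is only a sketch of the idea, and
level_for_day is a hypothetical helper name, not something from the setup
described here.

```shell
#!/bin/sh
# Sketch only: map a weekday abbreviation to the dump level in the schedule
# described above (Mon 0, Tue 1, Wed 2, Thu 3, Fri 5).
# level_for_day is a hypothetical helper name.
level_for_day() {
    case "$1" in
        Mon) echo 0 ;;
        Tue) echo 1 ;;
        Wed) echo 2 ;;
        Thu) echo 3 ;;
        Fri) echo 5 ;;
        *)   return 1 ;;   # no dump scheduled on other days
    esac
}
```

A backup script could pass it the output of "date +%a", assuming date prints
English day abbreviations in the current locale, and skip the dump when the
function returns non-zero.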

Larry Belvin                  Analog Devices CTS Division
Larry.Belvin@Analog.com       181 Ballardvale Street
(617)-937-1252                Wilmington, MA 01887-1051
(617)-937-1013 (fax)          U.S.A.

>From strombrg@hydra.acs.uci.edu Mon Jan 3 11:14:10 1994
To: rwolf@dretor.dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Content-Length: 2443
X-Lines: 52
Status: RO

This "device busy" thing is also a function of the device driver, and
device drivers can get wedged into inconsistent internal states. Sometimes
it can take a reboot to get the driver's private data structures into a
consistent state again. This is perhaps especially true of some of Sun's
tape drivers; there might be a relevant patch.

Sometimes this can also happen because something really -is- using the
drive, in a sense. "fuser /dev/nrst9" could prove useful. Often when this
is the case, it's a matter of there being a bogus /etc/rmt hanging around.
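
The process check suggested in several of these replies can be wrapped in a
small filter. This is a sketch: tape_procs is a hypothetical name, and the
pattern only covers the program names mentioned in this thread (dump, rdump,
rmt, mt, tar).

```shell
#!/bin/sh
# Sketch only: filter a process listing for programs that might hold the tape.
# tape_procs is a hypothetical helper; it reads ps output on standard input.
tape_procs() {
    # Match the program name as a whole word, with or without a leading path.
    grep -E '(^|[ /])(r?dump|rmt|mt|tar)( |$)'
}
```

On the machine with the drive, "ps -auxww | tape_procs" would list the
candidates, and "fuser /dev/nrst9" can then confirm which process, if any,
has the device open.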

>From bern@kleopatra.Uni-Trier.DE Tue Jan 4 05:26:23 1994
To: rwolf@dciem.dnd.ca
Subject: Re: PROBLEMS with level 1 dump
Reply-To: bern@uni-trier.de
Content-Length: 1922
X-Lines: 46
Status: RO

(Probably VERY late, but here goes ...)

> For several years, I have been using the 'dump' utility to do backups.
> We have always done a level 0 dump on a daily basis to an Exabyte 8mm
> tape with hardly any trouble.
> This week I have been doing level 1 dumps since the tape is getting full.
> I am getting all kinds of errors like:
> DUMP: /dev/nrst9: Device busy

Try "fuser /dev/nrst9" (or one of the aliases of this device) to find out
what is blocking the device. If fuser doesn't do the job, there are some
enhanced PD work-alikes available.

> DUMP: fopen on /dev/tty fails

I read ahead that you launch this from cron; perfectly normal, then. If
this should be a serious problem, there are PD "wrappers" which make things
look tty-like to a process.

> The backups are done in the evening with the machine in multi-user mode
> via cron. I know the man pages says the machine should be in single-user
> mode but that is a lot of crap since, I have been doing level 0 dumps
> with hardly any trouble. Besides how in the world do you make 30+ machines
> go into single-user mode at specific times at night whenever a backup
> needs to be done and then back into multi-user mode later on?

Add the following to /etc/rc.local (after having mounted everything, but
before going multi-user):

if [ -f /etc/SingleStuff ]; then
    mv /etc/SingleStuff /etc/SingleFAILED
    chown root /etc/SingleFAILED
    chmod u+x /etc/SingleFAILED
    /etc/SingleFAILED
    rm -f /etc/SingleFAILED
fi

Now, to execute any script in single-user mode, copy it to /etc/SingleStuff
and do a shutdown -r. If you find /etc/SingleFAILED existing the next
morning, something went wrong and caused another reboot before the script
was done. (The mv above avoids a do-forever loop in such cases.) You might
want to add a "tee" to keep a log of the execution, etc.
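
One way the "tee" idea might look, as a sketch only. The function name
run_single_stuff, its arguments, and the log file are all hypothetical. The
chown root step is left out so that the sketch can be tried without root;
keep it in a real rc.local.

```shell
#!/bin/sh
# Sketch only: the rc.local hook described above, with its output logged.
# run_single_stuff and its arguments are hypothetical.
# usage: run_single_stuff JOB MARKER LOG   (use absolute paths)
run_single_stuff() {
    job=$1       # script to run once, e.g. /etc/SingleStuff
    marker=$2    # name it carries while running, e.g. /etc/SingleFAILED
    log=$3       # file the job's output is appended to
    if [ -f "$job" ]; then
        # Rename first, so a reboot in mid-run cannot restart the job forever.
        mv "$job" "$marker"
        chmod u+x "$marker"
        # Show the output and append a copy of it to the log.
        "$marker" 2>&1 | tee -a "$log"
        # Reached once the job has finished; a leftover marker means the
        # machine went down before that.
        rm -f "$marker"
    fi
}
```

In rc.local this would be called with the two file names used above plus a
log file of your choice. Because the job runs on the left side of a pipe, its
exit status is not available afterwards; the log is the record of what
happened.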

Regards, J. Bern



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:08:55 CDT