SUMMARY: Problems with piping dump output to exabyte via network

From: Erin M Brent (emb900@cscgpo.anu.edu.au)
Date: Fri Dec 18 1992 - 07:33:48 CST


>
>>The host is a sun4 running 4.1.2 backing up hosts which are ditto. The
>>drive is a 2.3G exabyte.
>>
>>I am having trouble restoring data written using the following procedure :
>>
>>rsh <host> /etc/dump 0udbfs 54000 126 - 6000 /<partition> | dd bs=1024 conv=sync of=/dev/exabyte 2>&1
>>
>>If the dataset is small, the procedure appears to work, but if it is
>>large, the chances are that after reading part of the dataset, restore
>>becomes confused, and issues a variety of error messages indicating that
>>it is skipping blocks. It then wanders down the tape, and cannot find
>>the files required.
>>
>>The problem is unlikely to be hardware, as backing up the datasets with
>>rsh rdump to the same drive works. The procedure used the 'rsh dump
>>pipe dd' construction so that the server with the drive does not have
>>to trust the hosts.
>>
>>I can forward the details to anyone who is interested. Basically the
>>problem looks to be that when the data is piped over the network to the
>>exabyte it can be impossible to retrieve the data. If someone can answer
>>the following questions, perhaps I can find out why this procedure seems
>>to go wrong.
>>
>>Questions
>>What happens when data is piped across the network into dd ? How does
>>it decide that the block needs to be padded ?
>>
>>What does the exabyte do when there is a delay in the arrival of data.
>>Does it rewind, move back a few blocks, or what ?
>>
>>Is there likely to be a similar problem using utilities such as cpio to
>>a remote exabyte? I notice that tar has a 'B' option to force it to read
>>exactly enough bytes to read a block.

Replies
26 replies were received.
I have included a very short summary, and some discussion, as there was
considerable disagreement amongst respondents.

This summary covers 3 areas :

1: Interaction of dump/ pipe across network/dd to exabyte

2: How to recover files from a tape written in this way

3: Better ways of doing remote dumps.

Short Summary
It seems that 'dd' (or at least the version I am using) and its poor
documentation are the culprits. If you are relying on using remote
exabytes across the network with 'dd' in the pipe, it is probably best
to reconstitute the block sizes exactly using ibs and obs on the
receiving side to ensure that the data is not padded when a block is
fragmented in transit. 'tar' has a -B option to deal with fragmentation
on the network, but cpio does not, and could have similar problems to
the ones I had with 'dump'. There is no reliable way to remove padding
interpolated into the data stream by 'dd'.
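
To illustrate the point about reconstituting blocks, here is a minimal
sketch that can be run on a single machine. The sleep stands in for a
network delay, a second dd stands in for the tape drive, and the 8-byte
block size is chosen only to keep things small. The function names are
made up for the illustration, and the behaviour assumes a dd that
follows the POSIX rules for ibs/obs (GNU dd does; other versions are
worth checking).

```shell
#!/bin/sh
# Stand-in for the network: two 4-byte fragments with a pause between them,
# so the first dd sees two short reads rather than one full 8-byte block.
fragmented_input() {
    printf 'AAAA'
    sleep 1
    printf 'BBBB'
}

# Stand-in for the tape drive: a second dd whose "records in" line reports
# full+partial records, i.e. the sizes of the writes it was handed.
records_seen() {
    dd bs=8 of=/dev/null 2>&1 | grep 'records in'
}

# bs= only: each short read is written out as it arrives.
passthrough=$(fragmented_input | dd bs=8 2>/dev/null | records_seen)

# ibs= and obs= given separately: short reads are collected into a full block.
reblocked=$(fragmented_input | dd ibs=8 obs=8 2>/dev/null | records_seen)

echo "bs=8:         $passthrough"
echo "ibs=8 obs=8:  $reblocked"
```

With GNU dd I would expect the first line to report two partial records
and the second one full record, which is the difference between
specifying bs alone and specifying ibs and obs.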

1: Interaction of dump/ pipe across network/dd to exabyte

Procedure used was :
rsh <host> /etc/dump 0udbfs 54000 126 - 6000 /<partition> |\
        dd bs=1024 conv=sync of=/dev/exabyte 2>&1

Some people were puzzled as to why the procedure failed, as they used
similar schemes themselves. Opinion was divided as to whether an
inappropriate choice of block size written to tape, or a mismatch of
blocking factor used on the dump command, the pipe command, or the
output to the exabyte was the true cause of the problem.

Quote from Bill Tapley <tapley@sdxs01.llnl.gov>
>I'm not well versed in this subject, but I believe that a pipe will
>fragment blocks, regardless of the dd bs=2b, since the input to dd thru
>the pipe causes a problem. Solution is to write a little "buffered"
>dd, that reads till it gets a 1024 byte buffer full and then writes to
>exabyte. I think writes of smaller than 1024 byte blocks to the exabyte
>are illegal, and so you should have seen an error, if what I am saying
>caused a problem actually is the problem. I know that when I was
>backing up system over the network onto exabyte, I had to use a
>utility, bdd (buffered dd), from Delta Microsystems (Livermore,
>California - USA) to avoid the problem that you are seeing.

M.solda (msolda@lamont.ldgo.columbia.edu) said
>we had the same problem when we started using exabytes from delta
>micro. the actual problem is the following as explained by delta micro:

        'a problem exists when using the dd command across a network or
        pipe. regardless of the block size specified on the dd command
        line, the data transfer is broken down into smaller blocks for
        transfer. Combined with the code in dd that accepts the
        delivered data and pads the rest of the buffer with nulls, this
        action results in garbage being introduced into the data
        stream.'

>hence, your dumps become unreadable

>delta micro provides a utility, bdd, that is just a wrapper around dd
>that does its own buffering so that dd receives a complete block without
>null padding.

>this became our answer. there may be other solutions that other people
>can provide. i would be interested in finding out what they are.

Several other people mentioned using bdd, and Steve Harris
(etnibsd!vsh@uunet.uu.net) provided documentation for bdd, and a
prototype bdd-style program.

Frank Allan (fallan@baobab.awadi.com.AU) said:
>we have a (commercial) product (Flashback) which uses pipes to
>backup systems over the network and their Tech Support
>people have told me that they use a blocking factor of 512
>to ensure they don't get this problem. I don't know why,
>but it appears to work, because we backup across the
>network every night and restoring, even from dumps of a 1Gb
>user partition works reliably using this blocking factor.

Per Hedeland (per@erix.ericsson.se) said of the script
>This is not appropriate - you should specify both ibs and obs to dd,
>preferably with the same blocksize as the one used for dump, and *not*
>conv=sync - this may cause dd to do NUL-padding, and the result most
>certainly won't be readable by restore. Note that specifying bs only
>instead of ibs and obs is not equivalent, despite what the manual says.

and in reply to
>>What happens when data is piped across the network into dd ? How does
>>it decide that the block needs to be padded ?

>It does a single read() of the requested blocksize - if fewer than the
>expected number of characters are received, the remainder is filled out
>with NULs. This will of course happen essentially whenever there isn't
>enough data available at the time of the read, which in turn depends on
>a number of things...
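
The padding described here is easy to reproduce without a tape drive.
The following is only a sketch: the sleep imitates data arriving late
over the network, the 8-byte block size and the function name are
arbitrary choices for the demonstration, and I am assuming a dd that
pads each short read when conv=sync is given (GNU dd behaves this way).

```shell
#!/bin/sh
# Two 4-byte fragments with a pause between them, so that dd's read() of
# 8 bytes returns early with only 4 bytes each time.
fragmented_input() {
    printf 'AAAA'
    sleep 1
    printf 'BBBB'
}

# Count the bytes that come out with and without conv=sync.
plain_bytes=$(fragmented_input | dd bs=8 2>/dev/null | wc -c | tr -d ' ')
padded_bytes=$(fragmented_input | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')

echo "without conv=sync: $plain_bytes bytes"
echo "with conv=sync:    $padded_bytes bytes"

# Show where the extra bytes landed in the stream.
fragmented_input | dd bs=8 conv=sync 2>/dev/null | od -c
```

Eight bytes go in, but with conv=sync each short read is padded out to
a full block, so NULs end up in the middle of the data rather than only
at the end.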

Dave Mitchell <D.Mitchell@dcs.sheffield.ac.uk> said
>at a rough guess I'd say replace conv=sync with conv=block. Then dd
>will keep reading input until it gets a full block, then write it.
>With sync, dd reads in a partial block, pads it with 0's and writes it.
>This will completely screw up the dump. Also, your block size seems
>too small (so things will go slowly). I'm not sure what the max block
>size is for exabyte, but bs=126b is always a good 1st try.

Ulf Tropp <tropp@ce.chalmers.se> said in reply to
>>Is there likely to be a similar problem using utilities such as cpio
>>to a remote exabyte? I notice that tar has a 'B' option to force it to
>>read exactly enough bytes to read a block. Do other utilities do this
>>by default ?

>You've got a clue. dd doesn't, by default. You want to
>use dd ibs=N obs=N ... where N=126 (in the above case). Some people
>suggest that N should be some multiple of 8 (or 8K?) since that
>is what the exabyte writes in each helical scan. We don't bother. Note
>that in the days of 9 track 1/2" tapes N would have been 20 by
>dump/restore (and tar) convention and less than 128 due to ancient
>hardware not being able to write >=64K blocks. I think the latter is
>still true. Anyway, it seems safer to use rdump."

John P Linderman (jpl@allegra.att.com) said
>Although I am not morally certain about what will happen,
>I fear the bs=1024 on the near side only means that reads
>will take place in units of 1024. The writes, on the other hand,
>are taking place in units of 126 blocks. These, in turn, will
>be busted into smaller transfers over the net, and the potential
>for lost data is horrendous. You would do well to write in
>smaller units, let's say 10k, the dump default, run the output
>through dd ibs=10k obs=1k ON THE REMOTE, to ensure that no
>block larger than 1k goes across the net, then read the data
>on the local side with dd ibs=1024 obs=10k, to ensure that
>the original dump block size is reconstructed on the exabyte.
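
The two-sided scheme can be tried out locally before trusting it with a
real dump. This is a sketch only: a temporary file of numbers stands in
for the dump output, both dd commands run on the same machine, and the
output goes to a file rather than to /dev/exabyte. In the real setup the
first dd would run on the remote host inside the rsh command and the
second would write to the tape device.

```shell
#!/bin/sh
src=$(mktemp)
out=$(mktemp)

# Sample data standing in for the dump stream (roughly 106 KB of text).
seq 1 20000 > "$src"

# "Remote" side: accept 10k dump blocks, emit 1k blocks for the network.
# "Local" side: collect the 1k pieces back into 10k blocks for the tape.
dd if="$src" ibs=10k obs=1k 2>/dev/null |
    dd ibs=1k obs=10k of="$out" 2>/dev/null

# With no conv=sync anywhere, nothing should be added or lost on the way.
if cmp -s "$src" "$out"; then
    result=identical
else
    result=different
fi
echo "round trip: $result"

rm -f "$src" "$out"
```

This only checks that the byte stream survives the re-blocking intact.
It says nothing about how a particular tape drive handles the resulting
record sizes.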

2: How to recover files from a tape written in this way

Several people offered suggestions as to how to get rid of the
interpolated padding in the data stream, but I had no success with any
of them. It seems hard to see how embedded nulls and padded nulls can
be distinguished - even by piping across the network with dd on either
side of the pipe.

Perry_Hutchison.Portland@xerox.com suggested
>"Possible recovery scheme:
>SunOS dump is presumably similar to Berkeley Net-2 dump, so you might
>be able to read these tapes by hacking the Net-2 restore to allow for
>the presence of padding (which consists of an arbitrarily-long string
>of zeros ending on a 1Kb boundary). I have not had occasion to look
>at the source for restore, so do not know how difficult this would be.
>It will probably not be 100% reliable in any event, but may be of some
>use if these tapes contain important files which are otherwise lost."

3: Better Ways of Doing Remote Dumps
Many people criticised the use of the flaky pipe when a good alternative
(rdump) does the job better and faster.
 
Interestingly, the problem of retrieving data from the tape did not
arise until after I had checked a large data set and found problems with
the restore. The very next day someone deleted a directory.

The script was originally written to use "rsh dump pipe" so that the
host with the exabyte did not have to trust the hosts being dumped.
Several people pointed out that remote dumps can be done to an account
other than "root" on the host with the exabyte.

Jay Lessert (bit!jayl@Sun.COM) suggested
>"BTW, have you thought about trying the "user@tapehost:/dev/exabyte"
>syntax? Tapehost must still trust remote root, but only mapped to a
>normal user (~user/.rhosts), not as root (/.rhosts). Perhaps worth
>another look, my experience has been that rmt is 30-50% faster than
>rsh/dd."
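
As a sketch of what that syntax looks like with the dump options from
the original script, the function below builds the rdump command line
and prints it without running anything. The function name, the user
"backup", the host "tapehost" and the partition "/home" are placeholders
for illustration. The key letters and their arguments (density 54000,
blocking factor 126, size 6000) are copied from the original dump
command, with the "-" replaced by the remote device.

```shell
#!/bin/sh
# Print (but do not execute) an rdump command that writes to a non-root
# account on the tape host. Arguments: user, tape host, device, partition.
rdump_command() {
    echo "rdump 0udbfs 54000 126 $1@$2:$3 6000 $4"
}

# Placeholder values; substitute your own account, host and partition.
rdump_command backup tapehost /dev/exabyte /home
```

The account named here needs the dumping host listed in its own .rhosts
file on the tape host, as described above.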

I found that the revised script (using rdump) ran in about 1/3 of the
time taken by the previous version.

Another solution is to enter the remote host in the /.rhosts file for
the duration of the rdump, and remove it after. (Thanks to Scott Babb
and Ian MacPhedran)

Some people used commercial products such as Flashback.

Thanks to

Arie Bikker <aribi@geo.vu.nl>
Bill Tapley <tapley@sdxs01.llnl.gov>
Mike Raffety <miker@il.us.swissbank.com>
Mr T Crummey (DIJ) <tom@sees.bangor.ac.uk>
Perry_Hutchison.Portland@xerox.com
Ted Rodriguez-Bell <ted@ssl.Berkeley.EDU>
Ulf Tropp <tropp@ce.chalmers.se>
Vincent Everett <vincent.everett@mrc-applied-psychology.cambridge.ac.uk>
babb@k2.sanders.lockheed.com (Scott Babb)
bill%grape@uunet.UU.NET (Bill McSephney )
bit!jayl@Sun.COM (Jay Lessert)
etnibsd!vsh@uunet.uu.net (Steve Harris)
fallan@baobab.awadi.com.AU (Frank Allan (Network Mgr))
jpl@allegra.att.com (John P. Linderman)
macphed@dvinci.usask.ca (Ian MacPhedran)
matt@wbst845e.xerox.com (Matt Goheen)
mikey@ccs.carleton.ca (Mike G McFaul)
msolda@lamont.ldgo.columbia.edu (m solda)
per@erix.ericsson.se (Per Hedeland)
rpage@bigbrother.matrox.com (Real Page)
trdlnk!mike@uunet.uu.net (Michael Sullivan)
ups!upstage!glenn@fourx.Aus.Sun.COM (Glenn Satchell)


