[Additional Summary] Recovery backups to slightly different hardware

From: Chris Hoogendyk <hoogendyk_at_bio.umass.edu>
Date: Wed Oct 25 2006 - 15:11:32 EDT
While the procedure I mentioned in my original summary (at the end of
this message) worked, there were, and still are, some caveats. With
additional input from Derek Smythe and Noel Milton Vega, and a gob more
testing on my part, I thought I would post a complete procedure. This
allows me to take ufsdump backup tapes from one system to another when
the first has failed and I don't have an identical machine to recover
to. In my case I'm planning for a situation where I might have to
recover a Sun Blade 100 to a Sun Enterprise 250 (of which I have several
hand-me-downs in waiting).



I'm using Solaris 9. With Solaris 10, I understand, I would also need a
"root_archive" step (see its man page).

Many of the variations I tried with devfsadm to rebuild the device tree
ended up with the internal drives being c1 rather than c0. The procedure
below simply works. When I did it with -v for verbose, I was astonished
at how much it did.

Both machines (the "dead" and the replacement) are sun4u platform with
UltraSPARC II CPUs. I don't know how well this procedure would work if the
differences were more extreme.

I brought the recovered system up without a network connection. I could
see that the system booted, that the expected applications came up, and
that the console messages were as expected. The applications were doing
their regular things, trying to punch their way out of the box, and
complaining they had no network. If I had given it a network connection,
it would have played havoc with my network.

Those who have the infrastructure set up might use flash archives to
accomplish what I am doing here. I'm using hand-me-down machines and
still lobbying for a tape robot so that I can centralize backups.



Dead machine. Have backups from ufsdump.

Set up the replacement machine with an appropriate tape drive, a CD
drive, and a boot drive that can be reformatted.

Boot off the Solaris 9 install CDROM, choose a language, and when it
asks about formatting drives, quit. This gets you to a Unix prompt.
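
(as an alternative, booting the CD single-user gets you to the same
shell without stepping into the installer at all:)

ok boot cdrom -s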

# format

choose the boot drive, and format/partition it according to the boot
drive on the original machine. It is important that you have kept such
information as part of your backups -- a printout of /etc/vfstab, `df
-k`, and a printout of the partition table. The drive I had was actually
bigger than the one I was replacing, so I had extra space. I just made
sure the relevant partitions were large enough.
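
(a sketch of gathering that information in advance, on the machine that
might die -- the /backup-info destination is just illustrative:)

# prtvtoc /dev/rdsk/c0t0d0s2 > /backup-info/boot-disk-vtoc.txt
# df -k > /backup-info/df-k.txt
# cp -p /etc/vfstab /backup-info/vfstab.txt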

# newfs /dev/dsk/c0t0d0s0

set up a file system on whichever partitions you are going to need to
recover from the tapes.
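
(if several slices need file systems, a loop like this saves typing --
the slice numbers are illustrative; newfs conventionally takes the raw
device, and `yes` answers its confirmation prompt:)

# for s in 0 4 5; do yes | newfs /dev/rdsk/c0t0d0s${s}; done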

put the tape in.

# mount /dev/dsk/c0t0d0s0 /mnt

mount the partition you are going to recover.

# cd /mnt

get into the partition where you are going to recover.

# rm -r lost+found

don't know that that is really necessary, but it eliminates an error
complaint when ufsrestore tries to recreate lost+found from the tape.

# mt status

make sure you are at the right position for the recovery. my tapes have
several "files" per tape corresponding to the partitions on the boot
drive. It is important to have a copy of the backup script or command
that was used to make the tape so that you know exactly what is on it
and in what order. This should have been done in advance, since the
machine is now dead.
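
(if the tape is not where you expect it to be, mt can reposition it;
for example, to rewind and then skip forward over the first two dump
files to land at the third:)

# mt -f /dev/rmt/0n rewind
# mt -f /dev/rmt/0n fsf 2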

# ufsrestore rf /dev/rmt/0n

do the recovery. From a console, verbose mode can actually slow it down,
so I leave that off. I use the no-rewind device "0n" so that I can then
grab the next partition off the same tape.
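
(for context on the tape layout: as described in my original message at
the bottom, each tape file is a ufsdump of an fssnap snapshot piped
through ssh to a remote tape drive. a rough sketch of that kind of
backup command, with the host and backing-store names made up:)

# snapdev=`fssnap -F ufs -o raw,bs=/export/snap /`
# ufsdump 0f - $snapdev | ssh tapehost "dd of=/dev/rmt/0n obs=63k"
# fssnap -d /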

# ls

check to make sure you have it.

# cd /
# umount /mnt

repeat the above (from the mount down to the umount) for each partition
that needs to be recovered. Be sure to check the mt status and ls to
make sure you have what you think you should have.
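
(the whole cycle condenses to something like this -- the slice list is
illustrative, and it assumes the tape files are in the same order as
the slices:)

    for s in 0 4 5; do
        mount /dev/dsk/c0t0d0s${s} /mnt
        cd /mnt && rm -r lost+found
        ufsrestore rf /dev/rmt/0n
        ls
        cd / && umount /mnt
    done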

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0

get your boot blocks in. It's worth noting here that, although I have
documented things with s0 as the root partition, I actually had my
system set up with s3 as the root partition. I installed the boot blocks
on s3.
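
(for the s3 layout, that is:)

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s3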

now, because the hardware is actually somewhat different (e.g. SCSI
disks rather than IDE), some changes have to be made to some of the
information in the root partition.

# mount /dev/dsk/c0t0d0s0 /mnt

# mv /mnt/etc/hostname.eri0 /mnt/etc/hostname.hme0

this was specific to my switch of systems. others may differ. the Sun
Blade uses eri for the network interface, whereas the E250 uses hme. I
also had 10 virtual interfaces on this machine, so I had to repeat the
above command that many times, modified each time like:

# mv /mnt/etc/hostname.eri0:1 /mnt/etc/hostname.hme0:1
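
(or, as a loop covering all ten at once:)

# for i in 1 2 3 4 5 6 7 8 9 10; do mv /mnt/etc/hostname.eri0:$i /mnt/etc/hostname.hme0:$i; done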

then toss and rebuild the device tree:

# rm /mnt/etc/path_to_inst

# rm -r /mnt/dev

# rm -r /mnt/devices

# /usr/sbin/devfsadm -r /mnt -p /mnt/etc/path_to_inst

if you put -v on the end, it will scroll the names of the device
entries it is adding for 5 minutes or more.

it is of interest that I didn't find the -p option in my man pages. I
knew devfsadm wasn't rebuilding path_to_inst without it, and I kept
trying slightly different things.
Finally I found a page describing veritas recovery that mentioned the
-p. It worked. I had also tried a variety of alternatives that were more
selective about the rm. I kept ending up with my internal drives being
c1 rather than c0. Complete removal of the device tree got rid of that
problem and worked. I actually newfs'd this partition and started from
scratch several times, just to make sure I had a straight through clean
procedure and wasn't muddying things up with multiple retries.

# touch /mnt/reconfigure

just for good measure. it can't hurt really.

# umount /mnt

Now, at this point, if the eeprom is in order, I could just reboot, and
I would be in business.

However, I have had issues at one time or another with inherited
machines having settings that got me in trouble in one way or another.
So, ...

Press Stop-A (the Stop key plus "a") to drop to the OpenBoot ok prompt.

{1} ok printenv

check all the eeprom settings. in particular, I want

{1} ok setenv auto-boot? true

{1} ok setenv boot-device disk

remember, I said I had my root partition on s3? to make that work, I
needed boot-device disk:d rather than disk.
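
(":d" is OpenBoot's name for the fourth slice, s3, in its a-h slice
naming; for my layout that meant:)

{1} ok setenv boot-device disk:d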

{1} ok setenv diag-switch? false

that's just to keep it from going to diagnostics and then booting to
diag-device, which is net.

{1} ok boot disk

or, I would say "boot disk:d" to boot off s3 as the root partition.

also, note that these settings are particular to my hardware. depending
on the hardware you are setting up on, this may differ.
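
(the same settings can also be checked or set from a running system
with eeprom(1M), which is handy for vetting an inherited machine ahead
of time:)

# eeprom "auto-boot?" boot-device "diag-switch?"
# eeprom boot-device=disk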

at this point, I should be in business.


now, since I had to do all those virtual interfaces, and the rebuilding
of the device tree, I decided to put that in a script and store it on
the machine that "might die" so that it would be on the backup tapes I
would be recovering from. I put it in /etc/rebuild. It contains some
documentation of the rebuild process and the following executable lines
(all with absolute paths to reduce the risk of someone executing the
script in the wrong situation). I figured this might save me some
typing, and possible typos, some day when I'm in panic mode.


    mv /mnt/etc/hostname.eri0 /mnt/etc/hostname.hme0
    mv /mnt/etc/hostname.eri0:1 /mnt/etc/hostname.hme0:1
    mv /mnt/etc/hostname.eri0:2 /mnt/etc/hostname.hme0:2
    mv /mnt/etc/hostname.eri0:3 /mnt/etc/hostname.hme0:3
    mv /mnt/etc/hostname.eri0:4 /mnt/etc/hostname.hme0:4
    mv /mnt/etc/hostname.eri0:5 /mnt/etc/hostname.hme0:5
    mv /mnt/etc/hostname.eri0:6 /mnt/etc/hostname.hme0:6
    mv /mnt/etc/hostname.eri0:7 /mnt/etc/hostname.hme0:7
    mv /mnt/etc/hostname.eri0:8 /mnt/etc/hostname.hme0:8
    mv /mnt/etc/hostname.eri0:9 /mnt/etc/hostname.hme0:9
    mv /mnt/etc/hostname.eri0:10 /mnt/etc/hostname.hme0:10

    rm /mnt/etc/path_to_inst
    rm -r /mnt/dev
    rm -r /mnt/devices

    /usr/sbin/devfsadm -r /mnt -p /mnt/etc/path_to_inst

    touch /mnt/reconfigure

Then, I could sidestep all that typing by doing:

# mount /dev/dsk/c0t0d0s0 /mnt

# /mnt/etc/rebuild

# umount /mnt

That's it.


Chris Hoogendyk

   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 



Erdvs 4

-------- Original Message --------
Subject: 	[Summary] Recovery backups to slightly different hardware
Date: 	Wed, 18 Oct 2006 16:54:38 -0400
From: 	Chris Hoogendyk <hoogendyk@bio.umass.edu>
To: 	Sun Managers List <sunmanagers@sunmanagers.org>
References: 	<45365B4A.6040905@bio.umass.edu>

Thanks to everyone. [Original message at bottom.]

Essentially all had the same suggestion with slight variants -- Karl
Rossing from Federated Ins., CA; Claude Charest from Hydro-Quebec, CA;
Steve Beuttel from cox.net; Francisco from Ann Arbor, MI, US
(www.blackant.net); Michael Maciolek from world.std.com; Stan
Pietkiewicz from Statistics Canada; and Christopher Manly from Cornell.

I used Steve's suggestion, because he provided step-by-step detail that
accounted for idiosyncrasies of copying device trees:

     Assuming you're booted from the CD, and your "/" is mounted on
"/a", try:

     "cd /a"
     "mv dev <yymmdd>_dev"
     "mv devices <yymmdd_devices"
     "mkdir dev devices"
     "chmod 755 dev devices"
     "chown root:sys dev devices"
     "cd /dev; find . -depth -print | cpio -pdm /a/dev"
     "cd /devices; find . -depth -print | cpio -pdm /a/devices"
     "cd /a/etc"
     "mv path_to_inst <yymmdd_path_to_inst"
     "cp -p /etc/path_to_inst /a/etc/path_to_inst"

     Then reboot.


Others suggested using devfsadm. I should probably look into that for
the future. However, Steve's method worked. I also did a touch
/a/etc/reconfigure for good measure.


Chris Hoogendyk


Chris Hoogendyk wrote:
> I have been trying to do a proof of concept and document the details to
> recover one of our critical servers in case it fails for some reason.
> (Just last month we had a building wide power snafu that caused untold
> $$$ damage. My servers survived, but the event instilled the fear of
> God, so to speak.) The server in question is a Sun Blade 100 (yeah, I
> know, it's not a Server) that is running name services for our
> internal network. If it goes down, the network starts falling apart.
> Anyway, most of our departmental servers are E250's, and we happen to
> have a few extra E250's for backup.
>
> Both of these systems are sun4u and we are running Solaris 9. I have
> backup tapes that are done using ufsdump from an fssnap snapshot piped
> through ssh to a remote tape drive on another server. I've used these to
> recover files and directories, but never had to do a full recovery. So,
> I figured I would grab a backup tape, a spare E250, plop some drives in
> it, and try to do a recovery.
>
> I started out by booting off the Solaris 9 install CD, formatting and
> partitioning c0t0d0 to match the boot drive on the Sun Blade, and then
> doing newfs and recovering all the partitions from the backup tape using
> ufsrestore. Everything seems to be there. I went into /mnt/etc and did
> `mv hostname.eri0 hostname.hme0` for each of the interfaces, 'cause I
> knew that would hit. Then I did the installboot, got back to the OK
> prompt and did a `boot disk:d` (that's where the root partition is). It
> goes through all its stuff and finishes up with:
> -----------------------------
> Rebooting with command: boot disk:d
> Boot device: /pci@1f,4000/scsi@3/disk@0,0:d  File and args:
> Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
> FCode UFS Reader 1.12 00/07/17 15:48:16.
> Loading: /platform/SUNW,Ultra-250/ufsboot
> SunOS Release 5.9 Version Generic_118558-03
> 64-bit|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/|\-/
> Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> WARNING: status 'fail' for '/rsc'-/|\-/|\-/
> configuring IPv4 interfaces: hme0 hme0:1 hme0:10 hme0:2 hme0:3 hme0:4
> hme0:5 hme0:6 hme0:7 hme0:8 hme0:9.
> Hostname: pilot
> /dev/dsk/c0t0d0s1: No such device or address
> The / file system (/dev/rdsk/c0t0d0s3) is being checked.
> Can't open /dev/rdsk/c0t0d0s3
> /dev/rdsk/c0t0d0s3: CAN'T CHECK FILE SYSTEM.
> WARNING - Unable to repair the / filesystem. Run fsck
> manually (fsck -F ufs /dev/rdsk/c0t0d0s3). Exit the shell when
> done to continue the boot process.
> Type control-d to proceed with normal startup,
> (or give root password for system maintenance):
> -----------------------------
>
> When I went in and tried `format`, it said "no disks found".
> I rebooted off the cdrom, did `format`, and they are there.
>
> I actually did 2 more things in the process of debugging and getting to
> this point.
> I did `mount /dev/dsk/c0t0d0s3 /mnt`, went into /mnt/etc and did a
> `touch reconfigure`.
> I also went into /mnt/platform/SUNW,Ultra-250 and didn't find a "unix",
> whereas I did find it in /mnt/platform/sun4u. So, I did `mv
> SUNW,Ultra-250 SUNW,Ultra-250.orig` followed by a `ln -s sun4u
> SUNW,Ultra-250`. This got me past an earlier error, ... I think.
>
> So, now I'm stuck and not quite sure whether this is impossible or I'm
> just missing the magic trick. I thought since they were both UltraSPARC
> and sun4u that I would be able to do it. Any suggestions or insight
> would be much appreciated.
> ---------------
> Chris Hoogendyk
> ---------------