SUMMARY: boot fails as /usr fails to mount in 1st (read only) phase!

From: Adam Singer (regnis@worldnet.att.net)
Date: Tue Jun 08 1999 - 23:01:57 CDT


Dear Sun Managers

Thank you for the many thorough and informative replies to my several
queries to the list. Definitely got some good info (below). Usually
I wait a few days, but I have gotten so many replies and what I
believe to be the correct info, so first let me thank the following:

Kevin Sheehan {Consulting Poster Child}; Robert.Rose; Birger Wathne;
Jaewan Kim; Matthew Stier; Frank Fiamingo; Tim Evans; James Coby;
Bill Armand; Bunny Pfau; Scott R Kulp; Ira Kronitz; Danny Johnson;
Nagraj B; and Thomas Carter

The responses/explanations fell into the following categories:

1. truss the process

2. A messed up binary/file in /sbin, /etc/path_to_inst or /etc/vfstab
- use the stuff off the cdrom or another machine to replace all or
some of those contents

3. does the mount point exist and is /etc/vfstab ok (yes to both)

4. Tim Evans said this happened to him during an upgrade install of
Solaris 7 and a reinstall was the only thing that helped. I break his
reply out because I *had* in fact done an upgrade a few weeks before
and possibly had not rebooted since then (though I rebooted several
times the day that I did the upgrade).

5. mirroring? Yes, Disksuite but only on an external disk pack, not
the boot disk.

6. comment out tmpfs and if it works, reload the O/S - was unable to
try this.

7. Did you reboot with a boot -r? The answer is no *but* often
programs will touch /reconfigure and it *is* possible I did not notice
it since I hadn't been anticipating a problem just bringing down a
server to remove a cdrom drive. So this is a strong possibility and
those who asked about this also mentioned:

8. M

7. Ira Kronitz, Jaewan Kim, Danny Johnson, and Nagraj B all
hit the nail on the head when they suggested, as Ira did, that "maybe
the /dev file for that filesystem device needs to be rebuilt. The
procedure below fixed it for me. Hope it helps."
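
To illustrate the kind of mismatch Ira is describing, here is a minimal
sketch (not taken from any of the replies) that lists the vfstab devices
with no usable entry under /dev. It assumes a POSIX shell; the function
name check_vfstab_devices and its root-prefix argument are made up for
illustration. Pass an empty string for the running system, or /a if the
disk is mounted after booting from cdrom.

```shell
# Hypothetical helper: report vfstab devices that have no usable /dev entry.
# $1 = root of the filesystem to inspect ("" for the live system, /a for a
#      disk mounted from a cdrom boot). This argument is an assumption.
check_vfstab_devices() {
    root=$1
    missing=0
    while read -r dev rawdev mnt fstype rest; do
        case $dev in
            /dev/*) ;;        # a real device path, check it below
            *) continue ;;    # comments, blank lines, fd/proc/swap pseudo-entries
        esac
        # -e follows symlinks, so a dangling /dev -> /devices link also fails
        if [ ! -e "$root$dev" ]; then
            echo "MISSING: $dev (mount point $mnt)"
            missing=1
        fi
    done < "$root/etc/vfstab"
    return $missing
}
```

Something like check_vfstab_devices /a would then print one line per
device that vfstab names but /dev cannot resolve, and return non-zero if
there were any.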

I am including the SRDB below because I did not find it when I hit
Sunsolve! Thanks a lot, Ira, for that file. Nagraj B also gave me
the steps, but this doc seems most thorough. WARNING: I have not
tried these steps and so as always use caution, etc. I didn't have to
do them because I ended up reinstalling the O/S from scratch (my
preferred method but I was being lazy with this upgrade).

The only other item I wish to address was my side-question about the
diff between init 0 and shutdown. Bill Armand wrote
"Yes there is a difference between init 0 and shutdown. Some of your
applications have stop or shutdown scripts that key off of the
shutdown command and not the "K" /etc/rc* scripts...." I include this
for your illumination but am not sure I believe/agree with it. Will
have to research that one a bit more.

thanks again to everyone,
Adam Singer

HERE IS THE SRDB IRA KRONITZ FORWARDED:

From SunSolve, the method of rebuilding the /devices and /dev directories is
as follows:

Symptoms and Resolutions article 17614

SRDB ID: 17614

SYNOPSIS: Unable to mount /usr

DETAIL DESCRIPTION:

During boot the following information is displayed:

ok boot /sbus@1f,0/SUNW,fas@2,8800000/sd@5,0:a
Boot device: /sbus@1f,0/SUNW,fas@2,8800000/sd@5,0:a File and args:
SunOS Release 5.5.1 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1996, Sun Microsystems, Inc.
configuring network interfaces: le0.
Hostname: alviso
mount: /dev/dsk/c1t2d0s6 no such device
/sbin/swapadd: expr: not found
/sbin/swapadd: /usr/sbin/swap: not found

WARNING - /usr/sbin/fsck not found. Most likely the
mount of /usr failed or the /usr filesystem is badly
damaged. The system is being halted. Either reinstall
the system or boot with the -b option in an attempt
to recover.

syncing file systems... done
Program terminated
ok

SOLUTION SUMMARY:

This may be a result of moving a boot disk to a device location (SCSI id)
that has not been defined in the /devices directory, or could happen when a
'clone' is made following the directions in the Systems Administration
Guide. Please see the documentation at http://docs.sun.com/ab2 for more
information on cloning disks.

   NOTE: A reconfiguration boot will not fix this condition

Two options are available.

A. Move the disk to a SCSI address that the user knows has been defined in
the /devices directory. An example of this would be if the disk was cloned
while installed at target 2 from a disk at target 0. Both of those targets
would be acceptable for this disk as they are fully 'defined'.

B. Create new /dev, /devices and path_to_inst entries based on the new disk
location using the configuration generated by booting from cdrom.

   NOTE: This will only work if the architecture of the system is the
         same as that installed on the disk, for instance you can not
         move a disk from an Ultra2 to an Ultra450 as the two systems
         use different bus architectures. By the same token you could
         not move the Ultra2 disk to a SPARCstation20 as the hardware
         architectures are not compatible.

The steps to take when using option B are:

   ok boot cdrom -s
   ...
   # mount /dev/dsk/c0t3d0s0 /a <--- t3 is being used as an example only
   # cd /tmp/dev
   # tar cvfp - . | ( cd /a/dev; tar xvfp - )
   # cd /tmp/devices
   # tar cvfp - . | ( cd /a/devices; tar xvfp - )
   # cd /tmp/root/etc
   # cp path_to_inst /a/etc/path_to_inst

   NOTE: The vfstab file may need to be edited at this time to make any
         changes to the target address

   # cd /tmp/root; umount /a; halt
   ...
   ok boot -rv

PRODUCT AREA: System Administration
PRODUCT: Backups
SUNOS RELEASE: Solaris 2.x
HARDWARE: any
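
The SRDB's NOTE about editing vfstab does not say how. As a rough sketch
only (this is not part of the SunSolve article and, like the steps above,
untested on a real system), the edit could be done with sed while the disk
is still mounted on /a. The function name retarget_vfstab and the disk
names in the usage line are hypothetical.

```shell
# Hypothetical helper: rename one disk to another throughout a vfstab file.
# $1 = old disk name (e.g. c1t2d0)
# $2 = new disk name (e.g. c0t3d0)
# $3 = path to the vfstab file (e.g. /a/etc/vfstab)
retarget_vfstab() {
    old=$1
    new=$2
    file=$3
    cp "$file" "$file.orig" || return 1    # keep a backup before editing
    # "dsk/" matches both the /dev/dsk and /dev/rdsk columns; the trailing
    # "s" limits the match to a disk name followed by a slice number.
    sed "s|dsk/${old}s|dsk/${new}s|g" "$file.orig" > "$file"
}
```

For example, retarget_vfstab c1t2d0 c0t3d0 /a/etc/vfstab would be run
before the umount step, and the result compared against /a/etc/vfstab.orig
before halting.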

ORIGINAL POST:
>Dear Sun Managers,
>
>This is a very peculiar problem that neither Sun nor a day and a half
>of me hammering at it were able to resolve. I have a Sun Sparc 10
>that I upgraded about 3 weeks ago from 2.5.1 to 2.6 fully patched.
>The server acts as our Legato backup server. Well I brought down the
>server (using init 0 - is there really any diff between init 0 and
>shutdown -y -g 0 -i 0 other than the messages?) to use the external
>CDROM for something else and when I went back to bring up the system,
>the system would fail to boot.
>
>The exact error message is actually misleading as it says:
>/sbin/swapadd: /usr/sbin/swap not found....
>
>Then the system goes to a WARNING that there is a problem with the
>system and to either try boot -b or reinstall the O/S!
>
>Well the error is misleading because by inserting echo statements
>throughout some of the startup scripts I determined that it is
>failing from within rootusr (named S30rootusr.sh when in
>/etc/rcS.d). What happens is when the system gets to the line
>/sbin/mount -m -o ro /usr
>it fails silently.
>
>I know this is the failure point because I put an extra /sbin/mount to
>list what is mounted and /usr isn't while /, /proc, /tmp, and /fd are
>and on another Sparc 10 running 2.6 this command does have /usr at this
>point. So I tried hard coding the mount, trying mount multiple times
>in this file, hard coding the device name (c0t3d0s3). These attempts
>only gave the rather uninformative standard error message of "/usr
>either exceeded its mount points, is not mounted, or is corrupted."
>I even swapped s3 and s6 thinking maybe it should be on s6 like it is
>on most systems. There was also a .mnttab_lock file in /etc that I
>deleted. *All* to no avail. And yes I was able to boot off cdrom and
>fsck and mount /, /usr, and /var all fine!
>
>I am really stumped as to why it is failing. Boot -v and -b all fail
>at the same point: you get the Sun message, then hostname is set then
>you get the /sbin/swapadd: /usr/sbin/swap not found and then the
>warning about being unable to boot and try boot -b or reinstall!
>
>Has anyone ever encountered such a strange situation where the disk
>(root) boots but it fails to mount /usr even though an fsck and mount
>work when booted off the cdrom? The error is so low-level fundamental
>that I am unable to go forwards or backwards and really really would
>like to figure this out instead of just reinstalling the O/S as the
>stumped 1st and 2nd level support guys at Sun suggested.
>
>thanks for any ideas
>
>Adam

email: regnis@worldnet.att.net


