Apologies for the late summary - I had been hoping I would have a maintenance
window so I could be 100% sure of the solution, but it has not worked out that
way. However, I am assured by Sun that things should be back to normal ;)

In the haze of 1:30 a.m. and not thinking clearly, I neglected to check the
state of the metadevices. It turns out that all slices were in need of repair,
and according to metastat -t, the system had been in need of repair for some
time. Unfortunately, no one had set up any of the suggested cron jobs to
monitor for this issue.

So the system was stopping after the check of the metadevices because it was
running a metasync -r, as called for in the lvm.sync script. (This check is
skipped when booting into single-user mode.) After bringing the system up from
single-user, we ran metasync -r to sync up the filesystems, and all appears to
be well. The d30 volume is very large (over 30GB) and took a LOT of time to
sync -- which is apparently what was happening during a normal boot.

Thanks to everyone who sent in a suggestion. (To the 13 of you who let me know
that you were out of the office that day, I could have done without that
information.)

Darren Dunham and Jay Lessert sent along ideas on how to make the init process
a bit more verbose, so that these types of issues are easier to find. I can
imagine the reasons Sun wants a less chatty boot process, but considering all
of the things that can go wrong during boot, I wouldn't mind a bit more
feedback while my systems are booting.

From Darren:

What I will do sometimes is to modify /sbin/rc2 temporarily. Keep a backup and
add the two "echo" lines below in the appropriate place.

if [ $_INIT_PREV_LEVEL != 2 -a $_INIT_PREV_LEVEL != 3 -a -d /etc/rc2.d ]; then
        for f in /etc/rc2.d/S*; do
                if [ -s $f ]; then
                        echo "Starting to run $f"
                        case $f in
                                *.sh)   . $f ;;
                                *)      /sbin/sh $f start ;;
                        esac
                        echo "Completed running $f"
                fi
        done
fi

Thanks again!

jef
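P.S. For anyone who, like us, never got around to the suggested monitoring,
below is a minimal sketch of the kind of check that could be run from cron.
The script location, schedule, mail recipient, and the "maint" grep pattern
(metastat reports broken components with "Needs maintenance") are only
assumptions to adjust for your site, not anything Sun prescribes.

#!/bin/sh
# Hypothetical cron check for DiskSuite metadevices needing maintenance.
# Mails the full metastat output to root if anything looks broken.
STATUS=`metastat 2>&1 | grep -i maint`
if [ -n "$STATUS" ]; then
        metastat 2>&1 | mailx -s "`hostname`: metadevice needs maintenance" root
fi

Dropped into root's crontab as something like (the path and the hourly
schedule are just examples):

0 * * * * /usr/local/bin/chk_metastat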
Original Message:

> Hello All,
>
> This past weekend, we applied the latest 8_Recommended cluster to an
> E220R (which appeared to be an original Sol 8 install, and had never
> been patched before - lucky me). After the installation and reboot,
> the system hangs after checking the filesystems, i.e.
>
> ...
> /dev/dsk/md/d20 is clean
> /dev/dsk/md/d30 is stable
>
> and just stops here. The longest I let it go was probably 20 minutes,
> just to see if it would eventually do anything. If we boot into
> single-user mode, and start up all of the things we need by hand,
> however, the system works just fine, as do all services. (It's a
> Real/Helix streaming server.)
>
> I'm guessing that there is probably an issue with an rc script, since
> I can mount the file systems and start services by hand, including an
> NFS mount. I'm not familiar enough with the boot sequence to know
> exactly the route to take from rcS to rc2 (or even rc3) to have walked
> through the required scripts.
>
> I don't know if this will help, but here is the vfstab, just in case
> (and yes, I am also not a fan of these mount points, but I inherited
> the box):
>
> fd                 -                  /dev/fd                         fd     -  no   -
> /proc              -                  /proc                           proc   -  no   -
> /dev/dsk/c0t0d0s1  -                  -                               swap   -  no   -
> /dev/md/dsk/d0     /dev/md/rdsk/d0    /                               ufs    1  no   -
> /dev/md/dsk/d10    /dev/md/rdsk/d10   /var                            ufs    2  yes  -
> /dev/md/dsk/d20    /dev/md/rdsk/d20   /usr/local/                     ufs    3  yes  -
> /dev/md/dsk/d30    /dev/md/rdsk/d30   /usr/local/RealServer/Content/  ufs    4  yes  -
> swap               -                  /tmp                            tmpfs  -  yes  -
> nfs.host:/home     -                  /home                           nfs    -  yes  soft,quota,bg
>
> I'm wondering if anyone might have an idea, based on where the boot is
> hanging, which scripts I can check for problems. I realize that there
> could be mounting issues with the /usr/local items if done out of
> order - however, the boot sequence shows them being checked in order,
> so I am assuming (incorrectly, maybe?) that they would be mounted in
> that order. (And again, they mount fine by hand.)
>
> Oh, I should also mention that it appears that some services are
> starting, as the box will respond to a ping from a different subnet,
> so it must be getting route/network. dmesg confirms this. So does this
> indicate that parts of the system are hitting rc2.d/S69inet and
> S72inetsvc? It never makes it to any of the other network-related
> services, tho, such as ssh or the helix server.
>
> It also shows a dump to swap that I am unsure about.
>
> Jun 7 23:04:36 nova genunix: [ID 936769 kern.info] hme0 is /pci@1f,4000/network@1,1
> Jun 7 23:04:40 nova hme: [ID 517527 kern.info] SUNW,hme0 : Internal Transceiver Selected.
> Jun 7 23:04:40 nova hme: [ID 517527 kern.info] SUNW,hme0 : 100 Mbps Full-Duplex Link Up
> Jun 7 23:04:42 nova genunix: [ID 454863 kern.info] dump on /dev/dsk/c0t0d0s1 size 1000 MB
>
> Any helpful pointers/suggestions/ideas appreciated.
>
> Thanks
> jef

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Mon Jun 23 12:53:45 2003
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:15 EST