SUMMARY: (sort of) DiskSuite seems to have severely broken a Solaris 8 host

From: Daniel Baldoni <>
Date: Sun Oct 02 2005 - 08:52:27 EDT
G'day folks,

Firstly, here's a snippet from my original posting (a few weeks back now,
but I wanted to wait and see what the final outcome was):

 >A client (and, as you might guess because of the local time I'm posting
 >this, a mate) is having severe problems with a Solaris 8 box, refusing to
 >boot.  He's getting a whole series of (for example)
 >"/kernel/misc/sparcv9/md_raid: undefined symbol md_unit_incopen" errors
 >(the errors are reported for each of the forceload'd "md_*" modules, with
 >many symbols listed for each).  The machine doesn't even successfully reach
 >single-user mode (the password prompt is displayed but the machine locks at
 >this point).
 > ...[text describing attempts to recover deleted]...
 >I don't have access to his boxes and I doubt I can solve his problem (I've
 >never seen anything like this, before) if I did.  From what I have been told,
 >his /etc/system file contains forceloads only for (forgive the "shell shorthand"):
 >	misc/md_{hotspares,mirror,raid,sp,stripe,trans}
 >	drv/{dad,isp,pci_pci,pcipsy,sd,simba,uata}
 >The machine in question is an Ultra 10, with two internal IDE drives (one
 >of which appears to be severely dying, which is what led to all these
 >issues), an internal CD-ROM, and 5 (or 6 - he couldn't tell me which) SCSI
 >drives (in one of Sun's external enclosures).
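The brace notation in the quoted description is shell-style shorthand for one "forceload:" entry per module. As a minimal sketch (a hypothetical helper, not anything from the original thread, and assuming each entry takes the usual "forceload: <dir>/<module>" form used in /etc/system), the Python below expands it back into the individual lines:

```python
# Hypothetical helper: expand the shell brace shorthand from the quoted
# message into the individual /etc/system "forceload:" lines it stands for.
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b,c} groups, e.g. 'misc/md_{sp,raid}' -> two entries."""
    match = re.search(r"\{([^}]*)\}", pattern)
    if match is None:
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    results = []
    for alt in match.group(1).split(","):
        # Recurse in case the suffix holds another brace group.
        results.extend(expand_braces(prefix + alt + suffix))
    return results


# The two shorthand lines as quoted in the message.
SHORTHAND = [
    "misc/md_{hotspares,mirror,raid,sp,stripe,trans}",
    "drv/{dad,isp,pci_pci,pcipsy,sd,simba,uata}",
]


def forceload_lines(shorthand: list[str]) -> list[str]:
    """Turn each expanded module path into a 'forceload:' entry."""
    return [f"forceload: {module}"
            for pattern in shorthand
            for module in expand_braces(pattern)]


if __name__ == "__main__":
    for line in forceload_lines(SHORTHAND):
        print(line)
```

Run as a script, it prints thirteen lines, starting with "forceload: misc/md_hotspares" and ending with "forceload: drv/uata".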

Well, the machine has now been running for a couple of weeks, without any
mirrors.  What's really strange is that the sub-mirror (the OS still insists
on using a metadevice-based filesystem - and we're not willing to experiment
any more) in use is on the drive that was generating "bad-block" messages
(can you say "developing media faults"? <frown>).

The organisation in question have (finally, remember this is an *OLD* Ultra
10) decided to replace the machine with a Linux setup.

As for the RAID5 array, no recovery was required as the md replicas were
fine.  And, as was pointed out by Kev Smith, John Hudson and Dave Dunaway
(thanks to each of you, BTW), the raid's configuration information may be
available in (in this case, it was).  If not, go looking for the
setup docs for the machine in question - they should contain the command
line arguments passed to metainit (they did - I set this box up for them
a few years ago <grin>).

So, to sum up, the machine is running but the underlying problem was side-
stepped rather than fixed.

To "complete" this summary, let me say that we (this time I was directly
involved) did try and convince the system that there was no SVM involvement
by commenting out all of the meta-related information from /etc/system on
one of the mirrors (the one we had already screwed up - we weren't willing
to "break" the other one as well).  All to no avail ... and that is
something I really wish I could explain.  But, it got to the point where it
was going to cost more for the "after-hours emergency support" than it would
to simply replace the box in question.
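For anyone who hasn't done this, /etc/system treats lines beginning with "*" as comments, so "commenting out" the DiskSuite entries amounts to prefixing them. The Python sketch below illustrates that kind of edit on the file's text. The set of prefixes treated as md-related (md module forceloads, a "rootdev:/pseudo/md..." line and "set md:" tunables) is an assumption for illustration, not a record of exactly what was changed on this host, and the sample input is hypothetical. As described above, this sort of edit did not get the machine booting.

```python
# Sketch only: which lines count as "meta-related" is an ASSUMPTION here,
# not a record of exactly what was edited on the affected host.
MD_PREFIXES = (
    "forceload: misc/md_",  # DiskSuite/SVM kernel modules
    "rootdev:/pseudo/md",   # root filesystem on a metadevice
    "set md:",              # md tunables, e.g. the replica boot list
)


def comment_out_md_entries(system_text: str) -> str:
    """Prefix md-related /etc/system lines with '*', the file's comment marker."""
    out = []
    for line in system_text.splitlines():
        if line.lstrip().startswith(MD_PREFIXES):
            out.append("* " + line)
        else:
            out.append(line)  # non-md entries (e.g. drv/sd) are left alone
    return "\n".join(out) + "\n"


# Hypothetical /etc/system excerpt; the values are illustrative only.
example = """\
forceload: misc/md_mirror
forceload: drv/sd
rootdev:/pseudo/md@0:0,10,blk
set md:mddb_bootlist1="sd:7:16"
"""

if __name__ == "__main__":
    print(comment_out_md_entries(example), end="")
```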

Again, thanks to the gentlemen mentioned above.

Daniel Baldoni BAppSc, PGradDipCompSci                 |  Technical Director
require 'std/'                            |  LcdS Pty. Ltd.
-------------------------------------------------------+  856B Canning Hwy
Phone/FAX:  +61-8-9364-8171                            |  Applecross
Mobile:     041-888-9794                               |  WA 6153
URL:                    |  Australia
"Any time there's something so ridiculous that no rational systems programmer
  would even consider trying it, they send for me."; paraphrased from "King Of
  The Murgos" by David Eddings.  (I'm not good, just crazy)
sunmanagers mailing list
Received on Sun Oct 2 08:53:02 2005
