I received much useful advice and information from Peter Bauer.

I learned from Peter:

1) The following command finds the maintenance entries in the system logs:

          grep -n maint /var/adm/messages*

The SCSI warning associated with the disk error should appear just a few
lines before each match.
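
To see those preceding lines without opening each file by hand, something
like the following could be used. It is only a sketch: show_context is a
hypothetical helper name, and it assumes an awk that accepts -v (nawk on
Solaris). It is useful where grep has no option for printing the lines
before a match.

```shell
# show_context PATTERN N FILE...
# Print every line matching PATTERN together with the N lines before it,
# prefixed with file name and line number (similar to grep -n output).
show_context() {
    pattern=$1
    before=$2
    shift 2
    awk -v pat="$pattern" -v n="$before" '
        FNR == 1 { last = 0 }                 # new file: nothing printed yet
        { buf[FNR % (n + 1)] = $0 }           # ring buffer of the last n+1 lines
        $0 ~ pat {
            start = FNR - n
            if (start < 1) start = 1
            if (start <= last) start = last + 1   # do not repeat printed lines
            for (i = start; i <= FNR; i++)
                print FILENAME ":" i ":" buf[i % (n + 1)]
            last = FNR
        }
    ' "$@"
}
```

For example, "show_context maint 5 /var/adm/messages*" would print each
"maint" line with the five lines leading up to it.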

If you have the chance, look further back in /var/adm/messages* for a
message like "WARNING: write error...". If your metadisk had an error during
a "read", your data is OK. If it was a write error, the original data was
not successfully written to disk, so you have faulty data.  Run a database
consistency check.
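
A quick way to get an overview is to count both kinds of message. This is a
rough sketch; count_io_errors is a hypothetical helper, and the strings
"read error" and "write error" are assumptions about the message wording,
so adjust them to what your logs actually contain.

```shell
# Count log lines on stdin that mention a read error or a write error.
# The match is case-insensitive; the wording is an assumption.
count_io_errors() {
    awk '
        tolower($0) ~ /write error/ { w++ }
        tolower($0) ~ /read error/  { r++ }
        END { printf "read=%d write=%d\n", r, w }
    '
}
```

It could be run as "cat /var/adm/messages* | count_io_errors". Any nonzero
write count is the case that calls for the consistency check.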

2) Create a new submirror with

metainit d28 1 1 <newdisk>
metattach d8 d28

DON'T USE metattach -f (at this time).

The disks should automatically synchronize [which they did in my case].

After the resync is done, you will have a so-called "two-way mirror".  
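
The commands in step 2 can be wrapped in a small function with a dry-run
mode, so the sequence can be reviewed before anything is changed. This is
only a sketch: add_submirror, run and DRY_RUN are hypothetical names, and
the slice you pass in is whatever your new disk is called. The final
metastat call is there to show the state of the mirror, including the
progress of the resync.

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_submirror MIRROR SUBMIRROR SLICE
# Create a one-slice submirror, attach it to the mirror (without -f),
# then show the mirror status.
add_submirror() {
    run metainit "$2" 1 1 "$3" &&
    run metattach "$1" "$2" &&
    run metastat "$1"
}
```

For example, "DRY_RUN=1 add_submirror d8 d28 <newdisk>" prints the three
commands without running them.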

3)  Detach the defective mirror half (using metadetach dX dY) or
force a resync of the last-erred component (metareplace -e dX c1t2d3s4).
[I detached the defective mirror.]

4)  If possible, run an fsck on the metadisk after detaching the submirror
that is in the last-erred state, so that if an error was copied over to
the new submirror, fsck will find it.  You might use format to do an
analyze->refresh and see if your disk still works, but it would be better to
replace the disk. It's usually cheaper than having unintended data
modification or loss of service.

5) Make sure you have enough state database replicas. They should be
on at least three different disks.
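
One way to check this is to count the distinct disks in the metadb output.
The following is a sketch under an assumption: that each replica line ends
with a device path of the form /dev/dsk/cXtYdZsN, which is how metadb
normally lists replicas. count_replica_disks is a hypothetical helper name.

```shell
# Read metadb output on stdin and print how many different disks
# hold at least one state database replica.
count_replica_disks() {
    awk '
        $NF ~ /^\/dev\/dsk\// {
            disk = $NF
            sub(/s[0-9]+$/, "", disk)     # drop the slice suffix
            if (!(disk in seen)) { seen[disk] = 1; n++ }
        }
        END { print n + 0 }
    '
}
```

Run it as "metadb | count_replica_disks". If the result is below three,
further replicas can be added on another disk with metadb -a.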

6) Some of the error messages in /var/adm/messages* indicated a
problem with the SCSI bus. Since all problems occurred on the same
disk, it _might_ be a problem with the cabling. Check that all cables
are fitted and secured. If possible, it might also be a good idea to
remove the cables to that disk and re-install them. If the disk is in an
enclosure (UniPack, MultiPack), you might want to open it, remove the
disk and re-install it. It could be an imperfect electrical connection.

Marian Russell

I inherited responsibility for a Sun Blade 100 with Solaris 8 OS about a year
ago.  Someone had already set up mirroring for the / (d0), /var (d4), and
/export (d7) filesystems and, additionally, had created names for the two
large disks (d8 & d9, each RAID 1), but instead of making two sub-mirrors
for each of these, each has only one sub-mirror (d18 & d19, each RAID 0)
that is basically the size of the entire disk.

In the course of adding a new PCI SCSI card and disk pack to our system, I
set up the two new disks in the same manner as d8 and d9, and noticed that
d8 is showing Needs Maintenance and d18 is showing Last Erred.

This morning I took the system down to single user mode, unmounted d8 and
did an fsck.  There were no errors.  Then I ran metastat again and the
status was still the same.  Then I tried

metareplace -e d8 /dev/dsk/c1t1d0s7

and got the error:  attempt to replace a component on the last running

So I have a lot of questions (if you can answer even one of these, I would
be most grateful!):

1.  Is the setup of the disks with only one sub-mirror component okay with
the top level being RAID 1 and the one sub-component RAID 0?

2.  Since the two new disks have exactly the same configuration as the one
with the errors, could I set up a second sub-mirror temporarily that is one
of the new disks and somehow make it a mirror of d8 while I fix d8 somehow? 

3.  How do I fix d8 (this is like our main applications system disk and we
need it!) ?  

4.  Why doesn't fsck find any problems with it?

5.  Is this like a critical situation that needs to be remedied asap?  It
could be that the system has been in this state for some time (like weeks or
even months), because I didn't know that it was something I should be
looking at on a regular basis until now.

Thanks in advance for any help, advice, info you have time to provide!

Marian Russell

