My original post was:
What is the proper procedure for replacing a failed disk in a RAID 5
configuration on a SPARCstorage Array? My volumes consist of 11
subdisks plus a log disk, and I am using the Veritas volume management
software.
I want to simulate a failure in order to test my recovery procedures.
My idea would be to unmount all of the volumes involved, then
initialize one of the disks by creating a large partition on it and
running newfs. Then I would bring it online as though it were a new
disk. So my question is: how do I do this last step?
--------------------------------------------------------------------
I only received one response.
From ottenber@mr.med.ge.com Tue May 7 14:53 EDT 1996
From: "Paul A. Ottenberg 4-6166 MR" <ottenber@mr.med.ge.com>
Date: Tue, 7 May 1996 14:02:10 +0600
To: ps4330@okc01.rb.jccbi.gov
Subject: Replacing failed disks on a Raid 5 Disk array
Peter:
Back up that system before toasting anything....
highly recommend you review: http://www.columbia.edu/~marg/misc/ssa/
before simulating a failure.
paul.
--
Paul A. Ottenberg              | email : ottenbergp@med.ge.com
EIS Admin Team Leader          | voice : 414.521.6166
GE Medical Systems             | fax   : 414.521.6800
PO Box 414; Mail Stop: W832    |
Milwaukee, WI 53201-0414       |
----------------------------------------------------------------------
I checked the web site at Columbia and there was some pretty scary
stuff there about the Sun/Veritas implementation of RAID 5.
Nevertheless, I pushed ahead on the faith that my daily backups would
bail me out if I lost everything.
Here is what I did:
1. I selected one physical disk to sacrifice, disk03. Before starting the process, I used vxprint | grep disk03 to list all of the virtual disks which used disk03.
2. I unmounted all of the virtual disks which used disk03.
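In command terms, steps 1 and 2 amounted to the following. /u02 is the only mount point I name in this summary, so repeat the umount for each file system you find on an affected volume:
    vxprint | grep disk03   # list every subdisk on disk03, and hence every affected volume
    umount /u02             # unmount the file system on each affected volume (repeat as needed)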
3. Used format to create a single partition on disk03 (physical device /dev/rdsk/c1t0d2).
4. Used newfs to create a file system on c1t0d2s0, thus wiping out whatever information had been on this disk.
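The wipe itself is only two commands. format is interactive (I selected c1t0d2 and made one large slice 0), and newfs does the real damage:
    format                      # interactive: select c1t0d2, create one large partition on slice 0
    newfs /dev/rdsk/c1t0d2s0    # WARNING: wipes the disk contents; this is the simulated failure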
5. Mounted /u02, one of the virtual disks which uses disk03.
6. At this point vxvm sent me two email messages warning me that a hardware failure had occurred on disk03, listing all of the affected drives. It also said that no hot spare was found (correct - I did not have any defined) and that "apparently" no data had been lost.
7. At this point vxprint -l vol02 showed the "degraded" flag. (Same for all other virtual disks which use disk03).
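A quick way to check all of the affected volumes at once is a loop like this (the volume names are the ones from the vxprint listing further down; non-empty output means that volume is still degraded):
    for v in vol02 vol04 vol07 vol09 vol11 vol13 vol15 vol18 vol20 vol22
    do
        echo "== $v =="
        vxprint -l $v | grep -i degraded
    done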
8. I followed the instructions in the manual for "Replacing Physical Disks". (I unmounted /u02.)
In the gui interface:
Select the view for the disk group.
Basic Ops->Disk Operations->Replace Disks
9. vxprint -l vol02 still showed the degraded flag set.
10. Called Sun support.
11. Sun Support told me to use vxdiskadm:
Select menu option "remove disk for replacement", specifying disk03.
Select menu option "replace disk" specifying disk03.
12. I mounted all of the virtual disks which used disk03. One by one, the "degraded" flag disappeared from the vxprint -l listing.
While this was going on, I ran vxprint | grep disk03 periodically so that I could track the recovery process. The output looked like this:
dm disk03    c1t0d2s2 -        4152640 -  -       -     -
sd disk03-01 vol02-01 ENABLED  419520  0  -       -     -
sd disk03-02 vol04-01 ENABLED  419520  0  -       -     -
sd disk03-03 vol07-01 ENABLED  419520  0  -       -     -
sd disk03-04 vol09-01 ENABLED  419520  0  -       -     -
sd disk03-05 vol11-01 ENABLED  419520  0  -       -     -
sd disk03-06 vol13-01 DETACHED 419520  0  RECOVER RECOV -
sd disk03-07 vol15-01 ENABLED  419520  0  -       -     -
sd disk03-08 vol18-01 ENABLED  419520  0  -       -     -
sd disk03-09 vol20-01 ENABLED  419520  0  -       -     -
sd disk03-10 vol22-01 ENABLED  376960  0  RECOVER RECOV -
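A throwaway loop like the one below makes it easy to watch the RECOVER states clear one by one (the 60-second interval is arbitrary):
    while true
    do
        date
        vxprint | grep disk03
        sleep 60
    done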
13. During the recovery process, I got a message:
vxvm:vxvol: ERROR: Subdisk disk03-06 in plex vol13-01 is locked by another utility
When vol13 stayed in the status shown above for a long time, I rebooted. This seemed to clear whatever conflict the above message was referring to, and vol13 finally completed its recovery process.
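In hindsight, a reboot is a blunt instrument. If I hit this again, I would first try restarting the recovery by hand with vxrecover, something like the line below - I did not test this during the incident, so treat it as a sketch:
    vxrecover -s vol13    # -s (re)starts recovery/resynchronization for the named volume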
As a result of this exercise, my customer and I have considerably more confidence in the disk array as we have configured it, as well as a better understanding of how the whole error detection/correction process works. I hope this was helpful.
Peter Schauss
ps4330@okc01.rb.jccbi.gov
Gull Electronic Systems Division
Parker Hannifin Corporation
Smithtown, NY