SUMMARY: RAID 5 with 22 disks

From: Rasana Atreya (atreya@library.ucsf.edu)
Date: Mon Jun 09 1997 - 13:39:23 CDT


Hi Managers,

Thanks to:

From: "Birger A. Wathne" <birger@Vest.Skrivervik.No>
From: vnarayan@haverford.edu (Vasantha Narayanan)
From: patesa@aur.alcatel.com (Sanjay Patel)
From: David Robson <robbo@box.net.au>
From: Jim Harmon <jharmon@telecnnct.com>

I had 22 disks, 2 GB each, with which I wanted to set up RAID 5. But I came
across a previous summary that said:

"Sun advised that more than 6 disks in a RAID 5 stripe was _bad_. Brian Wong's
paper suggested that we can create RAID 5 stripes, each with six disks, and
then _concatenate_ them together to make larger devices!"

        In Brian Wong's paper
        (http://www.sun.com/sunworldonline/swol-09-1995/swol-09-raid5.html)
        I found the answer to my question of why it is wise to limit the
        width of a parity RAID volume to no more than 6 disks:

        Suppose you have a 30-disk RAID volume with parity (RAID 3 or RAID 5)
        and one of the disks fails; a single read then requires 29 physical
        I/O operations to reconstruct the failed member's data. (With a
        6-disk RAID 5 set, the same reconstruction needs only 5 member
        reads.) Writes to such a volume are also very expensive. He says
        that most array software permits the concatenation of multiple RAID
        volumes if larger-capacity volumes are required.
 
        Jim Harmon thought that the "limit" the paper talks about is the RAID
        controller, since a RAID controller (per channel) can only support 6
        drives on one chain. While it is true that the _controller_ can
        support only 6 devices per chain, this is not what the paper was
        talking about (see above). I had these disks spread over 5 different
        channels.

        Jim went on to give some useful info about controllers: that limit
        applies to FAST NARROW SCSI. In a FAST WIDE SCSI system it is
        theoretically possible to mount 256 drives and, under the right
        management program, treat them all as one huge virtual drive.
        Typically, a 4-channel or 5-channel RAID controller can easily
        control 32 or more drives, and under various levels of RAID they can
        be configured as separate drives, collections of drives, or one
        virtual drive. Mirroring, striping, hot swapping, etc. can all be
        mixed under the newer controllers.

        David Robson commented: With RAID 5, write access is considerably
        slower than normal and causes significant system overhead from
        continually recalculating the parity. If a disk dies, the system
        will (should) hold up, but performance will degrade further, and
        recovering onto the replacement disk will take quite some time! You
        should also note that "growing" a RAID 5 metadisk is not
        recommended; if you have 4 disks in a RAID 5 device and try to add
        two more, performance may be reduced, which means you will have to
        back up your data and recreate the entire device! If you can afford
        the disks, concatenate and then mirror to gain redundancy (that's
        what Sun recommended to me).

        He's right about everything except the "write access is considerably
        slower than normal and causes significant system overhead from
        continually recalculating the parity" part. Brian Wong's paper says
        that this is a common misconception. According to him, "This process
        is commonly and erroneously thought to be the most expensive part of
        RAID-5 overhead, but parity computation consumes less than a
        millisecond, a figure dwarfed by the typical 3-15 millisecond service
        times for I/O to member disks." (The real cost of a RAID-5 write is
        the extra member I/O, reading the old data and parity and writing the
        new data and parity, not the parity arithmetic itself.)

My first question was whether I could create 4 independent RAID 5 metadevices,
each with one hot spare, and then mount each of the metadevices under a
different mount point.

        The answer is yes, it is indeed possible. I went ahead and set up 4
        RAID 5 metadevices, each with 5 disks. But instead of associating
        each metadevice with its own hot spare (thanks to help from Sanjay
        Patel, Birger A. Wathne and Vasantha Narayanan), I created a hot
        spare pool with 2 hot spares in it and associated the pool with each
        metadevice in the md.tab file.

        Vasantha Narayanan wasn't sure if we could use a single disk as the
        hot spare for multiple metadevices. This is definitely possible; all
        you need to do is indicate it in md.tab (see the sketch below).
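
        To illustrate, here is a minimal md.tab sketch of that layout; the
        metadevice numbers, disk names and mount point below are just
        placeholders for illustration, not the actual configuration:

        # hot spare pool with 2 spares, shared by all four RAID 5 metadevices
        hsp001 c5t0d0s2 c5t1d0s2

        # four RAID 5 metadevices, 5 disks each, all associated with hsp001
        d10 -r c1t0d0s2 c1t1d0s2 c1t2d0s2 c1t3d0s2 c1t4d0s2 -h hsp001
        d20 -r c2t0d0s2 c2t1d0s2 c2t2d0s2 c2t3d0s2 c2t4d0s2 -h hsp001
        d30 -r c3t0d0s2 c3t1d0s2 c3t2d0s2 c3t3d0s2 c3t4d0s2 -h hsp001
        d40 -r c4t0d0s2 c4t1d0s2 c4t2d0s2 c4t3d0s2 c4t4d0s2 -h hsp001

        Each metadevice is then initialized and mounted separately, e.g.:

        metainit hsp001
        metainit d10
        newfs /dev/md/rdsk/d10
        mount /dev/md/dsk/d10 /export/raid1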

        Birger A. Wathne pointed out that I did not need one spare for each
        RAID set (as was my original plan). He felt that 1 spare disk for all
        the RAID sets should be enough, since a RAID 5 set can keep running
        with one failed disk.

        He also said: The rule is that you cannot survive two failed disks in
        the same RAID 5 set. With one hot spare, the first disk failure puts
        the RAID set containing the failed disk in a critical situation only
        for a limited time. The file system cannot survive another hit while
        the hot spare is syncing up, but after that you are ready to take at
        least two more blows before you lose any file system.

        I have been told to expect 1 to 2% failed disks each year in big
        disk farms. My own experience is that the failure rate for new disks
        is rather high in the first months, so be very vigilant for the
        first 2 months.
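
        A simple way to stay vigilant (assuming the standard DiskSuite
        commands are in your path) is a periodic status check, for example
        from cron:

        metastat | grep -i state    # metadevice/component states; expect "Okay"
        metadb -i                   # state database replica status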

My second question was: If I create RAID 5 stripes, and then concatenate them
together to make larger devices, would this be a good thing to do? If so, how
would I do it?

        I ended up not doing the concatenation of the RAID 5 metadevices
        (I'm quitting this Friday, and I did not want to leave behind
        something I wasn't sure of).

        Sanjay Patel's suggestions:

        Hot spare pool -> 2 disks
        RAID 5 - 1 -> 6 disks
        RAID 5 - 2 -> 6 disks
        RAID 5 - 3 -> 5 disks
        RAID 5 - 4 -> 3 disks

        concat/stripe 1 -> contains RAID 5 - [ 1 thru 4 ]

        total disk space available (raw, before filesystem overhead) will
        equal 16 x disk size: each RAID 5 set gives up one disk to parity,
        so 5 + 5 + 4 + 2 = 16 data disks.
        note: 2.1 GB disks have a formatted capacity of 1.8 GB.
        RAW = 16 * 1.8 GB = 28.8 GB

        attach the hot spare pool to all raid stripes.

        if your disks are hot swappable, then I would only have one hot
        spare and place the extra disk with the RAID 5 - 3 stripe.
        
        to create a concatenation/stripe of all the RAID devices in DiskSuite,
        simply create an empty concat/stripe and place all of the RAID
        devices you have previously created into it as if they were normal
        disks.
        
        a hot-swappable disk is a disk that can be unplugged while the system
        is running. most SSAs (110, 112, 114) are not truly hot swappable,
        since an entire tray has to be removed to replace a disk.
        hot-swappable arrays include Netras, DiskPacks (the new type), and
        the RSM arrays.
        
        An example of concatenation in md.tab for Solaris 2.5.1 & DiskSuite 4.0:

        if you are starting this server from scratch, I would recommend you
        get SDS 4.1 (don't forget to download the patches). if you don't have
        a copy and you need to use SDS 4.0:

        create the RAID 5 devices first, then to concatenate:
        
        /dev/md/dsk/d? 4 1 /dev/md/dsk/d? 1 /dev/md/dsk/d? 1 \
                /dev/md/dsk/d? 1 /dev/md/dsk/d?
        
        the first d? is the next available metadevice number.
        the 4 is the number of stripes being concatenated, and each "1" says
        that stripe is one component wide; the remaining d? entries are the
        metadevices that make up the concatenation (i.e. your RAID 5 stripes).
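
        To make that concrete, here is a sketch with made-up metadevice
        numbers (say d10 through d40 for the four RAID 5 devices above and
        d50 for the new concatenation) and a placeholder mount point;
        substitute whatever numbers and paths are free on your system:

        # md.tab entry: concatenate four existing RAID 5 metadevices into d50
        /dev/md/dsk/d50 4 1 /dev/md/dsk/d10 1 /dev/md/dsk/d20 \
                1 /dev/md/dsk/d30 1 /dev/md/dsk/d40

        # initialize it, build a file system and mount it
        metainit d50
        newfs /dev/md/rdsk/d50
        mount /dev/md/dsk/d50 /export/bigraid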
        
        in SDS 4.1, it's all GUI, and it's all point-click, drag & drop :->

---------------------------------------------------------------------------
Thanks much.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Rasana Atreya Voice: (415) 476-3623 ~
~ System Administrator Fax: (415) 476-4653 ~
~ Library & Ctr for Knowledge Mgmt, Univ. of California at San Francisco ~
~ 530 Parnassus Ave, Box 0840, San Francisco, CA 94143-0840 ~
~ atreya@library.ucsf.edu ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


