Hi Managers,

I was not able to get an answer from this list, so I opened a case with Sun.
It seems they have at least three internal bugs open regarding Solaris Volume
Manager in RAID 5 configurations. One of their suggestions actually fixed the
problem:

# mv /usr/lib/drv/preen_md.so.1 /usr/lib/drv/preen_md.so.1.old
# reboot

I had rebuilt the machine earlier just to check whether the error was
reproducible. It was. Strangely enough, the bug appeared only after I created
the RAID 5 volume. Simply mirroring the boot disk (and rebooting) worked fine.
It seems to be a combination of fsck and the volume manager that caused the
"wait: No child processes" error. Though it is resolved, I still do not
understand why.

Thanks,
Anshuman Kanwar
Unix SysAdmin
Expertcity Inc.
--
(805) 690-5714 [off]    ansh@expertcity.com
(805) 895-4231 [cel]    5385 Hollister Ave
(805) 690-6471 [fax]    Goleta, CA. 93111

> -----Original Message-----
> From: Anshuman Kanwar
> Sent: Monday, May 19, 2003 4:39 PM
> To: 'sunmanagers@sunmanagers.org'
> Subject: Solaris volume manager -- weird metasync
>
> Hi Managers,
>
> I set up disk mirroring on a 420R. It has 2 internal drives
> (c0t0 and c0t1) and is connected to 11 drives in an A5200
> (c1t0 through c1t10).
>
> The internal disks are mirrored, and the external disks are
> configured as a RAID 5 volume with one of the disks as a standby.
>
> Everything seems to work correctly until I boot, at which point
> this happens:
>
> Rebooting with command: boot
> Boot device: disk  File and args:
> SunOS Release 5.9 Version Generic_112233-04 64-bit
> Copyright 1983-2002 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> WARNING: forceload of misc/md_trans failed
> WARNING: forceload of misc/md_sp failed
> configuring IPv4 interfaces: hme0.
> Hostname: e420-1.sjc
> The system is coming up.  Please wait.
> checking ufs filesystems
> /dev/md/rdsk/d5: is clean.
> wait: No child processes
>
> WARNING - Unable to repair one or more filesystems.
> Run fsck manually (fsck filesystem...).
> Exit the shell when done to continue the boot process.
>
> Type control-d to proceed with normal startup,
> (or give root password for system maintenance):
> single-user privilege assigned to /dev/console.
> Entering System Maintenance Mode
>
> May 19 15:01:42 su: 'su root' succeeded for root on /dev/console
>
> e420-1.sjc#metastat
> d8: Mirror
>     Submirror 0: d10
>       State: Needs maintenance
>     Submirror 1: d20
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 4096602 blocks
>
> d10: Submirror of d8
>     State: Needs maintenance
>     Invoke: metasync d8
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s0         0       No     Okay   Yes
>
> d20: Submirror of d8
>     State: Needs maintenance
>     Invoke: metasync d8
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s0         0       No     Okay   Yes
>
> d4: Mirror
>     Submirror 0: d14
>       State: Needs maintenance
>     Submirror 1: d24
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 4096602 blocks
>
> d14: Submirror of d4
>     State: Needs maintenance
>     Invoke: metasync d4
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s4         0       No     Okay   Yes
>
> d24: Submirror of d4
>     State: Needs maintenance
>     Invoke: metasync d4
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s4         0       No     Okay   Yes
>
> d1: Mirror
>     Submirror 0: d11
>       State: Needs maintenance
>     Submirror 1: d21
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 8193204 blocks
>
> d11: Submirror of d1
>     State: Needs maintenance
>     Invoke: metasync d1
>     Size: 8193204 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s1         0       No     Okay   Yes
>
> d21: Submirror of d1
>     State: Needs maintenance
>     Invoke: metasync d1
>     Size: 8193204 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s1         0       No     Okay   Yes
>
> d5: Mirror
>     Submirror 0: d15
>       State: Needs maintenance
>     Submirror 1: d25
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 54330534 blocks
>
> d15: Submirror of d5
>     State: Needs maintenance
>     Invoke: metasync d5
>     Size: 54330534 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s5         0       No     Okay   Yes
>
> d25: Submirror of d5
>     State: Needs maintenance
>     Invoke: metasync d5
>     Size: 54330534 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s5         0       No     Okay   Yes
>
> d9: RAID
>     State: Okay
>     Hot spare pool: hsp001
>     Interlace: 32 blocks
>     Size: 315892480 blocks
>     Original device:
>         Size: 315893952 blocks
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c1t0d0s0       5042      No     Okay   Yes
>         c1t1d0s0       5042      No     Okay   Yes
>         c1t2d0s0       5042      No     Okay   Yes
>         c1t3d0s0       5042      No     Okay   Yes
>         c1t4d0s0       5042      No     Okay   Yes
>         c1t5d0s0       5042      No     Okay   Yes
>         c1t6d0s0       5042      No     Okay   Yes
>         c1t7d0s0       5042      No     Okay   Yes
>         c1t8d0s0       5042      No     Okay   Yes
>         c1t9d0s0       5042      No     Okay   Yes
>
> hsp001: 1 hot spare
>         Device      Status      Length           Reloc
>         c1t10d0s0   Available   35104400 blocks  Yes
>
> Device Relocation Information:
> Device   Reloc  Device ID
> c0t1d0   Yes    id1,sd@SFUJITSU_MAJ3364M_SUN36G_01M41510____
> c0t0d0   Yes    id1,sd@SSEAGATE_ST336704LSUN36G_3CD1PPV2000071306LU5
> c1t10d0  Yes    id1,ssd@w20000020375b0eac
>
> e420-1.sjc#metadb
>         flags           first blk       block count
>      a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
>      a    p  luo        16              8192            /dev/dsk/c0t1d0s7
>      a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
>
> If I do this:
>
> bash-2.05# metasync d1
> bash-2.05# metasync d4
> bash-2.05# metasync d5
> bash-2.05# metasync d8
> bash-2.05# exit
> exit
> resuming mountall
>
> then the machine boots and mounts all the file systems
> correctly.
> I have tried creating metadbs on separate (unused) slices;
> the number and location of the DBs does not seem to
> make any difference in this behavior.
>
> We had this identical problem with a 280R, but had to reformat
> and reinstall without adequate investigation.
>
> Any ideas what might be wrong? Is this a known issue?
>
> Thanks,
> Anshuman Kanwar
> Unix SysAdmin
> Expertcity Inc.
> --
> (805) 690-5714 [off]    ansh@expertcity.com
> (805) 895-4231 [cel]    5385 Hollister Ave
> (805) 690-6471 [fax]    Goleta, CA. 93111
>
>
> --------prtdiag-------------
>
> e420-1.sjc#prtdiag
> System Configuration:  Sun Microsystems  sun4u Sun Enterprise 420R (4 X UltraSPARC-II 450MHz)
> System clock frequency: 113 MHz
> Memory size: 4096 Megabytes
>
> ========================= CPUs =========================
>
>                     Run   Ecache   CPU    CPU
> Brd  CPU  Module    MHz     MB    Impl.   Mask
> ---  ---  -------  -----  ------  ------  ----
>  0    0      0      450    4.0    US-II   10.0
>  0    1      1      450    4.0    US-II   10.0
>  0    2      2      450    4.0    US-II   10.0
>  0    3      3      450    4.0    US-II   10.0
>
> ========================= IO Cards =========================
>
>      Bus   Freq
> Brd  Type  MHz   Slot      Name                        Model
> ---  ----  ----  --------  --------------------------  --------------------
>  0   PCI    33   On-Board  network-SUNW,hme
>  0   PCI    33   On-Board  scsi-glm/disk (block)       Symbios,53C875
>  0   PCI    33   On-Board  scsi-glm/disk (block)       Symbios,53C875
>  0   PCI    33   PCI 2     SUNW,hme-pci108e,1001       SUNW,qsi-cheerio
>  0   PCI    33   PCI 1 66  scsi-pci1077,2100.1077.1.4
>
> No failures found in System
> ===========================

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri May 30 13:58:52 2003
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:11 EST
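[Archive note] The manual recovery in the quoted message (run `metasync` once per mirror reporting "Needs maintenance", then exit the shell) can be scripted. Below is a minimal sketch, not Sun's procedure: the parsing is an assumption based only on the metastat layout pasted above, and the `sample` text here is an abbreviated, hypothetical stand-in for piping `metastat` directly on a live system.

```shell
#!/bin/sh
# Abbreviated metastat output, modeled on the session in the message above.
# On a real system you would use:  metastat | awk '...'
sample='d8: Mirror
    Submirror 0: d10
      State: Needs maintenance
d5: Mirror
    Submirror 0: d15
      State: Needs maintenance
d9: RAID
    State: Okay'

# Emit one "metasync <dev>" per top-level mirror that needs maintenance.
# The RAID 5 device (d9) is skipped because only mirrors match the pattern.
cmds=$(printf '%s\n' "$sample" | awk '
    /^d[0-9]+: Mirror/ { dev = $1; sub(/:/, "", dev) }
    /Needs maintenance/ && dev != "" { print "metasync " dev; dev = "" }
')
echo "$cmds"
```

The printed commands can then be reviewed and run by hand, exactly as in the session above, rather than executed blindly.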