Hi Managers,

I was not able to get an answer from this list, so I opened a case with Sun.
It seems they have at least three internal bugs open regarding Solaris Volume
Manager in RAID 5 configurations. One of their suggestions actually fixed the
problem:

# mv /usr/lib/drv/preen_md.so.1 /usr/lib/drv/preen_md.so.1.old
# reboot

I had rebuilt the machine earlier just to check whether the error was
reproducible. It was. Strangely enough, the bug appeared only after I created
the RAID 5 volume. Simply mirroring the boot disk (and rebooting) worked fine.
It seems to be a combination of fsck and the volume manager that caused the
"wait: No child processes" error. Though it is resolved, I still do not
understand why.

Thanks,
Anshuman Kanwar
Unix SysAdmin
Expertcity Inc.
--
(805) 690-5714 [off]    ansh@expertcity.com
(805) 895-4231 [cel]    5385 Hollister Ave
(805) 690-6471 [fax]    Goleta, CA. 93111

> -----Original Message-----
> From: Anshuman Kanwar
> Sent: Monday, May 19, 2003 4:39 PM
> To: 'sunmanagers@sunmanagers.org'
> Subject: Solaris volume manager -- weird metasync
>
> Hi Managers,
>
> I set up disk mirroring on a 420R. It has 2 internal drives
> (c0t0 and c0t1) and is connected to 11 drives in an A5200
> (c1t0 through c1t10).
>
> The internal disks are mirrored, and the external disks are
> configured as a RAID 5 volume with one of the disks as a standby.
>
> Everything seems to work correctly until I boot, at which point
> this happens:
>
> Rebooting with command: boot
> Boot device: disk  File and args:
> SunOS Release 5.9 Version Generic_112233-04 64-bit
> Copyright 1983-2002 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> WARNING: forceload of misc/md_trans failed
> WARNING: forceload of misc/md_sp failed
> configuring IPv4 interfaces: hme0.
> Hostname: e420-1.sjc
> The system is coming up.  Please wait.
> checking ufs filesystems
> /dev/md/rdsk/d5: is clean.
> wait: No child processes
>
> WARNING - Unable to repair one or more filesystems.
> Run fsck manually (fsck filesystem...).
> Exit the shell when done to continue the boot process.
>
> Type control-d to proceed with normal startup,
> (or give root password for system maintenance):
> single-user privilege assigned to /dev/console.
> Entering System Maintenance Mode
>
> May 19 15:01:42 su: 'su root' succeeded for root on /dev/console
>
> e420-1.sjc#metastat
> d8: Mirror
>     Submirror 0: d10
>       State: Needs maintenance
>     Submirror 1: d20
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 4096602 blocks
>
> d10: Submirror of d8
>     State: Needs maintenance
>     Invoke: metasync d8
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s0         0       No     Okay   Yes
>
> d20: Submirror of d8
>     State: Needs maintenance
>     Invoke: metasync d8
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s0         0       No     Okay   Yes
>
> d4: Mirror
>     Submirror 0: d14
>       State: Needs maintenance
>     Submirror 1: d24
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 4096602 blocks
>
> d14: Submirror of d4
>     State: Needs maintenance
>     Invoke: metasync d4
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s4         0       No     Okay   Yes
>
> d24: Submirror of d4
>     State: Needs maintenance
>     Invoke: metasync d4
>     Size: 4096602 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s4         0       No     Okay   Yes
>
> d1: Mirror
>     Submirror 0: d11
>       State: Needs maintenance
>     Submirror 1: d21
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 8193204 blocks
>
> d11: Submirror of d1
>     State: Needs maintenance
>     Invoke: metasync d1
>     Size: 8193204 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s1         0       No     Okay   Yes
>
> d21: Submirror of d1
>     State: Needs maintenance
>     Invoke: metasync d1
>     Size: 8193204 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s1         0       No     Okay   Yes
>
> d5: Mirror
>     Submirror 0: d15
>       State: Needs maintenance
>     Submirror 1: d25
>       State: Needs maintenance
>     Pass: 1
>     Read option: roundrobin (default)
>     Write option: parallel (default)
>     Size: 54330534 blocks
>
> d15: Submirror of d5
>     State: Needs maintenance
>     Invoke: metasync d5
>     Size: 54330534 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t0d0s5         0       No     Okay   Yes
>
> d25: Submirror of d5
>     State: Needs maintenance
>     Invoke: metasync d5
>     Size: 54330534 blocks
>     Stripe 0:
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c0t1d0s5         0       No     Okay   Yes
>
> d9: RAID
>     State: Okay
>     Hot spare pool: hsp001
>     Interlace: 32 blocks
>     Size: 315892480 blocks
>     Original device:
>         Size: 315893952 blocks
>         Device      Start Block  Dbase  State  Reloc  Hot Spare
>         c1t0d0s0       5042      No     Okay   Yes
>         c1t1d0s0       5042      No     Okay   Yes
>         c1t2d0s0       5042      No     Okay   Yes
>         c1t3d0s0       5042      No     Okay   Yes
>         c1t4d0s0       5042      No     Okay   Yes
>         c1t5d0s0       5042      No     Okay   Yes
>         c1t6d0s0       5042      No     Okay   Yes
>         c1t7d0s0       5042      No     Okay   Yes
>         c1t8d0s0       5042      No     Okay   Yes
>         c1t9d0s0       5042      No     Okay   Yes
>
> hsp001: 1 hot spare
>         Device      Status      Length           Reloc
>         c1t10d0s0   Available   35104400 blocks  Yes
>
> Device Relocation Information:
> Device   Reloc  Device ID
> c0t1d0   Yes    id1,sd@SFUJITSU_MAJ3364M_SUN36G_01M41510____
> c0t0d0   Yes    id1,sd@SSEAGATE_ST336704LSUN36G_3CD1PPV2000071306LU5
> c1t10d0  Yes    id1,ssd@w20000020375b0eac
>
> e420-1.sjc#metadb
>         flags           first blk       block count
>      a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
>      a    p  luo        16              8192            /dev/dsk/c0t1d0s7
>      a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
>
> If I do this:
>
> bash-2.05# metasync d1
> bash-2.05# metasync d4
> bash-2.05# metasync d5
> bash-2.05# metasync d8
> bash-2.05# exit
> exit
> resuming mountall
>
> then the machine boots and mounts all the file systems
> correctly.
> I have tried creating metadbs on separate (unused) slices;
> the number and location of the DBs does not seem to
> make any difference in this behavior.
>
> We had this identical problem with a 280R, but had to reformat
> and reinstall without adequate investigation.
>
> Any ideas what might be wrong? Is this a known issue?
>
> Thanks,
> Anshuman Kanwar
> Unix SysAdmin
> Expertcity Inc.
> --
> (805) 690-5714 [off]    ansh@expertcity.com
> (805) 895-4231 [cel]    5385 Hollister Ave
> (805) 690-6471 [fax]    Goleta, CA. 93111
>
>
> --------prtdiag-------------
>
> e420-1.sjc#prtdiag
> System Configuration:  Sun Microsystems  sun4u Sun Enterprise 420R (4 X UltraSPARC-II 450MHz)
> System clock frequency: 113 MHz
> Memory size: 4096 Megabytes
>
> ========================= CPUs =========================
>
>                     Run   Ecache   CPU    CPU
> Brd  CPU  Module    MHz     MB    Impl.   Mask
> ---  ---  -------  -----  ------  ------  ----
>  0    0      0      450    4.0    US-II   10.0
>  0    1      1      450    4.0    US-II   10.0
>  0    2      2      450    4.0    US-II   10.0
>  0    3      3      450    4.0    US-II   10.0
>
> ========================= IO Cards =========================
>
>      Bus   Freq
> Brd  Type  MHz   Slot      Name                        Model
> ---  ----  ----  --------  --------------------------  --------------------
>  0   PCI    33   On-Board  network-SUNW,hme
>  0   PCI    33   On-Board  scsi-glm/disk (block)       Symbios,53C875
>  0   PCI    33   On-Board  scsi-glm/disk (block)       Symbios,53C875
>  0   PCI    33   PCI 2     SUNW,hme-pci108e,1001       SUNW,qsi-cheerio
>  0   PCI    33   PCI 1 66  scsi-pci1077,2100.1077.1.4
>
> No failures found in System
> ===========================

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri May 30 13:58:52 2003
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:11 EST
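[Archive note] The manual recovery in the quoted message (run `metasync` once per mirror reporting "Needs maintenance", then exit the shell) can be scripted. Below is a minimal sketch, not Sun's procedure: the parsing is an assumption based only on the metastat layout pasted above, and the `sample` text here is an abbreviated, hypothetical stand-in for piping `metastat` directly on a live system.

```shell
#!/bin/sh
# Abbreviated metastat output, modeled on the session in the message above.
# On a real system you would use:  metastat | awk '...'
sample='d8: Mirror
    Submirror 0: d10
      State: Needs maintenance
d5: Mirror
    Submirror 0: d15
      State: Needs maintenance
d9: RAID
    State: Okay'

# Emit one "metasync <dev>" per top-level mirror that needs maintenance.
# The RAID 5 device (d9) is skipped because only mirrors match the pattern.
cmds=$(printf '%s\n' "$sample" | awk '
    /^d[0-9]+: Mirror/ { dev = $1; sub(/:/, "", dev) }
    /Needs maintenance/ && dev != "" { print "metasync " dev; dev = "" }
')
echo "$cmds"
```

The printed commands can then be reviewed and run by hand, exactly as in the session above, rather than executed blindly.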