Summary: replacing failed drive on J4200

From: Chris Hoogendyk <hoogendyk_at_bio.umass.edu>
Date: Fri May 03 2013 - 13:10:58 EDT
Many thanks to Anthony D'Atri, Chris Gibson and Sal Serafino.

I guess I should have remembered this from a few years back, but it's hard to keep track of
everything along with all the new stuff.

Anyway,

# devfsadm -Cv

made the new drive show up in format. Then

# zpool replace jpool c5t5000C50004B49B27d0 c5t5000C500172B3293d0

got things rolling with the zpool. I presume that when it has finished resilvering,
everything will be cool. The system has been working fine all along; it's just that this maintenance
work is going on behind the scenes.
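
For anyone repeating this, the sequence above can be sketched as a small script. This is only an illustration under stated assumptions: the run helper, the replace_drive function and the DRY_RUN variable are hypothetical names, not part of Solaris, and with the default DRY_RUN=1 the script only prints the commands rather than executing them. The pool and device names are the ones from this message.

```shell
#!/bin/sh
# Sketch of the replacement sequence described above.
# Pool and device names are taken from this message; substitute your own.
# run(), replace_drive() and DRY_RUN are hypothetical helpers, not Solaris tools.
POOL=jpool
OLD=c5t5000C50004B49B27d0   # failed drive
NEW=c5t5000C500172B3293d0   # replacement drive

# With DRY_RUN=1 (the default) only print each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_drive() {
    run devfsadm -Cv                         # remove stale /dev links, create links for the new drive
    run zpool replace "$POOL" "$OLD" "$NEW"  # start resilvering onto the new drive
    run zpool status "$POOL"                 # check resilver progress
}

replace_drive
```

Running zpool status again later shows whether the resilver has completed. As far as I understand the ZFS documentation, the hot spare should go back to being available once the replacement has resilvered, but that is worth confirming in the status output rather than assuming.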


Thank you,

(original message below)


-- 
---------------

Chris Hoogendyk

-
    O__  ---- Systems Administrator
   c/ /'_ --- Biology & Geology Departments
  (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogendyk@bio.umass.edu>

---------------

-------- Original Message --------
Subject: 	replacing failed drive on J4200
Date: 	Fri, 03 May 2013 12:27:49 -0400
From: 	Chris Hoogendyk <hoogendyk@bio.umass.edu>
To: 	Sun Managers List <sunmanagers@sunmanagers.org>



Hoping the pool of smart folks on sunmanagers still has a sufficiently high level of redundancy. ;-)

I have a J4200 on a T5220 running "Solaris 10 5/09 s10s_u7wos_08 SPARC". It is SAS multipathed and
configured as a single zpool with raidz2 with a hot spare. So, the equivalent of 9 data drives, 2
parity, 1 spare. The drives show up with their WWN (which is physically printed on the drive case)
as the "t" in cxtxdx, so, e.g. c5t5000C5000F5D37E3d0.
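
To illustrate the naming scheme, here is a minimal sketch that pulls the WWN back out of such a device name so it can be compared with the label printed on the drive. The function name wwn_of is made up for this example, and the pattern assumes the WWN appears in uppercase hex, as in the names quoted in this message.

```shell
#!/bin/sh
# Extract the WWN (the "t" field) from a cXtWWNdN device name.
# wwn_of is a hypothetical helper name; the pattern assumes uppercase hex WWNs.
wwn_of() {
    echo "$1" | sed -n 's/^c[0-9][0-9]*t\([0-9A-F][0-9A-F]*\)d[0-9][0-9]*$/\1/p'
}

wwn_of c5t5000C5000F5D37E3d0   # prints 5000C5000F5D37E3
```

Names that do not fit the pattern, such as a slice name ending in s0, produce no output.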

I had a drive fail. ZFS automatically switched in the spare and resilvered. So, I still have raidz2,
but no longer a spare.

I got a replacement drive. Of course, the WWN is different, so I expected it would come up with a
different ID in the sense of cxtxdx. Thus, I expected to have to do more than just a zpool replace.
However, it's a bit more gnarly than that.

It turns out that when I removed the failed drive and put in the new drive, it never showed up in
format. I'm guessing I need to do some sort of cfgadm thing to remove it and then replace it. I
can't do anything with CAM, because I don't have the version that supports the J4200. When I tried
to install it, I could not, because it complained that the java I had was vulnerable and needed to
be replaced. However, I'm no longer on contract and can't get patches or software upgrades. So the
Oracle guides that say to use CAM don't help me.

I seem to recall replacing a failed drive in a J4500 a while back without having to do
anything (just hot swapped, done). That's also multipathed SAS.

However, in the current case with the J4200, I didn't have an inventory of the WWNs with respect to
position in the J4200 (and the position in format is not related to the position in the J4200), so I
shut it down and removed each drive to get the WWNs and serial numbers. When I found the failed
drive, I replaced it. Then I powered it back up and booted the system. Format only shows the 11
original drives. It doesn't show the 12th (actually in position 6) drive that was replaced.
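
One way to confirm which device name belongs to a newly visible drive is to compare the list of names that format reports before and after a rescan. The following is a sketch only: disk_names and new_disks are hypothetical helper names, the /tmp file names are arbitrary, and the pattern assumes each device name appears as a whitespace-separated token with an uppercase hex WWN.

```shell
#!/bin/sh
# disk_names and new_disks are hypothetical helpers for comparing disk lists.

# Read text on stdin and print every cXtYdZ device name found, one per line, sorted.
disk_names() {
    tr -s ' \t' '\n\n' |
        grep '^c[0-9][0-9]*t[0-9A-F][0-9A-F]*d[0-9][0-9]*$' |
        sort -u
}

# Print the names that appear in the second list but not in the first.
new_disks() {
    comm -13 "$1" "$2"
}

# Intended use on the Solaris host (shown as comments, not executed here):
#   format < /dev/null | disk_names > /tmp/disks.before
#   devfsadm -Cv
#   format < /dev/null | disk_names > /tmp/disks.after
#   new_disks /tmp/disks.before /tmp/disks.after
```

Keeping the saved list alongside a note of which slot each WWN sits in would also provide the inventory that was missing here.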

How do I get this drive to show up in format? Once I've done that, how do I deal cleanly with
replacing it in the zpool?


Thank you,


-- 
---------------

Chris Hoogendyk

-
     O__  ---- Systems Administrator
    c/ /'_ --- Biology & Geology Departments
   (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogendyk@bio.umass.edu>

---------------


Received on Fri May 3 13:11:05 2013
