SUMMARY: Re: SVM + Cluster pain

From: Erek Adams <erek_at_theadamsfamily.net>
Date: Thu Jan 05 2006 - 15:33:11 EST

Thanks for the replies; they were appreciated.  One thing that I didn't
manage to get across was that the server(s) would "hang" with this error
message on the console, and you couldn't continue past that point--even
on NodeB.

It seems that it's a known but rather obscure bug with Solaris 9.

Anyway, it's resolved:

1)  Boot both nodes (A and B) into non-cluster mode.
2)  Comment out the shared md devices in both nodes' vfstabs.
3)  Reboot NodeA; once it's up, reboot NodeB
4)  On NodeB:
	metaset -s <setname> -P -f
	(the blank set should be purged...)
	metaset
	(should be blank)
5)  On NodeA:
	metaset -s <setname> -P -f
	(the blank set should be purged...)
	metaset
	(should be blank)
	(recreate the metaset from scratch)
	metaset -s <setname> -a -h NodeA NodeB
	metaset -s <setname> -a <diskpath0> <diskpath1> ... <diskpathN>
	metaset -s <setname> -a -m NodeA NodeB
	metaset
	(should show new set and ownership)
6)  On NodeB:
	metaset -s <setname>
	(you should see the metaset you just created)
7)  On both nodes (after uncommenting the vfstab entries from step 2):
	mount -a
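
For step 2, here is a minimal sketch of commenting out the shared md
devices.  It assumes the shared entries start with /dev/md/<setname>/ at
the beginning of the line, and that the set name contains no characters
special to sed.  The function name comment_out_set is hypothetical, and
"ingdg" is just the set name from the error messages in the original post.
It only writes the edited file to stdout, so you can review it before
replacing anything.

```shell
#!/bin/sh
# Sketch only: print a copy of a vfstab with the diskset's entries commented out.
# Usage: comment_out_set <setname> <vfstab-file>
# Example: comment_out_set ingdg /etc/vfstab > /tmp/vfstab.new
comment_out_set() {
    set_name=$1
    vfstab=$2
    # Assumption: shared SVM entries begin with /dev/md/<setname>/ in column 1.
    # Lines that are already commented out do not match and are left untouched.
    sed "s|^/dev/md/${set_name}/|#&|" "$vfstab"
}
```

Diff the output against the original vfstab before copying it into place,
and keep a backup of the original so step 7 is easy to undo.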

In theory, you should be back up and happy with all data intact.  But of
course, YMMV!
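
Since a typo in the purge/recreate sequence is not something you want to
discover halfway through, one option is to print the NodeA commands from
step 5 for review before typing them.  This is only a sketch: the function
name print_rebuild is hypothetical, it runs nothing, and the set name,
host names and disk paths are whatever you pass in.

```shell
#!/bin/sh
# Sketch only: print (do NOT run) the step 5 command sequence for NodeA.
# Usage: print_rebuild <setname> "<host list>" "<disk path list>"
# Example: print_rebuild ingdg "NodeA NodeB" "<diskpath0> <diskpath1>"
print_rebuild() {
    set_name=$1
    hosts=$2
    disks=$3
    echo "metaset -s $set_name -P -f"          # purge the stale set
    echo "metaset -s $set_name -a -h $hosts"   # recreate the set with both hosts
    echo "metaset -s $set_name -a $disks"      # add the shared disks
    echo "metaset -s $set_name -a -m $hosts"   # add the mediator hosts
    echo "metaset"                             # confirm the set and its ownership
}
```

Check the printed lines against your own notes, then run them by hand on
NodeA one at a time.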

Cheers!

[Original post below]

On Wed, 4 Jan 2006, Erek Adams wrote:

> NodeA-v890
> NodeB-v890
> Shared-3510 AC
> SunCluster 3.1
>
> We lost power to the cluster--Don't ask, too ugly to tell...  Before then
> everything was fine.  Now, I get this weird problem.  Bring up array,
> comes up fine.  Bring up node1, boots, and starts to grab the array.
> Pauses a while then starts giving the following error:
>
>   Jan  4 16:22:41 node1  Cluster.Framework: stderr: metaset:
>   node1: ingdg: not owner of metadevice database
>   Jan  4 16:22:41 node1  Cluster.Framework: stderr: metaset:
>   node1: ingdg: must be owner of the set for this command
>
> Over and over....
>
> I've tried: pulling the heartbeats and booting only node1.  It just
> flips the above messages over and over.  If I try to boot node2, node2
> hangs on boot waiting on node1.  I've killed the heartbeat between the two
> boxes, with no luck.  I can boot node1 in non-cluster mode and I get the
> same error.
>
> From what I've found, it seems that the suggested fix is purging the
> metadb and then recreating it.  I'm hoping that's not the fix...  It just
> sounds unpleasant.
>
> Thoughts, ideas, suggestions?
>
> -----
> Erek Adams
> Nifty-Type-Guy
> TheAdamsFamily.Net
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
>

-----
Erek Adams
Nifty-Type-Guy
TheAdamsFamily.Net