SUMMARY: HA NFS Solution (under VCS) + stale file handles

From: Greg Gallagher <ggallag_at_foc.com> Date: Fri Nov 02 2001 - 13:48:55 EST

Final Solution:

   When implementing failover of an NFS share between two or more
servers, you must ensure that the major/minor numbers match for the
device:

# ls -lL /dev/vx/dsk/homedg/home-v
brw-------   1 root  root  238,27000 Nov  1 17:01 /dev/vx/dsk/homedg/home-v
                            |   |
                            |   |__ (minor)
                            |
                            |__ (major)

So, for example, if you were to import the diskgroup and volume on
another server, the major/minor numbers must be the same there.  If
they aren't, clients will hit the "Stale File Handle" problem.
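Why a different major/minor number breaks clients can be shown with a toy model. This is a sketch under stated assumptions, not how any real NFS server is implemented. It assumes, as I understand Solaris UFS exports to work, that the server embeds a filesystem id derived from the device number in every file handle. Clients keep reusing that handle after failover, so a server whose device number differs no longer recognizes it. `ToyNfsServer`, `FileHandle` and `parse_ls_device` are hypothetical names. Real handles are opaque blobs, and real servers also check the inode and generation number. The second `ls` line, with minor 27001, is made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches a block/char device line from `ls -lL`, e.g.
# "brw-------   1 root  root  238,27000 Nov  1 17:01 /dev/vx/dsk/..."
# and captures major and minor (tolerates "238, 27000" with a space too).
LS_DEVICE_RE = re.compile(r"^[bc]\S*\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s")


def parse_ls_device(line: str) -> tuple[int, int]:
    """Return (major, minor) from one line of `ls -lL` output."""
    match = LS_DEVICE_RE.match(line)
    if match is None:
        raise ValueError(f"not a device line: {line!r}")
    return int(match.group(1)), int(match.group(2))


@dataclass(frozen=True)
class FileHandle:
    """Toy NFS file handle: what the client sends back on every request."""
    fsid: tuple[int, int]  # filesystem id, modeled here as (major, minor)
    inode: int
    generation: int


class ToyNfsServer:
    """Hypothetical server exporting one filesystem on one device."""

    def __init__(self, name: str, ls_line: str) -> None:
        self.name = name
        self.fsid = parse_ls_device(ls_line)

    def issue_handle(self, inode: int, generation: int = 1) -> FileHandle:
        return FileHandle(self.fsid, inode, generation)

    def lookup(self, handle: FileHandle) -> str:
        # Only the fsid is checked in this toy; a mismatch means "stale".
        return "OK" if handle.fsid == self.fsid else "ESTALE"


same = "brw-------   1 root  root  238,27000 Nov  1 17:01 /dev/vx/dsk/homedg/home-v"
# Hypothetical second node where the volume came up with a different minor.
differs = "brw-------   1 root  root  238,27001 Nov  1 17:01 /dev/vx/dsk/homedg/home-v"

handle = ToyNfsServer("node-a", same).issue_handle(inode=1234)
print(ToyNfsServer("node-b", same).lookup(handle))     # OK
print(ToyNfsServer("node-b", differs).lookup(handle))  # ESTALE
```

Running `ls -lL` on the device on each node and comparing the parsed pairs is a cheap pre-failover check.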

As a side note, I noticed this when I ran mount looking at the NFS
share:

# mount | grep home
/home/ggallag on /export/home/ggallag read/write/setuid/dev=3b86978

And I wondered if the dev=3b86978 is some sort of representation of
that major/minor number, which would explain why the stale file handle
occurs when the share is served from a server with a different one.
But that's just a side observation.
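The hunch can be checked by hand. Here is a minimal sketch that assumes the dev= value is Solaris's 32-bit compressed device number. In that layout, from `<sys/mkdev.h>`, the major number sits in the high 14 bits and the minor number in the low 18 bits. Under that assumption, 0x3b86978 decodes to exactly 238,27000, the same numbers as in the `ls -lL` output above. That suggests the guess is right. The function names are hypothetical.

```python
from __future__ import annotations

# Assumption: Solaris 32-bit compressed dev_t layout,
# major in the high 14 bits, minor in the low 18 bits.
NBITSMINOR32 = 18
MAXMIN32 = (1 << NBITSMINOR32) - 1  # 0x3ffff


def decode_dev32(hex_dev: str) -> tuple[int, int]:
    """Split a hex dev= value from `mount` output into (major, minor)."""
    dev = int(hex_dev, 16)
    return dev >> NBITSMINOR32, dev & MAXMIN32


def encode_dev32(major: int, minor: int) -> str:
    """Inverse of decode_dev32: pack (major, minor) into a hex string."""
    return format((major << NBITSMINOR32) | minor, "x")


print(decode_dev32("3b86978"))   # (238, 27000)
print(encode_dev32(238, 27000))  # 3b86978
```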

VCS includes instructions for changing the major/minor numbers so they
match across servers.  In fact, VCS comes with the command 'haremajor',
which changes "/etc/name_to_major".  If the device is a regular block
device (not a volume), the "/etc/path_to_inst" file must also be
changed, followed by a reconfiguration reboot ("reboot -- -rv").
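haremajor does the actual editing. Before running it, you can spot a mismatch by comparing each node's /etc/name_to_major, which lists one "driver major" pair per line. The sketch below assumes you have already copied each node's file locally. It only reports differences and changes nothing. Some details are assumptions rather than facts from the thread: that VxVM volume nodes under /dev/vx/dsk belong to the vxio driver, the major numbers in the sample data, and the helper names.

```python
from __future__ import annotations


def parse_name_to_major(text: str) -> dict[str, int]:
    """Parse /etc/name_to_major content ("driver major" per line)."""
    table: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # skip blank/comment text defensively
        if not line:
            continue
        name, number = line.split()[:2]
        table[name] = int(number)
    return table


def mismatched_drivers(node_a: dict[str, int], node_b: dict[str, int],
                       drivers: list[str]) -> dict[str, tuple[int | None, int | None]]:
    """Return {driver: (major_on_a, major_on_b)} for drivers that differ."""
    return {d: (node_a.get(d), node_b.get(d))
            for d in drivers if node_a.get(d) != node_b.get(d)}


# Hypothetical copies of /etc/name_to_major from two cluster nodes.
node_a = parse_name_to_major("vxio 238\nvxspec 239\n")
node_b = parse_name_to_major("vxio 240\nvxspec 239\n")

print(mismatched_drivers(node_a, node_b, ["vxio", "vxspec"]))
# {'vxio': (238, 240)}
```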

After I read the instructions (Doh!), which Gary Losito and Christopher
Ciborowski kindly pointed out, the failover was completely transparent
to the end users.

On another note: although several people pointed out that they are
happily using NetApps, I'm not sure that NetApps will really help for
HA and Disaster Recovery, although for a cheap, centralized NFS
solution I can see it.

With the VCS cluster I have in place, the bundled NFS agent, and EMC
Symmetrix with SRDF, I am able to fail our home directories over from
one site to another several miles away without users really noticing
(there is a small bump where things stop working, then resume with an
"NFS server ok" message).

Thanks to the following people for responding with the correct answer:

    Gary Losito
    Christopher Ciborowski

As well as:

    Dylan Northrup
    Thomas Anders
    Jacob Charly
    gagan.narang@ps.ge.com
    Mike DeMarco

For the dialogue and suggestions!!  (P.S.  The MAC address doesn't
have to match across servers; you can fail over with a virtual
interface.)

Thanks Sun Managers!!

---
Greg Gallagher
Sr. UNIX Systems Administrator
First Options Chicago .... (312) 362-3643

Original question:
> Basically I'm sharing off /export/home
> as /home to each of the servers (mounted with the automounter).
> 
>    VCS comes with a bundled NFS server and share agent, which works
> great: I can fail the disk, mount and share to any server and ensure
> that an NFS daemon will start up if it isn't there already.
> 
>    The problem is that when the disk fails over to another server, all
> of the clients/users logged in run into a "NFS: stale file handle"
> problem.  If I restart the nfs client it will connect to the new
> server (through the floating IP which follows the disk), then things
> seem to be ok.
> 
>     I was wondering: has anyone used VCS + NFS agent to provide some
> sort of a HA NFS share solution?  If not, how do people handle failing
> over NFS servers??  Is there a way to pass the locking info to another
> server?  My experience with NFS is minimal, but I'd love to know how
> people deal with this.
> 
>     Thanks in advance.  I will definitely summarize.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers