OK, I am convinced that rw,hard is the only way to go, probably with intr too.
but I suspect things will hang again when a machine goes down again.
I get the distinct feeling NFS mounting is not yet very mature.
Part 1 of 3 of the responses I have received since my first summary posting.
From: IN%"eric@ists.ists.ca" "Eric M. Carroll" 23-APR-1990 11:55
To: YATES@a.chem.upenn.edu
Subj: Re: fstab options for NFS mounts
My experience is that if you have to be impervious to NFS crash freezes, the
only real answer is to do all NFS mounts via amd. amd has the sense to
figure out that the server is dead and start timing out requests with
errors. Sun claims intr does something, but unless you severely limit
the retransmits, you get hung solidly. This is due, I believe, to the
fact that interrupts are checked only at one point in the NFS code, once
per retransmit, and the backoff between retransmissions increases over
time. Limiting the retransmissions is bad news if your network or server
is busy. We are using amd on a machine on the other side of a laser link
that occasionally goes away, and it has been satisfactory.
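To make Eric's point about intr concrete, here is a toy Python model. It is an assumption built only from his description, not from any NFS source: the client notices a pending interrupt only when a retransmission timer expires, each wait is twice the previous one, and no upper cap on the wait is modeled. The function name and the 1-second initial timeout are illustrative.

```python
# Toy model of Eric's description (assumption, not NFS source code):
# interrupts are checked only when a retransmission timer fires, and
# the wait between retransmissions doubles each time (no cap modeled).

def interrupt_noticed_at(press_time, timeo_s):
    """Seconds into a hang at which a ^C typed at `press_time` is first seen."""
    check, wait = timeo_s, timeo_s      # first check after the initial timeout
    while check < press_time:
        wait *= 2                       # backoff doubles after every timeout
        check += wait                   # next point where interrupts are checked
    return check


if __name__ == "__main__":
    for press in (0.5, 20, 32):
        print(press, "->", interrupt_noticed_at(press, 1.0))
```

Under these assumptions, with a 1-second initial timeout a ^C typed 20 seconds into a hang is not seen until 31 seconds, and one typed at 32 seconds waits until 63, which fits the observation that intr seems to do nothing once a request has been retrying for a while.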
-- Eric Carroll
Network Manager
Institute for Space and Terrestrial Science
From: IN%"era@niwot.scd.ucar.EDU" 23-APR-1990 12:15
To: YATES@a.chem.upenn.edu
Subj: Re: fstab options for NFS mounts
>I'd like to hear from experts about what NFS options are recommended
>for machines that must remain impervious to other machines crashing,
>hanging, etc. The ones below have proven not adequate for some reason.
>The retry=3 seemed to cure the things when a machine was down when
>the reboot occurred, but when a machine crashed (Convex), the up machine
>(sun 3/280S) would hang up during new logins, immediately after putting
>out /etc/motd.
We used to have this problem of logins hanging (on both Suns and a Pyramid) when another machine was down. I gather you must be running 3.x, because this was a known bug that seems to have gone away in 4.x.
Prior to 4.x, a number of people on the net recommended curing the problem by changing the order of directories in your root directory, so that directories used as NFS mount points would not be looked at before the login csh found the user's home directory. You can use "ls -f" to see the order; remove, rename, or add directories until you have them in the right order.
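One plausible reading of why directory order matters (an assumption; the posters do not spell it out) is that a classic getwd() climbing toward / reads each parent directory in raw order, which is the order "ls -f" prints, and stat()s entries until it finds the one it is looking for. Any NFS mount point that comes earlier gets stat()ed, and that stat() hangs if its server is down. The sketch below models that scan; the function name and directory listings are hypothetical, and it matches by name where a real getwd() compares inode and device numbers.

```python
# Toy model of a getwd()-style scan of "/" (assumed behavior): entries are
# visited in raw directory order and stat()ed until the target is found.
# Real getwd() matches inode/device numbers; names keep the sketch short.

def nfs_mounts_statted_before(entries, target, nfs_mounts):
    """Return the NFS mount points stat()ed before `target` is reached."""
    touched = []
    for name in entries:
        if name == target:
            return touched
        if name in nfs_mounts:
            touched.append(name)  # a stat() here hangs if the server is down
    raise ValueError(f"{target!r} not found in directory listing")


if __name__ == "__main__":
    before = [".", "..", "u1", "bin", "home", "usr"]  # hypothetical "ls -f /" order
    after = [".", "..", "home", "bin", "u1", "usr"]   # after rearranging /
    print(nfs_mounts_statted_before(before, "home", {"u1"}))
    print(nfs_mounts_statted_before(after, "home", {"u1"}))
```

In this model the first ordering touches /u1 on the way to /home, and the rearranged one touches no NFS mount point at all.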
From: IN%"pearlman%moose@rand.org" "Laura Pearlman" 23-APR-1990 16:02
To: yates@a.chem.upenn.edu
Subj: Re: fstab options for NFS mounts
In article <222@monty.rand.org> you write:
>I'd like to hear from experts about what NFS options are recommended
>for machines that must remain impervious to other machines crashing,
>hanging, etc. The ones below have proven not adequate for some reason.
>The retry=3 seemed to cure the things when a machine was down when
>the reboot occurred, but when a machine crashed (Convex), the up machine
>(sun 3/280S) would hang up during new logins, immediately after putting
>out /etc/motd.
There are several different reasons why a user's login could hang. If a home directory is on the down system, then the chdir() to that home directory (which is done right before the user's shell is exec()ed) will hang. If /usr/spool/mail is mounted from the down system, then the mail check will hang. If your /usr/ucb/quota is the real quota program, then the quota check will hang whenever a mounted filesystem lives on a server that's down. If one of the entries in a user's path is on the down system, then the C shell will hang on its initial command hashing, and the Bourne shell will hang when it hashes before exec()ing anything.
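Laura's checklist boils down to "which mount does each path a login touches live on, and is that mount's server down?" The sketch below, a hypothetical helper rather than any Sun tool, answers that with a longest-prefix match of each path against a table of mount points. It covers only the home directory, the mail spool, and PATH entries; the quota check and getwd() are not modeled. Paths are assumed absolute and normalized, and the host names are made up.

```python
# Hypothetical helper: map each path a login touches to the mount that holds
# it (longest mount-point prefix wins) and report those on a down server.

def mount_for(path, mounts):
    """Return the mount point whose tree contains `path`."""
    best = "/"
    for mp in mounts:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > len(best):
                best = mp
    return best


def login_hazards(home, mail_spool, path_dirs, mounts, down_servers):
    """List (step, path, server) for login steps that touch a down server.
    `mounts` maps mount point -> server name ("" for local disks)."""
    steps = [("chdir to home", home), ("mail check", mail_spool)]
    steps += [("PATH hashing", d) for d in path_dirs]
    hazards = []
    for step, p in steps:
        server = mounts.get(mount_for(p, mounts), "")
        if server in down_servers:
            hazards.append((step, p, server))
    return hazards


if __name__ == "__main__":
    mounts = {"/": "", "/u1": "convex", "/usr/spool/mail": "mailhost"}
    path = ["/usr/ucb", "/bin", "/u1/yates/bin"]
    print(login_hazards("/u1/yates", "/usr/spool/mail", path, mounts, {"convex"}))
```

With the made-up server "convex" down, this flags the chdir to /u1/yates and the hashing of /u1/yates/bin, while the mail check, served from a different host, is left alone.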
But the most insidious reason for a login to hang is that getwd() often hangs when a server is down, especially if you automount filesystems. I have a locally-written getwd that doesn't hang like that, but it's reasonably painful to install and requires source. I've heard rumors that the 4.1 getwd() doesn't hang either.
>Would some combination of retrans=N1,timeo=N2 be helpful?
The "retry" parameter controls how many times a system will retry a mount request; it doesn't affect anything once the filesystem is mounted. Setting "retrans" to a small number (or making "timeo" smaller) may make the hanging problem seem less serious, but also runs the risk of making your filesystems less robust. We hard-mount almost all our filesystems here.
-- Laura Pearlman pearlman@rand.org
From: IN%"sam@telegraph.ICS.UCI.EDU" "Sam Horrocks" 23-APR-1990 19:48
To: YATES@a.chem.upenn.edu
Subj: Re: fstab options for NFS mounts
In fa.sun-managers you write:
>I'd like to hear from experts
I'm no expert, but I play one on TV. :-)
>about what NFS options are recommended
>for machines that must remain impervious to other machines crashing,
>hanging, etc. The ones below have proven not adequate for some reason.
>The retry=3 seemed to cure the things when a machine was down when
>the reboot occurred, but when a machine crashed (Convex), the up machine
>(sun 3/280S) would hang up during new logins, immediately after putting
>out /etc/motd. I hate having to change /etc/fstab on multiple machines
>(removing NFS mounts for down machines) until they are all up again.
>This has required reboots of production machines and simply can't be
>tolerated.
>/etc/fstab:
>machine:/u1 /u1 nfs rw,soft,retry=3 0 0
How about:
machine:/u1 /u1-link nfs rw,soft,bg,noquota,intr 0 0
Where /u1-link is a symbolic link to /u1. This may still cause problems unless you can move /u1 to some other path so that the mount point isn't in the root directory: for example, mount it on /u1/m1 (m1 for "mount one") and put only mount points from the same machine inside /u1. The problem with having NFS mount points in the root directory is that anybody who stats all of the files in / (getwd might do this; "ls -F /" certainly will) will hang until the soft mount times out.
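To find the mount points Sam is warning about, one could scan /etc/fstab for NFS entries whose mount point sits directly in /. The helper below is a hypothetical sketch that assumes the whitespace-separated fstab layout shown in this thread (device, mount point, type, options, freq, pass) and a type field of exactly "nfs".

```python
# Hypothetical fstab scan: report NFS mount points that live directly in /
# (e.g. /u1), not ones nested deeper (e.g. /u1/m1).

def root_level_nfs_mounts(fstab_text):
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        fields = line.split()
        if len(fields) < 3 or fields[2] != "nfs":
            continue                       # only NFS entries matter here
        mount_point = fields[1].rstrip("/")
        # a root-level path contains exactly one "/", the leading one
        if mount_point.startswith("/") and mount_point.count("/") == 1:
            hits.append(mount_point)
    return hits


if __name__ == "__main__":
    fstab = """\
/dev/sd0a / 4.2 rw 1 1
machine:/u1 /u1 nfs rw,soft,retry=3 0 0
machine:/u2 /u1/m1 nfs rw,hard,bg,intr 0 0
"""
    print(root_level_nfs_mounts(fstab))
```

Here only /u1 is reported; the /u1/m1 entry follows Sam's suggested layout and is not flagged.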
>rw is necessary. It seems that /etc/exports options cannot be causing >the problem.
>What is the consensus about using bg mounts? Can they get you into any >trouble under certain circumstances, or not as much as fg mounts?
We use them all the time and they cause no problems for us.
>Would some combination of retrans=N1,timeo=N2 be helpful?
Possibly, if the soft mount timeout is too short or too long. Remember that timeo is doubled on each retransmission (we use timeo=10, retrans=5 on one of our mounts, which works out to about 30 seconds of delay).
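Sam's 30-second figure can be reproduced with a short calculation under a few assumptions: timeo is in tenths of a second, the wait doubles after every timeout, and a soft mount returns an error after `retrans` timeouts. Then timeo=10, retrans=5 gives 1 + 2 + 4 + 8 + 16 = 31 seconds. Implementations may cap a single wait or count transmissions slightly differently, so `cap_s` below is an assumed, optional parameter rather than a SunOS constant, and the result is only an estimate.

```python
# Rough estimate of how long a soft NFS mount retries before failing.
# Assumptions: timeo is in tenths of a second, the wait doubles after each
# timeout, and the call fails after `retrans` timeouts. cap_s optionally
# limits any single wait (an illustrative knob, not a documented constant).

def soft_timeout_seconds(timeo, retrans, cap_s=None):
    total, wait = 0.0, timeo / 10.0
    for _ in range(retrans):
        total += wait if cap_s is None else min(wait, cap_s)
        wait *= 2
    return total


if __name__ == "__main__":
    print(soft_timeout_seconds(10, 5))            # Sam's settings
    print(soft_timeout_seconds(10, 5, cap_s=4))   # same, with a 4 s cap
```

Because the total roughly doubles with each extra retransmission, retrans moves the soft-mount delay far more than timeo does, which is worth keeping in mind when tuning the two together.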
>Thanks for the info. I will summarize the results.
You're welcome. Hope that's helpful.
Sam
From: IN%"dave@mti.mti.com" 23-APR-1990 20:24
To: uunet!a.chem.upenn.edu!YATES@uunet.UU.NET
Subj: Re: fstab options for NFS mounts
>I'd like to hear from experts about what NFS options are recommended
Well, I'm far from being an NFS expert, but I'll tell you what's worked for us...
We have one machine which acts as a server for various NFS clients. Our server, however, mounts a few filesystems from other machines. Through some experimentation, I've found that mounting NFS filesystems on our main server with the options "rw,bg,hard,intr" usually keeps the other machines from hanging the main server.
We need the "rw". The manual (mount(8) et al.) tells us that rw-mounted filesystems should use the "hard" option. The "bg" prevents an attempted mount of an unavailable filesystem from hanging. The "intr" should allow us to interrupt out of pending requests on hard-mounted filesystems, but this doesn't always seem to work as advertised.
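Dave's reasoning can be written down as a small lint for the options field of an fstab line. This is a hypothetical sketch that encodes only the rules stated in this thread (rw should be hard per his reading of mount(8), hard mounts want intr, bg keeps an unavailable server from stalling the mount); it is not an authoritative list. It also assumes hard is the default when neither hard nor soft is given.

```python
# Hypothetical lint encoding the advice in this thread (not authoritative).
# `opts` is the comma-separated options field, e.g. "rw,bg,hard,intr".

def check_nfs_options(opts):
    flags = {item.split("=", 1)[0] for item in opts.split(",") if item}
    hard = "soft" not in flags  # assume hard is the default
    warnings = []
    if "rw" in flags and not hard:
        warnings.append("rw with soft: mount(8) says rw mounts should use hard")
    if hard and "intr" not in flags:
        warnings.append("hard without intr: hung processes cannot be interrupted")
    if "bg" not in flags:
        warnings.append("no bg: mounting an unavailable server can hang")
    return warnings


if __name__ == "__main__":
    for o in ("rw,bg,hard,intr", "rw,soft,retry=3", "rw,soft,bg,noquota,intr"):
        print(o, check_nfs_options(o))
```

Dave's "rw,bg,hard,intr" passes cleanly, while the original "rw,soft,retry=3" and Sam's "rw,soft,bg,noquota,intr" both draw the rw-with-soft warning, which reflects the disagreement in this thread rather than settling it.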
We do have occasional problems when a machine goes down. If the main server has any filesystems mounted from the downed machine, certain processes will hang with "nfs server unreachable ... still trying" messages until the other machine comes back up. (Only certain processes hang; the machine is otherwise OK and continues to serve its clients.) It seems this shouldn't happen, since these processes aren't accessing any of the remotely mounted filesystems directly (and even if they did, they should be interruptible). However, we do mount them directly under /, and one could come up with various explanations for this involving access of files by absolute paths.
You can use this info as yet another data point for your own investigation and/or summary, but it's probably not the best stuff to broadcast to the whole list. It's clear that we could stand to do a little more research into this area. Unfortunately, there isn't always time to figure out how to do administrative tasks the right way.
Good luck. I look forward to your summary. I'm anxious to hear what the experts have to say! ;-)
--dave
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:05:57 CDT