SUMMARY: login wait during machine failure

From: Susan Thielen (thielen@irus.rri.uwo.ca)
Date: Sat Feb 13 1993 - 01:53:15 CST


I guess the problem I had had already been addressed, but thanks
for the many helpful hints anyway. The inquiry title says it all.
Phil Antoine gave me a summary from July 12, 1992 that covered
things nicely.
If you are NOT using quotas at all, you may

mv /usr/ucb/quota /usr/ucb/quota.orig
ln -s /usr/bin/true /usr/ucb/quota

And this is what I did. I'd like to send thanks to

"David S. Comay" <dsc@seismo.CSS.GOV>
"Susan Thielen" <thielen@irus.rri.uwo.ca>
Barry Margolin <barmar@Think.COM>
Cameron Humphries <cameron@cs.adelaide.edu.au>
Christopher Davis <ckd@eff.org>
Claus Assmann <ca@mine.informatik.uni-kiel.dbp.de>
Dave Mitchell <D.Mitchell@dcs.sheffield.ac.uk>
David Fetrow <fetrow@biostat.washington.edu>
Eckhard.Rueggeberg@ts.go.dlr.de
G.ROBERTSON@aberdeen.ac.uk
Mr T Crummey (DIJ) <tom@sees.bangor.ac.uk>
Robert Haddick <rhaddick@us.oracle.com>
Stanier A M <alan@essex.ac.uk>
adiron!rmdctro@uunet.UU.NET (Tom Olin)
antoine@RadOnc.Duke.EDU
bit!jayl@Sun.COM (Jay Lessert)
cc_gucky@rcsun1.rcvie.co.at (Gerhard Holzer)
daryl@cs.athabascau.ca (Daryl Campbell)
gdmr@dcs.ed.ac.uk
guy@auspex.com (Guy Harris)
johnb@edge.cis.mcmaster.ca (John Benjamins)
jumper@spf.trw.com (Greg Jumper)
ken@visix.com (Ken Mayer)
matthew@cs.adelaide.edu.au (Matthew Donaldson)
mgh@bihobl2.bih.harvard.edu (Michael G. Harrington)
pwright@spiff.msfc.nasa.gov (Patrick D. Wright)
wallen@cogsci.UCSD.EDU (Mark R. Wallen)
zdv123@zam092.zam.kfa-juelich.de (V.Sander)

And here are their responses

From: Claus Assmann <ca@mine.informatik.uni-kiel.dbp.de>
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure

> So the question after all this is... what can I do about this??

You might disable the quota check on login. If you don't have quotas
set up, just link quota -> /usr/bin/true.
This may solve your problem.

From: Stanier A M <alan@essex.ac.uk>
Date: Wed, 27 Jan 93 12:53:00 GMT
To: thielen@irus.rri.uwo.ca
Subject: RE login wait during machine failure

We have the same problem here, and it's taken us 6 months to get a
handle on it.

What we believe is happening is that login tries to check the user's
quota on all mounted disks - and when it tries to check disks
physically mounted on a crashed machine it has to wait a couple of
minutes for a timeout.

Temporarily disabling quotas seems to clear the problem.

From: Mr T Crummey (DIJ) <tom@sees.bangor.ac.uk>
Subject: Re: login wait during machine failure
To: thielen@irus.rri.uwo.ca
Date: Wed, 27 Jan 93 13:04:07 GMT

Hello Susan,

One answer could be to use the automounter to mount the software partitions.
Another could be to link /usr/ucb/quota to /bin/true (if you don't use quotas).
On login, quota checks all mounted filesystems for user's files and so when it
references the filesystem provided by the dead machine it hangs until the
reference times out.

The second option will not help if the user has the dead filesystem on his path
or if the user subsequently references the filesystem after logging in. I
suppose I would recommend an automounter. If you are fairly familiar with
Sun's automount, you may find that amd (a public domain automounter)
offers a better solution.

From: gdmr@dcs.ed.ac.uk
Date: Wed, 27 Jan 93 13:05:45 GMT
To: Susan Thielen <thielen@irus.rri.uwo.ca>
Subject: Re: login wait during machine failure

> and all seems well until someone wants to log into a machine.
> The login process works fine except that it takes up to 10 minutes
> for the prompt to come up.

It's probably stalling waiting for a quota check to complete.

> So the question after all this is... what can I do about this??

If you don't run quotas, replace /usr/ucb/quota with a symlink to /bin/true.
Or (better) run an automounter.

From: Eckhard.Rueggeberg@ts.go.dlr.de
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure

You could automount them. This should prevent the OS from mounting
the disks except when they are accessed, which should not be the case
during a normal login.

From: zdv123@zam092.zam.kfa-juelich.de (V.Sander)
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure

Try the noquota option and be sure that you have not mounted directly in
the root directory (for example mount host:/dir /nfs)

From: wallen@cogsci.UCSD.EDU (Mark R. Wallen)
Date: Wed, 27 Jan 1993 08:32:23 -0800
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure

The problem is login calls quota which then tries
to check for quotas on the filesystems that are
exported by the down machine. Make sure that you
have noquota set as an option in fstab for all
file systems that you don't run quotas on. If
you don't use quotas at all you can

mv /usr/ucb/quota /usr/ucb/quota-; ln /bin/true /usr/ucb/quota

From: "David S. Comay" <dsc@seismo.CSS.GOV>

susan,

i think you are being bitten by the `slow to rebind' bug in ypbind. there is
a patch for it available: 100342-02. below is the associated README.
some machines that seemed to have it archived include iros1.iro.umontreal.ca,
iskut.ucs.ubc.ca & sifon.cc.mcgill.ca.

dsc

Patch-ID# 100342-02
Keywords: NIS client server rebind
Synopsis: SunOS 4.1;4.1.1;4.1.2: NIS client needs long recovery time if server reboots
Date: 22/Feb/92

SunOS release: 4.1, 4.1.1, 4.1.2
 
Unbundled Product:
 
Unbundled Release:
 
Topic: NIS ypbind patch
 
BugID'd fixed for this patch: 1046416

Architectures for which this patch is available: sun3, sun4

Patches which may conflict with this patch:

Obsoleted by:

Problem Description:

  Bug 1046416:

  If you bring a ypserver down into single user and then boot it into
  multi user by either typing control D or reboot, the yp clients
  will take a long time to rebind to the server.

********************* WARNING ******************************

  This is a new version of ypbind that never uses the NIS
  binding file to cache the servers binding. This will have
  the effect of fixing the current symptom. However, it might
  degrade the overall performance of the system when the
  server is unavailable. This is most likely to happen on an
  overloaded server, which will cause the network to produce
  a broadcast storm.

*************************************************************

INSTALL::

As root and for the correct architecture directory.

Kill the currently running ypbind:

ps aux|grep ypbind
kill <processid of ypbind>

Make a backup copy of ypbind:

mv /usr/etc/ypbind /usr/etc/ypbind.FCS

Install the new version of ypbind:

cp `arch`/ypbind /usr/etc

chown root /usr/etc/ypbind
chmod 755 /usr/etc/ypbind

Restart ypbind

/usr/etc/ypbind

From: daryl@cs.athabascau.ca (Daryl Campbell)
To: thielen@irus.rri.uwo.ca
Subject: login wait during machine failure

We experience similar problems; what we thought we'd do is install the AMD automounter.
AMD has some robustness in terms of recovering from NFS servers being down.
Check out Unix World Jan issue for a tutorial. ftp site 'usc.edu', directory '/pub/amd'

I'm still reading the AMD doc so I can't tell you if this approach will work as
prescribed.

Please do a summary with your answers.

From: bit!jayl@Sun.COM (Jay Lessert)
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure
Our setup is much like yours. There are three things I can think of
that will cause symptoms similar to this (most likely first):

1) Disk quotas are on. This is most likely because you say that the
   slowdown happens *only* with login. Fix by either using the noquota
   option on *all* UFS and NFS mounts, or by making /usr/ucb/quota a
   link to /bin/true on all machines.

2) The down filesystem is in the users' $PATH. Not much you can do
   about this except think carefully about what is in your path. Not
   likely, since this should cause a hang every time any shell script
   is fired, not just the login shell.

3) Mount points for different hosts are in a common directory, causing
   getwd(3) (pwd(1), for instance) to hang when one of the hosts is
   down. Df(1) would also always hang. Avoid by always maintaining a
   separate mount point directory for each host, i.e., /hostname1/u0,
   /hostname2/u0, etc.
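
The per-host layout in 3) can be derived mechanically from the NFS
"host:/path" spec. The following is only a sketch; the function name
mount_point_for is made up for illustration and is not part of any Sun
tool.

```shell
# Sketch: derive a per-host mount point from an NFS "host:/path" spec,
# e.g. hostname1:/u0 -> /hostname1/u0.
# mount_point_for is a hypothetical helper name.
mount_point_for() {
    spec=$1
    host=${spec%%:*}     # text before the first colon
    path=${spec#*:}      # text after the first colon
    printf '/%s%s\n' "$host" "$path"
}
```

With this naming, a dead hostname1 only affects lookups under /hostname1
rather than every mount point in a shared parent directory.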

You also might consider using automounter or amd. This might work well
assuming 2) above is not true, and assuming you can stand /tmp_mnt
showing up when you type pwd (we can't, so we don't...).

From: antoine@RadOnc.Duke.EDU

Here's a summary that will cure your problem IF you don't use quotas. Be sure
to do it on ALL the hosts that mount remote FS's.

Good Luck,
Phil Antoine (antoine@RadOnc.Duke.EDU)
Duke University Medical Center
Radiation Oncology Physics

------- Forwarded Message

Date: Fri, 17 Jul 1992 10:05:18 -0400
From: ken@visix.com (Ken Mayer)
Followup-To: ken@visix.com (Ken Mayer)
Message-Id: <9207171405.AA15095@elan.visix.com>
To: sun-managers@eecs.nwu.edu
Subject: SUMMARY: Login inexplicably hangs on /usr/ucb/quota
Reply-To: ken@visix.com
Summary: ln -s /bin/true quota

Thanks for all the responses! The simple answer is:

        cd /usr/ucb
        mv quota quota.orig
        ln -s /bin/true quota

Other suggestions:

        Use noquota in /etc/fstab
        Disable rquotad in inetd.conf
        Use .hushlogin in $HOME

Interesting info:
- ----------
Date: Thu, 16 Jul 92 16:15:39 CDT
From: pwright@spiff.msfc.nasa.gov (Patrick D. Wright)
To: ken@visix.com
Subject: Re: Login inexplicably hangs on /usr/ucb/quota

Ken,

I had your problem once and my solution is the same as yours. As I
guess you figured out, login runs the quota program. When a NFS
mounted file system goes down it still shows up as mounted so quota
waits for a very long time for a reply ( ~5 min. on our machines). If
you had rebooted the server you would not have had the problem because
the NFS file systems (being down) would not have been mounted and
quota would not check them. My guess is that this is the first time
that this situation has occurred with your new system.

For me the solution was not too difficult because I did not have many
machines to make the changes on. From the tone of your question I
gather that is not your case. There may be some options to decrease
the amount of time that an NFS command waits before aborting but I
think you are still back to making many more changes than you want to
make.

Pat
- ----------
Date: Thu, 16 Jul 92 10:59:21 PDT
From: guy@auspex.com (Guy Harris)
To: ken@visix.com
Subject: Re: Login inexplicably hangs on /usr/ucb/quota

> PID TT STAT TIME COMMAND
> 7636 co IW 0:00 login
> 7637 co IW 0:00 /usr/ucb/quota
>
>We're not running with quota's so I was surprised to see login hanging
>on it.

It doesn't matter. "login" will *always* run "quota", which will try to
contact the "remote quota" RPC server for *all* NFS-mounted file
systems, to see if the user is over quota on them, except for...

>Finally, we modified our fstab so that all of our NFS mounted
>file systems (including the home directory one) were all mounted
>rw,bg,noquota which seemed to fix the problem on one machine.

...those mounted with "noquota".

>I'm not
>looking forward to walking around to all our nodes and editing their
>fstab's. Can anyone suggest why this problem would suddenly pop up?

It'd pop up if the "remote quota" server "rpc.rquotad" were, for some
reason, not responding - e.g., had dropped core.

- ----------
Date: Thu, 16 Jul 92 07:45:42 PDT
From: jumper@spf.trw.com (Greg Jumper)
To: ken@visix.com
Subject: Re: Login inexplicably hangs on /usr/ucb/quota

You're right that NFS timeouts cause the hangs. There are several things you
can do to eliminate the problem:

     1) Move /usr/ucb/quota to /usr/ucb/quota.FCS and make a link from
         /usr/bin/true to /usr/ucb/quota.

     2) Use the "noquota" option, as you did. This is particularly
         convenient if you use the automounter.

     3) Disable "rquotad" by commenting out its line in /etc/inetd.conf.

Unfortunately, unless you use the automounter for all NFS filesystems, there
is no way around the fact that you will have to make changes on each machine
where you want quotas to be disabled.
- --

                                       Greg Jumper
                                       TRW DSC Signal Processing Facility
                                       jumper@spf.trw.com
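
Option 3) could be sketched as below. This assumes the rquotad entry
starts at the beginning of its line in inetd.conf (something like
"rquotad/1 dgram rpc/udp ..."); check the actual file first. The function
name comment_out_rquotad is hypothetical.

```shell
# Sketch: print an inetd.conf-style file with any active rquotad line
# commented out. $1 = file to read (e.g. /etc/inetd.conf).
# Output goes to stdout so it can be reviewed before replacing the real file.
# Assumption: the service entry begins with "rquotad" in column one.
comment_out_rquotad() {
    sed 's/^rquotad/#&/' "$1"
}
```

After installing the edited file, inetd has to reread its configuration
(normally by sending it a HUP signal) before the change takes effect.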

- ----------
Date: Thu, 16 Jul 92 14:52:14 +0200
From: cc_gucky@rcsun1.rcvie.co.at (Gerhard Holzer)
To: ken@visix.com
Subject: Re: Login inexplicably hangs on /usr/ucb/quota

Here is a hint that appears somewhere in the Sun documentation or software bulletins.

If you are NOT using quotas at all,
you may

mv /usr/ucb/quota /usr/ucb/quota.orig
ln -s /usr/bin/true /usr/ucb/quota

which speeds up login ;-)

BTW:
Some time (OS releases) ago, they did the following in /etc/rc????

grep quota /etc/fstab && quotaon -a

which is not that intelligent if you write ...,noquota,... in /etc/fstab,
because a grep for "quota" matches "noquota" as well

BUT they have learned .

Have they ??

Best regards
Gucky

Gerhard Holzer (Gucky)
Alcatel Austria - ELIN Research Center
Ruthnergasse 1-7, A-1210 Vienna - Austria - Europe
Internet: gucky@rcvie.co.at
UUCP: ..!relay.EU.net!rcvie!gucky
Tel: +(431) 39-16-21 / 163
Fax: +(431) 39-14-52
- ----------
Via: uk.ac.sheffield.dcs; Thu, 16 Jul 1992 13:37:12 +0100
Date: Thu, 16 Jul 92 13:37:14 BST
From: Dave Mitchell <D.Mitchell@dcs.sheffield.ac.uk>
To: ken@visix.com
Subject: Re: Login inexplicably hangs on /usr/ucb/quota

> PID TT STAT TIME COMMAND
> 7636 co IW 0:00 login
> 7637 co IW 0:00 /usr/ucb/quota
>
>We're not running with quota's so I was surprised to see login hanging
>on it. Finally, we modified our fstab so that all of our NFS mounted
>file systems (including the home directory one) were all mounted
>rw,bg,noquota which seemed to fix the problem on one machine. I'm not
>looking forward to walking around to all our nodes and editing their
>fstab's. Can anyone suggest why this problem would suddenly pop up?
>The server machine hasn't been down for days.

It's difficult to give an authoritative answer, not knowing your network,
but since quota has to contact *every* NFS server listed in fstab
not protected by "noquota", there's a fair chance that one of those
servers somewhere had temporarily gone v. slow / crashed / rebooted,
or someone had unplugged an ethernet cable somewhere, or... etc.
The more hosts in fstab, the worse the problem becomes.

The reason why you got this problem even though
"We're not running with quota's", is because although you may not have
quotas enabled on the local machine or elsewhere, login doesn't know
this, and so executes quota to check to see if you have a quota
on any mounted partitions, inc. NFS ones.

I only know of two ways round this problem. The first (noquota option) you
have already discovered. The second is for each user who doesn't want
to "be hung" to have a .hushlogin file in their home dir.
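
The .hushlogin route is a one-liner per user. A sketch, with
make_hushlogin as a made-up helper name:

```shell
# Sketch: create an empty .hushlogin in a home directory.
# $1 = home directory (defaults to $HOME).
# make_hushlogin is a hypothetical helper name.
make_hushlogin() {
    home_dir=${1:-$HOME}
    touch "$home_dir/.hushlogin"
}
```

Note that .hushlogin also quiets the other login-time messages (such as
the message of the day), so it trades the quota check for a silent login.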

Dave.

* David Mitchell, Systems Administrator, email: D.Mitchell@dcs.shef.ac.uk
* Dept. Computer Science, Sheffield Uni. phone: +44 742-768555 ext 5573
* PO Box 600, Sheffield S1 4DU, UK. fax: +44 742-780972
*
* Standards (n). Battle insignia or tribal totems

From: johnb@edge.cis.mcmaster.ca (John Benjamins)
Date: Wed, 27 Jan 1993 13:46:36 -0500
To: "Susan Thielen" <thielen@irus.rri.uwo.ca>
Subject: Re: login wait during machine failure

I run automount to try to avoid this problem. i also use the noquota
option when doing mounts (unless you are using quotas).

the reason the timeout takes so long, is that the login process (when
you first login to a machine) runs /usr/ucb/quota, and when quota
stat's the filesystems, it gets hung. if you are NOT using quotas,
then what some admins do is to replace /usr/ucb/quota with /bin/true.

From: Christopher Davis <ckd@eff.org>
To: "Susan Thielen" <thielen@irus.rri.uwo.ca>
Subject: login wait during machine failure

Sounds like quota checking; try mounting with noquota.

From: Cameron Humphries <cameron@cs.adelaide.edu.au>

Hi.

We have in the past experienced similar delays to those you have
mentioned. We discovered that it was the /usr/ucb/quota program
which was invoked by "login" that was causing the delays. As we
don't use disk quotas we simply moved the real quota program to
another location and placed a sym-link from /usr/ucb/quota to
/bin/true. Login delays are no longer a problem for us.
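
A reversible version of this swap might look like the sketch below. The
function names stub_quota and restore_quota are invented for
illustration; the directory and the path to true are passed in rather
than hard-coded, since replies in this thread use both /bin/true and
/usr/bin/true.

```shell
# Sketch: swap a directory's "quota" binary for a symlink to true(1),
# keeping the original so the change can be undone.
# $1 = directory holding quota (e.g. /usr/ucb)
# $2 = path to true (e.g. /bin/true)
# stub_quota and restore_quota are hypothetical helper names.
stub_quota() {
    dir=$1
    true_path=$2
    # Refuse to run twice, so an existing backup is never overwritten.
    if [ -e "$dir/quota.orig" ]; then
        echo "stub_quota: $dir/quota.orig already exists" >&2
        return 1
    fi
    mv "$dir/quota" "$dir/quota.orig" && ln -s "$true_path" "$dir/quota"
}

# Undo the change by putting the saved binary back.
restore_quota() {
    dir=$1
    [ -f "$dir/quota.orig" ] || return 1
    rm -f "$dir/quota" && mv "$dir/quota.orig" "$dir/quota"
}
```

Run as root, something like `stub_quota /usr/ucb /bin/true` on each host
that mounts remote filesystems; only do this if quotas are not in use.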

From: David Fetrow <fetrow@biostat.washington.edu>
Subject: Re: login wait during machine failure
To: thielen@irus.rri.uwo.ca
Date: Wed, 27 Jan 93 14:47:28 PST

 The only way out of this I know of is a little kludgy:

        automounter (or, perhaps better, the freeware: amd)

 Amd is a little more versatile and handy if you have non-Suns (and you
probably will if you don't already) but automounter comes with SunOS.

 It avoids the problem by only mounting drives when needed (in a way that's
quite transparent to you once set up) and unmounting them after a period
of non-use. This really improves reliability in a situation such as yours.
If the drive fails when not mounted: no problem (aside from not being able
to access your info).
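
As a starting point for moving static NFS mounts under the automounter,
the sketch below turns NFS lines from an fstab-style file into
direct-map-style entries of the form "mount-point -options server:/path".
That map layout is an assumption about the automounter in use, and
options such as bg may not mean anything to it, so review the output by
hand. fstab_to_direct_map is a made-up name.

```shell
# Sketch: print one direct-map-style entry per NFS line of an fstab file.
# $1 = file to read (e.g. /etc/fstab).
# Assumed fstab columns: device, mount point, type, options, ...
# fstab_to_direct_map is a hypothetical helper name.
fstab_to_direct_map() {
    awk '
        /^[ \t]*#/ { next }                       # skip comment lines
        NF >= 4 && $3 == "nfs" {
            printf "%s\t-%s\t%s\n", $2, $4, $1    # mount point, options, server:/path
        }
    ' "$1"
}
```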

From: matthew@cs.adelaide.edu.au (Matthew Donaldson)
I think it's something to do with quotas. When you log in, login
runs quota, which I think checks all mounted disks to see if you
have a quota on any of them (and if you are over quota). Now if a
partition from a machine which is down is mounted on your machine,
when you log in, quota will try to check that disk but will sit around
waiting for it to respond for absolutely ages (like around 10 minutes).
If you actually run quotas, I don't know of a fix for this, but if you
don't use them, replace /usr/ucb/quota with a link to /bin/true. Login
will still call quota, but it will return immediately, which should fix
the slow login problem.
Hope this helps.

From: Robert Haddick <rhaddick@us.oracle.com>
To: thielen@irus.rri.uwo.ca
Subject: Re: login wait during machine failure

make sure and:

mv /usr/ucb/quota /usr/ucb/quota.old;
ln -s /bin/true /usr/ucb/quota;

quota can cause problems sometimes......

From: adiron!rmdctro@uunet.UU.NET (Tom Olin)
To: uunet!irus.rri.uwo.ca!thielen@uunet.UU.NET
Subject: Re: login wait during machine failure

   The login process works fine except that it takes up to 10 minutes
   for the prompt to come up.

   Now I have mounted the disks with various different parameters,
   but the ones I am using at the moment are

    rw,bg,intr,noac

Add noquota to all NFS filesystems and see if that helps.
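
To see which NFS entries still lack noquota, something like the sketch
below could be run against each host's fstab. It assumes the usual
column order (device, mount point, type, options); the function name
nfs_missing_noquota is invented for illustration.

```shell
# Sketch: list the mount points of NFS entries in an fstab-style file
# whose option field lacks "noquota". $1 = file to inspect (e.g. /etc/fstab).
# nfs_missing_noquota is a hypothetical helper name.
nfs_missing_noquota() {
    awk '
        /^[ \t]*#/ { next }            # skip comment lines
        NF >= 4 && $3 == "nfs" {
            # Wrap the option list in commas so "noquota" matches as a whole option.
            if (("," $4 ",") !~ /,noquota,/) print $2
        }
    ' "$1"
}
```

An entry mounted with rw,bg,intr,noac would be reported; one mounted with
rw,bg,noquota would not.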

From: G.ROBERTSON@aberdeen.ac.uk
You have your finger on part of the problem - hard mounts are a pain when
servers reboot AND you have cross-mounted filesystems, ie servers sharing
files with each other. On the other hand soft mounts can give rise to NFS
"write" failures which CAN screw up filesystems. I've taken a halfway
position, and mounted shared filesystems "rw,soft,retrans=8" This allows a
reasonable number of retries.

The other nasty on rebooting is quota checking NFS mounted filesystems. If the
target to be quota checked is down, then it takes forever and a day for the
quota check to complete. There has been some vague discussion of this on this
list, but nothing I've been able to make use of to fix the problem.

 
Susan KJ Thielen Application Programmer, System Manager
Robarts Research Institute Phone: (519) 663-3833
PO Box 5015, 100 Perth Drive Fax: (519) 663-3789
London, ON N6A 5K8 E-mail: thielen@irus.rri.uwo.ca


