SUMMARY: NIS failures of "hosts" maps

From: T.D.Lee@durham.ac.uk
Date: Mon Mar 02 1992 - 14:59:07 CST


Just over a week ago, I sent the following plea:
 
> Configuration:
>
> NIS master: SPARCserver 470 SunOS 4.1
> NIS slave(1): SPARCserver 670 SunOS 4.1.2
> NIS slave(2): Sun 3/80 SunOS 4.1.1
>
> NIS hosts maps almost empty; using "B=-b" in "/var/yp/Makefile"
> to use DNS. Thus even local lookups go to the DNS. (Actually
> not quite true; "hosts.byaddr" is empty, but for weird UK-specific
> reasons "hosts.byname" has information keyed in the form "dur.xxx".)
>
> NIS domain: "cc". DNS domain "dur.ac.uk".
>
> History:
> This configuration has been working satisfactorily for about a year.
> (in fact just using two 470s; we then had one upgraded to a 670 a few
> weeks ago; everything still OK).
>
> Last weekend the Sun 3/80 was upgraded from 4.0.3 to 4.1.1 and made into
> an NIS slave at the same time. Some time later we started getting
> the trouble described below. Despite everything, I believe this to be
> coincidental, rather than causal.
>
>
> Over the last couple of days, we have started having major trouble
> resolving hostnames. Usually a "ping xxx" succeeds, saying
> "xxx.dur.ac.uk is alive". But a small set of hosts will then
> disappear for about half-an-hour (very approximately). When they
> re-appear, another set of hosts then disappear. But "ping xxx.dur" and
> "ping xxx.dur.ac.uk" continue to work; only the "xxx" form disappears.
>
> All sorts of things, of course, then fall flat: "rlogin", "telnet",
> automounter ... . The effects are dire, especially for NFS file access
> to automounted filesystems.
>
> At first, it seemed to be a feature of one of the slave NIS servers,
> affecting machines bound to it. But it seems that different NIS servers
> lose different hosts at different times. This includes the master (470)
> which has been stable for over a year now.
>
> We killed "ypserv" on the two slaves, so that every NIS client eventually
> re-bound itself to the master NIS server. Still the trouble persisted.
> Even killing and restarting ypserv, ypbind etc. on the master server
> did not cure it.
>
> Eventually, I rebooted this master NIS machine. This is generally
> "last resort" action, as the machine is the main file server for
> the university. (Incidentally, does anyone know of any existing means
> to duplicate and maintain application software to other servers, so that
> we can take advantage of the automounter's "multiple location" facility
> for resilience?) So far, it seems OK.
>
> But given the severity of its effects, I would appreciate quickly any
> diagnosis and possible cures, be they folklore or factual.
>
> Many thanks in advance.
 
I received a few replies, some of which helped to point me in the correct
direction. Many thanks to all.
 
(For completeness, soon after the reboot of the 470 (primary DNS,
master NIS), a few failures still occurred, but by then I had posted
this query.)
 
There had been an element of "pilot error", but the symptoms themselves
remain baffling. I am assuming for the time being that correcting my
mistake will cure the problem (it hasn't recurred yet...).
 
Briefly, during the upgrade of the 3/80 (a secondary DNS server) I had
accidentally left it pointing to itself for refreshing local
information, instead of pointing to the primary DNS server. This happened
because I had copied the various "named.boot" files from another
secondary DNS server, which, due to a historical oversight, was
refreshing from the 3/80 instead of the primary DNS server.
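
For anyone wanting to check their own setup for the same mistake, here
is a minimal sketch in Python (not something I ran at the time). It
assumes the usual "named.boot" line format of
"secondary <zone> <master-address>... <backup-file>", and the function
names and addresses in the example are purely hypothetical. It follows
the "refreshes from" links and reports whether a server can ever reach
the primary.

```python
def parse_masters(boot_text):
    """Return {zone: [master addresses]} for each 'secondary' line."""
    masters = {}
    for line in boot_text.splitlines():
        line = line.split(";", 1)[0]          # drop ';' comments
        fields = line.split()
        if len(fields) >= 4 and fields[0].lower() == "secondary":
            # secondary <zone> <master> [<master> ...] <backup-file>
            masters[fields[1]] = fields[2:-1]
    return masters


def reaches_primary(server, primary, refresh_from):
    """True if some chain of 'refreshes from' links ends at the primary.

    refresh_from maps a server address to the list of addresses it
    transfers the zone from (as read from its named.boot).
    """
    seen = set()
    todo = [server]
    while todo:
        current = todo.pop()
        if current == primary:
            return True
        if current in seen:
            continue                          # already visited: a loop
        seen.add(current)
        todo.extend(refresh_from.get(current, []))
    return False


# Hypothetical addresses: 192.0.2.1 = primary, 192.0.2.2 = the 3/80,
# 192.0.2.3 = the other secondary. Both boot files name the 3/80.
boot_3_80 = "secondary dur.ac.uk 192.0.2.2 dur.bak ; points at itself\n"
boot_other = "secondary dur.ac.uk 192.0.2.2 dur.bak\n"
links = {
    "192.0.2.2": parse_masters(boot_3_80)["dur.ac.uk"],
    "192.0.2.3": parse_masters(boot_other)["dur.ac.uk"],
}
cut_off = [s for s in sorted(links)
           if not reaches_primary(s, "192.0.2.1", links)]
```

With the boot files as described above, both secondaries end up in
"cut_off"; once the 3/80 names the primary, neither does.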
 
Thus both these DNS servers were effectively cut off from the primary.
(There was a clue in the "SOA" serial numbers on each server when I looked
more deeply, having cottoned on to the anomaly.) This also ties in with
the "about half-an-hour" symptom; that is our DNS retry time.
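
The SOA clue can be turned into a routine check. The following is only
a sketch in Python: it assumes you have already collected each server's
SOA serial by hand (e.g. with nslookup), that serials are compared as
plain integers, and the server names and serial values are made up.

```python
def lagging_servers(primary, serials):
    """Return the servers whose SOA serial is lower than the primary's."""
    reference = serials[primary]
    return sorted(name for name, serial in serials.items()
                  if name != primary and serial < reference)


# Hypothetical serials as read from each server's SOA record.
observed = {"primary": 1992030201, "sec-a": 1992022101, "sec-b": 1992030201}
stale = lagging_servers("primary", observed)
```

Any server listed in "stale" has not picked up the primary's latest
zone data and is worth a closer look.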
 
I have now tidied up the whole lot.
 
Whilst I still do not understand the behaviour, I am hoping that
the above fix to our DNS configuration has cured the problem.
 
Moral: if you change something (as many months ago I moved the primary
DNS from the 3/80 to the 470), finish the job! I broke my own rule
and this might have been the price that our users had to pay.
 
One incidental point, which I had already realised and had implemented
from "day one" of our DNS, but might not be generally known: there
should be a "/etc/resolv.conf" on all NIS servers (master and slaves).
Obviously, it does no harm to have this on all clients also, for the
benefit of any programs (e.g. nslookup) which use the DNS directly.
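
As a rough aid for checking this across many machines, here is a small
Python sketch that reads the text of a "resolv.conf" and returns the
"domain" and "nameserver" entries it finds. The function name and the
address in the example are hypothetical; only those two standard
keywords are examined.

```python
def read_resolv_conf(text):
    """Return (domain, [nameserver addresses]) found in resolv.conf text."""
    domain = None
    nameservers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue                      # blank or incomplete line
        if fields[0] == "domain":
            domain = fields[1]
        elif fields[0] == "nameserver":
            nameservers.append(fields[1])
    return domain, nameservers


# Hypothetical file contents; 192.0.2.1 is a placeholder address.
sample = "domain dur.ac.uk\nnameserver 192.0.2.1\n"
domain, servers = read_resolv_conf(sample)
```

A server for which "servers" comes back empty has no resolver
configured and deserves attention.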
 
Thanks to:
          Marcel Bernards <bernards@nl.ecn>
          Jon Peatfield <J.S.Peatfield@uk.ac.cambridge.damtp>
          Frank G. Fiamingo <frank@edu.ohio-state.acs.tardis>
          Mike Raffety <miker@com.sbcoc>
          zjat02@com.amoco.trc
 
 
: David Lee                       Computer Centre       :
:                                 University of Durham  :
: Phone: +44 91 374 2882 (ddi)    South Road            :
: Fax:   +44 91 374 3741          Durham                :
: ARPA:  T.D.Lee@durham.ac.uk     U.K.                  :
: JANET: T.D.Lee@uk.ac.durham                           :


