SUMMARY: NIS /etc/group Question

From: Anthony Yen (tyen@mundo.eco.utexas.edu)
Date: Wed Mar 04 1992 - 07:48:44 CST


Sun-Managers:

Boy, life would be MUCH more difficult without this mailing list!
Okay, thanks to all the people who replied:

bit!jayl (Jay Lessert)
barron@cs.uchicago.edu (Tom Barron)
stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
mdl@cypress.com (J. Matt Landrum)
ndd@bal1.mc.duke.edu (Ned Danieley)
stanonik@nprdc.navy.mil (Ron Stanonik)
Upkar Singh Kohli <upkar@wsu-eng.eng.wayne.edu>
judy@qucis.queensu.ca (Judy Russell)
Dieter Muller <dworkin@merlin.rootgroup.com>
Mike Raffety <miker@sbcoc.com>

Special mention goes to Hal Stern, who had successfully helped several
of the responders with this same problem.

Most common response: Make sure that a "make netid" is also run. That
is, "make group; make netid" is the correct way to make the group NIS
map; plain old "make group" does NOT work. This is a documented bug.
I'm running SunOS 4.1.1 rev B, so a simple "make" in /var/yp will also
update the netid map. Since I administer only 18 SPARCstations, I'm
used to just issuing the comprehensive "make" on all NIS maps whenever
I work with any one of them, so this was not it, although it did help
(see below).
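
For anyone who rebuilds single maps by hand, the "make group; make
netid" advice could be wrapped up roughly as follows. This is only a
sketch in Bourne shell: the function name and the YPDIR and MAKE
override variables are made up for illustration, and it assumes the
stock /var/yp Makefile with "group" and "netid" targets.

```shell
# Rebuild the group map together with the netid map.
# YPDIR and MAKE are hypothetical overrides (handy for a dry run);
# the defaults assume the usual /var/yp layout.
rebuild_group_maps() {
    ypdir=${YPDIR:-/var/yp}
    # Naming both targets in one invocation builds them in order,
    # like typing "make group; make netid".
    ( cd "$ypdir" && ${MAKE:-make} group netid )
}
```

Setting MAKE=echo shows what would be run without touching any maps.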

The final resolution below is the short summary, though it doesn't
tell the whole story. Here is the quick and dirty fix:

                ========== FINAL RESOLUTION ==========
"Rogue" NIS slave that was supposed to be a NIS client with
distribution /etc/group file (instead of "+:" /etc/group file) found
running on network by looking for ypserv processes on all hosts on the
network. Killed ypserv on the rogue.
                ========== FINAL RESOLUTION ==========

                 ---------- NOTE & WARNING ----------
What follows is an exhaustive description of other suggestions, and
then how I eventually solved the problem. I'm hoping that showing my
train of thought might help future sysadmins in seeing how some of
these problems are approached. Come to think of it, it actually shows
how you should NOT approach a problem... :-)
                 ---------- NOTE & WARNING ----------

My understanding of NIS was correct: you do NOT have to do a newgrp.
One respondent (J. Matt Landrum) thought this was the case, so I was
relieved to find out I was not the only one who found the manuals a
little confusing at first; it took me several readings before I
concluded it had to be designed so that you did not have to run
newgrp, and that something was wrong.

One respondent (Upkar Singh Kohli) suggested checking what the clients
thought the groups looked like with a "ypcat -k group". This was a
step in the right direction: I discovered that many of my clients had
distribution /etc/group files, so I replaced them with plain "+:"
/etc/group files. This still did not solve the problem, although now
all my clients showed the same groups information.
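
A quick way to spot a client that still carries a distribution
/etc/group might look like the sketch below. The function name is
hypothetical, and it assumes a correctly configured client's file
holds nothing but the single "+:" line described above.

```shell
# Succeed only if the file consists of exactly one "+:" line.
# usage: is_plus_only_group [FILE]   (FILE defaults to /etc/group)
is_plus_only_group() {
    awk '
        $0 == "+:" { plus++; next }   # the NIS include entry
                   { other++ }        # any local entry or stray line
        END {
            if (plus == 1 && other == 0) exit 0
            exit 1
        }
    ' "${1:-/etc/group}"
}
```

A non-zero exit status means the file has local entries, a blank
line, or no "+:" line at all.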

Another reply (Dieter Muller) suggested checking for blank lines in
the /etc/group files:
        Check for blank lines in the various /etc/group files -- they
        tend to really confuse the issue. I'd also check for trailing
        white-space, but that probably doesn't matter nearly as much.
This was a good idea, but I had already tried it (one of the first
things I always try is stripping configuration data down to the bare
minimum).
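
Dieter's check can be automated along these lines. This is a sketch
only; the function name and the message wording are invented for
illustration.

```shell
# Report blank lines and trailing white-space in a group file.
# usage: check_group_whitespace [FILE]   (FILE defaults to /etc/group)
# Exit status is 1 if anything was reported, 0 otherwise.
check_group_whitespace() {
    awk '
        BEGIN        { bad = 0 }
        /^[ \t]*$/   { printf "%d: blank line\n", NR; bad = 1; next }
        /[ \t]+$/    { printf "%d: trailing white-space\n", NR; bad = 1 }
        END          { exit bad }
    ' "${1:-/etc/group}"
}
```

Each reported line is prefixed with its line number in the file.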

So I started sleuthing around, beginning with the observation that
most people suggested checking the netid map. These suggestions in
combination with Upkar's led me to think that I should check how the
NIS maps were propagated, so that I might make a map of who was
picking up maps from whom (like the graphic in the first few pages of
"System & Network Administration": Chapter 16). I looked at what
"apropos NIS" had to offer, and while working through the list of
stuff I ran into "ypwhich".

At this point I got a surprise: ypwhich reported that I was getting my
maps from a machine that was supposed to be configured as a NIS
client. I immediately started checking the other clients, beginning
with the other one in my office, and found another surprise.

The problem was only affecting the two machines in my office! Since
these were my baseline configurations for the entire network, I had
wrongly assumed that the behavior they were exhibiting was replicated
across all clients (I only tested my SS2 and the IPC in my office, and
then posted my original message to Sun Managers). Both the SS2 and IPC
in my office were picking up their maps from another client. All
other clients had the correct NIS maps, and behaved correctly.

I did a quick once over on the client that was giving maps to the two
clients in my office, and determined that it was not configured as a
NIS master. All I did was run a quick diff between this
phantom-NIS-master and my workstation for all /etc/rc* files (had to
create unique copies in my home directory across the net for each file
from each machine, naturally). The diff showed none of the ypxfr
stuff was turned on. Then I checked ypwhich for all other clients on
the network, and they all reported they were picking their maps from
the (legal) NIS slave (see below for how I did this easily).
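
The ypwhich survey could be narrowed down to just the misbound
clients with something like the following Bourne shell sketch. The
function name, the RSH override variable, and the idea of passing in
the expected server name are assumptions for illustration; the hosts
file is one hostname per line.

```shell
# Print every host whose NIS binding differs from the expected server.
# usage: find_misbound_hosts HOSTS_FILE EXPECTED_SERVER
# RSH is a hypothetical override for the remote-shell command.
find_misbound_hosts() {
    for host in `cat "$1"`; do
        # -n keeps rsh from reading standard input.
        bound=`${RSH:-rsh} -n "$host" ypwhich`
        if [ "$bound" != "$2" ]; then
            echo "$host: bound to $bound"
        fi
    done
}
```

No output means every host in the file reported the expected server.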

This had me stumped for a while (no such thing as voodoo computing,
even though sometimes we like to blame cosmic rays for lots of
nonsense that goes on in the computing world), and then I decided to
check the ypserv daemon-handling code in /etc/rc.local, because this
appeared to be a common denominator on both NIS masters and slaves. I
started out by executing the C-Shell script (typed in from the command
line):
        foreach host (`cat hosts`)
        ? echo -n $host": "
        ? rsh $host ps aux | grep ypserv
        ? end
where hosts in `cat hosts` (note the reverse apostrophes/accent marks)
was a file (awk'd out of my inventory file) with a list of all the
machines I was responsible for on the network, a hostname to each
line. This is also how I ypwhich'd everyone.
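
For those who prefer the Bourne shell, roughly the same sweep can be
written as below. It is a sketch rather than what was actually run:
the function name and the RSH override variable are invented, and
unlike the loop above it prints only the hosts where ypserv shows up.

```shell
# List the hosts that have a ypserv process running.
# usage: find_ypserv_hosts HOSTS_FILE   (one hostname per line)
# RSH is a hypothetical override for the remote-shell command.
find_ypserv_hosts() {
    for host in `cat "$1"`; do
        # ps runs on the remote host and grep runs locally, so grep
        # never matches its own entry in the process list.
        if ${RSH:-rsh} -n "$host" ps aux | grep ypserv > /dev/null; then
            echo "$host"
        fi
    done
}
```

On a healthy network only the NIS master and the legitimate slaves
should appear in the output.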

Bingo! The phantom was running ypserv, which meant it was configured
as a NIS slave, but somehow incorrectly (see below for reason). I
killed the ypserv process on this illegal NIS slave, followed the
ypserv code in /etc/rc.local to /var/yp, and disabled the subdirectory
named after the NIS domain by renaming it to the domain name without
the dots (so we can bring the phantom back up as a slave in the future
if need be). A quick reboot on the affected two machines in my office
for good measure (not necessary, but I needed to do some cruft
cleaning anyways) and the NIS maps were all straightened out.
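
The rename step might be scripted as in the sketch below. The
function name is hypothetical and the domain name in the usage
comment is a placeholder, not the real one. The renamed directory
keeps the maps intact, presumably while leaving nothing under
/var/yp that matches the domain name.

```shell
# Disable a NIS map directory by renaming it to the domain name
# with the dots removed.
# usage: disable_domain_dir YPDIR DOMAIN
#   e.g. disable_domain_dir /var/yp some.domain.name   (placeholder)
disable_domain_dir() {
    ypdir=$1
    domain=$2
    disabled=`echo "$domain" | tr -d .`
    if [ "$disabled" = "$domain" ]; then
        # Without dots the rename would be a no-op; refuse instead.
        echo "domain name has no dots, nothing to rename" >&2
        return 1
    fi
    # Print the new path so the change is easy to undo later.
    mv "$ypdir/$domain" "$ypdir/$disabled" && echo "$ypdir/$disabled"
}
```

Moving the directory back to its dotted name would undo the change
if the machine is ever made a slave again.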

This is actually a cleaned-up version of my train of thought, believe
it or not. The point back up there where I said I was stumped took
about a day, while I muddled around between the "System & Network
Administration"
manual and the /etc/group file (I was convinced that I had somehow
mangled it, despite grpck's clean bill of health). For a while, I
almost threw in the towel and was close to just saying that the two
affected machines had somehow picked up the phantom through a freak
set of incidents (Twinkie computing!), and a quick reboot would fix it
(actually, it did, hehe, but I was really annoyed that I couldn't
explain *how* they picked up the maps---at first I didn't follow the
ypserv code and check where it was looking for its data). I was
actually very hot on the trail at first (by asking the question: how
are my maps being sent around?), but quickly forked off into a very
circuitous route to the same destination: I plead deafening ignorance
of all relationships in NIS (the "big picture", so to speak).

It wasn't until well after I had fixed this that I realized what
probably happened: the NIS slave that was supposed to be a NIS client was
running the modified /etc/group file, which was confusing my two
SPARCs---this rogue was actually the NIS master at one time, and my
predecessor had probably just not done a thorough job of reconfiguring
it back down to a NIS client when the large server (4/470, now the
670MP) was installed. For closure's sake, I'm repeating the final
resolution here:

                ========== FINAL RESOLUTION ==========
"Rogue" NIS slave that was supposed to be a NIS client with
distribution /etc/group file (instead of "+:" /etc/group file) found
running on network by looking for ypserv processes on all hosts on the
network. Killed ypserv on the rogue.
                ========== FINAL RESOLUTION ==========

Sigh, so that's it. Hope I didn't waste too much bandwidth and bore
everybody to tears---if anyone objects to this kind of summary, please
let me know so I know to post quick and dirty summaries in the future.

THANKS AGAIN TO EVERYONE! EVERY message helped---even if it didn't
hit the nail on the head, it provoked thought on the problem from a
fresh angle, which was what I needed.

Anthony Yen - tyen@mundo.eco.utexas.edu ... Sail tough
SPARC SysAdmin - UT/Austin - Economics or go home ... Kowabunga! ...


