-----BEGIN PGP SIGNED MESSAGE-----
At this point there are a few minor bugs remaining, but I'll deal with
them in their time. I want to extend a sincere and hearty thanks to
the numerous people who helped out on this one. For those who managed
to miss out on my frantic messages of the last few days, I will recap
with summaries and acknowledgements.
The original situation was this: I had an NIS+ root master server
(brutus) running Solaris 2.5 and a client (cassius) running 2.5.1, and
decided finally to upgrade the server to 2.5.1.
Here's what went wrong, in the order I noticed and fixed each problem. It's
long, but not as long as the time I spent on all of this... (BTW, did
I mention I only get paid for 10 hours of sysadminning a week?)
PROBLEM #1
==========
NIS+ broke utterly, as rpc.nisd kept dying on startup with a corrupt
transaction log complaint.
ANSWER
======
It turns out that the very first suggestion, offered by
Nick Murray <nmurray@csd.abdn.ac.uk>
did the trick. The syslog was reporting a failed attempt to map a
table of size about 256MB. Well, it turns out that there were
resource limits which prevented a virtual memory image of that size.
This *hadn't* been a problem under 2.5, but I noticed whilst combing
through SunSolve for patches that there was a 2.5 bug report about
rpc.nisd starting up despite bad transaction logs. Apparently, and
much to my annoyance, :) 2.5.1 fixed that.
Anyway, having rewritten the limits so that root is exempt (so sue
me---I never figured on anything needing *that* much memory...),
rpc.nisd started up.
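For anyone wanting to check their own limits against a mapping of
that size, here is a minimal Python sketch. It is not what I actually
ran; the function names mapping_fits and check_limits are made up for
illustration, which limit names exist depends on the platform, and the
check ignores whatever memory the process already uses.

```python
import resource


def mapping_fits(soft_limit, size_bytes):
    """Return True if a mapping of size_bytes fits under soft_limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return size_bytes <= soft_limit


def check_limits(size_bytes):
    """For each limit this platform exposes, report whether size_bytes fits."""
    report = {}
    # Not every platform defines all of these names, so look them up safely.
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_VMEM"):
        which = getattr(resource, name, None)
        if which is None:
            continue
        soft, _hard = resource.getrlimit(which)
        report[name] = mapping_fits(soft, size_bytes)
    return report


if __name__ == "__main__":
    needed = 256 * 1024 * 1024  # roughly the table size from the syslog
    for name, ok in check_limits(needed).items():
        print(name, "ok" if ok else "too small")
```

Run it as the user the daemon runs as, since limits are per process
and inherited from whatever started it.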
ADD'L INFO
==========
I would like to thank several other people who responded, namely
Cecil Pang <cecilp@adonis.westel.com>
Virginia Coffindaffer <CoffindafferVirginia@wangfed.com>
(who also gets the prize for having the coolest name among
the respondents :)
Francis Liu <fxl@pulse.itd.uts.edu.au>
and
Stuart Kendrick <sbk@fhcrc.org>
They pointed out various other useful pieces of information.
Especially helpful was Cecil's reminder that the NIS+ files are
`hole' sensitive and must be moved about with something like
ufsdump/ufsrestore in order to leave the files usable. In fact, at
the time I was mounting the pre-upgrade NIS+ directory over /var/nis
and this hadn't been a problem, but without the reminder, I probably
would have just used tar *after* I got the rest of the setup working
and would have broken things *again*. :)
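To make the `hole' business concrete, here is a small Python sketch
of a file with a hole in it and of a byte-for-byte copy that fills
the hole in. It only illustrates sparse files in general, not the
NIS+ file format; the helper names are invented, and how many blocks
actually get allocated depends on the filesystem.

```python
import os
import tempfile


def make_sparse(path, hole_bytes):
    """Write one byte, seek past a hole, then write one more byte."""
    with open(path, "wb") as f:
        f.write(b"a")
        f.seek(hole_bytes, os.SEEK_CUR)
        f.write(b"z")


def sizes(path):
    """Return (apparent size, bytes allocated on disk) for path."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


def naive_copy(src, dst, chunk=64 * 1024):
    """Copy by reading and writing every byte, so holes are written as zeros."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "with_hole")
    dst = os.path.join(workdir, "copied")
    make_sparse(src, 1024 * 1024)
    naive_copy(src, dst)
    print("original (apparent, allocated):", sizes(src))
    print("copy     (apparent, allocated):", sizes(dst))
```

On a filesystem that supports holes, the original will usually show
far fewer allocated bytes than its apparent size, while the copy will
not.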
The other responses included some good general information about how
NIS+ uses its files, and reminded me to checkpoint my server more
often...
PROBLEM #2
==========
After I got to the point where the server could access its own NIS+
tables, I was unable to get the client to authenticate the server,
hence the client couldn't get any of the NIS+ tables. The symptom,
of course, was the ever-loving `corrupt window' error.
Now, in tracking down a similar problem once before, I was pointed
(by this list) at xntpd. NIS+ authentication is heavily reliant on
timestamp checking to make sure requests are timely and such, so
unless you have software to synchronize the clocks of machines in an
NIS+ domain, using NIS+ is very hit or miss.
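As a toy model of why clock skew matters, here is a Python sketch of
a timestamp window check. It is a deliberate simplification, not the
real authentication code; the window value and the function names are
assumptions for illustration only. The second helper shows how two
clocks nearly a second apart can still display the same second.

```python
def within_window(client_stamp, server_now, window_seconds):
    """Accept a request only if its timestamp is close to server time."""
    return abs(server_now - client_stamp) <= window_seconds


def same_displayed_second(clock_a, clock_b):
    """True if both clocks would show the same whole second."""
    return int(clock_a) == int(clock_b)


if __name__ == "__main__":
    client = 100.05   # client clock, in seconds
    server = 100.95   # server clock, about 0.9 seconds ahead
    window = 0.5      # hypothetical acceptance window
    print("same second shown:", same_displayed_second(client, server))
    print("request accepted: ", within_window(client, server, window))
```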
The previous trouble had been nowhere near as serious as preventing
the whole client machine from accessing the server; rather, back then I'd
had intermittent problems with automounting. And the fact of the
matter was, I *was* running xntpd.
Or so I thought. Turns out that (I believe at the recommendation of
the xntpd install docs) under /etc/rc2.d xntpd was being started
later than RPC services. Therefore the time synch wasn't being
performed in time to allow authentication to happen. Just to add to
the subtlety of the problem, the clocks were off by just about a
second, which meant that a few times when I compared `date' outputs,
they came out the same. Anyway, I moved the xntpd start ahead of
rpc (and I also moved sshd ahead of rpc---that way, when the machine
hung at NIS+ startup, and didn't know any users, I could ssh in as
root from a remote machine), and the authentication problems went
away.
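Since the start scripts in an rc directory run in name order, a quick
way to sanity-check the ordering is to sort the names. Here is a
minimal Python sketch of that check. The script names below are
illustrative assumptions, not a listing of my actual /etc/rc2.d, and
the helper names are invented.

```python
def start_order(names):
    """Return the S* scripts in the order a sorted listing yields them."""
    return sorted(n for n in names if n.startswith("S"))


def starts_before(names, first, second):
    """True if script `first` runs before script `second` at boot."""
    order = start_order(names)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    # Hypothetical "before" layout: time sync starts long after RPC.
    before = ["S71rpc", "S99xntpd"]
    # Hypothetical "after" layout: time sync renamed to sort ahead of RPC.
    after = ["S70xntpd", "S71rpc"]
    print("before:", starts_before(before, "S99xntpd", "S71rpc"))
    print("after: ", starts_before(after, "S70xntpd", "S71rpc"))
```

Feeding it the real directory listing (for example from os.listdir)
would show where the time daemon falls relative to everything that
depends on it.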
Thanks to
Kevin Davidson <tkld@cogsci.ed.ac.uk>
for responding on this one.
PROBLEM #3
==========
Okay, near the end, now. This time around the client could get NIS+
tables, knew who the users were, etc., but hung on NFS mounting
anything from the server with an `RPC: Program not registered'
error. All I ended up doing was stopping and then restarting the
nfs.server script on the server. I don't know why this was
necessary, since both machines had just been freshly booted with
what was then the current configuration. Thanks to
Mattias Zhabinskiy <mattias@txc.com>
John Justin Hough <john@oncology.uthscsa.edu>
Rasana Atreya <atreya@library.ucsf.edu>
Mike D. Kail <mdkail@fv.com>
Aline Runde <ARunde@mms.com>
Rick von Richter <rickv@mwh.com>
and
Jim Harmon <jim@telecnnct.com>
for helping with this last part.
(Rasana also pointed me towards the searchable list archives at
http://www.LaTech.edu/sunman-search.html
which is certainly nice to know.)
I do have a question related to this last topic, though. Everybody
mentioned the various daemons which need to be running on the NFS
server: statd, check; mountd, check; biod... I ain't got it. Did
this go away with 2.5.1? It is nowhere to be found, and its lack
doesn't seem to be hindering me.
FINALLY
=======
There are two (2) remaining niggling little details which I haven't
yet worked out, and which I think relate to NIS+. First off, I use
the man_db package instead of Sun's man. man_db likes to have
important things owned by a user called man. Then the man
directories and the man binary itself are setuid man. Well, when I
run man on the client, cassius, it hangs. Not on the server. man
is a local user (not an NIS+ user) and nsswitch is, indeed, set to
check files first.
A not unrelated problem involves root on the client, which is
obviously not an NIS+ account. You see, the server exports
/var/mail to the client. This includes the root mail file. Well,
this worked fine before, but now mail (mailx/Mail) hangs when
invoked by root on the root mailbox.
In both cases this is a pretty serious hang, the kind that requires
killing the underlying shell and leaving a zombie. The logs don't
seem to indicate where the problem is, and I haven't gotten anything
out of truss yet.
But anyway, these are now minor problems, and my system is by and
large usable again. (And there was much rejoicing.)
I'd just like to point out that, once again, this list proves
invaluable in its help, while our multi-kilobuck tech support contract
did exactly squat. Thanks again for all your help,
CJW
- --
**********************************************************************
/\ Colin J. Wynne Johns Hopkins University
(()) Dep't of Mathematical Sciences
/____\ ``Lunatic-at-Large'' E-Mail: cwynne@mts.jhu.edu
/______\
/________\ The cost of living hasn't affected its popularity.
**********************************************************************
-----BEGIN PGP SIGNATURE-----
Version: 2.6.2
Comment: http://www.mts.jhu.edu/~cwynne/
iQCVAwUBMwD1YXEHfObrVHptAQHxcwP/Y2Wb4IJAuosK7k7mnS3zYBeo3ACBjRWJ
eTSv9RX81zhcyUE+PR0U6s2mqCvXMruxpkm6c07yWJu8u+gyOhC0GUJxhonQTgTD
tQZOlDISKeOfbEI6vDZyzB2fc7jhpkUTQXo3XN2f7MROWYnI37S81X6uiOeFekfM
YQUYxEMJXAo=
=d+N3
-----END PGP SIGNATURE-----