Sun Managers,
Thanks to everyone who replied with suggestions and advice.
Special thanks to Gary, who replied with the solution, as follows:
----------------------------------------------------------------------
From: Gary Mills <mills@cc.UManitoba.CA>
There is a known problem with locking on diskless workstations which
may be affecting you. Apparently, there is a mount option `llock',
which I assume means `local locking'. It should be set for nfs-mounted
filesystems that are not shared with other clients, such as the /var
partition for a diskless workstation. The /etc/init.d/standardmounts
sets it for the / partition, but this is not sufficient if you have a
separate /var.
----------------------------------------------------------------------
I simply added the llock option to the entry in /etc/vfstab for each
diskless workstation, and this was indeed sufficient to alleviate the
problem, since NFS locks on /var/spool/lp/SCHEDLOCK are no longer
propagated to the server.
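For illustration, a vfstab entry carrying the option might look roughly like
the sketch below. The server name, client name and export path are
hypothetical placeholders, and the other fields should stay as they are in
your existing entry; only the llock option itself comes from Gary's reply.

```
# Hypothetical /etc/vfstab entry on a diskless client with a separate /var.
# "server" and "/export/var/client1" are made-up names for illustration.
#device to mount            device to fsck  mount point  FS type  fsck pass  mount at boot  mount options
server:/export/var/client1  -               /var         nfs      -          -              rw,llock
```

With llock set, locks on files under that mount should be handled locally on
the client and not passed to lockd/statd on the server, which is only
appropriate for a filesystem that no other client shares.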
I remain a little perturbed, though: while this solves my problem, it
merely works around the underlying problem, which lies with NFS locking for
diskless workstations. I am left wondering whether other parts of Solaris
(besides the lp system) or of my application environment are also broken
by this NFS locking problem! Oh well.
By the way, Sun maintenance folks failed completely on this one! So what
else is new!!!!
----My original post---->
>I have a network with a large number of diskless workstations (I know, you
>are asking why? The answer is related to security issues!) and an SS20
>server running Solaris 2.4 101945-36. Upon server reboot (whether intended
>or by crash) I have severe difficulties recovering the workstations with
>their print services, or in rebooting those workstations.
>
>First Problem: For some reason, the workstations do not successfully
>recover the lock held by lpsched on /var/spool/lp/SCHEDLOCK after server
>reboot. Without this lock being recovered, all invocations of lpstat or
>lpshut misbehave: because they are able to take the lock themselves, they
>incorrectly conclude that print services are not running (even though the
>now lockless lpsched and the associated lpNet are still running!). It
>is thus necessary to kill the copies of lpsched and lpNet to recover. This
>is beyond the ken of the average user, who simply elects to reboot the
>workstation whenever the server goes down.
>
>And this brings the Second Problem into consideration. In the course of
>rebooting, the workstations run lpsched from within /etc/rc2.d/S80lp, and
>lpsched attempts to take a lock on /var/spool/lp, which is on a volume
>NFS-mounted from the server. This lock then ends up as an NFS lock and statd on the
>server gets involved. This results in the workstation boot hanging,
>because statd is occupied in an attempt (apparently futile though, see
>First Problem) to renegotiate the locks previously held by the workstations
>(SCHEDLOCK among them). Unfortunately this process takes forever, and
>commonly if any one workstation remains uncontactable (that is, someone
>stops it and doesn't reboot it) ALL other workstations appear to hang in
>this fashion.
>
>These problems don't seem very nice, and seem to be the result of
>fundamental problems with the Solaris environment. Is this a correct
>conclusion, or am I missing something fundamental here? Does anyone know
>of any workarounds for these problems? My long-suffering users would
>simply like a system that more easily recovers from reboots/crashes without
>requiring so much intensive interaction and nursing by a systems
>administrator!
>
>Thank you, and a summary, of course, will follow.
>
>Len Whyte
>AWA Defence Industries