Problems with lockd -- SUMMARY

From: Leslie S Steckel (syslss@csc.albany.edu)
Date: Wed Feb 19 1992 - 00:35:59 CST


Dear Sun managers,

Some time back I posted the following message to the Sun managers mailing
list. Below is a summary of how I solved the problem. Thanks to all
who responded! (Sorry it took me so long to summarize.)

>We are getting repeated error messages that appear to be coming from lockd.
>The error messages (which come in pairs) are:
>
> fcntl: Stale NFS file handle
> rpc.lockd: unable to do cnvt
>
>Our configuration is as follows:
>
> 1 Sparc Server 330 running as YP server and primary file server
> 10 Sparc stations (including SLC's, SS1's, SS1+'s, SS2's, IPC's)
> 16 Ultrix workstations (including a DEC 5000 and DEC 3100's)
>
>The Suns are running SunOS 4.1.1 and the DEC machines are running Ultrix 4.2.
>I do not believe the Ultrix workstations are part of our problem but
>nevertheless, I've included them in our configuration description.
>
>These errors are occurring on the console of the Sparc Server 330 at a rate of
>about 4 per minute. Outside of these errors, we are seeing no other problems.
>A "df" on the Sparc Server 330 shows no stale NFS handles. In addition,
>a "df" on the remaining workstations shows no stale NFS handles. Rebooting did
>NOT fix the problem.
>
>These errors began occurring for the first time immediately after our last
>reboot. The purpose of the reboot was to make a few changes to the file
>system structure while the machine was in single user mode. So, the errors
>could be the result of a silly mistake or they could be completely
>coincidental. These are the changes that we made to the file system structure:
>
> - shifted /usr/tmp to a bigger partition (I have checked permissions on
> this directory. They are 1777)
> - shifted /var/spool/news to another partition

SUMMARY:

Some people asked if any of our smaller workstations were diskless or dataless.
The answer is no. Each machine boots and swaps locally.

I tried many of the recommendations made by other Sun managers including
killing rpc.statd, killing rpc.lockd, cleaning /etc/sm and /etc/sm.bak,
and restarting the two daemons. Nothing I did on the SparcServer 330 or the
smaller Sparc Workstations made a difference.
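
For reference, the reset sequence described above (which did NOT cure the
problem here) could be sketched roughly as follows. This is only an
illustration: the daemon paths under /usr/etc, the BSD-style "ps ax" lookup,
and the helper names run, kill_by_name, clean_dir and reset_lock_daemons are
assumptions, not something taken from the responses. It defaults to a dry run
that only prints what it would do.

```shell
#!/bin/sh
# Sketch of the statd/lockd reset sequence. Paths and helpers are assumptions.
DRY_RUN=${DRY_RUN:-1}                      # 1 = only print the steps
STATD=${STATD:-/usr/etc/rpc.statd}         # assumed daemon location
LOCKD=${LOCKD:-/usr/etc/rpc.lockd}         # assumed daemon location
SM_DIRS=${SM_DIRS:-"/etc/sm /etc/sm.bak"}  # status monitor state directories

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Kill every process whose ps line matches the given name (hypothetical helper).
kill_by_name() {
    for pid in `ps ax | grep "$1" | grep -v grep | awk '{print $1}'`; do
        kill "$pid"
    done
}

# Remove the per-host files inside a state directory (hypothetical helper).
clean_dir() {
    rm -f "$1"/*
}

reset_lock_daemons() {
    # 1. Stop both daemons.
    run kill_by_name rpc.statd
    run kill_by_name rpc.lockd
    # 2. Clear the saved monitor state.
    for d in $SM_DIRS; do
        run clean_dir "$d"
    done
    # 3. Restart the daemons, statd first and then lockd.
    run "$STATD"
    run "$LOCKD"
}

reset_lock_daemons
```

Run as root with DRY_RUN=0 to actually perform the steps; with the default
setting it only lists them.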

Simply rebooting the DEC 5000 (running Ultrix 4.2) is what finally stopped the
errors on the console of the SparcServer 330. The DEC 5000 is the file server
for all of the DEC 3100 workstations and depends on the SparcServer 330 for
users' files, mail files and YP info. The strange thing is that this
particular machine does not remote mount ANY directory trees that we shifted
on the SparcServer 330. In addition, we have shut the SparcServer 330 down
several times before without shutting the DEC 5000 down and never had a
problem. It was just this one particular time. In the future, I plan to
shut the DEC 5000 down each time I shut down the SparcServer 330, since the DEC
5000 can't do much without its YP master (we haven't configured any YP slaves yet).

Thanks again to those who responded:

katkam@fuwutai.att.com
holle@ASC.SLB.COM
dal@gcm.com
paul.brandariz@kla.com
poffen@sj.ATE.SLB.COM

---
Leslie Steckel				internet:  syslss@csc.albany.edu
UNIX System Programmer			bitnet: syslss@albnyvms
State Univ of NY at Albany		(518) 442-3844
Computing Services Center CS-0	 
1400 Washington Ave
Albany,  NY 12222


