Hi;
I think the problem is under control now. Here is a summary of the
help I received.
ORIGINAL PROB ::
---------------
>I have an NFS problem that's been going on for a few weeks. It only happens at
>night when I get my news feed(via uucp). The news is stored on an NFS mounted
>disk. The FAQ talks about some of the errors I see but I've been scrounging
>around and I can't seem to shed any light on the problem. I'm hoping someone can
>point me in the right direction. Here is a sample of the errors. (I can't
>seem to make it happen during the day, even when I force a news feed.) It is
>getting worse by the day (more errors).
>
>ERRORS::
>NFS lookup failed for server eng4: RPC: Timed out
>NFS write failed for server eng4: RPC: Timed out
>NFS write error 60 on host eng4 fh 708 1 a0000 4b60e 47b3297 a0000 2 39458a8c
>NFS lookup failed for server eng4: RPC: Timed out
>NFS lookup failed for server eng4: RPC: Timed out
-------------------------------------------------------------------------------
Hal Stern seems to have hit the solution::
>(a) you're using soft NFS mounts on a writeable filesystem. bad idea.
> you can corrupt your data that way. make sure you hard mount
> all writeable filesystems.
>
I did have the drive in question soft mounting. I immediately fixed this.
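
For anyone wanting to audit their own clients for the same mistake, here is a
rough sketch (not something from Hal's mail) that scans fstab-style text for
NFS entries that are soft mounted and not read-only. It assumes the usual
whitespace-separated fstab fields (device, mount point, type, options); the
function name, paths, and sample entries are made up for illustration.

```python
def find_soft_writable_nfs(fstab_text):
    """Return mount points of NFS entries that are soft mounted and writable."""
    risky = []
    for line in fstab_text.splitlines():
        # Drop comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # Expect: device, mount point, type, options, ...
        if len(fields) < 4 or fields[2] != "nfs":
            continue
        options = fields[3].split(",")
        # An entry is treated as writable unless it carries the "ro" option.
        if "soft" in options and "ro" not in options:
            risky.append(fields[1])
    return risky


# Hypothetical fstab contents; hosts and paths are illustrative only.
SAMPLE = """\
eng4:/export/news  /usr/spool/news  nfs  rw,soft,bg    0 0
eng4:/export/man   /usr/man         nfs  ro,soft       0 0
eng4:/export/home  /home            nfs  rw,hard,intr  0 0
/dev/sd0a          /                4.2  rw            1 1
"""

if __name__ == "__main__":
    for mount_point in find_soft_writable_nfs(SAMPLE):
        print("soft + writable:", mount_point)
```

Any mount point it reports is a candidate for switching from soft to hard.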
>(b) you're getting timeouts -- it's quite possible that the NFS server
> can't handle the load placed on it by receiving news. are
> you backing up the server (or running something out of cron)
> at the same time you get the news feed? run "vmstat 30"
> on the server overnight and see what kind of load is on it.
> the actual files probably aren't important -- the timeouts are
> caused by network or server delays and probably aren't due to
> a particular file being available (or "used").
Turns out the errors correspond exactly to the times when I have cron on both
machines firing off very intensive disk I/O jobs. I am changing this now!
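
In case it helps someone else spot the same kind of clash, here is a rough
sketch of how one might compare the crontabs of the two machines for jobs that
start close together. It is only an illustration: it handles plain numeric
minute and hour fields, ignores how long each job runs, and the job names and
the 30-minute window are invented for the example.

```python
def start_minutes(crontab_text):
    """Map each command to its daily start time in minutes after midnight.

    Only entries with plain numeric minute and hour fields are considered.
    """
    jobs = {}
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Five schedule fields, then the command (which may contain spaces).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        minute, hour = fields[0], fields[1]
        if minute.isdigit() and hour.isdigit():
            jobs[fields[5]] = int(hour) * 60 + int(minute)
    return jobs


def overlapping(client_tab, server_tab, window=30):
    """Return (client_cmd, server_cmd) pairs starting within `window` minutes."""
    pairs = []
    for c_cmd, c_start in start_minutes(client_tab).items():
        for s_cmd, s_start in start_minutes(server_tab).items():
            diff = abs(c_start - s_start)
            diff = min(diff, 1440 - diff)  # wrap around midnight
            if diff <= window:
                pairs.append((c_cmd, s_cmd))
    return pairs


# Hypothetical crontab entries for the NFS client and the NFS server.
CLIENT = "0 2 * * * /usr/lib/news/unbatch\n30 4 * * * /usr/lib/news/expire\n"
SERVER = "15 2 * * * /usr/etc/dump_all\n0 6 * * * /usr/local/bin/cleanup\n"

if __name__ == "__main__":
    for client_cmd, server_cmd in overlapping(CLIENT, SERVER):
        print("possible clash:", client_cmd, "<->", server_cmd)
```

Jobs it pairs up are worth staggering so the disk-heavy work on the two
machines does not land at the same time.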
He further gave me info about showfh after I mentioned I had been getting
RPC not registered messages when I tried to track down the write errors with
showfh ::
>showfh talks to rpc.showfhd. you need to start it on the server
>before using showfh. also, if you're running 4.1.1, make sure
>you install the patch for showfh that keeps it from timing out.
>showfh can run for several minutes before producing an answer.
**** (I am running 4.1.2, but it looks like I need the patch.)
----------------------------------------------------------------------------
OTHER POSS::
------------
Jackie Carlson and Hans Baumeister both had similar problems that turned
out to be hardware: a faulty repeater in one case and a faulty
transceiver cable in the other.
I'm having it checked as a precaution.
---------------------------------------------------------------------------
Many thanks to ::
carlson@betty-jo.egr.msu.edu (Jackie Carlson)
baumeist@vsun04.ag01.Kodak.com (Hans Baumeister)
stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
Hope this is useful!
-Jim
-------------------------------------------------------------------------------
Jim Murff (murff@irt.com) Voice # (619)622-8878
IRT Corp, San Diego, CA. (619)450-4343
Applications Engineer, System Admin. Fax # (619)622-8888