SUMMARY: File corruption

From: G.ROBERTSON@aberdeen.ac.uk
Date: Tue Dec 22 1992 - 13:37:46 CST


Sorry about the long delay in posting this, and for not getting details of
SunOS version/hardware/patches etc. into the first posting.

Nevertheless, I believe the cause of my problem has been identified.

Many replies suggested hard mounts and/or the NFS Jumbo patch. Some suggested
turning on checksums for NFS traffic to get a better handle on the problem.

Bill Shorter
bill@aloft.att.com

gave a spell which will turn checksums on. It is appended below.

Three people described symptoms so similar to mine that I believe they had the
answer. All were CPU problems.

kwthomas@nsslsun.nssl.uoknor.edu (Kevin W. Thomas)
doug@perry.berkeley.edu (Doug Neuhauser)
tlr@toy.rad.msu.edu (Terry Rosenbaum)

The CPU in question is due to become "upgrade residue" on 5th Jan., so I'm
going to live with it until then (it's our internal departmental server).

Many thanks, and Season's Greetings to all who replied...

jdavis@noao.edu (Jim Davis)
jaa101@barton.anu.edu.au (James Ashton)
ems@ccrl.nj.nec.com (Ed Strong)
Perry_Hutchison.Portland@xerox.com
morrow@cns.ucalgary.ca (Bill Morrow)
glenn%upstage%ups@fourx.Aus.Sun.COM (Glenn Satchell)
poul <poul@nilu.no>
sitongia@ozzel.hao.ucar.edu (HAO Computer System Managment Group)
era@niwot.scd.ucar.EDU (Ed Arnold)
gwolsk@sei.com (Guntram Wolski)
kevin%optim1%melb%fourx%ups@fourx.Aus.Sun.COM (Kevin Sheehan)
"(Marcel Bernards)" <bernards@ECN.nl>
Dave Mitchell <D.Mitchell@dcs.shef.ac.uk>
"Prof. J.H. Davenport" <J.H.Davenport@maths.bath.ac.uk>
tommy@boole.att.com
gunn%woden%ldavis@snowbird.Central.Sun.COM (David Gunn)
kwthomas@nsslsun.nssl.uoknor.edu (Kevin W. Thomas)
bill@aloft.att.com
doug@perry.berkeley.edu (Doug Neuhauser)
.....

The original postings..

From: G.ROBERTSON@aberdeen.ac.uk
Reply-To: G.ROBERTSON@aberdeen.ac.uk
Followup-To: junk
Date: Wed, 18 Nov 92 15:23:59 GMT
Message-Id: <A9211181524.AA07690@uk.ac.aberdeen.sysa>
Received: from cc1.AUCC by aberdeen.ac.uk; Wed, 18 Nov 92 15:24:00 GMT
To: sun-managers@eecs.nwu.edu
Subject: File corruption
Sender: sun-managers-request@nsfnet-relay.ac.uk
Content-Length: 999
X-Lines: 28
Status: RO

We are experiencing files being corrupted when copied between local and NFS-
mounted partitions. This mostly seems to happen to VERY large files. Or at
least it's reproducible with very large files. A puzzle is that it seems to be
sensitive to some combination of

host on which the 'cp' runs (process host)
source host
destination host

Three hosts sysa, sysb and aucc are involved.

aucc% cp aucc-file sysa-file    FAILS
aucc% cp aucc-file sysb-file    FAILS
aucc% cp sysa-file aucc-file    OK
sysa% cp aucc-file sysa-file    OK
sysb% cp sysb-file sysa-file    OK
sysa% cp sysa-file sysa-file    OK

and that's all the combinations we've tried so far. The failures are persistent:
with an 80 MB file we get 5 or 6 corruptions, always involving groups of 22
or 38 bytes being overwritten. An example of the data which appears is

              0000 E8668203 FF9A0000 E8D48103 FA7C0000
          EE5E8203 FF9A0000 EECC8103 FA7C0000 F4568203
 
Any suggestions anyone? This is currently scuppering our "news" feed.
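
To pin down where the damage lands in a large file, one approach is to
compare a known-good copy against the corrupted copy and list each run of
differing bytes. What follows is only a minimal sketch in C, assuming both
copies have already been read into memory. The names find_diff_runs and
struct diff_run are invented here for illustration and do not come from any
of the posters or from SunOS.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of bytes that differ between the two copies. */
struct diff_run {
    size_t offset;  /* byte offset where the run starts */
    size_t length;  /* number of consecutive differing bytes */
};

/*
 * Scan two equal-length buffers and record runs of differing bytes.
 * At most max_runs runs are stored in runs[]; the return value is the
 * total number of runs found, which may be larger than max_runs.
 * (Hypothetical helper, not part of any system library.)
 */
size_t find_diff_runs(const unsigned char *a, const unsigned char *b,
                      size_t len, struct diff_run *runs, size_t max_runs)
{
    size_t i = 0;
    size_t count = 0;

    assert(len == 0 || (a != NULL && b != NULL));

    while (i < len) {
        size_t start;

        if (a[i] == b[i]) {
            i++;
            continue;
        }

        /* Found the first differing byte; extend to the end of the run. */
        start = i;
        while (i < len && a[i] != b[i])
            i++;

        if (count < max_runs) {
            runs[count].offset = start;
            runs[count].length = i - start;
        }
        count++;
    }
    return count;
}
```

Run against the good and bad copies of an 80 MB file, something like this
would show whether the damaged runs really are 22 or 38 bytes long and
whether their offsets fall on any regular boundary.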

From: bill@aloft.att.com
Received: by aloft (4.1/DCS-aloft-103192) id AA18488;
          Tue, 24 Nov 92 07:36:30 EST
Date: Tue, 24 Nov 92 07:36:30 EST
Original-From: aloft!bill (B. Shorter)
Message-Id: <9211241236.AA18488@aloft>
To: G.ROBERTSON%aberdeen.ac.uk@nsfnet-relay.ac.uk
Subject: NFS Related corruption
Content-Length: 534
X-Lines: 19
Status: RO

To make NFS more solid, turn on UDP checksumming. I have experienced
corruption of one bit in 10 megabytes, due to a faulty Proteon router.

In "/usr/kvm/sys/netinet/in_proto.c" change the line below:

int udp_cksum = 0; /* turn on to check & generate udp checksums */

to be:

int udp_cksum = 1; /* turn on to check & generate udp checksums */

Build a new kernel for ALL machines and reboot. This procedure does
work fine, but it adds overhead to your NFS operations.

Bill Shorter
bill@aloft.att.com
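
As an illustration of what the checksum buys you, here is a minimal sketch
of the standard Internet checksum (the 16-bit one's-complement sum described
in RFC 1071), which is the arithmetic UDP uses. It is an assumption-laden
sketch rather than the SunOS kernel routine: the name inet_checksum is
invented here, and a real UDP checksum also covers a pseudo-header built
from the IP addresses, which is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum over a byte buffer (RFC 1071 style).
 * Hypothetical helper for illustration only. The 32-bit accumulator is
 * safe for buffers up to 128 KB, well beyond the 64 KB maximum size of
 * a UDP datagram.
 */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len == 0 || data != NULL);

    /* Add the data as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is treated as padded with a zero byte. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Any single flipped bit changes the result, so a datagram damaged in transit
would fail verification and be dropped instead of being written to disk.
Computing this sum over every datagram is the likely source of the extra
overhead mentioned above.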


