And the winner is...
perw@holtec.se (Per Westerlund)
who suggested that it might be either an interrupted write or bad blocks, and
that since an interrupted write is easier to fix than bad blocks, I should try
that first.
First: Thanks to all who responded. I believe I have replied to
everyone, but the responders deserve public recognition.
barnett@unclejack.crd.ge.com (Bruce Barnett)
kensmith@cs.Buffalo.EDU (Ken Smith)
benny@doug.med.utah.edu (Benny yih x3144 MIRL)
"Matt Crawford" <matt@oddjob.uchicago.edu>
ultra!marke@ames.arc.nasa.gov (Marke Clinger)
Steve Simmons <scs@lokkur.dexter.mi.us>
mmsac!nova!ts@sacto.West.Sun.COM (Troy Schumaker)
dupuy@hudson.cs.columbia.edu (Alexander Dupuy)
proton!muon!baumann@ucrmath.ucr.edu (Michael Baumann)
bparent%sdcc20@ucsd.edu (Brian Parent)
Charles <mcgrew@dartagnan.rutgers.edu>
John Posey <posey@utdallas.edu>
ecn!bernards@relay.EU.net (Marcel Bernards)
perw@holtec.se (Per Westerlund)
valideast!boo!fxf@uunet.UU.NET (Frank Farmar)
trinkle@cs.purdue.edu
brian@ucsd.edu (Brian Kantor)
don@doug.med.utah.edu (Don Baune x6088 MIRL)
andys@ulysses.att.com
dws@EBay.Sun.COM (Dennis Sexton)
The Solution:
1) unmounted the filesystem, /dev/xd0d
2) did "icheck -b 179291 179362 /dev/rxd0d" to get the inodes
associated with those blocks.
3) did "ncheck -i inum /dev/rxd0d" on each inode number from #2 to get the
filenames, so that I would know which files were damaged and could restore
them later.
4) ran format, chose analyze, and then the setup operation within it
5) in setup specified an absolute block range with no automatic
repair to cover the bad abs blocks 412106 & 412177
6) did read and test checks and indeed saw the errors
7) went back into setup and specified only the first abs blk (i.e.
412106) and went back and did a write to it (still in analyze)
8) a retest of the range as per #5&6 showed no errors on either block
9) exited format. The first fsck found a truncated inode and an incorrect
block count on the file I wrote to, and fixed both
10) full stop reboot went perfectly.
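Steps 2 and 5 refer to the same two sectors by two different numbers. Assuming (this is my reading, not something stated by format or icheck here) that the absolute block is simply the partition's starting sector plus the partition-relative block number, both pairs should differ by the same constant. A quick Python check of that arithmetic:

```python
# Block numbers quoted in the procedure above.
PAIRS = [
    # (partition-relative block given to icheck, absolute block given to format)
    (179291, 412106),
    (179362, 412177),
]


def implied_offsets(pairs):
    """Return abs - rel for each pair; one repeated value suggests a fixed partition start."""
    return [abs_blk - rel_blk for rel_blk, abs_blk in pairs]


def to_absolute(rel_blk, partition_start):
    """Map a partition-relative block to an absolute disk block (assumed linear mapping)."""
    return partition_start + rel_blk


offsets = implied_offsets(PAIRS)
print(offsets)  # [232815, 232815]
start = offsets[0]
print([to_absolute(rel, start) for rel, _ in PAIRS])  # [412106, 412177]
```

Both pairs give 232815, and the two blocks sit 71 apart in either numbering, which fits that reading. The partition start here is inferred from these numbers, not read from the disk label.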
SWIFT SUMMARY of responses:
Most responses indicated bad blocks and said I would have to
slip/remap them using format (diag on 3.x systems). Others
suggested using the repair option in the 4.0 format. For
controllers that don't support repair, I would have to add the blocks to the
defect list and then reformat. A lot of folks reminded me to
back up the filesystem first (which I <of course> did).
Per Westerlund (the WINNER) suggested that this symptom can also come
from an interrupted write, during which the ECC data gets corrupted.
Forcing a rewrite using format regenerates that information.
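As a toy illustration of why a rewrite can cure one kind of error but not the other, here is a Python model of a single sector. It is a sketch under simplifying assumptions, not how the controller actually works: zlib.crc32 stands in for the drive's real ECC, and the ToySector class and diagnose() helper are hypothetical. An interrupted write leaves stale check bits, so a clean rewrite fixes it; a physically bad sector fails again after the rewrite, which is the signal to slip/remap it.

```python
import zlib


class ToySector:
    """Hypothetical model of one sector: data plus a stored check value.

    zlib.crc32 is a stand-in for the drive's real error-correcting code;
    only the "check bits don't match the data" behaviour is modelled.
    """

    def __init__(self, size=512, physically_bad=False):
        self.data = bytes(size)
        self.check = zlib.crc32(self.data)
        self.physically_bad = physically_bad  # a real defect corrupts every read

    def write(self, payload, interrupted=False):
        self.data = payload
        if not interrupted:
            # A completed write stores check bits matching the new data.
            self.check = zlib.crc32(payload)
        # An interrupted write leaves the old check bits behind.

    def read_ok(self):
        data = self.data
        if self.physically_bad:
            data = bytes([data[0] ^ 0xFF]) + data[1:]  # simulate a damaged byte
        return zlib.crc32(data) == self.check


def diagnose(sector, size=512):
    """Rewrite the sector, then retest it (compare steps 7 and 8 above)."""
    if sector.read_ok():
        return "ok"
    sector.write(bytes(size))  # forced, uninterrupted rewrite
    if sector.read_ok():
        return "interrupted write (fixed by rewrite)"
    return "bad block (remap it)"


soft = ToySector()
soft.write(b"\x01" * 512, interrupted=True)
print(diagnose(soft))  # interrupted write (fixed by rewrite)

hard = ToySector(physically_bad=True)
print(diagnose(hard))  # bad block (remap it)
```

The point of the model is the ordering: the cheap rewrite-and-retest is tried first, and only a sector that still fails afterwards needs remapping.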
Dan Trinkle posted the useful information about using icheck and
ncheck to find out what was in those blocks.
don@doug.med.utah.edu reported that they had success mapping out bad
blocks, but that it turned out to be a precursor to a full drive
replacement they later had to do. I don't think this applies to my case.
Thanks again everyone. I have the detailed replies saved and will
gladly mail a copy to anyone interested beyond the summary. Just drop
me a line.
Regards to all,
--bill selig@xanth.msfc.nasa.gov [128.158.1.31]