My original posting:
========================================================================
Our Sparc-10, Model 512 is having some serious problems.
Configuration:
Sparc-10, Model 512
128MB RAM
2 Internal 1.05GB Seagate Drives
SunOS 5.3
I'm getting the following console message:
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
    Error for command 'write'    Error Level: Fatal
    Requested Block 91120, Error Block: 91234
Apr 19 18:58:01 sedan.tamu.edu unix: Sense Key: Media Error
    Vendor 'SEAGATE': ASC = 0x12 (no addr mark), ASCQ = 0x0, FRU = 0xe8
I shut the machine down and rebooted. When the reboot process went
through the file system check, it threw me into a shell and
requested that I run fsck manually on the filesystem in question
(mounted as /scratch). I did this, receiving many, many warnings and
errors indicating inconsistencies in the inodes, etc., etc.
After completing the manual fsck, the machine came back up without
any other noticeable problems. However, each time I attempt to access
/scratch, I get the above errors again.
I was under the impression that the manual fsck should correct this
problem -- was I mistaken, or is this an indication of a physical
anomaly on the disk? We're not concerned with recovering the data
on the drive (as the name indicates, it's just a scratch disk), but
I would like to stop getting the above error every time something is
written to the disk.
========================================================================
On the "fsck" command:
A couple of respondents pointed out that "fsck" only checks and
repairs the structure of the file system (inodes, directories, block
maps); it cannot repair the disk media underneath. Since this is a
hardware problem, there is no reason to expect fsck to fix it.
Things to check:
Several of the respondents suggested that we run the non-destructive
options under "analyze" in the format command to see if the drive
could recover. We ran format/analyze/read and format/analyze/refresh
to test the disk thoroughly. Sure enough, a couple of blocks showed
up with "fatal" errors from this analysis.
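A related check, not something suggested in the thread: tallying the
"Error Block" numbers from the console log shows whether the same
blocks keep failing or new ones keep appearing. The sketch below
assumes the log lines contain text like "Error Block: 91234", as in
the message quoted above; other driver versions may word it
differently, and the file path in the usage comment is only an
example.

```python
import re
import sys
from collections import Counter

# Assumption: sd driver warnings contain "Error Block: <number>",
# as in the console message quoted in the original posting.
ERROR_BLOCK_RE = re.compile(r"Error Block:\s*(\d+)")


def tally_error_blocks(lines):
    """Count how many times each failing block number appears."""
    counts = Counter()
    for line in lines:
        match = ERROR_BLOCK_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Example usage (path is illustrative): python tally_blocks.py /var/adm/messages
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for block, n in tally_error_blocks(fh).most_common():
                print(f"block {block}: {n} error(s)")
```

For the single line "Requested Block 91120, Error Block: 91234",
tally_error_blocks returns Counter({91234: 1}). A block number that
keeps climbing in the tally points at a bad spot on the media rather
than a one-off glitch.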
Since I was a bit pressed for time (and folks were breathing down
my neck to get the machine back up), I then reformatted the entire
drive. Interestingly, format didn't complain about the bad blocks
found earlier, but when I ran "newfs" it couldn't allocate one of
the blocks, so it skipped the entire sector (pardon
me if the block/sector terminology is off -- I'm doing this from
a somewhat faulty memory!). This got us up and running with a
slightly smaller partition and no problems so far.
Solution:
Most eloquently put by Joseph Mervini:
REPLACE THE SUCKER!!!!!!
Almost all the respondents indicated that this is a hardware problem
and that the drive should be replaced immediately if not sooner.
Since it's still under warranty, and now that I have a bit of time,
I'm going to start pestering Sun to send out a replacement drive.
--Michael Zika
Nuclear Engineering
Texas A&M University
(zika@trinity.tamu.edu)
Many thanx to all the respondents:
daniel@CANR.Hydro.Qc.CA (Daniel Hurtubise)
Steve Elliott <se@computing.lancaster.ac.uk>
peter.allan@aea.orgn.uk (Peter Allan)
rae@nvg_troy.nvg.com (James Rae)
"Ray W. Hiltbrand" <Ray.W.Hiltbrand@Eng.Auburn.EDU>
sckhoo@emtds1.nsc.com (Swee-Chuan Khoo)
jamervi@sandia.gov (1236 Joseph A. Mervini)
mike@trdlnk.com (Michael Sullivan)
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:08:59 CDT