SUMMARY: SUN 1.05Gb Disk Troubles (Media Error)

From: Michael R. Zika (zika@trinity.tamu.edu)
Date: Fri Apr 29 1994 - 01:38:32 CDT


  My original posting:

========================================================================
  Our Sparc-10, Model 512 is having some serious problems.
Configuration:

         Sparc-10, Model 512
         128Mb RAM
         2 Internal 1.05Gb Seagate Drives
         SunOS 5.3

I'm getting the following console message:

WARNING:
  /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
  Error for command 'write'    Error Level: Fatal
  Requested Block 91120, Error Block: 91234
  Apr 19 18:58:01 sedan.tamu.edu unix: Sense Key: Media Error
  Vendor 'SEAGATE': ASC = 0x12 (no addr mark), ASCQ = 0x0, FRU = 0xe8

I shut the machine down and rebooted. When the reboot process went
through the file system check, it threw me into a shell and
requested that I run fsck manually on the filesystem in question
(mounted as /scratch). I did this, receiving many, many warnings and
errors indicating inconsistencies in the inodes, etc., etc.

  After completing the manual fsck, the machine came back up without
any other noticeable problems. However, each time I attempt to access
/scratch, I get the above errors again.

  I was under the impression that the manual fsck should correct this
problem -- was I mistaken, or is this an indication of a physical
anomaly on the disk? We're not concerned with recovering the data
on the drive (as the name indicates, it's just a scratch disk), but
I would like to get rid of the above error each time something is
written to the disk.
========================================================================

On the "fsck" command:

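  A couple of respondents pointed out that "fsck" only repairs the
  logical structure of the file system (inodes, directories, free
  block lists); it does nothing about bad spots on the disk itself.
  Since the "Media Error" sense key means the drive is reporting a
  hardware problem, there is no reason to expect fsck to fix it.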
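
  The console warning quoted in the original posting already carries
  the evidence for this: the sense key is "Media Error", which in
  SCSI terms means the drive hit an unrecovered error it attributes
  to the medium. What follows is only a minimal sketch in Python of
  how those fields could be pulled out of such a message. The names
  parse_sd_warning and looks_like_media_fault and the regular
  expressions are hypothetical, written against this one message;
  other sd driver messages may be laid out differently.

```python
import re

# The console warning from the original posting, one field group per line.
MESSAGE = "\n".join([
    "WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/"
    "esp@f,800000/sd@1,0 (sd1):",
    "Error for command 'write'    Error Level: Fatal",
    "Requested Block 91120, Error Block: 91234",
    "Sense Key: Media Error",
    "Vendor 'SEAGATE': ASC = 0x12 (no addr mark), ASCQ = 0x0, FRU = 0xe8",
])

# Hypothetical patterns, written to match this one message only.
PATTERNS = {
    "device": r"\((sd\d+)\)",
    "command": r"Error for command '(\w+)'",
    "level": r"Error Level:\s*(\w+)",
    "requested_block": r"Requested Block:?\s*(\d+)",
    "error_block": r"Error Block:?\s*(\d+)",
    "sense_key": r"Sense Key:\s*(.+)",
    "asc": r"\bASC = (0x[0-9a-fA-F]+)",
    "ascq": r"\bASCQ = (0x[0-9a-fA-F]+)",
}


def parse_sd_warning(text):
    """Return a dict of the fields found in the warning (None if absent)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1).strip() if match else None
    # Block numbers are decimal, ASC/ASCQ are hexadecimal.
    for name in ("requested_block", "error_block"):
        if fields[name] is not None:
            fields[name] = int(fields[name])
    for name in ("asc", "ascq"):
        if fields[name] is not None:
            fields[name] = int(fields[name], 16)
    return fields


def looks_like_media_fault(fields):
    """True when the drive itself reported a medium (hardware) error."""
    return fields["sense_key"] == "Media Error"


info = parse_sd_warning(MESSAGE)
offset = info["error_block"] - info["requested_block"]
print(info["device"], info["command"], info["sense_key"])
print("failure is", offset, "blocks into the request")
```

  For the message above, the failing block lies 114 blocks past the
  start of the request (91234 - 91120), and the sense key marks the
  fault as one the drive reports against its own media, not
  something in the file system that fsck could repair.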

Things to check:

  Several of the respondents suggested that we run the non-destructive
  options under "analyze" in the format command to see if the drive
  could recover. We ran format/analyze/read and format/analyze/refresh
  to test the disk thoroughly. Sure enough, a couple of blocks showed
  up with "fatal" errors from this analysis.

  Since I was a bit pressed for time (and folks were breathing down
  my neck to get the machine back up), I then reformatted the entire
  drive. Interesting to note that format didn't complain about the
  bad blocks previously found, but when I ran "newfs" it couldn't
  allocate one of the blocks, so it skipped the entire sector (pardon
  me if the block/sector terminology is off -- I'm doing this from
  a somewhat faulty memory!). This got us up and running with a
  slightly smaller partition and no problems so far.

Solution:

  Most eloquently put by Joseph Mervini:

             REPLACE THE SUCKER!!!!!!
  
  Almost all the respondents indicated that this is a hardware problem
  and that the drive should be replaced immediately if not sooner.
  Since it's still under warranty and now that I have a bit of time
  I'm going to start pestering Sun to send out the replacement drive.

--Michael Zika
  Nuclear Engineering
  Texas A&M University
  (zika@trinity.tamu.edu)

Many thanx to all the respondents:

  daniel@CANR.Hydro.Qc.CA (Daniel Hurtubise)
  Steve Elliott <se@computing.lancaster.ac.uk>
  peter.allan@aea.orgn.uk (Peter Allan)
  rae@nvg_troy.nvg.com (James Rae)
  "Ray W. Hiltbrand" <Ray.W.Hiltbrand@Eng.Auburn.EDU>
  sckhoo@emtds1.nsc.com (Swee-Chuan Khoo)
  jamervi@sandia.gov (1236 Joseph A. Mervini)
  mike@trdlnk.com (Michael Sullivan)


