Harddisk failure, summary of answer

From: Jacques Gelinas (jack@solucor.uucp)
Date: Thu Jan 23 1992 - 08:20:59 CST


I posted a few days ago about my hard disk (SCSI 1.2G Seagate) failing,
with a long extract of /var/adm/messages. Here are the answers and
suggestions I got from the net. Only one fellow suggested reinstalling
from scratch. In the meantime, the disk has failed twice, but the system
has continued to operate anyway. I have realised that the disk
is connected to a different circuit (110 VAC) than the computer, and that
the disk shares that circuit with a FRIDGE (with no beer or cola in it,
what a useless fridge). Now I have put the disk on the same circuit as
the computer, and I am waiting for judgement day, making the best backup
I can.

From: Strom Beadle <cs.sfu.ca!strom>

I have had the same problem with my Seagate 660 Meg SCSI disk. I believe it
may be a characteristic of Seagate, since I have never seen that error
with respect to another manufacturer's drive. Please forward any information
you receive to me regarding this problem.

Thank you.

Strom

From: Jim Lick <ferkel.ucsb.edu!jim>

It might not be a bad disk. We were having persistent crashes
with a panic from iinactive having trouble freeing an inode.
We saw the same sort of error messages from the disk right before
the panic messages. Finally I tracked it down: this was a bug
in the OS, and not a bad disk. I got a patch from an ftp site,
installed it and we've been running fine ever since. Try
calling up Sun to see if this is a known bug. If it is, they
will send you a patch for it.

                            Jim Lick
Work: University of California | Play: 6657 El Colegio #24
      Santa Barbara | Isla Vista, CA 93117-4280
      Dept. of Mechanical Engr. | (805) 968-0189 voice/msg
      2311 Engr II Building | "Don't be naive,
      (805) 893-4113 | Don't deny what you see"
      jim@ferkel.ucsb.edu | -Erasure

From: "Jon Mellott" <jon@delta.ee.ufl.edu>

Dump everything, then do a surface analysis, then reformat and
restore from the dump tapes. These things can happen with hard
disks...

Jon Mellott

From: rthau@qualix.com (Rick Thau)

        Rick suggested getting a copy of speedstor for SunOS. He said that
        this product really speeds up the process of scanning and formatting
        the disk.

        To get more information:
                mail info@qualix.com or call 800-245-UNIX.
                By the way, the product sells for 150.

From: larry%cchtor.UUCP@Larry.McRCIM.McGill.EDU (Larry Chin)

hello jack,

zs0 is the serial i/o port on the cpu board and zs3 is the
second port on the first SCSI board in the system. the error message:
        Jan 20 21:23:42 solucor vmunix: zs0: silo overflow
        Jan 20 21:23:42 solucor vmunix: zs3: silo overflow

seems to indicate that the buffers for the two ports overflowed.
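
To picture what an overflow of this kind means in general terms, here is a
toy Python model of a fixed-size receive buffer that drops characters when
nobody drains it in time. The class name, the method names and the capacity
of 3 are all made up for illustration; this is not how the SunOS zs driver
is actually written.

```python
from collections import deque


class ReceiveSilo:
    """Toy fixed-size receive buffer (hypothetical, not the real zs driver)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.overflows = 0  # number of characters lost so far

    def receive(self, byte):
        """A character arrives on the line; return False if it was lost."""
        if len(self.buf) >= self.capacity:
            self.overflows += 1  # buffer full: the character is dropped
            return False
        self.buf.append(byte)
        return True

    def drain(self):
        """The CPU finally services the port and empties the buffer."""
        data = bytes(self.buf)
        self.buf.clear()
        return data


if __name__ == "__main__":
    silo = ReceiveSilo(capacity=3)  # capacity chosen arbitrarily
    for b in b"hello":
        silo.receive(b)  # nothing drains the buffer in the meantime
    print(silo.drain(), "lost:", silo.overflows)
```

In this model the overflow says nothing about the disk itself, only that
the port was not serviced quickly enough while characters kept arriving.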

The other part of the message of interest is:
        Jan 20 21:23:42 solucor vmunix: panic: ialloc: dup alloc

This indicates that there was a duplicate inode allocation detected by the
ialloc routine.
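
As a rough illustration of what a "dup alloc" is, the Python sketch below
keeps two views of a tiny inode table: the allocator's free map and each
inode's own in-use flag. When the free map offers an inode that is already
marked in use, the two views disagree and the allocator gives up. All names
here (ToyInodeTable, DupAllocError, free_map, in_use) are hypothetical, and
this is a simplified model, not the kernel code.

```python
class DupAllocError(Exception):
    """Stands in for the kernel panic; the name is made up for this sketch."""


class ToyInodeTable:
    def __init__(self, count):
        self.free_map = [True] * count  # allocator's view: True = free
        self.in_use = [False] * count   # inode's own view: True = allocated

    def ialloc(self):
        """Hand out the first inode the free map claims is free."""
        for ino, free in enumerate(self.free_map):
            if free:
                if self.in_use[ino]:
                    # The free map says "free" but the inode says "in use".
                    raise DupAllocError(f"ialloc: dup alloc (inode {ino})")
                self.free_map[ino] = False
                self.in_use[ino] = True
                return ino
        raise RuntimeError("out of inodes")


if __name__ == "__main__":
    table = ToyInodeTable(4)
    first = table.ialloc()
    # Simulate corruption: the free map wrongly marks the inode free again.
    table.free_map[first] = True
    try:
        table.ialloc()
    except DupAllocError as err:
        print("panic:", err)
```

The point of the sketch is that the panic reports an inconsistency in the
filesystem's bookkeeping; it does not by itself say what caused it.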

The fact that the disk is working now would seem to indicate that the
problem has been fixed and that all is well. I don't think that your
hard disk is about to fail. I have had messages like this and the
disks are still functioning (and have been for about 3-4 years now).
The best thing you can do is maintain regular backups, so that if
a disk does die it is a simple matter of restoring to a new disk.
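
If you want to see whether such messages come back, a small script can pull
the kernel lines out of /var/adm/messages and label them. This is only a
sketch: the regular expression assumes the line layout quoted above, and
the labels ("panic", "serial-overflow", "other") are my own invention, not
anything SunOS defines.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones quoted above, e.g.
#   "Jan 20 21:23:42 solucor vmunix: zs0: silo overflow"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) vmunix: (?P<text>.*)$"
)


def classify(text):
    """Return a rough label for one kernel message (labels are assumptions)."""
    if text.startswith("panic:"):
        return "panic"
    if "silo overflow" in text:
        return "serial-overflow"
    return "other"


def summarize(lines):
    """Group kernel messages by timestamp as (label, text) pairs."""
    by_stamp = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines that are not vmunix messages are skipped
            text = m.group("text")
            by_stamp[m.group("stamp")].append((classify(text), text))
    return dict(by_stamp)


sample = [
    "Jan 20 21:23:42 solucor vmunix: zs0: silo overflow",
    "Jan 20 21:23:42 solucor vmunix: zs3: silo overflow",
    "Jan 20 21:23:42 solucor vmunix: panic: ialloc: dup alloc",
]

if __name__ == "__main__":
    for stamp, events in summarize(sample).items():
        print(stamp, [label for label, _ in events])
```

To run it against the real log, pass the lines of the file to summarize()
instead of the sample list.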

Hope this helps.

Larry Chin {larry@cch.com} CCH Canadian Ltd.
System Administrator 6 Garamond Court
Research and Development Don Mills, Ontario.
(416) 441-4001 ext. 349 M3C 1Z5

These are my opinions, MINE YOU HEAR, MINE !!!

--------------------------------------------------

Well, this is it. I have my fingers crossed.

Thanks to all.

-- 

--------------------------------------------------------
Jacques Gelinas (jack@solucor.uucp)
Today it's my opinion


