My original posting:
: Dear sunmanagers.
: We're experiencing some very scary errors on one of
: our disks: sometimes when we (for one or other reason)
: reboot the system, we are forced to run fsck to
: repair inconsistencies in the filesystem when it
: comes up.
: The inconsistencies are bad reference counts and the
: like. But after doing fsck, and coming up again, we
: often find that files have swapped names ! The other
: day it was critical, as the /vmunix-file had been
: replaced by 8 Kbytes of something that definitely
: was no kernel.
: Normally it is on the same disk, indicating that it
: could be a faulty disk, perhaps suffering from old
: age (it's heavily used for more than 3 years now).
: Still, running format/analyze/read doesn't report
: any errors.
: So, my 1. Q:
: Shouldn't format/analyze/read report errors
: if the disk is faulty. Or should I use the
: format/analyze/test ??
: 2: This is perhaps the most interesting question:
: could it be anything but a faulty disk ??????
: I have the feeling that there are more errors
: if we have not done a sync before booting, i.e.
: booting just by using 'Stop a', without sync'ing,
: results in more errors, but I'm not sure.
: 3: Other suggestions ??
: We're considering two things now: reformatting the disk,
: which is rather cumbersome, or buying a new disk, which
: is expensive.
: System: SUN Server 470, running SUNOS 4.1.3.
: Disk: SUN0669.
Now, the answers I got can be summarized as follows:
a: Doing an ungraceful halt will definitely result in lost
chains and incorrect references. Using 'stop a' is a BAD
idea, as one advisor puts it. Still, fsck should be able
to correct errors introduced this way '9 times out of 10'.
b: D.Mitchell@dcs.shef.ac.uk suggested that I use the command
'dd if=/dev/rsdxn of=/dev/null bs=64k conv=noerror' to check
the disk. I have done that for each partition on the disk,
and it did not report any errors.
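For those who would rather see where a read fails than just
whether it does, the same whole-partition read test can be
sketched in Python. This is only a sketch under assumptions: the
function name scan_for_read_errors is made up for illustration,
it needs a Python with os.pread (Unix only), and it assumes that
seeking to the end of the device reports its size.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024):
    """Read `path` block by block, carrying on past read errors.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the byte
    offset of every block that raised an I/O error. The idea mirrors
    dd ... conv=noerror: note the failure and keep going.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the size of the device/file.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                chunk = os.pread(fd, want, offset)
            except OSError:
                # Unreadable block: record it and skip to the next one.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # unexpected end of data
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets


# Hypothetical usage; the device name is a placeholder, as in the dd command:
# total, bad = scan_for_read_errors("/dev/rsdxn")
```

An empty list of bad offsets means the same as a silent dd run:
every block could be read, which says nothing about whether the
data in it is correct.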
c: The format/analyze/*-test will not always find errors. The
most thorough way to check the disk in that program is using
the purge-option. I had considered that, but hesitated, as it
destroys the entire content on the disk (which is as stated
In this context, email@example.com informed me that some
errors on the disk may be considered 'repairable' by format,
but that format will pause about half a second when finding
such a block. So I guess the message is that by analyzing
the disk and closely following the reports from the format
program, one can notice if and where (in which block) it
pauses, then limit the analysis to the neighbouring blocks.
If the pause is there every time, one can manually add the
block to the defect list.
epl@Kodak.COM is very critical of format/analyze, even
calling it pure trash. It would be interesting to get some
comments on that! Anyway, he's not alone in criticizing it.
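The 'watch where it pauses' idea from (c) can also be tried
outside format, by timing each block read and listing the slow
ones. The sketch below is not what format does internally; it
only measures how long each read takes. The name
find_slow_blocks, the 64k block size and the injectable clock
are my own assumptions, and the half-second threshold is simply
the figure quoted above.

```python
import os
import time


def find_slow_blocks(path, block_size=64 * 1024, threshold=0.5,
                     clock=time.monotonic):
    """Time every block read on `path` and report the slow ones.

    Returns a list of (byte_offset, seconds) for each block whose read
    took at least `threshold` seconds. `clock` is injectable so the
    logic can be exercised without a real slow disk.
    """
    slow = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the size of the device/file.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            started = clock()
            try:
                os.pread(fd, min(block_size, size - offset), offset)
            except OSError:
                # Hard read errors are a separate matter; only the
                # timing is of interest here.
                pass
            elapsed = clock() - started
            if elapsed >= threshold:
                slow.append((offset, elapsed))
            offset += block_size
    finally:
        os.close(fd)
    return slow


# Hypothetical usage; run it several times and compare the offsets:
# for offset, seconds in find_slow_blocks("/dev/rsdxn"):
#     print(offset, round(seconds, 2))
```

A block that shows up on every run is the kind of candidate
described above. A single slow read proves little, since other
disk activity can delay a read, and data already cached by the
system may hide a delay altogether.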
d: firstname.lastname@example.org suggested that I evaluate the
SCSI controller by simply replacing it with another controller
and then seeing if the problems die out. He experienced similar
problems, and this solved them. email@example.com
suggested checking the SCSI cables, or if possible just
e: One suggested overlapping partitions. The disk that causes
the problems doesn't have overlapping partitions, so I rule
that out.
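For anyone who wants to check (e) mechanically rather than by
eye, here is a small sketch. It assumes you have copied the
start and length of each partition, in sectors, from the
partition table that format prints. The function name
find_overlaps and the numbers in the usage comment are made up
for illustration. Partition 'c' is skipped by default because
it conventionally covers the whole disk and so overlaps
everything by design.

```python
def find_overlaps(partitions, ignore=("c",)):
    """Return pairs of partition names whose sector ranges overlap.

    `partitions` maps a partition name to (start_sector, sector_count).
    Names listed in `ignore` and zero-length partitions are skipped.
    """
    spans = sorted(
        (start, start + count, name)
        for name, (start, count) in partitions.items()
        if name not in ignore and count > 0
    )
    overlaps = []
    for i, (start_a, end_a, name_a) in enumerate(spans):
        for start_b, end_b, name_b in spans[i + 1:]:
            if start_b >= end_a:
                # Sorted by start sector: nothing further on can
                # overlap partition a.
                break
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical table, not taken from the disk discussed here:
# table = {"a": (0, 100), "b": (100, 200), "c": (0, 1000), "g": (250, 750)}
# find_overlaps(table) reports b and g, because b ends at sector 300
# while g already starts at sector 250.
```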
f: Many emphasized that I should not buy a disk before I have
checked the current disk, controllers, cables and whatever
else thoroughly. Don't act upon mere suspicion, but get proof.
I'll now do the following:
1: Change habits when it comes to booting: making sure that all
users log off before we shut down, and be sure to do filesystem
2: Check if I have all relevant patches to the OS, and if not, get
and install them.
3: If that doesn't help: try changing the SCSI controller. I
don't know about the cabling, as it's an internal disk, but if
there are cables that can be replaced I'll try that as well.
4: If that doesn't help: Do a complete reformatting of the disk.
5: If that doesn't help: Buy a new disk.
Thanks a lot you guys and girls:
Morten Krabbe Barfoed
Danish Space Research Institute phone: +45 42 88 22 77 (switch-board)
Gl. Lundtoftevej 7 phone: +45 45 87 40 77 - 161 (direct)
DK 2800 Lyngby FAX: +45 45 93 02 83
Denmark TELEX: 37 198