Greetings, and many apologies for the delays in posting this summary. I
HATE office moves...Only 2 more to go, though...:/
Thanks to the people who responded to my quest for information about format
and replacements. Unfortunately, the info was not particularly useful in this
context, but it was interesting.
(BTW: No one had an answer for "What does format(1)->Repair (try to) do?",
so if anyone has an answer to that, let me know and I'll post a 2nd
summary...thanks...)
I ended up getting a replacement format program from a friend that enables
AWRE (Automatic Write Reallocation Enable) and ARRE (Automatic Read
Reallocation Enable) so that sector sparing is turned on and in the future,
I should be able to have any new bad blocks dynamically remapped...at least
assuming I'm willing to move the disk back over to the Sun4 that I formatted
it on, since the disk size problem is still architecture based...
UNFORTUNATELY, this program, as nice as it is, is NOT available to the
public, since the friend wrote it for his company and they maintain (most)
distribution rights. If he manages to get it released, I'll let you all know.
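
For readers unfamiliar with the two flags: in the SCSI Read-Write Error
Recovery mode page (page code 0x01), AWRE and ARRE are the top two bits of
the flags byte. The Python sketch below shows only that bit manipulation. It
is not the friend's program, whose internals are unknown here, and it does not
send anything to a drive. The function names and the starting value are
hypothetical.

```python
# Sketch only: bit positions follow the SCSI Read-Write Error Recovery
# mode page (page code 0x01), whose byte 2 carries the recovery flags.
AWRE = 0x80  # bit 7: Automatic Write Reallocation Enable
ARRE = 0x40  # bit 6: Automatic Read Reallocation Enable


def enable_reallocation(flags_byte):
    """Return the flags byte with AWRE and ARRE set, other bits untouched."""
    return (flags_byte | AWRE | ARRE) & 0xFF


def describe(flags_byte):
    """Report whether each reallocation flag is set in the flags byte."""
    return {"AWRE": bool(flags_byte & AWRE), "ARRE": bool(flags_byte & ARRE)}


if __name__ == "__main__":
    before = 0x04  # hypothetical starting value, not read from a real drive
    after = enable_reallocation(before)
    print(describe(before), "->", describe(after))
```

A real tool would read the page with MODE SENSE, change the byte, and write it
back with MODE SELECT; how that is issued depends on the host's SCSI
pass-through interface and is left out here.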
Thanks again.
JB
jailbait@intercon.com
---------------------
Original Question and Responses follow.
> Subject: Replacement for format?
> Greetings, all...
> and grumblings, too...
>
> So...by personal experience, and by digging in the s-m archives, I've
> determined that the format(1) that comes with SunOS (4.1.1 in this case,
> running on a 3/280) is more or less USELESS!
>
> A) It is unreliable...despite that you can run surface analysis from now
> till St. Swithins Day (a few days ago, if I remember right...:), it will
> still regularly fail to find bad blocks on your disk.
>
> B) If you happen to have a SCSI disk, format is unable to dynamically
> remap/reallocate those self same bad blocks that it was unable to find in
> part A, above.
>
> So my question is:
> Is there any replacement or replacements for format to do such low-level
> things as A) Surface Analysis Of Exacting Standards and B) Mapping Out
> Of Bad Blocks So You Don't Have To Backup And Reformat Your News Spool
> Disk When It Starts Throwing Disk Errors At You?
>
> Oh: And what does format->repair actually DO? and thus, what ISN'T it doing
> when it 'fail's?
>
> Thanks much, and any RAPID answers (before I start running format on said
> News Spool Disk which will only serve for some number of months before
> questionable blocks start showing up again) would be MOST appreciated.
>
> The hardware is, btw: a 3/280, running 4.1.1, with the UFS Jumbo Patch
> installed.
>
> Summary will, of course, follow...
>
> Thanks again,
> JB
> jailbait@intercon.com
>
----------
From: cindy@ddrsrv2.dny.rockwell.com (Cindy Yoho)
JB,
You might want to add this to your list of grumblings about format under 4.1.1:
It won't handle disks bigger than 1 GB, and very possibly corrupts their labels
if you try to even partition using it. There is a "patch" that fixes this,
which is basically just a new copy of the format executable, but I had to
get burned by it before I found out...
Cindy
-----------------
From: poehlem@img.wdc.com (Karl Poehleman (Poehlem) x52552)
Yes, format is pretty useless. I've had success running, say, a hundred passes
of analyze afterward to catch the grosser errors, followed by a home-brewed
script called testdrive that writes data to and reads it back from the disk.
I've included that script. There is third-party software out there, but I'm
not sure how much better it is than format.
Karl
Karl Poehleman
System Administration
Western Digital Corporation
Voice:414-335-2552
Email:poehlem@img.wdc.com
X-Sun-Data-Type: cshell-script
X-Sun-Data-Description: cshell-script
X-Sun-Data-Name: testdrive
X-Sun-Content-Lines: 193
#!/bin/csh -f
# All three arguments are required: drive type, drive number, pass count.
if ($#argv != 3) then
echo "Usage: testdrive <drivetype> <drivenumber> <times>"
exit 1
endif
echo "This test will destroy all data in your drive"
echo "are you sure you want to continue (y/n)? "
set Answer = $<
# Continue only on an explicit "y"; any other reply aborts.
if ("$Answer" != "y") then
echo "Terminating...."
sleep 2
exit 1
endif
set DriveType = $argv[1]
set DriveNumber = $argv[2]
set NumberofPasses = $argv[3]
if ( -f /DriveErrorLog.$DriveType$DriveNumber) then
rm /DriveErrorLog.$DriveType$DriveNumber
endif
touch /DriveErrorLog.$DriveType$DriveNumber
if ( -f /DrivePartition.info) then
rm /DrivePartition.info
endif
touch /DrivePartition.info
# The command file is written and read by relative name below, so work from /.
cd /
set temp = `dkinfo $DriveType$DriveNumber"c" | grep cylinders`
set Cylinders = $temp[1]
@ Sectors = $temp[1] * $temp[3] * $temp[5]
@ DriveSize = $Sectors / 2
@ Partition = $DriveSize / 3
set SizeofUsr = `du -s /usr`
if ($SizeofUsr[1] > $Partition) then
goto UseRoot
else
goto UseUsr
endif
UseUsr:
if($Cylinders % 3 == 0) then
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3
else
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3 + $Cylinders % 3
endif
@ SectorsInCyl = $Sectors / $Cylinders
@ aBegin = 0
@ aEnd = $a * $SectorsInCyl
@ gBegin = $a
@ gEnd = $g * $SectorsInCyl
@ hBegin = $a + $g
@ hEnd = $h * $SectorsInCyl
echo part > DrivePartition.info
echo a >> DrivePartition.info
echo $aBegin >> DrivePartition.info
echo $aEnd >> DrivePartition.info
echo g >> DrivePartition.info
echo $gBegin >> DrivePartition.info
echo $gEnd >> DrivePartition.info
echo h >> DrivePartition.info
echo $hBegin >> DrivePartition.info
echo $hEnd >> DrivePartition.info
echo label >> DrivePartition.info
echo q >> DrivePartition.info
echo format >> DrivePartition.info
format -d $DriveType$DriveNumber -f DrivePartition.info
newfs /dev/r$DriveType$DriveNumber'a'
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'
mount /dev/$DriveType$DriveNumber'a' /mnt
set Source = `df -t 4.2 | grep usr`
set Usr = $Source[1]
dump 0f - $Usr | (cd /mnt; restore rf -)
cd /
umount /mnt
# --------------------Beginning of test------------------------
set Count = 1
while ($Count <= $NumberofPasses)
echo "Starting pass $Count"
sleep 2
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'
mount /dev/$DriveType$DriveNumber'g' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'a' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'g'
cd /
umount /mnt
mount /dev/$DriveType$DriveNumber'h' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'g' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'h'
umount /mnt
newfs /dev/r$DriveType$DriveNumber'a'
mount /dev/$DriveType$DriveNumber'a' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'h' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'a'
umount /mnt
@ Count++
end
# Normal completion; exit here so control does not fall through to UseRoot.
exit 0
UseRoot:
if($Cylinders % 3 == 0) then
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3
else
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3 + $Cylinders % 3
endif
@ SectorsInCyl = $Sectors / $Cylinders
@ aBegin = 0
@ aEnd = $a * $SectorsInCyl
@ gBegin = $a
@ gEnd = $g * $SectorsInCyl
@ hBegin = $a + $g
@ hEnd = $h * $SectorsInCyl
echo part > DrivePartition.info
echo a >> DrivePartition.info
echo $aBegin >> DrivePartition.info
echo $aEnd >> DrivePartition.info
echo g >> DrivePartition.info
echo $gBegin >> DrivePartition.info
echo $gEnd >> DrivePartition.info
echo h >> DrivePartition.info
echo $hBegin >> DrivePartition.info
echo $hEnd >> DrivePartition.info
echo label >> DrivePartition.info
echo q >> DrivePartition.info
echo format >> DrivePartition.info
format -d $DriveType$DriveNumber -f DrivePartition.info
newfs /dev/r$DriveType$DriveNumber'a'
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'
mount /dev/$DriveType$DriveNumber'a' /mnt
cd /
set Source = `df . | grep %`
set Root = $Source[1]
dump 0f - $Root | (cd /mnt; restore rf -)
cd /
umount /mnt
#---------------------Begin test using / "root"----------------------
set Count = 1
while ($Count <= $NumberofPasses)
echo "Starting pass '$Count'"
sleep 2
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'
mount /dev/$DriveType$DriveNumber'g' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'a' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'g'
cd /
umount /mnt
mount /dev/$DriveType$DriveNumber'h' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'g' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'h'
umount /mnt
newfs /dev/r$DriveType$DriveNumber'a'
mount /dev/$DriveType$DriveNumber'a' /mnt
dump 0f - /dev/r$DriveType$DriveNumber'h' | (cd /mnt; restore rf -)
fsck /dev/r$DriveType$DriveNumber'a'
umount /mnt
@ Count++
end
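
The partition arithmetic in the testdrive script above (three equal slices by
cylinder, with any remainder going to h, and sizes converted to sectors) can
be restated as a short Python sketch. The function name and the example
geometry are hypothetical and not taken from any real drive.

```python
def three_way_split(cylinders, heads, sectors_per_track):
    """Split a disk into partitions a, g, h the way the script does.

    Returns a list of (name, starting_cylinder, size_in_sectors) tuples.
    Partition h absorbs the remainder when cylinders is not divisible by 3.
    """
    sectors_per_cyl = heads * sectors_per_track
    base, extra = divmod(cylinders, 3)
    sizes = {"a": base, "g": base, "h": base + extra}

    layout = []
    start = 0
    for name in ("a", "g", "h"):
        layout.append((name, start, sizes[name] * sectors_per_cyl))
        start += sizes[name]
    return layout


if __name__ == "__main__":
    # Hypothetical geometry: 1000 cylinders, 9 heads, 36 sectors per track.
    for name, start_cyl, size in three_way_split(1000, 9, 36):
        print(name, start_cyl, size)
```

Each tuple corresponds to one pair of values the script feeds to the
partition menu: the starting cylinder, then the size in blocks.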
-----------------
From: murphy!acmcr!vr@uunet.uu.net (Vicki Rosenzweig)
No answer to your problem--though I'd be interested if you hear of
one, since I've been blithely trusting SunOS format--but I think
St. Swithin's day is sometime in the summer.
Providing useless data at the drop of a hat,
Vicki Rosenzweig
Associate Editor, Computing Reviews
vr%acmcr.uucp@murphy.com
1-212-626-0666
-----------------
From: Dan Stromberg - OAC-CSG <strombrg@hydra.acs.uci.edu>
Eek! A sun3!
What you're seeing is:
1) detecting bad blocks isn't an easy job. I've heard that disk
vendors will often let a bad-block scan run for -weeks-. Some will
show up easily, but some aren't consistently a problem, and may only
flake out under certain strange conditions that aren't stressed by
your tester.
2) SunOS 4 and 5 use readahead - if you appear to be reading thru a
file and stop, it may well go ahead and bring another section of the
file, yet to come, into the buffer cache for later. So you may not
have read it yet (or so you think), but the OS really has. This
obfuscates error reporting for bad blocks. Workaround (for this
issue): try scanning a few cylinders before, and a few cylinders
after, the block reported - with a read/write size of _1_ block.
You may still want to consider backing up before doing the analyze.
Mapping out random blocks can do horrible things to filesystems.
Recently I had a student map out a bad block in a root filesystem
while I was busy with something else - and the result was a filesystem
problem that caused panics in the boot sequence, prior to fsck!
Booting from external media and fsck'ing manually fixed it.
Dan Stromberg - OAC/CSG strombrg@uci.edu
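
The workaround Dan describes (read a few cylinders either side of the reported
block, one block at a time) can be sketched in Python as follows. This is an
illustration under stated assumptions, not a tested disk tool: the 512-byte
block size, the function names, and the idea of passing in the geometry by
hand are all assumptions. On a real system the path would need to be the raw
device so that the reads are not satisfied from the buffer cache.

```python
import os

BLOCK_SIZE = 512  # assumed block size in bytes


def scan_window(bad_block, sectors_per_cyl, margin_cyls, total_blocks):
    """Return the half-open block range [first, last) covering the cylinder
    holding bad_block plus margin_cyls cylinders on each side."""
    cyl = bad_block // sectors_per_cyl
    first = max(0, (cyl - margin_cyls) * sectors_per_cyl)
    last = min(total_blocks, (cyl + margin_cyls + 1) * sectors_per_cyl)
    return first, last


def scan_blocks(path, first, last, block_size=BLOCK_SIZE):
    """Read blocks first..last-1 one at a time; return those that fail.

    A block counts as failed if the read raises OSError or returns fewer
    bytes than a full block.
    """
    failed = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for block in range(first, last):
            try:
                os.lseek(fd, block * block_size, os.SEEK_SET)
                data = os.read(fd, block_size)
                if len(data) != block_size:
                    failed.append(block)
            except OSError:
                failed.append(block)
    finally:
        os.close(fd)
    return failed
```

Reading a single block per request means a reported failure points at that
block rather than at some other block pulled in by a larger transfer.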
-----------------
From: birger@morgan.vest.sdata.no (Birger A. Wathne)
Modern SCSI disks should hide their defects. On SCSI, format only gets
access to a virtual addressing layer. The disk is free to remap as it
chooses, so if the disk doesn't handle the error itself, you are out of
luck. This could mean that the drive's internal bad block list is full
(I have experienced this), or some other failure. Those options should
have been removed from format when working with newer SCSI drives...
Birger
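
Birger's point about the virtual addressing layer can be pictured with a toy
model in Python: the host only ever sees logical block numbers, the drive
keeps a private remap table, and reallocation fails once the spare pool is
used up. The class and its numbers are hypothetical and do not describe any
particular drive's firmware.

```python
class ToyDisk:
    """Toy model of a drive that hides defects behind logical addresses."""

    def __init__(self, logical_blocks, spares):
        self.logical_blocks = logical_blocks
        # Spare physical blocks live past the end of the logical range.
        self.free_spares = list(range(logical_blocks, logical_blocks + spares))
        self.remap = {}  # logical block -> spare physical block

    def _check(self, logical):
        if not 0 <= logical < self.logical_blocks:
            raise IndexError("logical block out of range")

    def physical(self, logical):
        """Return the physical block currently backing a logical block."""
        self._check(logical)
        return self.remap.get(logical, logical)

    def reallocate(self, logical):
        """Move a logical block to a spare; return False if none are left."""
        self._check(logical)
        if not self.free_spares:
            return False
        self.remap[logical] = self.free_spares.pop(0)
        return True
```

In this model the host-side tool never learns which physical block it is
touching, and once `reallocate` starts returning False there is nothing a
host-side program can do about it, which matches the "internal bad block list
is full" case described above.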