SUMMARY: Replacement for format?

From: Jailbait (jailbait@intercon.com)
Date: Sat Apr 01 1995 - 10:54:57 CST


Greetings, and many apologies for the delays in posting this summary. I
HATE office moves...Only 2 more to go, though...:/

Thanks to the people who responded to my quest for information about format
and replacements. Unfortunately, the info was not particularly useful in this
context, but it was interesting.

(BTW: No one had an answer for "What does format(1)->Repair (try to) do?",
so if anyone has an answer to that, let me know and I'll post a 2nd
summary...thanks...)

I ended up getting a replacement format program from a friend that enables
AWRE (Automatic Write Reallocation Enable) and ARRE (Automatic Read
Reallocation Enable) so that sector sparing is turned on and in the future,
I should be able to have any new bad blocks dynamically remapped...at least
assuming I'm willing to move the disk back over to the Sun4 that I formatted
it on, since the disk size problem is still architecture based...
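
For anyone who has not met these two flags: in the SCSI-2 Read-Write Error
Recovery mode page (page code 01h), AWRE is bit 7 and ARRE is bit 6 of byte 2.
The following is only a rough sketch of the bit manipulation, NOT the
unreleased program mentioned above. The function name and the sample page
bytes are invented for illustration, and actually sending the page to a drive
with MODE SELECT is left out entirely.

```python
AWRE = 0x80  # byte 2, bit 7: Automatic Write Reallocation Enable
ARRE = 0x40  # byte 2, bit 6: Automatic Read Reallocation Enable
PAGE_CODE_RW_ERROR_RECOVERY = 0x01


def enable_auto_realloc(page: bytes) -> bytes:
    """Return a copy of mode page 01h with AWRE and ARRE set.

    Hypothetical helper: it edits an in-memory copy only and never
    talks to a device.
    """
    if len(page) < 3:
        raise ValueError("mode page is too short to hold byte 2")
    # The low six bits of byte 0 hold the page code.
    if page[0] & 0x3F != PAGE_CODE_RW_ERROR_RECOVERY:
        raise ValueError("not the Read-Write Error Recovery page")
    out = bytearray(page)
    out[2] |= AWRE | ARRE  # leave the other recovery bits untouched
    return bytes(out)


# Made-up page contents: page code 01h, length 0Ah, all flags clear,
# read retry count 8, remaining bytes zero.
sample = bytes([0x01, 0x0A, 0x00, 0x08] + [0] * 8)
updated = enable_auto_realloc(sample)
print(f"byte 2 before: {sample[2]:#04x}, after: {updated[2]:#04x}")
```

With both bits set, the drive is allowed to reassign a failing block on its
own during writes and reads, which is the "sector sparing" described above.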

UNFORTUNATELY, this program, as nice as it is, is NOT available to the
public, since the friend wrote it for his company and they maintain (most)
distribution rights. If he manages to get it released, I'll let you all know.

Thanks again.

JB
jailbait@intercon.com
---------------------
Original Question and Responses follow.

> Subject: Replacement for format?
> Greetings, all...
> and grumblings, too...
>
> So...by personal experience, and by digging in the s-m archives, I've
> determined that the format(1) that comes with SunOS (4.1.1 in this case,
> running on a 3/280) is more or less USELESS!
>
> A) It is unreliable...despite that you can run surface analysis from now
> till St. Swithins Day (a few days ago, if I remember right...:), it will
> still regularly fail to find bad blocks on your disk.
>
> B) If you happen to have a SCSI disk, format is unable to dynamically
> remap/reallocate those self same bad blocks that it was unable to find in
> part A, above.
>
> So my question is:
> Is there any replacement or replacements for format to do such low-level
> things as A) Surface Analysis Of Exacting Standards and B) Mapping Out
> Of Bad Blocks So You Don't Have To Backup And Reformat Your News Spool
> Disk When It Starts Throwing Disk Errors At You?
>
> Oh: And what does format->repair actually DO? and thus, what ISN'T it doing
> when it 'fail's?
>
> Thanks much, and any RAPID answers (before I start running format on said
> News Spool Disk which will only serve for some number of months before
> questionable blocks start showing up again) would be MOST appreciated.
>
> The hardware is, btw: a 3/280, running 4.1.1, with the UFS Jumbo Patch
> installed.
>
> Summary will, of course, follow...
>
> Thanks again,
> JB
> jailbait@intercon.com
>
----------
From: cindy@ddrsrv2.dny.rockwell.com (Cindy Yoho)

JB,

You might want to add this to your list of grumblings about format under 4.1.1:

It won't handle disks bigger than 1 GB, and very possibly corrupts their labels
if you try to even partition using it. There is a "patch" that fixes this,
which is basically just a new copy of the format executable, but I had to
get burned by it before I found out...

Cindy
-----------------
From: poehlem@img.wdc.com (Karl Poehleman (Poehlem) x52552)

Yes, format is pretty useless. I've had success running, say, a hundred passes
of analyze afterward to catch the grosser errors, followed by a home-brewed
script called testdrive that writes data to the disk and reads it back. I've
included that script. There is 3rd party software out there, but I'm not sure
how much better it is than format.

Karl

Karl Poehleman
System Administration
Western Digital Corporation
Voice:414-335-2552
Email:poehlem@img.wdc.com

X-Sun-Data-Type: cshell-script
X-Sun-Data-Description: cshell-script
X-Sun-Data-Name: testdrive
X-Sun-Content-Lines: 193

#!/bin/csh -f

if ($#argv < 3) then
        echo "Usage: testdrive <drivetype> <drivenumber> <times>"
        exit 1
endif

echo "This test will destroy all data on your drive."
echo "Are you sure you want to continue (y/n)? "
set Answer = $<
# Continue only on an explicit "y"; any other reply (or none) aborts.
if ("$Answer" != "y") then
   echo "Terminating...."
   sleep 2
   exit 1
endif

set DriveType = $argv[1]
set DriveNumber = $argv[2]
set NumberofPasses = $argv[3]
# Start with fresh, empty log and command files in /. The command file is
# referred to by a relative name further down, so work from / throughout.
cd /
rm -f /DriveErrorLog.$DriveType$DriveNumber /DrivePartition.info
touch /DriveErrorLog.$DriveType$DriveNumber /DrivePartition.info
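# Pull the geometry from dkinfo. This assumes the matching line has the form
# "<n> cylinders <n> heads <n> sectors/track", so words 1, 3 and 5 are numbers.
set temp = `dkinfo $DriveType$DriveNumber"c" | grep cylinders`
set Cylinders = $temp[1]
@ Sectors = $temp[1] * $temp[3] * $temp[5]

# Assuming 512-byte sectors, two sectors make one kilobyte, so DriveSize is in
# KB (the unit du -s reports). Each of the three partitions gets about a third.
@ DriveSize = $Sectors / 2
@ Partition = $DriveSize / 3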
set SizeofUsr = `du -s /usr`
if ($SizeofUsr[1] > $Partition) then
   goto UseRoot
else
   goto UseUsr
endif

UseUsr:

# Split the disk into three partitions of whole cylinders. h absorbs any
# remainder, which is 0 when the cylinder count divides evenly by 3.
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3 + $Cylinders % 3

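# *Begin values are starting cylinders; *End values are really partition
# sizes in sectors, not end positions.
@ SectorsInCyl = $Sectors / $Cylinders
@ aBegin = 0
@ aEnd = $a * $SectorsInCyl
@ gBegin = $a
@ gEnd = $g * $SectorsInCyl
@ hBegin = $a + $g
@ hEnd = $h * $SectorsInCyl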

echo part > DrivePartition.info
echo a >> DrivePartition.info
echo $aBegin >> DrivePartition.info
echo $aEnd >> DrivePartition.info
echo g >> DrivePartition.info
echo $gBegin >> DrivePartition.info
echo $gEnd >> DrivePartition.info
echo h >> DrivePartition.info
echo $hBegin >> DrivePartition.info
echo $hEnd >> DrivePartition.info
echo label >> DrivePartition.info
echo q >> DrivePartition.info
echo format >> DrivePartition.info

format -d $DriveType$DriveNumber -f DrivePartition.info

newfs /dev/r$DriveType$DriveNumber'a'
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'

mount /dev/$DriveType$DriveNumber'a' /mnt

set Source = `df -t 4.2 | grep usr`
set Usr = $Source[1]

dump 0f - $Usr | (cd /mnt; restore rf -)
cd /
umount /mnt

# --------------------Beginning of test------------------------

set Count = 1
while ($Count <= $NumberofPasses)
        echo "Starting pass $Count"
        sleep 2

        newfs /dev/r$DriveType$DriveNumber'g'
        newfs /dev/r$DriveType$DriveNumber'h'
        mount /dev/$DriveType$DriveNumber'g' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'a' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'g'
        cd /
        umount /mnt
        mount /dev/$DriveType$DriveNumber'h' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'g' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'h'
        umount /mnt
        newfs /dev/r$DriveType$DriveNumber'a'
        mount /dev/$DriveType$DriveNumber'a' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'h' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'a'
        umount /mnt
        @ Count++
end
# All passes done; exit here so the script does not fall through into UseRoot.
exit 0

UseRoot:

# Split the disk into three partitions of whole cylinders. h absorbs any
# remainder, which is 0 when the cylinder count divides evenly by 3.
@ a = $Cylinders / 3
@ g = $Cylinders / 3
@ h = $Cylinders / 3 + $Cylinders % 3

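# *Begin values are starting cylinders; *End values are really partition
# sizes in sectors, not end positions.
@ SectorsInCyl = $Sectors / $Cylinders
@ aBegin = 0
@ aEnd = $a * $SectorsInCyl
@ gBegin = $a
@ gEnd = $g * $SectorsInCyl
@ hBegin = $a + $g
@ hEnd = $h * $SectorsInCyl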

echo part > DrivePartition.info
echo a >> DrivePartition.info
echo $aBegin >> DrivePartition.info
echo $aEnd >> DrivePartition.info
echo g >> DrivePartition.info
echo $gBegin >> DrivePartition.info
echo $gEnd >> DrivePartition.info
echo h >> DrivePartition.info
echo $hBegin >> DrivePartition.info
echo $hEnd >> DrivePartition.info
echo label >> DrivePartition.info
echo q >> DrivePartition.info
echo format >> DrivePartition.info

format -d $DriveType$DriveNumber -f DrivePartition.info

newfs /dev/r$DriveType$DriveNumber'a'
newfs /dev/r$DriveType$DriveNumber'g'
newfs /dev/r$DriveType$DriveNumber'h'
        
mount /dev/$DriveType$DriveNumber'a' /mnt
cd /
set Source = `df . | grep %`
set Root = $Source[1]

dump 0f - $Root | (cd /mnt; restore rf -)
cd /
umount /mnt

#---------------------Begin test using / "root"----------------------

set Count = 1

while ($Count <= $NumberofPasses)
        echo "Starting pass '$Count'"
        sleep 2
        
        newfs /dev/r$DriveType$DriveNumber'g'
        newfs /dev/r$DriveType$DriveNumber'h'
        mount /dev/$DriveType$DriveNumber'g' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'a' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'g'
        cd /
        umount /mnt
        mount /dev/$DriveType$DriveNumber'h' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'g' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'h'
        umount /mnt
        newfs /dev/r$DriveType$DriveNumber'a'
        mount /dev/$DriveType$DriveNumber'a' /mnt
        dump 0f - /dev/r$DriveType$DriveNumber'h' | (cd /mnt; restore rf -)
        fsck /dev/r$DriveType$DriveNumber'a'
        umount /mnt

        @ Count++
end
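
The three-way cylinder split that testdrive computes before writing its format
command file can be restated as a small Python sketch. This is a restatement
for readability, not part of Karl's script; the function name and the sample
geometry (1000 cylinders, 15 heads, 56 sectors/track) are made up.

```python
def split_three(cylinders, heads, sectors_per_track):
    """Split a disk into partitions a, g and h of whole cylinders.

    Returns {name: (starting_cylinder, size_in_sectors)}, mirroring the
    start-cylinder / size pairs that the script writes out. Any cylinders
    left over after dividing by 3 go to h.
    """
    sectors_per_cyl = heads * sectors_per_track
    third, remainder = divmod(cylinders, 3)
    cyls = {"a": third, "g": third, "h": third + remainder}

    layout = {}
    start = 0
    for name in ("a", "g", "h"):
        layout[name] = (start, cyls[name] * sectors_per_cyl)
        start += cyls[name]
    return layout


# Hypothetical geometry, chosen so that the remainder is not zero.
for name, (start_cyl, size) in split_three(1000, 15, 56).items():
    print(f"{name}: start cylinder {start_cyl}, {size} sectors")
```

Keeping every partition on a cylinder boundary is why the sizes are computed
as cylinders times sectors per cylinder rather than as a third of the raw
sector count.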
-----------------

From: murphy!acmcr!vr@uunet.uu.net (Vicki Rosenzweig)

No answer to your problem--though I'd be interested if you hear of
one, since I've been blithely trusting SunOS format--but I think
St. Swithin's day is sometime in the summer.

Providing useless data at the drop of a hat,

Vicki Rosenzweig
Associate Editor, Computing Reviews
vr%acmcr.uucp@murphy.com
1-212-626-0666
-----------------
From: Dan Stromberg - OAC-CSG <strombrg@hydra.acs.uci.edu>

Eek! A sun3!

What you're seeing is:

1) detecting bad blocks isn't an easy job. I've heard that disk
vendors will often let a bad-block scan run for -weeks-. Some will
show up easily, but some aren't consistently a problem, and may only
flake out under certain strange conditions that aren't stressed by
your tester.

2) SunOS 4 and 5 use readahead - if you appear to be reading thru a
file and stop, it may well go ahead and bring another section of the
file, yet to come, into the buffer cache for later. So you may not
have read it yet (or so you think), but the OS really has. This
obscures error reporting for bad blocks. Workaround (for this
issue): try scanning a few cylinders before, and a few cylinders
after, the block reported - with a read/write size of _1_ block.
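
The workaround in point 2 can be sketched roughly as follows in Python. This
is an illustration only, under several assumptions: 512-byte blocks, a path
that names a raw (character) device so the buffer cache is bypassed, and
function names invented here. On a regular file or a block device the OS
cache and readahead still apply, so the result would be less trustworthy.

```python
import os

BLOCK_SIZE = 512  # assumed sector size


def window(reported_block, sectors_per_cyl, cylinders=3):
    """Return (first, last) block covering a few cylinders either side."""
    cyl = reported_block // sectors_per_cyl
    first = max(0, cyl - cylinders) * sectors_per_cyl
    last = (cyl + cylinders + 1) * sectors_per_cyl - 1
    return first, last


def scan_blocks(path, first_block, last_block, block_size=BLOCK_SIZE):
    """Read the range one block per request.

    Returns (blocks_read_ok, bad_block_numbers). Because each read asks
    for exactly one block, a failure points at that block and not at
    something pulled in by a larger transfer.
    """
    ok = 0
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for block in range(first_block, last_block + 1):
            try:
                data = os.pread(fd, block_size, block * block_size)
            except OSError:
                bad.append(block)  # read error: note it and keep going
                continue
            if len(data) < block_size:
                break  # ran off the end of the device or file
            ok += 1
    finally:
        os.close(fd)
    return ok, bad
```

A run against a hypothetical raw device would look something like
`scan_blocks("/dev/rsd0c", *window(100000, 840))`; this only reads, so it does
not remap or repair anything.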

You may still want to consider backing up before doing the analyze.
Mapping out random blocks can do horrible things to filesystems.
Recently I had a student map out a bad block in a root filesystem
while I was busy with something else - and the result was a filesystem
problem that caused panics in the boot sequence, prior to fsck!
Booting from external media and fsck'ing manually fixed it.

Dan Stromberg - OAC/CSG strombrg@uci.edu
-----------------
From: birger@morgan.vest.sdata.no (Birger A. Wathne)

Modern SCSI disks should hide their defects. On SCSI, format only gets
access to a virtual addressing layer. The disk is free to remap as it
chooses, so if the disk doesn't handle the error itself, you are out of
luck. This could mean that the drive's internal bad block list is full
(I have experienced this), or some other failure. Those options should
have been removed from format when working with newer SCSI drives...

Birger
