SUMMARY: SCSI Bus/Disk Problem

From: Terence P. Ma (tpm-sprl!tpm@uunet.uu.net)
Date: Wed May 08 1991 - 17:30:06 CDT


Subject: SUMMARY: Disk/Bus SCSI Errors

I wrote:

> ...
> SunOS 4.1.1 has the disks running synchronously, and the maximum transfer
> rate I am getting is 4.167 MB/s on the two Seagates. The Quantum is
> reported (upon boot) as giving me a maximum transfer rate of 3.572 MB/s.
> However, my understanding is that the Quantum drive is not really
> running synchronously, but running asynchronously with a cache of some sort
> so that it pretends it is a synchronous device to the S-bus.
>
> This, unfortunately, is a problem. The reason is that I am getting SCSI
> errors ...
> ...
> QUESTIONS:
>
> 1) Has anyone else experienced this, and if so, what did you do?
> 2) Is there any reason to shorten the SCSI cables even more?
> 3) Any idea what my problem is?
> 4) Do I have any alternatives with respect to the drive (should I change it
> to a synchronous drive)?
> 5) Any advice?
> ...
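
As an aside on the boot-time figures quoted above: they look like the reciprocal of a synchronous transfer period on an 8-bit bus. The periods of 240 ns and 280 ns below are my inference from the reported rates, not something the boot messages or the original post state, and the helper name is made up for this sketch.

```python
# Rough sketch: peak synchronous rate on an 8-bit SCSI bus, assuming one
# byte is transferred per synchronous period (1 MB taken as 10**6 bytes).
# The 240 ns and 280 ns periods are assumptions inferred from the rates
# reported in the post.

def sync_rate_mb_per_s(period_ns):
    """Return the peak rate in MB/s for a given period in nanoseconds."""
    return 1000.0 / period_ns

for period_ns in (240, 280):
    rate = sync_rate_mb_per_s(period_ns)
    print(f"{period_ns} ns per byte -> {rate:.3f} MB/s")
```

If that reading is right, the Seagates negotiated a slightly shorter period than the Quantum, which would account for the two different figures.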

The basic comments/suggestions (from what I got on this post and from my
conversations with service folks) are that:

1) The SCSI cable needs to be very short (no more than 0.5 meters).
2) Patch the kernel to run asynchronously only.
3) Change and make sure I have top quality cables.
4) Play with the terminators.
5) Get a synchronous drive.

It seems to me that this is something that Sun Corp. might want to look at
because I think it is a relatively common problem showing up in different
forms. Below are excerpts from the emails I received. Thanks to all (listed
alphabetically), and special thanks to Mark Seiden, who helped me think more
clearly about this problem, for his comments on his experiences with it.

Konradin Stoehr <uunet!rzsun2.Rechenzentrum.Uni-Augsburg.DE!cronos>
uunet!ecn.purdue.edu!curt (Curt Freeland)
uunet!lsr-vax!art (Art Hays - PSTAFF)
Gerald Justice <uunet!dao.nrc.ca!justice>
uunet!seiden.com!mis (Mark Seiden)
uunet!calvin.doc.ca!andrew (Andrew Patrick)

<----------------------- BEGIN INCLUDED COMMENTS ----------------------->

From: Konradin Stoehr <uunet!rzsun2.Rechenzentrum.Uni-Augsburg.DE!cronos>

I think that your problems are related to the fact that the Quantum
drive is running async. I don't remember whether you have an Exabyte
connected to your SCSI bus, but if you have, then this is a possible
source of trouble, too.
Sync SCSI needs a *very* clean operating environment. Some older
devices definitely cause trouble. The Exabyte 8200 is not
expected to run on sync SCSI. A friend of mine at Oldenburg University
mentions that the total cable length should not exceed half a meter to
keep transmission problems low.
I suggest that you try swapping the Quantum for a synchronous drive (such
as the Maxtor LX200) or try disabling the sync SCSI option in the
kernel. The file is /usr/sys/scsi/conf_data.c or something like that
(sorry, I don't remember) ... Nope, it's
/usr/sys/scsi/conf/scsi_confdata.c ..

From: uunet!ecn.purdue.edu!curt (Curt Freeland)

My best guess is that this means that the drive cache is giving you trouble.
I know the Fujitsu M2266 (1 G) and some other drives have been having
trouble on Suns when the cache was enabled. You might contact Quantum,
and see what the error code 0x90 means. On Fujitsu drives, it is an error
code 0x44 (I believe) that signifies the cache error. With the Fujitsu
drives, they seem to work fine if you disable the cache. Fujitsu is working
on a new version of firmware that will work on Suns. Maybe Quantum needs
to do the same thing?

Another thing you might try is to patch the kernel so that you are running
asynchronous SCSI on all drives. I do not remember the magic incantation
to do this, but it is possible. Maybe the fact that the Quantum is actually
an asynch drive is part of the problem.

From: uunet!lsr-vax!art (Art Hays - PSTAFF)

To eliminate cable-type problems as a culprit, you can verify that
you have non-PVC insulated cables (the best is polyolefin) and that you
have an active terminator. There have been recent articles in EE Times
and Electronic Design about eliminating cabling/termination problems
on SCSI buses. If you do it right, you can have very long SCSI buses.

From: Gerald Justice <uunet!dao.nrc.ca!justice>

I had a similar problem with 4.0.3c and 4.1 (so the disks were running
asynchronously) and the solution was achieved by replacing the cables
with shorter and higher-quality cables. I take the "reset" error message
as a very generic one (several people have been posting reports similar
to yours and mine) which simply means that something is not right
electrically on the SCSI bus. Therefore you must be sure about the
cables, internal and external, and the termination. Presumably a
disk controller could also cause this type of problem. Don't pay any
attention to the specific unit referenced in the error message; I always
got references to the internal drive, yet the solution was replacing the
external cables. Try using the SCSI disk test (raw fs read-only) in
sundiag; for me this produced the errors sooner than "normal" operation.

From: uunet!seiden.com!mis (Mark Seiden)

i have this same problem on my sparcstation 2.
intermittent errors at different locations, which occasionally crash
my system.

this is using two internal quantum 105s (there are also 4 external
devices).

From: uunet!seiden.com!mis (Mark Seiden)

quantum hardware error 0x90 is a "Synchronous Acknowledge Error"...

so i'm going to try going out of synch mode, after shortening my somewhat
long cabling...

From: uunet!seiden.com!mis (Mark Seiden)

surprise... turning off synchronous mode on my ss2 seems to make the
quantum 105 0x90 errors (synch acknowledgement error) disappear. no
cable changes (yet--i plan to shorten mine some). no noticeable
difference in performance (based on jacobson/leres disktest)...

wanna know how to do that?

To switch off synchronous SCSI:

(the first statement does it in the live/running kernel, the second in
the one on disk)

# adb -w -k /vmunix /dev/mem
scsi_options/W 58
scsi_options?W 58
$q
# /etc/fastboot

To switch on synchronous SCSI:

# adb -w -k /vmunix /dev/mem
scsi_options/W 78
scsi_options?W 78
$q
# /etc/fastboot
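
A note on the two values (not part of the original message): adb reads them as hexadecimal by default, so they are 0x58 and 0x78, and they differ in a single bit. The sketch below decodes them using bit names taken from later Sun SCSI header files. Whether SunOS 4.1.1 uses exactly these assignments is an assumption, and the table and function names are made up for illustration.

```python
# Sketch only: decode a scsi_options value into named feature bits.
# The bit assignments are assumed from later Sun SCSI headers and may not
# match the SunOS 4.1.1 sources exactly.
SCSI_OPTION_BITS = {
    0x08: "disconnect/reconnect",
    0x10: "linked commands",
    0x20: "synchronous transfer",
    0x40: "parity",
}

def decode_scsi_options(value):
    """List the names of the option bits that are set in value."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items())
            if value & bit]

for value in (0x78, 0x58):
    print(hex(value), decode_scsi_options(value))

# The only difference between the two settings is the 0x20 bit.
print("changed bits:", hex(0x78 ^ 0x58))
```

Under that assumption, writing 58 clears only the synchronous bit and leaves the other options as they were.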

From: uunet!seiden.com!mis (Mark Seiden)

> 1) shorten the overall SCSI cable lengths
a good idea.
> 2) get a synchronous drive
probably bullshit. synch and asynch drives are supposed to
work together on the same bus.
> 3) ignore the fact that it is showing up on the internal drive,
> it is a SCSI drive problem
probably not. i'll bet it's a bus problem being reported by the drive,
like crosstalk between a couple of lines. sun had a similar problem with
the ipc motherboard a while ago...
> 4) play with the terminators
don't do that. termination only works properly at both ends of the bus.

From: uunet!calvin.doc.ca!andrew (Andrew Patrick)

I am having very similar problems. In my case, most people have
suggested that the length of the SCSI bus is the cause. I am running
4 separate shoeboxes, with a total length of 14 feet. People tell me
that a length of 10 feet should be considered maximum.

One thing that worked for me is to rebuild the kernel without the
Synch SCSI option. This has stopped the errors completely. I am
still looking for a better solution because I think Synch SCSI would
be an advantage for me, but running asynch at least lets me run and
use the Exabyte.
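
For anyone wanting to tally their own bus (this is an illustration, not part of the original message), here is a small sketch that adds up cable segments and compares the total with a rule-of-thumb limit. The 14-foot total and the 10-foot guideline come from the message above; the split into four equal 3.5-foot segments is hypothetical, as is the function name.

```python
# Sketch: total up SCSI cable segments and compare against a guideline.
# Segment lengths are hypothetical; only the 14 ft total and the 10 ft
# rule of thumb are taken from the message above.
FEET_TO_METERS = 0.3048

def bus_length_report(segments_ft, limit_ft):
    """Return (total in feet, total in meters, within_limit)."""
    total_ft = sum(segments_ft)
    return total_ft, total_ft * FEET_TO_METERS, total_ft <= limit_ft

segments_ft = [3.5, 3.5, 3.5, 3.5]  # hypothetical: four shoeboxes
total_ft, total_m, ok = bus_length_report(segments_ft, limit_ft=10.0)
print(f"total {total_ft:.1f} ft ({total_m:.2f} m), within limit: {ok}")
```

Remember to count the cabling inside each enclosure as well as the external cables when adding up segments.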

<----------------------- END INCLUDED COMMENTS ----------------------->

If anyone has any further comments, please let me know by email and I will
post a further summary if warranted.

Regards!
Tere

***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** *****
Terence P. Ma, Ph.D.
Department of Anatomy                        If it were easy, some one
University of Mississippi Medical Center     would have done it already.
2500 North State Street                          -- anonymous
Jackson, MS 39216
VOICE: 601-984-1654                          UUCP: tpm-sprl!tpm@uunet.uu.net
***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** *****


