SUMMARY: SCSI errors

From: Veselin Terzic (terzic@deneb.mda.ca)
Date: Wed Feb 15 1995 - 11:19:21 CST


Hello,

Here is the original question:
> My platform is "SunOS deneb 4.1.3_U1 1 sun4c". (Rev B)
> Every so often I get error messages like the ones below on deneb's console.
> I have three hard drives (2 internal, 1 external). The SCSI cable is shorter
> than 1 m. I tried replacing the SCSI cable, the terminator (I used internal
> termination), the external hard drive (sd3), and finally the motherboard,
> but nothing helped.
> Is this an OS bug? Is there a patch?
> esp0: data transfer overrun
> State=DATA Last State=DATA_DONE
> Latched stat=0x11<XZERO,IO> intr=0x10<BUS> fifo 0x1
> last msg out: <unknown msg 0xff>; last msg in: COMMAND COMPLETE
> DMA csr=0x80000000
> addr=fff07800 last=fff05800 last_count=2000
> Cmd dump for Target 0 Lun 0:
> cdb=[ 0x28 0x0 0x0 0x28 0x91 0x30 0x0 0x0 0x10 0x0 ]
> pkt_state 0xb<XFER,SEL,ARB> pkt_flags 0x0 pkt_statistics 0x2
> cmd_flags=0x21 cmd_timeout 35
> Mapped Dma Space:
> Base = 0x5800 Count = 0x2000
> Transfer History:
> Base = 0x5800 Count = 0x2000
> current phase 0x26=DATAIN stat=0x11 0x2000
> current phase 0x20=SELECT stat=0x10 0x0 0x0
> current phase 0x1=CMD_START stat=0x10 0x28 0x20
> current phase 0xb=CMD_CMPLT stat=0x17 0x2000
> current phase 0x27=STATUS stat=0x17 0x0
> current phase 0xb=CMD_CMPLT stat=0x13
> current phase 0x26=DATAIN stat=0x11 0x2000
> current phase 0x20=SELECT stat=0x10 0x0 0x0
> current phase 0x1=CMD_START stat=0x10 0x28 0x20
> current phase 0xb=CMD_CMPLT stat=0x17 0x2000
> current phase 0x27=STATUS stat=0x17 0x0
> current phase 0xb=CMD_CMPLT stat=0x13
> current phase 0x26=DATAIN stat=0x11 0x2000
> current phase 0x1b=RESEL stat=0x17 0x0 0x0
> current phase 0x5=MSG_IN stat=0x17 0x4
> current phase 0x28=DISCONNECT stat=0x17 0x2000
> Jan 25 20:59:09 deneb vmunix: esp0: Target 0.0 reducing sync. transfer rate
> Jan 25 20:59:09 deneb vmunix: esp0: Reverting to slow SCSI cable mode
> Jan 25 20:59:09 deneb vmunix: sd3: SCSI transport failed: reason 'data_ovr': retrying command
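
A note on reading the command dump above: the cdb line can be decoded by hand. Opcode 0x28 is READ(10), so bytes 2-5 hold the logical block address and bytes 7-8 the block count. The sketch below is only an illustration; the function name decode_read10 is made up here, and the 512-byte block size is an assumption about the disk, not something the log states.

```python
import struct

def decode_read10(cdb, block_size=512):
    """Decode a 10-byte READ(10) CDB (block_size is an assumed value)."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Bytes 2-5: big-endian logical block address.
    lba, = struct.unpack(">I", bytes(cdb[2:6]))
    # Bytes 7-8: big-endian transfer length in blocks.
    blocks, = struct.unpack(">H", bytes(cdb[7:9]))
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}

# CDB copied from the "Cmd dump for Target 0 Lun 0" line in the log.
cdb = [0x28, 0x0, 0x0, 0x28, 0x91, 0x30, 0x0, 0x0, 0x10, 0x0]
info = decode_read10(cdb)
print(hex(info["lba"]), info["blocks"], hex(info["bytes"]))
```

Under that block-size assumption, the 16-block request works out to 0x2000 bytes, which agrees with the "Count = 0x2000" shown under Mapped Dma Space. So the request itself looks ordinary, and the overrun is reported against a normal 8 KB read.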

After I replaced the motherboard and terminated the external SCSI hard drive
internally, the problem appeared again the next day, but only once. I have been
watching the system for 10 days and the errors have not come back, and I have
no way to reproduce the situation. Before replacing the motherboard
I had tried many short and long SCSI cables and active and passive terminators,
and had removed the CD-ROM from the SCSI chain, but none of that helped.

Many thanks to everyone who replied!

Here is what I got:
>From: Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
Could be badly matched impedance on the cables, and < 1m for a given cable
is not the rule. The rule is: use the little 6 inch jobbies, or the shortest
you can beg, borrow, or steal!

>From: "Ricardo Ruiz (SSD)" <rruiz@Census.GOV>
I'm having similar problems with a SPARC10 running SunOS v4.1.3_U1.
After much cable switching and moving things around, I called Sun, and
their answer was that the SCSI connector had gone bad and I needed to replace
the motherboard. Total cost: $5000, more or less.

>From: Tomasz Wolniewicz <twoln@mat.uni.torun.pl>
My guess would be that you are mixing fast SCSI with ordinary SCSI devices
and they do not behave. I would also strongly suspect the SCSI cable; try
changing things around.

>From: patp@juliet.ll.mit.edu ( Patrick Pawlak )
Is it possible that you have a mix of SCSI-1 and SCSI-2 devices on the same
SCSI bus, or is it possible that you have passive instead of active
termination?
Either one of these things could cause problems like what you describe.

>From: John Goggin - LTX Tech Support <jgoggin@ltx.com>
1) Do you have any other items on the SCSI bus than the disks, such as tape
units or a CD-ROM drive?

2) How long is the cable from the back of the Sparc to the external disk drive?

In my experience, total SCSI bus length is CRITICAL. Having more than 2-3
external units connected with Sun's standard 2 foot SCSI cable will cause this
problem.

>From: bismark@alta.jpl.nasa.gov (Bismark Espinoza)
What SCSI types do you have on the SCSI
daisy chain? SCSI1, SCSI2 and SCSI2-fast
don't like to be on the same string.

Also, check the kernel for changes in SCSI
configuration from the original one given by SUN.

>From: david.warm@fi.gs.com (David Warm)
I had a similar problem last night on a server (a Sparc 10 running
4.1.3). I am changing the motherboard (it contains the esp0).

>From: Manfred Liebchen <liebchen@rrz.Uni-Koeln.DE>
Hi,
I have observed the same error message on my sun4m (SPARC20) with a
4 GB Seagate Barracuda disk (ST15150N), although I am using this disk in
asynchronous mode (scsi_options 0x58). I am running Solaris 2.3.
Seagate suggested that I verify that the newest firmware release
is installed on the disk (it is: 0017). Nevertheless the error
remains. Please let me know if you receive any solutions.
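
For anyone wondering why scsi_options 0x58 means asynchronous mode: the value is a bit mask, and 0x58 leaves the synchronous-transfer bit clear. The sketch below decodes it. The bit assignments are an assumption taken from my recollection of the Solaris SCSI header definitions (SCSI_OPTIONS_DR, _LINK, _SYNC and so on), so please check them against the headers on your own release. The table and function names are made up for this example.

```python
# Assumed scsi_options bit values; verify against your system's headers.
SCSI_OPTION_BITS = {
    0x008: "DR (disconnect/reconnect)",
    0x010: "LINK (linked commands)",
    0x020: "SYNC (synchronous transfer)",
    0x040: "PARITY",
    0x080: "TAG (tagged queuing)",
    0x100: "FAST (fast SCSI)",
    0x200: "WIDE (wide SCSI)",
}

def decode_scsi_options(value):
    """Return the names of the option bits that are set in value."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if value & bit]

enabled = decode_scsi_options(0x58)
sync_enabled = bool(0x58 & 0x020)
print(enabled, "sync enabled:", sync_enabled)
```

With those assumed values, 0x58 is 0x40 + 0x10 + 0x08, that is parity, linked commands and disconnect/reconnect, with synchronous transfers switched off. If the overrun still appears with that setting, that hints the cause may not be the synchronous transfer rate alone.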

Veselin Terzic | MacDonald Dettwiler
MIME accepted terzic@mda.ca | 13800 Commerce Parkway
Phone: (604) 278-3411 Fax: 278-3786| Richmond, B.C.;Canada V6V 2J3
#include <disclaimers.h> | Key ID: 0xE24588D5 at keyservers
