Sorry for the long delay in summarizing. Being the only Unix
admin/informed-type in a group is hell on task follow-through.
In late March 1991 I wrote:
>I have a Sparcstation 1+ running the 4.1.1 GENERIC kernel
>(no patches) with 2 105Mb internal Quantums and two CDC
>Wren V's in an external shoebox. Recently, the 1+ has been
>dying with a BAD TRAP error message as shown below. The
>panic usually (14 out of 17 times) follows the scsi driver
>message "esp0: Disconnected command timeout for Target 2 Lun 0",
>where target 2 is a Wren V. This is the only device on the
>scsi bus that times out. There are times when the timeout
>hasn't caused a kernel panic but they are few and far between.
>All of this points to problems with the Wren V. Has anyone
>else seen something similar to this? What are some possible
>causes for the command timeout?
Bzzzt. The symptoms were caused by two problems, and neither was the Wren V.
The first was the SCSI cabling: the ribbon cable connecting the Wren to the
DB-50 plug on the shoebox was frayed at the Wren end. The second was the 4.1.1
SCSI driver, which has problems with marginal SCSI busses. I ordered a
replacement cable and installed patch 100243-01 (which fixed the immediate
problem). Thanks to:
Kevin Sheehan synergy!kevin@Sun.COM
Randy Holt randy@den.mmc.com
Ron Gaug ron@sarah.lerc.nasa.gov
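For anyone chasing the same symptom, it helps to confirm which target the
timeouts actually come from before blaming a drive. Below is a minimal sketch
that tallies the esp timeout messages per target and LUN. It assumes the
message text quoted above and a log in /var/adm/messages; the function name
esp_timeouts is made up for illustration.

```shell
# Tally esp "Disconnected command timeout" messages per target/LUN.
# Assumptions: message text as quoted in the original post; the
# default log path is a guess and can be overridden with $1.
esp_timeouts() {
    log=${1:-/var/adm/messages}
    # Reduce each matching line to "espN target T lun L", then count.
    sed -n 's/.*esp\([0-9][0-9]*\): Disconnected command timeout for Target \([0-9][0-9]*\) Lun \([0-9][0-9]*\).*/esp\1 target \2 lun \3/p' "$log" |
        sort | uniq -c |
        awk '{ printf "%s target %s lun %s: %d\n", $2, $4, $6, $1 }'
}
```

Running "esp_timeouts" with no argument reads the default log; pass a file
name to look at a saved copy. If every line names the same target, as it did
here, check the cabling to that device as well as the device itself.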
A useful tool is Sun's patch/problem report system. It can be reached at
1-800-477-4768, login guest, and has a simple menu interface that lets you
look at known bugs, order patches on disk, tape or email, and report bugs.
Here is the README for patch 100243-01:
Patch-ID# 100243-01
Keywords: esp scsi recovery
Synopsis: SunOS 4.1.1 sun4c:esp host adapter can cause panic during error recovery
Date: 11-Mar-91
SunOS release: 4.1.1
Unbundled Product:
Unbundled Release:
Topic: scsa/esp host adapter
BugId's fixed with this patch: 1046580,1048141,1046305
Architectures for which this patch is available: sun4c
Patches which may conflict with this patch:
Obsoleted by: SVR4, 4.1.2
Problem Description:
1046580:
During some portions of SCSI error recovery, the target driver
can attempt to get the host adapter driver to send either
a BUS DEVICE RESET message or an ABORT OPERATION message to
a target that appears to have had a command time out while
disconnected.
The problem is that the code in esp.c that forms a proxy command
to send to the target has a bug in it which can write random values
over a random portion of the esp's softc structure. This can wipe
out portions of important data in the softc structure, including
putting a garbage value into a pointer to the DMA gate array CSR.
1048141: esp does not always recognize a marginal SCSI bus
1046305: some XXgetcap cases reversed. Only affects 3rd party SCSI target
drivers.
INSTALL:
as root:
mv /sys/sun4c/OBJ/esp.o /sys/sun4c/OBJ/esp.o.orig
cp sun4c/esp.o /sys/sun4c/OBJ/esp.o
chmod 444 /sys/sun4c/OBJ/esp.o
Rebuild and install a new kernel and reboot the system.
Please refer to the Systems and Networking Administration
Manual on building and installing a custom kernel.
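The three copy steps above can be wrapped in a small function that refuses to
run twice, so the saved esp.o.orig is never overwritten by an already patched
object. This is only a sketch: the function name install_esp_patch, its
arguments, and the safety checks are illustrative additions and not part of
the Sun README. It assumes the patched object belongs in the same OBJ
directory the original is backed up from.

```shell
# Install a patched esp.o, keeping the original as esp.o.orig.
# $1: patched object (default sun4c/esp.o, as in the README)
# $2: kernel object directory (default /sys/sun4c/OBJ; assumed to be
#     where the patched object belongs, since the backup is made there)
# The argument form and the checks below are illustrative additions.
install_esp_patch() {
    new=${1:-sun4c/esp.o}
    objdir=${2:-/sys/sun4c/OBJ}
    if [ ! -f "$new" ]; then
        echo "missing patch object: $new" >&2
        return 1
    fi
    if [ ! -f "$objdir/esp.o" ]; then
        echo "no esp.o in $objdir" >&2
        return 1
    fi
    if [ -f "$objdir/esp.o.orig" ]; then
        echo "backup $objdir/esp.o.orig already exists" >&2
        return 1
    fi
    mv "$objdir/esp.o" "$objdir/esp.o.orig" &&
        cp "$new" "$objdir/esp.o" &&
        chmod 444 "$objdir/esp.o"
}
```

Run as root from the directory the patch was unpacked in, with no arguments,
this does the same thing as the manual steps. The kernel still has to be
rebuilt and installed afterwards as described above.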
Kean
Kean Stump (503)-737-4740
OSSHE Network Operations Center DOMAIN: kean@ucs.orst.edu
Oregon State System of Higher Education UUCP: hplabs!hp-pcd!orstcs!kean