SUMMARY: Ciprico 3523 causes watchdog reset

From: Doug Neuhauser (doug@perry.berkeley.edu)
Date: Wed Aug 14 1991 - 18:48:18 CDT


Original problem description:
========================================================================
Ciprico 3523 causes watchdog reset

I am having problems getting a Ciprico Rimfire 3523 VME SCSI controller to
work in my 4/490 CPU.

Software:
SunOS 4.1.1 with Ciprico driver (Revision 2.3, November 10, 1990). Driver
configured for (2) Wren 7 disk drives (WrenSC in Ciprico's options table).

Patches:
100173-03: Date: 01/April/91 NFS Jumbo Patch
100174-01: Date: 03-Dec-90 SunOS 4.1.1: fixes for tmpfs bugs.
100259-01: Date: 02/Apr/91 SunOS 4.1.1: ufs_inactive patch
100262-01: Date: 04-April-91 Fix to prevent zero divide panic originating from stclose().
           (Apparently supersedes 100211-02, 100250-01)
100261-02: Date: 08/April/91 Confirmed that the 4/490 was sending out misaligned frames.
100228-02: Date: 05/Mar/91 Kernel panics with "panic:psig" or "panic: psig action"

Backplane configuration:
                                Jumper  Jumper
Slot:   Item                    BG3     IACK
----------------------------------------------------------------
1       Memory - 32 MB          NA      NA
2                               NA      NA
3                               NA      NA
4       CPU 4/490               OUT     OUT
5                               IN      IN
6       Ciprico 3523            OUT     OUT
7       Memory - 32 MB          IN      IN
8       Sun SCSI-3              OUT     OUT
9       ALM-2                   IN      OUT
10      IPI disk controller     OUT     OUT

I started with my current conf file, which is pared down from Sun's GENERIC
conf file. After removing the line that references the undefined parameter
B_IOCACHE in the driver cs35.c, the kernel builds fine. The kernel will
boot if the Ciprico controller is not installed in the backplane. Likewise,
my old kernel, configured without the Ciprico controller, works fine with
the Ciprico controller in the backplane. However, when I use the new kernel
with the controller installed, I get a Watchdog Reset after the device probe
on the Ciprico controller. This happens regardless of whether I have the
SCSI cable and drives connected to the controller.

I enabled full debugging in the device driver (with the SCSI bus
disconnected) and I get:

cf_probe enter (controller 0, address 0xff019000)
Controller is a 3500
Waiting 5 seconds for SCSI bus reset.
cfc0 at vme16d32 0x5000 vec 0xfc
cfslave enter: (device 0xf8148234, rf 0xff019000)
cfslave: Setting General Board Options
cf_sglcmd: pb=448 msw=0 lsw=448
cf_sglcmd: Waiting for ST_CC
cf_sglcmd: status port = 203
  0 0 0 0 1 29 6 ff 0 0 0 0 0 0 0 0
  7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 80 0 0 0 0
  0 0 0 0

cfslave: Setting Unit Options
cf_sglcmd: pb=448 msw=0 lsw=448
cf_sglcmd: Waiting for ST_CC
cf_sglcmd: status port = 202
  0 0 0 0 86 a0 0 ff 1 0 1f 3 0 0 18 2
  8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 80 0 0 0 0
  0 0 0 0

cfslave: Testing for device existence
cf_sglcmd: pb=448 msw=0 lsw=448
cf_sglcmd: Waiting for ST_CC
cf_sglcmd: status port = 203
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0

Watchdog reset.
>
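
The pb, msw and lsw values in the trace above look like a 32-bit parameter
block address split into two 16-bit halves, presumably so it can be written
to 16-bit wide controller registers. That reading is an assumption based
only on the trace, not on the Ciprico driver source, and the function name
below is hypothetical. A minimal sketch of the arithmetic, assuming the
printed values are hexadecimal:

```python
def split_words(addr):
    """Split a 32-bit address into (most, least) significant 16-bit words."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    msw = (addr >> 16) & 0xFFFF   # upper 16 bits
    lsw = addr & 0xFFFF           # lower 16 bits
    return msw, lsw


# Reproduce the values shown in the trace for pb=448.
msw, lsw = split_words(0x448)
print("pb=%x msw=%x lsw=%x" % (0x448, msw, lsw))
```

With pb=448 this gives msw=0 and lsw=448, which is consistent with the three
cf_sglcmd lines in the trace.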

I have tried:
1. Configuring only 1 disk. Same result.
2. Another (new) controller from Ciprico. Same result.
3. Rearranging the I/O controllers in the backplane, which involved trying
the Ciprico controller in 3 different slots. Same result.
4. Removing all but (2) memory boards, CPU, Ciprico and IPI disk controller.
Same result.
5. Swapped ALM-2 and Sun SCSI boards so they were in the "right order"
according to the Sun backplane config manual. Same result.
6. Removing the second memory board, and moving all of the I/O controllers
down so that slots 6 and 7 were empty (one of Ciprico's suggestions). Same
result.

The problem appears to be software related, but I am running out of ideas.
Any suggestions or experiences with this controller would be greatly
appreciated.
========================================================================
Several people responded with suggestions or requests for my solution:

1. From: Mike Raffety <miker@sbcoc.com>

I seem to recall something about the 490 PROM not working correctly with
SCSI drives (that's why Sun doesn't support 'em) ... but that Sun came out
with a newer PROM rev to make it work anyway ... might want to check that
one out.

[My Sun FE had no info on this.]

2. From: webber@world.std.com (Robert D Webber)

Just a wild-ass guess: jumper Xylogics emulation OFF on your 3523 if you
haven't done so already. Insert jumpers in JD4L and JTL (second position
from end closest to VME connectors) as shown on page 2-3 of the Rimfire 3500
Installation Guide, Ciprico publication number 21018002.

[I neglected to specify that I had already disabled both tape and disk boot
emulation.]

3. From: shj@ultra.com (Steve Jay)

4/4xx CPUs prior to revision 12 were known to have VMEbus problems. If
your CPU is earlier than that, get it upgraded. I'm not sure if the
problems included watchdog resets. We also know a way to get a watchdog
reset on a 4/4xx, but it involves interaction between the on-board ethernet
controller and other VMEbus network interfaces (like FDDI or UltraNet). I
can't see how that would apply to your case.

When you get the watchdog reset, enter the "d" command at the console prompt.
This will show you the registers, including the PC, at the time of the
reset. My experience, confirmed by a Sun guru, is that you can trust
the output of "d" after a watchdog reset. The combination of the PC when
it crashes, adb, and the driver source code should let you figure out
exactly what the driver was doing when the crash occurs.
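
Mapping a PC back to a routine amounts to finding the last symbol whose
address is at or below the PC. The sketch below illustrates that lookup
only; it is not how adb is implemented. The addresses are made up for
illustration, the routine names are taken from the debug trace in the
original posting, and the table is assumed to be sorted by address (the
kind of listing "nm -n" produces for a kernel image).

```python
import bisect

# Hypothetical (address, name) pairs, sorted by address.
# The addresses are invented; only the names come from the trace.
SYMBOLS = [
    (0xF8040000, "cf_probe"),
    (0xF8040200, "cfslave"),
    (0xF8040600, "cf_sglcmd"),
]


def symbolize(pc, symbols=SYMBOLS):
    """Return (name, offset) for the last symbol at or below pc, else None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None              # pc is below the first known symbol
    addr, name = symbols[i]
    return name, pc - addr


hit = symbolize(0xF8040634)      # a made-up PC value
print("%s+0x%x" % hit)
```

A name plus offset can then be compared against a disassembly of that
routine to find the instruction, and from there the source line.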

4. From: perw@holtec.se (Per Westerlund)

Which rev is your CPU board? There are hardware problems that also
make it impossible to use SMD controllers at the same time as IPI
controllers (needs rev-12). To use 1/2-inch tape (Xylogics controller)
you even need rev-14.

I have heard that the same kind of problem can be caused by any
controller doing fast DMA over the VME bus, and I think the Ciprico is
quite fast.

[ I have a rev 14 CPU board. ]

5. From: Wilson N G <noel@essex.ac.uk>

To get the symptoms you describe - i.e. a loop with interrupts turned off so
the kernel clock code is not run, and the watchdog timer is thus not reset,
the problem has to be in the ciprico driver, which probably has not been
4.1.1-ised.
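
The mechanism described above can be shown with a toy model: a status
polling loop that runs while the clock handler cannot, so nothing resets
the watchdog. This is only a sketch under stated assumptions. It does not
reproduce the Sun watchdog hardware or the Ciprico driver, and the class
names, DONE_BIT value and tick limit are all hypothetical. It also shows
how bounding the number of polls lets the loop give up before the watchdog
expires.

```python
class WatchdogReset(Exception):
    """Raised by the toy model when the watchdog is not kicked in time."""


class Watchdog:
    def __init__(self, limit):
        self.limit = limit       # ticks allowed between kicks
        self.elapsed = 0

    def tick(self):
        self.elapsed += 1
        if self.elapsed > self.limit:
            raise WatchdogReset("watchdog expired")

    def kick(self):
        # In a real system the clock interrupt handler would do this.
        self.elapsed = 0


DONE_BIT = 0x1                   # hypothetical "command complete" bit


def wait_for_status(read_status, watchdog, max_polls=None):
    """Spin until DONE_BIT is set in read_status().

    Interrupts are modelled as masked, so kick() is never called here.
    If max_polls is given, give up after that many polls and return False.
    """
    polls = 0
    while not (read_status() & DONE_BIT):
        watchdog.tick()
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
    return True


def never_done():
    return 0x0                   # a controller that never reports completion


try:
    wait_for_status(never_done, Watchdog(limit=100))
    outcome = "completed"
except WatchdogReset:
    outcome = "watchdog reset"
print(outcome)

# The same wait with a poll limit returns False instead of resetting.
print(wait_for_status(never_done, Watchdog(limit=100), max_polls=10))
```

In the model, the unbounded wait ends in a reset while the bounded one
returns a failure that the caller can report.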

6. From: webber@world.std.com (Robert D Webber)

I've been chewing on this on and off. The only other thing I can think of
is changing the address of the controller. The only times I've seen
watchdog resets have been when I've flipped the diag/norm switch or had a
bad Fujitsu disk hanging off Xylogics 450, and then only when using format
on the disk. In other words, it's been basically due to hardware.

========================================================================
Solution:

Ciprico examined the first controller that I returned, and could find
nothing wrong with it.

I tried various combinations of:
a. reducing configuration to only 1 drive.
b. starting with a full GENERIC conf file and adding the controller and 1
drive to it.

All of the above still produced a watchdog reset.

I tried enabling more debugging in the Ciprico driver by defining the symbol
TRACE_ON, which enables tracing of subroutine entries and exits. With both
TRACE_ON and TURNON_DEBUG, the system would boot! When I connected drives
to the controller, the drives indeed worked. Seemed like a timing problem
to me.

The Ciprico tech support rep that my VAR was dealing with mentioned that
there was a jumper on some of the controllers that affected timing. A quick
check of the manual showed an "Exact Burst / Normal mode" jumper (with no
description, of course). He suggested that we try it.

With the "Exact Burst" jumper installed, the system boots and works
apparently correctly both with and without debugging code enabled in the
driver. The tech support rep at Ciprico that I dealt with still has no
explanation as to why I should have to run with that jumper enabled. They
maintain that other customers are using the controller in 4/490 systems with
IPI drives, and have no problem.

Can you say "voodoo"? ...
----------------------------------------------------------------
Doug Neuhauser Seismographic Station
doug@perry.berkeley.edu ESB 475, UC Berkeley
Phone: 415-642-0931 Berkeley, CA 94720


