Original Problem:
690 MP hangs with "Timeout - device busy" during attempted boot.
Configuration: 690 MP system:
1. 690 MP CPU board: 501-1894 (rev 09) VME slot 4/5
   4 CPUs
   SBus SCSI interface: 501-1850 (rev 02) SBus slot 0
      (3) Seagate Wren-8 Elite disk drives
   64 MB memory
2. ALM-2 16-line async ctrl: 501-1203 (rev 05) VME slot 9
3. IPI disk controller: 501-1539 (rev 09) VME slot 10
      (4) Sun 911 MB 6 MB/sec IPI disk drives
The SCSI drives are powered from a separate AC circuit since the 690 PDU is
230V only.
Symptoms:
When I power up the 690 system, I often get the following messages:
...
SBus slot f lebuffer dma le eps
SBus slot 0 eps dma
SBus slot 1
SBus slot 2
SBus slot 3
Boot device: /iommu/vme/SUNW,pn/ipi3sc@0/id@0 File and args:
Timeout - device busy (after about 30 second delay)
The IPI drives have spun up and are ready before the CPU even finishes its
internal self-tests and starts to print any messages.
The timeout symptoms APPEARED to be correlated with the SCSI disks being
powered on when the 690 system was attempting to boot -- e.g. the system
would hang with the "Timeout" message if the SCSI disks were powered on, but
would boot successfully if they were powered off.
My FE said that my original SBus SCSI interface (501-1795) was "not
supported" on a 690. He brought in the up-to-date version of that board
(501-1850), and we saw the same timeout symptoms. When we temporarily
replaced it with the newer SBus SCSI/Buffered Ethernet interface (501-1869),
the system booted normally.
However, when we went back to the 501-1850 SBus SCSI interface, we appear to
be unable to boot the system regardless of whether the SCSI disks are
powered on. If we cycle power to the CPU chassis ONLY, the system will then
successfully boot. If I cycle power to the entire 690 system (including IPI
disks, but leaving the SCSI drives powered on), the system will hang with
the timeout message at boot time.
1. Does anyone else have the Sun SBus SCSI interface (501-1850 or 501-1759) on
a 690 system, and does it present any of these symptoms?
2. Any suggestions as to why I would see these symptoms from this
interface, or am I barking up the wrong tree?
My Sun FE indicated that he could get very little help from Sun since it is
not a "supported configuration". My feeling on this is that if both the 690
CPU board and the SBus SCSI interface properly implement the SBus standard,
the interface should not prevent the system from booting.
------------------------------------------------------------------------
Responses:
From: stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
you cannot use a non-buffered (ie, old SBus scsi board) on a 600MP
because of the exact timing problem you described. the non-buffered
board tends to hold the Mbus for way too long, instead of doing
short bursts like the buffered board does. as a result, you get
some Mbus timeouts when the SCSI bus interface board is holding
onto the memory bus.
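The hold-time argument above can be sketched as a toy calculation. This is
only an illustration of the reasoning, not a model of the real hardware: the
timeout, transfer sizes, and data rates are all made-up numbers, and the
function names are hypothetical.

```python
# Toy model only: every number below is a made-up illustration,
# not a measured SBus/MBus or SCSI figure.

def hold_time_us(bytes_per_grant, bytes_per_us):
    """Time one bus grant is held while bytes_per_grant bytes move at bytes_per_us."""
    return bytes_per_grant / bytes_per_us

def waiter_times_out(bytes_per_grant, bytes_per_us, timeout_us):
    """True if another bus master waiting for the bus would exceed its timeout."""
    return hold_time_us(bytes_per_grant, bytes_per_us) > timeout_us

TIMEOUT_US = 100.0  # hypothetical bus timeout

# Unbuffered board (assumed behavior): holds the bus for a whole 4096-byte
# transfer while data trickles in at a slow device rate of 4 bytes/us.
unbuffered = waiter_times_out(4096, 4.0, TIMEOUT_US)

# Buffered board (assumed behavior): fills an on-board buffer first, then
# moves short 64-byte bursts at a faster bus rate of 32 bytes/us.
buffered = waiter_times_out(64, 32.0, TIMEOUT_US)

print("unbuffered times out:", unbuffered)
print("buffered times out:", buffered)
```

With these assumed numbers the long hold exceeds the timeout and the short
burst does not, which is the shape of the problem described, whatever the
real figures are.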
His responses to questions about standards:
a. Does the old SBus SCSI board not adhere to the SBus spec?
no.
b. Does the 600MP not adhere to the SBus spec (or MBus spec)?
no.
c. Are the specs not written well enough?
maybe. booting is an entirely different issue than running
a live system.
of course, i'm not sure of the real details and part of this
is folklore, but i do know of problems booting from an
older, unbuffered controller, mostly due to you not really
using a full Mbus when you're booting (ie, you're treating
the Mbus like a uniprocessor bus for the purposes of getting
a kernel up and running)
From: wolfgang%sunspot.nosc.mil@nosc.mil (Lewie Wolfgang)
I think the operative word is "standard", as in
SBus standard. I heard that Sun is pressing to change the
standard, rendering all the ASICs that it has been selling to
OEMs out of date. Perhaps the 690 is at the outside edge of
the envelope in anticipation of this change.
BTW, we have a 690 with 2 SCSI/Buffered Ethernet boards
and four SMD controller boards, all works well. The newer SCSI
boards don't cost all that much and they do give better performance
than the 3.0 Mbyte SMD controllers under some conditions.
(files < 4 MB)
From: Mike Raffety <miker@sbcoc.com>
I thought SCSI disks weren't supported at all on 690s ... but in any
case, check to see if your boot PROM is a "high" revision level. Ask
your FE if there's a newer version, and see if you can borrow one to
test with.
From: Jim.Seavey@West.Sun.COM (Jim Seavey - East Bay SE)
I got a response about the boot problem but I'm not so sure that it
provides much more info than we had; perhaps it confirms some of our
thoughts...The following is the response that I got:
------Begin Included Text-------
The oldest Sbus ethernet and Sbus SCSI don't work in a lot of the
newer systems because they're not tolerant of bus latency. Ie, if
the device wants to transfer something but doesn't get the Sbus
because somebody's got it for some other reason, they just drop
things. In this case, I think what's happening is that the SCSI is
trying to probe all of those drives (which takes a bit of time)
while the ipi string is being reset - but you may notice that
resetting the string takes a long time. They both want the bus, and
since only one can have it, you have a problem.
The folks in engineering speak kind of disparagingly about these two
boards, to the effect that they don't really implement Sbus
properly. The SBE/S and FSBE/S don't exhibit this problem. Recall
that an old 470 with IPI is supported in a 670 upgrade
configuration, as long as any new SCSI drives are connected via
SBE/S or better.
------End Included Text----------
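The "not tolerant of bus latency" point in the included text can be put in
rough numbers. The sketch below assumes the device has a small on-board FIFO
that keeps filling while it waits for the bus; that FIFO, its size, and the
data rate are assumptions for illustration, not documented properties of
either board.

```python
# Toy model only: FIFO sizes, data rate, and wait time are hypothetical.

def max_tolerable_wait_us(fifo_bytes, arrival_bytes_per_us):
    """Longest wait for the bus before an initially empty FIFO overflows."""
    return fifo_bytes / arrival_bytes_per_us

def bytes_dropped(fifo_bytes, arrival_bytes_per_us, wait_us):
    """Bytes lost if data keeps arriving for wait_us with no bus access."""
    arrived = arrival_bytes_per_us * wait_us
    return max(0.0, arrived - fifo_bytes)

RATE = 4.0   # assumed device data rate, bytes per microsecond
WAIT = 10.0  # assumed time the bus is held by another device, microseconds

# A device with a tiny (assumed 16-byte) FIFO loses data during the wait...
small_fifo_loss = bytes_dropped(16, RATE, WAIT)

# ...while one with a larger (assumed 256-byte) buffer rides it out.
large_fifo_loss = bytes_dropped(256, RATE, WAIT)

print("small FIFO tolerates up to", max_tolerable_wait_us(16, RATE), "us")
print("small FIFO drops", small_fifo_loss, "bytes")
print("large FIFO drops", large_fifo_loss, "bytes")
```

The longer another device (here, the IPI string reset) keeps the bus, the
more buffering a card needs to avoid dropping data.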
Summary:
It appears as though the old unbuffered SCSI interface is not
"MBus-friendly". I guess I'll have to fork over the money for a new
interface.
----------------------------------------------------------------
Doug Neuhauser Seismographic Station
doug@perry.berkeley.edu ESB 475, UC Berkeley
Phone: 510-642-0931 Berkeley, CA 94720