SUMMARY: 690 boot timeout

From: Doug Neuhauser (doug@perry.berkeley.edu)
Date: Wed Aug 26 1992 - 17:13:52 CDT


Original Problem:
690 MP hangs with "Timeout - device busy" during attempted boot.

Configuration: 690 MP system:
1. 690 MP CPU board: 501-1894 (rev 09) VME slot 4/5
        4 CPUs
        SBus SCSI interface: 501-1850 (rev 02) SBus slot 0
                (3) Seagate Wren-8 Elite disk drives
        64 MB memory
2. ALM-2 16-line async ctrl 501-1203 (Rev 05) VME slot 9
3. IPI disk controller: 501-1539 (rev 09) VME slot 10
          (4) Sun 911 MB 6 MB/sec IPI disk drives

The SCSI drives are powered from a separate AC circuit since the 690 PDU is
230V only.

Symptoms:
When I power up the 690 system, I often get the following messages:

        ...
        SBus slot f lebuffer dma le eps
        SBus slot 0 eps dma
        SBus slot 1
        SBus slot 2
        SBus slot 3

        Boot device: /iommu/vme/SUNW,pn/ipi3sc@0/id@0 File and args:
        Timeout - device busy (after about a 30-second delay)

The IPI drives have spun up and are ready before the CPU even finishes its
internal self-tests and starts to print any messages.

The timeout symptoms APPEARED to be correlated to the SCSI disks being
powered on when the 690 system was attempting to boot -- e.g. the system
would hang with the "Timeout" message if the SCSI disks were powered on, but
would boot successfully if they were powered off.

My FE said that my original SBus SCSI interface (501-1795) was "not
supported" on a 690. He brought in the up-to-date version of that board
(501-1850), and we saw the same timeout symptoms. When we temporarily
replaced it with the newer SBus SCSI/Buffered Ethernet interface (501-1869),
the system booted normally.

However, when we went back to the 501-1850 SBus SCSI interface, we appear to
be unable to boot the system regardless of whether the SCSI disks are
powered on. If we cycle power to the CPU chassis ONLY, the system will then
successfully boot. If I cycle power to the entire 690 system (including IPI
disks, but leaving the SCSI drives powered on), the system will hang with
the timeout message at boot time.

1. Does anyone else have the Sun SBus SCSI interface (501-1850 or 501-1759) on
a 690 system, and does it present any of these symptoms?
2. Any suggestions as to why I would see these symptoms from this
interface, or am I barking up the wrong tree?

My Sun FE indicated that he could get very little help from Sun since it is
not a "supported configuration". My feeling on this is that if both the 690
CPU board and the SBus SCSI interface properly implement the SBus standard,
then the interface should not prevent the system from booting.

------------------------------------------------------------------------
Responses:

From: stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)

        you cannot use a non-buffered (ie, old SBus scsi board) on a 600MP
        because of the exact timing problem you described. the non-buffered
        board tends to hold the Mbus for way too long, instead of doing
        short bursts like the buffered board does. as a result, you get
        some Mbus timeouts when the SCSI bus interface board is holding
        onto the memory bus.

His responses to questions about standards:
a. Does the old SBus SCSI board fail to adhere to the SBus spec?
        no.
b. Does the 600MP fail to adhere to the SBus spec (or MBus spec)?
        no.
c. Are the specs not written well enough?
        maybe. booting is an entirely different issue than running
        a live system.
        of course, i'm not sure of the real details and part of this
        is folklore, but i do know of problems booting from an
        older, unbuffered controller, mostly due to you not really
        using a full Mbus when you're booting (ie, you're treating
        the Mbus like a uniprocessor bus for the purposes of getting
        a kernel up and running)
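
The bus-hold argument above can be sketched with a toy model. This is only an
illustration under made-up assumptions, not a description of the real MBus or
SBus hardware: the transfer size, the burst size, the timeout, and the
function names are all hypothetical. It shows why one long hold can push a
waiting bus master past its timeout while short bursts do not.

```python
def worst_case_wait(total_words, burst_words, cycles_per_word=1):
    """Longest stretch (in bus cycles) another master can be kept waiting.

    Toy assumption: the device releases the bus after every burst, so a
    waiting master is blocked for at most one burst.
    """
    return min(total_words, burst_words) * cycles_per_word


def times_out(total_words, burst_words, timeout_cycles):
    """True if a waiting master would exceed its (hypothetical) timeout."""
    return worst_case_wait(total_words, burst_words) > timeout_cycles


TRANSFER = 2048   # words in one SCSI transfer (made-up number)
TIMEOUT = 256     # cycles a waiting master tolerates (made-up number)

# Unbuffered board: the whole transfer is one long bus hold.
unbuffered = times_out(TRANSFER, burst_words=TRANSFER, timeout_cycles=TIMEOUT)
# Buffered board: the same transfer split into short bursts.
buffered = times_out(TRANSFER, burst_words=16, timeout_cycles=TIMEOUT)

print("unbuffered (one long hold):", "timeout" if unbuffered else "ok")
print("buffered (16-word bursts): ", "timeout" if buffered else "ok")
```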

From: wolfgang%sunspot.nosc.mil@nosc.mil (Lewie Folwfang)

                I think the operative word is "standard", as in
        SBus standard. I heard that Sun is pressing to change the
        standard, rendering all the ASICs that it has been selling to
        OEMs out of date. Perhaps the 690 is at the outside edge of
        the envelope in anticipation of this change.

                BTW, we have a 690 with 2 SCSI/Buffered Ethernet boards
        and four SMD controller boards, all works well. The newer SCSI
        boards don't cost all that much and they do give better performance
        than the 3.0 Mbyte SMD controllers under some conditions.
        (files < 4 MB)

From: Mike Raffety <miker@sbcoc.com>

        I thought SCSI disks weren't supported at all on 690s ... but in any
        case, check to see if your boot PROM is a "high" revision level. Ask
        your FE if there's a newer version, and see if you can borrow one to
        test with.

From: Jim.Seavey@West.Sun.COM (Jim Seavey - East Bay SE)

        I got a response about the boot problem but I'm not so sure that it
        provides much more info than we had; perhaps it confirms some of our
        thoughts...The following is the response that I got:

        ------Begin Included Text-------

        The oldest Sbus ethernet and Sbus SCSI don't work in a lot of the
        newer systems because they're not tolerant of bus latency. Ie, if
        the device wants to transfer something but doesn't get the Sbus
        because somebody's got it for some other reason, they just drop
        things. In this case, I think what's happening is that the SCSI is
        trying to probe all of those drives (which takes a bit of time)
        while the ipi string is being reset - but you may notice that
        resetting the string takes a long time. They both want the bus, and
        since only one can have it, you have a problem.
        
        The folks in engineering speak kind of disparagingly about these two
        boards, to the effect that they don't really implement Sbus
        properly. The SBE/S and FSBE/S don't exhibit this problem. Recall
        that an old 470 with IPI is supported in a 670 upgrade
        configuration, as long as any new SCSI drives are connected via
        SBE/S or better.

        ------End Included Text----------
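
The "not tolerant of bus latency" explanation in the included text can also
be sketched as a toy simulation. Everything here is an assumption made for
illustration (one byte arriving per tick, a FIFO that empties in a single
tick once the bus is granted, the specific depths and latencies); it is not
a model of the actual boards. It only shows that a device with little or no
buffering loses data while it waits for the bus, and a device with enough
buffering does not.

```python
def dropped_bytes(fifo_depth, grant_latency, incoming=1000):
    """Count bytes lost while a device waits for the bus.

    Toy assumptions: one byte arrives per tick; the device asks for the
    bus as soon as it holds data; the bus is granted only after more than
    `grant_latency` ticks of waiting, and the FIFO is then emptied at once.
    """
    fifo = 0      # bytes currently buffered
    waiting = 0   # ticks spent waiting for the bus
    dropped = 0   # bytes that arrived while the FIFO was full
    for _ in range(incoming):
        if fifo < fifo_depth:
            fifo += 1
        else:
            dropped += 1
        waiting += 1
        if waiting > grant_latency:
            fifo = 0
            waiting = 0
    return dropped


# Hypothetical depths and latencies, chosen only to show the contrast.
print("1-byte buffer, idle bus :", dropped_bytes(fifo_depth=1, grant_latency=0))
print("1-byte buffer, busy bus :", dropped_bytes(fifo_depth=1, grant_latency=8))
print("64-byte FIFO,  busy bus :", dropped_bytes(fifo_depth=64, grant_latency=8))
```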

Summary:
        It appears as though the old unbuffered SCSI interface is not
"MBus-friendly". I guess I'll have to fork over the money for a new
interface.

----------------------------------------------------------------
Doug Neuhauser                  Seismographic Station
doug@perry.berkeley.edu         ESB 475, UC Berkeley
Phone: 510-642-0931             Berkeley, CA 94720


