128Mb memory upgrade to 4/470 crashing (REVISITED and SUMMARY)

From: Cornell Kinderknecht (cornell@inoc.dl.nec.com)
Date: Thu Mar 12 1992 - 13:19:13 CST

Thanks to those who responded to my original posting:
     Subject: 128Mb memory added to 4/470--now system crashing without a trace
which commented on mysterious crashes after adding a 128Mb memory board
(Parity Systems) to our Sun 4/470's original 32Mb (for a total of 160Mb).
The problem was that the system would go down without giving a panic or
any messages in /var/adm/messages. This happened under light and
moderate loads on the system (we haven't had a heavy load on it yet).

The responses I got fell into two categories--heat and jumper positions.
I did get a panic message with the last crash and I'd like to know if
this verifies the jumpering issue.

4/470 setup:
        Original 32Mb board in slot 1 jumper position 0
        128Mb addition in slot 2 jumper position 2
        ALM-2 board in slot 7
                Since the ALM-2 board was already in slot 7 (the
                slot recommended for the 128Mb board), I put the
                128Mb board in slot 2 instead.
        eeprom -> memsize=160; memtest=160

        This worked fine for two days over the weekend and has since
        crashed three times--twice without a panic message and once
        with one.

SUMMARY of responses:
        One category of responses dealt with jumper positions.
        Several people suggested that the problem lies in the 32Mb
        board being jumpered "logically" (position 0) before the
        128Mb board (position 2). They recommended that the board
        with larger SIMMs come logically before the board with
        smaller SIMMs (i.e. jumper the 128Mb board to 0 and the
        32Mb board to 1).

        The other category of responses dealt with heat considerations.
        Heat built up by the new board could cause problems with the
        system if airflow is obstructed.

ADDITIONAL information:
        I have not yet changed any jumpers or slot positions, and the
        system has crashed again. This time I got the following messages:
                vmunix: Memory Error Register 74d3<INTR,INTENA,CE_ENA,UE,CE>
                vmunix: DVMA = 0, context = 3a, virtual address = 13f10
                vmunix: pme = 82003cc7, physical address = 798ff10
                vmunix: panic: uncorrectable ECC error
                vmunix: mem2: soft ecc addr 398ff00 syn e9<S32,S16,S8,S2,SX> No bit information
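
The names inside the angle brackets of the "syn e9<...>" field appear to
be the set bits of the hex value, listed from the high bit down. A minimal
sketch in Python of that reading follows. The bit-name table is an
assumption inferred from this one log line: only S32, S16, S8, S2 and SX
appear in the message, and the names S4, S1 and S0 are guesses that fill
the unused positions. It is not taken from any Sun documentation.

```python
# Assumed bit names for the 8-bit ECC syndrome, most significant bit first.
# Only S32, S16, S8, S2 and SX are confirmed by the log line above;
# S4, S1 and S0 are hypothetical placeholders for the bits that were clear.
SYNDROME_BITS = ["S32", "S16", "S8", "S4", "S2", "S1", "S0", "SX"]


def decode_syndrome(value):
    """Return the names of the bits set in an 8-bit syndrome, high bit first."""
    names = []
    for i, name in enumerate(SYNDROME_BITS):
        mask = 0x80 >> i  # walk from bit 7 down to bit 0
        if value & mask:
            names.append(name)
    return names


# Reproduces the flag list printed in the mem2 message for syndrome e9.
print(",".join(decode_syndrome(0xE9)))  # prints S32,S16,S8,S2,SX
```

Under that assumed table, five of the eight syndrome bits are set, which
only restates what the kernel printed; it does not identify a failing SIMM.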

        Does anyone know if these messages verify the first category
        of suggestions (jumper positioning)? The power-up memory check
        goes through without any problems.
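
For anyone comparing the two addresses in the messages, the arithmetic
can be sketched in a few lines of Python. The values are copied from the
panic output; the variable names are illustrative only, and how the
4/470 actually maps jumper positions to physical addresses is not
something this calculation establishes.

```python
MB = 1 << 20  # bytes per megabyte ("Mb" in the text above)

panic_paddr = 0x798FF10      # "physical address" from the pme line
mem2_addr = 0x398FF00        # "soft ecc addr" from the mem2 line
installed = (32 + 128) * MB  # memsize=160 from the eeprom settings

diff = panic_paddr - mem2_addr

print(f"panic address : {panic_paddr / MB:.2f} Mb into physical memory")
print(f"mem2 address  : {mem2_addr / MB:.2f} Mb")
print(f"difference    : {hex(diff)} (about {diff // MB} Mb)")
print(f"within memsize: {panic_paddr < installed}")
```

The two addresses differ by 0x4000010, i.e. 64Mb plus 16 bytes. That
would be consistent with mem2 reporting an address relative to a board
base of 64Mb, but that interpretation is a guess and would need to be
checked against the board documentation.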

        Since things appear to run OK for quite a while and then crash
        out of nowhere, it's difficult to test configurations.

