SUMMARY: Gigabit crashes 450

From: Bill Adams <badams_at_simplex.com>
Date: Tue Dec 04 2001 - 17:37:47 EST
Thanks to:

Frank Huang 	<huang@monair.com>
Wayne McCormick	<Wayne_McCormick@pancanadianenergy.com>
David Foster	<foster@dim.ucsd.edu>
John DiMarco	<jdd@cs.toronto.edu>

450 Hardware Bug

Frank had a similar problem using a SCSI-160 LSI controller card in a 450.
LSI told him that it was a design problem in the 450.  Sun resolved the
matter for LSI by replacing LSI's 450 with a newly manufactured one.

Frank and John suggested trying another PCI slot.

Wayne has two 450s with Gigabit ethernet running NetBackup with no
problems.  David is successfully using Gigabit in an Ultra 80. 

My Resolution...

...was to replace the 450 with a 420R.  I've downgraded the 450 to
100BaseT and put it to another use.  The 420R has been stable for over two
weeks using the Gigabit and Differential SCSI cards from the 450. 

---------- Forwarded message ----------
Date: Tue, 6 Nov 2001 11:34:56 -0800 (PST)
From: Bill Adams <badams@simplex.com>
To: sunmanagers@sunmanagers.org
Subject: Gigabit crashes 450

Dear All,

I'm trying to use a Sun PCI Gigabit card in an Ultra Enterprise 450
running Solaris 8.  I recently installed the card.  The 450 has for
some time been the server for NetBackup DataCenter 3.4.1, until now on 
a single 100FDX line.

If I unplumb hme0 and then configure and use the Gigabit interface ge0
instead, the system crashes within a few hours with a panic such as:
Oct 25 09:09:20 lizard unix: WARNING: uncorrectable error from pci0 (upa mid 4) during
Oct 25 09:09:20 lizard DVMA read transaction
Oct 25 09:09:20 lizard unix:  Transaction was a block operation. 
Oct 25 09:09:20 lizard unix:  AFSR=40000000.24800000 AFAR=00000000.65b68e48,
Oct 25 09:09:20 lizard double word offset=1, Memory Module 180x id 4. 
Oct 25 09:09:20 lizard unix: 
Oct 25 09:09:20 lizard panic[cpu2]/thread=2a10019fd40: 
Oct 25 09:09:20 lizard unix: Fatal PCI UE Error
Oct 25 09:09:20 lizard unix: 
Oct 25 09:09:20 lizard
Oct 25 09:09:20 lizard unix: 000002a100197e60 pcipsy:ecc_intr+1a0

The memory module and bank vary from crash to crash.
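
To compare several crashes, the AFSR/AFAR values and the reported memory module can be pulled out of the syslog text and tallied. Below is a minimal Python sketch, assuming the message layout matches the excerpt above (other Solaris releases may word these lines differently). The function name summarize_pci_ue and the regular expressions are illustrative, not part of any Sun tool.

```python
import re
from collections import Counter

# Patterns follow the layout of the log excerpt in this post (an assumption;
# adjust them if your messages are formatted differently).
AFSR_AFAR = re.compile(
    r"AFSR=([0-9a-fA-F]+)\.([0-9a-fA-F]+)\s+AFAR=([0-9a-fA-F]+)\.([0-9a-fA-F]+)"
)
MODULE = re.compile(r"Memory Module (\S+) id (\d+)")


def summarize_pci_ue(log_text):
    """Return ([(afsr, afar), ...], Counter of reported memory modules)."""
    faults = [
        (int(m.group(1) + m.group(2), 16), int(m.group(3) + m.group(4), 16))
        for m in AFSR_AFAR.finditer(log_text)
    ]
    modules = Counter(
        f"{name} id {ident}" for name, ident in MODULE.findall(log_text)
    )
    return faults, modules


# Two lines copied from the crash shown above.
sample = """\
Oct 25 09:09:20 lizard unix:  AFSR=40000000.24800000 AFAR=00000000.65b68e48,
Oct 25 09:09:20 lizard double word offset=1, Memory Module 180x id 4.
"""

faults, modules = summarize_pci_ue(sample)
for afsr, afar in faults:
    print(f"AFSR={afsr:#018x} AFAR={afar:#018x}")
print(modules.most_common())
```

Run over the full /var/adm/messages history, a tally spread across many modules would fit the observation that the module and bank change from crash to crash, which points away from a single bad DIMM.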

When just using 100FDX over hme0 the system runs clean indefinitely.  The
switch is a 3Com 4300.

I found this in the system dump:
pci_add_upstream_kstat+0x49c:   Fatal PCI UE Error

The Gigabit card is a Sun X1141A.

My unsuccessful efforts to resolve this include:
- installing the latest Gigabit 3.0 driver patch 108813-06
- applying the latest kernel patch 108528-11
- applying the latest /kernel/drv/ip patch 109279-18
- applying the latest /kernel/drv/tcp patch 109472-07
- running prtdiag, POST and OBDiag at maximum levels - clean

The patches did not resolve the crashes.  Suggestions?

The system:
Sun Ultra 450 (4 X UltraSPARC-II 296MHz)
System clock frequency: 99 MHz
Memory size: 4096 Megabytes
     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ---------------
SYS   PCI    33     4   pciclass,001000                   Symbios,53C875        
SYS   PCI    33     6   pciclass,001000                   Symbios,53C875        
SYS   PCI    33     7   pciclass,020000                   SUNW,pci-gem          
SYS   PCI    33     8   pciclass,001000                   Symbios,53C875        
OBP 3.22.0 2000/12/20 16:31   POST 6.1.0 2000/12/20 16:32
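
When moving a card to another PCI slot, it helps to see which slots are already occupied. The following is a minimal Python sketch that reads the I/O card rows of prtdiag output in the column layout shown above. The function name parse_io_cards is hypothetical, and the parser assumes six whitespace-separated columns per row. The only outside fact it relies on is that PCI base class 02 denotes a network controller.

```python
def parse_io_cards(prtdiag_text):
    """Extract PCI card rows (Brd, Type, MHz, Slot, Name, Model) as dicts."""
    cards = []
    for line in prtdiag_text.splitlines():
        fields = line.split()
        # A card row has six columns, with numeric bus frequency and slot.
        if (len(fields) == 6 and fields[1] == "PCI"
                and fields[2].isdigit() and fields[3].isdigit()):
            cards.append({
                "board": fields[0],
                "mhz": int(fields[2]),
                "slot": int(fields[3]),
                "name": fields[4],
                "model": fields[5],
            })
    return cards


# Rows copied from the prtdiag output in this post.
sample = """\
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ---------------
SYS   PCI    33     4   pciclass,001000                   Symbios,53C875
SYS   PCI    33     6   pciclass,001000                   Symbios,53C875
SYS   PCI    33     7   pciclass,020000                   SUNW,pci-gem
SYS   PCI    33     8   pciclass,001000                   Symbios,53C875
"""

cards = parse_io_cards(sample)
used_slots = [c["slot"] for c in cards]
# PCI base class 02 is "network controller".
network_slots = [c["slot"] for c in cards if c["name"].startswith("pciclass,02")]
print("occupied slots:", used_slots)
print("network card slots:", network_slots)
```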

Storage
- 19 internal disks on 5 SCSI channels, mostly configured as a RAID5 using
DiskSuite 4.2.1, but booting off /dev/dsk/c0t0d0s0
- a DLT8000 STK L700 tape library with 4 drives connected over 2
differential SCSI channels

TIA
Bill
Received on Tue Dec 4 22:37:47 2001
