SUMMARY: Why did we panic?

From: Oran Davis (oran@spg.amdahl.com)
Date: Thu Feb 20 1992 - 05:49:23 CST


Managers,

A late summary. Thanks very much to all those who took the time to answer.
Only one answer, from Hal Stern, was actionable.
I have not done anything, and the system has been up since - touch wood.

I have included all answers since some are amusing.

>- Oran

>Lately one of my users' IPCs panicked. It then rebooted and has now been up
>for two days.
>Any pointers to the nature of the problem (patch?) appreciated.
>
> >- Oran
>.....
>DVMA Parity Error, ctx = 0x0, virt addr = 0xfff14bc0
>pme = e2000973, phys addr = 973bc0
>Parity Error Register 91<ERROR,CHECK,ERR24>
> bad module/chip at: ?
>System operation cannot continue, will test location anyway.
>parity error at 973bc0 is transient.
>panic: dvma parity error
>esp0: Unrecoverable DMA error on dma send
>sd0: SCSI transport failed: reason 'tran_err': retrying command
>syncing file systems... esp0: Target 3 now Synchronous at 4.167 mb/s max transmit rate
>......
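
The numbers in the panic message above are at least self-consistent: the
reported physical address appears to be the low bits of the pme combined with
the page offset of the virtual address. A minimal Python sketch of that
arithmetic follows. It assumes 4 KB pages and that the low 16 bits of the pme
hold the page frame number; neither is stated in the thread, and the function
and constant names are hypothetical, chosen only for illustration.

```python
import re

PAGE_SHIFT = 12    # assumption: 4 KB pages
PFN_MASK = 0xFFFF  # assumption: low 16 bits of the pme are the page frame number

# The two relevant lines from the panic message, copied verbatim.
LOG = """\
DVMA Parity Error, ctx = 0x0, virt addr = 0xfff14bc0
pme = e2000973, phys addr = 973bc0
"""

def parse_parity_log(text):
    """Extract (virt, pme, phys) as integers from the panic text."""
    virt = int(re.search(r"virt addr = (0x[0-9a-f]+)", text).group(1), 16)
    pme = int(re.search(r"pme = ([0-9a-f]+)", text).group(1), 16)
    phys = int(re.search(r"phys addr = ([0-9a-f]+)", text).group(1), 16)
    return virt, pme, phys

def phys_from_pme(virt, pme):
    """Combine the assumed page frame number with the page offset of virt."""
    frame = pme & PFN_MASK
    offset = virt & ((1 << PAGE_SHIFT) - 1)
    return (frame << PAGE_SHIFT) | offset

if __name__ == "__main__":
    virt, pme, phys = parse_parity_log(LOG)
    print(hex(phys_from_pme(virt, pme)), hex(phys))
```

Under those assumptions the computed address equals the "phys addr" in the
log, which is the location the kernel retested and reported as transient.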

--------------------------------------------------
From: celeste@stokely.mtview.ca.us (Celeste Stokely)
To: spg.amdahl.com!oran@netcom.com
Subject: Re: Whay did we panic?

You panicked because of a random memory parity error. If it only happens
once, chalk it up to cosmic rays. If it continues, you may have a SIMM that
has worked loose, or a bad SIMM.

..Celeste Stokely
--------------------------------------------------
From: ron@arfalas.horizon.com (Ron McDaniels)
To: oran@spg.amdahl.com
Subject: Re: Whay did we panic?
Cc: brummp@spg.amdahl.com

Very familiar. I installed a SIMM that had a flaky chip. Memory parity
errors are recoverable if detected by the CPU, but fatal if they occur
during I/O DMA. My problems went away after I found and replaced the bad
SIMM (I found it on a single-board computer my company makes; we make a
decent diagnostic).

Ron McDaniels
--------------------------------------------------
From: admin%esrg@hub.ucsb.edu (system administrator)
To: oran@spg.amdahl.com
Subject: Re: Whay did we panic?

If you find out why, could you mail me?
I think mine did the same thing...
The following is from /var/adm/messages

:
--------------------------------------------------
From: hobbit@ftp.com (*Hobbit* )
Reply-To: hobbit@ftp.com
Sender: hobbit-e@ftp.com
Repository: babyoil.ftp.com
Originating-Client: cukes

Gamma rays from the planet Mongo

_H*
--------------------------------------------------
From: stern@sunne.East (Hal Stern - NE Area Systems Engineer)
To: oran@spg.amdahl.com
Subject: Re: Whay did we panic?

bug in the esp driver.

get patch 100343-01, which fixes this and gets you
1.3Gbyte disk features.

--hal
--------------------------------------------------
From: Brendan Kehoe <brendan@cs.widener.edu>
To: oran@spg.amdahl.com
Subject: Re: Whay did we panic?
Newsgroups: widener.mail.sun-managers
In-Reply-To: <ko02fhINN7vm@cs.widener.edu>
Organization: Widener University Computer Science Dept, Chester PA
Cc:

The parity error message means one of your memory chips (probably;
possibly a different one, given the `bad module/chip at: ?' msg) has gone bad.
Give your hardware support a call.
--------------------------------------------------
From: kevin@toad.com

No patch involved as far as I can see - that was a parity error while
doing DMA for the disk. That's hardware, mate :-)

        l & h,
        kev
--------------------------------------------------


