SUMMARY: writeback panic

From: David Way (dpw@kate.as.utexas.edu)
Date: Tue Nov 17 1992 - 05:32:31 CST


Sun Managers,

On Monday, November 9, 1992, I wrote:
 
> Can anyone tell me a likely cause for the following problem on a 4/490
> running 4.1.1:
>
> Nov 9 17:59:08 astro vmunix: Memory Error Register 60d4<INTR,INTENA,CE_ENA,WBACKERR>
> Nov 9 17:59:08 astro vmunix: DVMA = 0, context = 30, virtual address = ff916ff0
> Nov 9 17:59:08 astro vmunix: pme = a3000b07, physical address = 160eff0
> Nov 9 17:59:08 astro vmunix: panic: writeback error

Almost everyone said this occurs as a result of listing the primary swap
partition in /etc/fstab. This wasn't the case in our situation, as we
define it in the kernel only. A couple of responders suggested it could
also be due to a hardware fault, so we are currently pursuing this
possibility.

Thanks to the responders:

Badri.Pillai@ecrc.de (Badri Pillai)
fuat@ans.net (Fuat Baran)
montjoy@thor.ece.uc.edu (Robert Montjoy)
johnb@edge.cis.mcmaster.ca (John Benjamins)
ups!kevin@fourx.Aus.Sun.Com (Kevin Sheehan)
ups!glenn@fourx.Aus.Sun.Com (Glenn Satchell)

-----------------------------------------------------------------------------

From Badri.Pillai@ecrc.de Tue Nov 10 06:39:22 1992
Posted-Date: Tue, 10 Nov 92 13:39:33 +0100
Received-Date: Tue, 10 Nov 92 06:39:17 CST
Date: Tue, 10 Nov 92 13:39:33 +0100
From: Badri Pillai <Badri.Pillai@ecrc.de>
Local-Tel-Ext: 119
To: dpw@kate.as.utexas.edu
Subject: Re: Writeback panic
Content-Length: 50
X-Lines: 4
Status: RO

Looks like an OS bug, but I don't know the patch ID.

badri

-----------------------------------------------------------------------------

From fuat@ans.net Tue Nov 10 14:32:05 1992
Posted-Date: Tue, 10 Nov 92 15:29:37 EST
Received-Date: Tue, 10 Nov 92 14:31:55 CST
Date: Tue, 10 Nov 92 15:29:37 EST
From: Fuat Baran <fuat@ans.net>
To: dpw@kate.as.utexas.edu (David Way)
Cc: fuat@ans.net
Phone: 914-789-5328, Fax: 914-789-5310
Subject: Re: Writeback panic
Content-Length: 788
X-Lines: 23
Status: RO

>Can anyone tell me a likely cause for the following problem on a 4/490
>running 4.1.1:

Check your /etc/fstab file. Do you have the default swap partition
(configured in the kernel, typically something like /dev/sd0b or
/dev/xd0b, etc.) in there? If you do, remove it. The default swap
partition is activated automatically, and in SunOS 4.1.1 having it also
in /etc/fstab causes swapon to try to activate it a second time. When
you eventually swap (typically when you're running something like
/etc/dump), you'll crash with the above panic.
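The check described here (is the kernel's primary swap partition also listed
in /etc/fstab?) can be sketched in a few lines of Python. This is a minimal
illustration, not a Sun tool. It assumes you already know the primary swap
device from the kernel config (/dev/xd0b below is only an example), that
fstab fields are whitespace-separated in the order used in this thread
(device, mount point, type, options, ...), and that lines starting with "#"
are comments.

```python
# Minimal sketch: flag active /etc/fstab entries that list the primary swap
# partition (already configured in the kernel) as a swap area.
# Assumptions: whitespace-separated fields, type field "swap"
# (as in "/dev/xd0b swap swap rw 0 0"), and "#" starting a comment.


def duplicate_primary_swap(fstab_text: str, primary_swap: str) -> list[str]:
    """Return the uncommented fstab lines that name primary_swap as swap."""
    hits = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank or commented-out line: not an active entry
        fields = stripped.split()
        if len(fields) >= 3 and fields[0] == primary_swap and fields[2] == "swap":
            hits.append(stripped)
    return hits


# Example fstab contents (hypothetical).
example = """\
/dev/xd0a / 4.2 rw 1 1
/dev/xd0b swap swap rw 0 0
#/dev/xd1b swap swap rw 0 0
"""
print(duplicate_primary_swap(example, "/dev/xd0b"))
# prints ['/dev/xd0b swap swap rw 0 0']
```

An empty result means the primary swap partition is not duplicated in
/etc/fstab; additional (non-primary) swap partitions listed there are fine.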

Note: This is just one case when you'll see the above panic. Others
include bad memory, etc.

Hope this helps.

                                                        --Fuat

Advanced Network & Services, Inc. fuat@ans.net
100 Clearbrook Road 914-789-5328
Elmsford, NY 10523 914-789-5310 (Fax)

-----------------------------------------------------------------------------

From montjoy@thor.ece.uc.EDU Tue Nov 10 14:38:11 1992
Posted-Date: 10 Nov 1992 15:35:45 -0500 (EST)
Received-Date: Tue, 10 Nov 92 14:38:08 CST
Date: 10 Nov 1992 15:35:45 -0500 (EST)
From: montjoy@thor.ece.uc.EDU (Robert Montjoy)
Subject: Re: Writeback panic
To: dpw@kate.as.utexas.EDU
X-Envelope-To: dpw@kate.as.utexas.EDU
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Mailer: ELM [version 2.4 PL3]
Content-Length: 535
X-Lines: 19
Status: RO

Hi,

Are you continuously getting these errors? If so, you have
a "bad" CPU board. We had the same problem last July and
they replaced the cache memory chips on the CPU board. These
look like soldered-on SIMMs.

I got ours fixed for around 1800 dollars.

-- 
Rob Montjoy                   		- Rob.Montjoy@UC.Edu
Computer Engineer    	      		- montjoy@ucbeh.BITNET
University of Cincinnati      		- montjoy@babbage.ece.uc.edu
Electrical and Computer Engineering	- uunet!uceng!rmontjoy

-- To Save the Earth. The Humans must die.

-----------------------------------------------------------------------------

From ups!upstage!glenn@fourx.Aus.Sun.COM Tue Nov 10 14:43:23 1992
Posted-Date: Wed, 11 Nov 92 07:19:57 EST
Received-Date: Tue, 10 Nov 92 14:43:05 CST
Date: Wed, 11 Nov 92 07:19:57 EST
From: ups!upstage!glenn@fourx.Aus.Sun.COM (Glenn Satchell)
To: ups!fourx!kate.as.utexas.edu!dpw@fourx.Aus.Sun.COM
Subject: Re: Writeback panic
Content-Length: 1508
X-Lines: 39
Status: RO

The classic cause for this is to have an entry for your primary swap partition in /etc/fstab. The system tries to swap on the same partition twice. So, just remove the line from /etc/fstab to fix this one.

regards,
-- 
Glenn Satchell  ups!glenn@fourx.Aus.Sun.COM          |
Uniq Professional Services Pty Ltd ACN 056 279 335  | "The answer is no,
PO Box 70, Paddington, NSW 2021, (Sydney) Australia |  and I'll negotiate
Phone: +61-2-360-7434  Fax: +61-2-331-2572          |  from there."
"Sun Accredited System Consultants"                 |

> From ups!fourx!ra.mcs.anl.gov!sun-managers-relay Wed Nov 11 06:27:13 1992
> Date: Mon, 9 Nov 92 19:04:31 GMT
> From: ups!fourx!kate.as.utexas.edu!dpw (David Way)
> Posted-Date: Mon, 9 Nov 92 19:04:31 GMT
> To: sun-managers@delta.eecs.nwu.edu
> Subject: Writeback panic
> Cc: dpw@kate.as.utexas.edu
> Content-Length: 585
> X-Lines: 16
>
> Sun Managers,
>
> Can anyone tell me a likely cause for the following problem on a 4/490
> running 4.1.1:
>
> Nov 9 17:59:08 astro vmunix: Memory Error Register 60d4<INTR,INTENA,CE_ENA,WBACKERR>
> Nov 9 17:59:08 astro vmunix: DVMA = 0, context = 30, virtual address = ff916ff0
> Nov 9 17:59:08 astro vmunix: pme = a3000b07, physical address = 160eff0
> Nov 9 17:59:08 astro vmunix: panic: writeback error
>
> Thanks in advance, and will summarize.
> --
> David Way McDonald Observatory/Astronomy Dept.- Univ. of Texas, Austin
> (office) RLM 16.206 (voice) 471-7439 (internet) dpw@astro.as.utexas.edu

-----------------------------------------------------------------------------

From johnb@edge.cis.mcmaster.ca Tue Nov 10 14:53:24 1992
Posted-Date: Tue, 10 Nov 1992 15:55:41 -0500
Received-Date: Tue, 10 Nov 92 14:52:55 CST
From: johnb@edge.cis.mcmaster.ca (John Benjamins)
Date: Tue, 10 Nov 1992 15:55:41 -0500
X-Department: Computing and Information Services, McMaster University
X-Disclaimer: These are MY opinions, not CIS' or McMaster University's
X-Mailer: Mail User's Shell (7.2.4 2/2/92)
To: dpw@kate.as.utexas.edu (David Way)
Subject: Re: Writeback panic
Content-Length: 8786
X-Lines: 231
Status: RO

David Way,

On Nov 9, 7:04pm, you wrote:
} Subject: Writeback panic
} Sun Managers,
}
} Can anyone tell me a likely cause for the following problem on a 4/490
} running 4.1.1:
}
} Nov 9 17:59:08 astro vmunix: Memory Error Register 60d4<INTR,INTENA,CE_ENA,WBACKERR>
} Nov 9 17:59:08 astro vmunix: DVMA = 0, context = 30, virtual address = ff916ff0
} Nov 9 17:59:08 astro vmunix: pme = a3000b07, physical address = 160eff0
} Nov 9 17:59:08 astro vmunix: panic: writeback error
}
} Thanks in advance, and will summarize.
} --
} David Way McDonald Observatory/Astronomy Dept.- Univ. of Texas, Austin
} (office) RLM 16.206 (voice) 471-7439 (internet) dpw@astro.as.utexas.edu
}
}-- End of excerpt of Nov 9, 7:04pm

I sent out the following summary a little over a year ago on this same error, but on a 3/280 running 4.1.

----------Beginning of forwarded message----------
Date: Wed, 26 Jun 1991 23:08:25 EDT
From: johnb@edge.Cis.McMaster.CA
Subject: SUMMARY: panic: writeback error
To: sun-managers@eecs.nwu.edu

On Jun 25, 10:05am, I wrote:
} We have a Sun 3/280S, running SunOS 4.1 which has been crashing 2-3 times a
} day for about the last week. Here's the autoconfig output when vmunix boots:
}
} SunOS Release 4.1 (PHYSUN) #1: Tue Nov 6 14:01:30 EST 1990
} Copyright (c) 1983-1990, Sun Microsystems, Inc.
} [ ... configuration messages deleted for brevity ... ]
}
} It keeps dying with the following error:
}
} panic: writeback error
} syncing file systems ....
} MEMORY ERROR! Status C4, DVMA-BIT 0, Context 4,
} Vaddr: 2677C, Paddr: 0000077C, Type 0 at 00000000
}
} Break FFFFFFFF at 0E05C710
}
} I have run the extended memory tests from the PROM, which show no errors.
} I've run the SunDiag tests, which also show no error. Is this a bad memory
} board, or something else?
}
}-- End of excerpt of Jun 25, 10:05am

Well, the response was great! I got my first reply before I got my own
message mailed to me!

The responses fall into 2 camps: a) a bad CPU cache (i.e. replace the CPU
board :-( ), or b) having the default swap partition in /etc/fstab as well.
Detailed replies follow.

I have taken the

    /dev/xd0b swap swap rw 0 0

line out of fstab, since xd0 is my boot disk (/ == /dev/xd0a). If the
problem persists, we'll get the CPU board replaced. As Chris Drake points
out, this problem did just suddenly start to happen with no other real
changes, and so it probably is a CPU board problem.
Since commenting out the line in /etc/fstab is cheaper, I'll try that
first :-)

Thank you very much to all who replied:

Chris.Drake@Corp.Sun.COM (Chris Drake)
"William (Bill) Gray" <bill%wintermute.utcc.utk.edu@utkux1.utk.edu>
trr@lpi.liant.com (Terry Rasmussen)
carlson@frith.egr.msu.edu (Jackie Carlson)
riess@csq.uta.edu (Bill Riess)
ddull@Rational.COM (David Dull)
cam@janus.Berkeley.EDU (Carol Martin)
Paul Quare <pq@computer-science.manchester.ac.uk>
liz@neit.cgd.ucar.EDU (Liz Coolbaugh)

From: Chris.Drake@Corp.Sun.COM (Chris Drake)
} If this just started happening magically, without relation to any other
} changes, then I'd say hardware. The 'writeback' refers to the CPU cache;
} while there were a few odd cases where software could cause a panic: writeback,
} these appear to have been in SunOS 3.4 or 3.5, and should not affect a 4.1
} system. If this is repeatable (like, whenever you run your application..)
} then there is possibly software involvement. One way to check is to look at
} the traceback information from coredumps, if you can save any: if the stack
} seems to be pretty random, and the user process which was running isn't the
} same one every time, then that's a good indication that your CPU board is
} starting to flake out.
}
} Chris Drake
} US Answer Center
} Sun Microsystems Software Support

This did just start to happen. The only thing I can remember changing at
the time this started is setting up the automounter, and running it.

From: "William (Bill) Gray" <bill%wintermute.utcc.utk.edu@utkux1.utk.edu>
} Sun has a bug that can cause writeback errors. I had the problem on
} a 4/280 running 4.1.1 and a 3/260 running 4.1. It is bug #1039410.
} It is caused by having the primary swap area selected in the rc file(s)
} via "swapon -a" and having it also in /etc/fstab. The workaround
} is to NOT have it in both places.
} Here is what I did on the 3/260:
}
} bill mathsun1> tail -5 /etc/fstab
} /dev/xy1g /export/sun4 4.2 rw 1 4
} # Per Sun the primary swap area must NOT be in /etc/fstab :bug #1039410 17May91
} #/dev/xd0b swap swap rw 0 0
} /dev/xy0b swap swap rw 0 0

I have changed my /etc/fstab!

From: trr@lpi.liant.com (Terry Rasmussen)
} Have you tried any of the following tests:
}
} A) Swapping memory boards with another machine.
}
} B) Exchanging the cards around on the backplane
}    (which of course would mean playing around
}    with jumpers, no doubt for #A above as well...)
}
} C) Pulling a board and running with less memory for
}    a while; if the problem persists, then swap out
}    a memory board for the one you originally pulled.
}    Needless to say this can be a time-consuming and
}    frustrating procedure.
}
} Lastly, I will bet that the problem is on the CPU board and
} that the PMMU has gone bad in some "wonderful and strange
} way" that is not easily or reasonably reproducible.
}
} Any way it goes I wish you much luck. We have a machine on
} site where ultimately everything was replaced (it kept having
} memory problems and "eating" system disks). When I say
} everything was replaced I mean that the only thing factory installed
} on the machine is the cabinet; over time everything had been
} replaced, even the backplane. This machine is now a "stereo rack"
} for our UPSs and we are using its system disk as a data disk on
} another system a few feet away. You can't win them all, but you
} can sure try!

Haven't tried any of these, though I have done similar things before, and
may try pulling/swapping memory boards. I have replaced 2 memory boards,
and the Fujitsu M2361 SMD disk in this machine already this year :-( I have
a sinking feeling it's the CPU though, as you also point out.

From: carlson@frith.egr.msu.edu (Jackie Carlson)
} The one and only time I saw this message was when I had
} mounted, by including in the /etc/fstab, the root swap partition.
} It's okay to mount additional swap partitions, but not the root swap
} in fstab.

From: riess@csq.uta.edu (Bill Riess)
} A problem which looks like a memory error, but isn't
} because memory checks good, but involves the disks, is
} very likely a DMA problem meaning disk controller or
} motherboard. In our case we had to replace the
} motherboard.

From: ddull@Rational.COM (David Dull)
} DVMA is a virtual memory construct. First suspect is the disk drive, second
} is the MMU. Third is the RAM. Definitely time for a hardware call.

From: cam@janus.Berkeley.EDU (Carol Martin)
} We've had the same problem on one of our 3/280s. We changed
} the first memory board to no effect and decided that the
} problem must be in the cache on the cpu (also suggested
} by "writeback" error). We've now changed the cpu and the
} verdict is still out. Let me know if this solved your problem.

If mine goes away with the fstab changes, I will let you know.

From: Paul Quare <pq@computer-science.manchester.ac.uk>
} Check that you don't have a line in /etc/fstab for your primary
} swap device.

From: liz@neit.cgd.ucar.EDU (Liz Coolbaugh)
} Panic writeback error: Very familiar! Look in your /etc/fstab file
} for entries like:
}
} #/dev/xd0b swap swap rw 0 0
} #/dev/xd1b swap swap rw 0 0
}
} If you also have the same partitions configured in your kernel:
}
} config vmunix swap on xd0b swap on xd1b
}
} this may be the cause of your writeback error. Try commenting the lines
} out of your fstab and rebooting. It worked for us.
}
} Credit goes to Sun support who responded quickly with this information
} once I called them ...

My config line was:

    config vmunix root on xd0b swap generic

(or something close to that; this is from memory, and mine is more faulty
than this machine's, I'm sure :-)

Again, thanks to all who replied.
----------End of forwarded message----------
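Chris Drake's rule of thumb in the forwarded summary (random stacks and
different processes point at the CPU board; the same crash every time points
at software) can be written down as a small Python sketch. The crash
records, the placeholder frame names, and the majority threshold are all
hypothetical; the function only encodes the comparison he describes and is
no substitute for actually reading the tracebacks.

```python
from collections.abc import Sequence

# A crash is (process_name, stack), where stack is a tuple of function
# names read off the traceback of one saved crash dump (hypothetical format).
Crash = tuple[str, tuple[str, ...]]


def triage(crashes: Sequence[Crash]) -> str:
    """Rough rule of thumb from the thread; the majority threshold is arbitrary."""
    n = len(crashes)
    if n < 2:
        return "inconclusive: need more than one crash dump"
    distinct_procs = len({proc for proc, _ in crashes})
    distinct_stacks = len({stack for _, stack in crashes})
    if distinct_procs == 1 and distinct_stacks == 1:
        # Same process and same traceback every time: repeatable.
        return "repeatable: possible software involvement"
    if distinct_procs > 1 and distinct_stacks > n // 2:
        # Different processes and mostly different stacks: looks random.
        return "random: suspect hardware (CPU board)"
    return "mixed: inconclusive"


# Hypothetical data: process and frame names are placeholders.
dumps = [("make", ("frame_a", "frame_b")),
         ("dump", ("frame_c",)),
         ("csh", ("frame_d", "frame_e"))]
print(triage(dumps))  # prints "random: suspect hardware (CPU board)"
```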

Hope this helps.

-- 
// E. John Benjamins -- <johnb@edge.cis.mcmaster.ca>
// "Facts are simple and facts are straight. Facts are lazy and facts are late.
\\ Facts all come with points of view. Facts don't do what I want them to."
\\ - David Byrne (Talking Heads)

-----------------------------------------------------------------------------

From ups!kevin@fourx.Aus.Sun.COM Wed Nov 11 18:30:15 1992
Posted-Date: Thu, 12 Nov 1992 10:52:54 EST
Received-Date: Wed, 11 Nov 92 18:30:13 GMT
From: ups!kevin@fourx.Aus.Sun.COM (Kevin Sheehan {Consulting Poster Child})
Date: Thu, 12 Nov 1992 10:52:54 EST
X-Mailer: Mail User's Shell (7.1.2 7/11/90)
To: fourx!kate.as.utexas.edu!dpw@fourx.Aus.Sun.COM (David Way)
Subject: Re: Writeback panic
Content-Length: 1040
X-Lines: 28
Status: RO

[ Regarding "Writeback panic", fourx!kate.as.utexas.edu!dpw writes on Nov 9: ]

> Sun Managers,
>
> Can anyone tell me a likely cause for the following problem on a 4/490
> running 4.1.1:

A writeback error occurs when the cache tries to write data back to physical
memory and the write fails. One cause is unmapping the physical page without
flushing the cache first, but in your case it looks like a real memory error
occurred. I'd not worry unless it happens again any time in the next year...
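To illustrate the unmap-without-flush case, here is a toy Python model of a
write-back cache. It assumes, for illustration only, a cache whose
writebacks must go through the current virtual-to-physical mapping; it does
not model the 4/490's real cache or memory error register, nor the genuine
memory fault suspected in this case. Writes sit in the cache marked dirty
and reach memory only on a flush, so removing the mapping first leaves a
pending writeback with nowhere to go.

```python
class WritebackError(Exception):
    """Raised when a dirty cache line cannot be written back to memory."""


class ToyWriteBackCache:
    """Toy model, not Sun hardware: one cache line per virtual page."""

    def __init__(self, page_table: dict[int, int]):
        self.page_table = page_table       # virtual page -> physical page
        self.memory: dict[int, int] = {}   # physical page -> stored value
        self.dirty: dict[int, int] = {}    # virtual page -> pending value

    def write(self, vpage: int, value: int) -> None:
        # Write-back policy: defer the memory write, just mark the line dirty.
        self.dirty[vpage] = value

    def flush(self, vpage: int) -> None:
        # Write a dirty line back through the *current* mapping.
        if vpage not in self.dirty:
            return
        if vpage not in self.page_table:
            raise WritebackError(f"no mapping for virtual page {vpage:#x}")
        self.memory[self.page_table[vpage]] = self.dirty.pop(vpage)

    def unmap(self, vpage: int, flush_first: bool) -> None:
        if flush_first:
            self.flush(vpage)  # correct order: write back, then unmap
        del self.page_table[vpage]


cache = ToyWriteBackCache({0x10: 0x7})
cache.write(0x10, 42)
cache.unmap(0x10, flush_first=False)  # the bug: dirty line still pending
try:
    cache.flush(0x10)                 # a later eviction attempts the writeback
except WritebackError as err:
    print("writeback error:", err)    # prints "writeback error: no mapping for virtual page 0x10"
```

Calling unmap with flush_first=True instead stores 42 in physical page 0x7
before the mapping disappears, which is the ordering the kernel is expected
to follow.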

> -- 
> David Way McDonald Observatory/Astronomy Dept.- Univ. of Texas, Austin
> (office) RLM 16.206 (voice) 471-7439 (internet) dpw@astro.as.utexas.edu



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:53 CDT