Thanks to Justin Stringfellow, who gave me a full explanation of the problem. Here is his mail:

Two warnings are raised.

CPU0 first: it fails writing data back to cache, because the parity check fails on the data being written back. This is known as a writeback parity or "WP" event:

    Jun 25 23:13:29 ms1 unix: WARNING: [AFT1] WP event on CPU0, errID 0x000a52a2.24f3553f
    Jun 25 23:13:29 ms1 unix: AFSR 0x00000000.00800004<WP> AFAR 0x000001fe.01800f00
    Jun 25 23:13:29 ms1 unix: AFSR.PSYND 0x0004(Score 95) AFSR.ETS 0x00 Fault_PC 0x100662c0
    Jun 25 23:13:29 ms1 unix: UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00

CPU2 complains too; it owns the cache that is being written to, and notices that someone is trying to write crap into its cache:

    Jun 25 23:13:38 ms1 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x000a52a4.417a572b
    Jun 25 23:13:38 ms1 unix: AFSR 0x00000000.80200000<PRIV,UE> AFAR 0x00000000.fc0662e8
    Jun 25 23:13:38 ms1 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10020f98
    Jun 25 23:13:38 ms1 unix: UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE> UDBL.ESYND 0x03
    Jun 25 23:13:38 ms1 unix: UDBL Syndrome 0x3 Memory Module U1404 U0404 U1403 U0403

The key piece of information is the "score". Under CPU0 there is a "Score 95", while CPU2 shows "Score 05". This is the kernel decoding the asynchronous fault address register ("AFAR") bits and the asynchronous fault status register ("AFSR") bits and deciding who is the culprit and who is the victim. It gives a higher score (from 0 to 100) to a more likely culprit. Without this knowledge, it would be easy to think that CPU2 had a problem as well, but it doesn't.
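To make the "score" comparison concrete, here is a minimal Python sketch that pulls the CPU number and the score out of AFT1 messages and lists the most likely culprit first. The regular expressions and the function name rank_suspects are my own assumptions, based only on the two log excerpts in this mail; they are not part of any Solaris tool, and other kernel releases may format these messages differently.

```python
import re

# Assumed patterns, derived only from the log excerpt quoted in this mail.
EVENT_RE = re.compile(r"WARNING: \[AFT\d\] (?P<kind>.+?) on CPU(?P<cpu>\d+)")
SCORE_RE = re.compile(r"\(Score (?P<score>\d+)\)")

SAMPLE_LOG = """\
Jun 25 23:13:29 ms1 unix: WARNING: [AFT1] WP event on CPU0, errID 0x000a52a2.24f3553f
Jun 25 23:13:29 ms1 unix: AFSR 0x00000000.00800004<WP> AFAR 0x000001fe.01800f00
Jun 25 23:13:29 ms1 unix: AFSR.PSYND 0x0004(Score 95) AFSR.ETS 0x00 Fault_PC 0x100662c0
Jun 25 23:13:38 ms1 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x000a52a4.417a572b
Jun 25 23:13:38 ms1 unix: AFSR 0x00000000.80200000<PRIV,UE> AFAR 0x00000000.fc0662e8
Jun 25 23:13:38 ms1 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10020f98
"""


def rank_suspects(lines):
    """Return (cpu, score, kind) tuples, highest score (likeliest culprit) first."""
    events = []
    current = None
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            # A WARNING line opens a new event; the score follows on a later line.
            current = {"cpu": int(event.group("cpu")),
                       "kind": event.group("kind"),
                       "score": None}
            events.append(current)
            continue
        score = SCORE_RE.search(line)
        if score and current is not None and current["score"] is None:
            current["score"] = int(score.group("score"))
    scored = [(e["cpu"], e["score"], e["kind"])
              for e in events if e["score"] is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for cpu, score, kind in rank_suspects(SAMPLE_LOG.splitlines()):
        print(f"CPU{cpu}: score {score} ({kind})")
```

On the excerpt above this puts CPU0 (score 95) ahead of CPU2 (score 5), which matches the conclusion that CPU0 is the culprit and CPU2 the victim.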
Elmehdi BENDRISS
System Administrator (Administrateur Système)
Direction Internet, Maroc Telecom

Received on Thu Jun 26 07:00:48 2003