SUMMARY: e450 failure

From: <ktn_at_dodo.com.au>
Date: Tue Mar 16 2004 - 22:35:11 EST
Dear managers,

Thank you to Ashish, Steve, Glenn (sorry if I missed anyone) for pointing
out that it's most likely a faulty CPU. I did have a faulty CPU, which kept
the machine from booting at all. I initially thought that I could use the
diagnostics given in
http://lios.apana.org.au/~cdewick/sunshack/data/sh/2.0/infoserver.central/cgi-bin/doc2html2786-2.html?intsrdb/21220
to check which CPU was faulty, but the machine never got to OBP when
switched on, which made me wrongly think it was a system board problem (Sun
suggested it was most likely that). So really I should have removed the CPUs
one by one to find out which one was not letting the e450 boot (some say
faulty memory might also cause the RED state exception, but perhaps not as
badly as this).

One more thing I was confused about. In
http://sunsolve.sun.com/handbook_pub/Devices/CPU_Module/UltraSPARC_480MHz_UltraII.html,
it says that an empty CPU slot `requires' a filler. This doesn't seem to
matter as I got the machine running fine with the empty slot.

Thanks heaps:)

Some good advice from people, which I also find useful for general
diagnostics:

--
1. Try to run your server in a minimal configuration, i.e. 4 RAM modules
(1 bank) and 1 CPU. While doing this, check which banks and which CPU slots
have to be filled. I think you might have emptied a CPU slot that has to be
filled, which is why there is no display (there is no requirement for the
system to be filled with 4 CPUs). You can get all the info on the default
locations for RAM and CPUs from "docs.sun.com" --> e450 service manual.

2. For the "red state" exception error, try removing all your non-essential
cards and then see whether you still get the error. Also, at the ok prompt,
run "test-all".

3. Try connecting a console cable to serial port A with "diag-level=max";
"diag-switch?=true"; "output-device=ttya". Also set "auto-boot?=false" if
you don't want the system to boot automatically after the tests.

ashish n
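
A note on point 3: the settings are quoted in name=value form, but at the
ok prompt OBP variables are set with "setenv name value". The small sketch
below just rewrites the quoted settings into that form so they can be typed
in at the console. The helper name to_setenv is made up for illustration,
and the values are only the ones quoted in the advice above.

```python
def to_setenv(settings: str) -> list:
    """Turn 'name=value;name=value' into OBP 'setenv name value' lines."""
    lines = []
    for item in settings.split(";"):
        # Drop surrounding whitespace and the quotes used in the email.
        item = item.strip().strip('"')
        if not item:
            continue
        name, value = item.split("=", 1)
        lines.append(f"setenv {name.strip()} {value.strip()}")
    return lines


# The four settings exactly as quoted in the advice.
ADVICE = "diag-level=max;diag-switch?=true;output-device=ttya;auto-boot?=false"

if __name__ == "__main__":
    for line in to_setenv(ADVICE):
        print(line)
```

This only helps once the machine actually reaches the ok prompt, which mine
did not while the faulty CPU was installed.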

--
Sounds like you could have one of several different problems. The Red state
exception is almost assuredly a CPU problem, not likely a mainboard one.
When you strip the system down to fewer than 4 CPUs, you have to make sure
that you put them in the proper locations. The slots are marked on the
mainboard, indicating which one must be populated first, second, etc. They
MUST go in that order.

You may also have a bad DC to DC converter.

If you look at the mainboard, you will see 4 small boards with capacitors on
them directly underneath the memory slots. (Which incidentally also has to
be inserted in a specific order)


Here is what I would attempt to bring the machine back to life.

First, remove all memory, CPUs, and DC converters. Install 1 bank of memory,
1 CPU and preferably a different one of the DC converters in the appropriate
slots. Attempt to bring the machine up with that configuration. If that does
not work, try swapping the CPU with one of the other ones, then try again.
If there is still nothing, swap the DC converter. Keep this up, trying to
boot the machine after each and every change.

Once you have the machine booting again, slowly begin populating the
remainder of the memory, CPUs, and DC converters. Boot the machine between
each of the adds. I would probably install all of the memory after the
machine booted again, then boot it. This is not likely a memory issue
anyway, but you want to be certain that it is not causing issues for some
strange reason.

You will most likely come across one of the CPUs that will hose the system
during the boot; this is your culprit.

Once you have the machine up and running again (in the unlikely event that
you have no additional errors), install SunVTS on the machine and let it
walk through the memory and CPUs for errors. It is possible, but unlikely,
that it will find anything, but you want to be certain, especially if this
is a production machine.

Glenn May
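
For anyone who wants the swap-and-retest procedure above spelled out, here
is a rough sketch of it as I read it, not a tool to run against real
hardware. Everything named here is hypothetical: boots() stands in for
"power on and see whether the machine reaches OBP", the part names are
invented, and the simulation simply pretends that one CPU is bad. It also
assumes a single known-good DC converter is enough while CPUs are added
back, which is a simplification.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for "power on and check whether the machine reaches
# OBP". It receives the CPUs and DC converters currently installed.
BootTest = Callable[[List[str], List[str]], bool]


def find_working_pair(cpus: List[str], converters: List[str],
                      boots: BootTest) -> Optional[Tuple[str, str]]:
    """Minimal configuration: one CPU and one DC converter at a time.

    Swap CPUs first; only when every CPU fails move on to the next converter.
    """
    for conv in converters:
        for cpu in cpus:
            if boots([cpu], [conv]):
                return cpu, conv
    return None


def repopulate_cpus(good_cpu: str, good_conv: str, spare_cpus: List[str],
                    boots: BootTest) -> Tuple[List[str], List[str]]:
    """Add the remaining CPUs back one at a time, booting after each add.

    A CPU that stops the machine from booting is pulled again and reported.
    """
    installed, culprits = [good_cpu], []
    for cpu in spare_cpus:
        if boots(installed + [cpu], [good_conv]):
            installed.append(cpu)
        else:
            culprits.append(cpu)
    return installed, culprits


def simulated_boots(cpus: List[str], converters: List[str]) -> bool:
    # Simulation only: pretend "cpu2" is the faulty module.
    return bool(converters) and "cpu2" not in cpus


if __name__ == "__main__":
    all_cpus = ["cpu0", "cpu1", "cpu2", "cpu3"]
    pair = find_working_pair(all_cpus, ["dc0", "dc1"], simulated_boots)
    assert pair is not None
    cpu, conv = pair
    spares = [c for c in all_cpus if c != cpu]
    installed, culprits = repopulate_cpus(cpu, conv, spares, simulated_boots)
    print("installed:", installed)
    print("culprits:", culprits)
```

The point of testing after every single change is that a failed boot can
then only be blamed on the part that was just added.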

--

Original post:

Dear managers,

When our Enterprise 450 produced RED state exception errors
(throughout, even on reboot), we purchased another motherboard
(for 4 480 MHz CPUs with 8MB cache) on Sun's advice. Lo and
behold, the problem still exists (saw it once, on the first boot) :( So
I'm afraid it's one (or more) of the CPUs causing problems (I really
should have guessed that in the first place). However, when I removed
a CPU or two, nothing came up on the console or monitor (and now, after
putting everything back, the problem is the same). I only get a green
light on the status LED indicating power, with no POST or general
activity at all.

Does this happen for < 4 CPUs when you do not place a filler in the
empty CPU slots? Or am I doing something really wrong? Perhaps it's a
loose connection somewhere but I'm not sure where to look. All help is
appreciated, and I will summarize.
Received on Tue Mar 16 21:41:32 2004
