Summary: system board

From: Dottie Weaver <dcweaver_1999_at_hotmail.com>
Date: Thu Aug 30 2001 - 09:05:03 EDT
Thank you to everyone for the input; you have been most helpful. It has
been resolved by the client wanting the system board pulled until their
new system arrives, and by my realizing that you can't live by OS and
software alone.

Again, thanks for the help. The responses are below.

Plug your laptop into the 25-pin (serial) port on the back of the system
and reboot it. The console output will give you all the information you
need to tell your client.

Good luck.
Hoang
Verizon Global Network Inc.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


A bad system board could show up as any number of errors. It depends on
whether it is an I/O board or a CPU/memory board. If it is an I/O board,
I'd expect random problems with some or all of the I/O devices connected
to it (ethernet controller, SCSI controllers, etc). If it is a CPU/memory
board, probably random CPU panics, watchdog resets, memory allocation
errors, ECC errors, etc. Of course, it could show almost any random
error that could be caused by the memory or processor hiccuping because
of the board. What kind of errors are you seeing?

Is this the only board of its type in the machine? If not, can you pull
it out? Let the system run for a while without it and see if it
stabilizes. If it does, that certainly points to the board or the
components on the board. If not, then it points to something else.

In any case, take the board out and make sure everything is seated
correctly. If it is a CPU/memory board, try to torque down the
processors to make sure they have a good connection.

Why doesn't the client believe SUN? Do they have a specific reason to
think they know better than the manufacturer? If the system is under
maintenance, let SUN come out and replace the board and see if that
solves the problem. If it doesn't, then you know SUN was wrong.

-spp
--
Stephen P Potter	Columbus, Ohio, USA		 spp@spotter.yi.org


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Run SunVTS (Sun Validation Test Suite). -Val
Val Popa


&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&



Tell them SUN wants to replace the system board.

Let them bitch about it, and let them tell SUN to replace everything
else. Then, when they finally have to replace the system board, know
that they are sitting in their beancounter offices waiting for you to
come by and tell them "I told you so".

You're the tech person there; why are they telling you how to diagnose
hardware anyway? :)

--Mark

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


I have had experience with Sun in which they have been wrong about the
system board, so I can sympathize with you and your customer, having
seen it from both sides. In my experience, Sun is very reluctant to
recommend a system board replacement if you are under contract, so if
your customer is under contract, that might be the best indicator. If
not...

In our experience with the non-Ultra (read: SPARC II) platforms, a
failure of the system board can result in system panics for unknown
reasons, some even at a level that does not allow you to gather a
"savecore" after the panic. System board problems that we have
encountered have even shown up as memory problems.

I am assuming that running advanced diagnostics (diag-mode true in the
EEPROM, and the diag switch set at reset) has shown nothing (not
uncommon), and that setting KADB mode and the deadman kernel switch
have also not resolved the problem. When all of these fail to indicate
a problem, it has been my experience that you are dealing with a system
board, CPU, or internal component problem.

On the UE10000, system board problems manifest themselves in strange
ways as well. We have seen a system board fail in a way that looked
like a qfe network interface card failure.

Hope this helps you.  Feel free to contact me if you have any more
questions, or want to discuss my opinion on a specific set of
circumstances.

Glenn M. Richards
Senior Systems Administrator
Yellow Technologies, Inc.
glenn.richards@yellowcorp.com


############################################################################

If they refuse to believe Sun, go ahead and rebuild the server for them,
BUT provide them with a written release stating that YOU believe it is a
bad system board, that they are refusing the recommended repair of the
unit, and that you can make no guarantees about the rebuild. Odds are
that you will NOT be able to convince them that it is a bad system
board. You can also load SunVTS (it's on the Solaris 8 media) and run it
to isolate any problems on the system.
Geoff Reed

*************************************************************************************************************

I trust you know about the "-v" switch for prtdiag. "v" for verbose.
It will give you quite a bit more information: CPU temperatures, fans,
power supplies, memory, etc. You can redirect the output to a file and
email it to Sun and let them diagnose it.

(I don't have prtdiag loaded on my machine so I can't see what all it
does or if the man pages for it are loaded.  All of this is from my
less-than-clear memory.)

Bad system board: memory errors (permanent and transitory), I/O problems
(hard drives disappear, arrays dropped, network cards not working,
etc.), CPUs not seen, incorrect time, just not working.

If the CPUs are screwed down, they may need to be re-torx-ed. Loose
CPUs will cause many of the same problems as a bad system board. Sun
has a special tool to re-torx the CPUs.

I am a Sun Field Engineer with a Sun partner and I have never seen the
1-800-USA-4-SUN hardware guys NOT get it right.  They are usually Sun
FEs that retire to nice air conditioned call centers.  And they like to
be accurate.

I hope you are paid by the hour. If I were, I would tell them that I
agreed with Sun but would be more than happy to reload the server. (Do
you want that during 8-to-5 or would you prefer after hours/overtime?)
But I have been in similar situations to what you have described. When
I consulted, I told the younger/newer folks to remember what we are in
it for: the paycheck! That always seemed to relieve some of the stress
for me.

(The last paragraph was meant to be encouraging.  Being a consultant, or
a contractor like me, can be tough at times.)


Hope this helps,

Michael Horton

__________________________________________________________________

I had a bad MB on an SB1000 that caused the machine to dump core
randomly. I just had Sun swap the MB. Hasn't crashed since.

I've also had a couple of E450s that would just drop off the network,
but when I went to check them out everything seemed fine. The on-board
NIC had gone bad, and a motherboard replacement was the cure.

HTH,
Will
MIS
Will Froning



-----Original Message-----
Subject: SUN system board


Help please, I'm at my wits end.

As a consultant I have relied on SUN for some diagnosis, but this client
doesn't believe what SUN has to say, so I need help.

SUN has diagnosed a problem with a server as a bad system board, but
the client doesn't think that this is the problem. I know that prtdiag
isn't going to be enough to convince them, so does anyone have any other
ideas about what I could use to show them the problem?
Rather than replace the board, they want to rebuild the server.

What types of problems have been seen out there that were the result of
a bad system board?

Thanks so much; it's hard being a consultant some days.




Dottie Weaver


Received on Thu Aug 30 14:05:03 2001

This archive was generated by hypermail 2.1.8 : Wed Mar 23 2016 - 16:25:03 EDT