SUMMARY: Am I having a hardware problem, or am I misinterpreting data?

From: Jeff Welsch <jeff.welsch_at_enviz.com>
Date: Tue Aug 06 2002 - 12:29:18 EDT
Managers,

Thanks for the responses to my original query, which is included below.
The consensus (something I knew but apparently had not quite grasped)
was that a single thread cannot take more than 100/N percent of CPU,
where N is the number of CPUs, but a single multi-threaded process can.
So java reaching 90% on this two-CPU machine points to more than one
busy thread rather than to a failing CPU.
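
To make the arithmetic concrete, here is a minimal sketch in Python.  It
is not taken from any of the replies, and the function name
max_cpu_percent is made up for illustration.  It assumes that top
reports usage as a percentage of the whole machine and that every
counted thread is CPU-bound:

```python
def max_cpu_percent(n_cpus: int, n_busy_threads: int) -> float:
    """Upper bound on one process's CPU share, as a percentage of the
    whole machine (hypothetical helper, for illustration only)."""
    if n_cpus < 1 or n_busy_threads < 0:
        raise ValueError("need n_cpus >= 1 and n_busy_threads >= 0")
    # A single thread runs on at most one CPU at a time: the 100/N rule.
    per_thread_cap = 100.0 / n_cpus
    # A process can keep at most one thread running per CPU.
    return per_thread_cap * min(n_busy_threads, n_cpus)


if __name__ == "__main__":
    # Two CPUs, as on the E220R described in the original question.
    for threads in (1, 2, 8):
        print(threads, "busy thread(s):", max_cpu_percent(2, threads), "%")
```

Under these assumptions, one busy thread on a two-CPU machine is capped
at 50%, which fits the 45% I used to see, while two or more busy threads
can approach 100%, which fits the 90%.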

Thanks to:

Darren Dunham <ddunham@taos.com>
Dave Mitchell <davem@fdgroup.com>
Kevin Buterbaugh <Kevin.Buterbaugh@lifeway.com>
Wanke Matthias <Matthias.Wanke@itellium.com>

Original question:
	
Managers,

I am not sure if I am misinterpreting data, or what I see is evidence of
an imminent hardware failure.  Let me explain:

Recently I had an E220R (2x450 2GB RAM with attached A1000) panic and
reboot itself for no apparent reason.  When speaking with Sun support
they informed me that unless I had a logging terminal connected to the
console I would not be able to capture the output of any associated
error messages (anyone know if a Cisco 2511 acting as a term server has
this capability?).  

The day before this reboot occurred I had noticed strange behavior on the
system, and now that I see the same behavior, I wonder if the machine
will be rebooting itself anytime soon.  The behavior I saw was an
increase in load from a normal 1-2 at idle to 3-4 at idle.  Also, during
peak load the average load would spike to around 14.  What was most odd
however was that the java process running on this system (the only
application running on the server) would take upwards of 90% of the CPU
in top.  Prior to this odd behavior the most java would consume was
45%.

My understanding of CPU usage in an SMP environment is that the most CPU
a single process could consume was 100/N percent, where N is the number
of processors.  Java topping out around 50% would empirically verify
this rule.  

Am I correct in the 100/N rule of CPU usage?  When top suddenly reports
that java is consuming twice as much CPU, it looks as though a CPU has
been taken offline.  However, psrinfo reports that both processors are
online and there are no error messages in dmesg.  Is this evidence of an
upcoming hardware problem, or am I mistaken in my understanding of CPU
usage in an SMP system?
Received on Tue Aug 6 12:32:09 2002