SUMMARY - Multiple Power Supply Replacements in Sun E250

From: Chris Hoogendyk <hoogendyk_at_bio.umass.edu>
Date: Thu Sep 24 2009 - 15:18:08 EDT
Problem resolved.

Original email at bottom.

Thanks to Sean Walmsley, Francisco Roque, Paul Kraus, Bryan Hodgson, Tim 
Bradshaw, Michael Horton, J.E., Todd A. Cox, and Joseph A. Belford. Todd 
nailed it, and a couple of others had pieces of the same puzzle.

First I took the original "failed" power supplies and put them in a 
spare E250 that had been set up with adequate hardware components to 
run. They came up fine and continued to be fine for over 24 hours.

Then we took an E250 that we had been gutting for parts and removed the 
Power Distribution Board (The E250 Owner's Manual has simple 
directions). Last night, around 8pm, my boss and I both came in, took 
down the server, replaced the power distribution board, and brought it 
back up. I had set the eeprom for diag-level=max before we took it down. 
It came up clean without any trouble. It has remained clean through 
today. Previously, we had incessant complaints in /var/adm/messages 
about power supply 1 having failed again.

The only slight difficulty we had was that a lot of cables connect to 
the power distribution board. They all have to be disconnected and 
reconnected correctly, and the excess cable has to hang out of the way 
of anything that moves during a hot swap. The first time we tried to 
slide a power supply back in, part of a cable loop had gotten in the 
way. We had to rearrange things and then try again.

Note: in looking over the internals of several E250's, most of them 
close to 10 years old, we saw no sign of the capacitor swelling or 
leakage that we routinely see in low-cost PCs.


---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<hoogendyk@bio.umass.edu>

--------------- 

Erdös 4




-------- Original Message --------
Subject: 	Multiple Power Supply Replacements in Sun E250
Date: 	Mon, 21 Sep 2009 16:19:49 -0400
From: 	Chris Hoogendyk <hoogendyk@bio.umass.edu>
To: 	Sun Managers List <sunmanagers@sunmanagers.org>



I'm tossing this to the list because I'm sure there is something I'm 
missing here.

We have a number of E250's that have been in operation for a number of 
years. We haven't had any trouble with any of them.

A couple of years ago, we also took in 10 used E250's that were being 
discarded by another department on campus. We put 3 of them into 
operation, collecting parts from some of the others and adding new disk 
drives. The rest were set aside in our store room for scavenging. 
They've just been sitting there for a couple of years now.

Now to the problem. Around the beginning of September we noticed a 
service light on the front of one of our E250's. Turns out it was 
complaining that power supply 1 had faulted. That power supply showed AC 
in but no DC out on its indicator lights. So, we went back to our store 
room, pulled a power supply, and hotswapped it. Since the hotswapped 
supply had been in the off mode when it was put in, we had to turn the 
switch on the front of the E250 to diagnostic and back to run. That 
turned off the service light. Cool. That was Sept. 3.

Then on the weekend of Sept. 12/13 there were 3 warnings in 
/var/adm/messages on Saturday night saying first that power supply 0 was 
faulting and then that power supply 1 was faulting. However, they seemed 
to be separated in time in some way so that it didn't take down the 
server. Then, on Sunday around 4pm, the server went down. The indicator 
lights pointed to power supply 0. My boss swapped that out. Weird.

Then the same E250 started reporting that power supply 1 had faulted 
midweek the following week. We've been under an onslaught of other 
work, so we didn't notice it right away. When we did notice it, I did an 
inventory of our stored E250's and picked the newest one (based on 
serial numbers) among those that had been stored above ground level 
(paranoia about water leakage). I pulled its upper power supply 1 and 
used it to replace the "faulted" power supply 1 in our running E250. 
That gave us about 10 minutes of respite from the warnings. Then the 
warnings resumed, saying power supply 1 not ok.

This just doesn't make sense.

Is there something we are doing wrong? Is flipping the switch to 
diagnostic and back to run inadequate to really set the power supply to 
be in the on mode? Is there likely something more serious wrong with 
this E250? Should we be looking at swapping out the whole box? Have 
these additional power supplies just gone stale from sitting idle for a 
couple of years? And, can anyone give any guidance on how to 
authoritatively diagnose what the problem really is? This happens to be 
the one department that has the most trouble coming up with money for 
any kind of equipment updates/additions/repairs.

Thanks,

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu Sep 24 15:19:18 2009

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:44:14 EST