First, thanks to Eric Cortes Trujillo, Casper Dik, Tim Chapman, Chris Hoogendyk, Dan Green, Ryan Krenzischek and Lar Hecking.

My latest SunFire 280R is now back in service. Sun replaced both power supplies, stating that both had failed at the same time. I guess it's not going to be redundant when there isn't anything left to be redundant with. The new power supplies have a different part number; I'm not sure if that's a Sun thing or just a different batch.

My biggest issues were the lack of error messages or any warning that my power supplies had failed or were failing, along with prtdiag output that was very incomplete on the 4 failed servers. The power packages in /var/sadm/pkg also differed. I then found that many of my other 280R servers had incomplete prtdiag output and differing power packages as well. All servers were built from a jumpstart and are identical except for their additional disk space layouts, and all were patched with the Solaris 8 recommended cluster patches.

Since I hadn't turned this latest failed server back over to the team yet, I rebuilt it from the jumpstart and applied the cluster patches again. The results were mixed: prtdiag output and the patch level were both still incomplete. I applied the patch cluster a second time, watching for failures and successes, and checked again. It was better, but still not identical to the good servers; my logs showed success, yet the patches were incomplete. Casper recommended not using the install_cluster script because of problems seen with loading these cluster patches. So I did another jumpstart and a clean build, then applied the patches without the install script. Voila! Success.

I then pulled the primary power supply and got loads of errors, but the server stayed up, and prtdiag showed the failed power supply. When I put the power supply back in, the maintenance light went out and prtdiag showed clean, with no failures. I then pulled the second power supply and the server shut down.
I checked the power cords, made sure everything was installed correctly, and hit the power button. The server came up fine to single-user mode, and the only errors reported were that the second power supply was bad. I then put the second power supply back in and all errors cleared. It has been up and running since early this morning.

I've been told that I pulled the power supply too quickly while testing and didn't leave the system enough time to catch up. I haven't had time to verify that. Also of note: with the system up and the second power supply pulled, the maintenance light on the front panel did not come on.

I built another 280R using the 02/02 release and the patch cluster, and it seems to be working so far. I don't think the jumpstart is at fault, since some of the servers built from it are fine; at this point I'm leaning towards a problem with the patch cluster. On the previous build I also tried downloading a fresh Recommended cluster and running it, but it appeared to have the same problems when installed as a cluster.

Summary: I will be applying patches across the board so that all servers are identical, with the same packages and patch level. Also, for those using BigBrother: with everything correct on the server, BB picked up the failed disk when it was pulled.

Thanks again to all.

Skip

Received on Tue Jun 22 12:59:44 2004
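Checking that every server has "the same packages and patch level" means diffing each host against a known-good one. The post contains no code, so what follows is only a minimal Python sketch under stated assumptions. It assumes you have already saved, as text, the output of showrev -p and a one-name-per-line listing of /var/sadm/pkg from a good server and a suspect server. It also assumes patch lines in showrev -p output begin with "Patch: NNNNNN-RR". The function names and the sample patch IDs are hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumption: each patch line of "showrev -p" output starts with
# "Patch: NNNNNN-RR", i.e. a six-digit base ID and a two-digit revision.
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")


def patch_levels(showrev_output: str) -> Dict[str, int]:
    """Map each base patch ID to the highest revision found in the text."""
    levels: Dict[str, int] = {}
    for line in showrev_output.splitlines():
        match = PATCH_RE.match(line.strip())
        if match:
            pid, rev = match.group(1), int(match.group(2))
            levels[pid] = max(rev, levels.get(pid, 0))
    return levels


def package_set(pkg_listing: str) -> Set[str]:
    """Package names from a listing of /var/sadm/pkg, one name per line."""
    return {name.strip() for name in pkg_listing.splitlines() if name.strip()}


def compare_patches(good: Dict[str, int], suspect: Dict[str, int]) -> List[str]:
    """List every patch whose presence or revision differs between the hosts."""
    report: List[str] = []
    for pid in sorted(set(good) | set(suspect)):
        g, s = good.get(pid), suspect.get(pid)
        if s is None:
            report.append(f"{pid}: missing (good server has -{g:02d})")
        elif g is None:
            report.append(f"{pid}: extra -{s:02d} (not on good server)")
        elif g != s:
            report.append(f"{pid}: -{s:02d} vs -{g:02d} on good server")
    return report


if __name__ == "__main__":
    # Made-up sample data; replace with real saved output from each server.
    good = patch_levels("Patch: 100001-05 Obsoletes: Requires: Packages: SUNWcsu\n"
                        "Patch: 100002-01 Obsoletes: Requires: Packages: SUNWcsr\n")
    suspect = patch_levels("Patch: 100001-03 Obsoletes: Requires: Packages: SUNWcsu\n")
    for line in compare_patches(good, suspect):
        print(line)
    missing_pkgs = package_set("SUNWcsu\nSUNWcsr\n") - package_set("SUNWcsu\n")
    print("packages missing on suspect:", sorted(missing_pkgs))
```

With the made-up data, it reports 100001 at a lower revision and 100002 missing on the suspect server. It then lists the packages found only on the good server. An empty report for both patches and packages would mean the two hosts match at the level this sketch checks.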