Hello all,

You won't believe this, but in addition to several suggestions via email on how to go about diagnosing this issue, we also received a phone call from the people we purchased the server from: they are sending us a new system board! These people are serious, aren't they? Oddly enough, I do not have the name of the company, so I cannot mention them.

Anyway, here are the suggestions; I will go through them after I get the new system board. I also installed SunVTS 5.0 and will have it check the whole thing. Also, prtdiag -v gives this unequivocal report:

Failed Field Replaceable Units (FRU) in System:
==============================================
SUNW,UltraSPARC-II unavailable on CPU Board #0
PROM fault string: fail
Failed Field Replaceable Unit is UltraSPARC module Board 0 Module 1

Thank you all,
Mohamed~

On Thu, 2002-10-10 at 19:07, Tony Walsh <Tony.Walsh@Sun.COM> wrote:
>
> The "(Score 05)" part of this particular message indicates that CPU1 has a
> 5% chance of being the cause of this Ecache error, so in this context CPU1
> is NOT a target for replacement. At some point earlier in this stream of
> messages you should see a "(Score 95)" indicating a particular CPU has a
> 95% chance of being faulty. If you find this "Score 95" then you should
> change that CPU out, but if you don't see it, you may then have a memory
> issue or some other condition to indicate what your original fault may be.
>
> You will need to find this "Score 95" message to be more sure.
>

On Thu, 2002-10-10 at 13:03, kboykin <kboykin@coserv.net> wrote:
...
> You might need to limit the ecache to 4mb (if they are 8mb) as a
> workaround to an ecache scrubbing problem.
>
> I don't see a CPU panic in there...but it's possible that CPU1 is bad.
> You can disable a CPU from the OS:
>
>   psrinfo to see the status
>   psradm -f (the id of the CPU you want to take offline, i.e., 1)
>   psradm -n (the id of the CPU you want to bring online)
>
> And you can always try to reseat the CPUs; sometimes there are contact
> problems with 4500 CPUs.
>

On Thu, 2002-10-10 at 12:42, mike.salehi@kodak.com wrote:
>
> It could be the fan...
> Anyway, if you do not or cannot fix it, you have to get
> that board out of there; you could transfer all memory to the
> other board.

On Thu, 2002-10-10 at 12:25, Tim Chipman <chipman@ecopiabio.com> wrote:
> Based on this line,
>
> Oct 10 03:39:51 ganymede E$tag 0x00000000.0e402006 E$State: Shared E$parity 0x07
>
> it suggests that you may have an E-cache error on one of your CPUs. A
> pretty common problem with e3500 (8mb cache) UltraSparcII CPUs.

On Thu, 2002-10-10 at 20:53, Hichael Morton <mh1272@yahoo.com> wrote:
...
> The first thing to do is retorque all the CPUs. (The user/service manual and the system engineer handbook will have information on this. It requires specific torque settings and a torque wrench.)
>
> If re-torquing doesn't help, you can try swapping the boards to see if the error message follows the CPU.
>
> While you have the server "open", make sure the memory modules are configured properly. (The above manuals/documentation will have this information also.)
>
> If you are in the Knoxville, TN area, let me know.

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri Oct 11 10:47:10 2002