For the benefit of others who might run into the same problem,
here are the responses to my post regarding:

    Sunfire v880 reboot

(My original post from Mon, Feb 14, 2005 is included at the end.)

I should have mentioned in my original post that this system has been
running the same levels of software (Solaris 9, i.e. SunOS 5.9),
firmware (OBP), and hardware configuration since August 2004, and such
a thing has never happened before.

Many thanks to those who offered opinions, theories, possibilities,
and suggestions! My remarks concerning my particular situation are
interleaved with their responses.

---------------------------------------------
Bill R. Williams <brw@etsu.edu>
------------------------
ETSU Library Systems

From: Peter A. van Gemert
>I have no clue on what went on in your system but could it be a
>faulty UPS?

Possibly, but I don't think the UPS is the culprit.

From: Eric Noriega
>Have you looked for a crash dump under /var/crash ?

There was no crash dump in there. (That is the area defined in my
'dumpadm'.)

The following from joe_fletcher gets my vote for most probable cause:

From: "joe_fletcher"
>Usual thing in these situations is a watchdog reset. Tends
>to be nothing in the logs as it's about as hard a reset as
>you can get short of using a hammer. The only place you will
>see anything is on the console so, assuming you have it
>configured, take a look in the RSC buffer logs for whatever
>records remain.
>Cause is generally hardware related. I'd also run psrinfo.
>You might find the thing is now running on an odd number of CPUs. I've
>seen this happen a few times.

My CPUs are all online & functioning. Also, prtdiag -v indicates
everything within tolerances and "OK".

From: "Michael Horton"
>How is your power run?
>
>3 v880 power supplies into 1 ups?
>(no redundancy)
>3 v880 power supplies into 1 power circuit?
>(no redundancy)
>
>if your ups has a glitch (and they do), you have a power event.

I am not going to rule this out.
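
joe_fletcher's suggestion above (run psrinfo and see whether a CPU has
dropped out) can be sketched as a small check. This is only an
illustration, not something from the thread: the sample text is
hypothetical, and the assumed line layout (processor id, state, then
"since <date>") should be verified against the real psrinfo output on
the machine in question.

```python
import re

# Hypothetical sample. The layout (id, state, "since <date>") is an
# assumption about psrinfo output and should be checked on a real system.
SAMPLE = """\
0       on-line   since 02/14/2005 14:32:10
1       on-line   since 02/14/2005 14:32:12
2       off-line  since 02/14/2005 14:32:12
3       on-line   since 02/14/2005 14:32:12
"""

LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+since\b")


def cpu_states(psrinfo_text):
    """Map processor id -> state string for every line that parses."""
    states = {}
    for line in psrinfo_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def not_online(psrinfo_text):
    """Return the sorted ids of processors whose state is not 'on-line'."""
    states = cpu_states(psrinfo_text)
    return sorted(cpu for cpu, state in states.items() if state != "on-line")


if __name__ == "__main__":
    # With the hypothetical sample, processor 2 is reported.
    print(not_online(SAMPLE))
```

To use it on a live system, the captured text of the psrinfo command
would be passed to not_online() in place of SAMPLE; an empty result
means every parsed processor reported "on-line".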

From: "Eric Paul"
>We had a similar issue a few months ago with one of our servers...
>They replaced two CPU modules, and several banks of RAM before the
>problem went away. Something to be aware of, there is an FCO for
>certain memory modules which were installed on a number of 880s
>(though Sun is not talking about it much...) I only found out from
>my FE. You might want to put in a call to tech support and see if
>they can give you the lot numbers and check the RAM out.
>
>The other thing you might want to do is set up syslog to point to a
>central logging server. I've found a lot of times when Sun boxes go
>down hard, they don't flush the last logs to disk. But the central
>server does get the logs and that's given me more information to go
>on.

From: Daniel Vega
>obp down rev maybe?

On Mon, Feb 14, 2005 at 06:00:17PM -0500, Bill R. Williams wrote:
> SunOS localhost 5.9 Generic_117171-07 sun4u sparc SUNW,Sun-Fire-880
> This afternoon, this machine just rebooted, and I cannot find out why!
>
> Following the reboot, all status lights on the v880 are normal, and
> all disk drives are functioning.
> There is no crash dump, and the only thing I can find in the logs
> which indicates a glitch is in the /var/adm/messages file: the last
> entry before the "new" boot-up entries is a "line" of ~308 NULL bytes.
>
> I've run prtdiag and all temperatures, fans, etc. look OK.
> Things look correct from 'metastat'.
>
> This unit has 3 power supplies which are plugged into a UPS, so it
> wasn't a glitch in power service coming to the machine, and if it's a
> power supply the thing is supposed to be able to continue with two of
> them functioning. And there's no indication of any problems (prtdiag)
> with any of the three.
>
> Anybody seen this sorta thing happen?
> (Maybe there's some gremlin in the v880 and/or Solaris 9 that I've
> missed.)
>
> This sorta thing makes me nervous.
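
The original post notes that the only trace in /var/adm/messages was a
"line" of ~308 NULL bytes just before the new boot-up entries. A
minimal sketch of how such runs could be located in a log file
follows. The function names and the 16-byte threshold are illustrative
assumptions, and the sample data is synthetic, not taken from the
actual log.

```python
import re


def nul_runs(data, min_len=16):
    """Return (offset, length) for each run of at least min_len NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(data)]


def scan_file(path, min_len=16):
    """Read a log file in binary mode and report its NUL-byte runs."""
    with open(path, "rb") as handle:
        return nul_runs(handle.read(), min_len)


if __name__ == "__main__":
    # Synthetic example: some text, 308 NUL bytes, then more text.
    before = b"last entry before the reboot\n"
    after = b"\nfirst entry after the reboot\n"
    sample = before + b"\x00" * 308 + after
    for offset, length in nul_runs(sample):
        print("NUL run at byte offset %d, length %d" % (offset, length))
```

On a real system, scan_file("/var/adm/messages") would be the call to
try. A run of NUL bytes at the point of an unexplained reboot is
consistent with the remark in the thread that boxes which go down hard
may not flush their last log writes to disk, though it does not by
itself identify the cause.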
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri Feb 18 15:40:55 2005