Hi all,

First, thank you all.

My problem, in summary: this machine boots and runs fine for anywhere from an hour to a day or two (we run some apps, compilers, editors, X), then suddenly stops/halts without any messages in the logs or on the console. Nothing. (The complete original description is quoted at the end.)

The root cause seems to have been the power supply. I replaced it 10 days ago and the machine hasn't crashed since. (Before replacing the power supply, I had already removed the power management packages, run the machine with the cover off so it got plenty of air, and removed and reinserted the memory and CPU modules several times. My CPUs are 300 MHz UltraSPARC-II modules.)

See the different answers I got below. Again, thank you all. This is a great list.

Bob

From Stephen:

What CPU modules do you have in there? Some later (400 and 450 MHz UltraSPARC-II) modules had problems that match quite well with what you're describing (E-cache parity and tag problems). Sometimes it shows up in the AFSR/AFAR registers, sometimes not. Updating to a newer OBP *might* help in getting better diagnostics.

From Pete:

I've seen two reasons for this happening:
1. the processor isn't seated properly
2. (most common) overheating.
Try pulling the cover off and letting it run. If you've got some canned air, hit the area around the CPU and power supply.

From Jeff:

My best bet is that you installed Solaris 8 with the power manager enabled by default. I had this same problem on an Ultra 60 I installed Solaris 8 onto, and it took me a while to figure out that the power manager was enabled and was powering down the system after a period of inactivity.

From Stephen:

Do you have a console/terminal server logging the console messages? Just curious, since a directly attached monitor wouldn't help if the machine is powering off. I would suspect power management here, but I would also expect it to report whatever action it takes on the console at least, if not in syslog (/var/adm/messages) as well. I believe the config file is /etc/power.conf if you want to check it.
I'm no longer sure of that, because we purged all the power management packages from our servers long ago (along with most of the other 700+ fluff packages in the full Sun distribution): they did nothing good for us and potentially something bad. Does "prtdiag -v" show anything interesting?

From Bruce:

This sounds like the problem we were experiencing on one of our Ultra 10s. The server just stopped working, with no warning and nothing in /var/adm/messages. It turned out to be a bad power supply.

brsys wrote:
>
> Hi All,
>
> We got a new (used) Ultra 60 on which we installed Solaris 8 plus the latest patches from Sun.
>
> This machine boots/runs fine for 1 hour to 1..2 days (we run some apps, compilers, editors, X), then suddenly stops/halts without any messages in the logs or on the console, nothing.
> It is not a "normal" shutdown or halt, since it looks like the power is switched off!
>
> It's lightly loaded (1 user) connected through ssh. The stops occur regardless of whether the user is working or idle. I'm currently trying to reproduce it while the user works directly on the machine's screen/keyboard.
>
> I ran some diagnostic tests but they found no problems.
>
> Any advice on where I should start to solve this problem? Or any URL that describes some kind of process for finding the problem?
>
> Thanks a lot!!!
>
> Bob

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Sun Oct 5 01:37:56 2003
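Jeff and Stephen both suggest ruling out power management by looking at /etc/power.conf. As a hedged illustration only, here is a minimal Python sketch that scans power.conf-style text for an autoshutdown entry and flags behaviours that could power the machine off. It assumes the entry format "autoshutdown <idle-minutes> <start-time> <finish-time> <behavior>" from the Solaris power.conf(4) man page; the function names, and the choice of which behaviours to flag, are hypothetical and should be checked against the man page on the machine in question.

```python
import sys
from typing import NamedTuple, Optional


class AutoShutdown(NamedTuple):
    # Fields of an "autoshutdown" line, kept as raw strings (assumed format).
    idle_minutes: str
    start: str
    finish: str
    behavior: str


def find_autoshutdown(conf_text: str) -> Optional[AutoShutdown]:
    """Return the last autoshutdown entry in power.conf-style text, or None."""
    entry = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "autoshutdown":
            entry = AutoShutdown(*fields[1:5])
    return entry


def may_power_off(conf_text: str) -> bool:
    """Hypothetical check: True if the entry's behaviour could switch the box off.

    "noshutdown" and "unconfigured" are treated as safe; "default" is
    flagged conservatively because its effect is hardware-dependent (assumption).
    """
    entry = find_autoshutdown(conf_text)
    return entry is not None and entry.behavior in ("shutdown", "autowakeup", "default")


def main(argv: list) -> None:
    path = argv[1] if len(argv) > 1 else "/etc/power.conf"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        print(f"could not read {path}: {exc}")
        return
    entry = find_autoshutdown(text)
    if entry is None:
        print("no autoshutdown entry found")
    else:
        print(f"autoshutdown behavior: {entry.behavior}; may power off: {may_power_off(text)}")


if __name__ == "__main__":
    main(sys.argv)
```

Run it with no argument to read /etc/power.conf, or pass another path. A clean result does not prove power management is innocent; as Stephen notes, the packages may simply have been removed, and the console log remains the better evidence.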