OK. This is my summary of the strange file server crashes that I had a week ago.
Thank you to all who replied, specifically:
Earl R. Cooke
Rik Schneider
Gwendolynn ferch Elydyr
Whew, what a list; I hope I didn't miss anyone.
Original question:
>I have had the strangest crashes on my file server. It's an SS20 with dual
>75 MHz CPUs. It has 18 SCSI drives on it and 2 SCSI cards. It's a very
>important system that is crashing almost every day, and it's driving me
>crazy! It's running Solaris 2.5; it's a NIS+ server and our main file server.
>The crashes happen at many different times of the day, but they have only
>happened when people are actually working (no 2am crashes). It doesn't log
>anything strange in /var/adm/messages. Its file systems are NOT
>Basically, what happens is that every machine loses contact with it, but
>I can still sit down at it and log in. Sitting at the machine, everything
>seems fine, except that I can't get out on the network at all. I monitor the
>server's port on its switch, but there don't seem to be any errors going
>over the network. And other machines that don't rely on the file server (NT
>machines) can all communicate just fine with each other. A reboot of the
>machine brings everything back to life as normal, until maybe 20-40 hours
>later, when it goes into this state again.
>So it seems like the culprit is the server itself, maybe its network
>interface or a driver. Has anyone seen anything like this before? Is
>there a way I can try to reset the network card itself, bring it down and
>then bring it back up?
>The only thing on the network that has really changed is that we added
>another router connecting us to our other offices. I would think maybe it's
>a router problem, but it works just fine for a few days, and then the
>machine dies. I would have thought that if it were a router problem, the
>machine wouldn't work at all, not just fail at random times.
>Thanks in advance for your help. I'll summarize.
I never really got my problem solved, but at least the machine is crashing
differently now. Doh. Here's what I did.
I found that I couldn't ping or connect to anything whatsoever. It seemed my
network card or port was bad; the switch I was connected to was just fine.
I tried switching from a 3Com 3300 to a 3Com 1000, and it still crashed.
I started logging a number of important things. I noticed that the netstat
-r routing tables would forget about all the routers we have in our network.
I tried setting a default router, but that did not help.
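For anyone who wants to watch for the same symptom, here is a minimal, hypothetical sketch (not what I actually ran) that compares two saved netstat -rn snapshots and reports routes that vanished. It assumes Solaris-style output where the first two columns of each data row are Destination and Gateway; the function names and sample addresses are made up for illustration.

```python
"""Hypothetical helper: report routes that vanished between two
netstat -rn snapshots. Assumes Solaris-style output where the first
two columns of each data row are Destination and Gateway."""


def parse_routes(netstat_output: str) -> set:
    """Return a set of (destination, gateway) pairs from netstat -rn text."""
    routes = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank or one-word lines.
        if len(fields) < 2:
            continue
        # Skip the "Routing Table:" title, the column header,
        # and the dashed separator row.
        if fields[0] in ("Routing", "Destination") or fields[0].startswith("-"):
            continue
        routes.add((fields[0], fields[1]))
    return routes


def missing_routes(before: str, after: str) -> set:
    """Routes present in the earlier snapshot but gone from the later one."""
    return parse_routes(before) - parse_routes(after)


# Made-up sample snapshots, for illustration only.
BEFORE = """Routing Table:
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
192.168.1.0          192.168.1.10          U        3     50  le0
10.2.0.0             192.168.1.254         UG       0     12
default              192.168.1.1           UG       0    120
127.0.0.1            127.0.0.1             UH       0   1234  lo0
"""

AFTER = """Routing Table:
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
192.168.1.0          192.168.1.10          U        3     50  le0
127.0.0.1            127.0.0.1             UH       0   1234  lo0
"""

if __name__ == "__main__":
    # Prints one line per route that disappeared between the snapshots.
    for dest, gw in sorted(missing_routes(BEFORE, AFTER)):
        print(f"lost route: {dest} via {gw}")
```

In practice you would save netstat -rn output at regular intervals (for example from cron) and feed consecutive snapshots to missing_routes, so you get a timestamped record of when the routers dropped out.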
Finally, I got the machine to run without crashing like it did
before. The bad part is that I made two big changes, so I really don't know
which one did it. I switched from the built-in Ethernet port to a
transceiver off the AUI port. I also laid down the law and told everyone
that if they logged on to the file server to do anything, I would kill their
processes. Well, the system ran stable for a week and never crashed, until
it crashed again.
It was a different crash. This time, instead of just losing connection, the
server went dead. Stop-A didn't even work. Nobody (I think) was logged into
it. It finally came back up, then panicked 20 minutes later and crashed
again.
I basically decided to call it quits with this machine and tell the big
heads we gotta replace our aging file server. With 90% of our machines
faster and better than our file server, I see no reason to keep using
it, except that it's gonna be an all-nighter switching everything over.
So thanks, everyone, for your help. I don't think I really solved the
problem, but I hopefully will with a whole new machine. If anyone has any
detailed questions about some of the little things I did, feel free to ask.
Nothing seemed to work on my end, and I've got users up in arms hounding me
to just replace the damn machine. So that's what I will do.
Grant Schoep, email@example.com
L3 Communications Telemetry & Instrumentation
San Jose, CA (408)271-0800, Ext. 135
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:12:50 CDT