Summary: Kill process problem

From: Andreas Höschler <ahoesch_at_smartsoft.de>
Date: Mon Jul 14 2008 - 03:59:04 EDT
Dear all,

The question was:

> I have a bad acroread process running in a zone.
>
> 	 11573 ahoesch    89M   14M cpu1    60    0   5:24:40  50% acroread/1
>
> I tried "kill 11573" and "kill -9 11573" in the zone. Nothing! I tried
> "kill 11573" and "kill -9 11573" in the global zone. Nothing!
> The process still runs. I then tried to reboot the zone hosting the
> process. The zone went down and never came up again. In the global zone
> prstat still shows the process. I am stuck! What is this? Looks like a
> serious bug in Solaris!? I don't dare to reboot the whole system since
> it probably won't go down cleanly anyway. What can I do?

The bottom line of your responses was that "kill -9 ..." won't kill a  
process that hangs within a system call (e.g. read()). The process  
blocked a complete CPU for 5 hours. Seconds before I was going to hit  
reboot <enter>, the process was suddenly gone and the zone shut down.  
So I did not have to do a complete system reboot this time.

I am attaching responses in no special order.

Thanks a lot!

Regards,

   Andreas

************************************************************************
Most probably the process is stuck in BIOREAD state; you have triggered
the bug with the zones.

Try to:

- Kill all processes within the zone concerned
- Halt the zone
- Forcibly unmount all mount points referenced by the zone (using
  umount -f)
- Boot the zone again

This has always helped me (for example, when an NFS mount into the zone
timed out).
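
The four steps above could be scripted roughly as follows. This is only a
sketch: the function names recover_zone and run and the DRY_RUN switch are
made up for illustration, the zone name and mount points must be supplied
by hand (nothing is discovered automatically), and by default the script
only prints what it would do.

```shell
#!/bin/sh
# Sketch of the recovery sequence. DRY_RUN=1 (the default) only prints
# the commands; set DRY_RUN=0 in the global zone to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# recover_zone ZONE [MOUNTPOINT ...]
recover_zone() {
    zone=$1
    shift
    run pkill -9 -z "$zone"        # kill all processes within the zone
    run zoneadm -z "$zone" halt    # halt the zone
    for mnt in "$@"; do
        run umount -f "$mnt"       # forcibly unmount each mount point
    done
    run zoneadm -z "$zone" boot    # boot the zone again
}
```

Calling "recover_zone myzone /some/mountpoint" (both arguments are
placeholders) prints the planned commands in order, so they can be checked
before anything is run for real.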
************************************************************************
This:

http://opensolaris.org/jive/thread.jspa?messageID=147538

is what you are triggering.

On the other hand, sometimes you can use ``preap'' if the process is  
stuck in zombie state.
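
A small sketch of that check: call preap only when ps reports the process
as a zombie (state Z). The helper names is_zombie_state and reap_if_zombie
are invented here, and the "ps -o s=" invocation assumes a ps that accepts
an empty column header.

```shell
#!/bin/sh
# True if the given ps state string marks a zombie.
is_zombie_state() {
    case $1 in
        Z*) return 0 ;;
        *)  return 1 ;;
    esac
}

# reap_if_zombie PID: run preap on PID only if ps shows it as a zombie.
reap_if_zombie() {
    pid=$1
    state=$(ps -o s= -p "$pid" 2>/dev/null | tr -d ' ')
    if is_zombie_state "$state"; then
        preap "$pid"
    else
        echo "pid $pid is in state '${state:-unknown}', not a zombie"
        return 1
    fi
}
```

A process that is still burning CPU, like the acroread above, is not a
zombie, so the function would report its state and leave it alone.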

Just out of interest, what was the output of "zoneadm list -civ" after
you issued "zoneadm halt"?
************************************************************************
Try truss'ing the process: truss -p 11573
this should provide some clue as to what the process is doing, possibly  
why it cannot be killed.  If this doesn't work, mdb might reveal more  
information.
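
Besides truss, the Solaris proc tools pflags (which shows pending signals)
and pstack (which shows the user-level stack) can give a quick first look.
The wrapper below is only a sketch; the name inspect_pid is made up, and
it assumes the tools are on the PATH. Be aware that these tools may
themselves block if the target process cannot be stopped.

```shell
#!/bin/sh
# inspect_pid PID: run pflags and pstack on PID where they are available.
inspect_pid() {
    pid=$1
    for tool in pflags pstack; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool $pid"
            "$tool" "$pid" || echo "$tool exited with an error"
        else
            echo "== $tool not found, skipping"
        fi
    done
}
```

For the process in question this would be "inspect_pid 11573", run from
the global zone.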

You could be facing a bug similar to this:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6455727

Good luck,

-f
http://www.blackant.net/
************************************************************************
man preap
************************************************************************
Zone or no zone, processes can only be killed when they are not in the
middle of a system call.  If acroread is making a call (like read())
that doesn't return, it will not die.
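
For contrast, here is the normal case as a minimal sketch: a process that
is merely sleeping can be interrupted, so SIGKILL takes effect right away.
In the stuck case the signal is still posted to the process, but it stays
pending until the kernel call it is blocked in lets go.

```shell
#!/bin/sh
# Start a process that sleeps interruptibly, then kill it with SIGKILL.
sleep 60 &
pid=$!
kill -9 "$pid"

# wait returns a status above 128 when the child died from a signal.
status=0
wait "$pid" || status=$?

# kill -l translates such an exit status back into the signal name.
signal=$(kill -l "$status")
echo "child $pid terminated by SIG$signal"
```

If the same kill is sent to a process blocked uninterruptibly in the
kernel, the wait would simply not return until that call completes.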

You might try to 'truss' it and see if it's making such a call.

If a process won't die, I don't know any method of disconnecting it from
a zone so that the zone can be restarted.
************************************************************************
Received on Mon Jul 14 04:01:01 2008