SUMMARY: 300 <defunct> processes!

From: Chang, Lincoln (CHANGL@svlv03.scs.philips.com)
Date: Wed Aug 23 1995 - 17:26:00 CDT


Hi Sun managers:

My original question:
>
> I have a Sun Sparc 5 with SunOS 4.1.3, and just got the console error msg
> 'vmunix proc: is full'. I already rebuilt my kernel several months ago
> to increase the max. processes to 28. After further checking, I found I have
> 300 processes (!!) like the following:
>
> root 19813 0.0 0.0 0 0 ? Z Aug 11 0:00 <defunct>
> root 23931 0.0 0.0 0 0 ? Z Aug 11 0:00 <defunct>
> root 23929 0.0 0.0 0 0 ? Z Aug 11 0:00 <defunct>
> :
>
> Would anyone tell me what <defunct> is? Can I safely kill all these processes
> without rebooting the system? They all look alike except that the PID is
> different. My system is a mail gateway, socks server and bastion
> host.
>
> I suspect something went wrong at Aug 11 0:00 that created so many
> processes - an intruder (?) How can I check it out?

Summary:

These processes are zombie processes - that means they have exited (are
dead), but their parent process has not yet waited for them. By design, a
process does not completely go away until its parent has picked up its
exit status.
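
To make the point above concrete, here is a minimal Python sketch (my own
illustration, not something from the replies). It forks a child that exits
right away and checks whether the child's PID still exists before and after
the parent calls waitpid(). The function name zombie_demo and the exit code
7 are arbitrary choices for the example.

```python
import os


def pid_exists(pid):
    """Return True if the PID still has a process-table entry."""
    try:
        os.kill(pid, 0)          # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False


def zombie_demo():
    """Fork a child that exits at once; report PID existence around waitpid()."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os._exit(7)              # child: exit immediately with status 7

    os.close(w)
    os.read(r, 1)                # returns EOF once the child has exited
    os.close(r)

    before = pid_exists(pid)     # child is dead but not yet waited for
    _, status = os.waitpid(pid, 0)   # parent picks up the exit status
    after = pid_exists(pid)      # the process-table slot is now released
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print(zombie_demo())
```

Between the child's exit and the waitpid() call, the child is what ps shows
as <defunct>: it uses no CPU or memory, but it still holds a slot in the
process table.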

The "zombie" name matches the status column in the ps output, which shows
a "Z". Since the process has essentially exited, information regarding the
name of the program is no longer available, so the name <defunct> is
used, indicating it's dead and gone.

I cannot kill these defunct processes individually, because they are
already dead and a signal has nothing left to act on. This is often a
problem with long-lived daemons that fail to properly reap their children.
(If the parent dies, its defunct children are inherited by init, which is
good about picking up the corpses.)
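
As a rough sketch of what "properly reaping" can look like in a long-lived
parent, the Python example below installs a SIGCHLD handler that collects
every exited child with a non-blocking waitpid(). This is a generic pattern
and only an illustration; I am not claiming it is how sockd is (or should
be) written. The names reap_children and spawn_and_reap are made up for the
example.

```python
import os
import signal
import time

reaped = []                      # PIDs collected by the handler


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                # no children left at all
        if pid == 0:
            break                # children exist, but none has exited yet
        reaped.append(pid)


def spawn_and_reap(n=3):
    """Fork n short-lived children and let the handler reap them."""
    reaped.clear()
    signal.signal(signal.SIGCHLD, reap_children)
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: exit immediately
        pids.append(pid)

    deadline = time.time() + 5   # give the handler a few seconds at most
    while len(reaped) < n and time.time() < deadline:
        time.sleep(0.05)

    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return sorted(pids) == sorted(reaped)


if __name__ == "__main__":
    print(spawn_and_reap())
```

The loop inside the handler matters: several children can exit before the
handler runs once, so a single waitpid() call per signal could still leave
zombies behind.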

I used 'ps -alx' and found that their parent is 'sockd'. Once I killed the
parent 'sockd' process, the defunct processes were gone. Great!
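
With hundreds of zombies it can help to count them per parent instead of
reading the listing by eye. The Python sketch below parses text in a
"PID PPID STAT COMMAND" layout, which on many current systems can be
produced with something like 'ps -A -o pid,ppid,stat,comm' (that command
line is an assumption for modern ps; on SunOS 4.1.3 I used 'ps -alx', whose
columns differ). The sample text reuses three PIDs from my listing, but the
parent PID 171 is invented for the example.

```python
from collections import Counter


def zombies_by_parent(ps_text):
    """Count zombie processes per parent PID.

    Expects lines of the form: PID PPID STAT COMMAND
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[0].isdigit():
            continue             # skip the header and malformed lines
        ppid, stat = fields[1], fields[2]
        if stat.startswith("Z"):
            counts[int(ppid)] += 1
    return counts


# Made-up sample: parent PID 171 is hypothetical.
sample = """\
  PID  PPID STAT COMMAND
  171     1 S    sockd
19813   171 Z    <defunct>
23931   171 Z    <defunct>
23929   171 Z    <defunct>
"""

if __name__ == "__main__":
    for ppid, n in zombies_by_parent(sample).most_common():
        print(ppid, n)
```

The parent PID with the largest count is the process to look at first.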

Thanks a lot to the following people (if I miss someone - sorry):

Kevin Sheehan, Thomas Chai, Martin Redmond, Casper Dik, Adam Nevins, David
Gunn, Rahul Roy, Tim Bradshaw, Jon Masyga, Amy Hollander,
John Rosenberg, Michael T. Sullivan, Ray Trzaska, Stephen Harris, Andy
McCammont, Jerome Alphonse, Don Lewis, Gregory Bond.

Well, I now have another problem, as follows:

The number of <defunct> processes from 'sockd' increases daily, and as soon
as I kill their parent process, they are gone. My problem is that sometimes
several 'sockd' processes exist in the system at the same time (I
only start one in rc.local). If I kill that particular parent 'sockd' and
restart the main (?) 'sockd' daemon by typing '/usr/etc/sockd', I get
the following error if any other 'sockd' still exists in the system:
        sockd [17393]: error - main bind() Address already in use

and the main (?) 'sockd' process WILL NOT BE STARTED. I must wait until the
other 'sockd' processes exit on their own, so that no 'sockd' is left in the
system, before I can restart the 'sockd' process.
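
The "Address already in use" error itself is easy to reproduce. One
plausible reading (my assumption; nothing in the replies confirms it) is
that a leftover 'sockd' still holds the listening socket, so the new daemon
cannot bind the same port. The Python sketch below shows only the general
mechanism on the loopback address, with a port picked by the system; it
does not involve sockd at all, and the name second_bind_errno is made up.

```python
import errno
import socket


def second_bind_errno():
    """Bind a port that another socket is already listening on.

    Returns the errno from the failed second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))      # port 0: let the system pick a free port
    first.listen(1)
    port = first.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))   # same address and port
        return None
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    err = second_bind_errno()
    print(err == errno.EADDRINUSE)
```

As long as any process keeps the first socket open, the second bind keeps
failing, which would match having to wait until every old 'sockd' is gone.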

I will post the above problem to the 'sockd' mailing list. I will not
summarize it here, but will mail the answer to those 'me too' people (unless
many people are interested in it).

Best Rgds,

Lincoln Chang
(VAX/SUN/Internet system manager, Philips Semiconductors Sunnyvale site)


