SUMMARY: Too many open files

From: Robert.Stringer@CCPEDQ.desjardins.com
Date: Wed Oct 13 1999 - 08:11:34 CDT


Hi

Here's the summary. My original message was:

> Hi
>
> I have an Ultra5 running Solaris 2.6. This server is my DNS and DHCP
> server. The software we are using for these services is QIP from
> Lucent Technologies.
>
> Everything ran fine until this morning, when I found this message in
> /var/adm/messages:
>
> "Oct 8 01:00:12 DNS1 qip-dhcpmsgd[29833]: Dynamic library
> libresolv.so failed to load: ld.so.1: /opt/qip50/usr/bin/qip-dhcpmsgd:
> fatal: /dev/zero: open failed: Too many open files"
>
> The server had been up for 117 days before the message appeared.
>
> Is that a problem with the daemon or with Solaris?
>
> Thank you
>
> Robert Stringer
> robert.stringer@ccpedq.desjardins.com

Thanks to:

Mike Mehran Salehi

Robert,

     It may be due to a bug in the software. Otherwise, to increase the
number of descriptors, add the following to /etc/system and reboot with
boot -r:
   set rlim_fd_cur=512
   set rlim_fd_max=4096

Mike Mehran Salehi

Ken Robson

Hi Robert,

This message means that the process has reached the limit of 64 open
file descriptors; this is a per-process limit. It may be due to an
error in your QIP server, which may be failing to close file
descriptors; speak with Lucent about this. If it is not a fault, you
can raise the limit to a maximum of 1024 using ulimit.

Hope this helps,

Ken.
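
As a rough sketch of the ulimit approach Ken describes, the function
below raises the soft descriptor limit for the current shell, capped at
the hard limit. It assumes a POSIX-style shell such as ksh (the $(...)
syntax is not available in the old Bourne shell); the function name
raise_fd_limit and the daemon path in the usage comment are placeholders,
not anything from QIP.

```shell
# raise_fd_limit WANT
# Set the soft file descriptor limit of the current shell to WANT,
# capped at the hard limit so the request does not exceed it.
raise_fd_limit() {
    want=$1
    hard=$(ulimit -Hn)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want=$hard
    fi
    ulimit -Sn "$want"
}

# Example (placeholder path): run in a subshell so that only the
# daemon started from it inherits the new limit.
# ( raise_fd_limit 1024 && exec /path/to/daemon )
```

Running "ulimit -Sn" and "ulimit -Hn" before and after shows the soft and
hard limits in effect. A limit set this way applies only to that shell
and the processes it starts, so it has to go into whatever script
launches the daemon.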

Rich Quinn

Robert,

More than likely, one of your procs/daemons spawned off a bunch of child
procs or perhaps is holding a lot of files open.

Maybe bouncing the in.named/named daemon would do the trick.
You could try running lsof against the named pid to see what it has open.
It might be a totally unrelated proc that is doing this. You may want to
run lsof against all your running processes.

It may be that your machine is due for a reboot but I am not advising that
as I do not know all the dynamics involved.

There is a kernel parameter that you can adjust (I think it would be in
/stand/system) that will let your system allow for more open files, but
changing that parameter would, no doubt, impact the value of other kernel
parameters. But that is an option you may wish to investigate.

Rich
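
If lsof is not installed, Rich's idea of checking every running process
can be approximated from /proc. This is only a sketch: it assumes the
system exposes a /proc/<pid>/fd directory per process and that you run
it as root (processes you cannot inspect are counted as 0). The name
fd_top is made up for the example.

```shell
# fd_top [N]
# Print the N processes (default 10) holding the most open descriptors,
# one "count pid" pair per line, highest count first.
fd_top() {
    for d in /proc/[0-9]*; do
        # Each entry under fd/ is one open descriptor.
        n=$(ls "$d/fd" 2>/dev/null | wc -l)
        # $n is left unquoted to drop the padding some wc versions add.
        echo $n "${d#/proc/}"
    done | sort -rn | head -n "${1:-10}"
}
```

For example, "fd_top 5" lists the five heaviest users. A process whose
count keeps growing toward its limit over days is a likely descriptor
leak.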

Kris Briscoe

You might have a memory leak. But what I suspect is that an application
is attempting to open more than the default 64 filehandles. Use the
/usr/proc/bin/pfiles <pid> command on the offending application; the
first line will show you what the filehandle limit is set to. Unless you
are modifying it using either the /etc/system file or the
'ulimit -n <value>' command, it should read 64. To find out how many the
offending app has open, use the same pfiles command, but pipe it to
'wc -l' to find out the total number of open files.

Regards,
Kris
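
One caveat on piping pfiles to 'wc -l': pfiles prints a header line, the
rlimit line, and more than one line per descriptor, so the raw line count
overstates the number of open files. The sketch below counts only the
lines that start a descriptor entry. It assumes each such line looks like
"  3: S_IFREG ..." (a descriptor number followed by an S_IF file type),
which is the usual pfiles layout; count_pfiles_fds is a made-up name.

```shell
# count_pfiles_fds
# Read pfiles output on standard input and print the number of
# descriptor entries (lines of the form "<fd>: S_IF...").
count_pfiles_fds() {
    grep -c '^ *[0-9][0-9]*: S_IF'
}

# Usage:
# /usr/proc/bin/pfiles <pid> | count_pfiles_fds
```

Comparing this number with the "Current rlimit" value that pfiles
reports shows how close the process is to its limit.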

Finally, the problem seems to come from the process "in.named" and a
library file called "libbind.so". Newer versions of both are available.
I'll install and test them.

All your answers were very helpful and will give me useful hints for the future.

Robert


