SUMMARY: Programs hang that access /etc/.name_service_door?

From: Michael S. Peek <peek_at_apis.tiem.utk.edu>
Date: Thu Aug 08 2002 - 15:10:13 EDT
Verily, this list doth rock most righteously.

Thanks to everyone who replied to my question.

The winning Kudo goes to Doug Winter for correctly spotting that the cause is
the following line in /etc/ssh/ssh_prng_cmds:

	"arp -a -n" /usr/sbin/arp 0.02

The arp command is just one of many commands OpenSSH runs to gather entropy
for its randomizer, and this particular command was hanging.  The -n option
isn't supported under Solaris, and there were a number of hosts in the arp
table that didn't have a name in reverse DNS.  (In fact, this problem was
already reported on the archived openssh-unix-dev mailing list...  Silly me.)
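
For anyone chasing a similar hang, a rough way to find the slow entry is to
run each listed command under a timeout and see which one stalls.  The Python
sketch below is only an illustration, not anything OpenSSH ships: it assumes
every active line has the same three-field layout as the arp line above
(quoted command, path to the binary, a number), and the function names
parse_prng_cmds and time_command are made up for this example.

```python
import shlex
import subprocess
import time


def parse_prng_cmds(text):
    """Return (argv, path, rate) for each active line.

    Assumes the three-field layout seen in the post:
    "command with args" /path/to/binary number
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)
        if len(fields) != 3:
            continue  # not in the layout assumed here
        command, path, rate = fields
        entries.append((shlex.split(command), path, float(rate)))
    return entries


def time_command(argv, path, timeout=10.0):
    """Run one command and return (status, elapsed_seconds).

    status is "ok", "timeout" (still running after `timeout` seconds),
    or "missing" (the binary could not be executed).
    """
    start = time.monotonic()
    try:
        subprocess.run([path] + argv[1:], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
        status = "ok"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except OSError:
        status = "missing"
    return status, time.monotonic() - start


if __name__ == "__main__":
    with open("/etc/ssh/ssh_prng_cmds") as handle:
        for argv, path, _rate in parse_prng_cmds(handle.read()):
            status, elapsed = time_command(argv, path)
            print(f"{elapsed:7.2f}s  {status:8s} {' '.join(argv)}")
```

Any entry reported as "timeout", or taking far longer than its neighbours, is
a candidate for the kind of stall described here.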

Commenting out the offending line from /etc/ssh/ssh_prng_cmds did the trick.
My ssh connect time went from 1:30 to 0:08.
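
If the same edit has to be made on several hosts, it can be scripted.  This is
a minimal sketch that assumes "#" is the comment marker in the file and that
matching on the quoted command string is specific enough; comment_out is a
hypothetical helper name, and it works on text only so the caller decides how
to read and write the file back.

```python
def comment_out(text, needle):
    """Prefix '#' to every uncommented line containing `needle`.

    Returns (new_text, number_of_lines_changed).  Lines that are
    already commented out are left alone, so it is safe to re-run.
    """
    out = []
    changed = 0
    for line in text.splitlines(keepends=True):
        if needle in line and not line.lstrip().startswith("#"):
            out.append("#" + line)
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


if __name__ == "__main__":
    path = "/etc/ssh/ssh_prng_cmds"
    with open(path) as handle:
        original = handle.read()
    updated, count = comment_out(original, '"arp -a -n"')
    print(f"{count} line(s) would be commented out in {path}")
```

The sketch only reports the count; writing the result back (after keeping a
copy of the original) is left to the admin.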

Thanks!

Michael

> Hello all,
> 
> I have just upgraded ssh on my Solaris 8 systems and everything works
> wonderfully except on three of them.  On these three systems ssh hangs for 37
> seconds when connecting from one of them to anywhere else.
> I believe I have tracked this problem down, but I don't understand the cause.
> 
> Using truss (with a rather extreme set of options: -f -a -e -l -d -tall -vall
> -xall -sall -mall -rall -wall -uall) I see the following:
> 
> 18785/1:	 1.4638	open64(0xFF226358, 0)				= 6
> 18785/1:	     0xFF226358: "/etc/.name_service_door"
> 
> ...
> 
> 18750/1:	 1.5779	close(6)					= 0
> 18785/1:	door(6, 0xFFBED430)		(sleeping...)
> 18750/1:	waitid(0, 18785, 0xFFBED748, 03) (sleeping...)
> 18785/1:	38.0478	door(6, 0xFFBED430)				= 0
> 18785/1:	38.0481	door(6, 0xFFBED4C8)				= 0
> 
> If I am reading this right, the timestamps show that between the close(6)
> (timestamp 1.5779) and the second door(6, 0xFFBED430) (timestamp 38.0478),
> there's a 37 second delay.  According to the manual page for truss, the
> timestamps signify the completion of the command, which means, if I am correct,
> that the cause is that second door(6, 0xFFBED430) on /etc/.name_service_door.
> 
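
The same arithmetic can be automated for longer traces.  The sketch below is
an illustration only: it assumes lines shaped like the excerpt above (pid/lwp,
a relative timestamp, then the call), ignores lines without a timestamp such
as the "(sleeping...)" ones, and uses made-up names (timestamped_calls,
largest_gap).  The sample contains just the timestamped lines quoted above,
so the first gap spans the elided "..." part of the trace.

```python
import re

# Optional "> " quoting, pid/lwp, a relative timestamp, then the call text.
LINE = re.compile(r"^(?:>\s?)*\s*(\d+)/(\d+):\s+(\d+\.\d+)\s+(.*\S)\s*$")


def timestamped_calls(lines):
    """Return [(timestamp, call_text)] for lines that carry a timestamp."""
    calls = []
    for line in lines:
        match = LINE.match(line)
        if match:
            calls.append((float(match.group(3)), match.group(4)))
    return calls


def largest_gap(lines):
    """Return (gap_seconds, call_before, call_after), or None if < 2 calls."""
    calls = timestamped_calls(lines)
    best = None
    for (t0, c0), (t1, c1) in zip(calls, calls[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, c0, c1)
    return best


sample = [
    "18785/1:\t 1.4638\topen64(0xFF226358, 0)\t\t\t\t= 6",
    "18750/1:\t 1.5779\tclose(6)\t\t\t\t\t= 0",
    "18785/1:\tdoor(6, 0xFFBED430)\t\t(sleeping...)",
    "18785/1:\t38.0478\tdoor(6, 0xFFBED430)\t\t\t\t= 0",
    "18785/1:\t38.0481\tdoor(6, 0xFFBED4C8)\t\t\t\t= 0",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(sample)
    print(f"{gap:.4f}s between [{before}] and [{after}]")
```

On the excerpt this picks out the interval between close(6) and the first
completed door(6, 0xFFBED430), roughly 36.5 seconds, which is the delay
rounded to 37 seconds above.
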
> This only happens on these three hosts.  So my question is:
> 
> (1) is my diagnosis correct in thinking that the problem is with
> /etc/.name_service_door?
> 
> and
> 
> (2) what uses /etc/.name_service_door?
> 
> (This is confusing my (l)users into thinking that there's something wrong with
> their account, and they're griping to me about it.)  I'd like to restart
> whatever service is causing the slowdown, but I don't know what it is, and
> there is no mention of .name_service_door in any of the AnswerBook2 libraries
> or man pages.  (I thought I would be slick and look at the inode of
> /etc/.name_service_door and then look for that inode in /proc/*/fd/*, but
> there are an awful lot of programs that have something open to that door!)
> 
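
The inode search described above can be sketched in a few lines of Python.
This is a hedged illustration, not a tested Solaris tool: it assumes that
calling stat on an entry under /proc/<pid>/fd reports the device and inode of
the underlying file (which is what the approach above relies on), and
find_holders is a hypothetical name.

```python
import glob
import os


def find_holders(target, proc_root="/proc"):
    """Return sorted PIDs with an open descriptor on the same file as target.

    Files are compared by (device, inode), not by name.
    """
    want = os.stat(target)
    pids = set()
    pattern = os.path.join(proc_root, "[0-9]*", "fd", "*")
    for fd_path in glob.glob(pattern):
        pid = fd_path.split(os.sep)[-3]
        if not pid.isdigit():
            continue  # not a process directory
        try:
            st = os.stat(fd_path)
        except OSError:
            continue  # process exited, or we lack permission
        if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
            pids.add(int(pid))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_holders("/etc/.name_service_door"):
        print(pid)
```

As noted above, expect a long list: the result shows who is a client of the
door, not which process is serving it.
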
> Any ideas anyone?  Should I just "punt" and reboot them?
> 
> Thanks for your input,
> 
> Michael Peek
> 
> 
> Michael Peek                                                 peek@tiem.utk.edu
> ------------------------------------------------------------------------------
> Systems Administrator / C++ Database Programmer      569 Dabney Hall
> Department of Ecology and Evolutionary Biology       Knoxville, TN  37996-1610
> University of Tennessee at Knoxville
> ------------------------------------------------------------------------------
> (865)974-0224 phone, (865)974-3067 fax           http://www.tiem.utk.edu/~peek
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu Aug 8 15:15:20 2002
