Hi Folks,
Sorry about the delayed summary, but here it is.
I posted:
> I have two problems on our Sun-3/80 under SunOS 4.1.1_U1. Another of our
> offices has similar problems on a Sparc, with standard 4.1.1b.
>
> 1) The first problem is to do with UUCP not dropping the DTR signal
> sometimes at the end of a call or call attempt. Our modem is a
> Trailblazer, and is set with S50=0 to accept incoming calls at both 2400
> and PEP. The dial script for PEP dialout includes S50=255, and relies on
> the DTR line being dropped at the end of a call to restore the default
> parameters. Sometimes the modem is getting left with S50=255, causing it
> not to answer incoming calls at 2400. I haven't determined exactly when
> this happens, but I suspect it is when an outgoing call fails under
> some/all circumstances.
>
> 2) The second problem is to do with getty and UUCP competing for the
> modem port, which is set up as /dev/cua0 and /dev/ttyd0.
>
> The Trailblazer modem normally sits there with the DTR light on.
> Occasionally I find that the DTR light is remaining off, and when that
> is the case I find the following in /var/adm/messages:
>
> Mar 15 00:46:54 mwuk getty: ioctl(TCGETS): Bad file number
> Mar 15 00:46:54 mwuk getty: ioctl(TCGETS): Operation not supported on socket
> Mar 15 00:46:54 mwuk getty: ioctl(TCGETS): Operation not supported on socket
>
> In this situation, the system cannot answer incoming calls, which is a
> problem to us. If I kill the getty which is on ttyd0, the new getty
> correctly sets the DTR light on again, and all is back to normal.
>
> What causes this situation? Is it something I have done wrong, or a bug
> which needs patching?
One simple suggestion was to add :to#60: to the /etc/gettytab entry used
for the modem port. This makes getty exit if no login name has been
entered within 60 seconds, so that init starts a fresh one. It is worth
doing anyway.
The actual problem was summarized as follows:
Here is what is happening. getty thinks that it has the modem
line open. It proceeds to dup it a couple of times, so that
file descriptors 0, 1 & 2 will be open on the modem. It goes on
to condition the line with several ioctl's. It doesn't bother
to check the return values of the dup's, but it does check on
the ioctl's. The hitch is that the line is *not* open. The
dup's fail without notice. When the ioctl's fail (because they
are handed an invalid file descriptor), getty reports the
failures via syslog. syslog opens a socket to log the errors.
This open returns 0, the first available fd! The ioctl failures
are logged, but getty proceeds anyway. Satisfied that all is in
order, getty does a read on fd 0 to get a login name. But fd 0
is the socket that syslog opened, not the modem. Of course,
there is never any input on this file descriptor, so getty waits
forever. What remains to be explained is how getty gets itself
into this embarrassing position.
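The descriptor mix-up described above is easy to reproduce. The
following is a small sketch in C, not getty's source: the function name
is hypothetical, and it uses an AF_UNIX datagram socket merely as a
stand-in for whatever socket syslog opens. It relies only on the rule
that the kernel hands out the lowest-numbered free descriptor.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo: with fd 0 not open, try to dup it (as getty does,
 * unchecked), then open a socket (as syslog does) and report which
 * descriptor number the socket received.
 * If dup_errno is non-NULL it receives the errno from the failed dup(),
 * or 0 if dup() unexpectedly succeeded.
 */
int socket_fd_when_stdin_closed(int *dup_errno)
{
    int saved = dup(0);         /* keep the real stdin; -1 if already closed */
    if (saved >= 0)
        close(0);               /* simulate "the line is *not* open" */

    errno = 0;
    int d = dup(0);             /* fails: fd 0 is not a valid descriptor */
    if (dup_errno)
        *dup_errno = (d < 0) ? errno : 0;
    if (d >= 0)
        close(d);

    /* The lowest free descriptor is now 0, so the socket lands there. */
    int s = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (s >= 0)
        close(s);
    if (saved >= 0) {           /* put the original stdin back */
        dup2(saved, 0);
        close(saved);
    }
    return s;
}
```

On a system where the socket call succeeds, the function returns 0 and
the dup() error is EBADF, which matches the "Bad file number" message
in the log. Checking the return value of each dup() would have exposed
the problem before the read on fd 0.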
The explanation of this was:
The problem is a bit more involved than that. What happens is something
along the following lines:
[1] An interactive session ends, or the uucico program closes the port.
[2] The Sun drops DTR.
[3] The Telebit notices that DTR has gone low for longer than the time
specified in the S-register, and starts the disconnection process.
[4] Some amount of time passes as the Telebit tells its peer that
disconnection is taking place, and the two modems agree to hang up.
[5] The Sun reaches the end of the "hold DTR low" period. The port is
released.
[6] Another process (usually a "getty") tries to open the port. Carrier
Detect is still high, so the open is allowed to proceed, and the tty
driver and getty begin to initialize the port.
[7] The Telebit finally gets around to dropping Carrier Detect,
signalling that the previous connection has dropped.
If you're lucky, step [6] completed before step [7] occurred. If this
happens, the getty simply receives an immediate hangup signal from the
tty driver, and exits harmlessly.
If you're unlucky, step [6] is only half-complete when [7] occurs, and
the tty driver gets hit by a hangup as it's halfway through port
initialization. Due to a bug (I infer) in the tty driver, the driver
leaves the port in a zombified state... busy, but unable to be reset.
The port becomes useless (locked up with DTR low) until the system is
rebooted.
Patching the "zsadtrlow" value in the kernel to a nice safe number like
5 or 7 prevents the problem from occurring... it ensures that the Sun
won't try to hand the port over to a new session until well after the
modems have deasserted Carrier Detect.
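The timing argument above can be put as a toy model in C. This is only
a sketch under stated assumptions: the enum and function names are
hypothetical, zsadtrlow is assumed to be in seconds, and the modem's
Carrier Detect delay and the port initialization time are made-up
inputs, not measured values.

```c
/* The three outcomes described in the explanation above. */
enum reopen_outcome {
    REOPEN_SAFE,          /* CD already low when the port is released */
    REOPEN_CLEAN_HANGUP,  /* open finished, then CD dropped: getty exits */
    REOPEN_WEDGED         /* CD dropped mid-initialization: port locks up */
};

/*
 * All times are seconds measured from the moment the Sun drops DTR.
 *   dtr_low_secs - how long the Sun holds DTR low (the zsadtrlow value)
 *   cd_drop_secs - when the modem finally drops Carrier Detect (assumed)
 *   init_secs    - how long open plus port initialization takes (assumed)
 */
enum reopen_outcome classify_reopen(double dtr_low_secs,
                                    double cd_drop_secs,
                                    double init_secs)
{
    if (cd_drop_secs <= dtr_low_secs)
        return REOPEN_SAFE;             /* step [7] before step [5] */
    if (cd_drop_secs >= dtr_low_secs + init_secs)
        return REOPEN_CLEAN_HANGUP;     /* step [6] done before [7] */
    return REOPEN_WEDGED;               /* [7] lands inside step [6] */
}
```

With a hypothetical modem that drops carrier after 4 seconds,
classify_reopen(6.0, 4.0, 0.5) gives REOPEN_SAFE, while a hold time of
3 seconds gives REOPEN_CLEAN_HANGUP, and a carrier drop at 3.2 seconds
with the same 3 second hold gives REOPEN_WEDGED. Raising the hold time
moves every case into the first branch.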
The actual patch, which cleared the problem for me, was:
1) patching the kernel on the fly (/W writes the value into the running
   kernel's memory, ?W writes it into the /vmunix file on disk):
     # adb -w -k /vmunix /dev/mem
     zsadtrlow/W 6
     zsadtrlow?W 6
     $q
- or -
2) patching the object file and then building a new kernel:
     # cd /usr/sys/`arch`/OBJ
     # adb -w zs_async.o
     zsadtrlow?W 6
     $q
Some respondents also said there was a similar-looking problem not fixed
by the above, since it had a different cause, but that it was fixed in
Sun patch number 100358-01, which is also incorporated in jumbo patch
100513-02.
I haven't applied either of these patches.
Many thanks to:
Real.Page@Matrox.COM (Real Page)
mike%trdlnk@uunet.UU.NET (Michael Sullivan)
tbr@tfic.bc.ca (Tom Rushworth)
Don Lewis <gdonl@ssi1.com>
Kevin Cosgrove <kevinc%solomon%qiclab@uknet.ac.uk>
and others whose past responses on this issue were forwarded by the above.
--
Tony Mountifield (G4CJO)           | Microware Systems (UK) Ltd.
-----------------------------------| Leylands Farm, Nobs Crook,
Email: tony@microware.co.uk        | Colden Common, WINCHESTER, SO21 1TH.
(or: ...!uknet!mwuk!tony)          | Tel: 0703 601990   Fax: 0703 601991