1.9.4: TCP queries when some threads are full

Mon Nov 25 22:56:17 UTC 2019

Hi,

following up on my own message:

> Is there a different timeout used between the connection is
> established and the first query is received than after the first
> query is received?  I'm going to assume the intention is "no".
>
> The experience I had with a mis-configured unbound for
> dns-over-tls (all 10 slots used up pretty quickly, and connection
> requests piling up, taking ages to respond if at all), and also
> the other message at
>
>   https://nlnetlabs.nl/pipermail/unbound-users/2019-August/011748.html
>
> seems to indicate that the tcp-idle-timeout feature doesn't
> actually work the way one would think it is supposed to work.
> This makes me wonder if this feature has actually been tested and
> verified to work as intended?

Hmm, Patrik pointed to comm_point_start_listening(), and it sets up a
time-out on the event if passed a suitable value in the msec argument.
However, reading my event(3) man page reveals that this function creates
a non-persistent event, and the timeout therefore gets cancelled once
the first data is handled by the event.  Is it the case that the
functions tcp_callback_writer() and tcp_callback_reader() are then
invoked, and that they re-activate the event and its timeout once
reading or writing of the DNS message is complete?
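
To make the one-shot behaviour concrete, here is a toy model in plain C.
It is not libevent or unbound code; the toy_* names are made up for
illustration.  It only shows the semantics described above: once a
non-persistent event fires for I/O, its timeout is gone until the event
is added again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a one-shot (non-persistent) event. */
struct toy_event {
    bool pending;       /* is the event currently registered? */
    long deadline_msec; /* absolute deadline, or -1 for "no timeout" */
};

/* Register the event; timeout_msec < 0 means wait without a timeout. */
void toy_add(struct toy_event *ev, long now_msec, long timeout_msec)
{
    assert(ev != NULL);
    ev->pending = true;
    ev->deadline_msec = timeout_msec < 0 ? -1 : now_msec + timeout_msec;
}

/* I/O readiness fires the event: it is removed, and its timeout with it. */
void toy_fire_io(struct toy_event *ev)
{
    assert(ev != NULL);
    ev->pending = false;
    ev->deadline_msec = -1;
}

/* Would the timeout trigger at time now_msec? */
bool toy_timed_out(const struct toy_event *ev, long now_msec)
{
    assert(ev != NULL);
    return ev->pending && ev->deadline_msec >= 0
        && now_msec >= ev->deadline_msec;
}
```

In this model, calling toy_add() with a 1000 msec timeout, then
toy_fire_io(), leaves the connection with no deadline at all; only a
second toy_add() brings the timeout back.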

However, it looks like tcp_callback_reader() will call
comm_point_start_listening(), but only if the callback function returns
non-0.  I don't know which functions may be invoked via (*c->callback),
nor what they return and under what conditions -- reading the code
beyond that indirection is a little hard.
And perhaps at that point in time, the timeout value should be
re-computed?

And ... isn't the invocation of comm_point_start_listening() in
tcp_callback_writer() in effect turning *off* timeouts with the last -1
argument, when according to the comment it goes back to listening(read)?
Is it as simple as changing the last -1 in that invocation to
c->tcp_timeout_msec?
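
The question can be sketched as a toy model.  This is not unbound's
code: struct toy_conn and the toy_* functions are hypothetical, and the
sketch assumes that passing -1 really means "no timeout", which is
exactly what I am unsure about.  It only contrasts the two choices for
the last argument once the reply has been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP comm point; not unbound's struct. */
struct toy_conn {
    int tcp_timeout_msec; /* configured idle timeout */
    int armed_msec;       /* timeout currently armed, -1 = none */
};

/* Assumed semantics: msec == -1 arms the read event without a timeout. */
void toy_start_listening(struct toy_conn *c, int msec)
{
    assert(c != NULL);
    c->armed_msec = msec;
}

/* After the reply is written, go back to reading the next query. */
void toy_writer_done(struct toy_conn *c, bool rearm_idle_timeout)
{
    assert(c != NULL);
    toy_start_listening(c, rearm_idle_timeout ? c->tcp_timeout_msec : -1);
}

/* An idle client can only be closed by the server if a timeout is armed. */
bool toy_idle_close_possible(const struct toy_conn *c)
{
    assert(c != NULL);
    return c->armed_msec >= 0;
}
```

Under that assumption, toy_writer_done(c, false) leaves an idle client
connected indefinitely and holding a slot, while
toy_writer_done(c, true) keeps the idle timeout in force between
queries.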

It looks like the "event & UB_EV_TIMEOUT" part in
comm_point_tcp_handle_callback() is responsible for doing a
server-side-initiated close() based on a triggered time-out?
Has the associated log message ever been observed?

Regards,

- Håvard