DoT with unbound
Havard Eidnes
he at uninett.no
Sun Nov 24 14:35:18 UTC 2019
Earlier, I wrote:
> we're using unbound (1.9.4) on NetBSD 8.0 to provide DNS-over-TLS
> service to a diverse set of customers, at the moment mostly as an
> experiment.
>
> What I find is that over time there will pile up a set of TCP
> connections in CLOSE_WAIT state, all with 147 bytes stuck in the
> receive queue of the socket (according to "netstat -n -f inet").
I've taken a closer look, but not found a solution to this issue.
I'm doing testing with "kdig" from the knot package. What I find
is that it can take an inordinately long time to get a response,
e.g.:
% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
0.000u 0.008s 0:31.80 0.0% 0+0k 0+0io 0pf+0w
31 seconds, in other words, and I've seen response times as long
as 1minute+. The randomness pool on the server host is full, as
is it at the client, so it's not that this is running low.
While it's waiting, I can find:
tcp 337 0 158.38.0.1.853 158.38.62.77.64808 ESTABLISHED
in "netstat -f inet -n | grep 853" output on the server. In
other words: the TCP connection is established, but the server
isn't consuming the data the client has sent. Other clients will
leave a little less data, typically 147 bytes:
tcp 147 0 158.38.0.1.853 158.38.204.xxx.43370 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.36.214.xx.41054 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xxx.49226 ESTABLISHED
tcp 147 0 158.38.0.1.853 158.38.204.xxx.41550 ESTABLISHED
These will in all probability end up as CLOSE_WAIT sockets,
because the client times out, and by the looks of it, the server
is very slow in cleaning these out as well.
Furthermore, I observe this behaviour both with libevent (2.1.11)
and when using the unbound-internal "mini-event" code.
OpenSSL is the in-tree code from NetBSD, version OpenSSL 1.0.2k
26 Jan 2017.
In the process, I ran across
https://github.com/libevent/libevent/issues/854
but wedging that into libevent 2.1.11 didn't make any significant
improvement that I could see.
I added some logging at the top of ssl_handshake() in
util/netevent.c, and the latest test I did resulted in:
% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
;; Time 2019-11-24 15:06:21 CET
;; From 158.38.0.1 at 853(TCP) in 11.1 ms
0.000u 0.007s 1:08.18 0.0% 0+0k 0+0io 0pf+0w
and the corresponding log entry:
Nov 24 15:07:29 oliven unbound: [13367:0] notice: ssl handshake with 158.38.62.77 port 64713
So ... apparently unbound is Very Slow in getting around to
handling the socket and handing it off for SSL handshake. The
reason for this has yet to be found. Hints gratefully accepted.
Regards,
- Håvard
More information about the Unbound-users
mailing list