DoT with unbound

Havard Eidnes he at uninett.no
Sun Nov 24 14:35:18 UTC 2019


Earlier, I wrote:

> we're using unbound (1.9.4) on NetBSD 8.0 to provide DNS-over-TLS
> service to a diverse set of customers, at the moment mostly as an
> experiment.
>
> What I find is that over time there will pile up a set of TCP
> connections in CLOSE_WAIT state, all with 147 bytes stuck in the
> receive queue of the socket (according to "netstat -n -f inet").

I've taken a closer look, but not found a solution to this issue.

I'm doing testing with "kdig" from the knot package.  What I find
is that it can take an inordinately long time to get a response,
e.g.:

% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
0.000u 0.008s 0:31.80 0.0%      0+0k 0+0io 0pf+0w

31 seconds, in other words, and I've seen response times as long
as 1minute+.  The randomness pool on the server host is full, as
is it at the client, so it's not that this is running low.

While it's waiting, I can find:

tcp      337      0  158.38.0.1.853         158.38.62.77.64808     ESTABLISHED

in "netstat -f inet -n | grep 853" output on the server.  In
other words: the TCP connection is established, but the server
isn't consuming the data the client has sent.  Other clients will
leave a little less data, typically 147 bytes:

tcp      147      0  158.38.0.1.853         158.38.204.xxx.43370   ESTABLISHED
tcp      147      0  158.38.0.1.853         158.36.214.xx.41054    ESTABLISHED
tcp      147      0  158.38.0.1.853         158.38.204.xxx.49226   ESTABLISHED
tcp      147      0  158.38.0.1.853         158.38.204.xxx.41550   ESTABLISHED

These will in all probability end up as CLOSE_WAIT sockets,
because the client times out, and by the looks of it, the server
is very slow in cleaning these out as well.

Furthermore, I observe this behaviour both with libevent (2.1.11)
and when using the unbound-internal "mini-event" code.

OpenSSL is the in-tree code from NetBSD, version OpenSSL 1.0.2k
26 Jan 2017.

In the process, I ran across

  https://github.com/libevent/libevent/issues/854

but wedging that into libevent 2.1.11 didn't make any significant
improvement that I could see.

I added some logging at the top of ssl_handshake() in
util/netevent.c, and the latest test I did resulted in:

% time kdig @158.38.0.1 +timeout=120 +tls vg.no. a
...
;; Time 2019-11-24 15:06:21 CET
;; From 158.38.0.1 at 853(TCP) in 11.1 ms
0.000u 0.007s 1:08.18 0.0%      0+0k 0+0io 0pf+0w

and the corresponding log entry:

Nov 24 15:07:29 oliven unbound: [13367:0] notice: ssl handshake with 158.38.62.77 port 64713

So ... apparently unbound is Very Slow in getting around to
handling the socket and handing it off for SSL handshake.  The
reason for this has yet to be found.  Hints gratefully accepted.

Regards,

- Håvard



More information about the Unbound-users mailing list