Tuning EDNS0 retries?
Viktor Dukhovni
ietf-dane at dukhovni.org
Sun May 21 18:30:14 UTC 2017
On a busy unbound 1.6.2 server I observed the following sequence of events:
the initial query socket is closed quickly (for a retry with a smaller
EDNS0 buffer size), so an ICMP port unreachable is returned when the answer
to the first query arrives. The retry answer is finally accepted at the
retry socket about 47ms after the first answer, which was dropped.
-----
1. Initial query with EDNS0 UDPsize = 8192
13:27:49.502228 IP (tos 0x0, ttl 64, id 61879, offset 0, flags [none], proto UDP (17), length 75)
108.21.89.116.30230 > 199.254.50.1.53: 65168% [1au] DS? hairbylorelei.info. ar: . OPT UDPsize=8192 OK (47)
2. ~90ms later retry with UDPsize=1472
13:27:49.591319 IP (tos 0x0, ttl 64, id 51021, offset 0, flags [none], proto UDP (17), length 75)
108.21.89.116.41507 > 199.254.50.1.53: 64543% [1au] DS? hairbylorelei.info. ar: . OPT UDPsize=1472 OK (47)
3. ~120ms after the initial query, the response to that query arrives
13:27:49.621226 IP (tos 0x0, ttl 58, id 38806, offset 0, flags [none], proto UDP (17), length 786)
199.254.50.1.53 > 108.21.89.116.30230: 65168*- q: DS? hairbylorelei.info. 0/6/1 ns: info. SOA a0.info.afilias-nst.info. noc.afilias-nst.info. 2011722024 3600 1800 604800 3600, info. RRSIG, adnsd9nk7nk82he8h21rj0jjhj11o5gb.info. Type50, adnsd9nk7nk82he8h21rj0jjhj11o5gb.info. RRSIG, 5p19pe3bk0hiejutcthqm2f2n674rv1g.info. Type50, 5p19pe3bk0hiejutcthqm2f2n674rv1g.info. RRSIG ar: . OPT UDPsize=4096 OK (758)
4. An immediate ICMP port unreachable, because the original query's UDP socket is already closed:
13:27:49.621228 IP (tos 0x0, ttl 64, id 11225, offset 0, flags [none], proto ICMP (1), length 56)
108.21.89.116 > 199.254.50.1: ICMP 108.21.89.116 udp port 30230 unreachable, length 36
IP (tos 0x0, ttl 58, id 38806, offset 0, flags [none], proto UDP (17), length 786)
5. Finally a reply to the second query:
13:27:49.668619 IP (tos 0x0, ttl 58, id 56774, offset 0, flags [none], proto UDP (17), length 786)
199.254.50.1.53 > 108.21.89.116.41507: [udp sum ok] 64543*- q: DS? hairbylorelei.info. 0/6/1 ns: adnsd9nk7nk82he8h21rj0jjhj11o5gb.info. Type50, adnsd9nk7nk82he8h21rj0jjhj11o5gb.info. RRSIG, info. SOA a0.info.afilias-nst.info. noc.afilias-nst.info. 2011722024 3600 1800 604800 3600, info. RRSIG, 5p19pe3bk0hiejutcthqm2f2n674rv1g.info. Type50, 5p19pe3bk0hiejutcthqm2f2n674rv1g.info. RRSIG ar: . OPT UDPsize=4096 OK (758)
-----
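To make the timing explicit, the intervals can be recomputed from the
tcpdump timestamps above. This is only a minimal sketch in Python; the
event labels and helper names are made up for illustration, and the
timestamps are copied by hand from the trace.

```python
from datetime import datetime

# Timestamps copied from the tcpdump trace above; the labels are
# illustrative names, not anything produced by tcpdump or unbound.
EVENTS = {
    "query_8192": "13:27:49.502228",        # 1. initial query, UDPsize=8192
    "retry_1472": "13:27:49.591319",        # 2. retry, UDPsize=1472
    "first_answer": "13:27:49.621226",      # 3. answer to the initial query
    "icmp_unreachable": "13:27:49.621228",  # 4. ICMP port unreachable
    "retry_answer": "13:27:49.668619",      # 5. answer to the retry
}


def parse(ts):
    """Parse an HH:MM:SS.microseconds timestamp as printed by tcpdump."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def delta_ms(start, end):
    """Elapsed time in milliseconds between two labelled events."""
    elapsed = parse(EVENTS[end]) - parse(EVENTS[start])
    return elapsed.total_seconds() * 1000.0


if __name__ == "__main__":
    pairs = [
        ("query_8192", "retry_1472"),
        ("query_8192", "first_answer"),
        ("retry_1472", "retry_answer"),
        ("first_answer", "retry_answer"),
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {delta_ms(start, end):.3f} ms")
```

By these timestamps the retry went out about 89ms after the initial
query, the first answer arrived about 119ms after the initial query
(roughly 30ms after its socket had been replaced), and the retry answer
arrived about 47ms after the dropped first answer.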
It seems I need a non-zero value of "delay-close". What is a sensible
value for this? I've seen mention of 1500, is that about right?
How should such a server be tuned? It is used for broad DNSSEC/DANE
adoption surveys, so it is not uncommon to run around 1200 queries per
second for O(12) hours. The machine has 4 hyper-threaded cores and
64GB of RAM. Unbound is linked with libevent. Relevant configuration:
num-threads: 8
infra-cache-slabs: 8
key-cache-slabs: 8
msg-cache-slabs: 8
rrset-cache-slabs: 8
key-cache-size: 256m
rrset-cache-size: 256m
msg-cache-size: 128m
neg-cache-size: 16m
jostle-timeout: 2000
interface: 127.0.0.1
so-reuseport: yes
access-control: 127.0.0.0/8 allow
edns-buffer-size: 8192
max-udp-size: 8192
outgoing-range: 6144
num-queries-per-thread: 3077
outgoing-port-permit: 1024-65535
outgoing-port-avoid: 1-1023
outgoing-num-tcp: 256
incoming-num-tcp: 256
so-rcvbuf: 12m
so-sndbuf: 12m
infra-cache-numhosts: 100000
--
Viktor.