odd increase in SERVFAIL with "misc failure" reason
Wolfgang Breyha
unbound at blafasel.at
Wed Nov 6 13:03:03 UTC 2024
Hi!
I have been operating a small private (low-volume) recursor for my own
use for years, running unbound since about 1.6.x, without any issues that
I noticed. But with 1.22+ I have noticed some unexpected SERVFAILs.
Incoming requests arrive via DoT on port 853 and locally on classic port
53. My config mostly uses defaults, except for [0].
I first noticed it when mail reception from GMX failed, because unbound
was occasionally unable to resolve the PTR RRs of their outgoing mail
relay. With "verbosity: 1" and "log-servfail: yes", the log showed only
error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure
A closer look at the logs showed a lot of rather odd "misc failure"s, e.g.:
error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
error: SERVFAIL <www.paypal.com. A IN>: misc failure
All of them resolved as expected on a later retry.
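To see which names are hit most often, the "misc failure" lines can be tallied per query name and type. The following is only a minimal sketch: it assumes each log line has the shape shown in the excerpts above, and the function name count_misc_failures is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this mail:
#   error: SERVFAIL <name. TYPE CLASS>: misc failure
SERVFAIL_RE = re.compile(r"error: SERVFAIL <(\S+) (\S+) (\S+)>: misc failure")


def count_misc_failures(lines):
    """Count "misc failure" SERVFAILs per (qname, qtype) pair."""
    counts = Counter()
    for line in lines:
        match = SERVFAIL_RE.search(line)
        if match:
            qname, qtype, _qclass = match.groups()
            counts[(qname, qtype)] += 1
    return counts


# The three example lines quoted above.
sample = [
    "error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure",
    "error: SERVFAIL <alexa.amazon.de. A IN>: misc failure",
    "error: SERVFAIL <www.paypal.com. A IN>: misc failure",
]

for (qname, qtype), n in count_misc_failures(sample).most_common():
    print(n, qname, qtype)
```

Any iterable of lines works as input, so an open log file can be passed in directly instead of the sample list.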
I searched the source for the "misc failure" message and found the option
"max-global-quota", which is new (at least to me), as one possible cause.
Afterwards I raised the verbosity to 3 to see more details. At the same
time I added
msg-cache-size: 4m
num-queries-per-thread: 4096
rrset-cache-size: 8m
cache-min-ttl: 10
cache-max-negative-ttl: 3600
infra-cache-min-rtt: 100
to [0]. But I still didn't change the "max-global-quota" default.
To my surprise, this also reduced the "misc failure" rate, and only some
"in-addr.arpa" queries still SERVFAILed. They all triggered the
"request xxxx has exceeded the maximum global quota on number of upstream
queries yyy" message in the debug log.
I then removed these modifications from the config again and returned to
plain [0], and the higher rate of "misc failure"s, including for quite
prominent zones, returned as well, e.g.:
debug: request 3.pool.ntp.org. has exceeded the maximum global quota on number of upstream queries 155
debug: return error response SERVFAIL
Searching for the highest "number of upstream queries" gave 180 for
error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure
When I retried it while writing this mail, it failed again, this time
with "139". The second try gave the correct answer.
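The search for the highest "number of upstream queries" can be sketched as below. This assumes the debug message appears on a single line in the actual log (it is only wrapped by the mail client here), and the name quota_hits is hypothetical.

```python
import re

# Assumed single-line shape of the verbosity 3 debug message quoted above.
QUOTA_RE = re.compile(
    r"request (\S+) has exceeded the maximum global quota on "
    r"number of upstream queries (\d+)"
)


def quota_hits(lines):
    """Return (upstream_queries, qname) tuples, highest count first."""
    hits = []
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            hits.append((int(match.group(2)), match.group(1)))
    return sorted(hits, reverse=True)


# The first line is quoted from the log above. The other two are
# reconstructed for illustration from the numbers mentioned in this mail
# (180 and 139); they are not copied from a log.
sample = [
    "debug: request 3.pool.ntp.org. has exceeded the maximum global quota on number of upstream queries 155",
    "debug: request at.mirrors.cicku.me. has exceeded the maximum global quota on number of upstream queries 180",
    "debug: request at.mirrors.cicku.me. has exceeded the maximum global quota on number of upstream queries 139",
]

hits = quota_hits(sample)
if hits:
    print("highest upstream query count:", hits[0][0], "for", hits[0][1])
```

Sorting the whole list rather than only taking the maximum also shows how far above the default of 128 the typical failing request ends up.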
Obviously the cache size, and primarily the cache contents, influence the
number of upstream queries a request needs.
I'm wondering if I'm the only one seeing this?
IMO either the default of 128 is simply too low for low-volume recursors
or there is some other oddity with this option.
Greetings,
Wolfgang Breyha
[0] config (stripped access, tls keys, common stuff)
outgoing-port-permit: 32768-60999
outgoing-port-avoid: 0-32767
so-rcvbuf: 4m
so-sndbuf: 4m
so-reuseport: yes
ip-transparent: yes
max-udp-size: 4096
log-servfail: yes
harden-glue: yes
harden-dnssec-stripped: yes
harden-below-nxdomain: yes
harden-referral-path: yes
qname-minimisation: yes
aggressive-nsec: yes
use-caps-for-id: no
unwanted-reply-threshold: 10000000
prefetch: yes
prefetch-key: yes
rrset-roundrobin: yes
minimal-responses: no
val-clean-additional: yes
val-permissive-mode: no
serve-expired: no
val-log-level: 1
--
Wolfgang Breyha <wbreyha at gmx.net> | https://www.blafasel.at/
Vienna University Computer Center | Austria