odd increase in SERVFAIL with "misc failure" reason

Wolfgang Breyha unbound at blafasel.at
Wed Nov 6 13:03:03 UTC 2024


Hi!

I have been operating a small private (low-volume) recursor for my own
purposes for years, using unbound since about 1.6.x, without any noticeable
issues so far.

But with 1.22+ I noticed some oddities with unexpected SERVFAILs.

Incoming requests are made with DoT on port 853 and locally (classic port
53). My config mostly uses defaults except [0].

I first noticed it when mail reception from GMX failed, because unbound
occasionally was not able to resolve the PTR RRs of their outgoing mail
relay. With "verbosity: 1" and "log-servfail: yes" the log showed only
error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure

A closer look at the logs showed a lot of rather odd "misc failure"s, e.g.:
error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
error: SERVFAIL <www.paypal.com. A IN>: misc failure

All of them resolved as expected on a later retry.

I searched the source for the "misc failure" message and found the new (at
least to me) option "max-global-quota" as one reason. Afterwards I raised
the verbosity to 3 to see more details. At the same time I added
	msg-cache-size: 4m
	num-queries-per-thread: 4096
	rrset-cache-size: 8m
	cache-min-ttl: 10
	cache-max-negative-ttl: 3600
	infra-cache-min-rtt: 100
to [0]. But I still didn't change the "max-global-quota" default.

To my surprise, this also reduced the "misc failure" rate, and only some
"in-addr.arpa" names still SERVFAILed. They all triggered the
"request xxxx has exceeded the maximum global quota on number of upstream
queries yyy" message in the debug log.

I then removed the modifications from the config again and returned to
plain [0], and the higher rate of "misc failure"s, including for quite
prominent zones, returned as well.

e.g.:
debug: request 3.pool.ntp.org. has exceeded the maximum global quota on
number of upstream queries 155
debug: return error response SERVFAIL

Searching for the highest "number of upstream queries" gave 180 for
error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure

This one failed again, with "139", when I retried while writing this mail.
The second try gave the correct answer.
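
A minimal sketch of how such a search for the highest "number of upstream
queries" could be scripted. It assumes each debug message sits on a single
line in the real log file (the excerpts above are wrapped by the mail client)
and matches only the message text quoted in this mail. The function name
max_upstream_queries is hypothetical and not part of unbound.

```python
import re

# Message text as quoted in the debug log excerpts above; any timestamp or
# process prefix in front of it is ignored because we use search().
QUOTA_RE = re.compile(
    r"request (\S+) has exceeded the maximum global quota "
    r"on number of upstream queries (\d+)"
)


def max_upstream_queries(lines):
    """Return (qname, count) for the highest count seen, or None if no match."""
    best = None
    for line in lines:
        m = QUOTA_RE.search(line)
        if m is None:
            continue
        qname, count = m.group(1), int(m.group(2))
        if best is None or count > best[1]:
            best = (qname, count)
    return best
```

It could be fed an open log file, for example
max_upstream_queries(open("unbound.log")), where the file name is only a
placeholder for wherever the log is actually written.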

Obviously the cache size, and primarily its contents, influence the number
of upstream queries needed to answer a request.

I'm wondering if I'm the only one seeing this?

IMO either the default of 128 is simply too low for low-volume recursors or
there is some other oddity with this option.

Greetings,
Wolfgang Breyha


[0] config (stripped access, tls keys, common stuff)
        outgoing-port-permit: 32768-60999
        outgoing-port-avoid: 0-32767
        so-rcvbuf: 4m
        so-sndbuf: 4m
        so-reuseport: yes
        ip-transparent: yes
        max-udp-size: 4096
        log-servfail: yes
        harden-glue: yes
        harden-dnssec-stripped: yes
        harden-below-nxdomain: yes
        harden-referral-path: yes
        qname-minimisation: yes
        aggressive-nsec: yes
        use-caps-for-id: no
        unwanted-reply-threshold: 10000000
        prefetch: yes
        prefetch-key: yes
        rrset-roundrobin: yes
        minimal-responses: no
        val-clean-additional: yes
        val-permissive-mode: no
        serve-expired: no
        val-log-level: 1


-- 
Wolfgang Breyha <wbreyha at gmx.net> | https://www.blafasel.at/
Vienna University Computer Center | Austria


