odd increase in SERVFAIL with "misc failure" reason

Otto Retter otto at relax.theregoesmy.email
Wed Nov 6 14:55:00 UTC 2024


Hi Wolfgang,

I am seeing the same increase in SERVFAILs ("misc failure") after updating
to Unbound 1.22.0, also on a low-volume recursor.

I have not had the opportunity to take a closer look, but wanted to
provide anecdotal evidence that you are not alone!

Cheers,
Otto

Wolfgang Breyha via Unbound-users wrote:
> Hi!
> 
> I have been operating a small private (low-volume) recursor for my own use
> for years, running unbound since about 1.6.x, without (noticed) issues so far.
> 
> But with 1.22+ I noticed some oddities with unexpected SERVFAILs.
> 
> Incoming requests are made with DoT on port 853 and locally (classic port
> 53). My config mostly uses defaults except [0].
> 
> I first noticed it through failed mail reception from GMX, because unbound
> occasionally was not able to resolve the PTR RRs of their outgoing mail
> relay. The log (with "verbosity: 1" and "log-servfail: yes") showed only
> error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure
> 
> A closer look at the logs showed a lot of rather odd "misc failure"s, e.g.:
> error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
> error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
> error: SERVFAIL <www.paypal.com. A IN>: misc failure
> 
> All of them worked as expected on a later retry.
> 
> I searched the source for the "misc failure" message and found the new (at
> least to me) option "max-global-quota" as one reason. Afterwards I raised
> the verbosity to 3 to see more details. At the same time I added
> 	msg-cache-size: 4m
> 	num-queries-per-thread: 4096
> 	rrset-cache-size: 8m
> 	cache-min-ttl: 10
> 	cache-max-negative-ttl: 3600
> 	infra-cache-min-rtt: 100
> to [0]. But I still didn't change the "max-global-quota" default.
> 
> To my surprise this also reduced the "misc failure" rate, and only some
> "in-addr.arpa" queries still SERVFAILed with it. They all triggered the
> "request xxxx has exceeded the maximum global quota on number of upstream
> queries yyy" message in the debug log.
> 
> I then removed the modifications from the config again and returned to
> plain [0], and the higher rate of "misc failure"s, including for quite
> prominent zones, returned as well.
> 
> e.g.:
> debug: request 3.pool.ntp.org. has exceeded the maximum global quota on
> number of upstream queries 155
> debug: return error response SERVFAIL
> 
> Searching for the highest "number of upstream queries" gave 180 for
> error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure
> 
> When I retried this one while writing this mail, it failed again, this time
> with "139". The second try gave the correct answer.
> 
> Obviously the cache size, and primarily the cache contents, influence the
> number of upstream queries needed.
> 
> I'm wondering if I'm the only one seeing this?
> 
> IMO either the default of 128 is simply too low for low-volume recursors or
> there is some other oddity with this option.
> 
> Greetings,
> Wolfgang Breyha
> 
> 
> [0] config (stripped access, tls keys, common stuff)
>          outgoing-port-permit: 32768-60999
>          outgoing-port-avoid: 0-32767
>          so-rcvbuf: 4m
>          so-sndbuf: 4m
>          so-reuseport: yes
>          ip-transparent: yes
>          max-udp-size: 4096
>          log-servfail: yes
>          harden-glue: yes
>          harden-dnssec-stripped: yes
>          harden-below-nxdomain: yes
>          harden-referral-path: yes
>          qname-minimisation: yes
>          aggressive-nsec: yes
>          use-caps-for-id: no
>          unwanted-reply-threshold: 10000000
>          prefetch: yes
>          prefetch-key: yes
>          rrset-roundrobin: yes
>          minimal-responses: no
>          val-clean-additional: yes
>          val-permissive-mode: no
>          serve-expired: no
>          val-log-level: 1
> 
> 
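For anyone who wants to repeat the log search described in the quoted
message (finding the highest "number of upstream queries" per name), here is
a minimal sketch. It assumes the debug message has the wording quoted above
and appears on a single log line (it is only wrapped in the mail); the
function names quota_hits and report are made up for illustration and are
not part of Unbound.

```python
import re
from collections import defaultdict

# Pattern based on the debug message quoted in this thread. Assumption: the
# wording is the same in your Unbound version and the message is not wrapped.
QUOTA_RE = re.compile(
    r"request (\S+) has exceeded the maximum global quota "
    r"on number of upstream queries (\d+)"
)


def quota_hits(lines):
    """Return {qname: highest upstream query count seen} for quota failures."""
    worst = defaultdict(int)
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            name, count = match.group(1), int(match.group(2))
            worst[name] = max(worst[name], count)
    return dict(worst)


def report(lines):
    """Return (qname, count) pairs sorted by count, highest first."""
    return sorted(quota_hits(lines).items(), key=lambda kv: kv[1], reverse=True)


# Example use (the log path is hypothetical):
#
#     with open("/var/log/unbound.log", errors="replace") as fh:
#         for name, count in report(fh):
#             print(count, name)
```

Comparing the counts this reports against the configured "max-global-quota"
should show how far above the limit the failing lookups actually went.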

