odd increase in SERVFAIL with "misc failure" reason
Otto Retter
otto at relax.theregoesmy.email
Wed Nov 6 14:55:00 UTC 2024
Hi Wolfgang,
I see the same increase in SERVFAILs ("misc failure") after updating
to Unbound 1.22.0, also on a low-volume recursor.
I have not had the opportunity to take a closer look, but wanted to
provide anecdotal evidence that you are not alone!
Cheers,
Otto
Wolfgang Breyha via Unbound-users wrote:
> Hi!
>
> I have been operating a small, private, low-volume recursor for my own use
> for years, running Unbound since about 1.6.x, without (noticed) issues so far.
>
> But with 1.22+ I noticed some oddities with unexpected SERVFAILs.
>
> Clients connect via DoT on port 853 and, locally, via classic port 53.
> Apart from the settings in [0], my config uses defaults.
>
> I first noticed it when mail reception from GMX failed, because Unbound
> occasionally could not resolve the PTR RRs of their outgoing mail relay.
> With "verbosity: 1" and "log-servfail: yes", the log showed only
> error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure
>
> A closer look at the logs showed many rather odd "misc failure"s, e.g.:
> error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
> error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
> error: SERVFAIL <www.paypal.com. A IN>: misc failure
>
> All of them worked at a later retry as expected.
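To see which queries fail this way and how often, the SERVFAIL lines produced by "log-servfail: yes" can be tallied per query. The Python sketch below assumes only the line shape quoted above ("SERVFAIL <name TYPE CLASS>: reason") and ignores whatever timestamp or process prefix precedes it; the function name misc_failures is hypothetical, not part of Unbound.

```python
import re
from collections import Counter

# Matches the log-servfail format quoted in this thread, e.g.
#   error: SERVFAIL <www.paypal.com. A IN>: misc failure
# re.search skips any timestamp/process prefix in front of it.
SERVFAIL_RE = re.compile(
    r"SERVFAIL <(?P<qname>\S+) (?P<qtype>\S+) (?P<qclass>\S+)>: (?P<reason>.+)$"
)


def misc_failures(lines):
    """Count "misc failure" SERVFAILs per "qname qtype" (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = SERVFAIL_RE.search(line)
        if m and m.group("reason").strip() == "misc failure":
            counts[f"{m.group('qname')} {m.group('qtype')}"] += 1
    return counts
```

Fed an open log file (for example misc_failures(open(path)), where path is whatever your logfile setting points to), most_common() on the result lists the names that fail most often, which makes it easy to spot whether the failures cluster on a few zones or are spread across many.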
>
> I searched the source for the "misc failure" message and found the new (at
> least to me) option "max-global-quota" as one reason. Afterwards I raised
> the verbosity to 3 to see more details. At the same time I added
> msg-cache-size: 4m
> num-queries-per-thread: 4096
> rrset-cache-size: 8m
> cache-min-ttl: 10
> cache-max-negative-ttl: 3600
> infra-cache-min-rtt: 100
> to [0], but left "max-global-quota" at its default.
>
> To my surprise, this also reduced the "misc failure" rate; only some
> "in-addr.arpa" queries still SERVFAILed. They all triggered the
> "request xxxx has exceeded the maximum global quota on number of upstream
> queries yyy" message in the debug log.
>
> I then removed these modifications and went back to plain [0], and the
> elevated rate of "misc failure"s, including for quite prominent zones,
> came back as well.
>
> e.g.:
> debug: request 3.pool.ntp.org. has exceeded the maximum global quota on number of upstream queries 155
> debug: return error response SERVFAIL
>
> Searching for the highest "number of upstream queries" gave 180, for
> error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure
>
> This one failed again, with a count of 139, when I retried it while
> writing this mail. The next retry returned the correct answer.
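The search for the highest "number of upstream queries" can be scripted. This is a minimal Python sketch, assuming verbosity 3 output in which each "exceeded the maximum global quota" message sits on a single log line (the mail above wraps it); quota_hits, worst_offender and ASSUMED_DEFAULT_QUOTA are hypothetical names, and the value 128 is the default as stated in this thread, not checked against Unbound's source.

```python
import re

# Default as stated in this thread; an assumption, not read from Unbound.
ASSUMED_DEFAULT_QUOTA = 128

# Matches the debug message quoted above, e.g.
#   debug: request 3.pool.ntp.org. has exceeded the maximum global quota
#   on number of upstream queries 155   (one line in the actual log)
QUOTA_RE = re.compile(
    r"request (?P<qname>\S+) has exceeded the maximum global quota "
    r"on number of upstream queries (?P<count>\d+)"
)


def quota_hits(lines):
    """Yield (qname, upstream_query_count) for every quota debug message."""
    for line in lines:
        m = QUOTA_RE.search(line)
        if m:
            yield m.group("qname"), int(m.group("count"))


def worst_offender(lines):
    """Return the (qname, count) pair with the highest count, or None."""
    return max(quota_hits(lines), key=lambda hit: hit[1], default=None)


def hits_above(lines, limit=ASSUMED_DEFAULT_QUOTA):
    """List the hits whose reported count is above the given limit."""
    return [hit for hit in quota_hits(lines) if hit[1] > limit]
```

Because the functions accept any iterable of lines, they work equally on an open logfile or on output piped through grep; worst_offender reproduces the manual "highest count" search, and hits_above shows how far past the assumed default the failing lookups went.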
>
> Obviously, the cache size, and primarily its contents, influence the
> maximum number of upstream queries a request needs.
>
> I'm wondering whether I'm the only one seeing this.
>
> IMO either the default of 128 is simply too low for low-volume recursors,
> or there is some other oddity with this option.
>
> Greetings,
> Wolfgang Breyha
>
>
> [0] config (access control, TLS keys and common settings stripped)
> outgoing-port-permit: 32768-60999
> outgoing-port-avoid: 0-32767
> so-rcvbuf: 4m
> so-sndbuf: 4m
> so-reuseport: yes
> ip-transparent: yes
> max-udp-size: 4096
> log-servfail: yes
> harden-glue: yes
> harden-dnssec-stripped: yes
> harden-below-nxdomain: yes
> harden-referral-path: yes
> qname-minimisation: yes
> aggressive-nsec: yes
> use-caps-for-id: no
> unwanted-reply-threshold: 10000000
> prefetch: yes
> prefetch-key: yes
> rrset-roundrobin: yes
> minimal-responses: no
> val-clean-additional: yes
> val-permissive-mode: no
> serve-expired: no
> val-log-level: 1
>
>