odd increase in SERVFAIL with "misc failure" reason
Yorgos Thessalonikefs
yorgos at nlnetlabs.nl
Wed Nov 6 15:32:14 UTC 2024
Hi Wolfgang, Otto,
Thanks for bringing this up!
We also received other operational feedback about the max-global-quota
default, and we decided to bump it up to 200 from the initial 128.
That still keeps the possible amplification factor for CAMP-style issues in
the hundreds.
https://github.com/NLnetLabs/unbound/commit/fd1a1d5fa0f012e8eeaa0ecc89da52d9ca25c216
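On versions that still ship the 128 default, the value can also be set
explicitly. A minimal sketch, assuming the option sits in the server: clause
of unbound.conf like the other options in [0]; the 200 simply mirrors the new
default from the commit above and is not a tuned recommendation:

```
server:
    # Assumption: raise the per-request cap on upstream queries from the
    # 1.22.0 default of 128 to the new default of 200.
    max-global-quota: 200
```

If "misc failure" SERVFAILs persist with this value, the verbosity 3 debug
log reports the upstream query count at which a request was cut off.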
Best regards,
-- Yorgos
On 06/11/2024 15:55, Otto Retter via Unbound-users wrote:
> Hi Wolfgang,
>
> I observe the same increased SERVFAILs ("misc failure") after updating
> to Unbound 1.22.0. Also on a low-volume recursor.
>
> I have not had the opportunity to take a closer look, but wanted to
> provide anecdotal evidence that you are not alone!
>
> Cheers,
> Otto
>
> Wolfgang Breyha via Unbound-users wrote:
>> Hi!
>>
>> I have been operating a small private (low-volume) recursor for my own
>> purposes for years, using unbound since about 1.6.x, without (recognized)
>> issues so far.
>>
>> But with 1.22+ I noticed some oddities with unexpected SERVFAILs.
>>
>> Incoming requests are made with DoT on port 853 and locally (classic port
>> 53). My config mostly uses defaults except [0].
>>
>> I first noticed it through failed mail reception from GMX, because unbound
>> occasionally was not able to resolve the PTR RRs of their outgoing mail
>> relay. With "verbosity: 1" and "log-servfail: yes", the log showed only
>> error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure
>>
>> A closer look at the logs showed a lot of rather odd "misc failure"s,
>> e.g.:
>> error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
>> error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
>> error: SERVFAIL <www.paypal.com. A IN>: misc failure
>>
>> All of them worked at a later retry as expected.
>>
>> I searched the source for the "misc failure" message and found the new (at
>> least to me) option "max-global-quota" as one reason. Afterwards I raised
>> the verbosity to 3 to see more details. At the same time I added
>> msg-cache-size: 4m
>> num-queries-per-thread: 4096
>> rrset-cache-size: 8m
>> cache-min-ttl: 10
>> cache-max-negative-ttl: 3600
>> infra-cache-min-rtt: 100
>> to [0]. But I still didn't change the "max-global-quota" default.
>>
>> To my surprise this also reduced the "misc failure" rate, and only some
>> "in-addr.arpa" names still SERVFAILed. They all triggered the
>> "request xxxx has exceeded the maximum global quota on number of upstream
>> queries yyy" message in the debug log.
>>
>> I then removed the modifications from the config again and returned to
>> plain [0], and the raised rate of "misc failure"s, including for quite
>> prominent zones, returned as well.
>>
>> e.g.:
>> debug: request 3.pool.ntp.org. has exceeded the maximum global quota on number of upstream queries 155
>> debug: return error response SERVFAIL
>>
>> Searching for the highest "number of upstream queries" gave 180 for
>> error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure
>>
>> This one failed again, with "139", when I retried while writing this mail.
>> The second try gave the correct answer.
>>
>> Obviously the cache size, and primarily the cache contents, influence how
>> many upstream requests are needed.
>>
>> I'm wondering if I'm the only one seeing this?
>>
>> IMO either the default of 128 is simply too low for low-volume recursors or
>> there is some other oddity with this option.
>>
>> Greetings,
>> Wolfgang Breyha
>>
>>
>> [0] config (stripped access, tls keys, common stuff)
>> outgoing-port-permit: 32768-60999
>> outgoing-port-avoid: 0-32767
>> so-rcvbuf: 4m
>> so-sndbuf: 4m
>> so-reuseport: yes
>> ip-transparent: yes
>> max-udp-size: 4096
>> log-servfail: yes
>> harden-glue: yes
>> harden-dnssec-stripped: yes
>> harden-below-nxdomain: yes
>> harden-referral-path: yes
>> qname-minimisation: yes
>> aggressive-nsec: yes
>> use-caps-for-id: no
>> unwanted-reply-threshold: 10000000
>> prefetch: yes
>> prefetch-key: yes
>> rrset-roundrobin: yes
>> minimal-responses: no
>> val-clean-additional: yes
>> val-permissive-mode: no
>> serve-expired: no
>> val-log-level: 1
>>
>>