Unbound can be made unresponsive when using DoT

RayG rgsub1 at btinternet.com
Mon Jun 29 15:49:14 UTC 2020


Hi Renaud,

I think the performance is OK now with your suggestions - Thanks

That said, I still see errors in the log file. However, those errors are not easy to decipher, so it is hard to see what is failing.

e.g.
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] debug: tcp error for address 2606:4700:4700::1111 port 853
And
29/06/2020 15:44:36 C:\Program Files\Unbound\unbound.exe[1776:0] debug: request db3pap001.storage.live.com. has exceeded the maximum number of glue fetches 17 to a single delegation point
29/06/2020 15:44:36 C:\Program Files\Unbound\unbound.exe[1776:0] debug: return error response SERVFAIL

And yet this lookup works OK, if I have read things correctly:
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: sending query: db3pap001.storage.live.com. AAAA IN
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] debug: sending to target: <.> 1.0.0.1#853
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] debug: cache memory msg=78220 rrset=89465 infra=8804 val=71316
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_reply
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: iterator operate: query db3pap001.storage.live.com. A IN
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: iterator operate: chased to l-0003.l-msedge.net. A IN
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: response for db3pap001.storage.live.com. A IN
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: reply from <.> 1.0.0.1#853
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: query response was ANSWER
29/06/2020 15:44:35 C:\Program Files\Unbound\unbound.exe[1776:0] info: finishing processing for db3pap001.storage.live.com. A IN

But I also see these:

29/06/2020 15:45:12 C:\Program Files\Unbound\unbound.exe[1776:0] error: SERVFAIL <cid-d42a2173fbacf7ce.users.storage.live.com. A IN>: could not fetch nameservers for 0x20 fallback
29/06/2020 15:45:12 C:\Program Files\Unbound\unbound.exe[1776:0] reply: ::1 cid-d42a2173fbacf7ce.users.storage.live.com. A IN SERVFAIL 0.937464 0 61

And you might expect queries for this name not to fail:
29/06/2020 16:19:25 C:\Program Files\Unbound\unbound.exe[1776:0] info: Capsforid: reply is equal. go to next fallback
29/06/2020 16:19:25 C:\Program Files\Unbound\unbound.exe[1776:0] info: processQueryTargets: www.internic.net. AAAA IN
29/06/2020 16:19:25 C:\Program Files\Unbound\unbound.exe[1776:0] debug: request www.internic.net. has exceeded the maximum number of glue fetches 17 to a single delegation point
29/06/2020 16:19:25 C:\Program Files\Unbound\unbound.exe[1776:0] debug: return error response SERVFAIL
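
For anyone else reading the Capsforid lines: with use-caps-for-id: yes in unbound.conf, Unbound sends the query name with randomly mixed upper and lower case (the 0x20 bit of each ASCII letter) and expects the answer to echo that exact spelling, which makes a blind spoofed reply harder to forge. The Python sketch below only illustrates that idea. It is not Unbound's code, and the function names (randomize_case, check_echo) are hypothetical. It does not model Unbound's fallback path, which, judging from the log lines above, appears to re-ask other nameservers and compare their replies when the case does not match.

```python
import random


def randomize_case(qname: str, rng: random.Random) -> str:
    """Hypothetical helper: flip each ASCII letter to upper or lower case
    at random (this toggles the 0x20 bit); digits, dots and hyphens stay."""
    return "".join(
        (c.upper() if rng.getrandbits(1) else c.lower())
        if c.isascii() and c.isalpha()
        else c
        for c in qname
    )


def check_echo(sent_qname: str, reply_qname: str) -> str:
    """Hypothetical helper: compare the question name echoed in a reply
    with the one that was sent."""
    if sent_qname == reply_qname:
        # Exact, case-sensitive echo: what a 0x20-aware resolver wants.
        return "match"
    if sent_qname.lower() == reply_qname.lower():
        # Same DNS name (names compare case-insensitively), but the case
        # was not preserved: either a forged reply or a server/middlebox
        # that rewrites the question.
        return "case-mismatch"
    return "different-name"
```

A case-mismatch does not by itself prove an attack: some servers and middleboxes do not preserve the case of the question, which is one reason a resolver would fall back rather than fail outright.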

After changing the forward server from CloudFlare to Google, I still get:
29/06/2020 16:24:21 C:\Program Files\Unbound\unbound.exe[9440:0] debug: tcp error for address 8.8.4.4 port 853
29/06/2020 16:24:21 C:\Program Files\Unbound\unbound.exe[9440:0] debug: tcp error for address 8.8.8.8 port 853

BUT

I no longer see 0x20 or glue errors. It would be nice to know what is going on, as I cannot tell whether Unbound is getting it wrong or CloudFlare is not doing what it should.
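
One way to separate an Unbound problem from an upstream one is to send a single query over DNS-over-TLS outside Unbound and see whether port 853 answers at all. The Python sketch below assumes the TLS authentication names cloudflare-dns.com (for 1.1.1.1 / 1.0.0.1) and dns.google (for 8.8.8.8 / 8.8.4.4); the helper names are hypothetical. It uses the two-byte length prefix that DNS over TCP, and therefore DoT (RFC 7858), puts in front of each message.

```python
import secrets
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Hypothetical helper: minimal DNS query (RFC 1035) with a random ID,
    the RD flag set, one question, class IN. qtype 1 = A, 28 = AAAA."""
    header = struct.pack("!HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(msg: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as TCP and DoT require."""
    return struct.pack("!H", len(msg)) + msg


def recv_exact(sock: ssl.SSLSocket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message")
        buf += chunk
    return buf


def dot_query(server_ip: str, tls_name: str, name: str,
              qtype: int = 1, timeout: float = 5.0) -> bytes:
    """Send one query over TLS to port 853 and return the raw reply.
    The certificate is verified against tls_name (an assumption above)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=tls_name) as tls:
            tls.sendall(frame_for_tcp(build_query(name, qtype)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            return recv_exact(tls, length)


def rcode(reply: bytes) -> int:
    """Response code from the low 4 bits of the header's 4th byte
    (0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN)."""
    return reply[3] & 0x0F
```

For example, rcode(dot_query("1.0.0.1", "cloudflare-dns.com", "www.internic.net")) would be expected to return 0 if the path to CloudFlare is healthy; a timeout or ssl.SSLError here would point at the network or the upstream rather than at Unbound.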

One last point: I saw a lot of failed lookups for these names (with both CloudFlare and Google):

29/06/2020 16:25:45 C:\Program Files\Unbound\unbound.exe[9440:0] query: ::1 wpad.home. A IN
29/06/2020 16:25:45 C:\Program Files\Unbound\unbound.exe[9440:0] reply: ::1 wpad.home. A IN NXDOMAIN 0.000000 1 27
29/06/2020 16:25:45 C:\Program Files\Unbound\unbound.exe[9440:0] query: ::1 wpad.home. AAAA IN
29/06/2020 16:25:45 C:\Program Files\Unbound\unbound.exe[9440:0] reply: ::1 wpad.home. AAAA IN NXDOMAIN 0.000000 1 27

There were others like:

ahbgrtoputryz.home.

These were random-looking strings of characters before the .home suffix. As they always failed with either NXDOMAIN or SERVFAIL, I added this entry:

local-zone: home always_nxdomain

so now there is no need for Unbound to go any further. I was, however, unable to find out what Windows was trying to do or which process was attempting the lookup. I have no "home" zone in the configuration, and I have no proxy set up.
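
Why one local-zone: home always_nxdomain line covers every random name: local zones match on whole labels, so anything ending in the label home falls under it, while a name like myhome. does not. The Python sketch below is a simplified model of that matching, not Unbound's implementation. It assumes, as we understand Unbound's behaviour, that the most specific (longest) enclosing zone wins. The function names and the nas.home zone in the example are hypothetical.

```python
from typing import Dict, List, Optional


def labels(name: str) -> List[str]:
    """Split a domain name into lower-cased labels, ignoring the root dot."""
    return [label.lower() for label in name.rstrip(".").split(".") if label]


def in_zone(qname: str, zone: str) -> bool:
    """True if qname equals zone or lies below it, comparing whole labels."""
    q, z = labels(qname), labels(zone)
    return len(q) >= len(z) and q[len(q) - len(z):] == z


def local_zone_type(qname: str, zones: Dict[str, str]) -> Optional[str]:
    """Hypothetical model: return the type of the most specific local-zone
    enclosing qname, or None if no local-zone applies."""
    matches = [z for z in zones if in_zone(qname, z)]
    if not matches:
        return None
    return zones[max(matches, key=lambda z: len(labels(z)))]
```

With zones = {"home": "always_nxdomain"}, both wpad.home. and ahbgrtoputryz.home. map to always_nxdomain, while home.example.com. is untouched.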

Just FYI - the Google servers are set up completely differently from CloudFlare's, with interesting results in the spoofability test.

If anyone else has ideas on the above...

Thanks

Ray

-----Original Message-----
From: Renaud Allard <renaud at allard.it> 
Sent: 29 June 2020 08:24
To: RayG <rgsub1 at btinternet.com>
Subject: Re: Unbound can be made unresponsive when using DoT

Hi Ray,

There are a few ways to optimize speed, but it depends a little bit on 
your setup and might give you issues in some specific cases.

         cache-min-ttl: 0
         serve-expired: yes
         qname-minimisation: yes
         prefetch: yes
         prefetch-key: yes
         minimal-responses: yes
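
Two of these options change what happens on a cache hit. As the unbound.conf documentation describes prefetch, a hit in the last 10 percent of an entry's TTL triggers a background refresh; serve-expired lets an expired entry be answered immediately while the real resolution runs. The Python sketch below is a simplified decision table based on those two descriptions only. It ignores limits such as serve-expired-ttl, and cache_action is a hypothetical name, not an Unbound function.

```python
def cache_action(original_ttl: int, remaining_ttl: int,
                 prefetch: bool = True, serve_expired: bool = True) -> str:
    """Hypothetical model of how a cached entry is used when a query arrives."""
    if remaining_ttl > 0:
        # Still fresh: answer from cache. With prefetch, a hit in the last
        # 10 percent of the original TTL also triggers a background refresh.
        if prefetch and remaining_ttl <= original_ttl // 10:
            return "cache+prefetch"
        return "cache"
    if serve_expired:
        # Expired: answer with the stale data now, refresh in the background.
        return "stale+refresh"
    # Expired and serve-expired off: the client waits for full resolution.
    return "resolve"
```

For a 3600-second record, a hit with 300 seconds left returns "cache+prefetch" (300 is within the last 360), while a hit with 1000 seconds left is a plain "cache" answer.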


On 6/28/20 4:57 PM, RayG wrote:
> Hi Renaud,
> 
> Thanks for that suggestion - there is a definite improvement, and it is possible to use dig etc. to carry out other queries while the DNS spoofability test is running. That test runs MUCH quicker and the results are excellent (which is good).
> 
> From that I can see that the Quad9 servers are not as well set up as Cloudflare's.
> 
> I am still looking at the performance side along with testing some other parameters that may (or may not) improve things.
> 
> I can let you know what I find, if there is interest.
> 
> So far that one single change has made a world of difference - thanks.
> 
> Ray
> 
> 
> -----Original Message-----
> From: Renaud Allard <renaud at allard.it>
> Sent: 27 June 2020 17:56
> To: RayG <rgsub1 at btinternet.com>
> Subject: Re: Unbound can be made unresponsive when using DoT
> 
> Hi Ray,
> 
> Could you test with "so-reuseport: no" in your config? I don't know if
> windows uses this, but I had stalling issues with DoT on BSD and they
> all stopped when I disabled port reuse.
> 
> Regards
> 
> On 27/06/2020 18:16, RayG via Unbound-users wrote:
>> Hi Eric,
>>
>> Thanks for your thoughts - did you have any suggestions as to which
>> parameters should be adjusted to what sort of value?
>>
>> It seems that a lot of the issues I am seeing revolve around these entries:
>>
>> 27/06/2020 16:55:33 C:\Program Files\Unbound\unbound.exe[1756:0] info:
>> Capsforid: reply is equal. go to next fallback
>> 27/06/2020 16:55:33 C:\Program Files\Unbound\unbound.exe[1756:0] info:
>> processQueryTargets: cid-d42a2173fbacf7ce.users.storage.live.com. AAAA IN
>> 27/06/2020 16:55:33 C:\Program Files\Unbound\unbound.exe[1756:0] debug:
>> request cid-d42a2173fbacf7ce.users.storage.live.com. has exceeded the
>> maximum number of glue fetches 17 to a single delegation point
>> 27/06/2020 16:55:33 C:\Program Files\Unbound\unbound.exe[1756:0] debug:
>> return error response SERVFAIL
>>
>> I see many of them and there seems to be a limit of 17 - I have to admit I
>> am not sure which parameter to tweak, I have tried many of the more obvious
>> ones but to no avail. Apart from the unresponsiveness the errors above are
>> random in that the queries work sometimes but not every time. This causes
>> processes to fail as they think they can no longer access the resource they
>> are after on the internet, some retry but others just give up and exit.
>>
>> With respect to Capsforid: this changes queries to random upper/lowercase
>> characters, which is there to thwart spoofing. That said, Unbound does not,
>> as far as I can see, show you what was sent and what was received, so it's
>> difficult to ascertain whether it's a specific server or something else. The
>> query example above goes around a number of servers, each looking like the
>> above, and then the whole thing gives up. I really am not sure what is going
>> on here.
>>
>> I saw this bug:
>> https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4243
>> which is still unresolved I think but I have 'qname-minimisation:' set to no
>> anyway.
>>
>> Any further suggestions willingly accepted and tried out.
>>
>> Thanks
>>
>> Ray
>>
>> -----Original Message-----
>> From: Eric Luehrsen <ericluehrsen at gmail.com>
>> Sent: 27 June 2020 02:34
>> To: unbound-users at nlnetlabs.nl
>> Subject: Re: Unbound can be made unresponsive when using DoT
>>
>> On 6/23/20 11:38 AM, RayG via Unbound-users wrote:
>>> Hi,
>>>
>>> I have DoT & DNSSEC all set up and working and was carrying out some
>>> tests to ensure that the server and the forward servers (Cloudflare)
>>> were working as I expected.
>>>
>>> To that end I was using this test:
>>>
>>> https://www.grc.com/dns/dns.htm
>>>
>>> down the page you will see a button:
>>>
>>> "Initiate standard DNS spoofability test"
>>>
>>> When run, it carries out the test and returns results. If, however, you
>>> try using dig or even a browser while the test is running, nothing will
>>> work: Unbound is unresponsive.
>>>
>>> After the test returns you still have to wait some time before Unbound
>>> recovers and is once again usable.
>>>
>>> I am on Windows 10/64 (B18363.900-V1909) with an Intel Core i7 4930K @
>>> 3.40GHz Ivy Bridge-E 22nm with 32GB Memory. Using Unbound v1.10.1
>>>
>>> When I run the same test without DoT to the same forward servers
>>> everything seems to be OK and there is no hang or unresponsiveness.
>>>
>>> I appreciate that there is much more TCP traffic when using DoT but
>>> should Unbound become unresponsive?
>>>
>>> Is this an Unbound problem or something that I can resolve in the
>>> configuration?
>>
>> There are more than a few Unbound resource settings. These include the
>> number of TCP and UDP ports to allow to be open at the same time. It is
>> probably best to give "unbound.conf" a read on the documentation page.
>> Also, Windows home-style editions often have some of these available
>> resources tuned down compared with Windows professional-style editions.
>>
>> - Eric
>>
>>
> 
> 



