auth-zones: AXFR never progresses to 2nd master [Was: Re: auth-zones and DNS NOTIFY]

Wouter Wijngaards wouter at nlnetlabs.nl
Thu Apr 11 13:59:15 UTC 2019


Hi Harry,

On 3/18/19 9:07 PM, Harry Schmalzbauer wrote:
> On 26.06.2018 at 15:49, W.C.A. Wijngaards wrote:
>> Hi Harry,
>>
>> On 24/06/18 20:20, Harry Schmalzbauer wrote:
>>> Am 23.06.2018 um 20:26 schrieb Harry Schmalzbauer via Unbound-users:
>>>> Am 17.04.2018 um 15:26 schrieb W.C.A. Wijngaards via Unbound-users:
>>>>> Hi Harry,
>>>>>
>>>>> Yes, DNS NOTIFY is implemented in the current code repo version.  You
>>>>> can specify additional sources with allow-notify.
>>>> Dear all, Wouter,
>>>>
>>>> sorry for bringing it up again, but I'm having real-world problems
>>>> with this nice new auth-zone: and allow-notify: feature ;-)
>>>>
>>>> My auth-zone: has two master: definitions.
>>>> It seems that the second definition is probed first when a NOTIFY
>>>> comes in (at least if the NOTIFY is not from one of the masters);
>>>> haven't verified/falsified, neither by code inspection nor by testing
>>>> beyond lowest level yet.  As long as it's a static and documented
>>>> behaviour everything is fine.
>> Thank you for the bug report.  I have fixed the code, so that it does not
>> stop the probe when a master replies with the current serial.  Instead,
>> it'll continue and probe the masters, until one has an update.  If all
>> of them respond with the current serial, it assumes it is up to date and
>> waits (the SOA timer).
>>
>> The first master that gets a query is the same master that sent the
>> NOTIFY.  After that it should scan them in the order they appear in the
>> config.
>>
>> (The code is in the repository, pick up services/authzone.c and
>> services/authzone.h if you want to have the update).
> 
> Dear Wouter,
> 
> thanks a lot for all the nice improvements!
> 
> I hadn't found time to start over with my unbound deployments for some
> time, but did so last weekend.
> And now I'm nagging again ;-)
> 
> It's again about auth-zone: and NOTIFY, or rather TCP transfer.
> Without inspecting the code, I guess my issues are tightly related:
> 
> auth-zone:
>         name: "a.b.c.de."
>         master: 169.254.0.53
>         master: 169.254.0.54
>         allow-notify: 172.17.2.231
>         allow-notify: 172.17.2.232
> a.b.c.de. gets a NOTIFY from a non-master that is listed in allow-notify:
> Log:
> ... unbound[68691]: [68691:0] info: received NOTIFY serial 2019031715
> for a.b.c.de. from 172.17.2.232 port 57053
> 
> _For my test case_, both masters were reachable by UDP, but the first
> master doesn't respond to TCP (AXFR).
> Then the second master never gets asked; only this is logged:
> ... peleus unbound[68691]: [68691:0] debug: tcp took too long, dropped
> 
> So far this is just a not very realistic test case,
> but I guess the following problem has exactly the same root cause.
> I use the same _zonefile-less_ auth-zone: from above (having two masters
> defined) and start unbound (without using a zonefile).
> Problem:
> If the first master is down, the second master never gets any TCP AXFR
> attempt and unbound will permanently return SERVFAIL, instead of trying
> the second – available – master for loading the zone at startup!

So, it turns out there was an issue with TCP timeouts; this is fixed now.
Logging has also been improved for auth zones, and there is a fix for
using an incorrect socket type for SOA probes.

I think this may have fixed your bug.  When I hit TCP timeouts in my
tests, they were handled correctly and the next master was attempted.
The improved logging now makes that visible.

The timeout is now 10s for these transfers.  The design is to load
from any master that can be contacted.  It attempts first to contact the
master with the IP address that the NOTIFY or SOA-probe packet
indicates.  Then it tries all of them, in sequence.
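
As a rough illustration of that order (this is not the code in
services/authzone.c), the behaviour could be sketched in Python as
below.  The names master_order, load_zone and fake_transfer are made up
for the example, and two details are assumptions: that a hinted address
which is not a configured master is ignored, and that the hinted master
is not tried a second time during the in-sequence pass.

```python
import logging
from typing import Callable, List, Optional, Tuple

log = logging.getLogger("authzone_sketch")

TRANSFER_TIMEOUT_S = 10.0  # the 10s transfer timeout mentioned in this mail


def master_order(masters: List[str], hint: Optional[str] = None) -> List[str]:
    """Return the masters in the order they would be tried.

    The hinted address (from a NOTIFY or SOA probe) goes first, but only
    if it is a configured master (assumption).  The rest follow in
    configuration order, without repeating the hint (assumption).
    """
    order: List[str] = []
    if hint is not None and hint in masters:
        order.append(hint)
    for addr in masters:
        if addr not in order:
            order.append(addr)
    return order


def load_zone(
    masters: List[str],
    try_transfer: Callable[[str, float], str],
    hint: Optional[str] = None,
    timeout: float = TRANSFER_TIMEOUT_S,
) -> Optional[Tuple[str, str]]:
    """Try each master in turn; return (address, zone) from the first that works."""
    for addr in master_order(masters, hint):
        try:
            zone = try_transfer(addr, timeout)
        except OSError as exc:  # TimeoutError and ConnectionError are OSErrors
            log.info("transfer from %s failed (%s), trying next master", addr, exc)
            continue
        return addr, zone
    return None  # no master could be contacted


def fake_transfer(addr: str, timeout: float) -> str:
    """Stand-in for a real AXFR: the first master never answers on TCP."""
    if addr == "169.254.0.53":
        raise TimeoutError("tcp took too long")
    return "zone data from " + addr
```

With the two masters from your config, load_zone(["169.254.0.53",
"169.254.0.54"], fake_transfer) moves on to 169.254.0.54 after the
first master times out, which is the behaviour you asked for.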

Best regards, Wouter

> 
> If the second master is down, this is no issue of course.
> But for my planned setup it's crucial that auth-zones get loaded from
> *any* available master.
> 
> So I guess an additional TCP/AXFR timer is needed (after a NOTIFY, or
> at startup unrelated to NOTIFY) to continue asking multiple masters.
> Some timer must already be in place, since this line is logged:
> ... peleus unbound[68691]: [68691:0] debug: tcp took too long, dropped
> But afterwards, the other master(s) should be contacted, not continuing
> with the first for AXFR.
> I'd highly appreciate it if that timeout were adjustable, or at least
> reduced.  As far as I remember it was in the range of a minute, while
> 10-20s would be a better fit, I think.
> 
> Do you think this is worth adding/fixing?
> 
> Thanks,
> 
> -harry
> 
