Unbound strange stub_zone behavior?
Andrew Forgue
andrew at forgue.io
Thu Aug 20 16:43:36 UTC 2020
> On Jul 14, 2020, at 11:48 AM, Andrew Forgue via Unbound-users <unbound-users at lists.nlnetlabs.nl> wrote:
>
>
>
>> On Jul 13, 2020, at 11:55 AM, Jan Komissar (jkomissa) <jkomissa at cisco.com> wrote:
>>
>> Hi Andrew,
>>
>> I believe that stub-zones will not work correctly for +norecurse (RD (recursion desired) flag unset) queries. Also, if your blah.example.com has delegations to subzones (even on the same server) and you use a non-standard port, you would need a stub-zone for each sub-zone.
>
> After restarting unbound, non-recursive queries work fine for several days, until they don't (not sure why). My understanding is that stub_zone presents as if it's local data, and the behavior you're describing would be more like the behavior of a forward zone.
>
>> I would follow Eric's advice to use an auth-zone, either as primary or secondary server (depending on your authoritative requirements).
>
> Yeah, thanks Eric & Jan, I'll take a look at that. I'm not sure the "proxied" DNS server can do notifies, but it seems to be a good lead.
Just to bump this again -- here's the progress so far. We've been able to reproduce this with auth_zones too.
With my limited knowledge of the unbound code and gdb, the problem *appears* to be in answer_norec_from_cache,
daemon/worker.c:492 (or so):
    answer_norec_from_cache(...) {
        ...
        dp = dns_cache_find_delegation(&worker->env, qinfo->qname,
            qinfo->qname_len, qinfo->qtype, qinfo->qclass,
            worker->scratchpad, &msg, timenow);
        if(!dp) { /* no delegation, need to reprime */
            return 0;
        }
        ...
In the happy case, `dp` is NULL, meaning no delegation was found in the cache, so the function hits `return 0` and the correct answer is returned.
In the failure case, `dp` is a delegation point for what looks like the root zone:
--
(gdb) p *dp
$27 = {name = 0x7f66cc516d58 "", namelen = 1, namelabs = 1, nslist = 0x0, target_list = 0x0, usable_list = 0x0, result_list = 0x0, bogus = 0, has_parent_side_NS = 0 '\000', dp_type_mlc = 0 '\000', ssl_upstream = 0 '\000', auth_dp = 0 '\000', no_cache = 0}
--
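To make the branch above concrete, here is a toy model in C. It is not Unbound's code: `toy_delegation`, `toy_find_delegation`, and `toy_answer_norec` are hypothetical names, and the "cache" is just an array of zone names. It only sketches why a non-recursive query would get a root referral whenever the closest cached delegation is the root, and would fall through to normal processing when nothing is cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached delegation; NOT Unbound's struct delegpt. */
struct toy_delegation {
    const char *zone; /* owner of the cached NS rrset; "." is the root */
};

/* Rank used to pick the closest zone: root is 0, otherwise the name length. */
static size_t zone_rank(const char *zone)
{
    return strcmp(zone, ".") == 0 ? 0 : strlen(zone);
}

/* Return 1 if zone equals qname or is an ancestor of it.
 * Names are written without a trailing dot, except the root ".". */
static int zone_encloses(const char *zone, const char *qname)
{
    size_t zl = strlen(zone), ql = strlen(qname);
    if (strcmp(zone, ".") == 0)
        return 1; /* the root encloses every name */
    if (zl > ql || strcmp(qname + (ql - zl), zone) != 0)
        return 0;
    /* the match must start on a label boundary */
    return ql == zl || qname[ql - zl - 1] == '.';
}

/* Find the closest enclosing cached delegation, or NULL if there is none. */
static const struct toy_delegation *
toy_find_delegation(const struct toy_delegation *cache, size_t n,
                    const char *qname)
{
    const struct toy_delegation *best = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        if (!zone_encloses(cache[i].zone, qname))
            continue;
        if (best == NULL || zone_rank(cache[i].zone) > zone_rank(best->zone))
            best = &cache[i];
    }
    return best;
}

/* Same shape as the branch in the excerpt: return 0 to fall through to
 * normal processing, or 1 after "answering" with a referral to the
 * delegation that was found. */
static int toy_answer_norec(const struct toy_delegation *cache, size_t n,
                            const char *qname, const char **referral_zone)
{
    const struct toy_delegation *dp;
    assert(referral_zone != NULL);
    dp = toy_find_delegation(cache, n, qname);
    if (dp == NULL)
        return 0; /* no delegation cached: the happy case */
    *referral_zone = dp->zone; /* the failure case when this is "." */
    return 1;
}
```

In this model, an empty cache makes `toy_answer_norec` return 0 for blah.example.com (the happy case), while a cache holding only "." makes it return 1 with the root as the referral, which is what the gdb session above seems to show.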
broken node:
dns1 :: ~ » sudo unbound-control lookup blah.example.com
The following name servers are used for lookup of blah.example.com
;rrset 17491 13 1 5 2
. 17491 IN NS c.root-servers.net.
. 17491 IN NS d.root-servers.net.
. 17491 IN NS j.root-servers.net.
. 17491 IN NS e.root-servers.net.
. 17491 IN NS l.root-servers.net.
. 17491 IN NS h.root-servers.net.
. 17491 IN NS f.root-servers.net.
. 17491 IN NS k.root-servers.net.
. 17491 IN NS m.root-servers.net.
. 17491 IN NS i.root-servers.net.
. 17491 IN NS a.root-servers.net.
. 17491 IN NS g.root-servers.net.
. 17491 IN NS b.root-servers.net.
. 17491 IN RRSIG NS 8 0 518400 20200827170000 20200814160000 46594 . ZxJeYw7vVyjxZg8y7mtt5N3YtejDrho11npxtnjt7MMZm/MlbSErowznceyvXYhTkgF4dJOFGcrUkwFekcN86Zw0tN+cHYYb4lpV2o/pYtXIzo2w2OtA0WJURMB1pWcclhma9y648OiGUsEwImRXpCQS7Mgk+XKU05KFCg5yrFW+UC4faaQ1ZiisVnK9GF8CwsHCC82xT7HU/pAMFgF2vEovsomysMuDhBKE1QTP9MN/DqD6bitdqGmhQSC9GxxcRrNCCU8fSnW4UVIiOJ95kaEMDk0kdpTGowBcKx2WCbXN8oKGSYRpJjE+y77mc2mv3cBUBwK9jnqB86jXwZ7enA== ;{id = 46594}
Delegation with 13 names, of which 13 can be examined to query further addresses.
It provides 0 IP addresses.
cache delegation was useless (no IP addresses)
Can anyone help figure out how dns_cache_find_delegation ends up returning the root delegation instead of the auth_zone? (In this case example.com is the auth zone, with a proper zone file on disk.)
-Andrew
>
> -Andrew
>
>> Regards,
>>
>> Jan.
>>
>> On 7/12/20, 12:00 PM, "Unbound-users on behalf of Eric Luehrsen via Unbound-users" <unbound-users-bounces at lists.nlnetlabs.nl on behalf of unbound-users at lists.nlnetlabs.nl> wrote:
>>
>> On 7/11/20 11:49 AM, Andrew Forgue via Unbound-users wrote:
>>> I have an unbound server that acts as a recursive resolver for clients and also acts as a target for fully delegated DNS (i.e. unbound is the NS record). For the fully-delegated domain it is a simple stub zone with an upstream of localhost on a different port. Let's call it "blah.example.com".
>>>
>>> Occasionally, unbound (this has happened on versions 1.10.1 and 1.7.3) will start responding to non-recursive queries with the list of root servers instead of a response from the stub-zone. Clients that set the `rd` (recursion desired) flag are fine and continue to resolve records in the stub-zone via the stub server; only queries without `rd` get the wrong answer. All records in seemingly all stub zones show this behavior simultaneously.
>>>
>>> I don't know what triggers it, but a full restart of unbound is the only thing that fixes it. I've tried flushing cache, flushing infra, and everything, nothing seems to matter. I've seen only 2 things that may point to the issue.
>>>
>>> - With verbosity turned up to 10, there's an entry produced in strace (but not in the actual log - maybe a misconfig): "unbound[2213085:5] debug: answer from the cache failed"
>>>
>>> - stracing the "broken" unbound process shows a very tight loop of recvmsg() (of the request) and sendmsg() (with the root servers), with no syscalls in between.
>>>
>>> Again, using dig with +recurse works all the time, even when unbound gets into this state. So it seems like an unbound bug, cache corruption, or something similar?
>>
>> If it is a bug, you may want to try a work around while waiting for a
>> fix. You could try "auth-zone:" instead of "stub-zone:" or as a
>> companion to "stub-zone:" You may need to give the authoritative server
>> permission for a wholesale zone transfer to the Unbound instance. This
>> may help avoid some undiscovered bug in piecemeal zone recursion.
>> - Eric
>>
>