Unbound strange stub_zone behavior?

Andrew Forgue andrew at forgue.io
Thu Aug 20 16:43:36 UTC 2020

> On Jul 14, 2020, at 11:48 AM, Andrew Forgue via Unbound-users <unbound-users at lists.nlnetlabs.nl> wrote:
>> On Jul 13, 2020, at 11:55 AM, Jan Komissar (jkomissa) <jkomissa at cisco.com> wrote:
>> Hi Andrew,
>> I believe that stub-zones will not work correctly for +norecurse (RD (recursion desired) flag unset) queries. Also, if your blah.example.com has delegations to subzones (even on the same server) and you use a non-standard port, you would need a stub-zone for each sub-zone.
> After restarting unbound, non-recursive queries work fine for several days, until they don't (not sure why).  My understanding is that stub_zone presents as if it's local data, and the behavior you're describing would be more like the behavior of a forward zone.
>> I would follow Eric's advice to use an auth-zone, either as primary or secondary server (depending on your authoritative requirements).
> Yeah, thanks Eric & Jan. I'll take a look at that, but I'm not sure the "proxied" DNS server can do notifies. It does seem to be a good lead, though.

Just to bump this again -- here's the progress so far.  We've been able to reproduce this with auth_zones too.

With my limited knowledge of unbound code and gdb, it *appears* that in answer_norec_from_cache:

daemon/worker.c:492 (or so):

answer_norec_from_cache(...) {
	/* ... */
	dp = dns_cache_find_delegation(&worker->env, qinfo->qname,
	    qinfo->qname_len, qinfo->qtype, qinfo->qclass,
	    worker->scratchpad, &msg, timenow);
	if(!dp) { /* no delegation, need to reprime */
		return 0;
	}
	/* ... */
}

In the happy case, `dp` is NULL, meaning there's no delegation (so it hits `return 0`), and the correct answer is returned.

In the failure case, `dp` is a delegation point for what looks like the root zone:

(gdb) p *dp
$27 = {name = 0x7f66cc516d58 "", namelen = 1, namelabs = 1, nslist = 0x0, target_list = 0x0, usable_list = 0x0, result_list = 0x0, bogus = 0, has_parent_side_NS = 0 '\000', dp_type_mlc = 0 '\000', ssl_upstream = 0 '\000', auth_dp = 0 '\000', no_cache = 0}

On the broken node:

dns1 :: ~ » sudo unbound-control lookup blah.example.com
The following name servers are used for lookup of blah.example.com
;rrset 17491 13 1 5 2
.       17491   IN      NS      c.root-servers.net.
.       17491   IN      NS      d.root-servers.net.
.       17491   IN      NS      j.root-servers.net.
.       17491   IN      NS      e.root-servers.net.
.       17491   IN      NS      l.root-servers.net.
.       17491   IN      NS      h.root-servers.net.
.       17491   IN      NS      f.root-servers.net.
.       17491   IN      NS      k.root-servers.net.
.       17491   IN      NS      m.root-servers.net.
.       17491   IN      NS      i.root-servers.net.
.       17491   IN      NS      a.root-servers.net.
.       17491   IN      NS      g.root-servers.net.
.       17491   IN      NS      b.root-servers.net.
.       17491   IN      RRSIG   NS 8 0 518400 20200827170000 20200814160000 46594 . ZxJeYw7vVyjxZg8y7mtt5N3YtejDrho11npxtnjt7MMZm/MlbSErowznceyvXYhTkgF4dJOFGcrUkwFekcN86Zw0tN+cHYYb4lpV2o/pYtXIzo2w2OtA0WJURMB1pWcclhma9y648OiGUsEwImRXpCQS7Mgk+XKU05KFCg5yrFW+UC4faaQ1ZiisVnK9GF8CwsHCC82xT7HU/pAMFgF2vEovsomysMuDhBKE1QTP9MN/DqD6bitdqGmhQSC9GxxcRrNCCU8fSnW4UVIiOJ95kaEMDk0kdpTGowBcKx2WCbXN8oKGSYRpJjE+y77mc2mv3cBUBwK9jnqB86jXwZ7enA== ;{id = 46594}
Delegation with 13 names, of which 13 can be examined to query further addresses.
It provides 0 IP addresses.
cache delegation was useless (no IP addresses)

Can anyone help figure out why dns_cache_find_delegation returns the root delegation instead of the auth_zone (in this case example.com is the auth zone, with a proper zone file on disk)?


> -Andrew
>> Regards,
>> Jan.
>> On 7/12/20, 12:00 PM, "Unbound-users on behalf of Eric Luehrsen via Unbound-users" <unbound-users-bounces at lists.nlnetlabs.nl on behalf of unbound-users at lists.nlnetlabs.nl> wrote:
>>   On 7/11/20 11:49 AM, Andrew Forgue via Unbound-users wrote:
>>> I have an unbound server that acts as a recursive resolver for clients and also acts as a target for fully delegated DNS (i.e. unbound is what the NS record points to). For the fully-delegated domain it is a simple stub zone with an upstream of localhost on a different port.  Let's call it "blah.example.com".
>>> Occasionally, unbound (this has happened on versions 1.10.1 and 1.7.3) will start responding to non-recursive queries with the list of root servers instead of a response from the stub-zone.  Clients that set the `rd` flag seem to be fine and can still resolve records in the stub-zone; only recursion-desired clients receive correct records from unbound (using the stub server).  All records in seemingly all stub zones show this behavior simultaneously.
>>> I don't know what triggers it, but a full restart of unbound is the only thing that fixes it.  I've tried flushing the cache, flushing infra, and everything else; nothing seems to matter. I've seen only two things that may point to the issue.
>>> - With verbosity turned up to 10, there's an entry produced in strace (but not in the actual log - maybe a misconfig): "unbound[2213085:5] debug: answer from the cache failed"
>>> - stracing the "broken" unbound process shows a very tight recvmsg() (of the request) and sendmsg() (with the root servers) sequence, with no syscalls in between.
>>> Again, using dig with +recurse works all the time, even when unbound gets into this state.  So this seems like an unbound bug, cache corruption, or something similar?
>>   If it is a bug, you may want to try a workaround while waiting for a
>>   fix. You could try "auth-zone:" instead of "stub-zone:", or as a
>>   companion to "stub-zone:". You may need to give the authoritative server
>>   permission for a wholesale zone transfer to the Unbound instance. This
>>   may help avoid some undiscovered bug in piecemeal zone recursion.
>>   - Eric

