Jostle logic seems to randomly stop working
W.C.A. Wijngaards
wouter at nlnetlabs.nl
Thu May 31 13:36:06 UTC 2018
Hi Dmitri,
There is a fix slated for the next release, included as a patch below.
I think this will solve those non-jostle "list too full" errors. It
decrements the num_reply_states counter when a state stops being a
reply state, so incoming queries are no longer dropped because that
counter has grown too large.
Best regards, Wouter
diff -Naur tags/release-1.7.1/services/mesh.c trunk/services/mesh.c
--- tags/release-1.7.1/services/mesh.c 2018-05-07 10:17:33.605761180 +0200
+++ trunk/services/mesh.c 2018-05-07 13:28:16.907313134 +0200
@@ -1173,6 +1173,10 @@
 	while((c = mstate->cb_list) != NULL) {
 		/* take this cb off the list; so that the list can be
 		 * changed, eg. by adds from the callback routine */
+		if(!mstate->reply_list && mstate->cb_list && !c->next) {
+			/* was a reply state, not anymore */
+			mstate->s.env->mesh->num_reply_states--;
+		}
 		mstate->cb_list = c->next;
 		if(!mstate->reply_list && !mstate->cb_list &&
 			mstate->super_set.count == 0)
On 25/05/18 06:03, Dmitri Kourennyi wrote:
> My original reply was rejected by the server (I think) as I included too
> much log data. Apologies this gets posted twice.
>
>> Can you show the work that it does for looking up one of the root
>> servers? Not getting an address must be what causes it to have no
>> content. But you say it does work eventually, once the long list
>> appears; that means the AXFR has completed. Meanwhile you should have
>> normal service, because the zone is loaded (the configured file has
>> content, right?). When a normal query arrives, it should just be
>> answered from the auth-zone.
>
> Detailed log of lookups included below. I would like to add that the
> issue may be associated specifically with the lookup of the auth-zone
> master domain names. At work, I use unbound as a local caching server
> on my laptop, with a configuration similar to my home router's, but
> with IP addresses instead of hostnames, and I have not had any
> problems with it (this might be a useful work-around for anyone who
> still wants to use the auth-zone functionality):
>
> auth-zone:
> name: "."
> master: 193.0.14.129 # k.root-servers.net
> master: 192.112.36.4 # g.root-servers.net
> master: 2001:500:2f::f # f.root-servers.net
> master: 2001:500:84::b # b.root-servers.net
> master: 192.0.47.132 # xfr.cjr.dns.icann.org
> master: 2001:7fd::1 # k.root-servers.net
> master: 192.0.32.132 # xfr.lax.dns.icann.org
> master: 192.33.4.12 # c.root-servers.net
> master: 2620:0:2830:202::132 # xfr.cjr.dns.icann.org
> master: 2620:0:2d0:202::132 # xfr.lax.dns.icann.org
> master: 192.228.79.201 # b.root-servers.net
> master: 192.5.5.241 # f.root-servers.net
> fallback-enabled: yes
> for-downstream: no
> for-upstream: yes
> zonefile: root.zone
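For contrast, the hostname form of the same zone, which appears to trigger the failing master lookups, would look roughly like this (a sketch built from the hostnames in the comments above, not Dmitri's actual router config):

```text
auth-zone:
	name: "."
	master: k.root-servers.net
	master: b.root-servers.net
	master: f.root-servers.net
	master: c.root-servers.net
	master: xfr.cjr.dns.icann.org
	master: xfr.lax.dns.icann.org
	fallback-enabled: yes
	for-downstream: no
	for-upstream: yes
	zonefile: root.zone
```

With this form, unbound must resolve each master name itself before it can probe or transfer the zone, and those internal lookups are the ones visible in the logs below.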
>
> Looking at the logs again, I noticed there was one incident of a
> dropped query ~15min before all queries start to fail. However, that
> query ends up completing, I think, and at 14:43:40 a different query
> goes through without issue. It's at 15:00 that queries begin to fail.
> I don't know if this error before the root server lookup is part of
> the cause or a red herring.
>
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] info: 0RDd mod1 rep
> www.google.com. AAAA IN
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] info: 1RDd mod1 rep
> connectivitycheck.gstatic.com. AAAA IN
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: cache memory
> msg=382505 rrset=462002 infra=7600 val=135128
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: answer from
> the cache failed
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: udp request
> from ip4 192.168.0.154 port 64667 (len 16)
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] warning: No jostle
> attempt. m=(nil) # <= This is logging I added to find out which if
> condition is failing
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: Too many
> queries. dropping incoming query.
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: cache memory
> msg=382505 rrset=462002 infra=7600 val=135128
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: comm point
> listen_for_rw 15 0
> May 18 14:43:35 homebrew unbound[30620]: [30620:0] debug: comm point
> listen_for_rw 16 0
>
> Log of root server lookups below. Note that this occurs when I am not
> home, with a single tablet as the only active device, so it's not that
> there's a large amount of traffic or anything like that.
>
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: auth zone .:
> master lookup for task_probe b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run: start
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> validator[module 0] operate: extstate:module_state_initial
> event:module_event_new
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: validator
> operate: query b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: validator:
> pass to next module
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> validator module exit state is module_wait_module
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> iterator[module 1] operate: extstate:module_state_initial
> event:module_event_pass
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> process_request: new external request event
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: iter_handle
> processing q with state INIT REQUEST STATE
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: resolving
> b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: request has
> dependency depth of 0
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: msg from cache
> lookup ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
> ;; flags: qr rd ra ; QUERY: 1,
> ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
> ;; QUESTION SECTION:
> b.root-servers.net. IN
> A
>
> ;; ANSWER SECTION:
> b.root-servers.net.
> 1841075 IN A 199.9.14.201
>
> ;; AUTHORITY SECTION:
>
> ;; ADDITIONAL SECTION:
> ;; MSG SIZE rcvd: 52
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: msg ttl is
> 41390, prefetch ttl 32750
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: returning
> answer from cache.
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: iter_handle
> processing q with state FINISHED RESPONSE STATE
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: finishing
> processing for b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> iterator module exit state is module_finished
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug:
> validator[module 0] operate: extstate:module_wait_module
> event:module_event_moddone
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: validator
> operate: query b.root-servers.net. A IN
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: validator:
> nextmodule returned
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: response has
> already been validated: sec_status_insecure
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: mesh_run:
> validator module exit state is module_finished
> May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: auth host
> b.root-servers.net lookup 199.9.14.201