[Unbound-users] Unbound 1.4.1 cache problems
Jakub Heichman
kuba.heichman at gmail.com
Mon Jan 11 09:58:18 UTC 2010
Thanks Wouter, the workaround with ip6 is working.
Best regards,
Jakub
2010/1/9 W.C.A. Wijngaards <wouter at nlnetlabs.nl>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Jakub,
>
> I cannot know for sure since the log seems to be shortened by syslog,
> but I think you may have the same problem as Haw Loeung reported. I
> have since corresponded with him and I think have resolved the issue. A
> patch is in svn for unbound.
>
> The issue seems to be caused by the parent-child disagreement fix in
> 1.4.1, worsened by no IPv6 available (either not in the kernel or
> do-ip6:no statement) - just turning ipv6 back on (also if you have no
> ipv6 connectivity) may stop this bug.
>
> Sorry for the inconvenience,
> Wouter
>
> On 01/08/2010 10:36 PM, Jakub Heichman wrote:
>> Hello there,
>>
>> I have upgraded from Unbound 1.3.0 to 1.4.1 recently on a couple of
>> servers. The servers are doing about 3-7k requests per second.
>> I have noticed that after some time of operation, servers
>> intermittently stop responding to queries to certain domains from
>> cache (return SERVFAIL). It particularly doesn't like webex.com,
>> sometimes vmware.com, dell.com.
>>
>> # unbound-host webex.com
>> webex.com has address 64.68.120.99
>> webex.com mail is handled by 50 mx3.webex.com.
>> webex.com mail is handled by 10 mx1.webex2.iphmx.com.
>> webex.com mail is handled by 10 mx2.webex2.iphmx.com.
>> webex.com mail is handled by 50 mx2.webex.com.
>>
>> But the A record doesnt seem to get cached:
>>
>> # dig webex.com @localhost
>>
>> ; <<>> DiG 9.3.3rc2 <<>> webex.com @localhost
>> ; (1 server found)
>> ;; global options: printcmd
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 49004
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>
>> ;; QUESTION SECTION:
>> ;webex.com. IN A
>>
>> ;; Query time: 0 msec
>> ;; SERVER: 127.0.0.1#53(127.0.0.1)
>> ;; WHEN: Fri Jan 8 19:16:01 2010
>> ;; MSG SIZE rcvd: 27
>>
>> # unbound-control dump_cache | grep ^webex.com
>> webex.com. 4589 IN NS ns01.webex.com.
>> webex.com. 4589 IN NS ns05.webex.com.
>> webex.com. 4589 IN NS ns07.webex.com.
>> webex.com. 4589 IN NS ns03.webex.com.
>> webex.com. 4589 IN NS ns08.webex.com.
>> webex.com. 4589 IN NS ns06.webex.com.
>>
>> It doesnt respond to NS queries either (No A records for
>> nsxx.webex.com in cache)
>>
>> # dig ns webex.com @localhost
>>
>> ; <<>> DiG 9.3.3rc2 <<>> ns webex.com @localhost
>> ; (1 server found)
>> ;; global options: printcmd
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 54946
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>
>> ;; QUESTION SECTION:
>> ;webex.com. IN NS
>>
>> ;; Query time: 0 msec
>> ;; SERVER: 127.0.0.1#53(127.0.0.1)
>> ;; WHEN: Fri Jan 8 20:53:18 2010
>> ;; MSG SIZE rcvd: 27
>>
>>
>>
>> After flushing webex.com I can query the cache again:
>>
>> # unbound-control flush webex.com
>> ok
>> # dig +short webex.com @localhost
>> 64.68.120.99
>>
>> Log with increased verbosity:
>>
>> an 8 21:05:32 cns01j unbound: [25052:3] info: validator operate:
>> query <webex.com. A IN>
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info: resolving <webex.com. A IN>
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info:
>> DelegationPoint<webex.com.>: 6 names (6 missing), 6 addrs (0 result, 6
>> avail)
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns08.webex.com. A
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns05.webex.com. A
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns01.webex.com. A
>> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns06.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns07.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns03.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: resolving (init part 2):
>> <webex.com. A IN>
>> Jan 8 21:05:32 cns unbound: [25052:3] info: resolving (init part 3):
>> <webex.com. A IN>
>> Jan 8 21:05:32 cns unbound: [25052:3] info: processQueryTargets:
>> <webex.com. A IN>
>> Jan 8 21:05:32 cns unbound: [25052:3] info:
>> DelegationPoint<webex.com.>: 6 names (6 missing), 6 addrs (0 result, 6
>> avail)
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns08.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns05.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns01.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns06.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns07.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: ns03.webex.com. A
>> Jan 8 21:05:32 cns unbound: [25052:3] info: validator operate: query
>> <webex.com. A IN>
>> Jan 8 21:05:32 cns unbound: [25052:3] info: mesh_run: end 8 recursion
>> states (8 with reply, 0 detached), 28 waiting replies, 84 recursion
>> replies sent, 0 replies dropped, 0 states jostled out
>> Jan 8 21:05:32 cns unbound: [25052:3] info: average recursion
>> processing time 4.172319 sec
>> Jan 8 21:05:32 cns unbound: [25052:3] info: histogram of recursion
>> processing times
>> Jan 8 21:05:32 cns unbound: [25052:3] info: [25%]=0.00107789
>> median[50%]=0.00744727 [75%]=0.0524288
>> Jan 8 21:05:32 cns unbound: [25052:3] info: lower(secs) upper(secs) recursions
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.000000 0.000001 9
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.000256 0.000512 1
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.001024 0.002048 1
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.002048 0.004096 19
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.004096 0.008192 2
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.008192 0.016384 11
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.016384 0.032768 3
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.032768 0.065536 5
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.065536 0.131072 15
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.131072 0.262144 10
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.262144 0.524288 5
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.524288 1.000000 2
>> Jan 8 21:05:32 cns unbound: [25052:3] info: 256.000000 512.000000 1
>>
>>
>>
>> I have downgraded one of the servers back to 1.3.0 (libev 3.53), where
>> I don't see this problem.
>>
>> Unbound 1.4.1 compiled with libev-3.9, without threads - 4 processes running
>>
>> num-threads: 4
>> outgoing-range: 10000
>> msg-cache-size: 500m
>> msg-cache-slabs: 1
>> num-queries-per-thread: 10000
>> jostle-timeout: 4000
>> so-rcvbuf: 8m
>> rrset-cache-size: 1024m
>> rrset-cache-slabs: 1
>> cache-max-ttl: 86400
>>
>> Any help would be greatly appreciated :-)
>>
>> Thanks,
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAktIS0QACgkQkDLqNwOhpPjQ2ACfeCESH6t5ezR94AnasM6DP16f
> EsIAoJbV90YIEyovcdz2VWTRmEk4/tMY
> =Zfvf
> -----END PGP SIGNATURE-----
>
--
Kuba Heichman
More information about the Unbound-users
mailing list