serve-expired: "yes" and cache-min-ttl: 30 catastrophic: why?
nicku at nicku.org
Sun Nov 11 07:27:40 UTC 2018
I would never expect that a combination of these two apparently
innocuous configuration values could cause a massive outage.
This appears to be a very serious bug in Unbound. Does anyone think
this behaviour (described below) is in any way expected?
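For reference, here is a minimal sketch of the combination under
discussion as it would sit in unbound.conf. Only the two option lines
come from the quoted messages below. The surrounding layout is assumed,
and the commented-out infra-host-ttl line just records the default of
900 seconds mentioned in the thread; it is not a recommended value.

```
server:
    # The two options that were added before the outage
    # (unbound 1.6.8 on RHEL 7).
    serve-expired: "yes"
    cache-min-ttl: 30

    # Default is 900 seconds. Marc reports reducing this value to
    # compensate; the thread does not say what value he chose.
    # infra-host-ttl: 900
```

The layout can be syntax-checked with unbound-checkconf before a
restart, although a syntax check says nothing about the SERVFAIL
behaviour itself.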
On 30/10/18 16:50 +1100, Nick Urbanik via Unbound-users wrote:
>On 29/10/18 10:14 -0400, Marc Branchaud via Unbound-users wrote:
>>On 2018-10-28 3:20 p.m., Nick Urbanik via Unbound-users wrote:
>>> On 25/10/18 18:10 +1100, Nick Urbanik via Unbound-users wrote:
>>>> I am puzzled by the behaviour of our multi-level DNS system which
>>>> answered many queries for names having shorter TTLs with SERVFAIL.
>>> I mean that SERVFAILs went up to 50% of replies, and current names
>>> with TTLs of around 300 failed to be fetched by the resolver, the last
>>> DNS servers in the chain. What I mean is that adding these two
>>> configuration options (serve-expired: "yes" and cache-min-ttl: 30)
>>> caused an outage. I am trying to understand why.
>>> Any ideas in understanding the mechanism would be very welcome.
>>We use 1.6.8 with both those settings, and observed prolonged SERVFAILs.
>>In our case, the upstream server became inaccessible for a period of
>>time, but when contact resumed the SERVFAILs persisted.
>This behaviour was quite catastrophic, and to me, unexpected.
>Do you have any idea of the mechanism behind this failure?
>Is there a way to deal better with zero TTL names?
>>We reduced the infra-host-ttl value to compensate.
>Did that bring your system to a functioning condition?
>>(Why is infra-host-ttl's default 900 seconds? That seems like a long
>>time to wait to retry the upstream server.)
>>>> By multilevel, I mean clients talk to one server, which forwards to
>>>> another, and for some clients, there is a third level of caching.
>>>> So was it unwise to add:
>>>> serve-expired: "yes"
>>>> cache-min-ttl: 30
>>>> to the server section of these DNS servers running unbound 1.6.8 on
>>>> up-to-date RHEL 7? Please could anyone cast some light on why this
>>>> was so? I will be spending some time examining the cause.
>>>> If you need more information, please let me know.
Nick Urbanik http://nicku.org nicku at nicku.org
GPG: 7FFA CDC7 5A77 0558 DC7A 790A 16DF EC5B BB9D 2C24 ID: BB9D2C24