1.7.3 -- strange incident

Havard Eidnes he at uninett.no
Fri Sep 14 11:27:16 UTC 2018


We recently had an incident involving one of our central
recursors, which at the time ran unbound 1.7.3.  What appeared to
happen was that it suddenly started being Really Slow in replying
to queries for non-existent names, i.e. in confirming that they
really did not exist.

"dig" (with default retry & timeout) would time out 2 or 3 times
before I got an NXDOMAIN.  It did not matter whether the zone in
question was DNSSEC-signed or not.
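
For anyone who wants to check for the same symptom, here is a
minimal sketch (Python, standard library only) that times a lookup
of a name over UDP and reports the RCODE.  It is not what was used
during the incident; the function names are made up for
illustration, and the 5-second timeout with 3 tries is only meant
to resemble dig's defaults.

```python
import socket
import struct
import time


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the RCODE (low 4 bits of the flags word); 3 means NXDOMAIN."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def time_lookup(server, name, timeout=5.0, tries=3):
    """Return (rcode, seconds, attempts); rcode is None if every try timed out.

    The timeout/tries defaults are an assumption chosen to resemble dig.
    """
    query = build_query(name)
    start = time.monotonic()
    for attempt in range(1, tries + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            try:
                response, _ = sock.recvfrom(4096)
            except socket.timeout:
                continue  # this try timed out, send the query again
        if len(response) < 4 or response[:2] != query[:2]:
            continue  # too short or wrong query ID, treat as a failed try
        return rcode_of(response), time.monotonic() - start, attempt
    return None, time.monotonic() - start, tries
```

Calling time_lookup() with the resolver's address and a name known
not to exist should give RCODE 3 on the first attempt in well under
a second on a healthy resolver; several attempts before the answer
arrives would match the behaviour described above.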

We're graphing various performance data from "unbound-control
stats", and while normal daily peak query load is around 1,000 to
1,200 qps, the weekdays this incident lasted saw a load of only
around 700 qps, so I can't entirely dismiss that users were

I'll have to admit that I didn't turn on more extensive logging
to get some more information about this incident.  I'll also
admit that I took the opportunity to upgrade to unbound 1.8.0,
which it's running at the moment.  This leads to a separate
message about logging...

Prior to this incident, unbound had run continuously for 30-40
days, and the "cache_message" value had (according to the graph
we plot) reached its ceiling about 14 days earlier, so the cache
filling up is unlikely to be the trigger.
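
In case it helps others compare against their own resolvers, the
following is a rough sketch of pulling a few counters out of the
"key=value" text printed by "unbound-control stats_noreset".  The
function names are hypothetical, the selection of counters is just
a guess at what might be relevant here, and the per-RCODE counter
is only present when extended statistics are enabled.

```python
def parse_stats(text):
    """Parse 'key=value' lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            pass  # non-numeric value, ignore it
    return stats


def summarize(stats):
    """Pick out a few counters that might hint at slow negative answers."""
    queries = stats.get("total.num.queries", 0.0)
    nxdomain = stats.get("num.answer.rcode.NXDOMAIN", 0.0)
    return {
        "nxdomain_share": nxdomain / queries if queries else 0.0,
        "avg_recursion_s": stats.get("total.recursion.time.avg", 0.0),
        "requestlist_avg": stats.get("total.requestlist.avg", 0.0),
        "msg_cache_count": stats.get("msg.cache.count", 0.0),
    }
```

The text to feed into parse_stats() could be captured with, for
example, subprocess.run(["unbound-control", "stats_noreset"],
capture_output=True, text=True).stdout on the resolver host.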

I know there isn't much to go on here, but does this match any
other incidents?

Best regards,

- Håvard

