[nsd-users] Timeout for TCP queries to NSD

Anand Buddhdev anandb at ripe.net
Thu May 14 12:43:00 UTC 2020


On 14/05/2020 13:29, Wouter Wijngaards via nsd-users wrote:

Hi Wouter,

> Yes this applies to incoming queries and to outgoing queries.  120
> seconds by default.

Thanks for the clarification. I think the default of 120s should be 
documented in the man page.

I'm still not clear on what the timeout applies to though. Is it to the 
time between individual DNS messages in a TCP connection? Or does it 
apply to any period of inactivity in the connection?

> A much smaller value, of 200 msec, is used when the server is nearly
> full on capacity, for incoming connections that are over the limit.
> Also when the server has updated the existing connections get a smaller
> 100 msec timeout to wait for them to complete their tcp query to NSD.
> 
> That last feature since 4.2.1.  The tcp full shorter timeout is since
> 4.1.11.

Now that you've explained it here, I recall that there was something 
about this in the release notes. However, the value of 200ms isn't 
documented. The release notes have:

"When tcp is more than half full, use short timeout for tcp session." So 
I'm guessing that "short timeout" here is 200ms. Also, it's not clear 
whether the timeout is dynamic. What I mean is: is it applied to all 
sessions (existing and new), or only to new ones? When the number of TCP 
connections drops to less than half, is the timeout reset to 120s? And 
is it reset for all sessions, or just new ones?
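
To make the question concrete, here is how I currently read the behaviour, 
written as a small Python sketch. This is only a model built from the 
numbers quoted in this thread, not NSD's actual code: the function name, 
its parameters, the "more than half of the maximum" threshold, and the 
use of min() are all my assumptions. Whether the value is chosen once 
when a connection is accepted or re-evaluated for existing sessions is 
exactly what I'm asking.

```python
# Values quoted in this thread; they are NOT read from NSD's source code.
BUSY_TIMEOUT_S = 0.2    # "200 msec" when the server is nearly full
RELOAD_TIMEOUT_S = 0.1  # "100 msec" for existing connections after an update


def effective_timeout(configured_s, open_conns, max_conns, old_process_after_reload):
    """Return the timeout in seconds that this model would apply.

    Assumption: "nearly full" means more than half of max_conns are in use,
    following the release note wording. All names here are hypothetical.
    """
    if old_process_after_reload:
        # Old server process lingering after a reload: shortest timeout.
        return min(configured_s, RELOAD_TIMEOUT_S)
    if open_conns * 2 > max_conns:
        # More than half of the TCP slots are in use: short timeout.
        return min(configured_s, BUSY_TIMEOUT_S)
    # Otherwise the configured tcp-timeout (120s by default) applies.
    return configured_s
```

With a configured timeout of 120s and a maximum of 100 connections, this 
model keeps 120s at 50 open connections and switches to 0.2s at 51.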

Dropping from the default 120s to a mere 200ms when the number of TCP 
connections goes up is quite dramatic. And I happen to think that 200ms 
is too low. A client that's getting an AXFR from such an NSD server is 
quite likely to suffer disconnects. In fact, I have been observing 
exactly this behaviour on the servers we run. We have a use case where a 
user is doing AXFR of some largish zones, and when the client is a bit 
slow, NSD drops the connection. This causes the client to retry. This, 
IMHO, is rather wasteful.

The other feature of shortening the timeout to 100ms is also not so 
obvious. The release notes have:

"Fix #14, tcp connections have 1/10 to be active and have to work
every second, and then they get time to complete during a reload,
this is a process that lingers with the old version during a version 
update."

The 1/10 there is not very readable. I think that 100ms would be much 
clearer. And I also don't understand what you mean by "and have to work 
every second". Could you please explain that?

In my opinion, such details should not be buried in the release notes 
document. The release notes are useful when comparing one version to 
another. All these features of how the server dynamically adjusts its 
behaviour should be in the operations manual or at least the nsd.conf 
man page.

Imagine a new user of NSD, who is trying to configure and tune the 
server, and sets "tcp-timeout" to some value, and still observes 
different behaviour when running the server. This leads to confusion. 
And it's not reasonable to expect the user to read the entire set of 
release notes trying to find such undocumented features.

Regards,
Anand Buddhdev
RIPE NCC

