[nsd-users] Two TCP segments to answer a query

W.C.A. Wijngaards wouter at nlnetlabs.nl
Fri Mar 2 08:39:39 UTC 2012

Hi Anand,

Michael is precisely right.

For portability you cannot assume writev or TCP_CORK or other exotic
options are available.  Thus the current method must work.
No optimization for TCP traffic has been implemented in NSD so far.

Other optimizations also exist, possibly some malloc() tricks that are
also backwards compatible with older operating systems.

Unbound does have support for writev().  This does result in some code
duplication for the tcp write implementation.
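
To illustrate the idea, here is a minimal sketch of a restartable
writev() that sends the 2-byte length and the message together.  It is
not taken from Unbound or NSD; the function name send_dns_tcp and the
"done" counter are made up for this example, and it assumes a
non-blocking socket on a system that has writev().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper, not NSD or Unbound code.
 * Sends the 2-byte length prefix and the DNS message with one writev().
 * *done counts the bytes of (prefix + message) already written, so the
 * call can simply be repeated after a short write or EAGAIN.
 * Returns 1 when all is sent, 0 if the caller must retry later, -1 on error. */
static int send_dns_tcp(int fd, const uint8_t *msg, uint16_t len, size_t *done)
{
    uint8_t prefix[2] = { (uint8_t)(len >> 8), (uint8_t)(len & 0xff) };
    size_t total = 2 + (size_t)len;

    assert(*done <= total);
    while (*done < total) {
        struct iovec iov[2];
        int n = 0;

        if (*done < 2) {
            /* Part of the prefix is still unsent: prefix rest + whole message. */
            iov[n].iov_base = prefix + *done;
            iov[n].iov_len = 2 - *done;
            n++;
            iov[n].iov_base = (void *)msg;
            iov[n].iov_len = len;
            n++;
        } else {
            /* Prefix is out: resume inside the message. */
            iov[n].iov_base = (void *)(msg + (*done - 2));
            iov[n].iov_len = total - *done;
            n++;
        }

        ssize_t w = writev(fd, iov, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;   /* send buffer full, retry when writable */
            return -1;
        }
        *done += (size_t)w;
    }
    return 1;
}
```

The caller keeps "done" per connection, starts it at 0 for each reply,
and calls the function again whenever the socket becomes writable.
Even then the TCP stack may still split the data into several segments.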

Best regards,

On 03/02/2012 07:12 AM, Michael Tokarev wrote:
> On 02.03.2012 02:13, Anand Buddhdev wrote:
>> This question is aimed more at the NSD developers, but of course
>> if anyone knows the answer, feel free to chime in.
>> While writing some code to process DNS queries and responses over
>> TCP, one of my colleagues noticed something strange about NSD's
>> TCP responses. Here's what we have observed:
>> client: syn
>> server: syn + ack
>> client: ack
>> client: push + ack + query
>> server: ack
>> server: ack + 2 bytes indicating size of following dns message
>> client: ack
>> server: push + ack + response
>> I'm omitting the closing sequence of FINs and ACKs here.
>> Comparing this to a BIND server, we see:
>> client: syn
>> server: syn + ack
>> client: ack
>> client: push + ack + query
>> server: push + ack + 2 bytes + response
>> Notice how NSD uses an extra TCP segment to send just the 2
>> bytes indicating the length of the response packet, whereas BIND
>> does it all in the same TCP segment. BIND's behaviour seems
>> logical to me, whereas NSD's seems... strange.
>> Is there any reason NSD does it this way? TCP performance isn't
>> really an issue for us, so I don't see any immediate need to fix
>> this, if indeed a fix is even needed. We'd just like to
>> understand this difference in behaviour.
> There is no strong reason why NSD _should_ do this the way BIND 
> does it.
> TCP is a STREAM of bytes, the "packetizing" of TCP is not
> specified in any standard at all.  An application can use various
> system-specific methods to express its preference (like TCP_CORK
> on Linux), but even with these specified, the TCP stack is allowed
> to divide the stream into packets more or less arbitrarily.
> So the client should be prepared for even the worst case scenario,
> i.e., should be able to read the whole thing a byte at a time.
> The way BIND does it is merely an optimization, in an attempt to 
> minimize network roundtrips, and it is in no way mandatory again.
> As for optimization itself.  NSD should be prepared for the write 
> to fail with EAGAIN at any time, which means the kernel send
> buffer is full, so NSD will have to repeat the write from the
> position where it stopped.  It is easy if we're writing just one
> buffer of data.  But we've two: the size and the data itself.
> There are at least 2 ways to deal with it.
> First, there is the already mentioned TCP_CORK, which can be used
> on Linux (if it isn't already).  It is relatively easy: set it to on
> before attempting to send the size, and to off when done sending
> this reply.
> Another option is to use writev() and make it restartable from an
> arbitrary position.  For example, the way it is done in qemu:
> http://git.qemu.org/?p=qemu.git;a=blob;f=cutils.c;hb=HEAD
> (there, see the do_sendv_recvv() function, and note it is writev()
> but has an extra argument, "offset").
> But these are possible implementation details of an
> _optimization_, not of a bugfix.
> Thanks,
> /mjt
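
On the client side, the point made above about reading the stream
however it happens to be segmented can be sketched as follows.  This is
an illustration only, assuming a blocking socket; the names read_full
and read_dns_tcp are made up for this example and do not come from NSD
or BIND.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes from a blocking fd, no matter
 * how the stream was cut into segments.  Returns 0 on success, -1 on
 * EOF or error. */
static int read_full(int fd, uint8_t *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r == 0)
            return -1;          /* peer closed before all data arrived */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)r;       /* may be as little as one byte */
    }
    return 0;
}

/* Hypothetical helper: read one length-prefixed DNS message.
 * Returns the message length, or -1 on error or if it does not fit in cap. */
static long read_dns_tcp(int fd, uint8_t *buf, size_t cap)
{
    uint8_t prefix[2];
    size_t len;

    assert(buf != NULL);
    if (read_full(fd, prefix, 2) != 0)
        return -1;
    len = ((size_t)prefix[0] << 8) | prefix[1];   /* network byte order */
    if (len > cap)
        return -1;
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```

A client written this way handles the length arriving in its own
segment, together with the response, or split in any other way.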
