[nsd-users] Two TCP segments to answer a query

Fri Mar 2 08:39:39 UTC 2012


Hi Anand,

Michael is precisely right.

For portability you cannot assume that writev(), TCP_CORK or other
exotic options are available.  Thus the current method must work.
No optimization for TCP traffic has been implemented so far.

Other optimizations also exist, possibly some malloc() tricks that are
also backwards compatible with older operating systems.
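
One possible reading of the malloc() idea, as a minimal sketch and not
NSD's actual code: allocate room for the 2-byte length in front of the
message, so that a single plain write() can send both.  The helper name
frame_dns_tcp() is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of NSD): returns a malloc()ed buffer
 * holding the 2-byte big-endian length followed by the DNS message,
 * or NULL if allocation fails.  The caller frees the buffer. */
uint8_t *frame_dns_tcp(const uint8_t *msg, uint16_t len, size_t *out_len)
{
    uint8_t *buf = malloc((size_t)len + 2);
    if (buf == NULL)
        return NULL;
    buf[0] = (uint8_t)(len >> 8);    /* length prefix, network order */
    buf[1] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(buf + 2, msg, len);   /* message follows the prefix */
    *out_len = (size_t)len + 2;
    return buf;
}
```

The framed buffer can then go through the existing single-buffer write
path, including its restart-after-EAGAIN logic, at the cost of one
extra copy of the message.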

Unbound does have support for writev().  This does result in some code
duplication for the tcp write implementation.

Best regards,
   Wouter

On 03/02/2012 07:12 AM, Michael Tokarev wrote:
> On 02.03.2012 02:13, Anand Buddhdev wrote:
>> This question is aimed more at the NSD developers, but of course
>> if anyone knows the answer, feel free to chime in.
>> 
>> While writing some code to process DNS queries and responses over
>> TCP, one of my colleagues noticed something strange about NSD's
>> TCP responses. Here's what we have observed:
>> 
>> client: syn
>> server: syn + ack
>> client: ack
>> client: push + ack + query
>> server: ack
>> server: ack + 2 bytes indicating size of following dns message
>> client: ack
>> server: push + ack + response
>> 
>> I'm omitting the closing sequence of FINs and ACKs here.
>> 
>> Comparing this to a BIND server, we see:
>> 
>> client: syn
>> server: syn + ack
>> client: ack
>> client: push + ack + query
>> server: push + ack + 2 bytes + response
>> 
>> Notice how NSD uses an extra TCP segment to send just the 2
>> bytes indicating the length of the response packet, whereas BIND
>> does it all in the same TCP segment. BIND's behaviour seems
>> logical to me, whereas NSD's seems... strange.
>> 
>> Is there any reason NSD does it this way? TCP performance isn't
>> really an issue for us, so I don't see any immediate need to fix
>> this, if indeed a fix is even needed. We'd just like to
>> understand this difference in behaviour.
> 
> There is no strong reason why NSD _should_ do this the way BIND 
> does it.
> 
> TCP is a STREAM of bytes, the "packetizing" of TCP is not
> specified in any standard at all.  An application can use various
> system-specific methods to express its preference (like TCP_CORK
> in Linux), but even with these specified, the TCP stack is allowed
> to divide the stream into packets more or less arbitrarily.
> 
> So the client should be prepared for even the worst-case scenario,
> i.e., it should be able to read the whole thing one byte at a time.
> 
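A minimal sketch of a client-side reader that copes with any
segmentation, assuming a blocking descriptor.  The names read_full()
and read_dns_tcp() are made up for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes, tolerating short reads.  Returns 0 on success,
 * -1 on error or premature EOF.  Assumes a blocking descriptor. */
int read_full(int fd, uint8_t *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r > 0)
            got += (size_t)r;        /* partial data, keep going */
        else if (r == 0)
            return -1;               /* peer closed too early */
        else if (errno != EINTR)
            return -1;               /* real error */
    }
    return 0;
}

/* Read one DNS-over-TCP message: a 2-byte big-endian length, then the
 * message.  Returns the message length, or -1 on failure or if the
 * message does not fit in cap bytes. */
int read_dns_tcp(int fd, uint8_t *buf, size_t cap)
{
    uint8_t hdr[2];
    if (read_full(fd, hdr, 2) != 0)
        return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];
    if (len > cap)
        return -1;
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (int)len;
}
```

With a reader like this it makes no difference whether the length
arrives in its own segment or together with the response.
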
> The way BIND does it is merely an optimization, an attempt to
> minimize network roundtrips, and, again, it is in no way mandatory.
> 
> As for the optimization itself.  NSD should be prepared for the
> write to fail with EAGAIN at any time, which means the kernel send
> buffer is full, so NSD will have to repeat the write from the
> position where it stopped.  It is easy if we're writing just one
> buffer of data.  But we have two: the size and the data itself.
> 
> There are at least 2 ways to deal with it.
> 
> First, there's the already mentioned TCP_CORK, which can be used on
> Linux (if it isn't already).  It is relatively easy: set it to on
> before attempting to send the size, and to off when done sending
> this reply.
> 
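A minimal sketch of the corking step, assuming Linux.  The wrapper
name set_cork() is hypothetical; on systems without TCP_CORK it simply
reports failure and the caller carries on uncorked.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn TCP_CORK on (1) or off (0) for a TCP socket.  Returns 0 on
 * success, -1 on error or where TCP_CORK is not available. */
int set_cork(int fd, int on)
{
#ifdef TCP_CORK
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
#else
    (void)fd;
    (void)on;
    return -1;                       /* not supported on this system */
#endif
}
```

The intended use is set_cork(fd, 1), then the two existing writes (the
size, then the message), then set_cork(fd, 0), which releases whatever
the kernel has held back.
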
> Another option is to use writev() and make it restartable from an
> arbitrary position.  For example, the way it is done in qemu:
> 
> http://git.qemu.org/?p=qemu.git;a=blob;f=cutils.c;hb=HEAD
> 
> (there, see the do_sendv_recvv() function, and note it is writev()
> but has an extra argument, "offset").
> 
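A minimal sketch of a restartable writev() over the two buffers.  This
is not the qemu function and not NSD code; struct dns_tcp_out and both
function names are hypothetical.  The "sent" field plays the role of
the offset.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical per-connection output state. */
struct dns_tcp_out {
    uint8_t lenbuf[2];       /* 2-byte big-endian length prefix */
    const uint8_t *msg;      /* DNS message, owned by the caller */
    size_t msglen;
    size_t sent;             /* bytes of prefix + message written so far */
};

void dns_tcp_out_init(struct dns_tcp_out *o, const uint8_t *msg, uint16_t len)
{
    o->lenbuf[0] = (uint8_t)(len >> 8);
    o->lenbuf[1] = (uint8_t)(len & 0xff);
    o->msg = msg;
    o->msglen = len;
    o->sent = 0;
}

/* Returns 1 when everything is written, 0 if the caller should retry
 * once the socket is writable again (EAGAIN), -1 on error. */
int dns_tcp_out_write(int fd, struct dns_tcp_out *o)
{
    size_t total = 2 + o->msglen;
    while (o->sent < total) {
        struct iovec iov[2];
        int n = 0;
        if (o->sent < 2) {
            /* still inside the length prefix: send its rest + message */
            iov[n].iov_base = o->lenbuf + o->sent;
            iov[n].iov_len = 2 - o->sent;
            n++;
            iov[n].iov_base = (void *)o->msg;
            iov[n].iov_len = o->msglen;
            n++;
        } else {
            /* prefix done: resume inside the message */
            size_t off = o->sent - 2;
            iov[n].iov_base = (void *)(o->msg + off);
            iov[n].iov_len = o->msglen - off;
            n++;
        }
        ssize_t w = writev(fd, iov, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;
            return -1;
        }
        o->sent += (size_t)w;
    }
    return 1;
}
```

The caller keeps one such struct per pending reply and calls
dns_tcp_out_write() again whenever it returned 0 and the socket has
become writable.
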
> But these are possible implementation details of an
> _optimization_, not of a bugfix.
> 
> Thanks,
> 
> /mjt
