[nsd-users] Frequent RRL false negatives when using multiple server processes on Linux

Matthijs Mekking matthijs at nlnetlabs.nl
Thu Nov 7 09:26:44 UTC 2013


Hi Ville,

On 11/06/2013 02:26 PM, Ville Mattila wrote:
> Hi,
> 
> Please advise how to use Response Rate Limiting on a server which has
> multiple NSD server processes (nsd.conf server section has
> server-count > 1).
> 
> We have a problem with NSD v3.2.16 repeatedly unblocking and blocking
> again a single source which is flooding positive queries at a ~steady
> 700 qps rate.  rrl-ratelimit setting is the default 200 qps.  The
> unblock-block happens multiple times a minute.  This is causing false
> negatives: NSD bursts out 200 responses on every unblock:
> 
> Nov  6 10:11:18 dnstest1 nsd[6881]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:19 dnstest1 nsd[6876]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:20 dnstest1 nsd[6881]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:21 dnstest1 nsd[6875]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:23 dnstest1 nsd[6880]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:25 dnstest1 nsd[6880]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:27 dnstest1 nsd[6879]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:28 dnstest1 nsd[6877]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:29 dnstest1 nsd[6879]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:29 dnstest1 nsd[6878]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:30 dnstest1 nsd[6880]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:42 dnstest1 nsd[6878]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:11:42 dnstest1 nsd[6881]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:12:30 dnstest1 nsd[6877]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:12:31 dnstest1 nsd[6880]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:12:31 dnstest1 nsd[6882]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:13:30 dnstest1 nsd[6881]: ratelimit unblock demo.funet.fi.
> type positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:13:31 dnstest1 nsd[6876]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> Nov  6 10:14:31 dnstest1 nsd[6878]: ratelimit block demo.funet.fi. type
> positive target 193.166.5.0/24 query 193.166.5.1 NS
> 
> Noting how the PIDs change in the log messages, I'm guessing that the
> operating system (RHEL 6; Linux kernel v2.6.32) process scheduler
> decides every now and then to use a different NSD server process to
> handle the incoming data on the socket / NIC receive queue.  The newly
> chosen process has an empty rrl hash bucket for the flooding
> source/type, and it starts blocking only after sending 200 replies.
> (NB: How often the process changes may depend on the NIC / Linux
> kernel version / cpu scheduler / irq&cpu affinity settings etc. in
> use, and of course cannot be controlled by NSD.  In this example case
> the query flood source is our lab nameserver 193.166.5.1 itself, but
> I'm afraid we can expect our production Linux server to behave
> ~similarly with external flood sources.)
> 
> If my guess is correct I think the options would be:
> 1. Do nothing and use RRL even though it's per-process.  Even if the
> flood gets unblocked multiple times a minute RRL may still make the
> attack ineffective enough.
> 2. Make use of the multiple receive queues / irq affinity of the server
> network interface card so that queries from a specific source IP
> always end up being processed by the same CPU, and configure process
> scheduling to tie a single NSD server process to each of those CPUs.
> (Too complex for us!  And of course this has its drawbacks, too, wrt
> load distribution at least.  And unfortunately our Intel igb NICs can
> only choose the receive queue based on IPv4 srcip,dstip tuples, while
> all IPv6 packets always end up in the same queue.)

Yes, this is a problem. A third option would be to maintain a single,
global RRL table that all server processes share. However, I suspect
that such a solution would have a huge performance impact.
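
To make the idea concrete, here is a minimal sketch of what a table
shared between forked server processes could look like on Linux. It is
not code from NSD: the names shared_rrl_create, shared_rrl_hit and
SHARED_RRL_BUCKETS are hypothetical, the bucket is a plain
one-second counter rather than the smoothed rate that rrl.c keeps, and
it assumes that 32-bit atomics are lock-free and usable across
processes on the target platform.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SHARED_RRL_BUCKETS 1024u   /* hypothetical table size */

/* One bucket per hashed source/type; fields are only touched atomically. */
struct shared_bucket {
    _Atomic uint32_t stamp;    /* second the counter belongs to */
    _Atomic uint32_t counter;  /* queries counted in that second */
};

/* Map the table before fork() so that all children see the same pages.
 * Anonymous mappings start zero-filled, so every bucket begins empty. */
static struct shared_bucket *shared_rrl_create(void)
{
    void *p = mmap(NULL, SHARED_RRL_BUCKETS * sizeof(struct shared_bucket),
                   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count one query; return 1 if it exceeds `limit` in the current second. */
static int shared_rrl_hit(struct shared_bucket *table, uint32_t hash,
                          uint32_t now, uint32_t limit)
{
    assert(table != NULL);
    struct shared_bucket *b = &table[hash % SHARED_RRL_BUCKETS];
    uint32_t seen = atomic_load(&b->stamp);

    if (seen != now && atomic_compare_exchange_strong(&b->stamp, &seen, now)) {
        /* This caller opened the new one-second window. Increments that
         * race with the reset may be lost; acceptable for a sketch. */
        atomic_store(&b->counter, 0);
    }
    return atomic_fetch_add(&b->counter, 1) + 1 > limit;
}
```

Every query then costs a few atomic operations on memory that all
processes write to, which is where I would expect the performance
impact to come from.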

I am afraid that option 1 (do nothing) is currently the best option. You
may want to tweak the threshold a bit: lowering "rrl-ratelimit:"
shortens each burst of false negatives.
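
A rough way to see the effect of the threshold is the toy model below.
It is a sketch under stated assumptions, not the logic in rrl.c: the
whole flood is assumed to land on one process at a time, the active
process rotates every switch_every seconds, and each bucket uses a
simplified rate that halves every second and absorbs half of the last
counter. The names sim_bucket, sim_replies_sent and SIM_MAX_PROCS are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_MAX_PROCS 64u  /* hypothetical upper bound for this sketch */

/* Per-process bucket for the single flooding source. */
struct sim_bucket {
    uint32_t rate;     /* smoothed queries per second */
    uint32_t counter;  /* queries seen in the current second */
};

/* Return how many replies are sent during `seconds` of a steady flood. */
static uint64_t sim_replies_sent(uint32_t limit, uint32_t qps,
                                 uint32_t seconds, uint32_t nprocs,
                                 uint32_t switch_every)
{
    struct sim_bucket b[SIM_MAX_PROCS] = {{0, 0}};
    uint64_t sent = 0;

    assert(nprocs >= 1 && nprocs <= SIM_MAX_PROCS && switch_every >= 1);

    for (uint32_t t = 0; t < seconds; t++) {
        /* Assumption: the whole flood lands on one process at a time. */
        struct sim_bucket *active = &b[(t / switch_every) % nprocs];

        for (uint32_t q = 0; q < qps; q++) {
            if (active->rate < limit && active->counter < limit)
                sent++;            /* not blocked: a reply goes out */
            active->counter++;
        }
        /* End of second: each bucket decays and absorbs its own counter. */
        for (uint32_t p = 0; p < nprocs; p++) {
            b[p].rate = b[p].rate / 2 + b[p].counter / 2;
            b[p].counter = 0;
        }
    }
    return sent;
}
```

In this model, sim_replies_sent(200, 700, 8, 8, 1) lets 1600 replies
through in eight seconds, while sim_replies_sent(100, 700, 8, 8, 1)
lets through 800: every switch to a fresh process leaks at most one
"rrl-ratelimit:" worth of replies, so halving the limit halves the
leak.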

Best regards,
  Matthijs

> 
> FWIW, the unblocking seems to be triggered every time by this, around
> line 425 of rrl.c from nsd-3.2.16:
> -----
>         } else if(now - b->stamp > 0) {
>                 /* older bucket */
>                 int olderblock = used_to_block(b->rate, b->counter, lm);
>                 rrl_attenuate_bucket(b, now - b->stamp);
>                 if(olderblock && b->rate < lm)
>                         rrl_msg(query, "unblock");
>                 b->counter = 1;
>                 b->stamp = now;
>         }
> -----
> 
> Thanks,

