[Unbound-users] Unbound multithread performance: an investigation into scaling of cache response qps

Wed Mar 24 21:03:26 UTC 2010

On Wed, 24 Mar 2010, W.C.A. Wijngaards wrote:

> evport, forked, 4senders:         9619  15860  19010  21979
> evport, forked, special:         10000  18783  23461  25797 
> evport, forked, special, no-tcp: 10200  18552  20226  27161

At 4 cores, there's 25%-ish overhead due to the shared sockets between
processes, and this will likely increase as the number of cores increases.

I don't know how much you want to continue to play with this, but as an
experiment, comparing the forked 4senders case above with one that sent
replies from a socket unique to each process (even with the wrong source
IP/port) would show the socket locking overhead for sending vs receiving.

Generally you need to reply from the shared socket to keep the source port.
If it is most of the overhead, there might be other options for sending,
like binding multiple sockets with SO_REUSEADDR set to the same port.  On
most Unix flavors, the last one to bind gets all incoming unicast traffic,
but I think you can still send from the others.
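A rough sketch of that idea, in Python for brevity: bind several UDP
sockets to the same address and port with SO_REUSEADDR, send one datagram
from each, and check which source port a receiver sees. The helper names
(bind_shared_udp, source_ports_seen) are made up for illustration and are
not part of Unbound. It assumes Linux-like behavior, where the duplicate
bind succeeds if every socket sets SO_REUSEADDR; other systems may refuse
the second bind or want SO_REUSEPORT instead.

```python
import socket


def bind_shared_udp(host, port, count):
    """Bind `count` UDP sockets to the same (host, port) with SO_REUSEADDR.

    Hypothetical helper for illustration. If `port` is 0, the kernel picks
    a port for the first socket and the remaining sockets reuse it.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            socks.append(s)
            # Must be set before bind(), and on every socket sharing the port.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((host, port))
            port = s.getsockname()[1]
    except OSError:
        # Assumption failed on this platform: clean up and report it.
        for s in socks:
            s.close()
        raise
    return socks


def source_ports_seen(senders):
    """Send one datagram from each sender; return the source ports observed."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(2.0)
    seen = []
    try:
        for i, s in enumerate(senders):
            s.sendto(b"reply %d" % i, rx.getsockname())
            _, addr = rx.recvfrom(64)
            seen.append(addr[1])
    finally:
        rx.close()
    return seen


if __name__ == "__main__":
    socks = bind_shared_udp("127.0.0.1", 0, 4)
    print("bound port:", socks[0].getsockname()[1])
    print("source ports seen by receiver:", source_ports_seen(socks))
    for s in socks:
        s.close()
```

If the bind succeeds, each process could keep one of these sockets for
sending only and leave receiving on the shared socket. Whether that
actually removes the lock contention would still have to be measured.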

> evport and pthreads     8400      13600       15900       18100 
> evport and solaristhr   8500      14100       16000       18600

It looks like there's also a 20% overhead for having threading enabled,
regardless of the number of CPUs.  Hopefully, this shouldn't be true on
modern Linux, where uncontended mutexes are basically free.

                                     -- Aaron