[Unbound-users] Unbound multithread performance: an investigation into scaling of cache response qps
Aaron Hopkins
lists at die.net
Wed Mar 24 21:03:26 UTC 2010
On Wed, 24 Mar 2010, W.C.A. Wijngaards wrote:
> evport, forked, 4senders: 9619 15860 19010 21979
> evport, forked, special: 10000 18783 23461 25797
> evport, forked special, no-tcp: 10200 18552 20226 27161
At 4 cores, there's 25%-ish overhead due to the shared sockets between
processes, and this will likely increase as the number of cores increases.
I don't know how much you want to continue to play with this, but as an
experiment, comparing the forked 4senders case above with one that sent
replies from a socket unique to each process (even with the wrong source
IP/port) would show the socket locking overhead for sending vs receiving.
Normally you need to reply from the shared socket so the response carries
the expected source IP and port.
If that accounts for most of the overhead, there might be other options for
sending, like binding multiple sockets with SO_REUSEADDR set to the same
port (roughly as in the sketch below).  On most Unix flavors, the last one
to bind gets all incoming unicast traffic, but I think you can still send
from the others.
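Something like this is what I have in mind; a rough sketch only, not
Unbound code, with the address, port, and error handling made up for
illustration.  Whether a second bind() to the same unicast addr:port
actually succeeds with only SO_REUSEADDR depends on the platform.

  /* Per-process "send-only" UDP socket bound to the same local port as
   * the shared listen socket, so replies still carry the expected
   * source addr:port. */
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static int make_send_socket(const char *ip, int port)
  {
          int fd, on = 1;
          struct sockaddr_in sa;

          if((fd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
                  perror("socket"); exit(1);
          }
          /* allow binding the same addr:port as the shared listen socket */
          if(setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
                  perror("setsockopt"); exit(1);
          }
          memset(&sa, 0, sizeof(sa));
          sa.sin_family = AF_INET;
          sa.sin_port = htons(port);
          inet_pton(AF_INET, ip, &sa.sin_addr);
          if(bind(fd, (struct sockaddr*)&sa, sizeof(sa)) == -1) {
                  perror("bind"); exit(1);
          }
          return fd;
  }

  int main(void)
  {
          /* each forked process would do this once and use fd only for
           * sendto() of replies to the client's address */
          int fd = make_send_socket("192.0.2.1", 53);
          close(fd);
          return 0;
  }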
> evport and pthreads 8400 13600 15900 18100
> evport and solaristhr 8500 14100 16000 18600
It looks like there's also about a 20% overhead for having threading
enabled, regardless of the number of CPUs.  Hopefully this isn't true on
modern Linux, where uncontended mutexes are essentially free.
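For what it's worth, a quick sanity check of that looks something like the
loop below (a rough sketch, compile with -pthread; the iteration count is
arbitrary).  On a recent Linux kernel the uncontended lock/unlock pair
stays in userspace and typically comes out to a few tens of nanoseconds.

  #include <pthread.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
          pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
          struct timespec t0, t1;
          long i, iters = 10*1000*1000;
          double ns;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for(i = 0; i < iters; i++) {
                  /* uncontended: no other thread ever touches the mutex */
                  pthread_mutex_lock(&m);
                  pthread_mutex_unlock(&m);
          }
          clock_gettime(CLOCK_MONOTONIC, &t1);
          ns = (t1.tv_sec - t0.tv_sec)*1e9 + (t1.tv_nsec - t0.tv_nsec);
          printf("%.1f ns per lock/unlock pair\n", ns/iters);
          return 0;
  }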
-- Aaron