[Unbound-users] Unbound multithread performance: an investigation into scaling of cache response qps

Tue Mar 23 15:05:59 UTC 2010

On Tue, 23 Mar 2010, W.C.A. Wijngaards wrote:

> The performance scales up fairly neatly as multi-threading goes.  For
> every configuration a slower-than-linear speedup is observed, indicating
> locks in the underlying operation system network stack.

Was there no lock contention within unbound itself?  I don't know how to
measure this on Solaris, but did you measure it?

> There is only one network card, after all, and the CPUs have to lock and
> synchronise with it.

This should be true even with multiple processes, however.

This may not be true for Solaris, but you might try having unbound listen
on multiple ports, spread requests across them, and see if it matters.

The last time I looked, recent-ish Linux 2.6 still had per-socket locking
even in the face of multiple network cards.  This means that multiple
threads or even multiple processes sharing a UDP socket can't really exceed
one CPU's worth of raw sendto() performance sourced from the same socket.
You can get much closer to linear scalability by binding to a different port
or IP per CPU.
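
To make the "different port per CPU" layout concrete, here is a rough
sketch in Python.  It is not how unbound does it, and the helper names
(make_udp_sockets, echo_worker, start_workers, spread_queries) are made up
for illustration.  Each worker thread owns its own UDP socket on its own
port, so no two workers ever call sendto() on the same socket, and a test
client spreads requests round-robin across the ports.  Python threads will
not show the speedup because of the interpreter lock; the point is only the
socket layout.

```python
import socket
import threading


def make_udp_sockets(n, host="127.0.0.1", base_port=0):
    """Create n UDP sockets, each bound to its own port.

    base_port=0 lets the OS pick a free port for every socket (handy for
    testing); otherwise ports base_port .. base_port+n-1 are used.
    """
    socks = []
    for i in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        port = 0 if base_port == 0 else base_port + i
        s.bind((host, port))
        socks.append(s)
    return socks


def echo_worker(sock, stop):
    """Answer every datagram on this worker's private socket.

    A real resolver would build a DNS reply here; echoing the payload
    stands in for that.
    """
    sock.settimeout(0.1)  # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(4096)
        except socket.timeout:
            continue
        sock.sendto(data, addr)  # reply goes out from this socket only


def start_workers(socks):
    """Start one thread per socket; return the stop flag and the threads."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=echo_worker, args=(s, stop), daemon=True)
        for s in socks
    ]
    for t in threads:
        t.start()
    return stop, threads


def spread_queries(ports, payloads, host="127.0.0.1", timeout=2.0):
    """Send payloads to the ports in round-robin order.

    Returns a list of (replying_port, reply_bytes), one per payload.
    """
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for i, payload in enumerate(payloads):
            port = ports[i % len(ports)]
            client.sendto(payload, (host, port))
            data, addr = client.recvfrom(4096)
            replies.append((addr[1], data))
    return replies
```

To try it, create the sockets with make_udp_sockets(4), read the port
numbers back with getsockname(), call start_workers() on them, and pass the
ports to spread_queries().  Whether this gets you anywhere near linear
scaling on a given kernel is something you would have to measure.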

                                     -- Aaron