[nsd-users] Xfrd scalability problem

Martin Svec martin.svec at zoner.cz
Mon Mar 1 11:09:02 UTC 2010


Hi Wouter,

Thank you for your response. I agree with you that the performance of
the server processes is more important than xfrd performance. It should
be sufficient to add xfrd zone handlers to the netio list only when
their sockets/timeouts are set. I've found no other serious bottleneck
for a large number of zones yet ;-)

Best regards
Martin Svec


W.C.A. Wijngaards wrote:
> Hi Martin,
>
> Thanks for the perf measurements.  I did not know that.  I wrote that
> code some time ago, and decided against optimizing xfrd like this,
> because the netio handler is also used by the server processes.  Those
> processes listen on only a limited number of sockets, and thus this is
> more efficient for them.  If this is the only bottleneck for a larger
> number of zones, then it may be relatively easy to fix.
>
> Best regards,
>    Wouter
>
> On 02/28/2010 08:30 PM, Martin Švec wrote:
> > Hello again,
>
> > I think that the xfrd daemon suffers from a scalability problem with
> > respect to the number of zones. For every zone, xfrd adds a
> > netio_handler to the linked list of handlers. Then, every
> > netio_dispatch call sequentially scans the entire list for "valid"
> > file descriptors and timeouts. With a large number of zones, this
> > scan is pretty expensive and superfluous, because almost all zone
> > file descriptors/timeouts are usually not assigned. The problem is
> > most obvious during "nsdc reload". Because the server_reload function
> > sends SOA infos of all zones to xfrd, xfrd performs a full scan of
> > the linked list for every zone, so the resulting complexity of the
> > reload is O(n^2). Just try "nsdc reload" with 65000 zones and you'll
> > see that the xfrd daemon consumes 100% CPU for several _minutes_!
> > However, I guess the scalability problem is not limited to reload,
> > because _every_ socket communication with xfrd goes through the same
> > netio_dispatch. Here is the "perf record" output of the xfrd process
> > during reload:
>
> > # Overhead  Command        Shared Object  Symbol
> > # ........  .......  ...................  ......
> > #
> >    98.69%      nsd  /usr/sbin/nsd        [.] netio_dispatch
> >     0.06%      nsd  [kernel]             [k] unix_stream_recvmsg
> >     0.05%      nsd  /usr/sbin/nsd        [.] rbtree_find_less_equal
> >     0.04%      nsd  [kernel]             [k] kfree
> >     0.04%      nsd  [kernel]             [k] copy_to_user
>
> > Then, "perf annotate netio_dispatch" shows that the heart of the
> > problem is indeed the loop scanning the linked list (because of gcc
> > optimizations, the line numbers are only approximate):
>
> > 48.24% /work/nsd-3.2.4/netio.c:158
> > 45.41% /work/nsd-3.2.4/netio.c:158
> > 2.14% /work/nsd-3.2.4/netio.c:172
> > 2.14% /work/nsd-3.2.4/netio.c:156
> > 1.81% /work/nsd-3.2.4/netio.c:172
>
> > I wonder why the linked list in xfrd contains the netio_handlers of
> > _all_ zones. Wouldn't it be better to dynamically add/remove zone
> > handlers only when their file descriptors/timeouts are
> > assigned/cleared? And perhaps replace the linked list with a more
> > scalable data structure? (Or is NSD intentionally designed to serve
> > only a small number of zones? ;-))
>
> > Best regards
> > Martin Svec
>
>
_______________________________________________
nsd-users mailing list
nsd-users at NLnetLabs.nl
http://open.nlnetlabs.nl/mailman/listinfo/nsd-users




