From he at uninett.no Tue Nov 11 15:34:32 2025
From: he at uninett.no (Havard Eidnes)
Date: Tue, 11 Nov 2025 16:34:32 +0100 (CET)
Subject: Monitoring unbound?
Message-ID: <20251111.163432.2026917588341998961.he@uninett.no>

Hi,

at work we like monitoring our recursive resolvers. To that end we use the collectd package, and our backend is (judging by the name) graphite (can you tell I don't "own" it myself?), while presentation is via grafana.

A long time ago I found this external plugin for collectd,

  https://github.com/falzm/collectd-unbound

which was apparently archived in 2021 and is no longer being developed. Of course, I have a few patches to it... This one in turn depends on the external-to-collectd API presented by

  https://github.com/collectd/go-collectd

All this is wrapped up in NetBSD's pkgsrc patch + build system.

Now, the sad part is that these are (for now) un-committed pkgsrc packages (yes, I'm looking to fix that), and the build of them has apparently fallen by the wayside with recent-ish updates -- the latter now apparently depends on the collectd-dev (Debian?) package, and collectd itself fails to install one of the header files it wants to include, ref.

  # collectd.org/plugin/fake
  plugin/fake/shutdown.go:6:11: fatal error: plugin.h: No such file or directory
      6 | // #include "plugin.h"
        |             ^~~~~~~~~~
  compilation terminated.

Now, before I go on to report these issues to the respective parties (I'll probably do that anyway...), I'd like to ask what others do in terms of monitoring and visualization of the monitored values for unbound. Surely something less rickety has been put together? Bonus points for integration with graphite and grafana.

Best regards,

- Håvard

From carsten at strotmann.de Tue Nov 11 15:56:51 2025
From: carsten at strotmann.de (Carsten Strotmann)
Date: Tue, 11 Nov 2025 16:56:51 +0100
Subject: Monitoring unbound?
In-Reply-To: <20251111.163432.2026917588341998961.he@uninett.no>
References: <20251111.163432.2026917588341998961.he@uninett.no>
Message-ID: <512E7E35-FC50-4C7A-8005-5CFD290B02FA@strotmann.de>

Hi Havard,

On 11 Nov 2025, at 16:34, Havard Eidnes via Unbound-users wrote:

> Now, before I go on to report these issues to the respective parties (I'll probably do that anyway...), I'd like to ask what others do in terms of monitoring and visualization of the monitored values for unbound. Surely something less rickety has been put together? Bonus points for integration with graphite and grafana.

I use a Prometheus exporter maintained by "Let's Encrypt" to collect data from Unbound:

  https://github.com/letsencrypt/unbound_exporter

I then collect the data in Prometheus and do visualisation with grafana.

I've never worked with collectd and graphite, but it might be possible to read the data the prometheus-exporter provides (key/values over http) from collectd and send the data from there to graphite.

Greetings

Carsten

From philporada at gmail.com Tue Nov 11 16:04:51 2025
From: philporada at gmail.com (Phil Porada)
Date: Tue, 11 Nov 2025 11:04:51 -0500
Subject: Monitoring unbound?
In-Reply-To: <20251111.163432.2026917588341998961.he@uninett.no>
References: <20251111.163432.2026917588341998961.he@uninett.no>
Message-ID:

You should check out the unbound_exporter, which we took over development of from Kumina some years ago. We've been using it to great effect ever since.

  https://github.com/letsencrypt/unbound_exporter

From tarko at lanparty.ee Tue Nov 11 15:38:09 2025
From: tarko at lanparty.ee (Tarko Tikan)
Date: Tue, 11 Nov 2025 17:38:09 +0200
Subject: Monitoring unbound?
In-Reply-To: <20251111.163432.2026917588341998961.he@uninett.no>
References: <20251111.163432.2026917588341998961.he@uninett.no>
Message-ID:

hey,

> Now, before I go on to report these issues to the respective parties (I'll probably do that anyway...), I'd like to ask what others do in terms of monitoring and visualization of the monitored values for unbound.
Prometheus and https://github.com/letsencrypt/unbound_exporter

Ideally unbound would expose the metrics directly via HTTP in prometheus format. As NSD got native support earlier in 2025, there is some hope I guess :)

--
tarko

From rblayzor.bulk at inoc.net Tue Nov 11 16:03:02 2025
From: rblayzor.bulk at inoc.net (Robert Blayzor)
Date: Tue, 11 Nov 2025 11:03:02 -0500
Subject: Monitoring unbound?
In-Reply-To: <20251111.163432.2026917588341998961.he@uninett.no>
References: <20251111.163432.2026917588341998961.he@uninett.no>
Message-ID: <9c058490-777f-445e-a29d-d9aa6611ca2c@inoc.net>

I've used a combination of NetSNMP snmpd running on the host and Cacti for collecting status (cache hits, misses, queries, etc.).

Basically snmpd uses a tiny shell one-liner to grab the values from the unbound-control stats output.

Since it's SNMP, you can plug the values into anything you want to collect them with.

From nicomail+unbound at gmail.com Tue Nov 11 18:50:50 2025
From: nicomail+unbound at gmail.com (Nicolas Baumgarten)
Date: Tue, 11 Nov 2025 19:50:50 +0100
Subject: Monitoring unbound?
In-Reply-To: <9c058490-777f-445e-a29d-d9aa6611ca2c@inoc.net>
References: <20251111.163432.2026917588341998961.he@uninett.no> <9c058490-777f-445e-a29d-d9aa6611ca2c@inoc.net>
Message-ID:

Hi!
we are using collectd and graphite for unbound and other metrics.

In the case of unbound we have an old perl wrapper around "unbound-control stats_noreset".

From he at uninett.no Tue Nov 11 21:36:23 2025
From: he at uninett.no (Havard Eidnes)
Date: Tue, 11 Nov 2025 22:36:23 +0100 (CET)
Subject: Monitoring unbound?
In-Reply-To:
References: <20251111.163432.2026917588341998961.he@uninett.no> <9c058490-777f-445e-a29d-d9aa6611ca2c@inoc.net>
Message-ID: <20251111.223623.442346477316468477.he@uninett.no>

> Hi!
> we are using collectd and graphite for unbound and other metrics
>
> In the case of unbound we have an old perl wrapper around
> "unbound-control stats_noreset"

OK.

Allow me to follow a tangent, and get up to ride a hobby horse of mine...

Our unbound.conf has

  # don't zero stats on read(!)
  statistics-cumulative: yes

It is beyond me why anyone would have it any other way. What if you wanted to have two distinct monitoring setups looking at the same unbound instance? A counter is a counter!
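To illustrate the point about cumulative counters: when the daemon never zeroes its counters, each monitoring system can keep its own previous sample and derive a rate from the difference, without disturbing any other reader. The following is only a minimal sketch; the function name `counter_rate` and the sample numbers are made up for illustration and are not taken from unbound or from any monitoring tool.

```python
from typing import Optional


def counter_rate(prev_value: int, prev_time: float,
                 curr_value: int, curr_time: float) -> Optional[float]:
    """Per-second rate between two samples of a cumulative counter.

    Returns None when no rate can be derived: no time has passed, or the
    counter went backwards (for example because the daemon restarted).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


# Hypothetical samples of one cumulative counter: time in seconds -> value.
samples = {0: 1000, 30: 1300, 60: 1600, 90: 1900}

# Two independent pollers with different schedules read the same counter.
poller_a = counter_rate(samples[0], 0, samples[60], 60)
poller_b = counter_rate(samples[30], 30, samples[90], 90)
print(poller_a, poller_b)
```

Both pollers arrive at the same rate because neither read changes what the other one sees. With reset-on-read, each poller would only see the queries that arrived since whichever poller happened to read last.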
Maybe I'm too much influenced by dealing with SNMP in other parts of my work (but I hope not).

Regards,

- Håvard

From seth.vanburen at versent.com.au Tue Nov 11 22:32:40 2025
From: seth.vanburen at versent.com.au (Seth Van Buren)
Date: Tue, 11 Nov 2025 22:32:40 +0000
Subject: Monitoring unbound?
In-Reply-To: <20251111.223623.442346477316468477.he@uninett.no>
References: <20251111.163432.2026917588341998961.he@uninett.no> <9c058490-777f-445e-a29d-d9aa6611ca2c@inoc.net> <20251111.223623.442346477316468477.he@uninett.no>
Message-ID:

We use AWS CloudWatch with some dashboards. Our entire solution is in AWS.

From sebastian at sebbe.eu Wed Nov 19 11:41:21 2025
From: sebastian at sebbe.eu (sebastian)
Date: Wed, 19 Nov 2025 12:41:21 +0100
Subject: respond with fake IP for DNS rebinding hits?
Message-ID:

I currently have an unbound server. However, with some mail providers using the "exists:" mechanism and returning 127.0.0.1, this obviously triggers the DNS rebinding protection and a SERVFAIL. This ultimately leads to an SPF rejection.

Is there any way to configure unbound so that, if the rebinding protection trips, it instead returns a non-routable bogus IP like "192.0.2.123" (documentation only)? That would both ensure the "exists:" mechanism works as intended and protect localhost if a malicious actor were to attempt a rebinding attack.

I'm thinking of excluding 127.0.0.0/8 from private-address and then using some sort of rewriting mechanism, if one exists in unbound?

From sca at andreasschulze.de Wed Nov 19 14:45:18 2025
From: sca at andreasschulze.de (A. Schulze)
Date: Wed, 19 Nov 2025 15:45:18 +0100
Subject: respond with fake IP for DNS rebinding hits?
In-Reply-To:
Message-ID: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de>

sebastian via Unbound-users:

> I currently have an unbound server. However, with some mail providers using the "exists:" mechanism and returning 127.0.0.1, this obviously triggers the DNS rebinding protection and a SERVFAIL. This ultimately leads to an SPF rejection.
>
> Is there any way to configure unbound so that, if the rebinding protection trips, it instead returns a non-routable bogus IP like "192.0.2.123" (documentation only)? That would both ensure the "exists:" mechanism works as intended and protect localhost if a malicious actor were to attempt a rebinding attack.
>
> I'm thinking of excluding 127.0.0.0/8 from private-address and then using some sort of rewriting mechanism, if one exists in unbound?
Hi,

could you describe in more detail who asks what, and why? -> full queries

RBLs use an answer of 127.0.0.1 all the time. I don't see why this should be a rebind attack. Do you have a special unbound setting enabled?

Andreas

From sebastian at sebbe.eu Wed Nov 19 15:16:07 2025
From: sebastian at sebbe.eu (Sebastian Nielsen)
Date: Wed, 19 Nov 2025 16:16:07 +0100
Subject: Sv: respond with fake IP for DNS rebinding hits?
In-Reply-To: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de>
References: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de>
Message-ID: <002401dc5967$6ed57a20$4c806e60$@sebbe.eu>

For RBLs I have exceptions. For each RBL server I use, I have private-domain configured, which whitelists that server to respond with 127.x.x.x.

However, when a mail provider configures "exists:" in their SPF with a macro, it becomes a problem, because I can't anticipate who is going to send mail to me, check whether their SPF has an "exists:" and then whitelist their weird server. This causes my DNS to trip the rebind protection and return SERVFAIL, and then SPF fails because the record doesn't "exist" (when it really does; DNS rebind protection just swallowed the record).

Since I have regular client computers behind the same firewall, I can't just disable DNS rebind protection.

So what I want unbound to do, instead of "swallowing" the response when it's a "prohibited rebind response", is to respond with a bogus IP address. That way, any rebind attempts will fail, while the "exists:" mechanism in SPF will still work (since the "exists:" mechanism doesn't care about the IP address).

The easiest way would be to have some rewrite mechanism: if the A record would contain an address in 127.0.0.0/8, rewrite that to 192.0.2.123. Does something like that exist in unbound?

From sca at andreasschulze.de Wed Nov 19 17:18:02 2025
From: sca at andreasschulze.de (A.Schulze)
Date: Wed, 19 Nov 2025 18:18:02 +0100
Subject: Sv: respond with fake IP for DNS rebinding hits?
In-Reply-To: <002401dc5967$6ed57a20$4c806e60$@sebbe.eu>
References: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de> <002401dc5967$6ed57a20$4c806e60$@sebbe.eu>
Message-ID:

On 19.11.25 at 16:16, Sebastian Nielsen via Unbound-users wrote:

> For RBLs I have exceptions. For each RBL server I use, I have private-domain configured, which whitelists that server to respond with 127.x.x.x.

I still have no idea what your problem is. Can you provide example domains with an SPF record containing 'exists:'? Also, I would like to know your "whitelist" -> can you post your unbound.conf?
Andreas

From sebastian at sebbe.eu Wed Nov 19 17:29:14 2025
From: sebastian at sebbe.eu (Sebastian Nielsen)
Date: Wed, 19 Nov 2025 18:29:14 +0100
Subject: Sv: Sv: respond with fake IP for DNS rebinding hits?
In-Reply-To:
References: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de> <002401dc5967$6ed57a20$4c806e60$@sebbe.eu>
Message-ID: <005001dc597a$07bbe9b0$1733bd10$@sebbe.eu>

Here is an example: goteborg.se

It has this weird "exists:%{i}.spf.hc2437-76.eu.iphmx.com", which for a valid connection translates to 127.0.0.2.

Try with, for example, 23.90.102.86.spf.hc2437-76.eu.iphmx.com

You can see it here:
https://mxtoolbox.com/SuperTool.aspx?action=a%3a23.90.102.86.spf.hc2437-76.eu.iphmx.com&run=toolpage

This 127.0.0.2 gets caught in the DNS rebinding filter, and then the SPF validation fails.

Here is the relevant config portion for the DNS rebinding protection:

  server:
      private-domain: sebbe.eu
      private-domain: list.dnswl.org
      private-address: 192.168.0.0/16
      private-address: 10.0.0.0/8
      private-address: 172.16.0.0/12
      private-address: 169.254.0.0/16
      private-address: 127.0.0.0/8
      private-address: 0:0:0:0:0:ffff:c0a8:0/112
      private-address: 0:0:0:0:0:ffff:a00:0/104
      private-address: 0:0:0:0:0:ffff:ac10:0/108
      private-address: 0:0:0:0:0:ffff:a9fe:0/112
      private-address: 0:0:0:0:0:ffff:7f00:0/104
      private-address: ::1/128
      private-address: fd00::/8
      private-address: fe80::/10

The "private-domain" option whitelists certain domains, allowing answers for them to contain a private-address. So list.dnswl.org and sebbe.eu are permitted to respond with any address listed as private-address. Any other answer containing an IP listed as private-address is blocked.

Now, instead of blocking the 127.0.0.0/8 responses, I would like to respond with a bogus IP like "192.0.2.123", which is an IP reserved for documentation (TEST-NET) and is unroutable in LAN environments, in localhost environments and also on the internet, thus providing a record so the IP "exists:" while still protecting any clients behind the same firewall from DNS rebinding attacks.

From sca at andreasschulze.de Wed Nov 19 18:36:23 2025
From: sca at andreasschulze.de (A.Schulze)
Date: Wed, 19 Nov 2025 19:36:23 +0100
Subject: Sv: Sv: respond with fake IP for DNS rebinding hits?
In-Reply-To: <005001dc597a$07bbe9b0$1733bd10$@sebbe.eu>
References: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de> <002401dc5967$6ed57a20$4c806e60$@sebbe.eu> <005001dc597a$07bbe9b0$1733bd10$@sebbe.eu>
Message-ID: <449a157f-2715-4a7d-abe2-cad60fc96e51@andreasschulze.de>

On 19.11.25 at 18:29, Sebastian Nielsen via Unbound-users wrote:

> Here is an example:
> goteborg.se
> It has this weird "exists:%{i}.spf.hc2437-76.eu.iphmx.com", which for a valid connection translates to 127.0.0.2.
>
> Try with, for example, 23.90.102.86.spf.hc2437-76.eu.iphmx.com
>
> You can see it here:
> https://mxtoolbox.com/SuperTool.aspx?action=a%3a23.90.102.86.spf.hc2437-76.eu.iphmx.com&run=toolpage
> This 127.0.0.2 gets caught in the DNS rebinding filter, and then the SPF validation fails.

Now I understand your setup.

man (5) unbound.conf says

  private-address:
      ... We consider to enable this for the RFC1918 private IP address space by default in later releases ...
I assume the "private-address" setting is not set by default for good reasons, and the unbound developers haven't changed that default to this day.

I see the value of rebind protection on systems used by humans. But a mail server is a different use case.

One way to solve your issue is to run two resolver instances: one for servers and one for end-user systems, with only the latter configured with "private-address".

Andreas

From sebastian at sebbe.eu Wed Nov 19 19:03:00 2025
From: sebastian at sebbe.eu (Sebastian Nielsen)
Date: Wed, 19 Nov 2025 20:03:00 +0100
Subject: Sv: Sv: Sv: respond with fake IP for DNS rebinding hits?
In-Reply-To: <449a157f-2715-4a7d-abe2-cad60fc96e51@andreasschulze.de>
References: <20251119154518.Horde.98d3QwTUiUHsu4Zzorw6HF3@andreasschulze.de> <002401dc5967$6ed57a20$4c806e60$@sebbe.eu> <005001dc597a$07bbe9b0$1733bd10$@sebbe.eu> <449a157f-2715-4a7d-abe2-cad60fc96e51@andreasschulze.de>
Message-ID: <007401dc5987$211df260$6359d720$@sebbe.eu>

Correct, I have manually set up the DNS rebinding protection feature to increase security. Is there any way to rewrite all 127.0.0.0/8 responses to a custom IP? I suspect there is some rewrite module or similar that can replace responses, right?

The mail server and the LAN clients are behind the same firewall; that's why I need rebinding protection. I could move the mail server off the LAN to a separate net, but that requires pulling a long new patch cable.

From pemensik at redhat.com Fri Nov 21 11:02:22 2025
From: pemensik at redhat.com (Petr Menšík)
Date: Fri, 21 Nov 2025 12:02:22 +0100
Subject: Monitoring unbound?
In-Reply-To:
References: <20251111.163432.2026917588341998961.he@uninett.no>
Message-ID: <7387ce0a-6e11-4a7e-a09b-bdcd330aace7@redhat.com>

Such a topic was recently raised for BIND9 as well. If you can document which statistics are in use now, that might be used to create one common format usable by any DNS service. It is silly when each has a different format, which must be transformed by some external module into the format common to some monitoring service.

I am not sure unbound should offer it on an HTTP service socket, but it would be great if it could provide general numbers in a common format. If it had some numbers different from other implementations, it could export only those in an implementation-specific extension.
But I think the majority of DNS software has similar numbers they want to watch.

--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

From paul at nohats.ca Fri Nov 21 15:02:07 2025
From: paul at nohats.ca (Paul Wouters)
Date: Fri, 21 Nov 2025 10:02:07 -0500 (EST)
Subject: Monitoring unbound?
In-Reply-To: <7387ce0a-6e11-4a7e-a09b-bdcd330aace7@redhat.com>
References: <20251111.163432.2026917588341998961.he@uninett.no> <7387ce0a-6e11-4a7e-a09b-bdcd330aace7@redhat.com>
Message-ID: <16699ff8-1843-cc53-b851-fd007cc57b37@nohats.ca>

On Fri, 21 Nov 2025, Petr Menšík via Unbound-users wrote:

> Such a topic was recently raised for BIND9 as well. If you can document which statistics are in use now, that might be used to create one common format usable by any DNS service. It is silly when each has a different format, which must be transformed by some external module into the format common to some monitoring service.
>
> I am not sure unbound should offer it on an HTTP service socket

Do you mean one of these:

  unbound-control stats_noreset
  unbound-control stats

Paul

From pemensik at redhat.com Fri Nov 21 17:53:30 2025
From: pemensik at redhat.com (Petr Menšík)
Date: Fri, 21 Nov 2025 18:53:30 +0100
Subject: Monitoring unbound?
In-Reply-To: <16699ff8-1843-cc53-b851-fd007cc57b37@nohats.ca>
References: <20251111.163432.2026917588341998961.he@uninett.no> <7387ce0a-6e11-4a7e-a09b-bdcd330aace7@redhat.com> <16699ff8-1843-cc53-b851-fd007cc57b37@nohats.ca>
Message-ID: <0535b7f0-c59f-4f32-9acf-df4643be2abd@redhat.com>

No. I think that was a question at some conference, DNS-OARC perhaps.

The proposal was: what if bind9, unbound, knot-resolver and pdns-recursor could produce the same format for their statistics? Then prometheus could have only one statistics parser. It might be exported to a different path in the filesystem, and that should be enough. Only the path and content should differ between services; the format should ideally stay compatible. Then it would require less glue code between the statistics dashboards in use and the DNS service itself.

I think such a common format would be great. I would prefer something JSON-based. I can describe only bind9 and unbound statistics. Their formats are very different, although quite a lot of the numbers could be similar.

This is the main statistics refactoring issue at bind9:

  https://gitlab.isc.org/isc-projects/bind9/-/issues/38

I am not sure where exactly they talked about requirements for a new format, sorry. I think it was mentioned after some talk in some OARC recording, but I do not remember which one.

On 21/11/2025 16:02, Paul Wouters wrote:
> On Fri, 21 Nov 2025, Petr Menšík via Unbound-users wrote:
>
>> Such a topic was recently raised for BIND9 as well. If you can document which statistics are in use now, that might be used to create one common format usable by any DNS service. It is silly when each has a different format, which must be transformed by some external module into the format common to some monitoring service.
>>
>> I am not sure unbound should offer it on an HTTP service socket
>
> Do you mean one of these:
>
> unbound-control stats_noreset
> unbound-control stats
>
> Paul

Yes. It would be nice if it could serve only a subtree.
Hmm, it would be cool if it could serve a CH answer.num.stats. TXT query? Although that would need some ACL protection.

Then you usually want to put this into some graph. That requires picking specific fields from these and mapping them to graph lines. Every implementation seems to have a very different statistics output format.

Systemd people like the Varlink format. That could be usable too.

Petr

--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

From paul at nohats.ca Fri Nov 21 18:23:18 2025
From: paul at nohats.ca (Paul Wouters)
Date: Fri, 21 Nov 2025 13:23:18 -0500
Subject: Monitoring unbound?
In-Reply-To: <0535b7f0-c59f-4f32-9acf-df4643be2abd@redhat.com>
References: <0535b7f0-c59f-4f32-9acf-df4643be2abd@redhat.com>
Message-ID: <2CE864F1-4B20-4087-B57E-50770230C987@nohats.ca>

> On Nov 21, 2025, at 12:53, Petr Menšík wrote:
>
> No. I think that was a question at some conference, DNS-OARC perhaps.
>
> The proposal was: what if bind9, unbound, knot-resolver and pdns-recursor could produce the same format for their statistics?

Ahh, so then perhaps you should write an IETF YANG module for this and then export it to YANG.

Paul

From edmonds at debian.org Fri Nov 21 22:36:27 2025
From: edmonds at debian.org (Robert Edmonds)
Date: Fri, 21 Nov 2025 17:36:27 -0500
Subject: Monitoring unbound?
In-Reply-To: <0535b7f0-c59f-4f32-9acf-df4643be2abd@redhat.com>
References: <20251111.163432.2026917588341998961.he@uninett.no> <7387ce0a-6e11-4a7e-a09b-bdcd330aace7@redhat.com> <16699ff8-1843-cc53-b851-fd007cc57b37@nohats.ca> <0535b7f0-c59f-4f32-9acf-df4643be2abd@redhat.com>
Message-ID:

Petr Menšík via Unbound-users wrote:
> No. I think that was a question at some conference, DNS-OARC perhaps.
>
> The proposal was: what if bind9, unbound, knot-resolver and pdns-recursor
> could produce the same format for their statistics?
> Then prometheus could have only one statistics parser. It might be exported to a different path in the filesystem, and that should be enough. Only the path and content should differ between services; the format should ideally stay compatible. Then it would require less glue code between the statistics dashboards in use and the DNS service itself.
>
> I think such a common format would be great. I would prefer something JSON-based. I can describe only bind9 and unbound statistics. Their formats are very different, although quite a lot of the numbers could be similar.
>
> This is the main statistics refactoring issue at bind9:
>
> https://gitlab.isc.org/isc-projects/bind9/-/issues/38
>
> I am not sure where exactly they talked about requirements for a new format, sorry. I think it was mentioned after some talk in some OARC recording, but I do not remember which one.

I don't think BIND, Unbound, Knot Resolver, and PowerDNS Recursor should generate identical statistics. It would be nice if they used a de facto standard like Prometheus/OpenMetrics format exposed on an HTTP endpoint so that their metrics can be scraped and ingested by modern observability stacks. Currently Unbound requires deploying a third party daemon [0] alongside Unbound to convert the bespoke "UBCT1" protocol (the protocol that unbound-control speaks to the Unbound daemon) in order to ingest Unbound's metrics into a Prometheus-compatible stack.

There are some metrics that count the number of times certain kinds of packets occur (queries/responses by QTYPE/OPCODE/RCODE, by transport, etc.) where you can arguably find some level of commonality between different DNS server implementations because they are just counting objective, externally observable events. If you are restricting your visibility to just these externally observable properties of DNS transactions, then perhaps it might be possible to share glue code and "statistics dashboards" between different DNS server implementations.
But this is a fairly basic level of visibility. DNS server implementations are going to have diverse internal architectures and implementation details and some level of visibility (or "observability") into the health of those internal implementation details is highly desirable. For instance, I care very much about why Unbound might have dropped a query from a client. It's not very useful or actionable to have a single "number of client queries that were dropped" metric in the DNS server that aggregates every cause together. (All this tells me is that the query got to the server and I can exclude external possibilities like socket receive buffer overruns from the possible causes.) You need more fine-grained metrics that let you track down what mechanism(s) resulted in the query drops. So Unbound has been getting more fine-grained metrics like [1, 2] that help explain exactly which implementation-specific mechanisms are resulting in query drops. It would be unreasonable to expect every implementation to have the same metrics like the ones in [1, 2], because these are implementation-specific details that are going to vary because different implementations take different approaches to solving various problems. It would also be unreasonable to just take a union of all such implementation-specific metrics and add them to a single common format and just have implementations omit the ones that aren't relevant to them. (Or, even worse, have different implementations use the same metric names to mean totally different things, or sort of similar but not really the same things.) So my recommendations are basically: 1) Don't innovate on the exposition format. DNS servers exist in a universe with many other kinds of servers that have had to deal with broadly similar issues and this is not a greenfield. Prometheus/OpenTelemetry already exists. If you come up with a bespoke XML, JSON, protobuf, etc.
format, it will have to be converted to something else in order to import it into modern observability stacks, so just generate that format directly. (If you disagree, then by all means, design an additional layer of internal abstraction and build a pluggable module interface so you can support a multitude of different metrics exposition formats/transports.) 2) Every vendor should come up with their own naming scheme, organization, and definition of implementation-specific "health" metrics that fit their own software most naturally. 3) There may be some value in regularizing across server implementations the definitions of metrics that count externally visible properties of DNS transactions (the QTYPEs and RCODEs, etc.). But this kind of effort should be narrowly scoped to exclude the implementation-specific "health" metrics. [0]: https://github.com/letsencrypt/unbound_exporter [1]: https://github.com/NLnetLabs/unbound/pull/1159 [2]: https://github.com/NLnetLabs/unbound/pull/1374 -- Robert Edmonds edmonds at debian.org From sirizake at gmail.com Tue Nov 25 04:49:44 2025 From: sirizake at gmail.com (sir izake) Date: Tue, 25 Nov 2025 04:49:44 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 Message-ID: Hi I have installed unbound 1.24.1 on FreeBSD 14.3 OS. My cache hit rate is 76% with over 20% coming through recursive replies. The median time for recursive replies is 440ms while the avg is 520ms. This setup has been running for over 72hrs. I expect stats to improve but that is not happening. Just wanted to find out if there is a way to measure the cache hit resolution time in a dashboard? Can I do anything to improve cache hit ratio? Can I also improve the recursive reply time? 
I am using unbound_exporter to monitor stats in Grafana. My configs have been adjusted as follows: rrset-cache-size: 20G msg-cache-size: 10G cache-min-ttl: 1800 I am using the root hints file directly on the server for recursive lookups and am not forwarding to any public resolver. Thank you Regards, Isaac -------------- next part -------------- An HTML attachment was scrubbed... URL: From seth.vanburen at versent.com.au Tue Nov 25 05:13:15 2025 From: seth.vanburen at versent.com.au (Seth Van Buren) Date: Tue, 25 Nov 2025 05:13:15 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: How many cores/slabs are you using? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sirizake at gmail.com Tue Nov 25 10:08:11 2025 From: sirizake at gmail.com (sir izake) Date: Tue, 25 Nov 2025 10:08:11 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Hi Seth num-threads: 64 msg-cache-slabs: 32 rrset-cache-slabs: 32 infra-cache-slabs: 32 key-cache-slabs: 32 ratelimit-slabs: 32 ip-ratelimit-slabs: 32 The physical server is a dell 640 with specs below hw.ncpu: 104 hw.model: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz Thank you Isaac -------------- next part -------------- An HTML attachment was scrubbed...
URL: From seth.vanburen at versent.com.au Wed Nov 26 03:34:39 2025 From: seth.vanburen at versent.com.au (Seth Van Buren) Date: Wed, 26 Nov 2025 03:34:39 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Your thread count should be equal to or lower than the number of slabs. The thread count seems extremely high; you should not need so many. You should set num-queries-per-thread. Try 16384. Can you also paste your memory settings and cache settings? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sirizake at gmail.com Wed Nov 26 09:35:31 2025 From: sirizake at gmail.com (sir izake) Date: Wed, 26 Nov 2025 09:35:31 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Hi Seth The server is dedicated to this purpose, hence the high number of threads. The configs below are in place: num-queries-per-thread: 4096 msg-cache-size: 10G rrset-cache-size: 20G key-cache-size: 1G Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From yorgos at nlnetlabs.nl Wed Nov 26 11:12:17 2025 From: yorgos at nlnetlabs.nl (Yorgos Thessalonikefs) Date: Wed, 26 Nov 2025 12:12:17 +0100 Subject: Unbound 1.24.2 released Message-ID: Hi, Unbound 1.24.2 is available: https://nlnetlabs.nl/downloads/unbound/unbound-1.24.2.tar.gz sha256 44e7b53e008a6dcaec03032769a212b46ab5c23c105284aa05a4f3af78e59cdb pgp https://nlnetlabs.nl/downloads/unbound/unbound-1.24.2.tar.gz.asc This security release has additional fixes for CVE-2025-11411. Promiscuous NS RRSets that complement DNS replies in the authority section can be used to trick resolvers to update their delegation information for the zone.
The CVE is described here https://nlnetlabs.nl/downloads/unbound/CVE-2025-11411.txt Unbound 1.24.1 included a fix that scrubs unsolicited NS RRSets (and their respective address records) from replies mitigating the possible poison effect. Unbound 1.24.2 includes an additional fix that scrubs unsolicited NS RRSets (and their respective address records) from YXDOMAIN and non-referral nodata replies as well, mitigating the possible poison effect. We would like to thank TaoFei Guo from Peking University, Yang Luo and JianJun Chen from Tsinghua University for discovering and responsibly disclosing the partial mitigation of CVE-2025-11411 in Unbound 1.24.1. Bug Fixes: - Additional fix for CVE-2025-11411 (possible domain hijacking attack), to include YXDOMAIN and non-referral nodata answers in the mitigation as well, reported by TaoFei Guo from Peking University, Yang Luo and JianJun Chen from Tsinghua University. Best regards, -- Yorgos From seth.vanburen at versent.com.au Wed Nov 26 12:37:05 2025 From: seth.vanburen at versent.com.au (Seth Van Buren) Date: Wed, 26 Nov 2025 12:37:05 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: I think you missed the point. See: https://nlnetlabs.nl/documentation/unbound/howto-optimise/ Set *-slabs to a power of 2 close to the num-threads value. Do this for msg-cache-slabs, rrset-cache-slabs, infra-cache-slabs and key-cache-slabs. This reduces lock contention. I service several hundred thousand simultaneous clients with tens of thousands of queries per second on only 12 threads. Cache response time is less than 1ms, average response time is < 10ms. My hosts (I have 3 of them) have 16 threads/cores each; I leave 4 threads to do some server busy work like stats and log collection. More threads doesn't always mean better performance, and in your case, since your slab count is low, you're going to have a lot of lock contention.
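The slab rule quoted above ("a power of 2 close to the num-threads value") can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and rounding up when a thread count sits exactly between two powers of two is an assumption, since the quoted advice only says "close to".

```python
def nearest_power_of_two(n):
    """Return the power of two closest to n (ties round up; an assumption)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lower = 1 << (n.bit_length() - 1)   # largest power of two <= n
    upper = lower << 1                  # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


# The four cache options named in the quoted tuning advice.
SLAB_OPTIONS = (
    "msg-cache-slabs",
    "rrset-cache-slabs",
    "infra-cache-slabs",
    "key-cache-slabs",
)


def slab_settings(num_threads):
    """Build unbound.conf lines that size each slab option for num_threads."""
    slabs = nearest_power_of_two(num_threads)
    return [f"{option}: {slabs}" for option in SLAB_OPTIONS]


for line in slab_settings(64):
    print(line)
```

For the 64-thread configuration posted earlier in the thread this gives 64 for each option, compared with the 32 currently configured; for 12 threads it gives 16.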
Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: From sirizake at gmail.com Wed Nov 26 15:34:23 2025 From: sirizake at gmail.com (sir izake) Date: Wed, 26 Nov 2025 15:34:23 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Thanks Seth. I'll make the adjustments and monitor performance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.cagney at gmail.com Wed Nov 26 19:11:25 2025 From: andrew.cagney at gmail.com (Andrew Cagney) Date: Wed, 26 Nov 2025 14:11:25 -0500 Subject: combining ub_resolve{,_async,_event}() calls on a single context? Message-ID: TL;DR: don't? Hi, I'm reorganizing how Libreswan performs DNS. The old code would either: - use ub_resolve() in a separate process - use ub_resolve_event() in the IKE daemon Since the new code will all live in the IKE daemon I need to revisit how unbound is being used. (The IKE daemon has a main thread running an event loop (libevent), and a thread pool to perform crypto.) With that in mind, I've a few things I'm not clear on. - given a single context, can it be passed to any of the three calls ub_resolve{,_async,_event}()? I'm fairly sure that a context created with ub_ctx_create_event() must only be passed to ub_resolve_event(); however I'm less clear about a context created using ub_ctx_create() being passed to both ub_resolve() and ub_resolve_async(). But I suspect the answer is still "don't do that". - given a single context, can it be passed to ub_resolve() (the blocking variant) from different threads? Just asking. There's a comment saying "Application not threaded", so probably not. - is the lookup cache per-context, or shared between contexts?
If it's per-context, the argument for reworking the ub_resolve() calls and having everything share a single context becomes stronger - the ub_resolve_async() code creates and uses a libworker_bg() thread that does the heavy lifting (talk to internet, DNSSEC math). Is there one thread per context, one thread per resolve request, or ...? - for ub_resolve_event() does the heavy lifting (notably DNSSEC math) happen on the main thread or worker threads? Anyway, I suspect my strategy is to change the code to use ub_resolve_async() with its single worker. Andrew From sirizake at gmail.com Thu Nov 27 08:03:26 2025 From: sirizake at gmail.com (sir izake) Date: Thu, 27 Nov 2025 08:03:26 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Hi Seth Do you have a tool that helps you measure the cache hit resolution time? -------------- next part -------------- An HTML attachment was scrubbed... URL: From daisuke.higashi at gmail.com Thu Nov 27 11:43:50 2025 From: daisuke.higashi at gmail.com (Daisuke HIGASHI) Date: Thu, 27 Nov 2025 20:43:50 +0900 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: sir izake via Unbound-users : > Just wanted to find out if there is a way to measure the cache hit > resolution time in a dashboard? > Unbound has no facility to measure cache hit resolution time. One way to measure cache hit time is to query a name that is always answered from the cache, e.g. "dig -t NS ."; If the resolver is too busy and queries always remain stuck in its receive queue, its response (even for a cache hit) would be delayed due to the queue dwell time, and queries may even be dropped. -------------- next part -------------- An HTML attachment was scrubbed...
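The probing approach suggested above (repeatedly asking for the root NS set, which is expected to be answered from cache) can be scripted. The following is a minimal sketch using only the Python standard library; the function names, the 127.0.0.1 resolver address in the usage note, and the sample count are assumptions for illustration. The measured time is client-side wall-clock time, so it includes network round trip and any queue dwell time on the resolver, not just cache lookup time.

```python
import socket
import statistics
import struct
import time


def build_query(name, qtype=2, txid=0x1234):
    """Build a minimal DNS query packet (qtype 2 = NS, class IN, RD set)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def time_query(server, name=".", port=53, timeout=2.0):
    """Send one UDP query over IPv4 and return the elapsed time in seconds.

    A socket timeout propagates to the caller as an exception.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(query, (server, port))
        response, _ = sock.recvfrom(4096)
        elapsed = time.perf_counter() - start
    if response[:2] != query[:2]:
        raise RuntimeError("response ID does not match query ID")
    return elapsed


def median_latency_ms(server, samples=20):
    """Median round-trip time in milliseconds over several probes."""
    timings = [time_query(server) for _ in range(samples)]
    return statistics.median(timings) * 1000.0
```

Calling `median_latency_ms("127.0.0.1")` on the resolver host itself (assuming it listens there) keeps network latency out of the number as far as possible; the result could then be pushed to whatever backend feeds the dashboard.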
URL: From seth.vanburen at versent.com.au Thu Nov 27 23:24:06 2025 From: seth.vanburen at versent.com.au (Seth Van Buren) Date: Thu, 27 Nov 2025 23:24:06 +0000 Subject: How to measure cache hit resolution time in unbound 1.24.1 In-Reply-To: References: Message-ID: Yes, as Daisuke said, it's a very unscientific approach to measure this. We use data from our load test rig plus some baseline network latency to arrive at estimates. Our average also includes timeouts from some exotic domains and records that do not exist, which probably originate from malware and all sorts of crap on our clients' devices. It's amazing the junk that people try to access. How did your test go with the tuning already suggested, did you see any improvements? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjt at tls.msk.ru Sun Nov 30 13:53:56 2025 From: mjt at tls.msk.ru (Michael Tokarev) Date: Sun, 30 Nov 2025 16:53:56 +0300 Subject: configure~ in the release tarbals.. Message-ID: ..is still present. Can you get rid of it for good, please? :) Thanks, /mjt