Monitoring unbound?

Fri Nov 21 22:36:27 UTC 2025

Petr Menšík via Unbound-users wrote:
> No. I think that was question on some conference, DNS-OARC perhaps.
> 
> The proposal was what if bind9, unbound, knot-resolver and pdns-recursor
> could create the same format for their statistics. So prometheus could have
> only one statistics parser code. It might be exported to different path in
> filesystem and that should be enough. Only path and content should be
> different for different services. Format should ideally stay compatible.
> Then it would require less code as glue between statistics dashboards used
> and the DNS service itself.
> 
> I think such common format would be great. I would prefer something json
> based. I can describe only bind9 and unbound statistics. Their format is
> very different, although quite a lot numbers could be similar.
> 
> This is main statistics refactoring issue at bind9
> 
> https://gitlab.isc.org/isc-projects/bind9/-/issues/38
> 
> I am not sure where exactly did they talk about requirements for a new
> format, sorry. I think it was mentioned after some talk at some OARC
> recording, but do not remember which one.

I don't think BIND, Unbound, Knot Resolver, and PowerDNS Recursor should
generate identical statistics. It would be nice if they used a de facto
standard like Prometheus/OpenMetrics format exposed on an HTTP endpoint
so that their metrics can be scraped and ingested by modern observability
stacks. Currently, Unbound requires deploying a third-party daemon [0]
alongside Unbound to convert the bespoke "UBCT1" protocol (the protocol
that unbound-control speaks to the Unbound daemon) in order to ingest
Unbound's metrics into a Prometheus-compatible stack.
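
To make the conversion step concrete, here is a minimal sketch of the
text-rewriting half of such a converter. It assumes you already have the
"name=value" lines that "unbound-control stats" prints; it does not speak
the control protocol itself. The function names and the "unbound_" prefix
are my own illustrative choices, not the naming used by [0].

```python
import re


def parse_number(text):
    """Return an int when possible, otherwise a float (may raise ValueError)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def stats_to_prometheus(stats_text, prefix="unbound_"):
    """Rewrite 'name=value' lines as Prometheus text exposition lines.

    The prefix and the name sanitizing rule are illustrative assumptions.
    """
    out = []
    for raw in stats_text.splitlines():
        name, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # not a 'name=value' line
        try:
            number = parse_number(value)
        except ValueError:
            continue  # skip values that are not numeric
        # Prometheus metric names may not contain dots, so map them to '_'.
        metric = prefix + re.sub(r"[^a-zA-Z0-9_]", "_", name)
        out.append(f"{metric} {number}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "total.num.queries=42\ntotal.num.cachehits=40\n"
    print(stats_to_prometheus(sample), end="")
```

If the server produced the second form natively on an HTTP endpoint, this
whole layer (and the extra daemon around it) would not be needed.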

There are some metrics that count the number of times certain kinds of
packets occur (queries/responses by QTYPE/OPCODE/RCODE, by transport,
etc.) where you can arguably find some level of commonality between
different DNS server implementations because they are just counting
objective, externally observable events. If you are restricting your
visibility to just these externally observable properties of DNS
transactions, then perhaps it might be possible to share glue code and
"statistics dashboards" between different DNS server implementations. But
this is a fairly basic level of visibility. DNS server implementations are
going to have diverse internal architectures and implementation details
and some level of visibility (or "observability") into the health of those
internal implementation details is highly desirable.

For instance, I care very much about why Unbound might have dropped a
query from a client. It's not very useful or actionable to have a single
"number of client queries that were dropped" metric in the DNS server that
aggregates every cause together. (All this tells me is that the query got
to the server and I can exclude external possibilities like socket receive
buffer overruns from the possible causes.) You need more fine-grained
metrics that let you track down what mechanism(s) resulted in the query
drops. So Unbound has been getting more fine-grained metrics like [1, 2]
that help explain exactly which implementation-specific mechanisms are
resulting in query drops.
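
As a rough illustration of the difference between one aggregate drop
counter and per-cause counters, here is a small sketch. The metric name
and the cause strings are hypothetical; they are not the names used by
Unbound or by [1, 2].

```python
from collections import Counter


class DropCounter:
    """Counts dropped queries per cause and renders Prometheus text format."""

    def __init__(self, name="dns_queries_dropped_total"):
        self.name = name  # hypothetical metric name
        self.by_cause = Counter()

    def record(self, cause):
        """Count one dropped query attributed to the given cause."""
        self.by_cause[cause] += 1

    def total(self):
        """The single aggregate number: says that drops happened, not why."""
        return sum(self.by_cause.values())

    def render(self):
        """One sample per cause, distinguished by a 'cause' label."""
        lines = [f"# TYPE {self.name} counter"]
        for cause in sorted(self.by_cause):
            count = self.by_cause[cause]
            lines.append(f'{self.name}{{cause="{cause}"}} {count}')
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    drops = DropCounter()
    # Hypothetical causes, for illustration only.
    for cause in ["ratelimit", "request_list_full", "ratelimit"]:
        drops.record(cause)
    print(drops.total())
    print(drops.render(), end="")
```

The aggregate can always be recovered by summing over the label, but the
per-cause breakdown cannot be recovered from the aggregate.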

It would be unreasonable to expect every implementation to have the same
metrics like the ones in [1, 2], because these are implementation-specific
details that are going to vary because different implementations take
different approaches to solving various problems. It would also be
unreasonable to just take a union of all such implementation-specific
metrics and add them to a single common format and just have
implementations omit the ones that aren't relevant to them. (Or, even
worse, have different implementations use the same metric names to mean
totally different things, or sort of similar but not really the same
things.)

So my recommendations are basically:

1) Don't innovate on the exposition format. DNS servers exist
in a universe with many other kinds of servers that have had to
deal with broadly similar issues and this is not a greenfield.
Prometheus/OpenTelemetry already exist. If you come up with a bespoke
XML, JSON, protobuf, etc. format, it will have to be converted to
something else in order to import it into modern observability stacks, so
just generate that format directly. (If you disagree, then by all means,
design an additional layer of internal abstraction and build a pluggable
module interface so you can support a multitude of different metrics
exposition formats/transports.)

2) Every vendor should come up with their own naming scheme, organization,
and definition of implementation-specific "health" metrics that fit their
own software most naturally.

3) There may be some value in regularizing across server implementations
the definitions of metrics that count externally visible properties of DNS
transactions (the QTYPEs and RCODEs, etc.). But this kind of effort should
be narrowly scoped to exclude the implementation-specific "health"
metrics.
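
A sketch of how narrowly (3) could be scoped: only counters keyed by
QTYPE or RCODE are mapped onto shared, labeled names, and everything else
stays in the implementation's own namespace as in (2). The key prefixes
follow the style of Unbound's extended statistics; the shared metric
names "dns_queries_total" and "dns_responses_total" are hypothetical.

```python
# Key prefix -> (hypothetical shared metric name, label name).
COMMON_PREFIXES = {
    "num.query.type.": ("dns_queries_total", "qtype"),
    "num.answer.rcode.": ("dns_responses_total", "rcode"),
}


def split_common(stats):
    """Split a stats dict into shared (labeled) and implementation-specific parts."""
    common, specific = {}, {}
    for key, value in stats.items():
        for prefix, (metric, label) in COMMON_PREFIXES.items():
            if key.startswith(prefix):
                # e.g. ("dns_responses_total", "rcode", "NOERROR")
                common[(metric, label, key[len(prefix):])] = value
                break
        else:
            # No shared definition applies: leave the vendor's name untouched.
            specific[key] = value
    return common, specific


if __name__ == "__main__":
    sample = {
        "num.query.type.A": 10,
        "num.answer.rcode.NOERROR": 9,
        "total.requestlist.overwritten": 1,
    }
    common, specific = split_common(sample)
    print(common)
    print(specific)
```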

[0]: https://github.com/letsencrypt/unbound_exporter

[1]: https://github.com/NLnetLabs/unbound/pull/1159

[2]: https://github.com/NLnetLabs/unbound/pull/1374

-- 
Robert Edmonds
edmonds at debian.org