Monitoring Unbound
John Todd
jtodd at loligo.com
Wed Nov 23 19:55:43 UTC 2016
On 23 Nov 2016, at 0:49, Jaap Akkerhuis via Unbound-users wrote:
> Alexander via Unbound-users writes:
>
>> Hi to every one, can you help to monitor unbound dns with cacti?
>> I'm tried to set up unbound and cacti, but the graphs are empty. I'm
>> installed Dmitriy Demidov package.
>
> Once I set-up cacti to do this, but I'm not really happy with that.
>
>> Can you tell me others tools for monitoring dns queues? Some
>> tips
>> for monitoring DNS?
>
> I really prefer using munin. See the user contributed directory.
[snip]
I know it’s not a direct answer to the top part of the original
question, but perhaps it does answer the second part about monitoring
queues. We’ve recently created an exporter for Unbound resolver for
importation into Prometheus, which seems to work quite well. We then use
Grafana to extract and visualize information from Prometheus. Building
charts once you get the hang of the query language is quite easy, and
allows on-the-fly regeneration of data visualization and complex
comparisons/aggregations if you have multiple servers, locations, or
services. Here is an example chart that took about 30 seconds to build.
There are also components for Prometheus and Grafana that can generate
alerts from the same metrics, going beyond visualization, but that is
perhaps outside the scope of this thread. There are a number of tools
for importing other system-level data into Prometheus, and it may be a
good idea to investigate those to complement or replace your existing
monitoring systems if they do what you need. It is not trivial to learn:
the query language is mostly unlike SQL, and there are quite a few ways
for a legitimate-looking query to fail silently. But if you know the
ground truth for one system, you can iterate on graphs until they match
it and you have figured out the right way to do it.
If there is interest, we can try to work on getting the exporter we
wrote into a condition where it could be provided in the contrib
directory. It uses the “push gateway” method, which is not ideal but
does work well enough. (Note: “Prometheus Unbound” is also a lyrical
drama by Percy Bysshe Shelley, which makes keyword searching for prior
work on this a bit difficult, so apologies if someone has already done
this project. :-)
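For readers wondering what such an exporter has to do, here is a minimal
Python sketch. It is not the exporter described above; it only assumes
that the input is the "key=value" text printed by "unbound-control
stats" and that metric names are formed by prefixing "unbound_" and
replacing dots with underscores (which is consistent with the
unbound_num_query_type_A name used in the queries in this message). The
function name, the sample numbers, and the "ams" location value are made
up for illustration.

```python
import re


def stats_to_exposition(stats_text, labels=None):
    """Turn `key=value` lines into Prometheus text-format samples.

    Assumption: input looks like `unbound-control stats` output,
    e.g. `num.query.type.A=1200`. Non-numeric lines are skipped.
    """
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"

    lines = []
    for raw in stats_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        value = value.strip()
        try:
            float(value)  # only keep numeric statistics
        except ValueError:
            continue
        # Metric names may not contain dots, so map them to underscores.
        name = "unbound_" + re.sub(r"[^a-zA-Z0-9_]", "_", key)
        lines.append(f"{name}{label_str} {value}")
    return "".join(line + "\n" for line in lines)


# Hypothetical sample input; real output has many more keys.
sample = "total.num.queries=1500\nnum.query.type.A=1200\nnum.query.type.AAAA=250\n"
print(stats_to_exposition(sample, {"env": "prod", "loc": "ams"}), end="")
```

The resulting text is the kind of body a Pushgateway accepts over HTTP
at a path of the form /metrics/job/<jobname>; the actual push is left
out here so the sketch runs without a network.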
Prometheus overview:
To give an example of how a graph is built, this is the simplest query
that I used to generate the “A” QTYPE line of the chart. I just
cut/pasted it into a number of other queries in the same graph to create
the other lines, replacing “A” with “AAAA”, “MX”, etc. The “irate”
function computes a per-second rate of change for each server from the
two most recent samples inside a 1-minute window, and the “sum”
operator then aggregates that rate across all of the Unbound servers I
am running (I have many).
sum(irate(unbound_num_query_type_A[1m]))
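To make the arithmetic behind that query concrete, here is a small
Python sketch of what it computes. This is an illustration, not
Prometheus code: it assumes each server contributes a list of
(timestamp, counter value) samples that already fall inside the window,
and the server names and numbers are invented.

```python
def irate(samples):
    """Per-second rate from the last two samples of one counter series.

    `samples` is a list of (timestamp_seconds, counter_value) tuples,
    oldest first, already restricted to the lookback window.
    """
    if len(samples) < 2:
        return None  # not enough data in the window
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A counter that went down is treated as having restarted from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


def sum_irate(series_by_server):
    """Add up the per-server rates, skipping servers without enough data."""
    rates = [irate(s) for s in series_by_server.values()]
    rates = [r for r in rates if r is not None]
    return sum(rates) if rates else None


# Hypothetical samples for two servers, scraped every 15 seconds.
window = {
    "resolver-a": [(0, 1000), (15, 1150), (30, 1450), (45, 1600)],
    "resolver-b": [(30, 500), (45, 800)],
}
print(sum_irate(window))  # (1600-1450)/15 + (800-500)/15 = 30.0
```

Only the last two samples of each series matter, which is why the result
reacts quickly to changes but can look spiky on a graph.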
I then specified that this is a stacked chart, percentage-measured, with
60% as the lower bound. I could command-click any of the labels shown
and they would disappear from the graph, which would be re-drawn without
that statistic instantly. Alternatively, I could click on just one of
the labels and only that graph line would be shown, again re-drawn
instantly.
A more complex query, limited to systems tagged “prod” (vs. “dev”) and
to specific POPs, is shown below. The “env” and “loc” tags are made up
by us, and their contents are set on the remote server before the
metrics are collected. This allows arbitrary tagging of each metric so
that it is possible to filter on it (think of it as a modified “SELECT
WHERE” statement). The $POP variable (again an arbitrary name created
by us) is consumed by Grafana using a concept called “templates”, which
puts a pull-down list of all of our POPs at the top of the graph page. I
can then select one OR MORE POPs and the system will automatically
aggregate the data across all of the matching metrics and display it. I
could put other filters in here that would be parsed at the moment the
graph is drawn.
sum(irate(unbound_num_query_type_A{env="prod",loc=~"$POP"}[1m]))
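The filtering in that query can be sketched in Python as follows. This
is only an illustration of the matching rules, under two assumptions:
that the selected POPs end up in the query as a regex alternation such
as (ams|lhr), and that a =~ matcher has to match the whole label value
(Prometheus anchors its regex matchers). The function names, POP names,
and rates are hypothetical.

```python
import re


def expand_pop(selected):
    """Build a regex alternation from the POPs picked in the pull-down."""
    return "(" + "|".join(re.escape(p) for p in selected) + ")"


def matches(labels, env, loc_regex):
    """env must be equal; loc must match the regex in full (anchored)."""
    if labels.get("env") != env:
        return False
    return re.fullmatch(loc_regex, labels.get("loc", "")) is not None


def filtered_sum(rates, env, selected_pops):
    """Sum the per-series rates whose labels pass both matchers."""
    loc_regex = expand_pop(selected_pops)
    return sum(rate for labels, rate in rates if matches(labels, env, loc_regex))


# Hypothetical per-series rates (queries per second) with their labels.
rates = [
    ({"env": "prod", "loc": "ams"}, 10.0),
    ({"env": "prod", "loc": "lhr"}, 20.0),
    ({"env": "dev", "loc": "ams"}, 5.0),
    ({"env": "prod", "loc": "sjc"}, 7.0),
]
print(filtered_sum(rates, "prod", ["ams", "lhr"]))  # 10.0 + 20.0 = 30.0
```

Because the match is anchored, selecting "ams" would not accidentally
include a POP tagged "ams2".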
In summary: Once you start putting your monitoring data into a TSDB or
TSDB-ish system like Prometheus (or InfluxDB, or OpenTSDB) and creating
visualizations with Grafana, you will wonder how you possibly survived
without it. Even just using the most basic features is a huge win over
older systems, in my opinion, and moving up into the automation methods
and alerting methods as you get more experience is another win. If
you’re looking for a short intro to Prometheus, see the following
presentation from Monitorama 2015 by Jamie Wilkinson.
Video: https://vimeo.com/131581353
Slides:
https://docs.google.com/presentation/d/1X1rKozAUuF2MVc1YXElFWq9wkcWv3Axdldl8LOH9Vik/edit#slide=id.ga150a40c0_0_193
If you’re looking for an introduction to Grafana, there are many -
Google will be a better guide than I.
JT
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-23 at 10.28.43 AM.png
Type: image/png
Size: 76669 bytes
Desc: not available
URL: <http://lists.nlnetlabs.nl/pipermail/unbound-users/attachments/20161123/ae36986c/attachment.png>