Monitoring Unbound

Wed Nov 23 19:55:43 UTC 2016

On 23 Nov 2016, at 0:49, Jaap Akkerhuis via Unbound-users wrote:

>  Alexander via Unbound-users writes:
>
>>      Hi everyone, can you help me monitor Unbound DNS with Cacti?
>> I tried to set up Unbound and Cacti, but the graphs are empty. I
>> installed Dmitriy Demidov's package.
>
> Once I set-up cacti to do this, but I'm not really happy with that.
>
>>      Can you tell me other tools for monitoring DNS queues? Some
>> tips for monitoring DNS?
>
> I really prefer using munin. See the user contributed directory.

[snip]

I know it’s not a direct answer to the first part of the original 
question, but perhaps it does answer the second part about monitoring 
queues.  We’ve recently written an exporter that feeds Unbound resolver 
statistics into Prometheus, and it seems to work quite well. We then use 
Grafana to query and visualize the data held in Prometheus. Building 
charts is quite easy once you get the hang of the query language, and it 
allows on-the-fly redrawing of visualizations and complex 
comparisons/aggregations if you have multiple servers, locations, or 
services. The attached chart is an example that took about 30 seconds to 
build.

There are also components for Prometheus and/or Grafana that can 
generate alerts based on metrics, going beyond visualization, but that 
is perhaps outside the scope of this thread. There are a number of tools 
for importing other system-level data into Prometheus, and it may be a 
good idea to investigate those to complement or replace your existing 
monitoring systems if they do what you need.

It is not trivial to learn: the query language is mostly unlike SQL, and 
there are quite a few ways for what seem to be legitimate queries to 
fail silently. But if you know the ground truth of one system, you can 
iterate on your graphs until they match it, and that way figure out the 
right way to write the query.

If there is interest, we can try to work on getting the exporter we 
wrote into a condition where it could be provided in the contrib 
directory. It uses the “push gateway” method, which is not ideal but 
does work well enough. (Note: “Prometheus Unbound” is also a lyrical 
drama by Percy Bysshe Shelley, which makes keyword searching for prior 
work on this a bit difficult, so apologies if someone has already done 
this project.  :-)
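
For anyone curious what such an exporter roughly involves, here is a 
minimal sketch in Python. It is not our actual exporter. It assumes the 
statistics arrive as the "key=value" lines that "unbound-control stats" 
prints, and that metric names are formed by prefixing "unbound_" and 
replacing dots with underscores (which is how a name like 
unbound_num_query_type_A would come about). The function names, the 
gateway address, and the job name are made up for illustration.

```python
import urllib.request


def stats_to_exposition(stats_text, prefix="unbound_"):
    """Turn key=value statistics lines into Prometheus text-format lines."""
    lines = []
    for raw in stats_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        value = value.strip()
        if not sep:
            continue  # no "=" on this line: blank or malformed, skip it
        try:
            float(value)  # only keep values that are numeric
        except ValueError:
            continue
        name = prefix + key.strip().replace(".", "_").replace("-", "_")
        lines.append(f"{name} {value}\n")
    return "".join(lines)


def push(body, gateway="http://localhost:9091", job="unbound"):
    """PUT the metrics to a Pushgateway (address and job are assumptions)."""
    req = urllib.request.Request(
        f"{gateway}/metrics/job/{job}", data=body.encode(), method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical sample input; real output has many more counters.
sample = "total.num.queries=1200\nnum.query.type.A=800\nnum.query.type.AAAA=300\n"
print(stats_to_exposition(sample), end="")
```

In practice something like this would run from cron or a small loop on 
each resolver, calling push() with the converted text.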

Prometheus overview:

To give an example of how a graph is built, this is the simplest query 
that I used: it produces the “A” QTYPE line in the chart. I just 
cut/pasted it into a number of other queries in the same graph to create 
the other lines, replacing “A” with “AAAA”, “MX”, etc.  The “sum” 
function aggregates across all of the Unbound servers I am running (I 
have many), and the “irate” function computes a per-second rate of 
change from the two most recent samples inside the 1-minute window.

sum(irate(unbound_num_query_type_A[1m]))
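
To make the arithmetic behind that query concrete, here is a small 
Python sketch of what irate followed by sum works out to. It is an 
illustration only, not how Prometheus is implemented; the server names 
and sample values are invented, and the samples are assumed to be 
(timestamp in seconds, counter value) pairs already limited to the 
1-minute window.

```python
def irate(samples):
    """Per-second rate from the last two samples, oldest sample first."""
    if len(samples) < 2:
        return None  # one sample is not enough to compute a rate
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        return None
    # A counter that went down is treated as a reset (e.g. a restart):
    # the new value itself is taken as the increase.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


# Hypothetical counters for the "A" query type on two resolvers.
servers = {
    "resolver-a": [(0, 1000), (15, 1150), (30, 1300), (45, 1480)],
    "resolver-b": [(0, 500), (15, 560), (30, 620), (45, 650)],
}

rates = {name: irate(samples) for name, samples in servers.items()}
total = sum(rate for rate in rates.values() if rate is not None)
print(rates, total)
```

Only the last two points of each series matter here, which is why irate 
reacts quickly to spikes but produces a noisier line than an average 
over the whole window would.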

I then specified that this is a stacked chart, percentage-measured, with 
60% as the lower bound.  I could command-click any of the labels shown 
and they would disappear from the graph, which would be re-drawn 
instantly without that statistic. Alternatively, I could click on just 
one of the labels and only that graph line would be shown, again 
re-drawn instantly.

A more complex query, limiting to systems that are tagged with 
“prod” (vs. “dev”) and limiting to specific POPs, is shown below.
The “env” and “loc” tags (Prometheus calls them labels) are made up by 
us, and the contents of those tags are set on the remote server before 
the metrics are collected.  This allows arbitrary tagging of each metric 
so that it is possible to filter (think of it as a modified “SELECT 
WHERE” statement).  The $POP variable (again an arbitrary name created 
by us) is filled in by Grafana using a concept called “templates”, which 
puts a pull-down list at the top of the graph page with a list of all of 
the POPs we have.  I can then select one OR MORE POPs and the system 
will automatically aggregate the data across all the matching metrics 
and display it. I could put other filters in here, and they too would be 
evaluated at the moment the graph is drawn.

sum(irate(unbound_num_query_type_A{env="prod",loc=~"$POP"}[1m]))
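
As a rough model of what the filter part of that query does, the Python 
sketch below applies an exact match on “env” and a regular-expression 
match on “loc” to a handful of invented series, then sums what is left. 
The POP names and rates are hypothetical, and the way the selected POPs 
are joined into one pattern is only my approximation of what Grafana 
substitutes for $POP. Prometheus also uses a different regex engine than 
Python, so treat this as an analogy rather than the real thing.

```python
import re


def matches(labels, equal=None, regex=None):
    """True if labels satisfy every exact (=) and regex (=~) matcher."""
    for name, want in (equal or {}).items():
        if labels.get(name, "") != want:
            return False
    for name, pattern in (regex or {}).items():
        # Regex matchers must match the whole label value, not a substring.
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


# Hypothetical per-series rates, as if irate had already been applied.
series = [
    ({"env": "prod", "loc": "ams"}, 12.0),
    ({"env": "prod", "loc": "fra"}, 2.0),
    ({"env": "dev", "loc": "ams"}, 5.0),
    ({"env": "prod", "loc": "sjc"}, 7.0),
]

# Two POPs picked in the pull-down, joined into one alternation.
selected_pops = ["ams", "fra"]
pop_regex = "|".join(re.escape(pop) for pop in selected_pops)

prod_total = sum(
    value
    for labels, value in series
    if matches(labels, equal={"env": "prod"}, regex={"loc": pop_regex})
)
print(pop_regex, prod_total)
```

The “dev” series and the unselected POP drop out before the sum, which 
is the whole point of tagging the metrics at the source.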

In summary: Once you start putting your monitoring data into a TSDB or 
TSDB-ish system like Prometheus (or InfluxDB, or OpenTSDB) and creating 
visualizations with Grafana, you will wonder how you possibly survived 
without it.  Even just using the most basic features is a huge win over 
older systems, in my opinion, and moving up into the automation methods 
and alerting methods as you get more experience is another win. If 
you’re looking for a short intro to Prometheus, see the following 
presentation from Monitorama 2015 by Jamie Wilkinson.

Video: https://vimeo.com/131581353
Slides: 
https://docs.google.com/presentation/d/1X1rKozAUuF2MVc1YXElFWq9wkcWv3Axdldl8LOH9Vik/edit#slide=id.ga150a40c0_0_193

If you’re looking for an introduction to Grafana, there are many - 
Google will be a better guide than I.

JT

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-23 at 10.28.43 AM.png
Type: image/png
Size: 76669 bytes
Desc: not available
URL: <http://lists.nlnetlabs.nl/pipermail/unbound-users/attachments/20161123/ae36986c/attachment.png>