Issues setting up dnstap logging

Patrik Lundin patrik at sigterm.se
Sun Dec 27 18:09:08 UTC 2020


Hello,

I am curious if anyone is running a successful dnstap logging setup for
unbound and what tools you are using for solving this.

Currently I am playing around with this on a debian buster machine with
backports enabled so I can install unbound 1.12. My goal is to have binary
dnstap files being created on disk, with logfile rotation over some interval so
you can clean up older files over time.

For my initial attempt I tried using fstrm_capture since it is already
available from the "fstrm-bin" package in buster and also supports logfile
rotation. I have noticed however that fstrm_capture will often crash when
stopping unbound.

Based on the man page example I start the process like so:
```
sudo -u unbound /usr/bin/fstrm_capture -ddddd -t protobuf:dnstap.Dnstap \
                 -u /var/lib/unbound/dnstap.sock \
                 -w /var/log/dnstap/dnstap-%F-%T.fstrm \
                 -s 3600 --gmtime
```

This is coupled with the following dnstap-related config for unbound:
```
dnstap:
    dnstap-enable: yes
    dnstap-socket-path: /var/lib/unbound/dnstap.sock
    dnstap-log-client-query-messages: yes
    dnstap-log-client-response-messages: yes
```

Given that unbound is not running at the startup of fstrm_capture the following
is logged:
```
# ./run-fstrm_capture.sh
fstrm_capture: opening Unix socket path /var/lib/unbound/dnstap.sock
fstrm_capture: opened output file /var/log/dnstap/dnstap-2020-12-27-17:05:32.fstrm
```

When starting unbound the following output is observed:
```
fstrm_capture: accepted new connection fd 8
fstrm_capture: connection fd 8: reading control frame (42 bytes): "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_READY (4)
fstrm_capture: connection fd 8: CONTENT_TYPE [1/1] (22 bytes): "protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_ACCEPT (1)
fstrm_capture: connection fd 8: writing frame (42) bytes: "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: reading control frame (42 bytes): "\x00\x00\x00\x00\x00\x00\x00\"\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x16protobuf:dnstap.Dnstap"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_START (2)
```

Looks good so far, and querying the server for nlnetlabs.nl will result in the
following output:
```
fstrm_capture: connection fd 8: processing data frame (90 bytes)
fstrm_capture: connection fd 8: processing data frame (94 bytes)
```

The problem I have seen here is if I stop (or restart) unbound, the
fstrm_capture process will often die, outputting the following before exiting:
```
fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"
```

I thought this was always the case, but while collecting output for this email
I did see an instance where it did not die, logging some more and remaining running:
```
fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"
fstrm_capture: connection fd 8: closing (read 6 frames, 552 bytes)
```

I was confused by the exiting beaviour at first, it looks like the process just
disappears for no obvious reason but after attaching gdb I saw the following:
````
fstrm_capture: connection fd 8: reading control frame (12 bytes): "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03"
fstrm_capture: connection fd 8: received FSTRM_CONTROL_STOP (3)
fstrm_capture: connection fd 8: sending FSTRM_CONTROL_FINISH (5)
fstrm_capture: connection fd 8: writing frame (12) bytes: "\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x05"

Program received signal SIGPIPE, Broken pipe.
0x00007ffff7c74504 in __GI___writev (fd=8, iov=0x7fffffffe010, iovcnt=1) at ../sysdeps/unix/sysv/linux/writev.c:26
26      ../sysdeps/unix/sysv/linux/writev.c: Permission denied.
```

So it appears that when fstrm_capture attempts to write the FSTRM_CONTROL_FINISH frame, unbound will
often have already gone away, leading to an unhandled SIGPIPE which probably
defaults to terminating the process, based on signal(7):
```
Signal     Value     Action   Comment
──────────────────────────────────────────────────────────────────────
[...]
SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                     readers; see pipe(7)
```

My assumption that the signal is unhandled is based on setup_event_pool() at
https://github.com/farsightsec/fstrm/blob/master/src/fstrm_capture.c#L1147 having
explicit handlers for SIGHUP and SIGUSR1, but no mention of SIGPIPE.

Looking at an alternative reader I also tried the golang based dnstap
application from https://github.com/dnstap/golang-dnstap, running it like so:
```
sudo -u unbound /usr/local/bin/dnstap \
                 -u /var/lib/unbound/dnstap.sock \
                 -w /var/log/dnstap/golang-dnstap.fstrm
```

It does indeed also see SIGPIPEs but manages to survive them. Starting it:
```
# ./run-dnstap.sh
dnstap: opened input socket /var/lib/unbound/dnstap.sock
```

... starting unbound:
```
2020/12/27 18:42:12 /var/lib/unbound/dnstap.sock: accepted connection 1
```

... stopping unbound:
```
2020/12/27 18:43:08 FrameStreamInput: Read error: write unix /var/lib/unbound/dnstap.sock->@: write: broken pipe
2020/12/27 18:43:08 /var/lib/unbound/dnstap.sock: closed connection 1
```

... starting it again:
```
2020/12/27 18:43:40 /var/lib/unbound/dnstap.sock: accepted connection 2
```

... and restarting unbound again for the combined experience:
```
2020/12/27 18:44:06 FrameStreamInput: Read error: write unix /var/lib/unbound/dnstap.sock->@: write: broken pipe
2020/12/27 18:44:06 /var/lib/unbound/dnstap.sock: closed connection 2
2020/12/27 18:44:06 /var/lib/unbound/dnstap.sock: accepted connection 3
```

So it appears more resilient, but unfortunately it has no native support for
rotating the output files, and I am not sure of a proper way of doing this while
not missing queries.

For this reason I am wondering if someone out there have already solved these
issues and have some pointers how to properly set this up. I guess it is also
possible that the SIGPIPE problem indicates a shortcoming either in
fstrm_capture or unbounds shutdown handling, but I am not sure who if anyone is
at fault there.

Regards,
Patrik Lundin


More information about the Unbound-users mailing list