Unbound memory resource consumption?
Yuri
yvoinov at gmail.com
Thu Mar 13 17:09:50 UTC 2025
I suspect there may be a leak in the DoH library, not in unbound itself.
I have had it running for years of uptime without any sign of a leak.
However, I have previously seen a leak in one setup that used a
third-party DoH library.
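
For the jemalloc pointer mentioned below: one common way to try an
alternative allocator without rebuilding unbound is to preload it and
enable jemalloc's leak profiling. This is a sketch only; the library
path, and whether the installed jemalloc was built with profiling
support (--enable-prof), are assumptions that will vary per system:

```shell
# Hypothetical paths -- on NetBSD/pkgsrc jemalloc may live under /usr/pkg/lib.
export LD_PRELOAD=/usr/pkg/lib/libjemalloc.so
# Leak profiling only works if jemalloc was built with --enable-prof.
export MALLOC_CONF="prof:true,prof_leak:true,lg_prof_sample:19"
# Run unbound in the foreground so the profile dump is easy to collect.
unbound -d -c /usr/pkg/etc/unbound/unbound.conf
```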
13.03.2025 21:50, Havard Eidnes via Unbound-users wrote:
> Hi,
>
> and thanks for the feedback, the general advice, and the pointer to
> jemalloc. I may look into that a bit later.
>
>
> However, in the meantime I have come to the conclusion that there
> may be a correlation between me enabling DoH and DoT and using RFC
> 9462 to direct clients which probe for _dns.resolver.arpa to use the
> DoH and/or DoT endpoints on the one hand, and on the other hand what
> really does look like a massive memory leak in unbound. If that is
> true, which malloc() you use should not make much of a difference.
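
For context, the RFC 9462 steering described above is done by serving
SVCB records under _dns.resolver.arpa. A hedged sketch of what such a
zone can look like — the target name and dohpath here are illustrative
assumptions, not the poster's actual pz/resolver.arpa contents:

```
; Illustrative only -- not the actual pz/resolver.arpa zone.
_dns.resolver.arpa.  IN SVCB 1 dns.example.net. alpn=h2 dohpath=/dns-query{?dns}
_dns.resolver.arpa.  IN SVCB 2 dns.example.net. alpn=dot port=853
```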
>
>
> To test this hypothesis, I turned off DoH and DoT (diff to config
> attached below, it was only turned on about last month), and also
> stopped serving resolver.arpa, and then restarted unbound. Here are
> a few "top" displays taken over the span of a few hours. First
> after this config change:
>
> load averages: 0.26, 0.20, 0.25; up 6+00:57:31 14:24:00
> 79 processes: 76 sleeping, 1 stopped, 2 on CPU
> CPU states: 4.5% user, 0.0% nice, 2.2% system, 1.0% interrupt, 92.2% idle
> Memory: 2702M Act, 7948K Inact, 17M Wired, 27M Exec, 2367M File, 17G Free
> Swap: 14G Total, 32M Used, 14G Free / Pools: 3149M Used / Network: 1574K In, 16
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14982 unbound 43 0 398M 268M CPU/2 6:55 30.22% 30.22% unbound
>
>
> load averages: 0.13, 0.17, 0.21; up 6+01:49:28 15:15:57
> 79 processes: 77 sleeping, 1 stopped, 1 on CPU
> CPU states: 2.8% user, 0.0% nice, 2.0% system, 0.6% interrupt, 94.5% idle
> Memory: 2847M Act, 11M Inact, 17M Wired, 27M Exec, 2367M File, 17G Free
> Swap: 14G Total, 32M Used, 14G Free / Pools: 3149M Used / Network: 1234K In, 13
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14982 unbound 85 0 544M 417M kqueue/2 18:13 38.23% 38.23% unbound
>
>
> load averages: 0.22, 0.11, 0.10; up 6+03:55:58 17:22:27
> 90 processes: 87 sleeping, 1 stopped, 2 on CPU
> CPU states: 1.2% user, 0.0% nice, 1.1% system, 0.2% interrupt, 97.3% idle
> Memory: 3040M Act, 18M Inact, 17M Wired, 27M Exec, 2367M File, 17G Free
> Swap: 14G Total, 32M Used, 14G Free / Pools: 3149M Used / Network: 648K In, 700
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14982 unbound 43 0 738M 604M CPU/2 38:45 3.61% 3.61% unbound
>
>
> If we compare this to what I experienced with these options turned
> on and a number of DoH / DoT clients using those endpoints, quoting
> from yesterday's e-mail:
>
> load averages: 0.86, 0.94, 0.92; up 5+00:58:04 14:24:33
> 86 processes: 83 sleeping, 1 stopped, 2 on CPU
> CPU states: 14.8% user, 0.0% nice, 1.3% system, 0.8% interrupt, 83.0% idle
> Memory: 3035M Act, 68M Inact, 17M Wired, 21M Exec, 14M File, 17G Free
> Swap: 14G Total, 38M Used, 14G Free / Pools: 2885M Used / Network: 1322K In, 1906K Out
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14678 unbound 40 0 5408M 3033M CPU/2 183:17 78.47% 78.47% unbound
>
>
> load averages: 0.52, 0.53, 0.52; up 5+02:22:23 15:48:52
> 85 processes: 82 sleeping, 1 stopped, 2 on CPU
> CPU states: 11.4% user, 0.0% nice, 1.8% system, 1.0% interrupt, 85.7% idle
> Memory: 3815M Act, 81M Inact, 17M Wired, 21M Exec, 14M File, 16G Free
> Swap: 14G Total, 38M Used, 14G Free / Pools: 2885M Used / Network: 1509K In, 19
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14678 unbound 84 0 6863M 3825M kqueue/0 236:12 39.55% 39.55% unbound
>
>
> load averages: 0.19, 0.35, 0.41; up 5+04:50:24 18:16:53
> 85 processes: 1 runnable, 82 sleeping, 1 stopped, 1 on CPU
> CPU states: 11.3% user, 0.0% nice, 1.2% system, 0.0% interrupt, 87.4% idle
> Memory: 5085M Act, 99M Inact, 17M Wired, 21M Exec, 14M File, 15G Free
> Swap: 14G Total, 38M Used, 14G Free / Pools: 2886M Used / Network: 79G In, 107G
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 14678 unbound 85 0 9358M 5118M RUN/1 319:53 29.30% 29.30% unbound
>
>
> You'll notice the difference is quite stark.
>
> Not only is the CPU time much lower (OK, crypto costs, I guess), but
> also the trajectory of the virtual size is vastly different:
>
> 5408M -> 6863M (1:24h later) -> 9358M (3:52h after 0th measurement)
>
> compared to
>
> 398M -> 544M (51m later) -> 738M (2:58h after 0th measurement)
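
In MB per hour, the two trajectories above differ by roughly a factor
of nine; a quick check, with the quoted elapsed times converted to
hours (3:52h and 2:58h):

```shell
# Growth rate of unbound's virtual size, in MB/h, from the samples above.
rate() {
    awk -v s0="$1" -v sn="$2" -v h="$3" 'BEGIN { printf "%.0f\n", (sn - s0) / h }'
}
rate 5408 9358 3.8667   # DoH/DoT on:  3:52h elapsed -> ~1022 MB/h
rate 398  738  2.9667   # DoH/DoT off: 2:58h elapsed -> ~115 MB/h
```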
>
>
> And according to "unbound-control stats" the query rate is
> comparable to what it was yesterday.
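
Beyond the query rate, "unbound-control stats" also reports internal
memory accounting; sampling the mem.* counters over time can show
whether the growth sits in the caches or elsewhere (DoH-capable builds
additionally report mem.http.query_buffer and
mem.http.response_buffer). A sketch of such a sampling loop, assuming
remote control is already configured:

```shell
# Sample unbound's memory counters every 10 minutes with a timestamp;
# stats_noreset leaves the statistics intact for other consumers.
while true; do
    date
    unbound-control stats_noreset | grep '^mem\.'
    sleep 600
done
```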
>
>
> So I suspect there is a serious memory leak, possibly in unbound,
> related to the code which does DoH and/or DoT handling.
>
>
> Diff to our unbound.conf compared to yesterday attached.
>
>
> Regards,
>
> - Håvard
>
> rcsdiff -u unbound.conf
> ===================================================================
> RCS file: RCS/unbound.conf,v
> retrieving revision 1.9
> diff -u -r1.9 unbound.conf
> --- unbound.conf 2025/03/03 16:25:44 1.9
> +++ unbound.conf 2025/03/13 12:53:24
> @@ -12,27 +12,27 @@
> # 853 = DNS-over-TLS
> # 443 = DNS-over-HTTPS
> interface: 158.38.0.2
> - interface: 158.38.0.2 at 443
> - interface: 158.38.0.2 at 853
> +# interface: 158.38.0.2 at 443
> +# interface: 158.38.0.2 at 853
> interface: 2001:700:0:ff00::2
> - interface: 2001:700:0:ff00::2 at 443
> - interface: 2001:700:0:ff00::2 at 853
> +# interface: 2001:700:0:ff00::2 at 443
> +# interface: 2001:700:0:ff00::2 at 853
> interface: 158.38.0.169
> - interface: 158.38.0.169 at 443
> - interface: 158.38.0.169 at 853
> +# interface: 158.38.0.169 at 443
> +# interface: 158.38.0.169 at 853
> interface: 2001:700:0:503::c253
> - interface: 2001:700:0:503::c253 at 443
> - interface: 2001:700:0:503::c253 at 853
> +# interface: 2001:700:0:503::c253 at 443
> +# interface: 2001:700:0:503::c253 at 853
> interface: 127.0.0.1
> interface: ::1
>
> # TLS key and certificate
> - tls-service-key: "/usr/pkg/etc/unbound/dns-resolver2-key.pem"
> - tls-service-pem: "/usr/pkg/etc/unbound/dns-resolver2-cert.pem"
> - tls-cert-bundle: "/etc/openssl/certs/ca-certificates.crt"
> +# tls-service-key: "/usr/pkg/etc/unbound/dns-resolver2-key.pem"
> +# tls-service-pem: "/usr/pkg/etc/unbound/dns-resolver2-cert.pem"
> +# tls-cert-bundle: "/etc/openssl/certs/ca-certificates.crt"
>
> # Enable DNS-over-HTTPS (doh):
> - https-port: 443
> +# https-port: 443
>
> # These need tuning away from defaults;
> # the defaults are not suitable for TCP-heavy workloads:
> @@ -988,9 +988,9 @@
> # for-upstream: yes
> # zonefile: "example.org.zone"
>
> - auth-zone:
> - name: resolver.arpa
> - zonefile: "pz/resolver.arpa"
> +# auth-zone:
> +# name: resolver.arpa
> +# zonefile: "pz/resolver.arpa"
>
> # Views
> # Create named views. Name must be unique. Map views to requests using