Unbound 1.9.2 deadlock?

qyb qiuyingbo at gmail.com
Fri Mar 20 13:30:26 UTC 2020


Recently I got the same 100% CPU behavior as yours with unbound 1.10 and
1.9.6. Did you resolve the problem?


Patrik Lundin via Unbound-users <unbound-users at nlnetlabs.nl> 于2019年9月23日周一
下午5:12写道:

> Hello,
>
> Today I noticed a machine running Unbound 1.9.2 had gotten itself into
> an odd state.
>
> Running "unbound-control status" would just hang, and even after
> removing all client queries from the machine one of the CPUs in the
> server was still fully busy in 100% user time in top:
> ===
> %Cpu4  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,
> 0.0 st
> ===
>
> Keep in mind this is on a machine that except for some monitoring queries
> are
> seeing no traffic at this point:
> ===
> PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 5447 unbound   20   0   14.8g  14.2g   4320 S 100.0 22.7   3742:27 unbound
> ===
>
> strace was fairly silent, which i guess makes sense since all time is
> spent in user space according to top:
> ===
> # strace -f -p 5447
> strace: Process 5447 attached with 16 threads
> [pid  5489] epoll_wait(130,  <unfinished ...>
> [pid  5488] epoll_wait(133,  <unfinished ...>
> [pid  5487] epoll_wait(132,  <unfinished ...>
> [pid  5486] epoll_wait(143,  <unfinished ...>
> [pid  5485] epoll_wait(126,  <unfinished ...>
> [pid  5484] epoll_wait(127,  <unfinished ...>
> [pid  5483] epoll_wait(134,  <unfinished ...>
> [pid  5482] epoll_wait(121,  <unfinished ...>
> [pid  5481] epoll_wait(120,  <unfinished ...>
> [pid  5480] epoll_wait(115,  <unfinished ...>
> [pid  5479] epoll_wait(114,  <unfinished ...>
> [pid  5477] epoll_wait(105,  <unfinished ...>
> [pid  5476] epoll_wait(102,  <unfinished ...>
> [pid  5475] epoll_wait(104,  <unfinished ...>
> [pid  5447] read(70,
> ===
>
> I do find that read to be a bit suspicious because I am not seeing anything
> equivalent on another machine that is not having any problems, though it
> is not
> clear to me what it is communicating with:
> ===
> # lsof -n -P -p 5447
> [...]
> unbound 5447 unbound   70u     unix 0xffff9090b9832400      0t0    20234
> socket
> [...]
> ===
>
> It appears being unbound talking to itself:
> ===
> # ss -xp | grep 20234
> u_str  ESTAB      0      0       * 20235                 * 20234
>        users:(("unbound",pid=5447,fd=71))
> u_str  ESTAB      0      0       * 20234                 * 20235
>        users:(("unbound",pid=5447,fd=70))
> ===
>
> Too me this feels like some race condition might have caused something to
> spin
> uncontrollably in the process.
>
> Looking at the process with GDB I can see the following for that read()'ing
> thread (thread apply all bt):
> ===
> Thread 1 (Thread 0x7fa25fada840 (LWP 5447)):
> #0  0x00007fa25e87a6fd in read () from /lib64/libpthread.so.0
> #1  0x0000560e3305f285 in tube_read_msg ()
> #2  0x0000560e33014d81 in server_stats_obtain ()
> #3  0x0000560e3300f794 in do_stats.isra.12 ()
> #4  0x0000560e33011d18 in execute_cmd ()
> #5  0x0000560e33013fb8 in handle_req.isra.18 ()
> #6  0x0000560e330140d0 in remote_control_callback ()
> #7  0x00007fa25f423a14 in event_base_loop () from /lib64/libevent-2.0.so.5
> #8  0x0000560e330b140c in comm_base_dispatch ()
> #9  0x0000560e3300d8c1 in daemon_fork ()
> #10 0x0000560e3300942f in main ()
> ===
>
> All remaining threads are sitting in epoll_wait except for this one:
> ===
> Thread 13 (Thread 0x7fa25ad46700 (LWP 5478)):
> #0  0x00007fa25f4219e2 in event_queue_insert () from
> /lib64/libevent-2.0.so.5
> #1  0x00007fa25f421c24 in event_active_nolock () from
> /lib64/libevent-2.0.so.5
> #2  0x00007fa25f424426 in event_base_loop () from /lib64/libevent-2.0.so.5
> #3  0x0000560e330b140c in comm_base_dispatch ()
> #4  0x0000560e3300cf6f in thread_start ()
> #5  0x00007fa25e873dd5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007fa25e59d02d in clone () from /lib64/libc.so.6
> ===
>
> Though if this is related I dont know at this point.
>
> Anyone else seeing something like this? Any guesses what might be wrong?
>
> ==
> Patrik Lundin
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.nlnetlabs.nl/pipermail/unbound-users/attachments/20200320/5da8d685/attachment.htm>


More information about the Unbound-users mailing list