[nsd-users] NSD as slave leaves a child process behind
Matthijs Mekking
matthijs at NLnetLabs.nl
Tue Sep 15 10:03:07 UTC 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Ville,
It may look like a child process is not killed after a reload, but isn't
the process you are referring to (20933), the xfrd process? By default,
NSD has three processes: a parent process, a child process (answering
queries) and a xfrd process. During a NSD reload, the xfrd process
should not be killed, merely the communication channels with the parent
are updated.
Of course, if you run nsdc stop, no processes should be kept alive. Is
this the case?
What does nsdc stop output?
Notice that nsdc stop && nsdc start differs from nsdc restart. The
latter attempts to start nsd after all processes are shut down.
Best regards,
Matthijs Mekking
NLnet Labs
Ville Mattila wrote:
> Hi,
>
> We've run into problems with NSD v3.2.3 (on Red Hate Enterprise Linux
> 5.4 x86_64) failing to kill one of it's children processes while NSD
> is reloading after it has received an update to a zone from master.
> Everything seems to running fine, but 'nsdc stop' and 'nsdc patch'
> etc. really don't work because the reload updates NSD the pid file but
> the child that was left behind really is handling all the queries
> and zone updates.
>
> Our NSD is a slave to some thousands of zones and because of the way we
> automatically create NSD configuration (we configure NSD to use every
> other NS of the zone as master even though typically only one of the other
> NS servers of a zone actually allows an AXFR for us), our NSD typically
> has n*10 zone transfers in SYN_SENT state destined to fail after connect
> timeout and for a lot of AXFR requests our NSD gets REFUSED response.
>
> I'd appreciate help hunting down the cause for this problem; we need to
> get 'nsdc stop && nsdc start' working to be able to restart NSD reliably
> after generating new nsd.conf automatically. (Of course I could write
> a script which kills NSD some other way instead of 'nsdc stop' but I'd
> prefer fixing NSD/nsdc/our environment.)
>
> -----
> Here's how the problem evolves on our systems:
>
> 1. Initially (after e.g. server reboot) NSD is running fine. Three nsd
> processes (with default server-count setting), one of which (pid 20925
> here) is the parent to the other two (20933 and 20934). The parent NSD
> owns all tcp LISTEN and udp sockets, and child 20933 is the only one
> trying to do some AXFRs from masters:
>
> % fgrep 'nsd[20925]' /var/log/messages
> Sep 14 14:59:50 isar nsd[20925]: nsd started (NSD 3.2.3), pid 20925
>
> % ps -ef | grep nsd.conf
> nsd 20925 1 56 14:59 ? 00:00:04 /usr/sbin/nsd -c
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> nsd 20933 20925 11 14:59 ? 00:00:00 /usr/sbin/nsd -c
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> nsd 20934 20925 1 14:59 ? 00:00:00 /usr/sbin/nsd -c
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>
> % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
> tcp 0 0 128.214.248.137:53 0.0.0.0:* LISTEN
> 20925/nsd
> tcp 0 0 193.167.245.84:53 0.0.0.0:* LISTEN
> 20925/nsd
> tcp 0 0 195.148.12.100:53 0.0.0.0:* LISTEN
> 20925/nsd
> tcp 0 0 2001:708:10:70::55:1:53 :::* LISTEN
> 20925/nsd
> tcp 0 0 2001:708:10:55::53:53 :::* LISTEN
> 20925/nsd
> udp 0 0 128.214.248.137:53 0.0.0.0:*
> 20925/nsd
> udp 0 0 193.167.245.84:53 0.0.0.0:*
> 20925/nsd
> udp 0 0 195.148.12.100:53 0.0.0.0:*
> 20925/nsd
> udp 0 0 2001:708:10:70::55:1:53 :::*
> 20925/nsd
> udp 0 0 2001:708:10:55::53:53 :::*
> 20925/nsd
>
> % sudo netstat -anp -A inet,inet6 | grep nsd | grep SYN_SENT
> (master server IP addresses obfuscated as masterX)
> tcp 0 1 128.214.248.137:57130 masterA:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:53660 masterA:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:37449 masterB:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:50400 masterC:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:48250 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:35477 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:34076 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:48529 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:48535 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:35901 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:37632 masterF:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:47745 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:32931 masterF:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:49364 masterG:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:37719 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:36117 masterF:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:37209 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:45596 masterG:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:60195 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:54687 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:54131 masterF:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:60484 masterE:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:55516 masterD:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:50900 masterF:53 SYN_SENT
> 20933/nsd
> tcp 0 1 128.214.248.137:36627 masterG:53 SYN_SENT
> 20933/nsd
>
>
> 2. Soon NSD receives an update to one of the zones and reloads, and
> updates the pid file:
>
> (from /var/log/messages:)
> Sep 14 15:01:55 isar nsd[20925]: signal received, reloading...
> Sep 14 15:01:55 isar nsd[21072]: memory recyclebin holds 535424 bytes
>
> % cat $(nsd-checkconf -o pidfile
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf)
> 21072
>
> 3. Now observe the child pid 20933 still is running directly under init
> process (pid 1).
> The child 20933 now owns the LISTEN sockets.
>
> % ps -ef | grep nsd.conf
> nsd 20933 1 0 14:59 ? 00:00:01 /usr/sbin/nsd
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> nsd 21072 1 2 15:01 ? 00:00:00 /usr/sbin/nsd
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> nsd 21073 21072 1 15:01 ? 00:00:00 /usr/sbin/nsd
> /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>
> % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
> tcp 0 0 128.214.248.137:53 0.0.0.0:* LISTEN
> 20933/nsd
> tcp 0 0 193.167.245.84:53 0.0.0.0:* LISTEN
> 20933/nsd
> tcp 0 0 195.148.12.100:53 0.0.0.0:* LISTEN
> 20933/nsd
> tcp 0 0 2001:708:10:70::55:1:53 :::* LISTEN
> 20933/nsd
> tcp 0 0 2001:708:10:55::53:53 :::* LISTEN
> 20933/nsd
> udp 0 0 128.214.248.137:53 0.0.0.0:*
> 20933/nsd
> udp 0 0 193.167.245.84:53 0.0.0.0:*
> 20933/nsd
> udp 0 0 195.148.12.100:53 0.0.0.0:*
> 20933/nsd
> udp 0 0 2001:708:10:70::55:1:53 :::* 20933/nsd
> udp 0 0 2001:708:10:55::53:53 :::* 20933/nsd
>
> Steps 2-4 repeat over and over: process 20933 keeps on running and
> responding to DNS queries.
>
> I'm not sure if process 20933 should terminate in reloads but for sure
> nsdc cannot find it because nsdc depends on pidfile being correct.
> -----
>
> Regards,
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iQEcBAEBAgAGBQJKr2VrAAoJEA8yVCPsQCW5/FMIAKVpwdZzN4PcHGtTGxE/oJ/J
Wl68IWKdit43Taj6F5T120Zay2x/YvaFaA/yQqosHNgQ7HwuJGTcARRDY1RD2eyr
rOXXLo26+istlrN0EVgIeyZshTqTlZ05lFtA/EBVpLOIsGWG2CFEF8dGPkFRDPYY
BxFt6RWrfluJsuFOcMy7FS05Hczvp0iAtuoFjrJJF6R9w1RtOGakCkG2oMzePlZr
ZKe2us303sb/QFNX5XF/yCyQf7DWg+fvbuZnfdGNIjnxQN4dUQdS644P6qiDGPwy
+lSyzQQjl1MH3nMBmv/dTbADc9yJ1Ef2OTZV2H7xWxsVsWgiQF+VpSkztwGHN34=
=rWQR
-----END PGP SIGNATURE-----
More information about the nsd-users
mailing list