[nsd-users] NSD as slave leaves a child process behind

Matthijs Mekking matthijs at NLnetLabs.nl
Tue Sep 15 10:03:07 UTC 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Ville,

It may look as though a child process is not killed after a reload, but
isn't the process you are referring to (20933) the xfrd process? By
default, NSD has three processes: a parent process, a child process
(answering queries) and an xfrd process. During an NSD reload, the xfrd
process should not be killed; only its communication channels with the
parent are updated.
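
To see which process is which, something along these lines may help. It
is only a sketch: label_nsd_procs is a made-up helper name, and the labels
it prints are inferred purely from the PID/PPID relationship to the PID in
the pid file, not from anything NSD itself reports.

```shell
#!/bin/sh
# Sketch (hypothetical helper): label nsd processes relative to the PID
# stored in the pid file.
# stdin: one "PID PPID" pair per line
# $1:    the PID read from the pid file
label_nsd_procs() {
    main_pid=$1
    while read -r pid ppid; do
        # skip empty lines
        [ -n "$pid" ] || continue
        if [ "$pid" = "$main_pid" ]; then
            echo "$pid main (matches pid file)"
        elif [ "$ppid" = "$main_pid" ]; then
            echo "$pid child of main"
        else
            echo "$pid not under main (ppid $ppid)"
        fi
    done
}

# Possible use on a live system (the pid file path is a placeholder):
#   ps -C nsd -o pid= -o ppid= | label_nsd_procs "$(cat /path/to/nsd.pid)"
```

With the PIDs from your report after the reload (pid file 21072), this
would label 21072 as main, 21073 as its child, and 20933 as "not under
main (ppid 1)", which is what I would expect for xfrd.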

Of course, if you run nsdc stop, no processes should be kept alive. Is
this the case?

What does nsdc stop output?
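
One way to check this is to poll for leftover processes after the stop.
The following is a rough sketch, not something shipped with NSD:
wait_until_gone is a hypothetical helper, and the 'pgrep -x nsd' check in
the usage comment assumes that no other nsd instance runs on the same host.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run a check command up to N times, one
# second apart, and succeed as soon as the check fails, i.e. as soon as
# it no longer finds anything.
# $1: check command (exit status 0 means "still there")
# $2: maximum number of attempts
wait_until_gone() {
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if ! sh -c "$1" >/dev/null 2>&1; then
            return 0    # nothing found any more
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1            # still present after all attempts
}

# Possible use (assumes a single nsd instance on the host):
#   nsdc stop
#   wait_until_gone 'pgrep -x nsd' 10 || echo "nsd processes still alive" >&2
```

If the helper reports leftover processes, the output of ps at that point
would be useful to see.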

Note that nsdc stop && nsdc start differs from nsdc restart: the
latter attempts to start nsd after all processes have shut down.


Best regards,

Matthijs Mekking
NLnet Labs

Ville Mattila wrote:
> Hi,
> 
> We've run into problems with NSD v3.2.3 (on Red Hat Enterprise Linux
> 5.4 x86_64) failing to kill one of its child processes while NSD
> is reloading after it has received an update to a zone from the master.
> Everything seems to be running fine, but 'nsdc stop', 'nsdc patch',
> etc. don't really work, because the reload updates the NSD pid file while
> the child that was left behind is really handling all the queries
> and zone updates.
> 
> Our NSD is a slave for several thousand zones, and because of the way we
> automatically create the NSD configuration (we configure NSD to use every
> other NS of the zone as master even though typically only one of the other
> NS servers of a zone actually allows an AXFR for us), our NSD typically
> has n*10 zone transfers in SYN_SENT state, destined to fail after the
> connect timeout, and for a lot of AXFR requests our NSD gets a REFUSED
> response.
> 
> I'd appreciate help hunting down the cause for this problem;  we need to
> get 'nsdc stop && nsdc start' working to be able to restart NSD reliably
> after generating new nsd.conf automatically.  (Of course I could write
> a script which kills NSD some other way instead of 'nsdc stop' but I'd
> prefer fixing NSD/nsdc/our environment.)
> 
> -----
> Here's how the problem evolves on our systems:
> 
> 1. Initially (after e.g. server reboot) NSD is running fine.  Three nsd
> processes (with default server-count setting), one of which (pid 20925
> here) is the parent to the other two (20933 and 20934).  The parent NSD
> owns all tcp LISTEN and udp sockets, and child 20933 is the only one
> trying to do some AXFRs from masters:
> 
>   % fgrep 'nsd[20925]' /var/log/messages
>   Sep 14 14:59:50 isar nsd[20925]: nsd started (NSD 3.2.3), pid 20925
> 
>   % ps -ef | grep nsd.conf
>   nsd      20925     1 56 14:59 ?        00:00:04 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>   nsd      20933 20925 11 14:59 ?        00:00:00 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>   nsd      20934 20925  1 14:59 ?        00:00:00 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> 
>   % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
>   tcp        0      0 128.214.248.137:53          0.0.0.0:*      LISTEN   20925/nsd
>   tcp        0      0 193.167.245.84:53           0.0.0.0:*      LISTEN   20925/nsd
>   tcp        0      0 195.148.12.100:53           0.0.0.0:*      LISTEN   20925/nsd
>   tcp        0      0 2001:708:10:70::55:1:53     :::*           LISTEN   20925/nsd
>   tcp        0      0 2001:708:10:55::53:53       :::*           LISTEN   20925/nsd
>   udp        0      0 128.214.248.137:53          0.0.0.0:*               20925/nsd
>   udp        0      0 193.167.245.84:53           0.0.0.0:*               20925/nsd
>   udp        0      0 195.148.12.100:53           0.0.0.0:*               20925/nsd
>   udp        0      0 2001:708:10:70::55:1:53     :::*                    20925/nsd
>   udp        0      0 2001:708:10:55::53:53       :::*                    20925/nsd
> 
>   % sudo netstat -anp -A inet,inet6 | grep nsd | grep SYN_SENT
>   (master server IP addresses obfuscated as masterX)
>   tcp        0      1 128.214.248.137:57130       masterA:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:53660       masterA:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:37449       masterB:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:50400       masterC:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:48250       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:35477       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:34076       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:48529       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:48535       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:35901       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:37632       masterF:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:47745       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:32931       masterF:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:49364       masterG:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:37719       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:36117       masterF:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:37209       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:45596       masterG:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:60195       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:54687       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:54131       masterF:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:60484       masterE:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:55516       masterD:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:50900       masterF:53 SYN_SENT   20933/nsd
>   tcp        0      1 128.214.248.137:36627       masterG:53 SYN_SENT   20933/nsd
> 
> 
> 2. Soon NSD receives an update to one of the zones and reloads, and
> updates the pid file:
> 
>   (from /var/log/messages:)
>   Sep 14 15:01:55 isar nsd[20925]: signal received, reloading...
>   Sep 14 15:01:55 isar nsd[21072]: memory recyclebin holds 535424 bytes
> 
>   % cat $(nsd-checkconf -o pidfile /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf)
>   21072
> 
> 3. Now observe that the child pid 20933 is still running, directly under
> the init process (pid 1).
> The child 20933 now owns the LISTEN sockets.
> 
>   % ps -ef | grep nsd.conf
>   nsd      20933     1  0 14:59 ?        00:00:01 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>   nsd      21072     1  2 15:01 ?        00:00:00 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
>   nsd      21073 21072  1 15:01 ?        00:00:00 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
> 
>   % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
>   tcp        0      0 128.214.248.137:53          0.0.0.0:*      LISTEN   20933/nsd
>   tcp        0      0 193.167.245.84:53           0.0.0.0:*      LISTEN   20933/nsd
>   tcp        0      0 195.148.12.100:53           0.0.0.0:*      LISTEN   20933/nsd
>   tcp        0      0 2001:708:10:70::55:1:53     :::*           LISTEN   20933/nsd
>   tcp        0      0 2001:708:10:55::53:53       :::*           LISTEN   20933/nsd
>   udp        0      0 128.214.248.137:53          0.0.0.0:*               20933/nsd
>   udp        0      0 193.167.245.84:53           0.0.0.0:*               20933/nsd
>   udp        0      0 195.148.12.100:53           0.0.0.0:*               20933/nsd
>   udp        0      0 2001:708:10:70::55:1:53     :::*                    20933/nsd
>   udp        0      0 2001:708:10:55::53:53       :::*                    20933/nsd
> 
> Steps 2-4 repeat over and over: process 20933 keeps on running and
> responding to DNS queries.
> 
> I'm not sure whether process 20933 should terminate on reload, but nsdc
> certainly cannot find it, because nsdc depends on the pid file being correct.
> -----
> 
> Regards,
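
A note on the netstat listings above: as far as I know, netstat -p prints
only a single PID per socket, even when several processes share that socket
after a fork, so the PID shown there need not be the only holder. lsof can
list every process that has the port open. The sketch below assumes lsof is
installed; holder_pids is a hypothetical helper that extracts the PIDs from
lsof's field output (-F p), in which each process line starts with "p".

```shell
#!/bin/sh
# Sketch (hypothetical helper): reduce `lsof -F p` field output to a sorted,
# de-duplicated list of PIDs. Process lines look like "p20933"; other field
# lines (e.g. "f5" for a file descriptor) are ignored.
holder_pids() {
    sed -n 's/^p\([0-9][0-9]*\)$/\1/p' | sort -n -u
}

# Possible use, listing all processes holding a socket on port 53:
#   sudo lsof -nP -i :53 -F p | holder_pids
```

If more than one PID comes back, the sockets are shared and the PID shown by
netstat is only one of the holders.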

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQEcBAEBAgAGBQJKr2VrAAoJEA8yVCPsQCW5/FMIAKVpwdZzN4PcHGtTGxE/oJ/J
Wl68IWKdit43Taj6F5T120Zay2x/YvaFaA/yQqosHNgQ7HwuJGTcARRDY1RD2eyr
rOXXLo26+istlrN0EVgIeyZshTqTlZ05lFtA/EBVpLOIsGWG2CFEF8dGPkFRDPYY
BxFt6RWrfluJsuFOcMy7FS05Hczvp0iAtuoFjrJJF6R9w1RtOGakCkG2oMzePlZr
ZKe2us303sb/QFNX5XF/yCyQf7DWg+fvbuZnfdGNIjnxQN4dUQdS644P6qiDGPwy
+lSyzQQjl1MH3nMBmv/dTbADc9yJ1Ef2OTZV2H7xWxsVsWgiQF+VpSkztwGHN34=
=rWQR
-----END PGP SIGNATURE-----