[nsd-users] make nsdc more reliable for restart

Paul Wouters paul at nohats.ca
Sun Aug 12 17:41:58 UTC 2012


On Fri, 10 Aug 2012, Stuart Henderson wrote:

>>> While running nsd as a secondary nameserver with 1000+
>>> domains, we discovered that the default nsdc(8) was
>>> not able to reliably restart nsd.
>>> The reason, I think, is that by using the PID file it sends
>>> its signal to only 1 of the default 3 processes.
>>> Afterwards it only checks against this 1 process, while
>>> the other 2 may still be running, causing trouble on
>>> startup.
>
> I wondered whether there's a particular reason that only the
> master is signalled, or is this purely due to lack of a portable
> pkill-type program?
>
>>> The patch below fixes it for us (was tested in a lab
>>> environment with 10.000 domains).
>>
>> The "pkill" command is not available on all systems. Linux distros ship
>> with it these days, and MacOS X introduced it with Mountain Lion (10.8),
>> but it may not be available on other systems. Therefore your patch is
>> not portable.
>
> Some OS have "killall" that does the same as pkill, but other
> OS have a different "killall" that behaves slightly differently ;)

The patch did not actually address my issue.

[root@nohats ~]# pidof nsd
4697 4696 4677
[root@nohats ~]# ls /var/run/nsd
[root@nohats ~]# nsdc stop
nsd is not running

Somehow nsd gets signalled and deletes its pidfile, but does not write
a new one. There are two ways my nsd gets signalled. One is an hourly
cron job that runs (if necessary) nsdc patch and nsdc reload. When I
do this manually, it works fine; the reload signals nsd and a new
pidfile is created:

[root@nohats ~]# pidof nsd
1301 1300 1298
[root@nohats ~]# cat /var/run/nsd/nsd.pid
1298
[root@nohats ~]# kill -HUP 1298
[root@nohats ~]# cat /var/run/nsd/nsd.pid
1304
[root@nohats ~]# pidof nsd
1305 1304 1300
[root@nohats ~]# kill -HUP 1304
[root@nohats ~]# pidof nsd
1313 1312 1300
[root@nohats ~]# cat /var/run/nsd/nsd.pid
1312
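
The check done by hand above (does the pidfile name one of the running
nsd processes?) could be scripted, which may help pinpoint when the
pidfile goes wrong. This is only a minimal sketch: pidfile_state is a
hypothetical helper, not something nsdc provides, and it assumes the
list of live PIDs is passed in by the caller, for example from pidof.

```shell
#!/bin/sh
# Hypothetical helper (not part of nsdc): classify a pidfile against a
# list of live PIDs.
#   $1        path to the pidfile
#   $2 ...    PIDs currently running (e.g. the output of pidof nsd)
# Prints one of: ok, stale, missing, stopped.
pidfile_state() {
    pidfile=$1
    shift
    if [ ! -s "$pidfile" ]; then
        # No pidfile (or an empty one).
        if [ "$#" -gt 0 ]; then
            echo "missing"      # processes exist, but nothing records them
        else
            echo "stopped"      # no pidfile and no processes
        fi
        return
    fi
    recorded=$(cat "$pidfile")
    for pid in "$@"; do
        if [ "$pid" = "$recorded" ]; then
            echo "ok"           # pidfile names a live process
            return
        fi
    done
    echo "stale"                # pidfile names a PID that is not running
}

# Example invocation (assumes the pidfile path used in this thread):
#   pidfile_state /var/run/nsd/nsd.pid $(pidof nsd)
```

Given the first transcript in this mail (three nsd processes, empty
/var/run/nsd), this would print "missing". Running it from cron and
logging the result is one way to narrow down when the pidfile goes away.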

The second is via OpenDNSSEC, configured to use:

/etc/opendnssec/conf.xml:		<NotifyCommand>sudo /sbin/service nsd restart</NotifyCommand>

[root@nohats ~]# su - ods
-bash-4.1$ cat /var/run/nsd/nsd.pid 
1312
-bash-4.1$ sudo /sbin/service nsd restart
Stopping nsd:                                              [  OK  ]
Starting nsd:                                              [  OK  ]
-bash-4.1$ cat /var/run/nsd/nsd.pid 
1494
-bash-4.1$ pidof nsd
1497 1496 1494

So it all looks fine, but after a while something happens and the
pidfile is either wrong or gone, and then all of these fail. But
even with the pkill patch applied to /usr/sbin/nsdc, this still
happens.
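
Separately, on the portability objection to pkill quoted above: a
minimal sketch of a substitute that assumes only POSIX ps and awk.
The function name match_pids is hypothetical and is not part of nsdc.
It compares the basename of the command, since some systems print a
full path in the comm column.

```shell
#!/bin/sh
# Hypothetical filter (not part of nsdc): read "PID COMMAND" lines on
# stdin and print the PIDs whose command basename equals $1.
match_pids() {
    awk -v name="$1" '{
        # Reduce /usr/sbin/nsd to nsd before comparing.
        n = split($2, parts, "/")
        if (parts[n] == name) print $1
    }'
}

# Example invocation (assumes a POSIX ps):
#   ps -A -o pid= -o comm= | match_pids nsd
```

The exact-match comparison is deliberate, so that nsdc itself is not
picked up when looking for nsd. As noted above, this would not by
itself fix the missing pidfile.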

Paul


