[nsd-users] Edge case on nsdc?

Shane Kerr shane at ca.afilias.info
Tue Jul 8 12:16:49 UTC 2008


Hello,

We have this in our NSD logs occasionally:

[1214740996] nsd[93921]: warning: nsd is already running as 93888, continuing
[1214740996] nsd[93922]: error: can't bind the socket: Address already in use
[1214741027] nsd[94418]: error: can't bind the socket: Address already in use
[1214741057] nsd[94932]: error: can't bind the socket: Address already in use


I think this is because we have a monitoring script that makes sure
NSD is running at all times and tries to start it... even when NSD is
already running.
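Our monitoring script isn't shown here, but a check along these lines
is roughly what such a monitor might do before starting NSD. This is a
minimal sketch under assumptions: the pidfile path and the function
name nsd_is_running are hypothetical, and it relies on "kill -0", which
sends no signal and only reports whether the PID exists and can be
signalled. It has the same read-then-signal window described below.

```shell
#!/bin/sh
# Hypothetical monitor check. The default pidfile path is an assumption,
# not necessarily where NSD writes its pidfile on your system.
PIDFILE=${PIDFILE:-/var/run/nsd.pid}

# nsd_is_running: succeed if the pidfile is non-empty and names a live
# process. "kill -0" delivers no signal; it only checks the PID.
nsd_is_running() {
	[ -s "$PIDFILE" ] || return 1
	pid=`cat "$PIDFILE"`
	kill -0 "$pid" 2>/dev/null
}

if nsd_is_running
then
	echo "nsd is running; not starting it"
else
	echo "nsd does not appear to be running; a monitor would start it here"
fi
```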


In the nsdc.sh script we see the following:


signal() {
         if [ -s ${pidfile} ]
         then
                 kill -"$1" `cat ${pidfile}` && return 0
         else
                 echo "nsd is not running"
         fi
         return 1
}


But it seems that NSD restarts itself regularly, getting a new process
ID each time it does so. If that is the case, the following sequence
could happen:

- nsdc.sh reads the contents of pidfile

- NSD restarts, getting a new PID

- nsdc.sh sends a signal to test NSD using the old PID, which fails,  
so nsdc claims NSD is not running
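To convince myself the sequence above is plausible, here is a small
simulation of it. It is only a sketch: two "sleep" processes stand in
for the old and new NSD, a temporary file stands in for the real
pidfile, and the variable names are made up for the example.

```shell
#!/bin/sh
# Simulation only: "sleep" stands in for NSD, a temp file for the pidfile.
pidfile=${TMPDIR:-/tmp}/race-demo.$$.pid

sleep 60 &                   # the "old" NSD process
echo $! > "$pidfile"

PID=`cat "$pidfile"`         # step 1: nsdc reads the pidfile

kill "$PID"                  # step 2: NSD "restarts": the old process exits...
wait "$PID" 2>/dev/null      # (reap it so its PID really disappears)
sleep 60 &                   # ...and a new process takes over
NEW_PID=$!
echo "$NEW_PID" > "$pidfile"

# step 3: probe the PID read earlier; signal 0 only checks existence
if kill -0 "$PID" 2>/dev/null
then
	result="running"
else
	result="not running"
fi
echo "old PID $PID: $result; pidfile now holds $NEW_PID"

# clean up the stand-in process and file
kill "$NEW_PID"
rm -f "$pidfile"
```

The probe on the stale PID fails even though a "server" is running the
whole time, which matches what I suspect nsdc is seeing.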

Is this possible?



It is possible to work around this with a little more sophistication,  
I think:

signal() {
	while true
	do
		# if there is no PID file, NSD is not running
		if [ ! -s "${pidfile}" ]
		then
			return 1
		fi

		# if we can send the signal to the PID, then NSD is running
		#   (or some other process with that PID, but we hope not...)
		PID=`cat "${pidfile}"`
		if kill -"$1" "$PID"
		then
			return 0
		fi

		# double-check NSD did not restart between the time we read the PID
		# and the time we sent the signal; compare as strings so an empty
		# or vanished pidfile cannot break the test
		CHECK_PID=`cat "${pidfile}" 2>/dev/null`
		if [ "$PID" = "$CHECK_PID" ]
		then
			echo "nsd is not running"
			return 1
		fi
	done
}
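If there is any worry about the loop spinning while NSD restarts over
and over, a bounded variant could look like the following. This is a
sketch, not something from nsdc.sh: the function name signal_retry, the
limit of 3 attempts, and the default pidfile path are all assumptions
for illustration.

```shell
#!/bin/sh
# Hypothetical bounded variant of the retry idea; the name, the retry
# limit and the default pidfile path are illustrative assumptions.
pidfile=${pidfile:-/var/run/nsd.pid}

signal_retry() {
	tries=0
	while [ "$tries" -lt 3 ]
	do
		tries=$((tries + 1))
		# no pidfile (or an empty one): NSD is not running
		if [ ! -s "$pidfile" ]
		then
			echo "nsd is not running"
			return 1
		fi
		PID=`cat "$pidfile"`
		kill -"$1" "$PID" 2>/dev/null && return 0
		# the signal failed: retry only if the pidfile changed meanwhile
		CHECK_PID=`cat "$pidfile" 2>/dev/null`
		if [ "$PID" = "$CHECK_PID" ]
		then
			echo "nsd is not running"
			return 1
		fi
	done
	echo "nsd pidfile kept changing; giving up"
	return 1
}
```

For example, "signal_retry 0" only tests whether the process named in
the pidfile exists, without disturbing it.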

--
Shane


