[nsd-users] Edge case on nsdc?
Shane Kerr
shane at ca.afilias.info
Tue Jul 8 12:16:49 UTC 2008
Hello,
We have this in our NSD logs occasionally:
[1214740996] nsd[93921]: warning: nsd is already running as 93888, continuing
[1214740996] nsd[93922]: error: can't bind the socket: Address already in use
[1214741027] nsd[94418]: error: can't bind the socket: Address already in use
[1214741057] nsd[94932]: error: can't bind the socket: Address already in use
I think this is because we have a monitoring script that checks that NSD
is running at all times and attempts to start it when the check fails...
even though NSD is already running.
In the nsdc.sh script we see the following:
signal() {
    if [ -s ${pidfile} ]
    then
        kill -"$1" `cat ${pidfile}` && return 0
    else
        echo "nsd is not running"
    fi
    return 1
}
But it seems like NSD restarts itself regularly, getting a new process
ID when it does so. That leaves a window in which the following can
happen:
- nsdc.sh reads the contents of pidfile
- NSD restarts, getting a new PID
- nsdc.sh sends a signal to test NSD using the old PID, which fails,
so nsdc claims NSD is not running
Is this possible?
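For what it's worth, the sequence above can be reproduced on purpose with a stand-in process. This is only a sketch under my own assumptions: a background `sleep` plays the part of NSD, a scratch file from `mktemp` plays the pidfile, and `kill -0` is used as the test signal. None of these names come from nsdc.sh.

```shell
#!/bin/sh
# Stand-in for NSD: a background sleep whose PID goes into a scratch pidfile.
pidfile=$(mktemp)

sleep 30 &
old_pid=$!
echo "$old_pid" > "$pidfile"

# Step 1: the control script reads the contents of the pidfile.
read_pid=$(cat "$pidfile")

# Step 2: the daemon "restarts" under a new PID and rewrites the pidfile.
kill "$old_pid"
wait "$old_pid" 2>/dev/null || true
sleep 30 &
new_pid=$!
echo "$new_pid" > "$pidfile"

# Step 3: the control script signals the PID it read in step 1.
if kill -0 "$read_pid" 2>/dev/null
then
    stale_result="running"
else
    stale_result="not running"
fi

# The same check against the PID that is in the pidfile right now.
if kill -0 "$(cat "$pidfile")" 2>/dev/null
then
    fresh_result="running"
else
    fresh_result="not running"
fi

echo "check with the PID read earlier: $stale_result"
echo "check with the current pidfile:  $fresh_result"

# Clean up the stand-in daemon and the scratch pidfile.
kill "$new_pid"
wait "$new_pid" 2>/dev/null || true
rm -f "$pidfile"
```

The two checks disagree only because the PID was read before the restart, which is the window I am asking about.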
It is possible to work around this with a little more sophistication,
I think:
signal() {
    while true
    do
        # if there is no PID file, NSD is not running
        if [ ! -s "${pidfile}" ]
        then
            return 1
        fi
        # if we can send the signal to the PID, then NSD is running
        # (or some other process with that PID, but we hope not...)
        PID=`cat "${pidfile}"`
        if kill -"$1" "$PID"
        then
            return 0
        fi
        # double-check NSD did not restart between the time we read the PID
        # and the time we sent the signal; compare as strings so that an
        # empty or missing PID file does not break the test
        CHECK_PID=`cat "${pidfile}" 2>/dev/null`
        if [ "$PID" = "$CHECK_PID" ]
        then
            echo "nsd is not running"
            return 1
        fi
    done
}
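The retry loop can be exercised without touching a real NSD. The sketch below is an assumption-laden test harness, not part of nsdc.sh: `signal_via` is a hypothetical variant of the function that takes the pidfile as a second argument, a background `sleep` stands in for the daemon, and signal 0 is used so nothing is actually delivered.

```shell
#!/bin/sh
# Hypothetical variant of the retry loop: $1 is the signal, $2 the pidfile.
signal_via() {
    sig=$1
    file=$2
    while true
    do
        # an empty or missing pidfile means the daemon is not running
        [ -s "$file" ] || return 1
        pid=$(cat "$file")
        if kill -"$sig" "$pid" 2>/dev/null
        then
            return 0
        fi
        # retry only if the pidfile changed since we read it
        check_pid=$(cat "$file" 2>/dev/null)
        if [ "$pid" = "$check_pid" ]
        then
            return 1
        fi
    done
}

pidfile=$(mktemp)

# Case 1: the pidfile exists but is empty.
if signal_via 0 "$pidfile"; then empty_case="running"; else empty_case="stopped"; fi

# Case 2: the pidfile names a live process.
sleep 30 &
live_pid=$!
echo "$live_pid" > "$pidfile"
if signal_via 0 "$pidfile"; then live_case="running"; else live_case="stopped"; fi

# Case 3: the process has exited and the pidfile was not updated.
kill "$live_pid"
wait "$live_pid" 2>/dev/null || true
if signal_via 0 "$pidfile"; then dead_case="running"; else dead_case="stopped"; fi

rm -f "$pidfile"
echo "empty pidfile: $empty_case"
echo "live process:  $live_case"
echo "dead process:  $dead_case"
```

Only the third case reaches the double-check, and because the pidfile is unchanged it reports the daemon as stopped instead of looping.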
--
Shane