[nsd-users] Stale lock file (an nsdc problem)
Shane Kerr
shane at ca.afilias.info
Wed Oct 15 10:06:29 UTC 2008
Hello,
We have a stale lock file that is preventing nsdc from running. From the
log file our cron job produces:
Wed Oct 15 05:40:01 UTC 2008
5:40AM up 16:27, 0 users, load averages: 6.33, 8.02, 7.31
ns0a 25383 1 8175
/opt/prod/nsd/sbin/nsdc: line 138: /opt/nshome/ns0a/var/nsd.db.lock: cannot overwrite existing file
database locked by PID: 78717
aborting...
ns0a 25383 1 8175
Wed Oct 15 08:40:00 UTC 2008
8:40AM up 19:27, 0 users, load averages: 9.36, 5.78, 5.18
ns0a 25383 1 8175
/opt/prod/nsd/sbin/nsdc: line 138: /opt/nshome/ns0a/var/nsd.db.lock: cannot overwrite existing file
database locked by PID: 78717
aborting...
ns0a 25383 1 8175
This lock file does exist, and does point to process 78717:
[root@app7 /opt/nshome/ns0a/var]# ls -l
total 1639596
-rw-r--r-- 1 ns0a ns0a 601899131 Oct 15 09:09 ixfr.db
-rw-r--r-- 1 root ns0a 426287130 Oct 14 06:01 nsd.db
-rw-r--r-- 1 root ns0a 0 Oct 14 08:49 nsd.db.78717
-r--r--r-- 1 root ns0a 30 Oct 14 08:49 nsd.db.lock
-rw-r--r-- 1 ns0a ns0a 6 Oct 14 20:31 nsd.pid
-rw-r--r-- 1 root ns0a 1079 Jul 23 02:50 org.afilias-nst.info.zone
-rw-r--r-- 1 root ns0a 1075 Jul 23 02:50 org.afilias-nst.org.zone
-rw-r--r-- 1 root ns0a 649832393 Oct 14 08:49 org.zone
-rw-r--r-- 1 ns0a ns0a 2414 Sep 5 00:21 xfrd.state
[root@app7 /opt/nshome/ns0a/var]# cat nsd.db.lock
database locked by PID: 78717
But the process is not running:
[root@app7 /opt/nshome/ns0a/var]# ps ax | grep 78717
78079 p0 S+ 0:00.00 grep 78717
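The same check can be scripted instead of eyeballing ps output. What follows is only a rough sketch: lock_holder_alive is a hypothetical helper, not something in nsdc, and it assumes the lock file keeps its current "database locked by PID: <pid>" format. Note that kill -0 also fails when the process exists but belongs to another user, so this is only reliable when run as root or as the lock owner.

```shell
#!/bin/sh
# Hypothetical helper (not part of nsdc): report whether the process named
# in a lock file is still alive. Assumes the current lock format,
# "database locked by PID: <pid>".
lock_holder_alive() {
    # take the last whitespace-separated field of the first line as the PID
    pid=$(awk 'NR == 1 { print $NF }' "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 2 ;;   # missing file, or no numeric PID found
    esac
    # signal 0 only performs the existence/permission check; nothing is sent
    kill -0 "$pid" 2>/dev/null
}
```

Called as lock_holder_alive /opt/nshome/ns0a/var/nsd.db.lock, it should return 0 if the holder is alive, 2 if no PID could be read, and another non-zero status if the process is gone.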
As with the signal() case reported a few months ago, nsdc.sh needs a bit
of love. The lock() function needs to be improved so that it handles
stale locks. Something like this would probably work (and is even
NFS-safe), but it requires that everything that writes the lock file
store only the bare PID as its contents, rather than
"database locked by PID: $$".
lock() {
    # create a temporary file based on our PID
    TEMPFILE="${dbfile}.$$"
    # use { ...; } rather than ( ... ) so that exit leaves the script,
    # not just a subshell
    echo $$ > "$TEMPFILE" || { echo "error creating temporary file, aborting..."; exit 1; }

    # try to lock using this file; ln fails if the lock already exists
    if ln "$TEMPFILE" "${lockfile}" 2>/dev/null; then
        rm -f "$TEMPFILE"
        return
    fi

    # if that did not work, see if the locking process exists
    PID=`cat "${lockfile}"`
    if kill -0 "$PID" 2>/dev/null; then
        rm -f "$TEMPFILE"
        echo "database locked by PID: $PID"
        exit 1
    fi

    # if the locking process does not exist, consider the lock stale
    echo "removing stale lockfile"
    rm -f "${lockfile}"

    # lock the database
    if ! ln "$TEMPFILE" "${lockfile}" 2>/dev/null; then
        rm -f "$TEMPFILE"
        echo "unable to lock database"
        exit 1
    fi
    rm -f "$TEMPFILE"
}
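To make the idea easier to try outside nsdc, here is a self-contained sketch of the same ln-based scheme with a matching unlock. Everything in it is an assumption for illustration: try_lock, unlock, and the demo.db.lock path are hypothetical names, and unlike lock() above it returns a status instead of exiting. A lock file still in the old "database locked by PID: ..." format would fail the kill -0 test and be treated as stale, which is one more reason to store only the bare PID.

```shell
#!/bin/sh
# Hypothetical names throughout; a sketch, not the nsdc implementation.
lockfile="${TMPDIR:-/tmp}/demo.db.lock"

try_lock() {
    tmp="${lockfile}.$$"
    echo $$ > "$tmp" || return 1
    # ln fails if the target exists, so creating the link is the atomic step
    if ln "$tmp" "$lockfile" 2>/dev/null; then
        rm -f "$tmp"
        return 0
    fi
    holder=$(cat "$lockfile" 2>/dev/null)
    if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
        rm -f "$tmp"
        return 1                  # a live process holds the lock
    fi
    rm -f "$lockfile"             # holder is gone: treat the lock as stale
    ln "$tmp" "$lockfile" 2>/dev/null
    status=$?
    rm -f "$tmp"
    return $status
}

unlock() {
    # only remove the lock if it records our own PID
    [ "$(cat "$lockfile" 2>/dev/null)" = "$$" ] && rm -f "$lockfile"
    return 0
}
```

A caller would typically run try_lock, and on success install "trap unlock EXIT" so that a normal exit releases the lock. A process killed with SIGKILL still leaves the file behind, which is exactly the case the stale check is there for.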
Bad things happen to good processes. :)
Cheers,
--
Shane