[nsd-users] fork-failed only on certain servers

Jeroen Koekkoek jeroen at nlnetlabs.nl
Tue Sep 24 12:34:53 UTC 2019


Hi Klaus,

This is an interesting problem. The fact that it happens only with huge
IXFRs suggests it really is a memory allocation problem. However, the
reload itself seems to be successful. The instantiation of the new
children seems to be where NSD fails, but the new children make use of
copy-on-write memory to serve the new data and therefore shouldn't
require a lot of additional memory.
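
One way to compare a failing server with a good one is to look at the
kernel's commit accounting just before a reload. The sketch below is
only an illustration and assumes the servers run Linux, where fork()
can return ENOMEM based on how much memory is committed rather than on
how much is actually in use. Nothing in the logs confirms that this is
the cause here. The helper names (parse_meminfo, commit_headroom_kb,
report) are made up for this example; the CommitLimit and Committed_AS
fields and the vm.overcommit_memory sysctl are standard Linux
interfaces.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; the unit suffix is ignored.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """Return CommitLimit - Committed_AS, or None if a field is missing."""
    if "CommitLimit" not in meminfo or "Committed_AS" not in meminfo:
        return None
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def read_text(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


def report():
    """Build a short, human-readable summary for the local machine."""
    lines = []
    mode = read_text("/proc/sys/vm/overcommit_memory")
    lines.append("vm.overcommit_memory: %s" % (mode.strip() if mode else "unavailable"))
    raw = read_text("/proc/meminfo")
    if raw is None:
        lines.append("/proc/meminfo: unavailable (not Linux?)")
        return "\n".join(lines)
    info = parse_meminfo(raw)
    for key in ("MemTotal", "MemAvailable", "CommitLimit", "Committed_AS"):
        lines.append("%s: %s kB" % (key, info.get(key, "n/a")))
    lines.append("commit headroom: %s kB" % commit_headroom_kb(info))
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Running this on a good and a failing server shortly before a large
IXFR is applied would show whether the two differ in overcommit mode or
in committed memory. Note that CommitLimit is only enforced when
vm.overcommit_memory is 2; in the default heuristic mode the kernel
only refuses allocations it considers obvious overcommits.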

Is it always the same 4 servers where forking fails? Also, does the next
reload work after zone data has been transferred again?

Best regards,
Jeroen Koekkoek


On Mon, 2019-09-23 at 12:00 +0200, Klaus Darilion wrote:
> Hello!
> 
> We use NSD as a slave on ~20 servers. Once in a while, if there is a
> huge IXFR, the fork fails. Frankly, it fails only on 4 of these 20
> identical servers.
> 
> Those VMs are really identical: Same dom0, same amount of RAM, CPUs,
> Diskspace, Kernel, sysctl settings, NSD settings.
> 
> When I compare a failed server with a good server: RAM usage before
> the IXFR was 10.5GB on both servers. Both have 25GB of RAM installed,
> so there should be sufficient RAM available; the IXFR was ~2GB.
> 
> NSD logs look identical, except that the fork failed on one (see
> below).
> 
> Do you have any hints why the fork fails on some VMs?
> 
> thanks
> Klaus
> 
> 
> 
> Good Server:
> 05:44:58 vie nsd[22157]: notify for xxx. from 1.2.3.4 serial
> 2019092314
> 05:44:58 vie nsd[22157]: notify for xxx. from 1234::5 serial
> 2019092314
> 05:49:45 vie nsd[669]: xfrd: zone xxx committed "received update to
> serial 2019092314 at 2019-09-23T05:49:45 from 1.2.3.20 TSIG verified
> with key mykey"
> 05:51:14 vie nsd[672]: rehash of zone xxx. with parameters 1 0 5
> 939fffb0948cbf34
> 05:51:27 vie nsd[672]: nsec3 xxx 1 %
> 05:51:33 vie nsd[672]: nsec3 xxx 17 %
> 05:51:39 vie nsd[672]: nsec3 xxx 25 %
> 05:51:45 vie nsd[672]: nsec3 xxx 31 %
> 05:51:49 vie nsd[22157]: notify for xxx. from 1.2.3.4 serial
> 2019092315
> 05:51:49 vie nsd[22157]: notify for xxx. from 1234::5 serial
> 2019092315
> 05:51:49 vie nsd[669]: xfrd: zone xxx committed "received update to
> serial 2019092315 at 2019-09-23T05:51:49 from 1.2.3.4 TSIG verified
> with
> key mykey"
> 05:51:49 vie nsd[22157]: notify for xxx. from 1.2.3.20 serial
> 2019092315
> 05:51:49 vie nsd[22157]: notify for xxx. from 2345::5 serial
> 2019092315
> 05:51:51 vie nsd[672]: nsec3 xxx 39 %
> 05:51:57 vie nsd[672]: nsec3 xxx 45 %
> 05:52:03 vie nsd[672]: nsec3 xxx 54 %
> 05:52:09 vie nsd[672]: nsec3 xxx 61 %
> 05:52:15 vie nsd[672]: nsec3 xxx 68 %
> 05:52:21 vie nsd[672]: nsec3 xxx 77 %
> 05:52:27 vie nsd[672]: nsec3 xxx 84 %
> 05:52:33 vie nsd[672]: nsec3 xxx 91 %
> 05:52:39 vie nsd[672]: nsec3 xxx 98 %
> 05:52:41 vie nsd[672]: zone xxx. received update to serial 2019092314
> at
> 2019-09-23T05:49:45 from 1.2.3.20 TSIG verified with key mykey of
> 1815276647 bytes in 411.196 seconds
> 05:52:45 vie nsd[669]: xfrd: zone xxx committed "received update to
> serial 2019092315 at 2019-09-23T05:52:45 from 2345::5 TSIG verified
> with
> key mykey"
> 05:52:57 vie nsd[672]: zone xxx. received update to serial 2019092315
> at
> 2019-09-23T05:51:49 from 1.2.3.4 TSIG verified with key mykey of
> 792413
> bytes in 0.03947 seconds
> 05:53:05 vie nsd[669]: zone xxx serial 2019092314 is updated to
> 2019092315.
> 
> 
> 
> Failed Server:
> 05:44:59 nyc nsd[344]: notify for xxx. from 1234::5 serial 2019092314
> 05:44:59 nyc nsd[344]: notify for xxx. from 1.2.3.4 serial 2019092314
> 05:49:54 nyc nsd[10937]: xfrd: zone xxx committed "received update to
> serial 2019092314 at 2019-09-23T05:49:54 from 2345::5 TSIG verified
> with
> key mykey"
> 05:51:14 nyc nsd[10939]: rehash of zone xxx. with parameters 1 0 5
> 939fffb0948cbf34
> 05:51:25 nyc nsd[10939]: nsec3 xxx 1 %
> 05:51:31 nyc nsd[10939]: nsec3 xxx 17 %
> 05:51:37 nyc nsd[10939]: nsec3 xxx 25 %
> 05:51:43 nyc nsd[10939]: nsec3 xxx 31 %
> 05:51:49 nyc nsd[10939]: nsec3 xxx 38 %
> 05:51:49 nyc nsd[344]: notify for xxx. from 1.2.3.4 serial 2019092315
> 05:51:49 nyc nsd[344]: notify for xxx. from 1234::5 serial 2019092315
> 05:51:50 nyc nsd[344]: notify for xxx. from 2345::5 serial 2019092315
> 05:51:50 nyc nsd[344]: notify for xxx. from 1.2.3.20 serial
> 2019092315
> 05:51:50 nyc nsd[10937]: xfrd: zone xxx committed "received update to
> serial 2019092315 at 2019-09-23T05:51:50 from 1.2.3.4 TSIG verified
> with
> key mykey"
> 05:51:55 nyc nsd[10939]: nsec3 xxx 45 %
> 05:52:02 nyc nsd[10939]: nsec3 xxx 54 %
> 05:52:09 nyc nsd[10939]: nsec3 xxx 62 %
> 05:52:15 nyc nsd[10939]: nsec3 xxx 71 %
> 05:52:21 nyc nsd[10939]: nsec3 xxx 78 %
> 05:52:27 nyc nsd[10939]: nsec3 xxx 84 %
> 05:52:33 nyc nsd[10939]: nsec3 xxx 90 %
> 05:52:39 nyc nsd[10939]: nsec3 xxx 97 %
> 05:52:42 nyc nsd[10939]: zone xxx. received update to serial
> 2019092314
> at 2019-09-23T05:49:54 from 2345::5 TSIG verified with key mykey of
> 1815276647 bytes in 418.798 seconds
> 05:52:43 nyc nsd[10939]: fork failed: Cannot allocate memory
> 05:52:45 nyc nsd[10937]: process 10939 exited with status 256
> 05:52:45 nyc nsd[4570]: handle_reload_cmd: reload closed cmd channel
> 05:52:45 nyc nsd[4570]: Reload process 10939 failed, continuing with
> old
> database
> 05:52:46 nyc nsd[10937]: xfrd: zone xxx committed "received update to
> serial 2019092315 at 2019-09-23T05:52:46 from 1.2.3.20 TSIG verified
> with key mykey"
> 05:53:10 nyc nsd[10937]: xfrd: zone xxx: soa serial 2019092315 update
> failed, restarting transfer (notified zone)
> 