From cra at WPI.EDU  Sun Feb  1 00:21:16 2015
From: cra at WPI.EDU (Chuck Anderson)
Date: Sat, 31 Jan 2015 19:21:16 -0500
Subject: [Dnssec-trigger] persistent cache needed?
In-Reply-To: <20150131235759.GD4025@angus.ind.WPI.EDU>
References: <20150131235759.GD4025@angus.ind.WPI.EDU>
Message-ID: <20150201002115.GE4025@angus.ind.WPI.EDU>

On Sat, Jan 31, 2015 at 06:58:00PM -0500, Chuck Anderson wrote:
> After booting up and re-opening Firefox, restoring 50-100 tabs causes
> so much DNS traffic that unbound goes unresponsive, and queries
> repeatedly time out for many minutes until things finally settle down.
> I thought Firefox's behavior was to not reload every tab until you
> activate the tab, but maybe it is still doing DNS pre-fetches for the
> inactive tabs?  I don't know.
> 
> I think we need a persistent cache, saved across restarts/reboots.
> What else can we do to solve this problem?
> 
> Or is the verbosity the cause of the problem:
> 
> #journalctl -b -u unbound | wc -l
> 24581
> 
> unbound.conf:
> 
> server:
>         # verbosity number, 0 is least verbose. 1 is default.
>         verbosity: 3

Nope, I turned this back down to 1, and the problem is the same after
rebooting.  I also confirmed that only some DNS queries time out.  For
example, www.yahoo.com and www.nasa.gov time out (or sometimes return
SERVFAIL), but www.google.com works fine.  Probably any names that are
already cached before the flood of queries reaches unbound will resolve
fine.  I also confirmed that the problem only begins when Firefox is
reloading the previous session.  It takes about 5 minutes for things to
settle down enough for queries to finish without timing out.


From paul at nohats.ca  Sun Feb  1 18:46:53 2015
From: paul at nohats.ca (Paul Wouters)
Date: Sun, 1 Feb 2015 13:46:53 -0500 (EST)
Subject: [Dnssec-trigger] persistent cache needed?
In-Reply-To: <20150131235759.GD4025@angus.ind.WPI.EDU>
References: <20150131235759.GD4025@angus.ind.WPI.EDU>
Message-ID: <alpine.LFD.2.10.1502011342090.31873@bofh.nohats.ca>

On Sat, 31 Jan 2015, Chuck Anderson wrote:

> After booting up and re-opening Firefox, restoring 50-100 tabs causes
> so much DNS traffic that unbound goes unresponsive, and queries
> repeatedly time out for many minutes until things finally settle down.

Why is that causing timeouts and failures on DNS for you?

I do think unbound needs an option to tell it that it is operating on
an end node and not as a network-wide cache, so that it can be a little
more aggressive on negative cache entries and retry more.

> I think we need a persistent cache, saved across restarts/reboots.
> What else can we do to solve this problem?

I would like that. But it would require the cache to have some kind
of timestamp associated with it, so that unbound can calculate, when
loading it, how much to lower the TTLs of the cached data. Otherwise you
would end up with stale cached data that has in reality expired (and
might have changed).
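
A minimal sketch of that TTL adjustment, assuming a hypothetical saved
cache in which each record carries the TTL it had left at save time and
the file as a whole carries one save timestamp. The CachedRecord type
and the age_records function are made up for illustration; this is not
unbound's cache format or code.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedRecord:
    # Hypothetical record layout, for illustration only.
    name: str
    rtype: str
    data: str
    ttl: int  # seconds of TTL remaining when the cache was saved


def age_records(records: List[CachedRecord], saved_at: float,
                now: Optional[float] = None) -> List[CachedRecord]:
    """Return the records still valid at `now`, with TTLs lowered by the
    time that has passed since the cache was saved."""
    if now is None:
        now = time.time()
    elapsed = int(now - saved_at)
    if elapsed < 0:
        # The clock went backwards, so the timestamp cannot be trusted.
        # Discard the whole cache rather than serve possibly stale data.
        return []
    aged = []
    for rec in records:
        remaining = rec.ttl - elapsed
        if remaining > 0:
            # Keep the record, but only for the TTL it has left.
            aged.append(CachedRecord(rec.name, rec.rtype, rec.data, remaining))
    return aged
```

With a cache saved 120 seconds ago, a record that had 300 seconds left
would be loaded with a TTL of 180, and one that had 60 seconds left would
be dropped.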

Note this is the reverse of another problem people have: when switching
networks they want the cache to be wiped, because some networks might
have split-DNS entries that aren't valid elsewhere.

> Or is the verbosity the cause of the problem:
>
> #journalctl -b -u unbound | wc -l
> 24581

Verbosity causes a significant performance drop, so for your original
problem it might be worth reducing it to 1 again to see whether the
problem disappears.

Paul


From wouter at nlnetlabs.nl  Mon Feb  2 08:27:18 2015
From: wouter at nlnetlabs.nl (W.C.A. Wijngaards)
Date: Mon, 02 Feb 2015 09:27:18 +0100
Subject: [Dnssec-trigger] persistent cache needed?
In-Reply-To: <alpine.LFD.2.10.1502011342090.31873@bofh.nohats.ca>
References: <20150131235759.GD4025@angus.ind.WPI.EDU>
 <alpine.LFD.2.10.1502011342090.31873@bofh.nohats.ca>
Message-ID: <54CF34E6.8040308@nlnetlabs.nl>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,

On 01/02/15 19:46, Paul Wouters wrote:
> On Sat, 31 Jan 2015, Chuck Anderson wrote:
> 
>> After booting up and re-opening Firefox, restoring 50-100 tabs
>> causes so much DNS traffic that unbound goes unresponsive, and
>> queries repeatedly time out for many minutes until things finally
>> settle down.
> 
> Why is that causing timeouts and failures on DNS for you?

If unbound was compiled with libevent, it should not have any issues
coping with the traffic.  But I heard that 'nat boxes' have trouble
with many connections.  So I do not know how to fix this; the network
won't allow the amount of traffic you are trying to send ...

Best regards,
   Wouter

> I do think unbound needs an option to tell it it is operating on an
> endnode and not a network wide cache, where it can be a little more
> aggressive on negative cache entries and retry more.
> 
>> I think we need a persistent cache, saved across
>> restarts/reboots. What else can we do to solve this problem?
> 
> I would like that. But it would require the cache to have some
> kind of timestamp associated with it, so the loading unbound can
> calculate how much to lower the TTLs of the cached data. Otherwise
> you would end up with badly cached data that has in reality expired
> (and might have changed)
> 
> Note this is the reverse of another problem people have, which is
> when switching network they want the cache to be wiped because some
> networks might have split-DNS entries that aren't valid elsewhere.
> 
>> Or is the verbosity the cause of the problem:
>> 
>> #journalctl -b -u unbound | wc -l
>> 24581
> 
> Verbosity causes a significant performance drop, so for your
> original problem it might be worth reducing it to 1 again and see
> if your problem disappears.
> 
> Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUzzTmAAoJEJ9vHC1+BF+N0FQP/i7juffUKyFfRfPM9g+AX/qP
gWXWdWy7E1bQeMxy7eniLk25zcAM1gD39d3GjJAgT9ujbU/8exJzEeDDLech/4z0
lzDup6QGJSUKH36A78G/cZXmWhfZSHFP5w0iZo3wrvvv6NnQ3UcyvdbTEsEo99U5
9gEcpQNeA2RbTTz6xgeyW/JoHcg9PJGaAbQ7e5xqzBtAZ6pLfthW+EWkIIXZJGRd
H+nGxL58d5lKGM+3lFzF4YmFiGd2VRreXqy4y+e20SdxvxenGZ8e1GBhC8LVvfP7
vlTzhjYdUiV9pKyACC/5jng8BrDqFuqNif+n8stI1Z1CuoAwSQbXf/kCSO9hnpPw
Jg/SiA/9tLX9Z4RFDG6SmXYsKQMfkVEzPhnNmUtg8s7i8N1+Kt2HTEgFtIi0cI3y
udQMW09VVckXJaLd6zj6t2BVYUZ/9RhxWJO4ieCuzfnuBnVNepj3T6+hgmjOEX0o
mHB5nkcEuDk23MqFV5Tj1ac80JuJrzuO2c4BOPcD0uw5jQWmSEwnImvlAd1q8ng6
tkzTptkLPoFaNo5xDkNhLPNOH0d3OdgXaurH4AbExb2pepQSkMyKA73kS+9K0QWH
4KgPlf7ew6HU4F63h+Xn19gLvNOrWfZJSab0CSW71kk6GjiHW+Z22h554jNbxJKF
kiRcK4BvjxPtYOIEtZLs
=t9kc
-----END PGP SIGNATURE-----


From cra at WPI.EDU  Mon Feb  2 14:54:55 2015
From: cra at WPI.EDU (Chuck Anderson)
Date: Mon, 2 Feb 2015 09:54:55 -0500
Subject: [Dnssec-trigger] persistent cache needed?
In-Reply-To: <54CF34E6.8040308@nlnetlabs.nl>
References: <20150131235759.GD4025@angus.ind.WPI.EDU>
 <alpine.LFD.2.10.1502011342090.31873@bofh.nohats.ca>
 <54CF34E6.8040308@nlnetlabs.nl>
Message-ID: <20150202145454.GI4025@angus.ind.WPI.EDU>

On Mon, Feb 02, 2015 at 09:27:18AM +0100, W.C.A. Wijngaards wrote:
> Hi,
> 
> On 01/02/15 19:46, Paul Wouters wrote:
> > On Sat, 31 Jan 2015, Chuck Anderson wrote:
> > 
> >> After booting up and re-opening Firefox, restoring 50-100 tabs
> >> causes so much DNS traffic that unbound goes unresponsive, and
> >> queries repeatedly time out for many minutes until things finally
> >> settle down.
> > 
> > Why is that causing timeouts and failures on DNS for you?

I'm unsure why.  It happens even with verbosity set back to 1.

> If unbound was compiled with libevent, it should not have any issues
> coping with the traffic.  But I heard that 'nat boxes' have trouble
> with many connections.  So I do not know how to fix this; the network
> won't allow the amount of traffic you are trying to send ...

That sounds plausible.

After fixing a bug in dnssec-trigger-script that was causing it to
crash (TRUE -> True), the forwarders are now being set properly via
DHCP.  The behavior is the same either way--without any forwarders or
with one forwarder set to 192.168.1.1, a NetGear router with stock
firmware.

I have a CeroWRT router that I'll test next--at least I should be able
to monitor the connection limit to see if that is the problem.