[Unbound-users] Problem to resolve domains from a certain registrar
leo.bush at mylife.lu
Thu Sep 29 14:25:56 UTC 2011
Thank you Wouter for your last answer. It pushed me to contact the
operator in question, but we found no new hint until today, when I
found the following explanation for the problem:
- For weeks, our unbound resolver has been receiving a request for
A www.coolbox.be every minute from a device in our network.
- unbound tries to get the answer from ns1.register.be or
ns2.register.be -> in both cases: no answer -> timeout -> rto climbs
quickly to 120000
- In parallel, our unbound server gets various requests for domains
hosted at ns1.register.be or ns2.register.be. Normally they all get
answered quickly. But we notice that once the rto has reached 120000
(because of www.coolbox.be), unbound no longer tries to contact the
remote authoritative servers and only returns SERVFAILs, even though an
answer would be available. The registrar's whole nameserver farm is
blacklisted because one zone is no longer working.
- This explains why I noticed the error only for ns1.register.be and
ns2.register.be and not for ns3.register.be: coolbox.be is not
delegated to the third server.
- This also explains why resolution occasionally works correctly but
fails most of the time (it works in the short periods before the
nameservers are blacklisted again).
- I would say unbound applies its negative caching badly to two
nameservers that respond only when they have something to answer. If
they are asked about things they no longer know, they do not answer at
all (no REJECT). This penalizes all communication between the (unbound)
resolver and the authoritative servers.
- I found the following text in RFC 2308:

    7 - Other Negative Responses
    ... not covered by any existing RFC.

    7.1 Server Failure (OPTIONAL)
    a resolver MAY cache a server failure response. If it
    does so it MUST NOT cache it for longer than five (5) minutes, and it
    MUST be cached against the specific query tuple <query name, type,
    class, server IP address>

    7.2 Dead / Unreachable Server (OPTIONAL)
    Dead / Unreachable servers are servers that fail to respond in any
    way to a query or where the transport layer has provided an
    indication that the server does not exist or is unreachable. A
    server may be deemed to be dead or unreachable if it has not
    responded to an outstanding query within 120 seconds.

    Examples of transport layer indications are:
    ICMP error messages indicating host, net or port unreachable.
    IP stack error messages providing similar indications to those above.

    A server MAY cache a dead server indication. If it does so it MUST
    NOT be deemed dead for longer than five (5) minutes. The indication
    MUST be stored against query tuple <query name, type, class, server
    IP address> unless there was a transport layer indication that the
    server does not exist, in which case it applies to all queries to
    that specific IP address.
- Can you tell me whether my interpretation is correct: do requests
that go unanswered make unbound blacklist the whole server, so that it
does not even query for working domains that would get an answer? Does
unbound cache over the complete tuple <query name, type, class, server
IP address>?
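
To make the question concrete, here is a minimal sketch of the
tuple-keyed dead-server cache that RFC 2308 section 7.2 describes. It is
not unbound's code: the class name, the method names and the address
192.0.2.1 (a documentation address, not the real address of any
register.be server) are made up for illustration. With this keying, a
timeout for www.coolbox.be would not affect other names served from the
same IP address.

```python
import time

DEAD_TTL = 300.0  # RFC 2308 section 7.2: deemed dead for at most five minutes


class DeadServerCache:
    """Hypothetical cache of dead-server indications, keyed by the full
    query tuple <query name, type, class, server IP address>."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._dead = {}  # (qname, qtype, qclass, server_ip) -> expiry time

    @staticmethod
    def _key(qname, qtype, qclass, server_ip):
        # DNS names compare case-insensitively.
        return (qname.lower(), qtype, qclass, server_ip)

    def mark_dead(self, qname, qtype, qclass, server_ip):
        """Record that this server did not answer this particular query."""
        key = self._key(qname, qtype, qclass, server_ip)
        self._dead[key] = self._clock() + DEAD_TTL

    def is_dead(self, qname, qtype, qclass, server_ip):
        """True only while an unexpired entry exists for this exact tuple."""
        key = self._key(qname, qtype, qclass, server_ip)
        expiry = self._dead.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._dead[key]  # entry expired, allow a retry
            return False
        return True


cache = DeadServerCache()
cache.mark_dead("www.coolbox.be", "A", "IN", "192.0.2.1")
print(cache.is_dead("www.coolbox.be", "A", "IN", "192.0.2.1"))
print(cache.is_dead("www.example.be", "A", "IN", "192.0.2.1"))
```

A cache keyed on the server IP address alone would report the second
name as dead too, which matches the behaviour I think I am seeing.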
On 08/09/2011 20:13, W.C.A. Wijngaards wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> Hi Leo,
> I do not have a solution for you, but wanted to help read the output.
> An rto value of 120000 means timeouts. This means that the host is
> timing out, and it does not reply to you.
> rto: roundtrip-time-out value. The roundtrip value modified by
> exponential backoff due to timeouts. The 'ping' time would be the
> pingtime when it does respond to you (msec).
> The leonidas.be servers seem to have blacklisted you? Or some firewall
> or other script is throttling traffic to zero for you? So, it works for
> a bit, then it blacklists you, and it stops working and times out.
> Best regards, Wouter
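
The exponential backoff described in the quoted reply can be
illustrated with a small sketch. The 120000 ms ceiling is the value
seen in the output above; the doubling factor and the 400 ms starting
value are assumptions chosen for illustration and are not taken from
unbound's source.

```python
RTO_CAP_MS = 120000  # ceiling observed in the output discussed above


def backoff(rto_ms, cap_ms=RTO_CAP_MS):
    """Double the timeout after a lost reply, never exceeding the cap.
    (Doubling is an assumed backoff factor.)"""
    return min(rto_ms * 2, cap_ms)


def timeouts_until_cap(initial_rto_ms, cap_ms=RTO_CAP_MS):
    """Count the consecutive timeouts needed for the rto to reach the cap."""
    if initial_rto_ms <= 0:
        raise ValueError("initial rto must be positive")
    rto, count = initial_rto_ms, 0
    while rto < cap_ms:
        rto = backoff(rto, cap_ms)
        count += 1
    return count


# Hypothetical starting value of 400 ms.
print(timeouts_until_cap(400))
```

Under these assumptions only a handful of unanswered queries are needed
to hit the ceiling, so one query per minute for a dead name is enough
to keep the value pinned there.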