[Unbound-users] Resolving facebook and RTO of 120000
Leo Bush
leo.bush at mylife.lu
Thu Mar 22 08:39:24 UTC 2012
Dear all,
We are using unbound 1.4.14 for DNS resolution. It handles about 5 Mbps of
traffic on average.
For several days we had a problem resolving m.facebook.com. Unbound
persistently returned SERVFAIL, and mobile clients were stuck and
unhappy.
When I looked at the infra cache, I saw the responsible nameservers listed
with an RTO value of 120000, and I could not flush or clear those
entries. Is there a way to do this that I have missed (flush does not work
any more)? In the end I ran "unbound-control reload". Since then
m.facebook.com has worked again (for 3 days now).
What astonishes me is that Unbound did not manage to get out of the
RTO 120000 state by itself. I had a similar problem a few weeks earlier
with the domain www.voipbuster.com, for which the nameserver
ns1.finarea.ch was listed with 120000 in the dump_cache. The only way
to get it working again was unbound-control reload.
After two failures in less than a month, I got curious and looked for
other 120000 entries in the dump cache:
[ ~]# date; unbound-control dump_infra |grep 120000 | sort; echo; sleep 120; date; unbound-control dump_infra |grep 120000 | sort
Thu Mar 22 09:21:04 CET 2012
128.242.103.18 fr.fm. expired rto 120000
128.242.103.32 fr.fm. expired rto 120000
199.7.59.78 ocsp.verisign.net. expired rto 120000
200.74.240.66 globalcccam.com. expired rto 120000
62.212.66.130 uploadhere.com. expired rto 120000
68.232.43.4 cedexis.net. expired rto 120000
69.171.239.10 star.facebook.com. expired rto 120000
69.171.255.10 star.facebook.com. expired rto 120000
Thu Mar 22 09:23:04 CET 2012
128.242.103.18 fr.fm. expired rto 120000
128.242.103.32 fr.fm. expired rto 120000
199.7.59.78 ocsp.verisign.net. expired rto 120000
200.74.240.66 globalcccam.com. expired rto 120000
62.212.66.130 uploadhere.com. expired rto 120000
68.232.43.4 cedexis.net. expired rto 120000
69.171.239.10 star.facebook.com. expired rto 120000
69.171.255.10 star.facebook.com. expired rto 120000
72.51.41.148 bl.csma.biz. ttl 642 ping 0 var 94 rtt 376 rto 120000
ednsknown 0 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
As you can see, I ran the dump twice, 2 minutes apart, and some entries
kept the same RTO (star.facebook.com in particular; I repeated the dump
several more times for it). When I query the authoritative server's IP
directly, I get an instant answer. So I do not think I have a network
problem, especially since everything else works fine.
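The two-snapshot comparison described above can be automated. The following is only a minimal sketch: it assumes the dump_infra output has been saved as text with one entry per line (not wrapped as in this mail), and the function names parse_infra and stuck_in_both are hypothetical helpers, not part of Unbound. The 120000 threshold is simply the value seen in the dumps.

```python
STUCK_RTO_MS = 120000  # value observed in the dumps; treated here as "stuck"


def parse_infra(dump_text):
    """Map (ip, zone) -> rto in ms for each entry line of a dump_infra listing.

    Assumes one entry per line: "<ip> <zone> ... rto <number> ...".
    Lines that do not match this shape are skipped.
    """
    entries = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if "rto" not in fields:
            continue
        idx = fields.index("rto")
        # Need ip and zone before "rto" and a numeric value after it.
        if idx < 2 or idx + 1 >= len(fields) or not fields[idx + 1].isdigit():
            continue
        entries[(fields[0], fields[1])] = int(fields[idx + 1])
    return entries


def stuck_in_both(first_dump, second_dump, threshold=STUCK_RTO_MS):
    """Return sorted (ip, zone) pairs at or above the threshold in both dumps."""
    first = parse_infra(first_dump)
    second = parse_infra(second_dump)
    return sorted(
        key
        for key, rto in first.items()
        if rto >= threshold and second.get(key, 0) >= threshold
    )
```

Feeding it the text of two dumps taken a few minutes apart lists the entries that did not recover in between. Entries that appear only in the second dump (such as bl.csma.biz. above) are deliberately left out.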
[ ~]# dig star.facebook.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 16121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;star.facebook.com. IN A
;; Query time: 165 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 22 09:25:38 2012
[ ~]# dig @69.171.239.10 star.facebook.com. +norec
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47559
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;star.facebook.com. IN A
;; ANSWER SECTION:
star.facebook.com. 30 IN A 66.220.158.72
;; Query time: 10 msec
;; SERVER: 69.171.239.10#53(69.171.239.10)
;; WHEN: Thu Mar 22 09:26:02 2012
I also checked the authoritative IP for facebook in the dump. It shows
plenty of healthy entries besides star.facebook.com.
[ ~]# unbound-control dump_infra | grep 69.171.239.10
69.171.239.10 touch.facebook.com. ttl 854 ping 3 var 46 rtt 187 rto 187
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 developers.facebook.com. ttl 886 ping 1 var 74 rtt 297 rto
297 ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 orcart.facebook.com. ttl 890 ping 1 var 73 rtt 293 rto 293
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 check6.facebook.com. ttl 879 ping 1 var 74 rtt 297 rto 297
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 www.facebook.com. ttl 888 ping 1 var 73 rtt 293 rto 293
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 staging.channel.facebook.com. ttl 898 ping 1 var 74 rtt
297 rto 297 ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 wild.facebook.com. ttl 890 ping 1 var 74 rtt 297 rto 297
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 chat.facebook.com. ttl 838 ping 6 var 17 rtt 74 rto 74
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 api-read.facebook.com. ttl 891 ping 1 var 74 rtt 297 rto
297 ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 check4.facebook.com. ttl 879 ping 3 var 47 rtt 191 rto 191
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 apps.facebook.com. ttl 891 ping 1 var 74 rtt 297 rto 297
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 graph.facebook.com. ttl 894 ping 1 var 75 rtt 301 rto 301
ednsknown 1 edns 0 delay 0 lame dnssec 0 rec 0 A 0 other 0
69.171.239.10 star.facebook.com. expired rto 120000
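The pattern in this listing (one address healthy for many zones but stuck for a single one) can be searched for across the whole dump. This is a rough sketch under the same assumptions as before: the dump text has one entry per line, 120000 is used as the "stuck" threshold because that is the value seen here, and summarize_by_ip and partially_stuck are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def summarize_by_ip(dump_text, stuck_rto_ms=120000):
    """Group zones per IP into "ok" and "stuck" lists based on the rto value.

    Assumes one entry per line: "<ip> <zone> ... rto <number> ...".
    """
    summary = defaultdict(lambda: {"ok": [], "stuck": []})
    for line in dump_text.splitlines():
        fields = line.split()
        if "rto" not in fields:
            continue
        idx = fields.index("rto")
        # Skip lines without ip/zone before "rto" or a number after it.
        if idx < 2 or idx + 1 >= len(fields) or not fields[idx + 1].isdigit():
            continue
        ip, zone, rto = fields[0], fields[1], int(fields[idx + 1])
        bucket = "stuck" if rto >= stuck_rto_ms else "ok"
        summary[ip][bucket].append(zone)
    return dict(summary)


def partially_stuck(summary):
    """Return IPs that answer fine for some zones but are stuck for others."""
    return sorted(ip for ip, zones in summary.items()
                  if zones["ok"] and zones["stuck"])
```

An address reported by partially_stuck is reachable for at least one zone, which makes a plain network problem towards that address less likely.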
Has anybody seen a similar phenomenon (IPs stuck at expired rto
120000)? Do you have an idea when and why this happens and how I should
deal with it? Unbound reloads penalize the other users.
Thanks for your thoughts.
kind regards
Leo Bush