Open Bug 1122907 Opened 5 years ago Updated 8 months ago
Slow DNS lookup/connection timings on 64 bit Linux
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Build ID: 20150108202552

Steps to reproduce:

Download official 64 bit Linux binaries.
Extract to a temporary area ~/Temp/ff1
Execute the command: ~/Temp/ff1/firefox --no-remote --profile ~/Temp/ffp
Access any website (in my case: www.joemygod.com)

Download official 32 bit Linux binaries.
Extract to a temporary area ~/Temp/ff2
Execute the command: ~/Temp/ff2/firefox --no-remote --profile ~/Temp/ffp
Access same website

Download source code, install dependencies.
Configure: ../configure --enable-release
Compile: make -j6
Package: make package
Extract package to a temporary area: ~/Temp/ff3
Execute the command: ~/Temp/ff3/firefox --no-remote --profile ~/Temp/ffp
Access same website

Actual results:

When using the official 64 bit binaries, DNS lookup is slow and the page takes 30+ seconds to really be ready. The network page of the developer tools shows some long times for DNS lookup and connecting, 5-10 seconds in some cases.

When using the official 32 bit binaries, connections are fast and most lookups take less than 100 milliseconds.

When using the locally compiled 64 bit binaries, I get the same issues, long lookup/connection times in some cases.

Expected results:

The 64 bit binary should be just as fast at DNS lookup/connections as the 32 bit binary.
Additional Details:

System: Debian Wheezy 64 bit
RAM: 12GB
CPU: 3.4GHz 6 core
Firefox Version: 35 (and previous versions had the same problem, I don't recall when it started)
Local services: I'm running a caching name server (ISC BIND) which forwards queries to 126.96.36.199 and 188.8.131.52.

Additional Test:

Clear BIND cache: /etc/init.d/bind9 restart
Log queries: rndc querylog
View queries: tail -f /var/log/syslog
Run 64 bit Firefox and access the same site, observe log

The result seems to be that the query requests occur slowly, one at a time, sometimes with pauses in between, no more than one request per second.

Clear BIND cache: /etc/init.d/bind9 restart
Log queries: rndc querylog
View queries: tail -f /var/log/syslog
Run 32 bit Firefox and access the same site, observe log

The log shows queries fly by pretty quickly, sometimes 5 or more per second.

I repeated the tests to ensure the speedup in the 32 bit test wasn't somehow caused by the previous lookups in the 64 bit tests (even though I restart BIND in between to clear its cache). The results are the same: slow DNS lookups on 64 bit, fast on 32 bit.

Notes:

I can compile on my machine, so if there is any request for me to tweak something in the source, recompile, and see if it helps or resolves the issue, I can do that.
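To put numbers on the "one request per second" versus "5 or more per second" observation, the query log could be tallied per second. The following is only a rough sketch: it assumes the traditional syslog timestamp prefix ("Mmm dd hh:mm:ss", 15 characters) and that BIND query lines contain the text " query: ". The function name and the sample lines are made up for illustration and are not taken from the reporter's log.

```python
from collections import Counter


def queries_per_second(lines):
    """Count BIND query log lines per syslog timestamp (1-second resolution)."""
    counts = Counter()
    for line in lines:
        # Assumption: BIND query log entries contain " query: ".
        if " query: " not in line:
            continue
        # Assumption: the line starts with a 15-character syslog timestamp.
        counts[line[:15]] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "Jan 15 10:00:01 host named[123]: client 127.0.0.1#5001 (a.example): query: a.example IN A + (127.0.0.1)",
    "Jan 15 10:00:01 host named[123]: client 127.0.0.1#5002 (b.example): query: b.example IN AAAA + (127.0.0.1)",
    "Jan 15 10:00:02 host named[123]: client 127.0.0.1#5003 (c.example): query: c.example IN A + (127.0.0.1)",
    "Jan 15 10:00:02 host cron[77]: unrelated line",
]

for stamp, count in sorted(queries_per_second(sample).items()):
    print(stamp, count)
```

On a real system the input would be the lines of /var/log/syslog captured during each test run, so the 64 bit and 32 bit runs can be compared side by side.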
After a little bit of extra testing, detailed below, this may not be a Firefox bug. However, it only seems to affect Firefox 64 bit. Firefox 32 bit and Chromium don't have this slow loading issue, so maybe it still is a Firefox 64 bit issue.

This is a line from my original /etc/nsswitch.conf file:

hosts: files wins mdns4_minimal [NOTFOUND=return] dns mdns4

After I remove just the wins entry:

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

64 bit Firefox requests are just as fast as 32 bit requests.

The wins entry is used to discover other Windows computers on the network by name. It basically sends out a broadcast for the computer name, and the system with that name reports its IP address. I can disable it because mdns4_minimal can serve the same purpose (although it requires me to use hostname.local).

But it is still curious why removing that one entry speeds up Firefox 64 bit, while other browsers and Firefox 32 bit had no problem with it.
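For anyone wanting to check whether their own machine has the same ordering, here is a small sketch that reads the hosts line of an nsswitch.conf and reports whether wins is consulted before dns. The helper names are hypothetical, and the parsing is deliberately simple (it only skips bracketed [STATUS=action] items and trailing comments).

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, skipping [STATUS=action] items."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


def consulted_before(sources, name, other="dns"):
    """True if `name` appears in the source list ahead of `other`."""
    return (
        name in sources
        and other in sources
        and sources.index(name) < sources.index(other)
    )


original = "hosts: files wins mdns4_minimal [NOTFOUND=return] dns mdns4"
modified = "hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4"

print(hosts_sources(original))
print(consulted_before(hosts_sources(original), "wins"))
print(consulted_before(hosts_sources(modified), "wins"))
```

To check a live system, pass the contents of /etc/nsswitch.conf to hosts_sources instead of the two literal lines.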
I also see what could be DNS lookups failing to complete. Saw this in Firefox 34 and 35, and I'm using 64bit Lubuntu Linux. Filed bug 1116476 about it.
is this still an issue?
Assignee: nobody → valentin.gosu
Whiteboard: [presto][necko-active] → [necko-would-take]
(In reply to Patrick McManus [:mcmanus] from comment #4)
> is this still an issue?

Patrick, Brian seems to be gone. Perhaps this impacts your perspective, given the bug status is still unconfirmed.
Flags: needinfo?(brianvanderburg) → needinfo?(mcmanus)
Thanks a lot :swu for trying out the issue as you mentioned at the (wrongly closed) bug report 417689. Strange that you were not able to reproduce it.

I've just made another test with an Ubuntu 17.04 64-bit Live System (i.e., no installation needed), which includes a 64-bit Firefox 52.0.1. Basically the same as I reported at #417689:
Loading google.de for the first time took 15 seconds.
After setting network.dns.disableIPv6, loading google.at took 2 seconds.
After clearing network.dns.disableIPv6 again to default state, loading google.it took 12 seconds.
As mentioned earlier, most of the time appears to be lost waiting for DNS resolution. It was claimed by @badger (see bug 417689) that Firefox uses both IPv4 and IPv6 in parallel (Happy Eyeballs). If so, my suspicion is that the implementation is buggy in the sense that it waits for both to terminate or time out before returning, in this case, the result of the IPv4 lookup (which is the only one working on vanilla 64-bit Linux).
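To make the expected behaviour concrete, here is a minimal sketch of "return as soon as one lookup succeeds" rather than "wait for both". It is not Firefox's implementation and makes no claim about how Necko resolves names. The function first_successful and the two stand-in lookups are hypothetical, with sleeps simulating a fast IPv4 answer and a slow, empty IPv6 answer.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def first_successful(lookups):
    """Run the lookups concurrently; return (name, result) of the first non-empty one."""
    pool = ThreadPoolExecutor(max_workers=len(lookups))
    try:
        pending = {pool.submit(fn): name for name, fn in lookups.items()}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                name = pending.pop(fut)
                if fut.exception() is None and fut.result():
                    return name, fut.result()
        return None, []  # every lookup failed or came back empty
    finally:
        # Do not block on the slower lookup once a winner is known.
        pool.shutdown(wait=False)


def fast_v4():
    time.sleep(0.05)  # simulated quick A response
    return ["192.0.2.1"]


def slow_v6():
    time.sleep(0.5)  # simulated AAAA query that never gets a useful answer
    return []


start = time.monotonic()
winner, addrs = first_successful({"IPv4": fast_v4, "IPv6": slow_v6})
elapsed = time.monotonic() - start
print(winner, addrs, round(elapsed, 2))
```

In this sketch the caller gets the IPv4 result after roughly the fast lookup's delay, while the slow lookup is left to finish in the background. There is no overall timeout here, so a real implementation would need one for the case where neither lookup ever returns.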
Thanks David for the information. It's great you can always reproduce this issue. Could you help to capture the log? https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
(In reply to David von Oheimb from comment #6)
> I've just made another test with an Ubuntu 17.04 64-bit Live System (i.e.,
> no installation needed), which includes a 64-bit Firefox 52.0.1. Basically
> the same as I reported at #417689:
> Loading google.de for the first time took 15 seconds.
> After setting network.dns.disableIPv6, loading google.at took 2 seconds.
> After clearing network.dns.disableIPv6 again to default state, loading
> google.it took 12 seconds.

One thing, probably you hit the condition in comment 2?
Here is the requested HTTP log exhibiting the undue delays loading google.pt.
For comparison, swift loading of google.es when network.dns.disableIPv6 is set.
(In reply to David von Oheimb from comment #10)
> Created attachment 8883674 [details]
> HTTP log accessing google.pt with many seconds of delay due to IPv6 DNS bug
>
> Here is the requested HTTP log exhibiting the undue delays loading google.pt.

It takes 5 seconds to resolve google.pt by getaddrinfo in the IPv6-enabled case. After google.pt was resolved and cached, for an unknown reason, it failed to get the entry from the cache and tried to resolve again, which took another 5 seconds. I cannot see how this happens from the log.

Valentin, do you have any idea?
Maybe this effect (DNS cache not working) was because I also fiddled with other settings, as suggested by others, while trying out workarounds for the slow DNS. Yet this is not actually the problem we are after here. The actual problem is that when both IPv4 and IPv6 are enabled and Mozilla uses both DNS lookups in parallel, as soon as one of them gives a positive response (in this case, likely IPv4), Mozilla should not wait for the other (IPv6). Maybe the OS behavior is somewhat strange, but Mozilla should be able to cope with this (which is the case for the 32-bit version).
Do you have the wins entry in your /etc/nsswitch.conf as mentioned in comment 2?
No. (Yet there may be some other race condition between IPv4 and IPv6 on the OS side.) Can you confirm from the logs I provided that Mozilla actually starts both IPv4 and IPv6 lookup requests (in parallel)? How much time thereafter does the first response arrive, and which one is it (IPv4 or IPv6)?
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
So this issue seems to be caused by getaddrinfo with AF_UNSPEC, as it waits for both the IPv4 and IPv6 responses. Sometimes, due to the way some networks are configured, the IPv6 response never arrives, meaning that getaddrinfo hangs until the system timeout expires.

As pointed out earlier, one solution would be to disable IPv6 via the pref (network.dns.disableIPv6).
Another would be to configure a timeout in resolv.conf: http://man7.org/linux/man-pages/man5/resolver.5.html
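One way to check whether a given machine is affected, outside of Firefox, is to time the same lookup with AF_UNSPEC and with AF_INET. The sketch below uses Python's socket.getaddrinfo, which wraps the system resolver. The helper name time_lookup and its resolver parameter are hypothetical; the parameter exists only so the function can be exercised with a stand-in resolver.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (seconds elapsed, number of results) for one lookup of `host`."""
    start = time.monotonic()
    try:
        results = resolver(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # resolution failed; still report how long it took
    return time.monotonic() - start, len(results)


def compare_families(host):
    """Time the same host with AF_UNSPEC (A and AAAA) and AF_INET (A only)."""
    return {
        "AF_UNSPEC": time_lookup(host, socket.AF_UNSPEC),
        "AF_INET": time_lookup(host, socket.AF_INET),
    }
```

Calling compare_families with a host that is not yet cached should show a multi-second AF_UNSPEC time and a much shorter AF_INET time on an affected system, assuming the delay really does come from the unanswered IPv6 query. For the resolv.conf workaround, resolv.conf(5) documents "options timeout:n" and "options attempts:n"; which values are appropriate depends on the network, so none are suggested here.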
Assignee: valentin.gosu → nobody
Status: ASSIGNED → NEW
Whiteboard: [necko-would-take] → [necko-would-take][workaround in comment 17]