Closed Bug 1122907 Opened 10 years ago Closed 1 month ago

Slow DNS lookup/connection timings on 64 bit Linux.

Categories

(Core :: Networking: DNS, defect, P2)

35 Branch
x86_64
Linux
defect
Points:
5

Tracking

()

RESOLVED FIXED
133 Branch
Tracking Status
firefox133 --- fixed

People

(Reporter: brianvanderburg, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue][workaround in comment 17])

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Build ID: 20150108202552

Steps to reproduce:

Download official 64 bit Linux binaries.
Extract to a temporary area ~/Temp/ff1
Execute the command: ~/Temp/ff1/firefox --no-remote --profile ~/Temp/ffp
Access any website (in my case: www.joemygod.com)

Download official 32 bit Linux binaries.
Extract to a temporary area ~/Temp/ff2
Execute the command: ~/Temp/ff2/firefox --no-remote --profile ~/Temp/ffp
Access same website

Download source code, install dependencies.
Configure: ../configure --enable-release
Compile: make -j6
Package: make package
Extract package to a temporary area: ~/Temp/ff3
Execute the command: ~/Temp/ff3/firefox --no-remote --profile ~/Temp/ffp
Access same website

Actual results:

When using the official 64 bit binaries, DNS look up is slow and the page takes 30+ seconds to really be ready. The network page of the developer tools shows some long times for DNS lookup and connecting, 5-10 seconds in some cases.

When using the official 32 bit binaries, connections are fast and most lookups are less than 100 milliseconds.

When using the locally compiled 64 bit binaries, I get the same issues, long lookup/connection times in some cases.

Expected results:

The 64 bit binary should be just as fast at DNS lookup/connections as the 32 bit binary.
Additional Details:

System: Debian Wheezy 64 bit
RAM: 12GB
CPU: 3.4GHz 6 core
Firefox Version: 35 (and previous versions had the same problem, I don't recall when it started)
Local services: I'm running a caching name server (ISC BIND) which forwards queries to 8.8.8.8 and 8.8.4.4.

Additional Test:

Clear BIND cache: /etc/init.d/bind9 restart
Log queries: rndc querylog
View queries: tail -f /var/log/syslog
Run 64 bit Firefox and access the same site, observe log

The result seems to be that the query requests occur slowly, one at a time, sometimes with pauses in between, no more than one request per second.

Clear BIND cache: /etc/init.d/bind9 restart
Log queries: rndc querylog
View queries: tail -f /var/log/syslog
Run 32 bit Firefox and access the same site, observe log

The log shows queries flying by pretty quickly, sometimes 5 or more per second.

I repeated the tests to ensure the speedup from the 32 bit test wasn't somehow because of the previous lookups in the 64 bit tests (even though I restart BIND in between to clear its cache). The results are the same: slow DNS lookups on 64 bit, fast on 32 bit.

Notes:

I can compile on my machine, so if there is any request for me to tweak something in the source, recompile, and see if it helps or resolves the issue, I can do that.
After a little bit of extra testing detailed below, this may not be a Firefox bug. However, it only seems to affect Firefox 64 bit. (Firefox 32 bit and Chromium don't have this slow loading issue, so maybe it still is a Firefox 64 bit issue.)

This is a line from my original /etc/nsswitch.conf file:

hosts: files wins mdns4_minimal [NOTFOUND=return] dns mdns4

After I remove just the wins entry:

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

64 bit Firefox requests are just as fast as 32 bit requests.

The wins entry is used to discover other Windows computers on the network by name. It basically sends out a broadcast for the computer name, and the system with that name reports its IP address. I can disable it because mdns4_minimal can serve the same purpose (although requiring me to use hostname.local). But it is still curious why removing that one entry speeds up Firefox 64 bit, while other browsers and Firefox 32 bit had no problem with it.
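For anyone who wants to check whether their own system has a similar lookup chain, here is a minimal sketch in Python. It assumes the usual nsswitch.conf layout shown above; `hosts_sources` and `sources_before_dns` are hypothetical helper names made up for this example, not part of any library. It only lists the sources that are tried before dns, it does not measure anything.

```python
import re

# Sample taken from the hosts line quoted in the comment above.
SAMPLE = "hosts: files wins mdns4_minimal [NOTFOUND=return] dns mdns4\n"


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order.

    Bracketed action items such as [NOTFOUND=return] are dropped.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            body = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return body.split()
    return []


def sources_before_dns(nsswitch_text):
    """Sources that are consulted before 'dns' on the hosts line."""
    sources = hosts_sources(nsswitch_text)
    if "dns" in sources:
        return sources[:sources.index("dns")]
    return sources


if __name__ == "__main__":
    print(sources_before_dns(SAMPLE))
```

To inspect a real system, the contents of /etc/nsswitch.conf can be read and passed to `sources_before_dns` instead of `SAMPLE`.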
Component: Untriaged → Networking: DNS
Product: Firefox → Core
I also see what could be DNS lookups failing to complete. Saw this in Firefox 34 and 35, and I'm using 64bit Lubuntu Linux. Filed bug 1116476 about it.
is this still an issue?
Assignee: nobody → valentin.gosu
Flags: needinfo?(brianvanderburg)
Whiteboard: [presto]
Whiteboard: [presto] → [presto][necko-active]
Whiteboard: [presto][necko-active] → [necko-would-take]
(In reply to Patrick McManus [:mcmanus] from comment #4)
> is this still an issue?

Patrick, Brian seems to be gone. Perhaps this impacts your perspective, given the bug status is still unconfirmed.
Flags: needinfo?(brianvanderburg) → needinfo?(mcmanus)
See Also: → 417689
See Also: → 1079217
Thanks a lot :swu for trying out the issue as you mentioned at the (wrongly closed) bug report 417689. Strange that you were not able to reproduce it.

I've just made another test with an Ubuntu 17.04 64-bit Live System (i.e., no installation needed), which includes a 64-bit Firefox 52.0.1. Basically the same as I reported at #417689:
Loading google.de for the first time took 15 seconds.
After setting network.dns.disableIPv6, loading google.at took 2 seconds.
After clearing network.dns.disableIPv6 again to default state, loading google.it took 12 seconds.
As mentioned earlier, most of the time appears to be lost waiting for DNS resolution. It was claimed by @badger (see bug 417689) that Firefox uses both IPv4 and IPv6 in parallel (Happy Eyeballs). If so, my suspicion is that the implementation is buggy in the sense that it waits for both to terminate or time out before returning the result, in this case that of the IPv4 lookup (which is the only one working on vanilla 64-bit Linux).
Thanks David for the information. It's great you can always reproduce this issue. Could you help to capture the log? https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Flags: needinfo?(mueller8)
(In reply to David von Oheimb from comment #6)
> I've just made another test with an Ubuntu 17.04 64-bit Live System (i.e.,
> no installation needed), which includes a 64-bit Firefox 52.0.1. Basically
> the same as I reported at #417689:
> Loading google.de for the first time took 15 seconds.
> After setting network.dns.disableIPv6, loading google.at took 2 seconds.
> After clearing network.dns.disableIPv6 again to default state, loading
> google.it took 12 seconds.

One thing, probably you hit the condition in comment 2?
Here is the requested HTTP log exhibiting the undue delays loading google.pt.
Flags: needinfo?(mueller8)
For comparison, swift loading of google.es when network.dns.disableIPv6 is set.
(In reply to David von Oheimb from comment #10)
> Created attachment 8883674 [details]
> HTTP log accessing google.pt with many seconds of delay due to IPv6 DNS bug
>
> Here is the requested HTTP log exhibiting the undue delays loading google.pt.

It takes 5 seconds to resolve google.pt by getaddrinfo in the IPv6-enabled case. After google.pt was resolved and cached, for an unknown reason, it failed to get the entry from the cache and tried to resolve again, which took another 5 seconds. I cannot see how this happens from the log.

Valentin, do you have any idea?
Flags: needinfo?(valentin.gosu)
Maybe this effect (DNS cache not working) occurred because I also fiddled with other settings, as suggested by others, while trying out workarounds for the slow DNS. Yet this is not actually the problem we are after here. The actual problem is that when both IPv4 and IPv6 are enabled and Mozilla uses both DNS lookups in parallel, as soon as one of them gives a positive response (in this case, likely IPv4), Mozilla should not wait for the other (IPv6). Maybe the OS behavior is somewhat strange, but Mozilla should be able to cope with this (which is the case for the 32-bit version).
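The behaviour asked for above ("do not wait for the other lookup once one has answered") can be illustrated with a small Python sketch. This is not Firefox's resolver code: `fake_lookup` and `resolve_first` are hypothetical names, and the lookups are simulated with sleeps so that the timing difference is visible without touching the network.

```python
import concurrent.futures as cf
import time


def fake_lookup(delay, result):
    """Stand-in for a per-family DNS lookup: sleep, then return `result`."""
    time.sleep(delay)
    return result


def resolve_first(lookups):
    """Run all lookups in parallel and return the first non-empty answer."""
    if not lookups:
        return []
    pool = cf.ThreadPoolExecutor(max_workers=len(lookups))
    futures = [pool.submit(fn) for fn in lookups]
    try:
        for fut in cf.as_completed(futures):
            try:
                answer = fut.result()
            except Exception:
                continue  # a failed lookup must not block the others
            if answer:
                return answer
        return []
    finally:
        # Return immediately; do not wait for lookups that are still pending.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    start = time.monotonic()
    answer = resolve_first([
        lambda: fake_lookup(0.05, ["192.0.2.1"]),  # simulated IPv4: fast
        lambda: fake_lookup(0.5, []),              # simulated IPv6: stalls, then empty
    ])
    print(answer, f"{time.monotonic() - start:.2f}s")
```

A caller that instead collected both results before returning would always pay for the slower of the two lookups, which matches the delays described in this bug.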
Do you have the wins entry in your /etc/nsswitch.conf as mentioned in comment 2?
Flags: needinfo?(mueller8)
No. (Yet there may be some other race condition between IPv4 and IPv6 on the OS side.) Can you confirm from the logs I provided that Mozilla actually starts both IPv4 and IPv6 lookup requests (in parallel)? How long thereafter does the first response arrive, and which one is it (IPv4 or IPv6)?
Flags: needinfo?(mueller8)
Priority: -- → P5
Flags: needinfo?(valentin.gosu)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
So this issue seems to be caused by getaddrinfo with AF_UNSPEC, as it waits for both the IPv4 and IPv6 responses. Sometimes, due to the way some networks are configured, the IPv6 response never arrives, meaning that getaddrinfo hangs until the system timeout expires.
As pointed out earlier, one solution would be to disable IPv6 via the pref (network.dns.disableIPv6).
Another would be to configure a timeout in resolv.conf: http://man7.org/linux/man-pages/man5/resolver.5.html
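One way to see whether a given machine is affected is to time the same blocking lookup once per address family. A rough sketch using Python's `socket.getaddrinfo`, which wraps the system resolver; `time_lookup` and `compare_families` are hypothetical helper names, and the `resolver` parameter exists only so the functions can be exercised without a network. Results depend on OS and resolver caching, so treat the numbers as indicative only.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (seconds, address_count) for one blocking lookup of `host`."""
    start = time.monotonic()
    try:
        results = resolver(host, None, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # no address for this family
    return time.monotonic() - start, len(results)


def compare_families(host, resolver=socket.getaddrinfo):
    """Time the same lookup with AF_UNSPEC, AF_INET and AF_INET6."""
    families = {
        "AF_UNSPEC": socket.AF_UNSPEC,
        "AF_INET": socket.AF_INET,
        "AF_INET6": socket.AF_INET6,
    }
    return {name: time_lookup(host, fam, resolver) for name, fam in families.items()}


if __name__ == "__main__":
    for name, (seconds, count) in compare_families("localhost").items():
        print(f"{name:10s} {seconds:.3f}s  {count} address(es)")
```

On an affected network, one would expect the AF_UNSPEC and AF_INET6 rows for an external host name to take several seconds while AF_INET returns quickly.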
Assignee: valentin.gosu → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mcmanus)
Whiteboard: [necko-would-take] → [necko-would-take][workaround in comment 17]

In 84.0 this is solved, don't know if on purpose or by chance.

Severity: normal → S3

I'm on the latest version of Firefox (v125.0.1) and this has not been resolved for me.
I would like to note that I am currently running macOS 14.3.

As previous users have noted, disabling IPv6 via network.dns.disableIPv6 makes pages start loading near instantly, in contrast to the pref being in its default state (IPv6 enabled), which adds several seconds of delay (about 10 seconds).

Ah, thank you for the comment.
This does seem to be a problem, and fixing it is also likely to fix bug 1664492. [1]
I think we can just make sure to temporarily disable IPv6 on a network when the NetworkConnectivityChecker indicates there's no IPv6. That should work around the issue.

[1] https://www.reddit.com/r/PFSENSE/comments/4wtqad/slow_dns_only_on_android_devices/

Blocks: necko-perf
Priority: P5 → P2
See Also: → 1664492
Whiteboard: [necko-would-take][workaround in comment 17] → [necko-triaged][necko-priority-next][necko-priority-review][workaround in comment 17]
Whiteboard: [necko-triaged][necko-priority-next][necko-priority-review][workaround in comment 17] → [necko-triaged][necko-priority-next][workaround in comment 17]

We should also make sure that when making the decision on whether to disable IPv6, the channels and DNS requests we use for connectivity checker are not affected.

Points: --- → 5
Assignee: nobody → valentin.gosu

This is a potential performance optimization for networks that don't
have IPv6 connectivity.

Depends on D212105
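To make the idea behind this patch concrete, here is a small Python sketch of a "does this device have any non-local IPv6 address" check. It is not the code from the patch: it assumes that "non-local" means neither loopback nor link-local (nor unspecified), and the function names are hypothetical. The real implementation may use different criteria.

```python
import ipaddress


def is_non_local_ipv6(addr):
    """True if `addr` is an IPv6 address that is not loopback or link-local.

    Assumption for this sketch: anything else (including unique local
    addresses) counts as non-local.
    """
    try:
        # Drop a zone id such as "%eth0" before parsing.
        ip = ipaddress.ip_address(addr.split("%", 1)[0])
    except ValueError:
        return False
    if ip.version != 6:
        return False
    return not (ip.is_loopback or ip.is_link_local or ip.is_unspecified)


def has_non_local_ipv6(addresses):
    """True if any address in the iterable is a non-local IPv6 address."""
    return any(is_non_local_ipv6(a) for a in addresses)


if __name__ == "__main__":
    # Only loopback and link-local IPv6: AAAA lookups could be skipped.
    print(has_non_local_ipv6(["127.0.0.1", "::1", "fe80::1%eth0"]))
    # A global-scope style address (documentation prefix used here).
    print(has_non_local_ipv6(["fe80::1", "2001:db8::10"]))
```

With a check like this, a resolver could request only IPv4 results when the function returns false, which is the optimization the commit message describes.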

Whiteboard: [necko-triaged][necko-priority-next][workaround in comment 17] → [necko-triaged][necko-priority-queue][workaround in comment 17]
See Also: 1079217
Attachment #9404759 - Attachment description: WIP: Bug 1122907 - Check if device has non-local IPv6 addresses r=#necko → Bug 1122907 - Check if device has non-local IPv6 addresses r=#necko

Looking at webrtc test failures:
https://treeherder.mozilla.org/jobs?repo=try&revision=e3e0950f1c32425ffd1bb0001bdb10514bac0f78&selectedTaskRun=H19szTLyR6e6RE5tGkO56Q.0
in test_peerConnection_gatherWithSetConfiguration.html | Should have two srflx candidates with redirect rule: => iceServers: [{"urls":["stun:127.0.0.1,127.0.0.1"]}] - got +0, expected 2

None of the values in nsIDNSService::DNSFlags that are greater than 1 << 15
currently have any impact on the behaviour of GetAddrInfo, but if we wanted
to define others, those bits might get truncated.
It is better just to keep the same type all through the function call pipeline.
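The truncation concern can be shown with plain integer masking. A short Python sketch; the flag values below are made up for illustration and are not the real nsIDNSService::DNSFlags values.

```python
UINT16_MASK = 0xFFFF
UINT32_MASK = 0xFFFFFFFF

LOW_FLAG = 1 << 3    # hypothetical flag that fits in 16 bits
HIGH_FLAG = 1 << 16  # hypothetical flag above bit 15


def pass_as_uint16(flags):
    """Model passing `flags` through a uint16_t parameter."""
    return flags & UINT16_MASK


def pass_as_uint32(flags):
    """Model passing `flags` through a uint32_t parameter."""
    return flags & UINT32_MASK


if __name__ == "__main__":
    flags = LOW_FLAG | HIGH_FLAG
    print(hex(flags), hex(pass_as_uint16(flags)), hex(pass_as_uint32(flags)))
```

Through the 16-bit parameter only LOW_FLAG survives, while the 32-bit parameter keeps both flags.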

Attachment #9429605 - Attachment is obsolete: true
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/121346649c75
Check if device has non-local IPv6 addresses r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/db43cb6efba7
Don't do IPv6 DNS when the device doesn't have any non-local IPv6 addresses r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/4d8cd0d71db2
Fix GetAddrInfo functions to pass in a uint32_t flags instead of uint16_t r=necko-reviewers,kershaw
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch
Regressions: 1924631
Regressions: 1924682
See Also: → 1702025