Closed Bug 231607 Opened 21 years ago Closed 21 years ago

DNS delays for people with IPv6-buggy local DNS servers

Categories

(Core :: Networking, defect)

All
macOS
defect
Not set
normal

RESOLVED DUPLICATE of bug 68796

People

(Reporter: dmb41, Assigned: darin.moz)

Details

(Whiteboard: [IPv6])

Attachments

(4 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20040120 Firebird/0.8.0+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20040120 Firebird/0.8.0+
I have been using 0.7 for a few months and just found out about the nightly builds. It looks much nicer, but when loading ANY site the browser hangs at "Resolving Host" (status bar) for about 4-5 seconds before loading the site. Once it brings up the site, all the graphics and text on the site load quickly. I do not experience this on the same computer and internet connection with 0.7.
Reproducible: Always
Steps to Reproduce:
1. Open Firebird 0.8+
2. Load any website (www.google.com)
3. Watch the status bar; it will say "resolving host" for a few seconds before the page loads
Actual Results: Resolving Host hangs for about 5 seconds before loading google.com. With Firebird 0.7, I barely ever see the "resolving host" message in the status bar.
Expected Results: Make it more like 0.7, where it doesn't hang at "resolving host".
I am seeing this too. I just went from 0.7 to 0.7+ (nightly build) on WinXP -- should be changed to all platforms. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040114 Firebird/0.7+
OS: MacOS X → All
This seems to be something localized on your systems. I've been using the nightlies every day with no problems. What types of Internet connections do you have?
Actually, when I disconnect from my Netgear MR814V2 wireless router and plug into regular Ethernet, Firebird 0.8+ flies, no hanging at all. So, is it a problem with Netgear and Firebird 0.8+? I hope not... Firebird 0.7 handles my Netgear wireless router fine. Any ideas?
Hmm, that shouldn't affect the browser. All of the networking at that level should be handled by the OS. There must be something funky going on with the network. Can you provide a packet trace for me to look at? As root (or with sudo) run
/usr/sbin/tcpdump -i [interface] -s 0 -w moz.trc port 53 or port 80
where [interface] is the network device that you are using (usually en0 on MacOS X, but it may be different if you are using a wireless device). This will capture all of the DNS and HTTP requests that the machine is making, and should help to show where the problem is (feel free to mail that to me if you don't want to attach it to the bug). Also, please add the output of |netstat -i| and |netstat -r|, which will show the interface stats and routing configuration.
I have a DSL connection with Earthlink/Covad/SBC. Using an Asante FR3002AL as my DSL router. Is there something I can run/trace on XP that might be able to help out? (I just went back to FB 0.7 and the problem went away -- but I'll be happy to reinstall a nightly to try to repro this problem)
re: comment #5, I don't think that XP has anything built in to do this. You can download Windump (http://windump.polito.it/). BTW, are you using IPv6 at all, or have it enabled in XP?
After getting tons of data from Mike, I think I've narrowed down the problem. It has to do with IPv6 on MacOS 10.3, and how host names are looked up. Although Moz is only doing the lookup once, the system is sending out 2 IPv4 DNS requests, then 4 IPv6 DNS requests. That's what's causing all of the lag. I had Mike disable IPv6, and the problem still occurs. I suspect that the OS is allowing us to open an IPv6 socket in _pr_test_ipv6_socket(), even with IPv6 disabled. I'm building a debug release to test this out now.
Ok, it's definitely a problem with MacOS's getaddrinfo() function. The OS is doing multiple lookups for the same thing, causing a huge lag. In every case, it does two lookups for the same name (which does take twice as long, but it's still only ~0.04 seconds in most cases). Then it does a lookup for the IPv6 address, which brings it up to ~0.09 seconds. That's only slightly noticeable, so it doesn't seem to be a problem with the PPP connection. However, when you are connected to an IPv6 routable device (the wireless router), even with IPv6 disabled, it:
- Waits 0.7 seconds, and sends out a second IPv6 request.
- Waits another 0.7 seconds, and sends out a third IPv6 request.
- Waits still another 0.7 seconds, and sends out a fourth IPv6 request.
All of this extra waiting and these extra requests add up to over 2 seconds for each lookup. It then seems to take a while to return the response to the caller, which adds even more time. So, basically, this function call is still busted on MacOS 10.3 (I say still because there was a related bug 222031 that just disabled this function call on anything < 10.3, but nobody has noticed it on 10.3 because you have to have an IPv6 router attached).
I wrote a tiny program that calls getaddrinfo() for the command line arguments, and had Mike run it and tcpdump in 3 cases:
1) After running |ipv6 -a| to enable IPv6 on all interfaces.
2) After running |ipv6 -x| to disable IPv6 on all interfaces.
3) With the wireless router disconnected.
In cases 1) and 2), MacOS is making 4 IPv6 requests, with ~0.7 seconds of lag between them, even with IPv6 disabled. Case 3) works as expected, sort of, but it is still making 2 identical IPv4 DNS requests, which is pretty wasteful, but doesn't cause a noticeable lag. I'm looking at the code now to determine the best way to get around this. Easiest would be to just disable getaddrinfo() for 10.3 also, but some people may be starting to use it. The current method of opening a socket(AF_INET6, SOCK_STREAM, 0) doesn't work as a test, because that call always succeeds on all 10.x versions of MacOS. I'll see if there is a MacOS way to tell if IPv6 is actually enabled.
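The socket test described above only shows that the kernel can create an AF_INET6 socket, not that any interface has a usable IPv6 address. One possible alternative, sketched here purely as an assumption and not as what NSPR or Mozilla does, is to walk the interface list with getifaddrs() and look for an IPv6 address that is neither loopback nor link-local. The function name has_routable_ipv6_address is hypothetical.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of NSPR or Mozilla).
 * Returns 1 if some interface has an IPv6 address that is neither
 * loopback nor link-local, 0 if none does, -1 if the list cannot be read. */
int has_routable_ipv6_address(void)
{
    struct ifaddrs *list = NULL;
    int found = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (const struct ifaddrs *it = list; it != NULL; it = it->ifa_next) {
        /* Skip entries without an address and non-IPv6 entries. */
        if (it->ifa_addr == NULL || it->ifa_addr->sa_family != AF_INET6)
            continue;

        const struct sockaddr_in6 *sin6 =
            (const struct sockaddr_in6 *)(const void *)it->ifa_addr;

        /* ::1 and fe80::/10 exist even on IPv4-only networks. */
        if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) ||
            IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
            continue;

        found = 1;
        break;
    }

    freeifaddrs(list);
    return found;
}
```

A caller could use a 0 result as a hint to request only A records. Whether this heuristic is reliable on MacOS 10.3 has not been verified in this bug.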
All of these traces are from getaddrinfo() system calls.
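The original test program is not attached to this bug. As a rough illustration of the kind of measurement described above, the following sketch times a single getaddrinfo() call for a chosen address family. The helper name timed_lookup is hypothetical, and the use of clock_gettime() is an assumption about the platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() and clock_gettime() */
#endif

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: resolve `host` for one address family and report how
 * long getaddrinfo() took. Returns the getaddrinfo() status (0 on success). */
int timed_lookup(const char *host, int family, double *seconds)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct timespec start, end;

    assert(host != NULL && seconds != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;          /* AF_INET, AF_INET6 or AF_UNSPEC */
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *seconds = (double)(end.tv_sec - start.tv_sec) +
               (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

Comparing the time for AF_INET against AF_UNSPEC for the same name gives a rough idea of how much delay the extra AAAA queries add. A packet trace is still needed to see the individual retries.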
re: comment #5, this seems to be a MacOS X only bug. You may have similar symptoms, but you should file a different bug if you can't get any help from the newsgroups.
Assignee: blake → darin
Status: UNCONFIRMED → NEW
Component: General → Networking: HTTP
Ever confirmed: true
Product: Firebird → Browser
QA Contact: httpqa
Version: unspecified → Trunk
Component: Networking: HTTP → Networking
Summary: Hangs at Resolving Host → DNS: getaddrinfo() problems in Panther
Blocks: 232382
Flags: blocking1.7a?
Here's a patch to disable IPv6 on all Darwin machines, not just those that are < 10.3. It comments out the version checking code, to make it easier to re-enable it once Apple fixes getaddrinfo().
Attachment #141536 - Flags: review?
*** Bug 232382 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.7alpha
-> NSPR
Component: Networking → NSPR
Product: Browser → NSPR
Target Milestone: mozilla1.7alpha → ---
Version: Trunk → other
Whiteboard: [IPv6]
is this really something we want to disable? wouldn't a preference make more sense since IPv6 might be important to some users, and moreover it might be working okay for them as is. disabling seems like a bad solution. perhaps the patch in bug 68796 (or one like it) makes more sense?
it would be very good to figure this out for 1.7b
Flags: blocking1.7b+
Flags: blocking1.7a?
Flags: blocking1.7a-
(In reply to comment #16)
> is this really something we want to disable? wouldn't a preference make more
> sense since IPv6 might be important to some users, and moreover it might be
> working okay for them as is. disabling seems like a bad solution.
>
> perhaps the patch in bug 68796 (or one like it) makes more sense?
Hmm, I didn't know that a pref could be put there. I'll make a "network.dns.disableIPv6" pref and set it to true by default for MacOS.
Flags: blocking1.7a- → blocking1.7a?
Why are we talking about disabling IPv6 when we haven't yet properly diagnosed the problem?
(In reply to comment #19)
> Why are we talking about disabling IPv6 when we haven't yet properly diagnosed
> the problem?
We are talking about disabling DNS lookups for the IPv6 address (AAAA record) of hosts when using MacOS X 10.3.x (IPv6 is already completely disabled for all other versions). Passing PR_AF_INET instead of PR_AF_UNSPEC in nsHostResolver::ThreadFunc() should do the trick. As for not having properly diagnosed the problem, see comment #8. If you can think of a way to narrow it down more than that, please add your suggestions to the bug.
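In plain sockets terms, the change proposed above amounts to choosing the address family in the getaddrinfo() hints. The sketch below is an illustration only, not Mozilla's or NSPR's code: the function make_lookup_hints and its disable_ipv6 parameter are hypothetical stand-ins for a preference such as the "network.dns.disableIPv6" pref mentioned earlier in this bug.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose struct addrinfo */
#endif

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: build getaddrinfo() hints that request only A records
 * when IPv6 lookups are disabled, and both A and AAAA records otherwise. */
struct addrinfo make_lookup_hints(int disable_ipv6)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = disable_ipv6 ? AF_INET : AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    return hints;
}
```

With AF_INET the resolver has no reason to send AAAA queries, so a DNS server that mishandles them never gets the chance to cause a timeout.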
Attachment #141536 - Attachment is obsolete: true
Attachment #141536 - Flags: review?
Comment #8 hasn't eliminated the possibility of a bug in the DNS server. My initial look at the traces suggests that the server might be returning an incorrect type in the query section of its response, leading the resolver to correctly ignore the packets. I haven't yet had time to decode the packets by hand. It also hasn't eliminated the possibility that Mozilla is incorrectly calling getaddrinfo().
Decoding by hand confirms that the bug is in the DNS server. The query:
10:51:46.965834 (tos 0x0, ttl 64, length: 61) 192.168.0.100.49974 > 192.168.0.1.domain: [udp sum ok] 50780+ AAAA? www.toshiba.com. (33)
0x0000 4500 003d f894 0000 4011 0066 c0a8 0064 E..=....@..f...d
0x0010 c0a8 0001 c336 0035 0029 a6ab c65c 0100 .....6.5.)...\..
0x0020 0001 0000 0000 0000 0377 7777 0774 6f73 .........www.tos
0x0030 6869 6261 0363 6f6d 0000 1c00 01 hiba.com.....
Note byte 0x003a has the value 0x1c, which corresponds to the AAAA type. The response:
10:51:46.971466 (tos 0x0, ttl 64, length: 77) 192.168.0.1.domain > 192.168.0.100.49974: [udp sum ok] 50780 q: A? www.toshiba.com. 1/0/0 www.toshiba.com. A edu-gov.toshiba.com (49)
0x0000 4500 004d 10bf 0000 4011 e82b c0a8 0001 E..M....@..+....
0x0010 c0a8 0064 0035 c336 0039 fbbb c65c 8180 ...d.5.6.9...\..
0x0020 0001 0001 0000 0000 0377 7777 0774 6f73 .........www.tos
0x0030 6869 6261 0363 6f6d 0000 0100 01c0 0c00 hiba.com........
0x0040 0100 0100 0000 5000 04d8 17b5 ca ......P......
Note byte 0x003a has the value 0x01, which corresponds to the A type. The response's question section is not a copy of the one in the query, in violation of RFC 1034. RFC 1035 section 7.3 states in part: "The next step is to match the response to a current resolver request. The recommended strategy is to do a preliminary matching using the ID field in the domain header, and then to verify that the question section corresponds to the information currently desired." By ignoring those responses, Panther's resolver is correctly implementing the recommended algorithm specified in the relevant standard. The reporter would most likely encounter these symptoms using any IPv6-capable operating system. There is nothing here specific to MacOS.
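The hand decoding above can be reproduced mechanically. The following sketch takes the DNS payloads of the two packets (the bytes from offset 0x1c onward, after the 20-byte IP header and 8-byte UDP header), extracts the question type from each, and applies the RFC 1035 section 7.3 check. The function names are hypothetical, and the byte-for-byte comparison is a simplification: a real resolver compares names case-insensitively and handles compressed names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DNS payload of the query packet shown above (33 bytes). */
static const uint8_t query_msg[] = {
    0xc6, 0x5c, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x03, 'w', 'w', 'w', 0x07, 't', 'o', 's', 'h', 'i', 'b', 'a',
    0x03, 'c', 'o', 'm', 0x00,
    0x00, 0x1c, 0x00, 0x01                 /* QTYPE = AAAA, QCLASS = IN */
};

/* DNS payload of the response packet shown above (49 bytes). */
static const uint8_t response_msg[] = {
    0xc6, 0x5c, 0x81, 0x80, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00,
    0x03, 'w', 'w', 'w', 0x07, 't', 'o', 's', 'h', 'i', 'b', 'a',
    0x03, 'c', 'o', 'm', 0x00,
    0x00, 0x01, 0x00, 0x01,                /* QTYPE = A, QCLASS = IN */
    0xc0, 0x0c, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0x50, 0x00, 0x04,
    0xd8, 0x17, 0xb5, 0xca
};

/* Returns the offset just past the first question's QNAME, or 0 on error.
 * Assumes an uncompressed name, which is what both packets contain. */
static size_t question_name_end(const uint8_t *msg, size_t len)
{
    size_t pos = 12;                       /* fixed DNS header size */

    while (pos < len && msg[pos] != 0) {
        if (msg[pos] & 0xc0)               /* compression pointer: not handled */
            return 0;
        pos += (size_t)msg[pos] + 1;       /* skip length byte and label */
    }
    return (pos < len) ? pos + 1 : 0;      /* step over the root label */
}

/* QTYPE of the first question, or -1 if the message is malformed. */
int dns_question_type(const uint8_t *msg, size_t len)
{
    size_t end = question_name_end(msg, len);

    if (end == 0 || end + 4 > len)
        return -1;
    return (msg[end] << 8) | msg[end + 1];
}

/* Returns 1 if the response has the same ID and an identical question
 * section (QNAME, QTYPE, QCLASS) as the query, 0 otherwise. */
int response_matches_query(const uint8_t *q, size_t qlen,
                           const uint8_t *r, size_t rlen)
{
    size_t qend = question_name_end(q, qlen);
    size_t rend = question_name_end(r, rlen);

    if (qend == 0 || rend == 0 || qend + 4 > qlen || rend + 4 > rlen)
        return 0;
    if (q[0] != r[0] || q[1] != r[1])      /* transaction ID */
        return 0;
    if (qend != rend)
        return 0;
    return memcmp(q + 12, r + 12, qend + 4 - 12) == 0;
}
```

For these two packets the IDs match (0xc65c) but the question types are 28 (AAAA) and 1 (A), so a resolver applying this check discards the response and eventually retries.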
based on jgmyers' analysis, this sounds like an exact duplicate of bug 68796. *** This bug has been marked as a duplicate of 68796 ***
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
This bug differs from 68796 in that the broken DNS server appears to be the local recursive server, not the remote domain's server.
fair enough, but i think from the point-of-view of mozilla, there's little effective difference.
I agree with John. Reporter, your DNS server is pretty badly broken. Mozilla and the OS are doing nothing wrong here. The solutions I can see are:
- Use another DNS server
- Disable IPv6 in your OS
This will not affect other Panther users unless they use the same (broken) router.
(In reply to comment #26)
> I agree with John. Reporter, your DNS server is pretty badly broken. Mozilla and
> the OS are doing nothing wrong here. The solutions I can see are:
>
> - Use another DNS server
> - Disable IPv6 in your OS
>
> This will not affect other Panther users unless they use the same (broken) router.
Explain something for the ignorant (me) here. In 232382, I sent an attachment. IPv6 is enabled nowhere on the network. So IPv6 AAAA requests should not be getting generated at all, because I am not supposed to be using IPv6, nor do I want to at this time. The DNS server is on the same subnet. That system also knows nothing about IPv6, nor does its DNS server. So why is this correct behavior:
- If IPv6 is turned off on my system, why does the system insist on sending out IPv6 DNS requests?
- Why is my DNS server "broken" for ignoring requests it does not know how to handle?
Your DNS server is broken because it is returning an IPv4 address (= A resource record) when asked for an IPv6 address (= AAAA resource record). This is a violation of at least RFC 1034, section 4.3.2 (3a). Regarding IPv6 DNS lookups, you are right, they probably shouldn't be going out. Jerry,
> I wrote a tiny program that calls getaddrinfo() for the command line
> arguments, and had Mike run it and tcpdump in 3 cases:
> 1) After running |ipv6 -a| to enable IPv6 on all interfaces.
> 2) After running |ipv6 -x| to disable IPv6 on all interfaces.
> 3) With the wireless router disconnected.
could you run the same program while passing the AI_ADDRCONFIG flag to getaddrinfo and see if it still does IPv6 lookups? It shouldn't.
(In reply to comment #26)
> I agree with John. Reporter, your DNS server is pretty badly broken. Mozilla and
> the OS are doing nothing wrong here. The solutions I can see are:
>
> - Use another DNS server
Not possible, since the router assigns itself as the DNS server via DHCP.
> - Disable IPv6 in your OS
We tried that, and getaddrinfo() still looked for the AAAA record. _pr_test_ipv6_socket() always returns successful on MacOS, even if IPv6 is disabled.
(In reply to comment #29)
> > - Use another DNS server
> Not possible, since the router assigns itself as the DNS server via DHCP.
You don't mean to say that Mac OS doesn't allow you to specify a DNS server manually if you are using DHCP?
> > - Disable IPv6 in your OS
> We tried that, and getaddrinfo() still looked for the AAAA record.
> _pr_test_ipv6_socket() always returns successful on MacOS, even if IPv6 is disabled.
Can you try passing in AI_ADDRCONFIG and let me know? Something like:
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("www.kame.net", NULL, &hints, &res);
(In reply to comment #30)
> (In reply to comment #29)
> > > - Use another DNS server
> > Not possible, since the router assigns itself as the DNS server via DHCP.
>
> You don't mean to say that Mac OS doesn't allow you to specify a DNS server
> manually if you are using DHCP?
I'm sure that you probably can, but that's going to make life difficult for the average user. Most of them just want to plug in the wireless router, then surf the web. Firebird 0.7, IE, Safari, Opera, etc. will allow them to do that; only Firefox is causing grief. If we have to make an extra pref, then it should be the people who actually need IPv6 functionality who go through an extra step, since a majority of people aren't using IPv6, and IPv6 users are going to be more savvy.
> > > - Disable IPv6 in your OS
> > We tried that, and getaddrinfo() still looked for the AAAA record.
> > _pr_test_ipv6_socket() always returns successful on MacOS, even if IPv6 is
> > disabled.
>
> Can you try passing in AI_ADDRCONFIG and let me know? Something like:
>
> struct addrinfo hints, *res;
>
> memset(&hints, 0, sizeof(hints));
>
> hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
> hints.ai_family = AF_UNSPEC;
> hints.ai_socktype = SOCK_STREAM;
>
> getaddrinfo("www.kame.net", NULL, &hints, &res);
I blasted the older program that I used, so I'll whip up a new one with some switches to allow us to change the flags/family. In the meantime, I've sent James a test program with the above code.
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #29)
> > > > - Use another DNS server
> > > Not possible, since the router assigns itself as the DNS server via DHCP.
> >
> > You don't mean to say that Mac OS doesn't allow you to specify a DNS server
> > manually if you are using DHCP?
>
> I'm sure that you probably can, but that's going to make life difficult for the
> average user. Most of them just want to plug in the wireless router, then surf
I actually do this, and it doesn't make a dang bit of difference anyway. So, my DHCP is set to use specific DNS servers, and I have IPv6 disabled, and I still have the issues with sites like allmusic.com.
(In reply to comment #32)
> I actually do this, and it doesn't make a dang bit of difference anyway. So, my
> DHCP is set to use specific DNS servers, and I have IPv6 disabled, and I still
> have the issues with sites like allmusic.com.
allmusic.com is bug 68796. It looks like this behaviour is irritating you. If so, then please help fix it. You can help by providing a list of sites which don't play nice with IPv6 lookups and attaching it to bug 68796. When you come across a slow site, note its name, and verify the problem by doing:
% time host -t AAAA <site name>
% time host -t A <site name>
If the AAAA lookup takes an unreasonable time and is much longer than the A lookup, include this site in the list. Thank you.
(In reply to comment #30)
> Can you try passing in AI_ADDRCONFIG and let me know? Something like:
>
> struct addrinfo hints, *res;
>
> memset(&hints, 0, sizeof(hints));
>
> hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
> hints.ai_family = AF_UNSPEC;
> hints.ai_socktype = SOCK_STREAM;
getaddrinfo() returns EAI_BADHINTS.
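Since the resolver can reject the hints outright, as reported above, a caller that wants to try AI_ADDRCONFIG needs a fallback path. The sketch below is one hedged way to structure that: it retries without the flag when the first call fails with a bad-flags style error. Both function names are hypothetical. EAI_BADHINTS is a BSD/Mac OS X extension, so it is only referenced where the headers define it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() on strict compilers */
#endif
#ifndef _DARWIN_C_SOURCE
#define _DARWIN_C_SOURCE          /* keep BSD extensions visible on Mac OS X */
#endif

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if `err` means the resolver rejected the hints themselves,
 * rather than failing to resolve the name. */
int is_bad_hints_error(int err)
{
#ifdef EAI_BADHINTS               /* not defined on every platform */
    if (err == EAI_BADHINTS)
        return 1;
#endif
    return err == EAI_BADFLAGS;
}

/* Hypothetical helper: try the lookup with AI_ADDRCONFIG first and retry
 * without it only if the hints were rejected. Returns getaddrinfo()'s status. */
int lookup_prefer_addrconfig(const char *host, struct addrinfo **res)
{
    struct addrinfo hints;

    assert(host != NULL && res != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

#ifdef AI_ADDRCONFIG
    hints.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
    int rc = getaddrinfo(host, NULL, &hints, res);
    if (!is_bad_hints_error(rc))
        return rc;                /* success, or a genuine lookup failure */
#endif

    hints.ai_flags = AI_CANONNAME;  /* retry without AI_ADDRCONFIG */
    return getaddrinfo(host, NULL, &hints, res);
}
```

This only handles the rejection gracefully. It does not by itself stop AAAA queries on a system where AI_ADDRCONFIG is unsupported or ignored.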
(In reply to comment #25)
> fair enough, but i think from the point-of-view of mozilla, there's little
> effective difference.
The possible workarounds are different. With known broken remote DNS servers, Mozilla can work around the problem by adding the domains to a blacklist. Such a workaround isn't effective against broken local DNS servers, as all domains would be affected.
(In reply to comment #27)
> - Why is my DNS server "broken" for ignoring requests it does not know how
> to handle?
How it is to handle requests for unknown types is well specified in the RFCs. It needs to return a success response with an exact copy of the question section and an empty answer section. The authority and additional sections should be filled in as usual. You should be reporting this bug to the operator/implementor of your DNS server.
(In reply to comment #35)
> (In reply to comment #25)
> > fair enough, but i think from the point-of-view of mozilla, there's little
> > effective difference.
>
> The possible workarounds are different. With known broken remote DNS servers,
> Mozilla can work around the problem by adding the domains to a blacklist. Such
> a workaround isn't effective against broken local DNS servers, as all domains
> would be affected.
ok, i completely didn't grok that. we should un-dup this bug then.
> (In reply to comment #27)
> > - Why is my DNS server "broken" for ignoring requests it does not know how
> > to handle?
>
> How it is to handle requests for unknown types is well specified in the RFCs.
> It needs to return a success response with an exact copy of the question section
> and an empty answer section. The authority and additional sections should be
> filled in as usual.
>
> You should be reporting this bug to the operator/implementor of your DNS server.
so, this bug report is INVALID then? reopening...
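As a concrete picture of the response described above (success code, question section copied exactly, empty answer section), the following sketch builds such a reply from a query. It is a simplified illustration, not code from any real DNS server: the function name is hypothetical, it assumes a single uncompressed question, and it leaves the authority and additional sections empty where a real server would fill them in as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DNS payload of an AAAA query for www.toshiba.com, as seen in this bug. */
static const uint8_t aaaa_query[] = {
    0xc6, 0x5c, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x03, 'w', 'w', 'w', 0x07, 't', 'o', 's', 'h', 'i', 'b', 'a',
    0x03, 'c', 'o', 'm', 0x00,
    0x00, 0x1c, 0x00, 0x01
};

/* Hypothetical helper: write a NOERROR response with an empty answer section
 * into `out`. Returns the number of bytes written, or 0 on error. */
size_t build_empty_answer(const uint8_t *query, size_t qlen,
                          uint8_t *out, size_t cap)
{
    size_t pos = 12;                       /* fixed DNS header size */

    assert(query != NULL && out != NULL);

    /* Walk the uncompressed QNAME to find the end of the question. */
    while (pos < qlen && query[pos] != 0) {
        if (query[pos] & 0xc0)             /* compression pointer: not handled */
            return 0;
        pos += (size_t)query[pos] + 1;
    }
    if (pos >= qlen || pos + 5 > qlen)     /* need root label, QTYPE, QCLASS */
        return 0;

    size_t total = pos + 5;
    if (total > cap)
        return 0;

    /* Copy header and question verbatim, then rewrite the header fields. */
    memcpy(out, query, total);
    out[2] = (uint8_t)(0x80 | (query[2] & 0x79)); /* QR=1, keep opcode and RD */
    out[3] = 0x80;                         /* RA=1 (assumed), RCODE=0 */
    out[4] = 0x00; out[5] = 0x01;          /* QDCOUNT = 1 */
    memset(out + 6, 0, 6);                 /* ANCOUNT = NSCOUNT = ARCOUNT = 0 */
    return total;
}
```

A resolver receiving this reply sees its own AAAA question echoed back with no answers, concludes that the name has no IPv6 address, and moves on without waiting for a timeout.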
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(In reply to comment #35)
> (In reply to comment #27)
> > - Why is my DNS server "broken" for ignoring requests it does not know how
> > to handle?
>
> How it is to handle requests for unknown types is well specified in the RFC's.
> It needs to return a success response with an exact copy of the question section
> and an empty answer section. The authority and additional sections should be
> filled in as ususal.
>
> You should be reporting this bug to the operator/implementor of your DNS server.
First note that *my* DNS server is not the one referenced in the above attachments. My issues are noted in the attachment for BugID 232382. lorenzo@colitti.com has verified my DNS server is *not* broken. It is handling requests properly. The problem is the other sites I was trying to reach, which do appear to have broken DNS servers. That explains why I was only seeing the problem sometimes and not others. The fact that there seems to be no way to disable IPv6 fully on Panther causes you to trip over the problem. So you notice when you run into one of these broken servers, as there's an AAAA request for just about everything.
Just to clarify this: all these bugs are about IPv6 name lookups (= AAAA queries) timing out and causing slow page loads. They are mostly a Mac problem because OS X supports IPv6 by default and IPv6 name lookups can't be turned off.
1. Bug 68796 is about broken DNS servers for certain domains which don't reply properly to IPv6 name lookups. This includes doubleclick.net, allmusic.com, vanguard.com, allmediaguide.com, and more. The solution proposed in bug 68796 is to maintain a blacklist of these domains so Mozilla never asks these servers for IPv6 addresses. This problem also affects people who have IPv6 connections and equipment, including people on Windows and Linux who turn IPv6 on.
2. The problem Mike reported in this bug is due to his router, which acts as a DNS server, returning invalid answers to *any* IPv6 query. This does not affect people who have IPv6-capable equipment, and the only way to work around it is to disable IPv6 lookups. This mainly affects Mac users. There are two possible solutions to this:
- Disabling IPv6 lookups in Mozilla via a pref (bug 68796 comment #105)
- Getting the OS to disable IPv6 lookups by passing the AI_ADDRCONFIG flag to getaddrinfo. This enables IPv6 lookups only if IPv6 is in use (which is what Linux and Windows already do). Investigation is under way as to whether this actually works on OS X.
Safari does not suffer from this problem because it tries IPv4 lookups first and IPv6 afterwards, which I think is taking the easy way out.
No longer blocks: 232382
(In reply to comment #38)
> 2. The problem Mike reported in this bug is due to his router, which acts as a
> DNS server, returning invalid answers to *any* IPv6 query. This does not affect
> people who have IPv6-capable equipment, and the only way to work around it is to
> disable IPv6 lookups.
Comment #3 is consistent with this affecting only people with IPv6-capable networks. The wireless router in question may well be an IPv6-capable router or it may well make another IPv6 router visible to the client host. Do we have any evidence this affects Panther users who aren't on IPv6-capable networks?
Hardware: Macintosh → All
Summary: DNS: getaddrinfo() problems in Panther → DNS delays for people with IPv6-buggy local DNS servers
(In reply to comment #39)
> Comment #3 is consistent with this affecting only people with IPv6-capable
> networks. The wireless router in question may well be an IPv6-capable router or
> it may well make another IPv6 router visible to the client host. Do we have any
> evidence this affects Panther users who aren't on IPv6-capable networks?
Actually, I think the reverse is true. A router which responds to AAAA queries with A answers is not likely to support IPv6. :) When Mike plugs in to the wireless router he's using its broken DNS server; when he plugs into Ethernet he's using a working DNS server. So I think that this does affect Panther users who only have IPv4 networks. I have managed to get an account on a Mac with Panther and I'll see if I can get AI_ADDRCONFIG to work. That should solve the problem for people who only have IPv4. For people who use IPv6, there's always bug 68796, but I think we have a hold on that as well now.
(In reply to comment #40)
> (In reply to comment #39)
> > Comment #3 is consistent with this affecting only people with IPv6-capable
> > networks. The wireless router in question may well be an IPv6-capable router or
> > it may well make another IPv6 router visible to the client host. Do we have any
> > evidence this affects Panther users who aren't on IPv6-capable networks?
>
> Actually, I think the reverse is true. A router which responds to AAAA queries
> with A answers is not likely to support IPv6. :)
>
> When Mike plugs in to the wireless router he's using its broken DNS server, when
> he plugs into Ethernet he's using a working DNS server. So I think that this
> does affect Panther users who only have IPv4 networks.
After your and John Myers' analyses, and looking back at the traces, I'd have to agree with this. DNS isn't my strong suit, so some of the conclusions that I jumped to were incorrect. Glad to see that you've got access to a Panther machine; hopefully you don't come up with the same results as in comment #34.
Actually, it looks like AI_ADDRCONFIG doesn't work at all on OS 10.3. Even if I turn IPv6 off in every possible way, the OS still tries IPv6 lookups. This is an OS bug. You don't see it in Safari because it tries IPv4 first, but it's there nonetheless. I have posted to the Apple IPv6 mailing list, http://www.lists.apple.com/mailman/listinfo/ipv6, to see if someone can shed light on the issue, but if nobody can, the only solution for people with these broken routers is to disable IPv6 in Mozilla. If I get no response from the mailing list I'll add this functionality to the patch in bug 68796.
Flags: blocking1.7a? → blocking1.7a-
Attached file test program
This program calls getaddrinfo in a similar manner to the Mozilla DNS resolver. I tested it on a Panther machine. I have tried various things (except rebooting), but I have not been able to find a way to stop it from making an IPv6 name lookup. I consider this a bug in Mac OS. If you want to try, compile with "gcc test-gai.c -o test-gai" and run with "./test-gai www.kame.net". See if you can fiddle with system settings until this outputs ONLY an IPv4 address without making you wait more than one or two seconds. If you do, those settings should prevent this bug as well. If there's no way to do this, the latest patch in bug 68796 (attachment 142076) provides a way to turn off IPv6 via pref for people who have broken DNS servers and suffer from this bug.
Attachment #142115 - Attachment mime type: application/octet-stream → text/plain
Severity: blocker → normal
Component: NSPR → Networking
OS: All → MacOS X
Product: NSPR → Browser
Version: other → Trunk
QA Contact: httpqa → benc
Ok, AI_ADDRCONFIG doesn't seem to work at all, and nobody on the Apple IPv6 mailing list provided any info on this. Duping against bug 68796, which has a patch that can fix this. *** This bug has been marked as a duplicate of 68796 ***
Status: REOPENED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE