Closed Bug 68796 Opened 24 years ago Closed 21 years ago

IPv6 : Some IPv4 addresses won't resolve w/IPv6 OS

Categories

(Core :: Networking, defect)

defect
Not set
major

RESOLVED FIXED
mozilla1.7beta

People

(Reporter: john, Assigned: lorenzo)

Attachments

(4 files, 11 obsolete files)

1.23 KB, text/plain
14.53 KB, patch (lorenzo: review+)
815 bytes, patch (darin.moz: review+)
679 bytes, patch (darin.moz: review+, superreview+)

When IPv6 is enabled in the kernel (and in the DNS machine, and named is compiled against the IPv6-enabled kernel), some hostnames won't resolve for Mozilla, though all other programs resolve them fine. news.bbc.co.uk is one of them as is, delightfully, ad.doubleclick.net. But the vast majority of other hostnames resolve just fine. What's interesting is that when I type "host news.bbc.co.uk" into a terminal, thus causing named to look it up correctly, and then immediately ask Mozilla to browse there, it finds it without trouble. But a minute or two later, and Mozilla can't find it again. Is this a named bug, glibc bug, or a Mozilla issue? A strace of named shows that, in the failure case, named *is* returning the correct IP address of the host, but the browser/resolver library disregards this, and asks for the hostname with ".localdomain" tacked on the end, as per my local setup. But after a successful lookup from the "host" command, the correct IP address is received and the browser goes to the right page. Anyone else seeing this?
Hmmm, this sounds like a glibc problem. Are you having this problem with other applications as well, or just Mozilla? What build are you using?
Latest I've tried it with is 2001021321 with glibc-2.2.1 but I'll get another nightly tonight and test that. I'm not having this problem with any other applications (yet).
Still not working with nightly 2001021809. It is only a *very* few hostnames that won't resolve. So far, I have only found non-resolvable names within the BBC domain (bbc.co.uk and bbc.net.uk) and doubleclick.net.
Very odd... what happens when you try to ping them, etc.? Maybe we can capture the DNS lookup that is going on (with snort or something). It's odd that it only occurs on those sites.
Mozilla differs from other applications by (on Linux) calling gethostbyname2(name, AF_INET6) instead of gethostbyname(). If that call returns NULL, it will then try gethostbyname2(name, AF_INET). This would seem to be a problem with the DNS subsystem. Perhaps the name servers for these destinations have IPv6 addresses which are not reachable from your network?
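For illustration, the lookup order described in the comment above can be sketched as follows. This is a minimal sketch, not NSPR's actual code: the names lookup_fn, lookup_v6_then_v4 and stub_v4_only are hypothetical, and the stub only stands in for a host that has an A record but no AAAA record, so the example can be run offline.

```c
#define _DEFAULT_SOURCE /* exposes gethostbyname2() in <netdb.h> on glibc */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Resolver signature shared by gethostbyname2() and the stub below. */
typedef struct hostent *(*lookup_fn)(const char *name, int af);

/*
 * Ask for IPv6 addresses first; only if that returns NULL, ask again for
 * IPv4 addresses. A slow or failing IPv6 lookup therefore delays the
 * IPv4 lookup that would have succeeded.
 */
struct hostent *lookup_v6_then_v4(lookup_fn lookup, const char *name)
{
    struct hostent *he;

    assert(lookup != NULL && name != NULL);
    he = lookup(name, AF_INET6);
    if (he == NULL)
        he = lookup(name, AF_INET);
    return he;
}

/* Hypothetical stub: pretends the name has an A record but no AAAA record. */
struct hostent *stub_v4_only(const char *name, int af)
{
    static struct hostent he;

    (void)name;
    if (af != AF_INET)
        return NULL;
    he.h_addrtype = AF_INET;
    he.h_length = 4;
    return &he;
}
```

On a system that provides gethostbyname2(), the real resolver can be passed instead of the stub, for example lookup_v6_then_v4(gethostbyname2, "news.bbc.co.uk").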
I have a guess: the NSCD daemon is running and it is caching things. If it gets a "soft" lookup failure, it appears to present it to the gethostbyname*() functions as a HARD failure, which causes all kinds of merriment. I am pretty sure that the (Linux) glibc 2.2(.1) nscd is broken; kill it, and things should work far better. To some degree this may also be a duplicate of bug #66872, but perhaps not.
Relieving tever of IPv6 related QA.
QA Contact: tever → benc
setting bug status to New
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → mozilla1.0
It looks like a Mozilla bug. I am using Mozilla 0.9.5 on RH Linux 7.2, which uses the IPv6 net utils. When IPv6 is enabled in the kernel, some addresses (in my case http://www.yomiuri.co.jp, a Japanese site) have trouble. ncftp, at least, has no problems with IPv6 addresses, and ncftp has trouble logging in to ftp.iij.ad.jp if IPv6 is disabled. So glibc doesn't matter (I use glibc-2.2.4-19). The DNS service looks to be working fine for IPv6 addresses.
Check to see what addresses are returned. I'm not so IPv6-literate, but I think there is some nslookup version that shows the new address record type.
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
*** Bug 114276 has been marked as a duplicate of this bug. ***
Sorry for the late response. I checked the IP addresses returned by dig. For example, "dig www.6bone.net aaaa" returns:
--snip--
.....
;; ANSWER SECTION:
www.6bone.net. 43047 IN CNAME 6bone.net.
6bone.net. 43047 IN AAAA 3ffe:b00:c18:1::10
......
--snip--
but "dig www.yomiuri.co.jp aaaa" doesn't return an IPv6 address. So I guess glibc is working fine. Actually, some IPv6 pages work fine.
Please check to see if any of the *name servers* used in resolving the DNS request have IPv6 addresses. Please check to see if these addresses are reachable.
*** Bug 134215 has been marked as a duplicate of this bug. ***
I have the same problem, using FreeBSD 4.5 and Mozilla 0.9.9.
---------------------------------
bash-2.05a$ nslookup news.bbc.co.uk
Server: cedar.ukc.ac.uk
Address: 129.12.21.8
Non-authoritative answer:
Name: newswww.bbc.net.uk
Address: 212.58.226.40
Aliases: news.bbc.co.uk
-------------------------
I can reach the site fine, including the first few (about 5) clicks to other pages on the site; after that I get the message "news.bbc.co.uk could not be found be check the address and try again". The GENERIC kernel for FreeBSD includes IPv6 support.
Summary: Some addresses won't resolve with IPv6 in kernel → IPv6 : Some addresses won't resolve
Summary: IPv6 : Some addresses won't resolve → IPv6 : Some IPv4 addresses won't resolve w/IPv6 OS
For news.bbc.co.uk, at least, this was a problem with their DNS server, not with mozilla. As described on NANOG:
http://www.merit.edu/mail.archives/nanog/2002-04/msg00562.html
When IPv6 is enabled on the client machine, mozilla does an AAAA lookup first, and if there is none, does a lookup for the A record. The correct response from a name server if there is no AAAA record (but the domain exists) is to return NOERROR, with an empty reply. The BBC server returned NXDOMAIN (which was incorrect), and mozilla exhibited correct behaviour by assuming that the domain did not exist. (I don't know if this was mozilla's behaviour, or that of some library that was doing the lookups; either way the client was not at fault). The bug has since been fixed by the BBC and the site is now reachable just fine in v6 land. On the users@ipv6.org list, it was reported that the BBC use custom DNS software based on lbnamed - http://www.stanford.edu/~riepel/lbnamed/ - and they reckon that it's that package that has the bug.
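As a rough illustration of why the NXDOMAIN reply matters, the decision a client faces after an AAAA query can be sketched like this. The enum and function names are hypothetical and this is not code from Mozilla or any resolver library; the numeric response codes are the ones defined in RFC 1035.

```c
#include <assert.h>

/* DNS response codes (RCODE values from RFC 1035, section 4.1.1). */
enum dns_rcode { DNS_NOERROR = 0, DNS_SERVFAIL = 2, DNS_NXDOMAIN = 3 };

/* What a client that asked for AAAA does next (illustrative names). */
enum next_step { USE_ANSWER, TRY_A_RECORD, NAME_DOES_NOT_EXIST, RETRY_LATER };

enum next_step after_aaaa_reply(enum dns_rcode rcode, int answer_count)
{
    assert(answer_count >= 0);
    switch (rcode) {
    case DNS_NOERROR:
        /* Empty NOERROR reply: the name exists but has no AAAA record,
           so the A lookup is worth doing. */
        return answer_count > 0 ? USE_ANSWER : TRY_A_RECORD;
    case DNS_NXDOMAIN:
        /* The server claims the name does not exist at all, so a
           well-behaved client has no reason to ask for the A record. */
        return NAME_DOES_NOT_EXIST;
    default:
        /* SERVFAIL and other codes: a failure that says nothing about
           whether the name exists. */
        return RETRY_LATER;
    }
}
```

With a correct server, after_aaaa_reply(DNS_NOERROR, 0) leads straight to the A lookup; the incorrect NXDOMAIN reply ends the lookup before the A record is ever requested.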
I think the same behavior described by the poster in Comment 18 is happening with ad.doubleclick.net, except it is returning SERVFAIL instead of NXDOMAIN on queries. Example:
pkw@voldemort:~/ > dig ad.doubleclick.net. aaaa
; <<>> DiG 9.2.0 <<>> ad.doubleclick.net. aaaa
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13327
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;ad.doubleclick.net. IN AAAA
;; ANSWER SECTION:
ad.doubleclick.net. 46 IN CNAME gd25.doubleclick.net.
;; Query time: 6053 msec
;; SERVER: 9.53.159.2#53(9.53.159.2)
;; WHEN: Thu May 9 17:23:25 2002
;; MSG SIZE rcvd: 55
I found that PR_GetIPNodeByName(PR_AF_INET6, PR_AI_ADDRCONFIG) is not implemented correctly on platforms that use gethostbyname2. Currently, AIX, FreeBSD, Linux, and NetBSD are using gethostbyname2. I filed bug 144886 about this PR_GetIPNodeByName bug and attached a patch. Please test that patch (attachment 83810). Note that that patch only avoids the problem on hosts that only have IPv4 source addresses configured (which is the common case). If your host has IPv6 source addresses configured, I suspect that that patch won't help.
This test program calls gethostbyname2(AF_INET6). You can run it with various host names and see whether it succeeds (for example, www.6bone.net) or fails (most hosts don't have IPv6 addresses). In particular, run
gethost2 ad.doubleclick.net
This blocks for a long time and then fails. On Red Hat Linux 7.1 (spd13.mcom.com), it fails with error code 2 (TRY_AGAIN). On AIX 4.3.3 (spd12.mcom.com), it fails with error code 1 (HOST_NOT_FOUND).
Target Milestone: mozilla1.0.1 → Future
For ad.doubleclick.net, the load-balancing nameserver responds to MX or AAAA queries by returning a response with an empty question section. This violates RFC 1034 and causes the intermediate nameserver to be unable to correlate the answer with the query.
*** Bug 128898 has been marked as a duplicate of this bug. ***
*** Bug 148362 has been marked as a duplicate of this bug. ***
You can work around these DNS lookup problems on IPv4-only hosts by implementing the _pr_QueryNetIfs function in mozilla/nsprpub/pr/src/misc/prnetdb.c. Right now that function is only implemented for AIX. For FreeBSD (and possibly NetBSD and OpenBSD too) that function can be easily implemented with the getifaddrs(3) function.
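A minimal sketch of what such an interface query could look like with getifaddrs(3) is shown below. It is an assumption-laden illustration, not the NSPR implementation: the struct net_ifs type and the query_net_ifs name are hypothetical, and this simple version counts every configured address, including loopback ones.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <ifaddrs.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: which address families have any address. */
struct net_ifs {
    bool have_ipv4;
    bool have_ipv6;
};

/* Returns 0 on success, -1 if getifaddrs() fails (both flags stay false). */
int query_net_ifs(struct net_ifs *out)
{
    struct ifaddrs *list, *ifa;

    assert(out != NULL);
    out->have_ipv4 = false;
    out->have_ipv6 = false;
    if (getifaddrs(&list) != 0)
        return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue; /* interface without an address */
        if (ifa->ifa_addr->sa_family == AF_INET)
            out->have_ipv4 = true;
        else if (ifa->ifa_addr->sa_family == AF_INET6)
            out->have_ipv6 = true;
    }
    freeifaddrs(list);
    return 0;
}
```

A caller could skip the IPv6 lookup entirely when have_ipv6 is false, which is the IPv4-only case the comment above describes.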
moving neeti's futured bugs for triaging.
Assignee: neeti → new-network-bugs
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.5b) Gecko/20030914
Mac OS X 10.2.6 (IPv6 ready)
I've struggled with this bug since bug 205726 (DNS rewrite) was checked in. Some websites like www.gva.be have become much slower, probably because getaddrinfo is now used instead of gethostbyname. I've noticed that most problematic websites were using *.doubleclick.net, which of course is a high-profile website, since it provides a large portion of all ads in the world. To see the difference, compare the output of the following 2 statements:
host -d -t a ad.doubleclick.net (or nslookup -type=a ad.doubleclick.net)
host -d -t aaaa ad.doubleclick.net (or nslookup -type=aaaa ad.doubleclick.net)
The workaround is to use a proxy server of my ISP, which isn't using IPv6 yet. Note that image blocking helps too, if you have that particular server in your blocking list.
Wouldn't it be possible to query for the AAAA and the A records in parallel (not serially), and use the first one that responds? It affects any website that uses *.doubleclick.net, for example: http://www.theregister.co.uk/
Assigned the bug to Darin.
Assignee: new-network-bugs → darin
About the suggestion to look up A and AAAA records in parallel - my colleagues at the ISP where I work have some experience with resolver libraries out there that in the past have exhibited this behaviour. It's not fatal, and will work, but it's quite difficult to work with. In particular it introduces some non-determinism to the lookup process. Bugs become a lot harder to diagnose and fix as reproducing them is a hassle, especially when it's not clear if a particular problem might be an IPv6-only or IPv4-only. This would build resolver functionality into the client. Again not fatal in itself, but it's redundant with all that that entails - system administrator can no longer make global changes to resolving behaviour as applications do their own thing. Also it is an ugly workaround where it would perhaps be preferable to continue to try to have the underlying problem fixed. A colleague suggests that if a workaround is necessary, it might be better to work with the library: call getaddrinfo with ai_family = AF_INET6 , but use alarm() or similar to enforce a short timeout, and then do the AF_INET lookup.
*** Bug 219512 has been marked as a duplicate of this bug. ***
Wan-Teh, comment #25 is no longer relevant to Mozilla, correct? From my reading of current NSPR code, if a native getaddrinfo() function can't be found, PR_GetAddrInfoByName() falls back to an IPv4-only DNS lookup. This problem is limited to machines with at least one configured IPv6 interface, is it not?
Comment #25 is no longer relevant to Mozilla. Comment #25 is concerned with PR_GetIPNodeByName, and Mozilla is no longer using that function.
Confirmed this problem on Mac OS. BTW, an old Japanese document shows sample code that does not use AF_INET, along these lines:
http://playground.iijlab.net/iij.news/4.html
int s;
struct addrinfo hints, *res, *res0;
memset(&hints, 0, sizeof(hints));
hints.ai_family = PF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
getaddrinfo("www.kame.net", "http", &hints, &res0);
for (res = res0; res; res = res->ai_next) {
    s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {
        close(s);
        continue;
    }
    break;
}
freeaddrinfo(res0);
This affects Camino on Mac OS X 10.2. See bug 219512, bug 223221. Severity to major. Resetting OS (it was Linux), target milestone, and hardware. Another site this makes effectively impossible to browse in Camino: scifi.com (that should motivate people ;-) You don't need to use the attachment to test this... on Mac OS X 10.2.8 I get the following output:
[simons-tibook2:~] woodside% time host ad.doubleclick.net
ad.doubleclick.net is a nickname for gd18.doubleclick.net
gd18.doubleclick.net has address 206.65.183.95
0.000u 0.000s 0:40.14 0.0% 0+0k 2+0io 0pf+0w
Check out the wall clock time: 40 seconds!
Severity: normal → major
OS: Linux → All
Hardware: PC → All
Target Milestone: Future → ---
I don't understand. This is a bug in doubleclick's DNS servers. What do you think mozilla should do, except turn off IPv6 support?
End users don't really care whose bug it is. All they know is that the Mozilla family of browsers are now the only ones with this problem. What's more, the previous versions of Mozilla did not have it. This all comes down to who the software is for: the developer or the users. I would choose the latter.
Stephen, Simon: before adding workarounds we should make sure that doubleclick has been contacted. Note that this also occurs on Windows (except the timeout is 15 seconds) if IPv6 is turned on, including IE (since it's not a bug in the client). So can you try to contact doubleclick about this? Another thing: are the people who see the bug using IPv6 (even having an IPv6 address counts)?
Simon, another thing: does OS X support the AI_ADDRCONFIG flag? "man getaddrinfo" should tell you, I think.
I've written twice to doubleclick.net in September, but I didn't get any response. I don't have the developer tools installed here, but Apple's manpage at http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/getaddrinfo.3.html doesn't mention AI_ADDRCONFIG at all. But the flag is mentioned on another page, in http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/freehostent.3.html .
I never turned IPv6 on, it must be on by default in OS X 10.2.8 (test: can you see the dancing kame ? http://www.kame.net/kame-mosaic.html )
IPv6 has been turned on by default since 10.2, I think (I'm running Mac OS X 10.2.8). Panther provides some GUI controls for IPv6, but Jaguar doesn't have them yet. I can see IPv6 mentioned in the reports of netstat and ifconfig, and I can query for AAAA records (of course), but my ADSL provider doesn't support IPv6 yet. So I can visit www.kame.net and see the dancing turtle, but only in the low-res version (IPv4).
Regarding comment #40, that's funny. The man pages say that getipnodebyname supports the AI_ADDRCONFIG flag (and will not query for AAAA records unless the system has at least one IPv6 address), but getaddrinfo does not. But on the other hand getaddrinfo supports scoped addresses, but getipnodebyname does not. This is strange, because getipnodebyname is deprecated in favour of getaddrinfo (RFC 3493). Someone with OS X and the include files, can you double check that AI_ADDRCONFIG is not defined? Look for it near AI_NUMERICHOST and friends, it should be in netdb.h I think. We should check on both 10.2 and Panther. Since it's not clear to me what benefit scoped addresses have for a browser, a partial workaround for this could be to use getipnodebyname on OS X, i.e. implement PR_GetAddrInfoByName using PR_GetIPNodeByName. This would probably also fix bug 222031. wtc? darin? However, I can't do this, I don't have a mac (too expensive :)).
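For reference, requesting AI_ADDRCONFIG behaviour from getaddrinfo() where the platform defines the flag might look like the sketch below. The helper names make_lookup_hints and lookup_host are hypothetical, and the #ifdef guard reflects the uncertainty in this thread about which OS versions define the flag.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>

/* Build hints for a stream-socket lookup over any address family. */
struct addrinfo make_lookup_hints(void)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
#ifdef AI_ADDRCONFIG
    /* Only ask for address families that the host has configured. */
    hints.ai_flags |= AI_ADDRCONFIG;
#endif
    return hints;
}

/* Returns getaddrinfo()'s result code; on 0 the caller must free *res
   with freeaddrinfo(). */
int lookup_host(const char *host, struct addrinfo **res)
{
    struct addrinfo hints = make_lookup_hints();

    assert(host != NULL && res != NULL);
    return getaddrinfo(host, NULL, &hints, res);
}
```

Whether the flag actually suppresses AAAA queries depends on the platform's resolver, which is exactly what the comment above asks people to check.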
Another thing: does Panther suffer from this problem too? How long does % host ad.doubleclick.net take on 10.3?
This problem also occurs on FreeBSD 5.1-RELEASE (IPv6 enabled by default). It takes about 80 seconds to load http://ad.doubleclick.net/ with a Mozilla CVS build.
% time host ad.doubleclick.net
ad.doubleclick.net is a nickname for gd13.doubleclick.net
gd13.doubleclick.net has address 216.73.87.42
0.000u 0.006s 0:20.28 0.0% 0+0k 0+0io 0pf+0w
That's not a huge surprise since OS X and FreeBSD share the same networking stack ;-)
This also affects washingtonpost.com.
Lorenzo: AFAIK this is not a problem on 10.3. You wrote:
> Since it's not clear to me what benefit scoped addresses have for a browser, a
> partial workaround for this could be to use getipnodebyname on OS X, i.e.
> implement PR_GetAddrInfoByName using PR_GetIPNodeByName. This would probably
> also fix bug 222031. wtc? darin?
Where in the code would this be done? Can you post a patch that's a best guess, so we can test it out and go from there?
No, it's not a surprise, since it doesn't depend on the client at all, but only on Doubleclick's DNS servers. :-)
On MacOS 10.3.1, with both Mozilla 1.5 and 1.6a, ad.doubleclick.net resolves quickly. I have verified that AI_ADDRCONFIG is defined in netdb.h on MacOS 10.3.1. ifconfig reports the following. It would be helpful if people experiencing this problem on MacOS would run Terminal and report what ifconfig reports for them.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16290
    inet6 ::1 prefixlen 128
    inet6 fe80::1 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 64.81.73.94 netmask 0xffffff00 broadcast 64.81.73.255
    ether 00:03:93:43:9a:b6
    media: autoselect (10baseT/UTP <half-duplex>) status: active
    supported media: none autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 1000baseTX <full-duplex> 1000baseTX <full-duplex,hw-loopback> 1000baseTX <full-duplex,flow-control> 1000baseTX <full-duplex,flow-control,hw-loopback>
fw0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 2030
    tunnel inet -->
    lladdr 00:03:93:ff:fe:43:9a:b6
    media: autoselect <full-duplex> status: inactive
    supported media: autoselect <full-duplex>
On Mac OS X 10.2.8 (Jaguar), while connected to a PPPoE connection (IPv4 only) over the en0 interface. The IPv6 address was auto-selected, not given by PPPoE or DHCP. That's why I have an IPv6 address, but it leads to nowhere (only local traffic, not routed over PPPoE).
====== ifconfig ======
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    tunnel inet -->
    inet6 fe80::20a:27ff:fed6:491a%en0 prefixlen 64 scopeid 0x4
    ether 00:0a:27:d6:49:1a
    media: autoselect (10baseT/UTP <half-duplex>) status: active
    supported media: none autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1492
    inet 213.177.159.135 --> 213.177.159.129 netmask 0xffffff00
====== netstat -rn ======
Routing tables
Internet:
Destination Gateway Flags Refs Use Netif Expire
default 213.177.159.129 UGSc 5 1 ppp0
127.0.0.1 127.0.0.1 UH 7 7598 lo0
213.177.159.129 213.177.159.135 UH 6 0 ppp0
Internet6:
Destination Gateway Flags Netif Expire
UH lo0
fe80::%lo0/64 Uc lo0
link#1 UHL lo0
fe80::%en0/64 link#4 UC en0
0:a:27:d6:49:1a UHL lo0
ff01::/32 U lo0
ff02::%lo0/32 UC lo0
ff02::%en0/32 link#4 UC en0
I'm planning to upgrade to Panther soon, but it's not urgent, so I can wait a few more months to help testing.
Flags: blocking1.6b?
Searching for AI_ADDRCONFIG in google, I came across a thread on the IPv6 working group where people were suggesting that link-local addresses shouldn't count for AI_ADDRCONFIG. This makes a lot of sense to me. The other solution (which Panther seems to have taken) is to not autoconfigure link-local addresses. So unless we get information to the contrary, there is a bug against MacOS 10.2 which has been fixed in MacOS 10.3. Do we have any instances of this problem on platforms where NSPR and not the OS is doing the interface-configured check? We could file a bug against NSPR to ignore link-local addresses in the addrconfig checks it implements itself.
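To make the suggestion concrete, a check that ignores link-local addresses when deciding whether IPv6 is "configured" could be sketched as below. The function name counts_for_addrconfig is hypothetical, and skipping loopback as well is an assumption layered on top of the suggestion in this comment. Note that inet_pton() does not accept a scope suffix such as "%en0", so the address is passed without it.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the textual IPv6 address should count as "IPv6 is configured". */
bool counts_for_addrconfig(const char *text)
{
    struct in6_addr addr;

    assert(text != NULL);
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return false; /* not a valid IPv6 address */
    if (IN6_IS_ADDR_LOOPBACK(&addr))
        return false; /* ::1 (assumption: loopback never counts) */
    if (IN6_IS_ADDR_LINKLOCAL(&addr))
        return false; /* fe80::/10, autoconfigured and not routable */
    return true;
}
```

Under this rule the autoconfigured fe80::20a:27ff:fed6:491a address reported on 10.2.8 would not count, while a routable address such as 3ffe:b00:c18:1::10 would.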
*** Bug 226240 has been marked as a duplicate of this bug. ***
wtc, what do you think about the suggestion in comment #30? It looks like the only way to support IPv6 on Mac and FreeBSD which have really long resolver timeouts.
Note that bug 222031 has now disabled IPv6 support on versions < 10.3. When I'm home tonight, I'll try to verify if this bug is fixed.
Today, it seems fixed for me: build 2003112705 on Mac OS X 10.2.8. I tried build 2003112603 an hour ago, and it still had the error. I checked about a dozen links that I found in this bug and the bugs that were duped against this one. Now I can upgrade to 10.3 :-)
Agreed, in my build from CVS last night this seems to be fixed. :-)
Flags: blocking1.6b?
Was fixed a while ago, by the checkin of bug 222031. I have been surfing for a month without the problems that I encountered before. checked on Mac OS 10.2.8 with : Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20031229
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This is not a Mac bug, and it has not been fixed. You're just not seeing it because IPv6 has been turned off in 10.2. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
I am suffering from this bug *a lot* running OS X 10.3.2 and Camino 2004011103, especially with ad.doubleclick.net. My question: Is there a workaround? Can I disable IPv6 in Panther, and if so how?
I disabled IPv6 this morning and have had, so far, no more problems. One way to disable it is in the Network Preference Pane. I did that a few days ago but somehow it enabled itself again, so I had to disable it once more. Today I edited /etc/hostconfig and replaced YES with NO in IPV6=. I guess a reboot is required to make that change active. I hope that's more "permanent". As I said, I changed it today and haven't done much testing so far, so I have no idea whether this resolves it.
A workaround for those of you who want to keep IPv6:
echo ::1 ad.doubleclick.net >> /etc/hosts
for i in uk fr de it se dk ch; do echo ::1 $i.doubleclick.net >> /etc/hosts; done
(add your own if you like)
This will also block their ads and save you bandwidth. We can see this as a form of pressure on them to fix the bug in their DNS server. :)
I went with Lorenzo's suggestion, because I like the idea of giving them incentives to improve ;-) and it works fine. But why ::1? That's localhost in IPv6 lingo, no? I used ::0 instead, which I assume corresponds to 0.0.0.0. Anyway, since this seems to be related to misconfigured DNS servers I would like to spread the word. Is there some kind of explanation/evangelism one could send to doubleclick clients, e.g. wired.com, telling them why their revenue is drying up because of this?
The explanation is that doubleclick's DNS servers are incorrectly failing to respond to DNS queries for AAAA records. Per the DNS standard, the servers are required to respond with a zero answer section to queries for extant domains which have no records of the type being queried. This nonconformance on the part of doubleclick causes users of IPv6-capable browsers to have to wait for the DNS query for IPv6 addresses to time out before failing over to the query for IPv4 addresses. Were doubleclick to conform to the DNS standard, this failover would be within milliseconds.
Sure, you and I know that. And presumably doubleclick too, but they don't do anything about it (three years have passed since this bug was reported...) My thought was that perhaps it would be smart to tell doubleclick's *clients* to put some pressure since they are losing money when I'm forced to block ads.
Technically, doubleclick use broken DNS server software that doesn't answer AAAA queries, or indeed anything other than A or MX queries. It should reply immediately with a packet saying that there are no records of the requested type. Sometimes it answers with a SERVFAIL after a minute or two, most of the time it doesn't answer at all.
As I understand it, the doubleclick DNS servers don't return SERVFAIL. It is intermediate, recursive DNS servers which return SERVFAIL when they time out waiting for the doubleclick DNS servers.
Actually, I've seen it from the doubleclick servers as well (a minute or more after the original query). But maybe that is due to load balancing. Most of the time they don't reply at all.
Attached patch Experimental workaround (obsolete)
I'd appreciate if someone experiencing the problem could test this workaround patch.
Hmm, sounds like a good idea! It could catch most of the worst offenders at very low cost. This approach has already been used for pipelining, see: http://lxr.mozilla.org/seamonkey/source/netwerk/protocol/http/src/nsHttpConnection.cpp#235 Maybe you could rewrite the patch so the code is something like that? Darin, what do you think of an explicit workaround like this?
Aside from needing to use PL_strcasecmp() instead of strcasecmp(), how would you want me to rewrite the patch? The matching needs to be aware of domain boundaries, so a strstr won't cut it. The remaining complexity is trying to optimize strlen() calls, one could reasonably argue that is an undesired microoptimization.
> sizeof(v6brokendomain)/sizeof(v6brokendomain[0]);
there's NS_ARRAY_LENGTH for this purpose
Well, I'd kick the len element out of the structure and make brokendomains a string array. Optimizing strlen does not really make much sense IMO since you're talking about DNS queries, which are probably three orders of magnitude slower. And maybe instead of using an extra if(i < sizeof(v6brokendomain)/ ...) after the comparisons I would change the AF directly in the loop, like:
int i, len, af = PR_AF_UNSPEC;
for (i in domains) {
    if(domain matches) {
        af = AF_INET;
        break;
    }
}
ai = PR_GetAddrInfoByName(rec->host, af, PR_AI_ADDRCONFIG);
#if defined(RES_RETRY_ON_FAILURE)
if (!ai && rs.Reset())
    ai = PR_GetAddrInfoByName(rec->host, af, PR_AI_ADDRCONFIG);
#endif
But of course these are nits (just make the code a little shorter and a little easier to read).
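The "domain matches" step, with the domain-boundary awareness asked for earlier in the review, could be sketched in plain C as follows. This is not the attached patch: host_in_domain and af_for_lookup are hypothetical names, the standard AF_UNSPEC and AF_INET constants stand in for the NSPR ones, and strcasecmp() from <strings.h> stands in for PL_strcasecmp().

```c
#include <sys/socket.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp() */

/* Domains whose hosts get IPv4-only lookups (a plain string array). */
static const char *const v6_broken_domains[] = { "doubleclick.net" };

/* True if host equals domain or ends with "." followed by domain. */
bool host_in_domain(const char *host, const char *domain)
{
    size_t hlen, dlen;

    assert(host != NULL && domain != NULL);
    hlen = strlen(host);
    dlen = strlen(domain);
    if (dlen == 0 || hlen < dlen)
        return false;
    if (strcasecmp(host + (hlen - dlen), domain) != 0)
        return false;
    /* Respect the domain boundary: "notdoubleclick.net" must not match. */
    return hlen == dlen || host[hlen - dlen - 1] == '.';
}

/* Pick the address family directly in the loop, as suggested above. */
int af_for_lookup(const char *host)
{
    size_t i;
    size_t n = sizeof v6_broken_domains / sizeof v6_broken_domains[0];

    for (i = 0; i < n; i++) {
        if (host_in_domain(host, v6_broken_domains[i]))
            return AF_INET;
    }
    return AF_UNSPEC;
}
```

A plain strstr() or suffix comparison would also match "notdoubleclick.net"; the extra check on the character before the suffix is what makes the match respect label boundaries.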
Comment on attachment 139590
Experimental workaround
The NSPR change in this patch is fine by me. (We could also allow af == PR_AF_INET6.)
Attached patch Experimental workaround v2 (obsolete)
Updated per review comments. I would appreciate it if someone who can reproduce the problem could give this a whirl. I will submit the NSPR portion of this patch as a separate bug.
Attachment #139590 - Attachment is obsolete: true
Depends on: 231786
I'm new in this community, so feel free to ignore my (ignorant?) input, but I find the idea of hard coding domain names almost repulsive. I think a better way would be to either a) let the user edit the list of domains b) maintain a dynamic list and when a timeout is detected add the offender to the list. What do you think? But, I'll test it for you anyway.
The v2 patch works for me (except for strange slowness/hangs on http://www.macosrumors.com on ad server resolution -- another candidate?). I am very satisfied with this workaround. I can leave Java enabled! OS: Tru64 unix V5.1B
Eyvind: you are right, hardcoding is not very elegant. But what can we do? This is a bug in doubleclick's DNS server software, and we can't do anything about that (people have tried to write to them, and nothing has happened). There will likely be similar bugs that affect other web sites, but I don't think it is a common enough feature that it would make sense to add UI for it. Maybe a hidden pref, which users can modify using about:config. That would make name resolution depend on preferences, but if this is not a problem, then maybe a hidden pref is the way to go. Other things that would work:
1. Disable IPv6. I personally am against this.
2. Try IPv4 first, then IPv6. Safari does this. I am also against this. If we don't start using it, IPv6 will never take off. :)
3. Start v6 and v4 DNS lookups in parallel and use the first one that comes back. Problem: introduces non-deterministic behaviour (never a good thing).
4. Set a timeout on IPv6 lookups. If the query takes too long, repeat it using IPv4. Problems: difficult to find a good timeout (balance between the possibility of slow DNS servers and the goal of optimizing page loads); DNS lookups may take longer; (risk of non-deterministic behaviour). Possible improvement: do both queries at the same time and wait for v6 to complete (but wouldn't gain much).
Yes, a UI is probably overkill, and an about:config thingy has the drawback that most users won't understand how to fiddle with it. How about my suggestion b)? Wouldn't that be a reasonable solution? Something like this:
At installation, start with an empty list of ipv6-incompatible domains (the list is of course saved to disk between sessions)
Before doing DNS lookup, check the list
if domain name is in the list
    do ipv4 lookup
else
    do ipv6 lookup
    if timeout
        add domain name to list
        do ipv4 lookup
This could be combined with a background check once in a while to see if domains in the list now work with ipv6.
This way you get a simple, flexible, automatic and (almost) deterministic scheme :-)
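The scheme above can be sketched in C roughly as follows. Everything here is hypothetical: resolver_fn is an assumed hook that reports a timeout explicitly, the list lives only in memory (saving it to disk and the periodic re-check are left out), AF_UNSPEC is used for the normal lookup, and stub_resolver exists only so the sketch can be tried offline.

```c
#include <sys/socket.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEARNED 32
#define MAX_NAME 256

enum lookup_result { LOOKUP_OK, LOOKUP_FAILED, LOOKUP_TIMED_OUT };

/* Hypothetical resolver hook: resolves host for one address family. */
typedef enum lookup_result (*resolver_fn)(const char *host, int af);

/* In-memory list of hosts whose normal lookup timed out. */
char learned[MAX_LEARNED][MAX_NAME];
size_t learned_count;

bool is_learned(const char *host)
{
    size_t i;

    for (i = 0; i < learned_count; i++) {
        if (strcmp(learned[i], host) == 0)
            return true;
    }
    return false;
}

void learn(const char *host)
{
    /* Silently drop names that do not fit; this is only a sketch. */
    if (learned_count < MAX_LEARNED && strlen(host) < MAX_NAME)
        strcpy(learned[learned_count++], host);
}

enum lookup_result adaptive_lookup(resolver_fn resolve, const char *host)
{
    enum lookup_result r;

    assert(resolve != NULL && host != NULL);
    if (is_learned(host))
        return resolve(host, AF_INET); /* known offender: IPv4 only */
    r = resolve(host, AF_UNSPEC);      /* normal lookup, IPv6 included */
    if (r == LOOKUP_TIMED_OUT) {
        learn(host);                   /* remember it for next time */
        r = resolve(host, AF_INET);
    }
    return r;
}

/* Stub for offline experiments: only IPv4-only lookups answer in time. */
int stub_calls;

enum lookup_result stub_resolver(const char *host, int af)
{
    (void)host;
    stub_calls++;
    return af == AF_INET ? LOOKUP_OK : LOOKUP_TIMED_OUT;
}
```

With the stub, the first adaptive_lookup() for a host costs two resolver calls (the timed-out one and the IPv4 retry) and every later call for the same host costs one.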
Attached patch Proposed workaround (obsolete)
Attachment #139637 - Attachment is obsolete: true
Attachment #139684 - Flags: review?(darin)
I was thinking of writing some code to make this a hidden pref. Proposal:
- A string pref which would hold a comma-separated (since comma is not a legal character in a DNS name) list of "broken for v6" domains.
- The pref could be in all.js and contain ".doubleclick.net" by default, but people could add/change domains using about:config.
- The prefs would be read in nsDnsService::Init() with the others and passed to nsHostResolver::Create().
I'm willing to code this. Suggestions? Darin, what do you say?
Eyvind: I would not like a file because it would mean adding another file to the distribution and reinventing the wheel to open it, read from it, write it at shutdown, etc. I think a pref is ok; after all, this is pretty advanced stuff.
Attached patch proof of concept patch (obsolete)
Building on John's idea, here is a patch that uses a pref to store the domains for which all DNS lookups must be for IPv4 addresses. At the moment it only works for one hostname and leaks memory, but the pieces are there. It might also need a restart for the changes to take effect. Comments?
Attachment #139684 - Attachment is obsolete: true
Attachment #139684 - Flags: review?(darin)
Lorenzo, it looks very good. I'm happy with it. Regarding your earlier comment: I'm not entirely sure what you meant by files and reinventing the wheel. You could write to a preference, could you not? (It might be a bad idea, though?) I also think it's very important to have good default preferences. Does anybody have a list? Otherwise, even with this patch you will never be able to close this bug... I have found two domains other than doubleclick: (a.) tribalfusion.com (adserver1-images.) backbeatmedia.com
Attached patch patch v1 (obsolete)
This one fixes the leaks and cleans up the code. It works for me. The name of the pref is network.dns.ipv4OnlyDomains and it is a comma-separated list of domains (actually substrings at the end of the host; it's not a true domain match) for which DNS lookups are for IPv4 addresses only. The pref can be set using about:config, but its default value is already ".doubleclick.net" so you shouldn't need to change it. :)
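As a sketch of the matching described here (a comma-separated list of substrings compared against the end of the host name), something like the following would do. It is not the code from the patch: host_matches_pref is a hypothetical name, the case-insensitive comparison is an assumption, and no whitespace trimming is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp() */

/* True if host ends with any entry of the comma-separated list. */
bool host_matches_pref(const char *host, const char *list)
{
    size_t hlen;
    const char *p;

    assert(host != NULL && list != NULL);
    hlen = strlen(host);
    p = list;
    while (*p != '\0') {
        size_t len = strcspn(p, ","); /* length of the current entry */

        if (len > 0 && len <= hlen &&
            strncasecmp(host + (hlen - len), p, len) == 0)
            return true;
        p += len;
        if (*p == ',')
            p++; /* skip the separator */
    }
    return false;
}
```

Because this is a plain suffix comparison, the leading dot in ".doubleclick.net" is what keeps "notdoubleclick.net" from matching; an entry written without the dot would match it.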
Attachment #139765 - Attachment is obsolete: true
Assignee: darin → lorenzo
Status: ASSIGNED → NEW
Taking
Status: NEW → ASSIGNED
Attached patch patch v1 with proper directories (obsolete) — Splinter Review
Same patch as before, except it has directories inside it and should actually apply.
Attachment #139857 - Attachment is obsolete: true
Attached patch patch that actually works (obsolete) — Splinter Review
Oops, previous patch wouldn't work. This one should.
Attachment #139861 - Attachment is obsolete: true
Comment on attachment 139867 [details] [diff] [review] patch that actually works I have tested this for a while, and it seems to work. Darin, can you review?
Attachment #139867 - Flags: review?(darin)
i think you need to protect against a race between nsDNSService::Shutdown and nsDNSService::GetAFForLookup. it looks like mIPv4OnlyDomains could be freed while another thread is inside GetAFForLookup. perhaps you need to acquire mLock inside GetAFForLookup before accessing mIPv4OnlyDomains.
Comment on attachment 139867 [details] [diff] [review] patch that actually works Hmm, I hadn't thought of that. Removing review request until I can look into it.
Attachment #139867 - Flags: review?(darin)
*** Bug 53967 has been marked as a duplicate of this bug. ***
I'm not the biggest fan of this proposed solution, but I'm not coming up with a lot of great ideas myself. Do we have an evangelism bug to doubleclick? And isn't this a problem that is being discussed in the IPv6 community? What about other browsers, how do they handle it? Perhaps it is also worth thinking about bug 96432, because I think many Mac OS X 10.3 users probably do not care about IPv6 right now. As for the list of sites, www.vanguard.com and etrade.com were mentioned in some of the dupes.
This is definitely being discussed in the IPv6 community. A Google search found http://www.jinmei.org/draft-ietf-dnsop-misbehavior-against-aaaa-00.txt
> Do we have an evangelism bug to doubleclick?
Comments in this bug and elsewhere seem to indicate that it is very hard to get in contact with doubleclick on this issue. Of course, if somebody else wants to try, go ahead...
> What about other browsers, how do they handle it?
Safari tries IPv4 first and IPv6 second. On Windows it's not a very big problem because the timeouts are shorter.
> I think many Mac OS X 10.3 users probably do not care about IPv6 right now.
I think this is not the way forward. v6 actually does exist, at least in Europe and Asia if not in the US. Why cripple yourself by disabling v6 if you can work around the problem?
I have a contact at a big customer of DoubleClick which I haven't yet followed up with. If there's a procedure to get a communication as representing mozilla.org instead of as an individual, I'd appreciate someone directing me to it.
Attached patch patch with locking (obsolete) — Splinter Review
This patch turns GetAFForLookup into a critical section. This is sub-optimal, but it shouldn't have any real effect on performance since all GetAFForLookup does is scan through the pref string.
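The locking idea can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: `DnsConfig` and its methods are hypothetical names, `std::mutex` stands in for the service's mLock, and the reader copies the string under the lock, whereas the patch scans the pref string inside the critical section.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the service object discussed above; not the
// real nsDNSService.
class DnsConfig {
public:
    // Store a new list, e.g. at init time or when the pref changes.
    void setIPv4OnlyDomains(std::string domains) {
        std::lock_guard<std::mutex> guard(mLock);
        mIPv4OnlyDomains = std::move(domains);
    }

    // Drop the list at shutdown; readers on other threads cannot observe a
    // half-destroyed string because they take the same lock.
    void shutdown() {
        std::lock_guard<std::mutex> guard(mLock);
        mIPv4OnlyDomains.clear();
    }

    // Readers copy the list while holding the lock and then work on the copy.
    std::string snapshot() const {
        std::lock_guard<std::mutex> guard(mLock);
        return mIPv4OnlyDomains;
    }

private:
    mutable std::mutex mLock;      // guards mIPv4OnlyDomains
    std::string mIPv4OnlyDomains;  // comma-separated list from the pref
};
```

Copying under the lock keeps the critical section short at the cost of one allocation per lookup; scanning in place, as the patch does, avoids the copy but holds the lock for the duration of the scan.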
Attachment #139867 - Attachment is obsolete: true
Comment on attachment 140625 [details] [diff] [review] patch with locking Requesting review. Am I using the locks properly? Is more finely grained locking needed?
Attachment #140625 - Flags: review?(darin)
i think this patch should really share the code that lives in nsProtocolProxyService for implementing the "no_proxy_for" pref. that code has support for IP pattern matching as well as domain name matching. i'd really like to utilize that code if possible. i think there is still a race between getting the pref inside the Init method and the function that actually uses the pref. the right thing to do is probably to defer storing the string value until you can get inside the lock. that means, it would need to happen after the |firstTime| block in Init.
the other advantage of the no-proxy-for code is that it supports port blocking as well. indeed, it might be the case that you want to use IPv6 only for some ports. i know that it is common for an IPv6 node to run mixed software (e.g., an IMAP server on IPv4 and an HTTP server on IPv6). sound good? if so, it should be fairly trivial to rip the host:port blocking code out of nsProtocolProxyService. we can make a helper class that lives in netwerk/base/src ... maybe call it nsHostPortFilter or something like that.
(In reply to comment #98)
> the other advantage of the no-proxy-for code is that it supports port blocking as
> well. indeed, it might be the case that you want to use IPv6 only for some
> ports.
The normal failover-on-connect code should handle this. The client DNS lookup code is not an appropriate place to deal with this.
yeah, i agree with you... in fact, i was just about to post a comment similar to what you posted. i think it is probably best to keep this code focused on just blocking domains that are known to not handle DNS requests for AAAA records. so, i'm inclined to go with the current patch, provided we fix the race condition.
Attached patch patch with even more locking (obsolete) — Splinter Review
Ok, now all accesses to mIPv4OnlyDomains are behind mLock. This includes the destructor of nsDNSService, although I'm not sure this is necessary.
Attachment #140625 - Attachment is obsolete: true
Attachment #140625 - Flags: review?(darin) → review-
Attachment #140871 - Flags: review?(darin)
Another one seems to be mii.instacontent.net, as does vanguard.com. etrade.com seems to work though. Darin, have you had a chance to look at this patch yet?
see also bug 231607. it seems that OSX 10.3 does not support IPv6 very well in some cases...
*** Bug 231607 has been marked as a duplicate of this bug. ***
Since duplicate bug 231786 is blocking 1.7b, I could add the capability to disable IPv6 DNS lookups via pref as suggested in bug 231607 comment #18. This would help those with broken DNS servers. This could be done with a network.dns.disableIPv6 boolean pref (default false on all OSs). Darin, what do you think? Can we get this in in time for 1.7b?
*** Bug 232382 has been marked as a duplicate of this bug. ***
Another affected site seems to be allmusic.com: see bug 232382.
Like the previous patch, but adds the ability to turn IPv6 off via a hidden pref (mainly for Mac users with broken caching DNS servers).
Flags: blocking1.7b?
darin, are you ok with this line of patching, or would you prefer to see an entirely different mechanism? We need to get this resolved before beta so it can be tested.
The prefs introduced by this patch seem impossible to maintain. How can we know what domains we need to add to the prefs? How does this list get updated when new broken domains are found, or others are fixed? I'd rather see a patch that globally disables IPv6 lookups.
simon, the last patch (if i'm correct) allows for disabling ipv6 entirely in addition to a blacklist if the user wants ipv6 support enabled. am i mistaken?
*** Bug 230891 has been marked as a duplicate of this bug. ***
*** Bug 231118 has been marked as a duplicate of this bug. ***
OK, sounds fine. I didn't read the patch in sufficient detail.
(In reply to comment #111)
> simon, the last patch (if i'm correct) allows for disabling ipv6 entirely in
> addition to a blacklist if the user wants ipv6 support enabled.
That is correct. The blacklist is meant for advanced users who have IPv6 turned on and know how to maintain the list (and just being able to put .doubleclick.net in the list will do a lot for these users). The other pref is meant mainly for Mac users, because on the Mac you can't turn off IPv6 except in the app, and if you have a broken DNS server you're hosed.
The blacklist is for normal users and is meant to default to a Mozilla-maintained list of the most problematic sites. The configurability of the list is just feature creep. The ability to disable IPv6 lookups is for people with broken local DNS servers which they're unwilling or unable to get fixed.
*** Bug 231607 has been marked as a duplicate of this bug. ***
Attached patch v2 patchSplinter Review
i revised the latest patch slightly. i decided to do this myself instead of posting my review comments in the bug. reason: 1.7 beta freeze is tomorrow, and i don't want this patch to miss the deadline. minor changes include:
(1) cleaned up pref handling
(2) make Init method enter lock before setting all member variables. this is now necessary since AsyncResolve/Resolve enter the lock twice.
(3) revised GetAFForLookup so that we don't need to PromiseFlatCString before calling it. i also made it so that the host value must either exactly match (case insensitive) a domain in the list or it must be prefixed by a dot. i think this is reasonable since it doesn't make sense for foobar.com to match bar.com.
Lorenzo: please review these changes, and let me know if you see anything out of whack. thanks!
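The stricter matching rule in (3) could be sketched like this in standard C++. This is one reading of the description, not the code in the v2 patch: the names `endsWithIgnoreCase` and `matchesIPv4OnlyDomain` are hypothetical, and "prefixed by a dot" is interpreted as "the list entry itself begins with a dot, or the character in the host just before the matched suffix is a dot".

```cpp
#include <cctype>
#include <string>

// Case-insensitive check that `host` ends with `suffix` (empty never matches).
bool endsWithIgnoreCase(const std::string& host, const std::string& suffix) {
    if (suffix.empty() || host.size() < suffix.size()) return false;
    const std::string::size_type offset = host.size() - suffix.size();
    for (std::string::size_type i = 0; i < suffix.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(host[offset + i]);
        const unsigned char b = static_cast<unsigned char>(suffix[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Hypothetical helper: true when `host` should be looked up over IPv4 only,
// given a comma-separated `domainList`.
bool matchesIPv4OnlyDomain(const std::string& host, const std::string& domainList) {
    std::string::size_type start = 0;
    while (start <= domainList.size()) {
        std::string::size_type comma = domainList.find(',', start);
        if (comma == std::string::npos) comma = domainList.size();
        const std::string domain = domainList.substr(start, comma - start);
        if (endsWithIgnoreCase(host, domain)) {
            const std::string::size_type offset = host.size() - domain.size();
            // Accept an exact match, or a match that falls on a label boundary.
            if (offset == 0 || domain[0] == '.' || host[offset - 1] == '.') {
                return true;
            }
        }
        start = comma + 1;
    }
    return false;
}
```

Under this reading, www.bar.com and bar.com both match an entry of "bar.com", while foobar.com does not.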
Attachment #140871 - Attachment is obsolete: true
Attachment #142076 - Attachment is obsolete: true
Attachment #140871 - Flags: review?(darin)
Comment on attachment 143342 [details] [diff] [review] v2 patch
>Index: modules/libpref/src/init/all.js
>===================================================================
>+// The following prefs pertain to the negotiate-auth extension (see bug 17578),
>+// which provides transparent Kerberos authentication using the SPNEGO protocol.
This doesn't look like it belongs here. :)
>+// This preference specifies a list of domains for which DNS lookups will be
>+// IPv4 only. Works around broken DNS servers which can't handle IPv6 lookups
>+// and/or allows the user to disable IPv6 on a per-domain basis. See bug 68796.
>+pref("network.dns.ipv4OnlyDomains", ".doubleclick.net");
We also might want to add other domains such as .mii.instacontent.net (common ad servers) or .allmusic.com, (mentioned in dupes), or .bloomberg.com, .lastminute.com, .apc.com, and .apcc.com, .vanguard.com. But how many should we add? I have a list of these, but there's quite a few of them...
A couple of questions on strings while I'm at it:
>Index: netwerk/dns/src/nsDNSService2.cpp
>===================================================================
>[...]
>+ if (NS_SUCCEEDED(rv)) {
>+ // now, set all of our member variables while holding the lock
>+ nsAutoLock lock(mLock);
>+ mResolver = res;
>+ mIDN = idn;
>+ mIPv4OnlyDomains = ipv4OnlyDomains; // exchanges buffer ownership
I assume this also frees the buffer which was previously being used by mIPv4OnlyDomains, right?
>[...]
>+ nsACString::const_iterator hostStart;
>+ host.BeginReading(hostStart);
Why do you need this? Is it because you can't call get() on an nsACString?
Apart from that, the patch looks fine. I'll be testing it on an IPv6-connected system this evening (European time; = about 7 hours from now), if you want another data point.
Attachment #143342 - Flags: review+
Since IPv6 cannot be turned off on Panther, maybe we should mention this in the release notes? Something like "If you are using Mozilla on Mac OS X 10.3, and some sites load very slowly, try turning off IPv6. To do so, enter about:config in the URL bar, scroll down to "network.dns.disableIPv6", right click to modify it and set it to true." Or maybe we should just set this pref to default true on Mac OS...
The Mac OS 10.3 issue with getaddrinfo() would be sufficient reason to turn off IPv6 on that version, just like it's turned off for 10.2.
i planned to turn this off entirely for camino, at least. not sure what m.o wants to do with firefox/seamonkey.
That's a shame. OS X should:
(1) disable IPv6 lookups on its own if IPv6 is not turned on,
(2) honour API flags like AI_ADDRCONFIG to do so at the application's request or, at least,
(3) allow the user to turn IPv6 off.
It can't be so hard if both Windows and Linux do it. What's the point of including IPv6 support and even GUI configuration for IPv6 if such problems mean the apps have to ship with IPv6 disabled? Oh well...
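For context on point (2), this is roughly how an application asks for that behaviour through the POSIX resolver API. It is a sketch only: `makeLookupHints` is a hypothetical helper, and Mozilla itself goes through NSPR rather than calling getaddrinfo() directly. With AI_ADDRCONFIG set, the resolver is asked to return IPv6 addresses only if the host has an IPv6 address configured (and likewise for IPv4); forcing AF_INET is the application-side fallback when the OS does not honour the flag.

```cpp
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>

// Hypothetical helper: builds the hints passed to getaddrinfo().
// ipv4Only == true restricts the lookup to A records (AF_INET);
// otherwise the resolver may return both IPv4 and IPv6 results.
addrinfo makeLookupHints(bool ipv4Only) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = ipv4Only ? AF_INET : AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    // Ask the resolver to skip address families that are not configured
    // on this host.
    hints.ai_flags = AI_ADDRCONFIG;
    return hints;
}
```

A caller would pass the address of the returned struct as the third argument of getaddrinfo() and release the result list with freeaddrinfo().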
Has anyone filed bugs with Apple? bugreporter.apple.com.
> (From update of attachment 143342 [details] [diff] [review])
> >+// which provides transparent Kerberos authentication using the SPNEGO
> This doesn't look like it belongs here. :)
whoops.. yes, i forgot to strip that out of the patch.
> >+// This preference specifies a list of domains for which DNS lookups will be
> >+// IPv4 only. Works around broken DNS servers which can't handle IPv6 lookups
> >+// and/or allows the user to disable IPv6 on a per-domain basis. See bug 68796.
> >+pref("network.dns.ipv4OnlyDomains", ".doubleclick.net");
>
> We also might want to add other domains such as .mii.instacontent.net (common
> ad servers) or .allmusic.com, (mentioned in dupes), or .bloomberg.com,
> .lastminute.com, .apc.com, and .apcc.com, .vanguard.com. But how many should we
> add? I have a list of these, but there's quite a few of them...
yeah, i don't know how best to deal with this either. we should probably include some of these in the default prefs, but doing so is somewhat non-ideal since over time those sites might correct their DNS servers. meanwhile, mozilla will continue to block IPv6 queries to those sites. i think blacklists like this don't scale well over time, but it's not like i have a better solution in mind. pinkerton said he wants to disable IPv6 queries by default for Camino. his point is that users that need IPv6 can go and enable it. that seems like a reasonable thing. we just need a good blurb in the release notes about this stuff.
> >+ mIPv4OnlyDomains = ipv4OnlyDomains; // exchanges buffer ownership
>
> I assume this also frees the buffer which was previously being used by
> mIPv4OnlyDomains, right?
right. this exchange of buffers is a property of nsAdoptingString. normally strings don't behave this way. they copy on assign, leaving both strings with the same "value". here i'm just exploiting nsAdoptingString to avoid a buffer copy.
> >+ nsACString::const_iterator hostStart;
> >+ host.BeginReading(hostStart);
>
> Why do you need this? Is it because you can't call get() on an nsACString?
yes, exactly. nsAC?String represents an array of characters that is not necessarily null-terminated. as a result, there is no .get() method b/c that method is intended to return a pointer to a null-terminated array. so, we use BeginReading to get the starting iterator, and from that we get the pointer to the start of the buffer. the string API is only a "little" overkill ;-)
patch checked in for 1.7 beta. lorenzo: if you want to generate a subsequent patch to block specific domains, please do. this patch gives us the machinery to fix this bug, but we still need to configure the builds to solve this bug.
As regards disabling IPv6, I think we should leave IPv6 enabled by default for Windows and Linux builds, since on both systems IPv6 name lookups are tried only if the user explicitly turns on IPv6. The Mac case is a tougher call for an IPv6 evangelist like me, but I can understand if people want to disable IPv6 by default on OS X. :-) This can be done by wrapping a pref("network.dns.disableIPv6", true) inside #ifdef XP_MACOSX in all.js; I don't know who should make this call though. Regarding IPv4OnlyDomains, I think a blacklist is the best we can do. Remember that this is for people who have turned IPv6 on themselves, either at the OS level on Linux and Windows or by using about:config on OS X. Maybe we could put it in the release notes and then target the most common problems, like ad servers and high profile sites. If new bugs are filed, the blacklist can be expanded, and it can be periodically checked thanks to this script by David Malone which automates the necessary tests: http://www.cnri.dit.ie/cgi-bin/check_aaaa.pl
Attached patch turn off IPv6 by default on Mac (obsolete) — Splinter Review
This disables IPv6 by default on Mac. :( To re-enable, use about:config to change network.dns.disableIPv6 to false.
Same as previous patch, but the comment is a bit clearer.
Attachment #143439 - Attachment is obsolete: true
Attachment #143443 - Flags: review?(darin)
Attachment #143443 - Flags: review?(darin) → review+
Target Milestone: --- → mozilla1.7beta
over AIM:
> chrishofmann: ok. go ahead. asa also notes that it was one of the top problems
> in mac firefox 0.8
checked in latest patch to all.js (attachment 143443 [details] [diff] [review])
marking FIXED
lorenzo: can you see about adding a release note? i hear that asa will be handling the release notes for 1.7.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment on attachment 143478 [details] [diff] [review] Fix missing " r+sr=darin thanks neil for catching that! we should get this patch in right away.
Attachment #143478 - Flags: superreview+
Attachment #143478 - Flags: review+
ok, i see that neil has already taken care of it:
> revision 3.507
> date: 2004/03/10 15:20:54; author: neil%parkwaycc.co.uk; state: Exp; lines: +1 -1
> Fix missing quote rs=glazou a=mkaply patch in bug 68796
thanks neil!
Flags: blocking1.7b?
*** Bug 245174 has been marked as a duplicate of this bug. ***
*** Bug 244176 has been marked as a duplicate of this bug. ***
The problem in bug 244176 remains. I cannot get to the New York Times web site using the preference in Mozilla to turn off IPv6. I need to turn it off completely through modprobe.conf. Yet Opera and Lynx do not have this problem. I really think that there is more to do here. I don't understand all the discussion in this bug, but I think it seems to ignore the real problem that still exists on Linux.
(In reply to comment #136)
> The problem in bug 244176 remains. I cannot get to the New York Times
> web site using the preference in Mozilla to turn off IPv6.
That's bug 245174, and not bug 244176 which is OpenBSD-only. Please don't spam this bug, discussion should continue there.
Now that http://www.cnri.dit.ie/cgi-bin/check_aaaa.pl?dom=doubleclick.net appears to give our single blacklisted domain a clean bill of health, should we remove it from the list? It would keep us from having to explain why the evil, baby-soul-destroying incantation "doubleclick.net" is in our all.js.
Doubleclick is still broken. Don't query for doubleclick.net, rather for ad.doubleclick.net: http://www.cnri.dit.ie/cgi-bin/check_aaaa.pl?dom=ad.doubleclick.net
Blocks: 377383
Depends on: 377395
today, that link says that all looks good even with ad.doubleclick.net . Can it be removed from the list now?
(In reply to comment #140)
> today, that link says that all looks good even with ad.doubleclick.net . Can
> it be removed from the list now?
That was done in bug 377395
I just wanted to inform you that I needed to manually toggle "network.dns.disableIPv6" in "about:config" to "true" in order to get websites to work, i.e. load completely (for example o2online.de, payback.de, somany.de, ...). Firefox 18.0.1 @ Windows XP SP3: Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0
(In reply to Elomir from comment #142) > (for example o2online.de, payback.de, somany.de, ...). WORKSFORME. Perhaps you have a local networking problem. Try test-ipv6.com.
(In reply to John G. Myers from comment #143)
> (In reply to Elomir from comment #142)
> > (for example o2online.de, payback.de, somany.de, ...).
>
> WORKSFORME. Perhaps you have a local networking problem. Try test-ipv6.com.
Yes, no ipv6 here yet (0/10):
> No IPv6 address detected [more info]
>
> It seems as if you had only an IPv4-enabled Internet connection.
> You will not be able to reach websites which are available via IPv6.
>
> Your DNS server (probably run by your ISP) seems either to have no IPv6
> Internet access, or is not configured to use it. In the future, this could
> restrict accessibility of Web content, which is only accessible via IPv6.
> [more info]
Blocks: 1673856