Closed
Bug 239358
Opened 20 years ago
Closed 20 years ago
DNS: Reverse lookups are degrading performance
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.8beta1
People
(Reporter: bugzilla, Assigned: darin.moz)
References
()
Details
(Keywords: fixed-aviary1.0, fixed1.7.5, perf, Whiteboard: [patch] [lin14523])
Attachments
(4 files, 11 obsolete files)
1.18 KB,
text/plain
|
Details | |
6.67 KB,
patch
|
Details | Diff | Splinter Review | |
39.24 KB,
patch
|
bryner
:
superreview+
chofmann
:
approval-aviary+
chofmann
:
approval1.7.5+
|
Details | Diff | Splinter Review |
1.37 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040114 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040114 Here's a tcpdump run and some comments to demonstrate the problem scampbell:~ # tcpdump -n -i eth0 port 53 tcpdump: listening on eth0 09:58:11.094628 10.8.178.10.32785 > 10.8.171.87.53: 47349+ AAAA? www.google.com . (32) (DF) 09:58:11.119236 10.8.171.87.53 > 10.8.178.10.32785: 47349 1/0/0 CNAME[|domain] (DF) 09:58:11.119537 10.8.178.10.32785 > 10.8.171.87.53: 47350+ A? www.google.com. ( 32) (DF) 09:58:11.121283 10.8.171.87.53 > 10.8.178.10.32785: 47350 3/1/0 CNAME[|domain] (DF) 09:58:11.121571 10.8.178.10.32785 > 10.8.171.87.53: 47351+ PTR? 104.167.233.64. in-addr.arpa. (45) (DF) four seconds elapse while we lookup the reverse dns for the google site This seems unnecessary and disfunctional 09:58:16.121846 10.8.178.10.32785 > 10.8.171.87.53: 47351+ PTR? 104.167.233.64. in-addr.arpa. (45) (DF) 09:58:21.123186 10.8.178.10.32785 > 10.8.171.87.53: 47352+ AAAA? www.google.com . (32) (DF) 09:58:21.147515 10.8.171.87.53 > 10.8.178.10.32785: 47352 1/0/0 CNAME[|domain] (DF) 09:58:21.148001 10.8.178.10.32785 > 10.8.171.87.53: 47353+ PTR? 99.167.233.64.i n-addr.arpa. (44) (DF) We seem to try again for another 5 seconds lost 09:58:26.148848 10.8.178.10.32785 > 10.8.171.87.53: 47353+ PTR? 99.167.233.64.i n-addr.arpa. (44) (DF) 09:58:31.150124 10.8.178.10.32785 > 10.8.171.87.53: 47354+ AAAA? www.google.com . (32) (DF) 09:58:31.175291 10.8.171.87.53 > 10.8.178.10.32785: 47354 1/0/0 CNAME[|domain] (DF) 09:58:39.130214 10.8.171.87.53 > 10.8.178.10.32785: 47351 ServFail 0/0/0 (45) (DF) 09:58:39.130239 10.8.171.87.53 > 10.8.178.10.32785: 47351 ServFail 0/0/0 (45) (DF) 09:58:49.131367 10.8.171.87.53 > 10.8.178.10.32785: 47353 ServFail 0/0/0 (44) (DF) 09:58:49.131405 10.8.171.87.53 > 10.8.178.10.32785: 47353 ServFail 0/0/0 (44) (DF) Ultimately it took us 38 seconds to open the web page, almost all time was waiting on reverse lookup records that do not seem to serve a usefull purpose. Reproducible: Always Steps to Reproduce: 1.Start with a clean DNS cache and new copy of mozilla 2 Go to the google site 3.Watch as things are slow to occur initially, tcpdump is great help Actual Results: It took 38 seconds to open the page. See the attached and annoted tcpdump report Expected Results: Opened the page immediately, I suspect there is no good reason to do the reverse lookups that appear to be causing the problem. I've put this to 'Major' as it will likely cause many bad perceptions about the excellent Mozilla software.
Reporter | ||
Comment 1•20 years ago
|
||
I have also sent an email to the google folks informing them of their missing reverse zone. I would think we must expect that such 'bad' dns situations are fairly common and need to handle them gracefully but google -may- become a bad test site for this as they may fix the problem on their side soon.
Comment 2•20 years ago
|
||
*** Bug 239357 has been marked as a duplicate of this bug. ***
Updated•20 years ago
|
Assignee: general → darin
Component: Browser-General → Networking
QA Contact: general → benc
Reporter | ||
Comment 4•20 years ago
|
||
As of this morning, 4/6/2004 it appears Google has had a reverse zone (albeit unpopulated) created, as predicted they are no longer a good test site.
Comment 5•20 years ago
|
||
fortunately, www.nytimes.com, my original test case, still provides no reverse lookup.
Comment 6•20 years ago
|
||
I see the reverse lookup the first time I try www.nytimes.com, but if I close Mozilla and try again, the reverse lookup is not performed, although the forward lookup is still made. I'm not running a proxy or any DNS caching (nscd). tested with linux trunk 2004040408
Reporter | ||
Comment 7•20 years ago
|
||
Andrew's experience seems unique, I have several folks at the local LUG (www.mdlug.org), as well as myself, who all can reproduce this on a variety of linux distros with Mozilla and Firebird.
Comment 8•20 years ago
|
||
It's likely that the reverse lookups you're seeing are the result of tcpdump trying to resolve IP addresses for display. I don't see them when using tcpdump -n. I'm going to mark WFM; please re-open if you can reproduce with tcpdump -n.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 9•20 years ago
|
||
We did use -n on the original tcpdump but here is a new example or mozilla doing reverse lookups (and failing in this case as www.nytimes.com has no reverse zone). scampbell:~ # tcpdump -n -i eth0 port 53 tcpdump: listening on eth0 07:55:12.608191 172.17.5.73.32782 > 172.17.4.129.53: 15626+ AAAA? www.nytimes.com. (33) (DF) 07:55:12.609193 172.17.4.129.53 > 172.17.5.73.32782: 15626 0/1/0 (79) 07:55:12.609372 172.17.5.73.32782 > 172.17.4.129.53: 15627+ AAAA? www.nytimes.com.seqnt.com. (43) (DF) 07:55:12.610740 172.17.4.129.53 > 172.17.5.73.32782: 15627 NXDomain* 0/1/0 (111) 07:55:12.619182 172.17.5.73.32782 > 172.17.4.129.53: 15628+ A? www.nytimes.com. (33) (DF) 07:55:12.619382 172.17.4.129.53 > 172.17.5.73.32782: 15628 4/0/0 A 199.239.137.245,[|domain] 07:55:12.619830 172.17.5.73.32782 > 172.17.4.129.53: 15629+ PTR? 245.137.239.199.in-addr.arpa. (46) (DF) 07:55:12.621196 172.17.4.129.53 > 172.17.5.73.32782: 15629 NXDomain 0/1/0 (103) 07:55:12.623749 172.17.5.73.32782 > 172.17.4.129.53: 15630+ AAAA? www.nytimes.com. (33) (DF) 07:55:12.624360 172.17.4.129.53 > 172.17.5.73.32782: 15630 0/1/0 (79) 07:55:12.630070 172.17.5.73.32782 > 172.17.4.129.53: 15631+ AAAA? www.nytimes.com.seqnt.com. (43) (DF) 07:55:12.631437 172.17.4.129.53 > 172.17.5.73.32782: 15631 NXDomain* 0/1/0 (111) 07:55:12.631821 172.17.5.73.32782 > 172.17.4.129.53: 15632+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 07:55:17.632438 172.17.5.73.32783 > 172.17.4.128.53: 15632+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 07:55:22.633391 172.17.5.73.32782 > 172.17.4.129.53: 15632+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 07:55:27.634397 172.17.5.73.32783 > 172.17.4.128.53: 15632+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 07:55:28.585931 172.17.4.129.53 > 172.17.5.73.32782: 15632 ServFail 0/0/0 (46) 07:55:32.635730 172.17.5.73.32783 > 172.17.4.129.53: 15633+ AAAA? www.nytimes.com. (33) (DF) 07:55:32.636700 172.17.4.129.53 > 172.17.5.73.32783: 15633 0/1/0 (79) 07:55:32.636924 172.17.5.73.32783 > 172.17.4.129.53: 15634+ AAAA? www.nytimes.com.seqnt.com. (43) (DF) 07:55:32.638300 172.17.4.129.53 > 172.17.5.73.32783: 15634 NXDomain* 0/1/0 (111) 07:55:32.650682 172.17.5.73.32783 > 172.17.4.129.53: 15635+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 07:55:32.967714 172.17.4.128.53 > 172.17.5.73.32783: 15632 ServFail 0/0/0 (46) 07:55:37.651421 172.17.5.73.32784 > 172.17.4.128.53: 15635+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 07:55:38.587303 172.17.4.129.53 > 172.17.5.73.32782: 15632 ServFail 0/0/0 (46)
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Comment 10•20 years ago
|
||
You're right, I misread the original post, sorry. I still can't reproduce it on my machine.
Comment 11•20 years ago
|
||
Steven: the existence of the reverse lookups in DNS doesn't necessarily mean that Mozilla is making them. Do you have a proxy/firewall/etc that might be doing a lookup? To see the DNS lookups Mozilla is doing, you can do: setenv NSPR_LOG_MODULES nsHttpResolver:5 setenv NSPR_LOG_FILE /tmp/dns.log mozilla http://www.nytimes.com/ (quit once the page is loaded)
Reporter | ||
Comment 12•20 years ago
|
||
I really don't think it's something I'm running, I am not running nsc or named or any other software that might interfere with DNS. We (www.mdlug.org) have witnessed this with many different versions of Mozilla/Firebird on many platform (it was actually conversations we had amongst ourselves that prompted me to look for the cause). Also, I just tried: scampbell@scampbell:~> NSPR_LOG_MODULES=nsHttpResolver:5 scampbell@scampbell:~> NSPR_LOGFILE=/tmp/dns.log scampbell@scampbell:~> export NSPR_LOGFILE scampbell@scampbell:~> export NSPR_LOG_MODULES scampbell@scampbell:~> mozilla But this did not generate a /tmp/dns.log file. Again, my specific example is Suse 9.0 (patched up to date) using Mozilla 1.6 - Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040114
Comment 13•20 years ago
|
||
Andrew Schultz suggested you set NSPR_LOG_FILE, but you set NSPR_LOGFILE, so it may be a simple typo (I haven't tried it with either one, just observing the discrepency)
Comment 14•20 years ago
|
||
yes, should be NSPR_LOG_FILE when not setting NSPR_LOG_FILE but setting NSPR_LOG_MODULES, logging will happen to stdout (or stderr, not sure)
Reporter | ||
Comment 15•20 years ago
|
||
Thanks, I corrected that but I got an empty log file. I have asked my local lug to all try this and report back including tcpdumps of dns, copies of /etc/resolv.conf and ps-ef. I'll see what I get back on that. Sorry for the bugspam folks.
Comment 16•20 years ago
|
||
(In reply to comment #12) > scampbell@scampbell:~> NSPR_LOG_MODULES=nsHttpResolver:5 Hmm, actually looking at the sourcecode, it seems that this is called nsHostResolver, not nsHttpResolver.
Comment 17•20 years ago
|
||
I'm still puzzled about where the reverse lookups come from. In the past, Mozilla has sent reverse lookups spuriously, and that has been from using IP addresses. So if you really think mozilla is the problem, trace a URL that goes to a fixed address (use nslookup to convert and then do http://1.2.3.4). Use a page that doens't load much from elsewhere.
Summary: Reverse DNS lookups are degrading performance → DNS: Reverse lookups are degrading performance
Reporter | ||
Comment 18•20 years ago
|
||
Testing with a fixed IP there were no reverse lookups generated. We're seeing lots of evidence that this is tied to the linux distributions. Suse 9.0 or Mandrake 10.0 loaded with Mozilla 1.6 (sorry I don't have specific versions) demonstrate the problem, however Debian does not. This is actually watching for dns traffic using tcpdump. We've eliminated all local nameservers and cachers from the systems so it's not there. Also, we've noted that some ISPs, particularily WOW seem to be able to give a negative dns answer quickly masking the problem. Testing with Konqueror on the same systems does not exhibit the reverse lookup and the few I've tried (not everyone has tried this). At this point it is difficult to say where this problem lies, I'll keep digging for usuable information/evidence.
Comment 19•20 years ago
|
||
There maybe variances in the way DNS resolvers work, either through OS libraries or vendor configuration.
Comment 20•20 years ago
|
||
I see the same problem with firefox 0.8 on my Debian sarge system. Trying to log DNS lookups to /tmp/dns.log as described also gets me an empty log: what's the secret to getting this to work? The problem also shows up if I use another machine running squid as a proxy, which is strange because requests go to squid as URLs, not IP addresses which would have to be looked up. Using ethereal on port 3128 of the machine running firefox doesn't show anything but the URL requests going out. And it takes even longer to display the page this way. Yes, I'm running ethereal with name lookups turned off. .....Ron
Comment 21•20 years ago
|
||
One more thing: Doing "netstat -anp" while waiting for the page to load shows that the process doing the DNS lookups is indeed that of firefox.
Comment 22•20 years ago
|
||
I think I've found the problem. File nsprpub/pr/src/misc/prnetdb.c calls getaddrinfo, using (among other things): > hints.ai_flags = AI_CANONNAME; (around line 2085). According to the getaddrinfo man page, > ai_canonname is set to point to the official name of the host, > if ai_flags in hints includes the AI_CANONNAME flag. Under some versions of Linux, at least, this seems to cause the resolver library to do a reverse lookup immediately after doing the forward lookup. This would happen every time mozilla code calls PR_GetAddrInfoByName(), and indeed the same problem occurs with mozilla, galeon and epiphany (the only other browsers I've tried). It's easy to test this: create a simple program that calls getaddrinfo() on www.nytimes.com and check the response time with and without the AI_CANONNAME flag. There's still a huge time delay with the flag set, and there's no mozilla code involved here. I've done a firefox build with the code setting that flag commented out, and can confirm that loading nytimes.com is lots faster. Furthermore, monitoring port 53 outgoing on the box with ethereal shows no reverse lookups while that page is loading. The remaining problem, of course, is what to do if anything uses the ai_canonname member, which will now be empty if the flag is not set. A grep through the firefox code seems to show that the only routine using it is PR_GetCanonNameFromAddrInfo(), and that in turn doesn't seem to be called by anything. It'd probably be best to rewrite that routine so it calls getaddrinfo() itself with the AI_CANONNAME set. Other problems? For a start, - This box dual-boots Linux and Windows 2000. I don't see anything like this problem on win2k. - It's not clear why the problem arises on some boxes and not others. Sigh. .....Ron
Comment 23•20 years ago
|
||
(In reply to comment #22) > - It's not clear why the problem arises on some boxes and not others. If #246361 is the same as this one, it occurs on Debian with the 2.6.3 kernel but not on Debian with the 2.4.18 kernel. What about the kernel versions on the boxes where you have seen this bug?
Comment 24•20 years ago
|
||
Steven: Can you try to execute http://mcsmurf.milten.lima-city.de/linux-test/test and http://mcsmurf.milten.lima-city.de/linux-test/test2 ? The source can be found under http://mcsmurf.milten.lima-city.de/linux-test/test.c and http://mcsmurf.milten.lima-city.de/linux-test/test2.c Please see how long the first program needs to execute and how long the second one needs. This programs trys to lookup www.nytimes.com, one time with the flag set (test) and one time without it (test2).
Assignee | ||
Comment 25•20 years ago
|
||
It should be fairly straightforward to avoid resolving the CNAME in most cases. There are times when we need it, but those can be special cased. I'll wait for feedback from mcsmurf's testcases.
Reporter | ||
Comment 26•20 years ago
|
||
I ran the tests. 'test' does the reverse lookup, 'test2' does not. I'd like to suggest to those who try this out to please run tcpdump on port 53 'tcpdump -n port 53' and watch for 'PTR?' (for example: PTR? 245.137.239.199.in-addr.arpa.). Remember that the negative cache may still hold failed PTR? request which will mask the result when timeing. Also note that some ISP's (I note particularily WOW up here in SE Michigan) seems to mask this by failing PTR? requests quickly. If we only time it then we will have unreliable results, we really need to see if the machine trys the PTR? (reverse lookup) query. Here's the run times: > time ./test real 0m1.652s user 0m0.003s sys 0m0.001s > time ./test2 real 0m0.003s user 0m0.001s sys 0m0.000s > time ./test real 0m0.333s user 0m0.003s sys 0m0.002s Notice how the second run of ./test is pretty quick, the negative query got cached. I'll post this out the our local LUG and see if everyone agrees that this is probably the culprit. So I think you may have put your finger on the problem, I just caution us on our testing procedure so we don't blow right by the answer.
Assignee | ||
Updated•20 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: helpwanted,
perf
Target Milestone: --- → mozilla1.8alpha2
Comment 27•20 years ago
|
||
(In reply to comment #24) > Steven: Can you try to execute > http://mcsmurf.milten.lima-city.de/linux-test/test and > http://mcsmurf.milten.lima-city.de/linux-test/test2 ? The source can be found > under http://mcsmurf.milten.lima-city.de/linux-test/test.c and > http://mcsmurf.milten.lima-city.de/linux-test/test2.c > Please see how long the first program needs to execute and how long the second > one needs. This programs trys to lookup www.nytimes.com, one time with the flag > set (test) and one time without it (test2). I'm running Arch Linux, kernel 2.6.6 and Mozilla 1.6 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040117 I ran into this while using a caching DNS on my firewall. Resolving NYT was taking 20 - 30 sec. I bypassed my DNS by modifying resolv.conf and went directly to Comcast's DNS. This reduced the times to a second, but inspection with tcpdump shows server failure on PTR lookups, just very quickly. ==== test1 ==== # tcpdump -ni epi1 port 53 tcpdump: listening on ep1 17:16:18.516403 24.61.140.46.9977 > 204.127.202.19.53: 63965+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.534299 204.127.202.19.53 > 24.61.140.46.9977: 63965 0/1/0 (79) (DF) 17:16:18.534839 24.61.140.46.9977 > 204.127.202.19.53: 63966+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.551648 204.127.202.19.53 > 24.61.140.46.9977: 63966 0/1/0 (79) (DF) 17:16:18.552138 24.61.140.46.9977 > 204.127.202.19.53: 63967+ A? www.nytimes.com. (33) (DF) 17:16:18.567887 204.127.202.19.53 > 24.61.140.46.9977: 63967 4/2/2 A 199.239.137.200,[|domain] (DF) 17:16:18.568710 24.61.140.46.9977 > 204.127.202.19.53: 51829+ PTR? 200.137.239.199.in-addr.arpa. (46) (DF) 17:16:18.711439 204.127.202.19.53 > 24.61.140.46.9977: 51829 NXDomain 0/1/0 (103) (DF) 17:16:18.712074 24.61.140.46.9977 > 204.127.202.19.53: 51830+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.732767 204.127.202.19.53 > 24.61.140.46.9977: 51830 0/1/0 (79) (DF) 17:16:18.733260 24.61.140.46.9977 > 204.127.202.19.53: 51831+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.750722 204.127.202.19.53 > 24.61.140.46.9977: 51831 0/1/0 (79) (DF) 17:16:18.751324 24.61.140.46.9977 > 204.127.202.19.53: 51832+ PTR? 245.137.239.199.in-addr.arpa. (46) (DF) 17:16:18.766845 204.127.202.19.53 > 24.61.140.46.9977: 51832 NXDomain 0/1/0 (103) (DF) 17:16:18.767459 24.61.140.46.9977 > 204.127.202.19.53: 51833+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.785336 204.127.202.19.53 > 24.61.140.46.9977: 51833 0/1/0 (79) (DF) 17:16:18.785841 24.61.140.46.9977 > 204.127.202.19.53: 51834+ AAAA? www.nytimes.com. (33) (DF) 17:16:18.810603 204.127.202.19.53 > 24.61.140.46.9977: 51834 0/1/0 (79) (DF) 17:16:18.811146 24.61.140.46.9977 > 204.127.202.19.53: 51835+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 17:16:18.828220 204.127.202.19.53 > 24.61.140.46.9977: 51835 ServFail 0/0/0 (46) (DF) 17:16:18.828943 24.61.140.46.48892 > 216.148.227.79.53: 51835+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 17:16:18.909172 216.148.227.79.53 > 24.61.140.46.48892: 51835 ServFail 0/0/0 (46) (DF) 17:16:18.909629 24.61.140.46.9977 > 204.127.202.19.53: 51835+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 17:16:18.926686 204.127.202.19.53 > 24.61.140.46.9977: 51835 ServFail 0/0/0 (46) (DF) 17:16:18.927119 24.61.140.46.48892 > 216.148.227.79.53: 51835+ PTR? 200.136.239.199.in-addr.arpa. (46) (DF) 17:16:19.018154 216.148.227.79.53 > 24.61.140.46.48892: 51835 ServFail 0/0/0 (46) (DF) 17:16:19.018790 24.61.140.46.9977 > 204.127.202.19.53: 51836+ AAAA? www.nytimes.com. (33) (DF) 17:16:19.041374 204.127.202.19.53 > 24.61.140.46.9977: 51836 0/1/0 (79) (DF) 17:16:19.041920 24.61.140.46.9977 > 204.127.202.19.53: 51837+ AAAA? www.nytimes.com. (33) (DF) 17:16:19.063247 204.127.202.19.53 > 24.61.140.46.9977: 51837 0/1/0 (79) (DF) 17:16:19.063848 24.61.140.46.9977 > 204.127.202.19.53: 51838+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 17:16:19.081715 204.127.202.19.53 > 24.61.140.46.9977: 51838 ServFail 0/0/0 (46) (DF) 17:16:19.082147 24.61.140.46.48892 > 216.148.227.79.53: 51838+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 17:16:19.162845 216.148.227.79.53 > 24.61.140.46.48892: 51838 ServFail 0/0/0 (46) (DF) 17:16:19.163351 24.61.140.46.9977 > 204.127.202.19.53: 51838+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 17:16:19.180410 204.127.202.19.53 > 24.61.140.46.9977: 51838 ServFail 0/0/0 (46) (DF) 17:16:19.180830 24.61.140.46.48892 > 216.148.227.79.53: 51838+ PTR? 245.136.239.199.in-addr.arpa. (46) (DF) 17:16:19.261321 216.148.227.79.53 > 24.61.140.46.48892: 51838 ServFail 0/0/0 (46) (DF) 17:16:19.261892 24.61.140.46.9977 > 204.127.202.19.53: 51839+ AAAA? www.nytimes.com. (33) (DF) 17:16:24.261723 24.61.140.46.55773 > 216.148.227.79.53: 51839+ AAAA? www.nytimes.com. (33) (DF) 17:16:24.413212 216.148.227.79.53 > 24.61.140.46.55773: 51839* 0/1/0 (79) (DF) 17:16:24.413794 24.61.140.46.61673 > 204.127.202.19.53: 51840+ AAAA? www.nytimes.com. (33) (DF) 17:16:24.437986 204.127.202.19.53 > 24.61.140.46.61673: 51840* 0/1/0 (79) (DF) ==== test2==== 17:16:42.622400 24.61.140.46.61673 > 204.127.202.19.53: 5550+ AAAA? www.nytimes.com. (33) (DF) 17:16:42.637652 204.127.202.19.53 > 24.61.140.46.61673: 5550 0/1/0 (79) (DF) 17:16:42.638162 24.61.140.46.61673 > 204.127.202.19.53: 5551+ AAAA? www.nytimes.com. (33) (DF) 17:16:42.653836 204.127.202.19.53 > 24.61.140.46.61673: 5551 0/1/0 (79) (DF) 17:16:42.654343 24.61.140.46.61673 > 204.127.202.19.53: 5552+ A? www.nytimes.com. (33) (DF) 17:16:42.670007 204.127.202.19.53 > 24.61.140.46.61673: 5552 4/2/2 A 199.239.136.200,[|domain] (DF) trader ~$ time ./test1 real 0m0.885s user 0m0.002s sys 0m0.002s trader ~$ time ./test2 real 0m0.057s user 0m0.001s sys 0m0.002s
Comment 28•20 years ago
|
||
It appears the pertinent code is in /sysdeps/posix/getaddrinfo.c This is by far the largest source file in .../posix and has been modified heavily over the last couple of years. It's hard to follow how it works and I'd have to use gdb to figure it out :-( From glibc 2.3.3 BUGS: (my Arch linux is running 2.3.2): List of known bugs (certainly very incomplete) ---------------------------------------------- Time-stamp: <02/09/30 13:49:48 drepper> [...] [ *] Some of the functions which also handled IPv6 are currently broken. IPv6 and IPv4 lookups occasionally happen when not needed. This happens in getaddrinfo() and getnameinfo(). IPv4 handling of these functions is OK though and there are patches available to fix the IPv6 code as well.
Comment 29•20 years ago
|
||
(In reply to comment #24) > Steven: Can you try to execute > http://mcsmurf.milten.lima-city.de/linux-test/test and > http://mcsmurf.milten.lima-city.de/linux-test/test2 ? The source can be found > under http://mcsmurf.milten.lima-city.de/linux-test/test.c and > http://mcsmurf.milten.lima-city.de/linux-test/test2.c > Please see how long the first program needs to execute and how long the second > one needs. This programs trys to lookup www.nytimes.com, one time with the flag > set (test) and one time without it (test2). hi, i ran each test from both inside and outside out firewall, with a peek at tcpdump. here's what i found: all times are user time inside firewall: test consistently takes 46s test2 takes 2s, most of which looks like quad-a lookup outside the firewall: test takes 0.52s +- 0.03s test2 takes 0.12s +- 0.01s
Assignee | ||
Comment 30•20 years ago
|
||
I can reproduce the performance problems on FC2. It seems that I get 3 DNS request packets for each hostname that I load when AI_CANONNAME is passed to getaddrinfo. It seems that GLIBC issues a PTR request for each of the returned IP addresses. I'm not sure why it is doing that since the CNAME is clearly indicated in the response from the A request.
Assignee | ||
Comment 31•20 years ago
|
||
Interestingly enough, if I call gethostbyname instead of getaddrinfo, I only get the first A request. It returns the CNAME, but does not do the reverse lookups. I don't understand why GLIBC insists on doing reverse lookups when the AI_CANONNAME flag is passed to getaddrinfo.
Assignee | ||
Comment 32•20 years ago
|
||
I tested WinXP, and it's getaddrinfo implementation does not generate PTR requests when passed AI_CANONNAME.
Assignee | ||
Comment 33•20 years ago
|
||
WTC suggested that we could extend NSPR's PR_GetAddrInfoByName to support a PR_AI_NOCANONNAME flag. We cannot use PR_AI_CANONNAME because we have to keep binary compatibility with previous NSPR releases that assumed AI_CANONNAME. However, this seems like a GLIBC bug to me. Or, maybe there is something that I'm missing. I'm going to try to find out more on this from the GLIBC side.
Assignee | ||
Comment 34•20 years ago
|
||
I found some comments on the web suggesting that the specification for getaddrinfo does not define precisely the meaning of AI_CANONNAME in the context of DNS. In some interpretations the CNAME included with a response to an A query may only be an alias for the real canonical name. That may explain why GLIBC is doing reverse lookups. Sounds to me like implementing WTC's suggestion is the right way to go.
Assignee | ||
Comment 35•20 years ago
|
||
*** Bug 245174 has been marked as a duplicate of this bug. ***
Comment 36•20 years ago
|
||
This does look like dubious behaviour on the part of glibc. When you already have a reasonable answer (since you got the CNAME) why do a reverse lookup, which will probably result in lower quality information?
Assignee | ||
Comment 37•20 years ago
|
||
Apparently, the behavior of getaddrinfo(AI_CANONNAME) is quite intentional on the part of the GLIBC development team. At least, I was able to get some information on this from folks at IBM. The word on the street is: "don't pass AI_CANONNAME if you don't need the reverse lookups." So, that's what we're going to need to do. It's a shame that we need to reverse the sense of AI_CANONNAME for NSPR, but I guess that's our only choice. WTC also suggested making PR_GetCanonNameFromAddrInfo do another lookup passing AI_CANONNAME. That might actually be a good solution for NSPR since we could then limit the reverse lookup to only one of the IP addresses. I'm not sure which I prefer at the moment, but I think that Mozilla has almost no use for AI_CANONNAME. It is only our webservices security layer that depends on reverse DNS lookups, and at the moment at least it has no need for reverse lookups on all IP addresses returned for a hostname. So, there may in fact be good reason to implement the reverse lookup inside PR_GetCanonNameFromAddrInfo. Thoughts?
Assignee | ||
Comment 38•20 years ago
|
||
Patch for NSPR that makes us lazily resolve the canonical hostname when asked via PR_GetCanonNameFromAddrInfo. This seems to do the trick. Under FC2, resolving www.yahoo.com now results in a single AAAA request followed by a single A request. If PR_GetCanonNameFromAddrInfo is called (normally doesn't happen in Mozilla), then a single PTR request is made for the first IP address in the addrinfo structure. I have only tested this patch under Fedora Core 2. I use getnameinfo to implement the PTR request. The amount of code is minimal, and I stash the result in the ai_canonname field of the first addrinfo struct. I take care to free this myself before passing the addrinfo struct to freeaddrinfo. This solution is better than calling getaddrinfo with AI_CANONNAME since it avoids doing reverse lookups on all returned IP addresses. NSPR's API does not provide the caller with a way to enumerate the reverse lookups, and I don't think we have a need to supply that API.
Assignee | ||
Comment 39•20 years ago
|
||
Comment on attachment 151050 [details] [diff] [review] v1 patch NOTE: with this patch, there is no need for any necko changes. this is because necko doesn't call PR_GetCanonNameFromAddrInfo until its consumer calls GetCanonicalName. that could be a performance issue because it means that we could be blocking the UI thread for reverse DNS lookups, but for now i think that is acceptable. we only use reverse lookups for the webservices security model, which is as far as i can tell pretty much unused at the moment.
Attachment #151050 -
Flags: review?(wchang0222)
Comment 40•20 years ago
|
||
I thought that each host has only one canonical name. You seem to imply that each IP address of a host has a canonical name.
Assignee | ||
Comment 41•20 years ago
|
||
(In reply to comment #40) > I thought that each host has only one canonical name. > You seem to imply that each IP address of a host has > a canonical name. Yeah, I was surprised too. Try doing a reverse lookup on each of the IP addresses returned for www.yahoo.com. You'll find that the result is different for each IP node! Even more interesting is the fact that none of the results for the reverse lookups match the value of h_name returned by gethostbyname! So, my patch is perhaps suboptimal because we will not return the same value for PR_GetCanonNameFromAddrInfo in the getaddrinfo and gethostbyname cases. There doesn't seem to be a way to make getaddrinfo return the same cname that gethostbyname returns.
Comment 42•20 years ago
|
||
(In reply to comment #37) > Apparently, the behavior of getaddrinfo(AI_CANONNAME) is quite intentional on > the part of the GLIBC development team. At least, I was able to get some > information on this from folks at IBM. The word on the street is: "don't pass > AI_CANONNAME if you don't need the reverse lookups." So, that's what we're > going to need to do. That doesn't make much sense. RFC 3493 says: If nodename is not null, and if requested by the AI_CANONNAME flag, the ai_canonname field of the first returned addrinfo structure shall point to a null-terminated string containing the canonical name corresponding to the input nodename; so ai_canonname is only defined for the first addrinfo structure. This seems to suggest that ai_canonname is associated with the host itself and not with one of its IP addresses. Therefore, getting ai_canonname through a reverse lookup seems like the wrong thing to do, since as you observe in comment #41, the reverse DNS entries to multiple IP addresses from the same forward lookup are often (actually, most of the time) different. Thus if you get ai_canonname through a reverse lookup you are going to end up with a random string (since the IP addresses for the forward lookup are usually returned in random order by the DNS server) which usually tells you much less than the simple CNAME would have done in the first place. Do you know where this is discussed, or can you point me to the checkin that changed this behaviour? It might be an idea to take this up with the glibc developers...
Assignee | ||
Comment 43•20 years ago
|
||
Unfortunately, I don't know of any public forum where this was discussed. Your interpretation of the RFC makes sense to me, but unfortunately there were several RFCs covering getaddrinfo, so perhaps GLIBC is implementing an earlier version???
Comment 44•20 years ago
|
||
Grazie, Lorenzo. That's exactly what I remembered RFC 3493 says. When Darin told me that GLIBC does reverse lookups on all returned IP addresses, I thought I misinterpreted RFC 3493. We should ask the GLIBC implementors of getaddrinfo why they did not implement the AI_CANONNAME flag in the way RFC 3493 specifies.
Comment 45•20 years ago
|
||
I have changed the code in glibc to only fill in ai_canonname for the first returned entry. This was something nobody noticed so far even though POSIX explicitly spells this out. This does not eliminate the revserse lookups, but now only one is made.
Assignee | ||
Comment 46•20 years ago
|
||
> I have changed the code in glibc to only fill in ai_canonname for the first
> returned entry.... This does not eliminate the revserse lookups, but
> now only one is made.
ulrich: thanks for making a fix in glibc, but ... i don't understand why even a
single reverse lookup is required. the response to the A (or AAAA) requests
will include the desired CNAME value. a reverse lookup on the first IP address
will not return the correct value (or so it seems from the www.yahoo.com testcase).
Comment 47•20 years ago
|
||
Ulrich, might it not be a good idea to be a bit lax in interpreting POSIX, given the current state of the reverse DNS tree in today's Internet? Reverse lookups quite frequently time out and often the data returned is less useful than that which can be obtained by simply following the CNAME chain (see the end of my comment #42). Wouldn't it be better just to do this? After all, there's getnameinfo() if you want to do reverse lookups.
Comment 48•20 years ago
|
||
*** Bug 248585 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 49•20 years ago
|
||
*** Bug 248828 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•20 years ago
|
Target Milestone: mozilla1.8alpha2 → mozilla1.8beta
Comment 50•20 years ago
|
||
This LD_PRELOAD library removes the AI_CANONNAME flag from glibc getaddrinfo (without modifying glibc or Firefox/Mozilla binary). This can be used to test with the binary distibuted Firefox/Mozilla.
Assignee | ||
Comment 51•20 years ago
|
||
Someone reported on #firefox that his Comcast DNS server won't do reverse lookups in many cases. It sounds like a problem with Comcast, but as a result he was simply unable to browse many websites using Firefox under Fedora Core 2. That kind of report makes this bug all the more serious I think.
Comment 52•20 years ago
|
||
The only time when reverse lookups are needed is for SSL certificate verification. It should be possible to move the entire reverse lookup procedure from the main browser/mail components to nspr and get rid of this problem with very little ado.
Comment 53•20 years ago
|
||
That is not true. DNS reverse lookups are used everytime a number is displayed as a name. netstat output. The "connected to" prompt when you telnet or ftp. As the internet grew rapidly, the DNS tree became very flat and automatically administered. PTR records became neglected. We have had several bugs in mozilla that would have been solvable by requiring reverse lookups, but we have found that too many domains neglect their care.
Comment 54•20 years ago
|
||
But doesn't that boil down to the same? If you don't need reverse lookups, or if you do need them but can't use them because PTR records are neglected, the end result in both cases is that you shouldn't be doing them. I don't think it's only neglected PTR records though. When three zillion domains and several different services are hosted on the same machine and respond on the same IP, that one lonely PTR record for the IP becomes rather meaningless, or even misleading.
Reporter | ||
Comment 55•20 years ago
|
||
"Proper DNS structure" should be that there is one and only one forward, one and only one reverse and everything else is CNAMES. However, it is certainly a fact that "proper DNS structure" doesn't actually exist on the internet. I would think we should avoid using reverse pointers except where absolutely required so that we meet the reality rather than to aspire to get everyone else on the planet to come up to our level :)
Assignee | ||
Comment 56•20 years ago
|
||
Keep in mind that we aren't asking the OS to do a PTR request. We're asking for the CNAME, which is returned from a forward lookup. The OS should not need to do a PTR request to acquire the CNAME for a given host. Unfortunately, it seems like we are in a position where we either have to live with the implementation on Linux or we have to hack around it. Our APIs between modules within Mozilla require us to provide the CNAME if asked. We can't change the NSPR API because it is frozen, so we have to look for a way to support CNAME queries. What to do? Hmm...
Comment 57•20 years ago
|
||
Updated to compile on newer gcc.
Attachment #154282 -
Attachment is obsolete: true
Comment 58•20 years ago
|
||
Has anyone tried to contact the glibc developers on this one? I think strong arguments can be made for reverting to the previous behaviour. Fist off, it's a real showstopper for IPv6 adoption. We can hack around it in Mozilla, and we can hack around it in other apps, but how many apps do we have to modify? Also, the change also seems to remove the only way to retrieve a CNAME from the DNS using the system resolver (as opposed to implementing your own DNS queries, which is a major pain). And are there any advantages? If this change was only made to comply with POSIX, then why not disregard that part of the standard and implement what makes more sense? It's not as if Linux hasn't done that before... Could anyone take this up with the glibc developers? I can try posting to glibc-alpha, but I don't have high hopes that will result in anything...
Comment 59•20 years ago
|
||
What previous behavior? You seem not to understand at all what this is about. The getaddrinfo implementation is correct. It is not glibc's fault if the user uses flags which cause unnecessary additional work. Simply introduce a way to you the nspr to query without the reverse lookup by not adding AI_CANONNAME. There is no need to bring this up on the glibc lists, there won't be any change since people using this flag actually want the behavior. Fix nspr and mozilla.
Comment 60•20 years ago
|
||
(In reply to comment #59) > What previous behavior? You seem not to understand at all what this is about. Ok, so this is what I understand: - In previous glibc code AI_CANONNAME caused getaddrinfo to return the name at the end of the CNAME chain as the canonical name - In current glibc code AI_CANONNAME causes getaddrinfo to obtain the canonical name through PTR queries for the IP address returned Is this correct? If so, the current code has the following disadvantages: 1. It will time out and fail on broken DNS servers. nytimes.com is just one example. Look at http://www.ripe.net/ripe/meetings/ripe-48/presentations/ripe48-dns-malone.pdf and note the substantial percentage of broken servers that will answer A queries ONLY (and thus no PTR queries). There is a substantial number of sites out there which have this problem, which is why we implemented per-domain disabling of IPv6 lookups in bug 68796. 2. It will probably provide lower quality information. CNAME chains are typically more accurate than PTR records for the simple reason that the vast majority of applications uses forward lookups only, and if the CNAME chain breaks, the apps break. PTR records are often synthetic, generic or even nonexistent, and nothing really depends on them. 3. It is not clear (to me at least) which IP address the reverse lookup is performed on if there are multiple addresses for one name (of course there is only one, since the RFC specifies that ai_canonname is only defined for the first element of the result list). Is the PTR lookup done on the first IP address returned? That would mean it is done on one of the host's IPs at random, given most nameserver implementations. How can this be meaningful? 4. It removes the only way the app had of finding the name at the end of the CNAME chain (short of implementing its own DNS lookups). Reverse lookups can always be done with getnameinfo, but getting the name at the end of the CNAME chain cannot be done using the resolver API any more. > The getaddrinfo implementation is correct. It is not glibc's fault if the > user uses flags which cause unnecessary additional work. It's correct according to POSIX (the RFC is not clear on this), but might it not be better to fix these issues rather than be correct according to POSIX? > Simply introduce a way to you the nspr to query without the reverse lookup by > not adding AI_CANONNAME. It's not that we don't want to do this in mozilla... it just seems to me that the change in glibc's behaviour, although made in the interests of POSIX compliance, was a change for the worse and thus might be reconsidered. Which of course would benefit not just mozilla but all apps that use AI_CANONNAME, of which there are probably a lot of (after all, in all other implementations of getaddrinfo, including old glibc implementations, AI_CANONNAME was "free"). > There is no need to bring this up on the glibc lists, there won't be any > change since people using this flag actually want the behavior. Are you sure? Has this issue been brought up before? How can PTR lookups be better than following the CNAME chain?
Comment 61•20 years ago
|
||
Don't you realize how incredibly stupid all this is? "We use this function with a flag. It does what it is supposed to do but it takes a while and we actually don't want the extra information. We could remove the flag but we don't want this. So, change the C library to match what we want. This will effect all kinds of other programs but this certainly is not as important as not changing the mozilla sources." I can explain this only by somebody having been out in the sun for too long. Fix the code. Don't use the flag if you don't want to pay the price to get the correct behavior.
Assignee | ||
Comment 62•20 years ago
|
||
> Don't you realize how incredibly stupid all this is? Please don't be silly. We do need the CNAME in some cases, and the expectation we had was that the CNAME was given to us for free (as Lorenzo explained). So, we chose to ask for it unconditionally. With gethostbyname and previous versions of getaddrinfo, it truly was free to ask for the CNAME. Ok, I accept that glibc has changed its implementation. It's unfortunate that binary compatibility is so poorly regarded, but I can understand your argument about choosing to implement a standard. It's unfortunate that other platforms do not agree with glibc's implementation, but OK... I get it... they are simply behind the times. Mozilla will of course change to not request the CNAME. The distinction between getaddrinfo for forward lookups and getnameinfo for reverse lookups seemed like a clean distinction to me. It seems odd to hide the CNAME value returned from an A or AAAA DNS query, but again... I yield to the standards argument. Oh well :-/ > I can explain this only by somebody having been out in the sun for too long. Thanks for the warm fuzzies... much appreciated.
Assignee | ||
Comment 63•20 years ago
|
||
This patch introduces PR_AI_NOCANONNAME for PR_GetAddrInfoByName. When this flag is not given, the code attempts to resolve the CNAME for the given hostname. It does not rely on using AI_CANONNAME with the OS's getaddrinfo because it's clear that platforms do not agree on the meaning of that flag. For Mozilla's purposes, we can of adjust the meaning of the CNAME. This patch assumes rightly or wrongly that a reverse lookup on the first returned IP address will be OK for Mozilla on all platforms in all cases. I'm looking for consistency first and foremost. Our CNAME support only applies to the webservices security model and service principal name creation on Windows for SPNEGO authentication. I think reverse lookup on the first returned IP address will be OK for those applications.
Attachment #151050 -
Attachment is obsolete: true
Assignee | ||
Comment 64•20 years ago
|
||
Here's the corresponding changes for the rest of the mozilla code base.
Assignee | ||
Comment 65•20 years ago
|
||
I have only tested these patches on Linux (FC2). They may not compile or work correctly on other platforms. I plan to do more testing next week, but help testing is of course welcome ;-)
Assignee | ||
Updated•20 years ago
|
Attachment #151050 -
Flags: review?(wchang0222)
Assignee | ||
Comment 66•20 years ago
|
||
Lorenzo: WTC says he probably won't be able to look at these patches until the 17th or so since he is on vacation. If you have time to review them that would be great!
Comment 67•20 years ago
|
||
(In reply to comment #66) > Lorenzo: WTC says he probably won't be able to look at these patches until the > 17th or so since he is on vacation. If you have time to review them that would > be great! Sure, but I won't be able to look at them before the 9th or 10th or so when I get back home...
Assignee | ||
Updated•20 years ago
|
Attachment #155383 -
Flags: review?(lorenzo)
Assignee | ||
Updated•20 years ago
|
Attachment #155384 -
Flags: review?(lorenzo)
Assignee | ||
Updated•20 years ago
|
Keywords: helpwanted
Whiteboard: [patch]
Comment 68•20 years ago
|
||
After consultation with other people in the IPv6 community I decided to take this up on libc-alpha, because I think glibc's behaviour is incorrect. It has the disadvantages I list in comment #60, it is arguably incorrect according to POSIX, and it is done by no other implementation of getaddrinfo that I know of. See http://sources.redhat.com/ml/libc-alpha/2004-08/msg00080.html
Comment 69•20 years ago
|
||
Comment on attachment 155383 [details] [diff] [review] v2 patch - nspr part >--- pr/src/misc/prnetdb.c 28 Apr 2004 00:34:07 -0000 3.21.2.22 >+++ pr/src/misc/prnetdb.c 6 Aug 2004 21:00:43 -0000 [...] >+/* >+ * This flag prevents getnameinfo from returning numeric addresses. >+ */ >+#ifndef NI_NAMEREQD >+#define NI_NAMEREQD 8 >+#endif Hmm... you call the system implementation of getnameinfo with the NI_NAMEREQD flag regardless of whether the system headers define it. But if NI_NAMEREQD is not defined in the system headers, then probably the system implementation of getnameinfo does not support it, so it's meaningless to use it. So wouldn't it be better to lose the #ifdef here and just get a compile error if NI_NAMEREQD is not defined? >+static char *_pr_strdup(const char *s) >+{ >+ char *rv; >+ int len = strlen(s); [...] Is this really needed here? :( > static PRAddrInfo * > pr_GetAddrInfoByNameFB(const char *hostname, > PRUint16 af, > PRIntn flags) > { > PRStatus rv; > PRAddrInfoFB *ai; >- /* fallback on PR_GetHostByName */ >+ >+ /* fallback to PR_GetHostByName */ > ai = PR_NEW(PRAddrInfoFB); > if (!ai) { > PR_SetError(PR_OUT_OF_MEMORY_ERROR, 0); > return NULL; > } >+ ai->have_hostent = PR_FALSE; >+ >+ /* >+ * test if input is an IP address literal. if so, then avoid calling >+ * PR_GetHostByName since on some platforms (Win9x), it cannot handle >+ * IP address literals. >+ */ >+ rv = PR_StringToNetAddr(hostname, &ai->addr); >+ if (rv == PR_SUCCESS) { >+ if (!(flags & PR_AI_NOCANONNAME)) { >+ /* >+ * use PR_GetHostByAddr to fetch CNAME >+ */ >+ rv = PR_GetHostByAddr(&ai->addr, ai->buf, sizeof ai->buf, &ai->hostent); >+ if (rv == PR_SUCCESS) { >+ ai->have_hostent = PR_TRUE; >+ } else { >+ /* >+ * reverse lookup failures are not fatal. use the supplied >+ * IP address literal instead. >+ */ >+ ai->hostent.h_name = _pr_strdup(hostname); >+ if (!ai->hostent.h_name) { >+ PR_Free(ai); >+ return NULL; >+ } >+ } >+ ai->have_cname = PR_TRUE; >+ } >+ return (PRAddrInfo *) ai; >+ } >+ > rv = PR_GetHostByName(hostname, ai->buf, sizeof ai->buf, &ai->hostent); > if (rv == PR_FAILURE) { > PR_Free(ai); > return NULL; > } >+ ai->have_hostent = PR_TRUE; >+ >+ if (!(flags & PR_AI_NOCANONNAME)) { >+ char he_buf[PR_NETDB_BUF_SIZE]; >+ PRHostEnt he; >+ PRNetAddr addr; >+ >+ memset(&addr, 0, sizeof(addr)); >+ addr.inet.family = PR_AF_INET; >+ addr.inet.ip = *(PRUint32 *) ai->hostent.h_addr_list[0]; >+ >+ /* >+ * use PR_GetHostByAddr to fetch CNAME based on first returned IP >+ * address. this is done so that we are consistent with our impl >+ * that uses getaddrinfo and getnameinfo. >+ * >+ * if the reverse lookup fails, then we'll use the given hostname as >+ * the CNAME disregarding the value of h_name returned from >+ * PR_GetHostByName. this is again done to be consistent with our >+ * impl that uses getaddrinfo and getnameinfo. >+ */ I don't understand why we need to be consistent here. This is a hack, right? If you just picked the hostname returned by gethostbyname, the win9x hack above would go away too. > PR_IMPLEMENT(PRAddrInfo *) PR_GetAddrInfoByName(const char *hostname, > PRUint16 af, > PRIntn flags) > { [...] >+ /* >+ * we do not pass AI_CANONNAME at this point because it is known to >+ * result in reverse lookups for each returned IP address on some >+ * platforms. instead, we'll perform an explicit reverse lookup on the >+ * first returned IP address to resolve the CNAME if requested. >+ * >+ * unfortunately, we are constrained by the fact that AI_CANONNAME is >+ * not implemented uniformly across platforms. see bug 239358 for >+ * further info regarding this implementation. > */ > > memset(&hints, 0, sizeof(hints)); >- hints.ai_flags = AI_CANONNAME; >+ hints.ai_flags = 0; [...] > rv = GETADDRINFO(hostname, NULL, &hints, &res); >- if (rv == 0) >+ if (rv == 0) { >+ if (!(flags & PR_AI_NOCANONNAME)) { >+ /* >+ * do a reverse lookup on the first returned IP address to >+ * determine the CNAME for this host. >+ * >+ * NOTE: this may give bogus results in some cases. >+ */ Hmm... if I understand correctly, this means that we adopt the questionable glibc behaviour of doing PTR requests on all platforms. Wouldn't it be better to wrap this in an #ifdef GLIBC or something so at least other platforms don't take the hit? > PR_IMPLEMENT(void) PR_FreeAddrInfo(PRAddrInfo *ai) > { [...] >+#if !defined(_PR_HAVE_GETADDRINFO) || defined(_PR_INET6_PROBE) >+ PRAddrInfoFB *fb = (PRAddrInfoFB *) ai; >+ /* >+ * h_name field is malloc'd separately if not pointing to an address >+ * within buf. >+ */ >+ if (fb->have_cname && >+ (fb->hostent.h_name < fb->buf || >+ fb->hostent.h_name >= (fb->buf + sizeof fb->buf))) { >+ PR_Free(fb->hostent.h_name); 1. Wow, it would never have occurred to me that there might be a leak there! 2. Bah, how is it that "sizeof fb->buf" compiles but "sizeof char" doesn't compile?!? :)
Assignee | ||
Comment 70•20 years ago
|
||
thanks for looking over the patch... from a security point of view (think our web services security model), it's important that we have a consistent way of resolving CNAMEs on all platforms. that said, i'm not crazy about this patch. it is a real shame to even consider this much extra code for something the OS should provide. but, given that there is no consistency in the implementation of getaddrinfo(AI_CANONNAME), what are we to do other than what i have proposed here? today's web services security model is somewhat broken (i.e., not consistent across all platforms) as a result of GLIBC's new behavior, and that's not acceptable for Mozilla.
Comment 71•20 years ago
|
||
See also this bug in glibc bugzilla: http://sources.redhat.com/bugzilla/show_bug.cgi?id=304 which reports the same issue and contains a patch (but not written by the glibc developers).
Comment 72•20 years ago
|
||
This seems to be fixed in glibc CVS: the source I pulled from CVS head today does not do PTR lookups, but if I revert to getaddrinfo.c v1.63 (5 august) it does. So maybe we should put these patches on hold...
Comment 73•20 years ago
|
||
> So maybe we should put these patches on hold...
No. It is wrong to look for the canonical name if it is not needed. There is
always a cost associated with the lookup. Just fix that damned function and
punish the original author.
Assignee | ||
Comment 74•20 years ago
|
||
Comment on attachment 155383 [details] [diff] [review] v2 patch - nspr part ok, let's not do it this way. let's assume that the OS returns the right thing when AI_CANONNAME is passed. this may break some things like the webservices security model, but at least that breakage will be confined to a relatively small set of operating systems.
Attachment #155383 -
Attachment is obsolete: true
Attachment #155383 -
Flags: review?(lorenzo)
Assignee | ||
Updated•20 years ago
|
Attachment #155384 -
Flags: review?(lorenzo)
Assignee | ||
Comment 75•20 years ago
|
||
Comment on attachment 155384 [details] [diff] [review] v2 patch - other part This part of the v2 patch is still good. We still want PR_AI_NOCANONNAME, but we should implement it differently (i.e., in terms of getaddrinfo(AI_CANONNAME)). I won't have a chance to work on that patch until next week.
Comment 76•20 years ago
|
||
Darin, did you mean something like this?
Comment 77•20 years ago
|
||
Darin, this is your v2 patch with a few small changes. I still have a nagging doubt though: what if someone starts a lookup with AI_NOCANONNAME and then someone else (web services?) starts a new lookup for the same host without AI_NOCANONNAME while the first one is still in progress? If I understand the code right, it will get back a result without a CNAME, which is not what it wants...
Comment 78•20 years ago
|
||
Comment on attachment 156347 [details] [diff] [review] v3 patch (rest of mozilla) As can be seen using the handy diff-between-patches feature: http://bugzilla.mozilla.org/attachment.cgi?oldid=155384&action=interdiff&newid= 156347&headers=1 I only slightly changed the numeric address handling and a couple of comments.
Comment 79•20 years ago
|
||
Comment on attachment 156346 [details] [diff] [review] v3 patch (nspr part) wtc, could you look at this?
Attachment #156346 -
Flags: review?(wchang0222)
Comment 80•20 years ago
|
||
Same patch as before, minus typos.
Attachment #156347 -
Attachment is obsolete: true
Comment 81•20 years ago
|
||
Comment on attachment 156346 [details] [diff] [review] v3 patch (nspr part) Sorry about the late reply. I just came back from a three week vacation. >+** PRIntn flags Must include PR_AI_ADDRCONFIG. PR_AI_NOCANONNAME >+** can be passed to suppress CNAME resolution. Please change "CNAME resolution" to "the determination of the canonical name corresponding to hostname". The reason is that "CNAME" is specific to DNS. My suggested wording comes from RFC 3493. It seems that the comment for 'flags' should say "Must be either PR_AI_ADDRCONFIG or PR_AI_ADDRCONFIG|PR_AI_NOCANONNAME. Include PR_AI_NOCANONNAME to suppress the determination of the canonical name corresponding to hostname." I have a related comment below. > typedef struct PRAddrInfo PRAddrInfo; > >+#define PR_AI_NOCANONNAME 0x8000 >+ > NSPR_API(PRAddrInfo*) PR_GetAddrInfoByName( > const char *hostname, PRUint16 af, PRIntn flags); I'm wondering if we should put the definition of PR_AI_NOCANONNAME next to the definitions of the other PR_AI_XXX macros. > /* restrict input to supported values */ >- if ((af != PR_AF_INET && af != PR_AF_UNSPEC) || flags != PR_AI_ADDRCONFIG) { >+ if ((af != PR_AF_INET && af != PR_AF_UNSPEC) || !(flags & PR_AI_ADDRCONFIG) ) { > PR_SetError(PR_INVALID_ARGUMENT_ERROR, 0); > return NULL; > } It seems that the right change is to allow only two possible values of flags: PR_AI_ADDRCONFIG PR_AI_ADDRCONFIG|PR_AI_NOCANONNAME The (flags & PR_AI_ADDRCONFIG) test allows more than those two values > #if defined(_PR_HAVE_GETADDRINFO) > #if defined(_PR_INET6_PROBE) > if (!_pr_ipv6_is_present) >- return ((const PRAddrInfoFB *) ai)->hostent.h_name; > #endif >- return ((const PRADDRINFO *) ai)->ai_canonname; >-#else >- return ((const PRAddrInfoFB *) ai)->hostent.h_name; >+ { >+ return ((const PRADDRINFO *) ai)->ai_canonname; >+ } >+#endif >+ >+#if !defined(_PR_HAVE_GETADDRINFO) || defined(_PR_INET6_PROBE) >+ const PRAddrInfoFB *fb = (const PRAddrInfoFB *) ai; >+ return fb->have_cname ? fb->hostent.h_name : NULL; > #endif > } I prefer to see you keep the original #if structure. I want to avoid nesting #if with if, which is very hard to read. Also, the above code is not legal C code because the declaration of local variable 'fb' cannot be in the middle of the function.
Comment 82•20 years ago
|
||
Attachment #156346 -
Attachment is obsolete: true
Comment 83•20 years ago
|
||
Same patch as above, but fixes comment of flags parameter to PR_GetAddrInfoByName.
Attachment #156502 -
Attachment is obsolete: true
Comment 84•20 years ago
|
||
Sorry for all the confusion. Please ignore the other patches I just posted. :)
Updated•20 years ago
|
Attachment #156503 -
Attachment is obsolete: true
Updated•20 years ago
|
Attachment #156346 -
Flags: review?(wchang0222)
Comment 85•20 years ago
|
||
Comment on attachment 156505 [details] [diff] [review] yet another nspr patch wtc, is this better?
Attachment #156505 -
Flags: review?(wchang0222)
Comment 86•20 years ago
|
||
Comment on attachment 156505 [details] [diff] [review] yet another nspr patch r=wtc. > PR_IMPLEMENT(const char *) PR_GetCanonNameFromAddrInfo(const PRAddrInfo *ai) > { >+ const PRAddrInfoFB *fb = (const PRAddrInfoFB *) ai; >+ > #if defined(_PR_HAVE_GETADDRINFO) > #if defined(_PR_INET6_PROBE) > if (!_pr_ipv6_is_present) >- return ((const PRAddrInfoFB *) ai)->hostent.h_name; >+ return fb->has_cname ? fb->hostent.h_name : NULL; > #endif > return ((const PRADDRINFO *) ai)->ai_canonname; > #else >- return ((const PRAddrInfoFB *) ai)->hostent.h_name; >+ return fb->has_cname ? fb->hostent.h_name : NULL; > #endif > } This will cause an unused local variable compiler warning in the case that doesn't use 'fb'.
Attachment #156505 -
Flags: superreview?(darin)
Attachment #156505 -
Flags: review?(wchang0222)
Attachment #156505 -
Flags: review+
Assignee | ||
Comment 87•20 years ago
|
||
Comment on attachment 156505 [details] [diff] [review] yet another nspr patch >Index: nsprpub/pr/include/prnetdb.h > #define PR_AI_ALL 0x08 > #define PR_AI_V4MAPPED 0x10 > #define PR_AI_ADDRCONFIG 0x20 >+#define PR_AI_NOCANONNAME 0x8000 > #define PR_AI_DEFAULT (PR_AI_V4MAPPED | PR_AI_ADDRCONFIG) might be nice to make all the values line up in a column. >+ ai->has_cname = (flags & PR_AI_NOCANONNAME) ? PR_FALSE : PR_TRUE; just use a logical not operator? or maybe the compiler would generate the same code when optimized. ai->has_cname = !(flags & PR_AI_NOCANONNAME); sr=darin
Attachment #156505 -
Flags: superreview?(darin) → superreview+
Assignee | ||
Comment 88•20 years ago
|
||
Here's the final patch for NSPR that I landed on the trunk and NSPRPUB_PRE_4_2_CLIENT_BRANCH.
Attachment #156505 -
Attachment is obsolete: true
Assignee | ||
Comment 89•20 years ago
|
||
(In reply to comment #77) > I still have a nagging doubt though: what if someone starts a lookup with > AI_NOCANONNAME and then someone else (web services?) starts a new lookup for > the same host without AI_NOCANONNAME while the first one is still in progress? > > If I understand the code right, it will get back a result without a CNAME, > which is not what it wants... you are correct. this is a serious bug in the current patch. i'm looking at the code now to see about fixing that.
Assignee | ||
Comment 90•20 years ago
|
||
lorenzo: i think the solution is to use a key for the dns cache that is composed of the hostname being queried, the presence of the RES_CANON_NAME flag, and probably also the address family being queried.
Assignee | ||
Comment 91•20 years ago
|
||
This patch implements my proposal to use a hash key that is composed of the hostname, flags, and af parameters passed to ResolveHost.
Attachment #155384 -
Attachment is obsolete: true
Attachment #156370 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #157588 -
Flags: review?(lorenzo)
Comment 92•20 years ago
|
||
Comment on attachment 157588 [details] [diff] [review] v4 patch (rest of mozilla) A couple of nits: >Index: extensions/negotiateauth/nsNegotiateAuthSSPI.cpp >=================================================================== >+ rv = dns->Resolve(Substring(buf, index + 1), >+ nsIDNSService::RESOLVE_CANONICAL_NAME, Does this need CNAME resolution? (I know nothing about negotiateauth, so this is almost certainly right, but still) >Index: netwerk/dns/public/nsIDNSService.idl >=================================================================== >@@ -60,47 +60,64 @@ interface nsIDNSService : nsISupports >[...] > nsIDNSRequest asyncResolve(in AUTF8String aHostName, >- in boolean aBypassCache, >+ in unsigned long aFlags, > in nsIDNSListener aListener, > in nsIEventQueue aListenerEventQ); > > /** >- * called to synchronously resolve a hostname. >+ * called to synchronously resolve a hostname. warning this method may >+ * block the calling thread for a length period of time. it is extremely s/length/long >Index: netwerk/dns/src/nsDNSService2.cpp >=================================================================== >@@ -1,10 +1,11 @@ >+/* vim:set ts=4 sw=4 sts=4 et cin: */ >Index: netwerk/dns/src/nsHostResolver.cpp >=================================================================== >@@ -1,10 +1,11 @@ >+/* vim:set ts=4 sw=4 sts=4 et cin: */ Should these be here? > [...] >+// this macro filters out any flags that are not used when constructing >+// the host key. the significant flags are those that would affect the >+// result of the host resolution. >+#define RES_KEY_FLAGS(_f) ((_f) & nsHostResolver::RES_CANON_NAME) Hmm... it's not really that the flags influence host resolution, since the bypass cache flag also affects it. Maybe you could say "the significant flags are those that would be passed to the system resolver"? Although that's not very good either. >Index: netwerk/dns/src/nsHostResolver.h >=================================================================== >+/* vim:set ts=4 sw=4 sts=4 et cin: */ Ditto Aside from that, looks good! r=lorenzo
Attachment #157588 -
Flags: review?(lorenzo) → review+
Assignee | ||
Comment 93•20 years ago
|
||
> Does this need CNAME resolution? > (I know nothing about negotiateauth, so this is almost certainly right, but > still) Yeah, under windows the service principal name is formed from the canonical hostname. At least, this is what IE does, so we try to do the same. > >+/* vim:set ts=4 sw=4 sts=4 et cin: */ Yeah, I like to add vim control lines so that folks don't have to change their global editor settings as they jump around to different files in the mozilla source. You'll see similar stuff in other files around the tree. > Hmm... it's not really that the flags influence host resolution, since the yeah, good point. i'll come up with something better.
Assignee | ||
Updated•20 years ago
|
Attachment #157627 -
Flags: superreview?(bryner)
Updated•20 years ago
|
Attachment #157627 -
Flags: superreview?(bryner) → superreview+
Assignee | ||
Comment 95•20 years ago
|
||
fixed-on-trunk for mozilla 1.8a4
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 96•20 years ago
|
||
Comment on attachment 157627 [details] [diff] [review] v4.1 patch (rest of mozilla) this patch solves a significant performance problem on newer linux distros. while the patch is large and involved, it might be worth considering for the aviary branch at least, and maybe the 1.7 branch as well.
Attachment #157627 -
Flags: approval1.7.x?
Attachment #157627 -
Flags: approval-aviary?
Comment 97•20 years ago
|
||
Comment on attachment 157627 [details] [diff] [review] v4.1 patch (rest of mozilla) after some discussion a=chofmann,brendan for the branches
Attachment #157627 -
Flags: approval1.7.x?
Attachment #157627 -
Flags: approval1.7.x+
Attachment #157627 -
Flags: approval-aviary?
Attachment #157627 -
Flags: approval-aviary+
Assignee | ||
Comment 98•20 years ago
|
||
Comment 101•20 years ago
|
||
I just got a c:/work/mozilla_source/1.7/mozilla/netwerk/dns/src/nsHostResolver.cpp(635) : error C2065: 'PR_AI_NOCANONNAME' : undeclared identifier building on WinXp. Do I need to clean and rebuild or is this a problem?
Assignee | ||
Comment 102•20 years ago
|
||
bc: make sure you update and rebuild nsprpub. you shouldn't have to clobber anything. all of the tinderboxen cleared.
Updated•20 years ago
|
Whiteboard: [patch] → [patch] [lin14523]
You need to log in
before you can comment on or make changes to this bug.
Description
•