Closed Bug 237084 Opened 21 years ago Closed 21 years ago

PR_GetAddrInfoByName is serialized on FreeBSD [was: DNS lookups not done in parallel as per resolved bug 70213]

Categories

(NSPR :: NSPR, defect)

x86
FreeBSD
defect
Not set
major

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: tmclaugh, Assigned: wtc)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040307 Firefox/0.8
Build Identifier: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040307 Firefox/0.8
I originally filed this as an additional comment on bug #70213, which has long been closed as fixed. I haven't seen any activity on that bug, so I am filing a new one so the issue does not get lost.
Bug #70213, which was opened to have DNS queries parallelized instead of serialized, does not appear to be fixed on at least FreeBSD 4.9. I first noticed this issue with Firefox 0.8, so I tested Mozilla 1.6 and 1.7a, and Mozilla still appears to have the same problem.
I ran a packet capture with Ethereal while trying to load www.allmusic.com. It shows multiple type AAAA queries being made that ultimately receive a "Server failed" response. Eventually a type A query is made, which receives a good response. This site takes on average almost 4 minutes to load in its entirety. I also found that if I hit Stop and then try to go to another site, Mozilla continues to query the old site for a period of time before trying the new one. (This may be the same issue as bug #188332.) Mozilla should be making DNS queries in parallel, but they are still serialized.
A few notes about recreating this on FreeBSD. The resolver library in the -CURRENT branch is now different from the one in 4.9 and the -STABLE branch; I have not had a chance to test this bug on -CURRENT. Also, having INET6 enabled in the kernel (the default) makes the problem more evident.
Reproducible: Always
Steps to Reproduce:
1. Make sure INET6 is enabled in the FreeBSD kernel. (This does not cause the problem but makes it more apparent.)
2. Open Mozilla to www.allmusic.com.
3. Packet-trace the session.
Actual Results:
Mozilla makes DNS queries serially instead of in parallel, as they should be according to bug 70213. When a FreeBSD kernel has INET6 enabled, it makes type AAAA queries until it receives a server failure response, at which point it makes a type A query.
Expected Results:
Mozilla should make parallel DNS queries so that a stalled type AAAA query does not prevent the type A query from being made.
Assignee: general → darin
Component: Browser-General → Networking
QA Contact: general → benc
You can set the pref "network.dns.disableIPv6" if you want to disable IPv6 DNS queries manually; see bug 68796 for more details. Otherwise, I suspect this problem is caused by the way PR_GetAddrInfoByName is implemented in NSPR for FreeBSD. -> Reassigning to NSPR in case there is something we can do to ensure that PR_GetAddrInfoByName calls are not serialized.
Assignee: darin → wchang0222
Component: Networking → NSPR
Product: Browser → NSPR
QA Contact: benc → wchang0222
Summary: DNS lookups not done in parallel as per resolved bug 70213 → PR_GetAddrInfoByName is serialized on FreeBSD [was: DNS lookups not done in parallel as per resolved bug 70213]
Version: Trunk → 4.2
A quick inspection of nsprpub/pr/include/md/_freebsd.h shows that we won't call getaddrinfo if the FreeBSD build system is not recent enough. There's this check:
    #if __FreeBSD_version >= 400014
If this does not pass, PR_GetAddrInfoByName is implemented using gethostbyname, and the global DNS lock is used to force these calls to be serialized.
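To see why a single global lock defeats multithreaded lookups, here is a small simulation. It does not call the real resolver: `fake_blocking_lookup()` is a hypothetical stand-in that only sleeps, and `dns_lock` merely models the global DNS lock described above (it is not NSPR's actual lock). With the lock, two 200 ms "lookups" on two threads take at least 400 ms of wall time; without it, roughly 200 ms.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a blocking resolver call such as
 * gethostbyname(): it only sleeps for `ms` milliseconds. */
static void fake_blocking_lookup(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    /* Resume the sleep if a signal interrupts it. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Models one process-wide lock, like the global DNS lock described above. */
static pthread_mutex_t dns_lock = PTHREAD_MUTEX_INITIALIZER;

struct job {
    long delay_ms; /* how long the simulated lookup blocks */
    int use_lock;  /* nonzero: serialize through dns_lock */
};

static void *worker(void *arg)
{
    const struct job *j = arg;

    if (j->use_lock)
        pthread_mutex_lock(&dns_lock);
    fake_blocking_lookup(j->delay_ms);
    if (j->use_lock)
        pthread_mutex_unlock(&dns_lock);
    return NULL;
}

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Runs two simulated lookups on two threads and returns the wall-clock
 * time in milliseconds until both have finished, or -1 on error. */
static double run_pair(long ms_a, long ms_b, int use_lock)
{
    pthread_t ta, tb;
    struct job a = { ms_a, use_lock };
    struct job b = { ms_b, use_lock };
    double start = now_ms();

    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return -1.0;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1.0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return now_ms() - start;
}
```

The lock makes each call thread-safe but also forces the second lookup to wait for the first, which is the behaviour the reporter observed.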
Reporter, are you compiling Mozilla yourself? If so, can you discover what __FreeBSD_version is defined to? Just change a few lines in mozilla/nsprpub/pr/include/md/_freebsd.h around line 93 like this:
     #if __FreeBSD_version >= 400014
    +#error FreeBSD is too old
     #define _PR_INET6
     #define _PR_HAVE_INET_NTOP
     #define _PR_HAVE_GETHOSTBYNAME2
     #define _PR_HAVE_GETADDRINFO
     #define _PR_INET6_PROBE
    -#endif
    +#else
    +#error FreeBSD is too new!
    +#endif
When you compile on FreeBSD, you should hit one of the two.
Ok, I swapped "old" with "new". :) But the principle should work. I don't think this is the problem, since that check will probably always be true on any semi-recent version of FreeBSD, but we need to rule that out.
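A less roundabout way to get the same answer is to print the macro instead of forcing an #error. The sketch below assumes only that __FreeBSD_version is defined by <sys/param.h> on FreeBSD; `freebsd_version()`, `nspr_would_use_getaddrinfo()` and `report_version()` are hypothetical helper names, and the 400014 threshold is copied from the check quoted above.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __FreeBSD__
#include <sys/param.h> /* defines __FreeBSD_version on FreeBSD */
#endif

/* Hypothetical helper: the __FreeBSD_version this file was compiled
 * against, or -1 when not building on FreeBSD. */
static long freebsd_version(void)
{
#ifdef __FreeBSD_version
    return (long)__FreeBSD_version;
#else
    return -1L;
#endif
}

/* Same threshold as the _freebsd.h check quoted above. */
static int nspr_would_use_getaddrinfo(void)
{
    return freebsd_version() >= 400014L;
}

static void report_version(void)
{
    long v = freebsd_version();

    if (v < 0)
        printf("not a FreeBSD build\n");
    else
        printf("__FreeBSD_version = %ld, getaddrinfo path %s\n",
               v, nspr_would_use_getaddrinfo() ? "enabled" : "disabled");
}
```

With the value 490000 reported in the next comment for FreeBSD 4.9, the check passes, so the getaddrinfo branch would be the one compiled in.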
(In reply to comment #3) My OS version is 490000. I made the edit to mozilla/nsprpub/pr/include/md/_freebsd.h just to be sure, and the build failed with the version being too new. I am compiling Mozilla from the FreeBSD ports collection, so potentially a patch there could be causing problems. My quick glance through the patches shows them to be primarily architecture-related, and not related to my system's architecture. I have already been in contact with a Mozilla port maintainer and will continue to be. Thanks.
Sorry, I completely misunderstood this. I thought you were referring to parallelizing DNS lookups for different host names. Mozilla does do this, as per bug 70213.
Rewind to the original report: Tom, if I understand you correctly, your bug report is that Mozilla tries AAAA queries first and only tries A queries after that, whereas it should try them in parallel. This behaviour is by design, for two reasons (see bug 68796 for most of the discussion on this).
First, Mozilla (rightly) lets the OS's getaddrinfo() call decide what to do instead of taking name resolution into its own hands. This is what should be done if IPv6 support is desired. See, for example, Itojun's document "Implementing AF-independent application", which is _the_ reference in IPv6 socket programming: http://www.kame.net/newsletter/19980604/ .
Second, issuing two parallel name lookups for two different address families would probably lead to non-deterministic behaviour, with IPv6 or IPv4 being used depending on which lookup completed first, which can lead to many headaches. See, for example, bug 68796 comment 30.
allmusic.com is a problem because their DNS server (like, for example, doubleclick's) is broken and times out when queried for an AAAA record. This hits FreeBSD and Mac OS X hard, since on these OSes the resolver always queries for IPv6 addresses even if the machine has no IPv6 connection, causing delays even for people who don't have IPv6. This does not happen on Linux or Windows, which only do IPv6 queries if IPv6 is turned on (which doesn't require recompiling the kernel).
I have tried to work around this by using the AI_ADDRCONFIG flag, which is the proper mechanism as per RFC 3493, but at least on Mac OS 10.3 (synced to FreeBSD 5.0?) the flag has no effect. Apart from pestering the people who run these buggy DNS servers, the only sensible way to fix this is to make the OS resolver a bit smarter about when to issue AAAA lookups and/or have it act on the AI_ADDRCONFIG getaddrinfo flag.
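For reference, an address-family-independent lookup in the spirit of the Itojun document looks roughly like this: ask for AF_UNSPEC and let getaddrinfo() decide which families to return and in what order. AI_ADDRCONFIG (RFC 3493) can be passed to request IPv6 results only when the host has an IPv6 address configured; whether it is honoured depends on the platform, as noted above for Mac OS 10.3. This is a sketch, not Mozilla's or NSPR's code, and `list_addresses()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolves `host` without assuming an address family and prints each
 * result. `extra_flags` goes into ai_flags (e.g. AI_ADDRCONFIG).
 * Returns the number of addresses found, or -1 on failure. */
static int list_addresses(const char *host, int extra_flags)
{
    struct addrinfo hints, *res, *ai;
    char buf[INET6_ADDRSTRLEN];
    int n = 0, rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 and/or IPv6: resolver decides */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address, not per protocol */
    hints.ai_flags = extra_flags;

    rc = getaddrinfo(host, "80", &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return -1;
    }
    /* Results come back in the order the system prefers; a client would
     * try connect() on each in turn until one succeeds. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof buf,
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("%s -> %s (%s)\n", host, buf,
                   ai->ai_family == AF_INET6 ? "IPv6" : "IPv4");
        n++;
    }
    freeaddrinfo(res);
    return n;
}
```

A call such as `list_addresses("www.allmusic.com", AI_ADDRCONFIG)` leaves the AAAA-versus-A decision entirely to the system resolver, which is why a slow AAAA answer shows up as a delay inside getaddrinfo() rather than something the application schedules.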
Apps should never have to worry about this. However, since this hits people hard, you can work around it in mozilla by using the prefs added in bug 68796. Set network.dns.disableIPv6 to true to disable IPv6 lookups completely or set network.dns.ipv4OnlyDomains to ".doubleclick.net,.allmusic.com" to disable IPv6 name lookups for these sites. Resolving WONTFIX, since this should be taken care of through evangelism or in the OS, and there is a workaround.
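The network.dns.ipv4OnlyDomains value is a comma-separated list of domain suffixes. The helper below is a hypothetical illustration of how such a list can be matched against a host name (case-insensitive suffix match, blanks around entries ignored); it is not Mozilla's implementation, and Mozilla's exact matching rules may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive comparison of n bytes (DNS names ignore case). */
static bool equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* Hypothetical helper: true if `host` ends with any entry of the
 * comma-separated `list`, e.g. ".doubleclick.net,.allmusic.com". */
static bool host_in_suffix_list(const char *host, const char *list)
{
    size_t host_len = strlen(host);
    const char *p = list;

    for (;;) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Trim blanks around the entry. */
        while (len > 0 && *p == ' ') {
            p++;
            len--;
        }
        while (len > 0 && p[len - 1] == ' ')
            len--;

        if (len > 0 && len <= host_len &&
            equal_nocase(host + host_len - len, p, len))
            return true;
        if (end == NULL)
            return false;
        p = end + 1;
    }
}
```

With this rule, ".allmusic.com" matches www.allmusic.com but not the bare name allmusic.com, because the entry's leading dot must be present in the host.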
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
Lorenzo, I read your entry regarding the problems with the FreeBSD DNS resolver and IPv6, so I will try to take those up with someone involved with FreeBSD. But I have another example that I tested, and it appears the DNS queries are still being serialized.
Steps to Reproduce:
1. Open two tabs in Mozilla, one for www.allmusic.com and one for another site.
2. Go to www.allmusic.com first and go to the second site next.
I ran a packet capture while doing this, and no DNS query was made for the second site until the type A DNS query for www.allmusic.com returned a response. My reading of bug 70213 leads me to believe that while DNS queries for www.allmusic.com are being made, DNS queries for the second site should be made as well. This is to prevent one page from blocking all other pages from loading.
Also, if I go to www.allmusic.com in Mozilla and decide to go someplace else before All Music has loaded (usually because I got tired of waiting :), it is actually quicker to exit and restart Mozilla and go to the other page than it is to wait for Mozilla to stop querying All Music and finally query the new site.
Status: RESOLVED → UNCONFIRMED
Resolution: WONTFIX → ---
(In reply to comment #7)
> I ran a packet capture while doing this and no DNS query was made for the
> second site until the type A DNS query for www.allmusic.com returned a
> response. My reading of bug 70213 leads me to believe that while DNS queries
> for www.allmusic.com are being made, DNS queries for the second site should be
> made as well.
Your reading of bug 70213 is correct to my knowledge. But I'm still not sure that it's Mozilla's fault. I vaguely recall once browsing through the FreeBSD getaddrinfo code and seeing a comment to the effect that it is not thread-safe. If that is the case, maybe there's some locking going on in the resolver, and the lookups are being done one after another even if Mozilla is issuing them in parallel on different threads (in which case there's not much Mozilla can do).
Is there a FreeBSD equivalent of Linux's strace/ltrace? If so, can you check whether Mozilla waits for the response to the getaddrinfo call for www.allmusic.com before making the new getaddrinfo call for the other site, or whether it issues them in parallel? Alternatively, we could write a test program that issues two getaddrinfo calls on different threads and see whether they block each other.
Attached file test program
The attached C program starts a name lookup for one host on one thread and, after one second, creates another name lookup on another thread. It then waits for the lookups to complete and prints out the time taken for each one. To compile, run "gcc -pthread -o test-thread test-thread.c".
On a Linux 2.4 system with glibc 2.2, I get:
    $ ./test-thread www.allmusic.com www.mozilla.org
    Resolving www.allmusic.com
    Resolving www.mozilla.org
    (1 ms) www.mozilla.org (h-207-126-111-202-mozilla.sv.meer.net) 207.126.111.202
    (56232 ms) www.allmusic.com (64.152.71.2) 64.152.71.2
    (56232 ms) www.allmusic.com (unknown.Level3.net) 166.90.203.130
But on FreeBSD 4.6-RELEASE, I get:
    $ ./test-thread www.allmusic.com www.mozilla.org
    Resolving www.allmusic.com
    Resolving www.mozilla.org
    (59350 ms) www.mozilla.org (rheet.mozilla.org) 207.126.111.202
    (125492 ms) www.allmusic.com (www.allmusic.com) 166.90.203.130
    (125492 ms) www.allmusic.com ((null)) 64.152.71.2
which is pretty bad. It seems that, short of forking off a separate process, there is nothing Mozilla can do to fix this on FreeBSD. :(
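The attachment itself is not reproduced here. The following is an independent sketch of the same experiment, assuming POSIX threads and getaddrinfo(); `struct lookup`, `resolve()` and `race()` are hypothetical names. Each thread times only its own getaddrinfo() call, so on a resolver that lets lookups run concurrently the second (fast) name finishes in milliseconds even while the first is stuck, as in the Linux output above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

struct lookup {
    const char *host;  /* name to resolve */
    double elapsed_ms; /* time spent inside getaddrinfo() */
    int rc;            /* getaddrinfo() return code, 0 on success */
};

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Thread body: time one getaddrinfo() call for l->host. */
static void *resolve(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints, *res = NULL;
    double start;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    start = now_ms();
    l->rc = getaddrinfo(l->host, NULL, &hints, &res);
    l->elapsed_ms = now_ms() - start;
    if (l->rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/* Start resolving a->host, wait one second, start resolving b->host on a
 * second thread, then print how long each lookup took. */
static int race(struct lookup *a, struct lookup *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, resolve, a) != 0)
        return -1;
    sleep(1);
    if (pthread_create(&tb, NULL, resolve, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    printf("(%.0f ms) %s rc=%d\n", b->elapsed_ms, b->host, b->rc);
    printf("(%.0f ms) %s rc=%d\n", a->elapsed_ms, a->host, a->rc);
    return 0;
}
```

Calling `race()` with www.allmusic.com as the first host and www.mozilla.org as the second reproduces the comparison; a second-lookup time far above a few milliseconds points at serialization inside the resolver rather than in the application.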
there once was a dns helper app in netwerk: http://bonsai.mozilla.org/rview.cgi?cvsroot=/cvsroot&dir=mozilla/netwerk/dns/daemon/Attic&module=default&rev= you're welcome to consider resurrecting it.
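Forking off a separate process, as suggested above, side-steps a resolver that serializes lookups inside one process. Below is a minimal sketch assuming POSIX fork() and pipe(); `resolve_in_child()` is a hypothetical name, and this is not the old netwerk DNS daemon. One caveat: after fork() in a multithreaded program, POSIX only guarantees async-signal-safe functions in the child, and getaddrinfo() is not one of them, so a real multithreaded application would more likely start a dedicated helper process while it is still single-threaded.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: resolve `host` in a forked child and copy the first
 * address, in numeric form, into `out`. Returns 0 on success, -1 on failure.
 * Lookups started in separate children cannot be serialized by a
 * per-process lock in the resolver. This parent blocks in read(); a real
 * caller would poll() the pipe instead. */
static int resolve_in_child(const char *host, char *out, size_t outlen)
{
    int fds[2], status;
    pid_t pid;
    ssize_t n;

    if (outlen == 0 || pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: do the blocking lookup, write the answer, exit. */
        struct addrinfo hints, *res;
        char buf[INET6_ADDRSTRLEN] = "";

        close(fds[0]);
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, NULL, &hints, &res) == 0) {
            if (getnameinfo(res->ai_addr, res->ai_addrlen, buf, sizeof buf,
                            NULL, 0, NI_NUMERICHOST) != 0)
                buf[0] = '\0';
            freeaddrinfo(res);
        }
        if (buf[0] == '\0' || write(fds[1], buf, strlen(buf)) < 0)
            _exit(1);
        _exit(0);
    }
    /* Parent: read the child's answer and reap it. */
    close(fds[1]);
    n = read(fds[0], out, outlen - 1);
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0 || n <= 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```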
Hi Lorenzo, I apologize for the delay. I've been trying to get a second machine running Mozilla 1.5, since this problem seems to have become more apparent with 1.6. The machine has been flaky, so I'm SOL trying to test this idea. That being said, here are the results I received from running the program on my FreeBSD 4.9 machine:
    [tom@compass projects]$ ./test-thread www.allmusic.com www.mozilla.org
    Resolving www.allmusic.com
    Resolving www.mozilla.org
    (80158 ms) www.allmusic.com (www.allmusic.com) 166.90.203.130
    (80158 ms) www.allmusic.com ((null)) 64.152.71.2
    (79273 ms) www.mozilla.org (rheet.mozilla.org) 207.126.111.202
I ran the program multiple times, and the results are consistent, with www.mozilla.org being resolved a fraction of a second quicker than www.allmusic.com. Thanks.
This was fixed in FreeBSD-CURRENT on Feb 25: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getaddrinfo.3?only_with_tag=MAIN and I have personally seen it work correctly on a -CURRENT system. FreeBSD's previous behaviour was bordering on violating the RFCs, which explicitly state that getaddrinfo() must be thread-safe. What's the point of a function being thread-safe if it blocks in this way? Resolving INVALID.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago → 21 years ago
Resolution: --- → INVALID
A thread-safe function doesn't need to allow concurrent execution.