Closed
Bug 237084
Opened 21 years ago
Closed 21 years ago
PR_GetAddrInfoByName is serialized on FreeBSD [was: DNS lookups not done in parallel as per resolved bug 70213]
Categories
(NSPR :: NSPR, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: tmclaugh, Assigned: wtc)
Attachments
(1 file)
1.48 KB,
text/plain
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040307 Firefox/0.8
Build Identifier: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040307 Firefox/0.8
Originally filed this entry as an additional comment to bug #70213 which is long
closed as being fixed. Haven't seen any activity on that bug so I am filing a
new one so the issue does not get lost.
Bug #70213 which was opened to have DNS queries parallelized instead of them
being serialized does not appear to be fixed on at least FreeBSD 4.9. First
noticed this issue with firefox 0.8 so I tested mozilla 1.6 and 1.7a and mozilla
still appears to have the same problems. I ran a packet capture with ethereal
while trying to load www.allmusic.com and it shows multiple type AAAA queries
being made that ultimately receive a "Server failed" response. Eventually a
type A request is made which receives back a good response. This site takes on
average almost 4 minutes to load in its entirety. I also found that if I hit
stop and then try to go to another site mozilla will continue to try and query
the old site for a period of time before trying the new site. (May be same
issue as bug #188332) Mozilla should be making DNS queries in parallel but they
are still serialized.
A few notes about recreating this on FreeBSD. The resolver library in the
-CURRENT branch is now different from the one in 4.9 and the -STABLE branch. I
have not had a chance to test this bug on -CURRENT. Also, INET6 being enabled
in the kernel (default) will cause the problem to be more evident.
Reproducible: Always
Steps to Reproduce:
1. Make sure INET6 is enabled in the FreeBSD kernel (This does not cause the
problem but instead makes it more apparent)
2. Open mozilla to www.allmusic.com
3. Packet trace the session
Actual Results:
Mozilla will make DNS queries in serial instead of in parallel as they should be
according to bug 70213. When a FreeBSD kernel has INET6 enabled it will make
type AAAA queries until it receives a server failure response at which point it
will make a type A query.
Expected Results:
Mozilla should make parallel DNS queries so that the stalled type AAAA query
response does not prevent the type A query from being made.
Assignee: general → darin
Component: Browser-General → Networking
QA Contact: general → benc
Comment 1•21 years ago
you can set the pref "network.dns.disableIPv6" if you want to disable IPv6 DNS
queries manually. see bug 68796 for more details.
otherwise, i suspect this problem is caused by the way PR_GetAddrInfoByName is
implemented in NSPR for FreeBSD.
-> reassigning to NSPR in case there is something we can do to ensure that
PR_GetAddrInfoByName calls are not serialized.
Assignee: darin → wchang0222
Component: Networking → NSPR
Product: Browser → NSPR
QA Contact: benc → wchang0222
Summary: DNS lookups not done in parallel as per resolved bug 70213 → PR_GetAddrInfoByName is serialized on FreeBSD [was: DNS lookups not done in parallel as per resolved bug 70213]
Version: Trunk → 4.2
Comment 2•21 years ago
a quick inspection of nsprpub/pr/include/md/_freebsd.h shows that we won't call
getaddrinfo if the freebsd build system is not recent enough. there's this check:
#if __FreeBSD_version >= 400014
and if this does not pass, then PR_GetAddrInfoByName will be implemented using
gethostbyname, and the global DNS lock will be used to force these calls to be
serialized.
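To make the fallback concrete, here is a minimal C sketch of what serializing lookups behind a single lock looks like. It is an illustration only, not NSPR's actual code: `dns_lock` and `locked_lookup_ipv4` are hypothetical names, and the real implementation may differ in its details.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a process-wide DNS lock. */
static pthread_mutex_t dns_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Resolve `host` to its first IPv4 address using gethostbyname().
 * gethostbyname() returns a pointer to static storage, so the lock is
 * held for the whole lookup and until the result has been copied out.
 * Every other caller waits that long, even if it asks about a
 * different host. Returns 0 on success, -1 on failure.
 */
int locked_lookup_ipv4(const char *host, struct in_addr *out)
{
    int rc = -1;

    pthread_mutex_lock(&dns_lock);
    struct hostent *he = gethostbyname(host);
    if (he != NULL && he->h_addrtype == AF_INET &&
        he->h_length == (int)sizeof(*out) && he->h_addr_list[0] != NULL) {
        memcpy(out, he->h_addr_list[0], sizeof(*out));
        rc = 0;
    }
    pthread_mutex_unlock(&dns_lock);
    return rc;
}
```

With this shape, a slow lookup for one host delays every lookup queued behind it, which matches the symptom in the report if the getaddrinfo path is not being used.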
Comment 3•21 years ago
Reporter, are you compiling Mozilla yourself? If so, can you discover what
__FreeBSD_version is defined to?
Just change a few lines in mozilla/nsprpub/pr/include/md/_freebsd.h around line
93 like this:
#if __FreeBSD_version >= 400014
+#error FreeBSD is too old
#define _PR_INET6
#define _PR_HAVE_INET_NTOP
#define _PR_HAVE_GETHOSTBYNAME2
#define _PR_HAVE_GETADDRINFO
#define _PR_INET6_PROBE
-#endif
+#else
+#error FreeBSD is too new!
+#endif
When you compile on FreeBSD, you should hit one of the two.
Comment 4•21 years ago
Ok, I swapped "old" with "new". :) But the principle should work.
I don't think this is the problem, since that check will probably always be true
on any semi-recent version of FreeBSD, but we need to rule that out.
Reporter | ||
Comment 5•21 years ago
(In reply to comment #3)
My os version is 490000. I did the edit to
mozilla/nsprpub/pr/include/md/_freebsd.h just to be sure and the build failed
for the version being too new.
I am compiling mozilla out of the FreeBSD ports collection so potentially a
patch there could be causing problems. My quick glance through the patches shows
them to be primarily architecture-related patches, though, and not relating to my
system's architecture. I have already been in contact with a mozilla port
maintainer and will continue to do so. Thanks.
Comment 6•21 years ago
Sorry, I completely misunderstood this. I thought you were referring to
parallelizing DNS lookups for different host names. Mozilla does do this, as per
bug 70213.
Rewind to original report: Tom, if I understand you correctly, your bug report
is that Mozilla is trying AAAA queries first and only trying A queries after
that, whereas it should try them in parallel.
This behaviour is by design for two reasons (see bug 68796 for most of the
discussion on this).
First, mozilla (rightly) lets the OS's getaddrinfo() call decide what to do
instead of taking name resolution into its own hands. This is what should be
done if IPv6 support is desired. See, for example, Itojun's document
"Implementing AF-independent application", which is _the_ reference in IPv6
socket programming: http://www.kame.net/newsletter/19980604/ .
Second, issuing two parallel name lookups for two different address families
would probably lead to non-deterministic behaviour, with IPv6 or IPv4 being used
depending on which lookup completed first, which can lead to many headaches.
See, for example, bug 68796 comment 30.
allmusic.com is a problem since their DNS server (like, for example,
doubleclick's) is broken and times out when queried for an AAAA record. This
hits FreeBSD and Mac OS X hard since on these OS's the resolver always queries
for IPv6 addresses even if the machine has no IPv6 connection, causing delays
even for people who don't have IPv6. This does not happen on Linux or Windows,
which only do IPv6 queries if IPv6 is turned on (which doesn't require a
recompile of the kernel).
I have tried to work around this by using the AI_ADDRCONFIG flag, which is the
proper mechanism as per RFC 3493, but at least on Mac OS 10.3 (synced to FreeBSD
5.0?) the flag has no effect.
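For reference, a lookup of this kind can be sketched in C as below. This is an illustration under assumptions, not Mozilla's or NSPR's code: the function names `resolve_any_family` and `count_families` are made up here. The application asks for any address family and optionally sets AI_ADDRCONFIG; what the resolver does with that flag is up to the OS.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Address-family-independent lookup: ask for any family and let the
 * OS resolver decide which queries to issue. If use_addrconfig is
 * nonzero, AI_ADDRCONFIG asks the resolver to skip families for which
 * the host has no configured address. Returns getaddrinfo()'s result
 * code (0 on success); the caller frees *res with freeaddrinfo().
 */
int resolve_any_family(const char *host, const char *service,
                       int use_addrconfig, struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    if (use_addrconfig)
        hints.ai_flags |= AI_ADDRCONFIG;
    return getaddrinfo(host, service, &hints, res);
}

/* Count how many results of each family getaddrinfo() returned. */
void count_families(const struct addrinfo *list, int *n_v4, int *n_v6)
{
    *n_v4 = 0;
    *n_v6 = 0;
    for (const struct addrinfo *ai = list; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET)
            (*n_v4)++;
        else if (ai->ai_family == AF_INET6)
            (*n_v6)++;
    }
}
```

Comparing the counts with and without the flag on a given machine is one way to check whether its resolver honours AI_ADDRCONFIG at all.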
Apart from pestering the people who run these buggy DNS servers, the only
sensible way to fix this is by making the OS resolver be a bit smarter on when
to issue AAAA lookups and/or act on the AI_ADDRCONFIG getaddrinfo flag. Apps
should never have to worry about this.
However, since this hits people hard, you can work around it in mozilla by using
the prefs added in bug 68796. Set network.dns.disableIPv6 to true to disable
IPv6 lookups completely or set network.dns.ipv4OnlyDomains to
".doubleclick.net,.allmusic.com" to disable IPv6 name lookups for these sites.
Resolving WONTFIX, since this should be taken care of through evangelism or in
the OS, and there is a workaround.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
Reporter | ||
Comment 7•21 years ago
Lorenzo, I read your entry regarding the problems with the FreeBSD DNS resolver
and IPv6 so I will try to take those up with someone related to FreeBSD. But,
I have another example that I tested and it appears the DNS queries are still
being serialized.
Steps to Reproduce:
1. Open two tabs in mozilla, one for www.allmusic.com and one for another site.
2. Go to www.allmusic.com first and go to the second site next.
I ran a packet capture while doing this and no DNS query was made for the
second site until the type A DNS query for www.allmusic.com returned a
response. My reading of bug 70213 leads me to believe that while DNS queries
for www.allmusic.com are being made, DNS queries for the second site should be
made as well. This is to prevent one page from preventing all other pages from
being loaded.
Also, if I go to www.allmusic.com in mozilla and decide to go to someplace else
instead before All Music has loaded (usually because I got tired of waiting :),
it is actually quicker to exit and restart mozilla and go to the other page
than it is to wait for mozilla to stop querying All Music and finally query the
new site.
Status: RESOLVED → UNCONFIRMED
Resolution: WONTFIX → ---
Comment 8•21 years ago
(In reply to comment #7)
> I ran a packet capture while doing this and no DNS query was made for the
> second site until the type A DNS query for www.allmusic.com returned a
> response. My reading of bug 70213 leads me to believe that while DNS queries
> for www.allmusic.com are being made, DNS queries for the second site should be
> made as well.
Your reading of bug 70213 is correct to my knowledge. But I'm still not sure
that it's mozilla's fault. I vaguely seem to recall once browsing through the
FreeBSD code to getaddrinfo and seeing a comment to the effect that it is not
thread safe. If that is the case, maybe there's some locking going on in the
resolver and the lookups are being done one after another even if mozilla is
issuing them in parallel on different threads (in which case, there's not much
that mozilla can do).
Is there a FreeBSD equivalent of Linux's strace/ltrace? If so, can you check
whether mozilla is waiting for the response to the getaddrinfo on
www.allmusic.com before making the new getaddrinfo call for the other site, or
whether it's issuing them in parallel? Or maybe we could write a test program to
issue two getaddrinfo calls on different threads and see if they block each other.
Comment 9•21 years ago
The attached C program starts a name lookup for one host on one thread and
after one second creates another name lookup on another thread. It then waits
for the lookups to complete and prints out the time taken for each one.
To compile, run "gcc -pthread -o test-thread test-thread.c". On a Linux 2.4
system with glibc 2.2, I get:
$ ./test-thread www.allmusic.com www.mozilla.org
Resolving www.allmusic.com
Resolving www.mozilla.org
(1 ms) www.mozilla.org (h-207-126-111-202-mozilla.sv.meer.net)
207.126.111.202
(56232 ms) www.allmusic.com (64.152.71.2) 64.152.71.2
(56232 ms) www.allmusic.com (unknown.Level3.net) 166.90.203.130
But on FreeBSD 4.6-RELEASE, I get:
$ ./test-thread www.allmusic.com www.mozilla.org
Resolving www.allmusic.com
Resolving www.mozilla.org
(59350 ms) www.mozilla.org (rheet.mozilla.org) 207.126.111.202
(125492 ms) www.allmusic.com (www.allmusic.com) 166.90.203.130
(125492 ms) www.allmusic.com ((null)) 64.152.71.2
which is pretty bad. It seems that short of forking off a separate process,
there is nothing mozilla can do to fix this on FreeBSD. :(
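For readers without access to the attachment, the core of such a test can be sketched roughly as follows. This is not the attached test-thread.c; it is a minimal reconstruction of the idea, and the names `struct lookup`, `lookup_thread` and `run_two_lookups` are assumptions made for this sketch. It only times each getaddrinfo() call and does not print the resolved addresses.

```c
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

struct lookup {
    const char *host;   /* name to resolve */
    int rc;             /* getaddrinfo() return code */
    long elapsed_ms;    /* wall-clock time spent inside getaddrinfo() */
};

static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Thread body: resolve one host and record how long the call took. */
static void *lookup_thread(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    printf("Resolving %s\n", l->host);
    long start = now_ms();
    l->rc = getaddrinfo(l->host, NULL, &hints, &res);
    l->elapsed_ms = now_ms() - start;
    if (l->rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/*
 * Start a lookup for `first`, wait delay_ms, start a lookup for
 * `second` on another thread, then wait for both to finish.
 * Returns 0 if both threads ran, -1 if a thread could not be created.
 */
int run_two_lookups(struct lookup *first, struct lookup *second,
                    unsigned delay_ms)
{
    pthread_t t1, t2;
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    if (pthread_create(&t1, NULL, lookup_thread, first) != 0)
        return -1;
    nanosleep(&delay, NULL);
    if (pthread_create(&t2, NULL, lookup_thread, second) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Calling `run_two_lookups` with a slow host first, a fast host second and a delay of 1000 ms, then printing both `elapsed_ms` values, gives the same kind of comparison as above: if the resolver serializes calls, the second lookup's time includes the wait for the first.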
Comment 10•21 years ago
there once was a dns helper app in netwerk:
http://bonsai.mozilla.org/rview.cgi?cvsroot=/cvsroot&dir=mozilla/netwerk/dns/daemon/Attic&module=default&rev=
you're welcome to consider resurrecting it.
Reporter | ||
Comment 11•21 years ago
Hi Lorenzo, I apologize for the delay. I've been trying to get a second machine
running mozilla 1.5 since this problem seems to have become more apparent with
1.6. The machine has been flaky so I'm SOL trying to test this idea.
That being said... Here are the results that I received from running the
program on my FreeBSD 4.9 machine.
[tom@compass projects]$ ./test-thread www.allmusic.com www.mozilla.org
Resolving www.allmusic.com
Resolving www.mozilla.org
(80158 ms) www.allmusic.com (www.allmusic.com) 166.90.203.130
(80158 ms) www.allmusic.com ((null)) 64.152.71.2
(79273 ms) www.mozilla.org (rheet.mozilla.org) 207.126.111.202
I ran the program multiple times and the results are consistent with
www.mozilla.org being resolved a fraction of a second quicker than
www.allmusic.com. Thanks.
Comment 12•21 years ago
This was fixed in FreeBSD-CURRENT on Feb 25:
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getaddrinfo.3?only_with_tag=MAIN
and I have personally seen it work correctly on a -CURRENT system.
FreeBSD's previous behaviour was bordering on violating the RFCs, which
explicitly state that getaddrinfo() must be thread-safe. What's the point of a
function being thread-safe if it blocks in this way?
Resolving INVALID.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago → 21 years ago
Resolution: --- → INVALID
Assignee | ||
Comment 13•21 years ago
A thread-safe function doesn't need to allow concurrent
execution.