Closed Bug 205726 Opened 21 years ago Closed 21 years ago

nsDnsService rewrite

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.6alpha

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

Attachments

(2 files, 16 obsolete files)

180.61 KB, patch
dougt
: review+
bryner
: superreview+
Details | Diff | Splinter Review
5.77 KB, text/plain
Details
we need to change the DNS service in the following ways:

 1- use one common implementation for all platforms
 2- call getaddrinfo instead of getipnodebyname (this will require NSPR changes)
 3- spawn multiple threads on which getaddrinfo will be called
 4- revise DNS IDL to be consistent with getaddrinfo
 5- make threads unjoinable (to solve bug 192271)
 6- modify cache to store getaddrinfo results (may save footprint)

these changes most likely amount to a rewrite of the DNS service.  wtc is
working on providing getaddrinfo via NSPR.
> BIND 9.2 includes a new lightweight stub resolver library and associated
> resolver daemon that fully support forward and reverse lookups of both IPv4
> and IPv6 addresses. This library is still considered experimental and is not a
> complete replacement for the BIND 8 resolver library. Applications that use
> the BIND 8 res_* functions to perform DNS lookups or dynamic updates still 
> need to be linked against the BIND 8 libraries. For DNS lookups, they can also
> use the new "getrrsetbyname()" API.

If we're going to rewrite the DNS service, it may be worth investigating using
lwres(3):

http://resin.csoft.net/cgi-bin/man.cgi?section=3&topic=lwres

and lwresd(8):

http://resin.csoft.net/cgi-bin/man.cgi?section=8&topic=lwresd

It has a number of advantages.

* if desired, the daemon should be able to do all the DNS caching, including
negative caching, for us, and takes into account stuff like TTLs

* it provides a low-level interface, so any clients that want other DNS
information (keys, SRV or MX records, TTLs) could potentially get it

* the low-level interface is non-blocking async interface so it's not necessary
to use multiple threads

Even if we choose not to go with lwres(d), it may turn out that
getrrsetbyname(3) is a better thing to write in terms of.
dmose:

my understanding is that BIND only handles DNS.  by calling gethostbyname (or
the more preferred getaddrinfo), we'll be able to handle mappings stored in
/etc/hosts, NIS, or other name resolution systems as configured by the host
system.  i don't think we want to be talking the DNS protocol directly.
Is the re-write going to affect any of the higher level behaviors?
benc: other than fix bugs and add support for IPv6 under WIN32, it should not ;-)
Maybe you should retitle this bug, s/DNS/name resolution/, to avoid confusion.
how about i just mention the name of the C++ class ;)
Summary: DNS service rewrite → nsDnsService rewrite
Remeber that we still have to compile/run under RH6.2, so playing with fancy
bind9 resolvers isn't likely to work.
Blocks: 70213
Blocks: 154816
Target Milestone: --- → mozilla1.5alpha
Blocks: 208312
Attached file proposed new DNS interface (obsolete) —
here's IDL for a new interface to the DNS service.  it is basically my attempt
at cleaning up the old interface to make it easier to use.  it also gives
JavaScript access to more information, including the canonical hostname and the
full range of IP addresses returned for a given hostname.  this interface also
allows callers to optionally bypass the DNS cache for individual lookups.  i
may also add methods to control the DNS cache size and global behavior.
License.

I don;t like having to include the nsIEventQueue.  Is it possible that this can
just be a nsISupports.  

what is the format of getNextAddrAsString?  Will it be in ipv6?  And is this
actually useful for anyone?  Can you do anything in a script without NSPR?  

Do you wannt just nix the hasMore method and have getNext* throw when there
isn't any more elements?

why does the nsIDNSService have init and shutdown methods?  shouldn't necko do
this for us?

in asyncResolve, does the format of the IP-address-literal matter?

I think you can use lowercase "TRUE" in idl.

I think that the service should have some method that 'flushes the cache"


Do you want to have any methods on the nsIDNSRecord that tell the client what
kind of address the current record is?  For example, is-local-address, or
is-routable?

What about equality -- do we care?  Could this somehow tie into the caps IP
binding issues?


Blocks: 211267
Attached patch v0 patch - preliminary version (obsolete) — Splinter Review
this patch is mostly complete.	major things left to do:

1) move contents of prgai.{h,c} into prnetdb.{h,c}
2) test test test

i'm submitting this preliminary version of the patch in case folks want to jump
in and start reviewing ;-)
Depends on: 211501
dougt,

>I don;t like having to include the nsIEventQueue.  Is it possible that this can
>just be a nsISupports.  

yeah, what do you think about a nsISupports parameter that will be QI'd to
nsIEventQueue or nsIThread?  also, it'd be nice if NS_CURRENT_THREAD didn't have
a value of 0 since i'd like to be able to use those constants and have a value
that means no proxy.  NULL seems like a better choice for no proxy.


>what is the format of getNextAddrAsString?  Will it be in ipv6?  And is this
>actually useful for anyone?  Can you do anything in a script without NSPR?  

It corresponds to whatever PR_NetAddrToString returns.  IOW it returns an IPv6
address literal if the PRNetAddr is IPv6 and IPv4 address literal otherwise.


>Do you wannt just nix the hasMore method and have getNext* throw when there
>isn't any more elements?

I think JavaScript needs the hasMore method.  C++ code can just call getNext
until it returns a failure code.  that'll be documented :-)


>why does the nsIDNSService have init and shutdown methods?  shouldn't necko do
>this for us?

yes, grumble grumble.. i've always wanted to make these methods internal.  the
socket transport service has these too.  it is currently called by the IO
service to shutdown the services when we go offline.  i guess it is necessary to
expose these via XPCOM in case someone wanted to replace the socket transport
service, etc.


>in asyncResolve, does the format of the IP-address-literal matter?

no, er.. well.. we treat it as an opaque string that is simply passed to the
system's getaddrinfo.  so, if the system understand the IP address literal, then
so will mozilla.  i think this is the right way to handle IP address literals.


>I think you can use lowercase "TRUE" in idl.

ok, though i typically uppercase TRUE and FALSE in comments for emphasis.


>I think that the service should have some method that 'flushes the cache"

currently that's Shutdown followed by Init.  of course, Shutdown will terminate
active DNS requests, so this is probably not the right way to implement it.


>Do you want to have any methods on the nsIDNSRecord that tell the client what
>kind of address the current record is?  For example, is-local-address, or
>is-routable?

not at this time.  we could add methods like that later on.


>What about equality -- do we care?  Could this somehow tie into the caps IP
>binding issues?

not sure i follow... what do you mean by "equality"?
Attached patch patch: netwerk/dns/ (obsolete) — Splinter Review
this is the patch for netwerk/dns/
Attachment #126644 - Attachment is obsolete: true
Attachment #126867 - Attachment is obsolete: true
Attached patch patch: netwerk/dns/ (v1) (obsolete) — Splinter Review
i previously posted the wrong patch.
Attachment #126966 - Attachment is obsolete: true
Attached patch patch: netwerk/ (v1) (obsolete) — Splinter Review
this patch contains all netwerk/ changes other than those in the dns/
subdirectory.
Attached patch patch: security/manager/ (v1) (obsolete) — Splinter Review
changes in security/manager/
Attached patch patch: directory/xpcom/ (v1) (obsolete) — Splinter Review
changes for LDAP
Attachment #126967 - Flags: review?(gordon)
Attachment #126968 - Flags: review?(dougt)
Attachment #126969 - Flags: review?(kaie)
Attachment #126970 - Flags: review?(dmose)
a couple issues remaining:

1- i've removed the res_ninit call in favor of letting the DNS worker threads
die.  i'm not sure this is a good solution though, and i will probably add back
the res_ninit calls.

2- i might want to make some changes based on dougt's last comments.  in
particular, adding a flushCache function might be a good idea even though we
don't have a use for it at the moment.

as for the nsISocketProvider changes in the netwerk/ patch, i took the liberty
of cleaning up some of the SOCKS code along with my changes to the
nsISocketProvider interface (to add an address family parameter).  the SOCKS
code changes mostly amount to some minor footprint savings.
Status: NEW → ASSIGNED
>yeah, what do you think about a nsISupports parameter that will be QI'd to
nsIEventQueue or nsIThread?  also, it'd be nice if NS_CURRENT_THREAD didn't have
a value of 0 since i'd like to be able to use those constants and have a value
that means no proxy.  NULL seems like a better choice for no proxy.

That sounds fine.  

>no, er.. well.. we treat it as an opaque string that is simply passed to the
system's getaddrinfo.  so, if the system understand the IP address literal, then
so will mozilla.  i think this is the right way to handle IP address literals.

Opaque?  So, I can't use it with other libraries, etc?

> currently that's Shutdown followed by Init.  of course, Shutdown will
terminate active DNS requests, so this is probably not the right way to
implement it.

Can you doc it, or fix it?

> not at this time.  we could add methods like that later on.

Now is cheaper.

>not sure i follow... what do you mean by "equality"?

Well, I am not sure either.  Imagine if these dns record objects and an equal()
method.  Would that be useful?  Could CAPs use something like this instead of
their IP matching?
Comment on attachment 126969 [details] [diff] [review]
patch: security/manager/ (v1)

r=kaie

You are including nsNetError.h, but I don't see why you need it.
Attachment #126969 - Flags: review?(kaie) → review+
kaie: the definition of some error codes were moved from IDL files into nsNetError.h


dougt: 

>>no, er.. well.. we treat it as an opaque string that is simply passed to the
>>system's getaddrinfo.  so, if the system understand the IP address literal, 
>>then so will mozilla.  i think this is the right way to handle IP address 
>>literals.
>
>Opaque?  So, I can't use it with other libraries, etc?

no, you can.. my point was that we don't have any need to inspect the IP
address.  we just ask the operating system to interpret it for us... therefore,
we are treating it as though it is just some "opaque" string.


>> currently that's Shutdown followed by Init.  of course, Shutdown will
>terminate active DNS requests, so this is probably not the right way to
>implement it.
>
>Can you doc it, or fix it?

yeah, i'll do that... probably i'll add the new method.


>> not at this time.  we could add methods like that later on.
>
>Now is cheaper.

true :-)


>>not sure i follow... what do you mean by "equality"?
>
>Well, I am not sure either.  Imagine if these dns record objects and an equal()
>method.  Would that be useful?  Could CAPs use something like this instead of
>their IP matching?

maybe.  but, i don't have a good way of implementing equal() at the moment.  i'd
basically need to add something to NSPR for that since PRAddrInfo is an opaque
data structure.
Target Milestone: mozilla1.5alpha → mozilla1.5beta
Attached patch patch: netwerk/dns/ (v1.1) (obsolete) — Splinter Review
Attachment #126967 - Attachment is obsolete: true
Attachment #127563 - Flags: review?(harishd)
Attachment #126967 - Flags: review?(gordon)
Attachment #127562 - Flags: review?(gordon)
Attached patch patch: netwerk/ (v1.1) (obsolete) — Splinter Review
added preference to disable DNS pinning (default remains enabled).
Attachment #126968 - Attachment is obsolete: true
Attachment #126968 - Flags: review?(dougt)
Attachment #127565 - Flags: review?(dougt)
Attached patch patch: mailnews/addrbook/ (v1) (obsolete) — Splinter Review
Attachment #127566 - Flags: review?(dougt)
Comment on attachment 127563 [details] [diff] [review]
patch: extensions/webservices/ (v1)

What happens when DNS lookup times out? Do you think I should make use of
aStatus?
Attachment #127563 - Flags: review?(harishd) → review+
hmm.. good question.  perhaps you'd want to make GetOfficialHostName return
aStatus as an error code.  at the moment (and with this patch)
GetOfficialHostName just returns NS_OK with an empty result if the lookup fails.
 that could be ok too depending on how the caller of GetOfficialHostName reacts.
Blocks: 162871
Blocks: 174590
Attached patch patch: netwerk/ (v1.2) (obsolete) — Splinter Review
revised patch to completely remove DNS pinning.  after discussing this with
brendan, jruderman, and others as well as the numerous requests for DNS pinning
to be dropped, i've decided to do away with it completely.  it was a feable
attempt to fix a problem that mozilla simply cannot fix.  RFCs are RFCs for a
reason... let's not break the DNS RFC any longer ;)
Attachment #127565 - Attachment is obsolete: true
Attachment #127565 - Flags: review?(dougt)
Comment on attachment 127751 [details] [diff] [review]
patch: netwerk/ (v1.2)

this one should be final...
Attachment #127751 - Flags: review?(dougt)
if you remove dns pinning, i think you should reopen 149943.
An experimental WIN32 build is available here:

http://friedfish.homeip.net/builds/dns/

I hope to add a Linux (RH9) build shortly.  The domain name is dynamic, so you
might need to use the following site to resolve this domain name:

http://www.zoneedit.com/lookup.html?host=friedfish.homeip.net&type=A&server=&forward=Look+it+up
Darin, I downloaded and tested mozilla-dns-win32-20030714.zip to test the IPv6
support, but it didn't seem to work.

For example, If I go to http://www.ipng.nl/tools/traceroute.php using the
Windows build, my connection shows up as IPv4, but if I connect using IE, it
shows up as IPv6 (I am using XP SP1 and IPv6 is enabled and working).

Is there anything I can do to help debug the problem? I'll start by looking at
the NSPR logs first, but let me know if there's anything else I can do...
This is what I get if I try to connect to a host that has both IPv6 and IPv4
addresses (www.uniroma3.6net.garr.it):

On Linux (IPv6 works):

16386[8118e40]: nsSocketTransport::ResolveHost [this=8844a60]
16386[8118e40]:   advancing to STATE_RESOLVING
16386[8118e40]: nsSocketTransport::SendStatus [this=8844a60 status=804b0003]
16386[8118e40]: nsHttpTransaction::OnSocketStatus [this=8844350 status=804b0003
progress=0]
16386[8118e40]: nsSocketTransport::OnSocketEvent [this=8844a60 type=0 u=0 v=0]
16386[8118e40]:   MSG_ENSURE_CONNECT
16386[8118e40]:   ignoring redundant event
16386[8118e40]:   calling PR_Poll [active=0 idle=0]
32771[8155120]: nsSocketTransport::OnFound [www.uniroma3.6net.garr.it:80
this=8844a60] lookup succeeded [FQDN=dns.uniroma3.6net.garr.it]
32771[8155120]:   => 2001:760:4::1
32771[8155120]:   => ::ffff:193.204.161.165
32771[8155120]: nsSocketTransport::OnStopLookup [this=8844a60 status=0]
32771[8155120]: nsSocketTransportService::PostEvent [handler=8844a6c type=1 u=0 v=0]
16386[8118e40]: nsSocketTransport::OnSocketEvent [this=8844a60 type=1 u=0 v=0]
16386[8118e40]:   MSG_DNS_LOOKUP_COMPLETE
16386[8118e40]: nsSocketTransport::InitiateSocket [this=8844a60]
16386[8118e40]: nsSocketTransport::BuildSocket [this=8844a60]
16386[8118e40]: nsSocketTransportService::AttachSocket [handler=8844a60]
16386[8118e40]: nsSocketTransportService::AddToIdleList [handler=8844a60]
16386[8118e40]:   active=0 idle=1
16386[8118e40]:   advancing to STATE_CONNECTING
16386[8118e40]: nsSocketTransport::SendStatus [this=8844a60 status=804b0007]
16386[8118e40]: nsHttpTransaction::OnSocketStatus [this=8844350 status=804b0007
progress=0]
16386[8118e40]:   idle [0] { handler=8844a60 condition=0 pollflags=6 }


On Windows XP with the experimental build (IPv6 doesn't work):

7832[a10f50]: nsSocketTransport::ResolveHost [this=1ec1978]
7832[a10f50]:   advancing to STATE_RESOLVING
7832[a10f50]: nsSocketTransport::SendStatus [this=1ec1978 status=804b0003]
7832[a10f50]: nsHttpTransaction::OnSocketStatus [this=1e0fd58 status=804b0003
progress=0]
7832[a10f50]: nsSocketTransport::OnSocketEvent [this=1ec1978 type=0 u=0 v=0]
7832[a10f50]:   MSG_ENSURE_CONNECT
7832[a10f50]:   ignoring redundant event
7832[a10f50]:   calling PR_Poll [active=0 idle=0]
0[283e28]: nsHttpTransaction::SocketStatus_Handler [trans=1e0fd58]
0[283e28]: sending status notification [this=1d623d8 status=804b0003 progress=0/0]
7712[1682ef8]: nsSocketTransportService::PostEvent [handler=1ec1984 type=1 u=0
v=1e82cb8]
7832[a10f50]: nsSocketTransport::OnSocketEvent [this=1ec1978 type=1 u=0 v=1e82cb8]
7832[a10f50]:   MSG_DNS_LOOKUP_COMPLETE
7832[a10f50]: nsSocketTransport::InitiateSocket [this=1ec1978]
7832[a10f50]: nsSocketTransport::BuildSocket [this=1ec1978]
7832[a10f50]: nsSocketTransportService::AttachSocket [handler=1ec1978]
7832[a10f50]: nsSocketTransportService::AddToIdleList [handler=1ec1978]
7832[a10f50]:   active=0 idle=1
7832[a10f50]:   advancing to STATE_CONNECTING
7832[a10f50]: nsSocketTransport::SendStatus [this=1ec1978 status=804b0007]
7832[a10f50]: nsHttpTransaction::OnSocketStatus [this=1e0fd58 status=804b0007
progress=0]
7832[a10f50]:   trying address: 193.204.161.165
7832[a10f50]:   idle [0] { handler=1ec1978 condition=0 pollflags=6 }

Is there anything else I can do without modifying the code? I'm not set up to
build on Windows, unfortunately...
Lorenzo: thank very much for giving this build a try.  can you perhaps send me
your entire NSPR log?  (NSPR_LOG_MODULES=nsSocketTransport:5).  thanks!
Attached file nspr log for ipv6 site on xp (obsolete) —
Darin, here is the log you requested.

I used the experimental Win32 build under Windows XP SP1. The site I connected
to, www.uniroma3.6net.garr.it, has both AAAA and A records.
Actually, it seems that Mozilla is not asking for an AAAA record at all.

I did an "ipconfig /flushdns" on the XP machine and then run tcpdump on my DNS
server, and this is what I got (x.x.x.x is the XP client and y.y.y.y is the DNS
server). All I saw was a query for an A record:


19:59:40.631285 x.x.x.x.1027 > y.y.y.y.domain:  [udp sum ok] 114+ A?
www.uniroma3.6net.garr.it. [|domain] (ttl 128, id 57312, len 71)

19:59:40.634580 y.y.y.y.domain > x.x.x.x.1027:  [udp sum ok] 114 q: A?
www.uniroma3.6net.garr.it. 2/2/1 www.uniroma3.6net.garr.it. CNAME
dns.uniroma3.6net.garr.it., dns.uniroma3.6net.garr.it. A 193.204.161.165 ns:
uniroma3.6net.garr.it. NS 6net.garr.it., uniroma3.6net.garr.it. NS
dns.uniroma3.6net.garr.it. ar: dns.uniroma3.6net.garr.it. AAAA 2001:760:4::1
(133) (DF) (ttl 64, id 0, len 161)
Attached file nspr log for ipv6 site on Win2k (obsolete) —
it seems I confirm comment 35, on Win2k with the special build 20030714 where
server2-v6.ipv6 has a AAAA record only (using Bind as the DNS server) and
Mozilla only query A record, not AAAA. The sniffer on Win2k shows DNS server
returns a host not found to Mozilla and Mozilla doesn't reply back asking for
the AAAA record.
Olivier: Mozilla uses the getaddrinfo function to fetch AAA records. 
Unfortunately, Microsoft only provides that function in WinXP.  But, maybe there
is a way to get around that restriction... do other applications on Win2k
support IPv6?  If so, do you know what Microsoft function call they are making?

Lorenzo: Ok, thanks for the info.  Can I send you a test program to run?  I'd
like to collect some information about your system so I can figure out what's
going on here.  Would that be OK?
Darin: if it will help Mozilla support IPv6, I'm game. Will the program if run
by an unprivileged user?
Lorenzo: nevermind, i think i found the problem.  i'll post a new build shortly.
ok, i've posted a new build that should fix the problem.

http://friedfish.homeip.net/builds/dns/mozilla-dns-win32-20030716.zip
Darin, in reply to comment 37, IE6 works fine, and other apps seem to have been
ported:
http://win6.jp/win2kxp-ipv6apps.html
http://win6.jp/memos/index.html

According to Microsoft, getaddrinfo() and getnameinfo() are available on Win2k +
MSRIPv6:
http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/faq.asp
This build is better. Mozilla is asking for and getting IPv6 addresses, but it
is failing to connect using IPv6 and falling back to IPv4. This is what the log
says:

8244[a2afa8]: nsSocketTransportService::AddToIdleList [handler=1df2398]
8244[a2afa8]:   active=0 idle=1
8244[a2afa8]:   advancing to STATE_CONNECTING
8244[a2afa8]: nsSocketTransport::SendStatus [this=1df2398 status=804b0007]
8244[a2afa8]:   trying address: 2001:6e0::250:4ff:fe4a:7708
8244[a2afa8]: ErrorAccordingToNSPR [in=-5980 out=804b000d]
8244[a2afa8]:   after event [this=1df2398 cond=804b000d]
8244[a2afa8]:   idle [0] { handler=1df2398 condition=804b000d pollflags=0 }
8244[a2afa8]: nsSocketTransportService::DetachSocket [handler=1df2398]
8244[a2afa8]: nsSocketTransport::OnSocketDetached [this=1df2398 cond=804b000d]
8244[a2afa8]: nsSocketTransport::RecoverFromError [this=1df2398 state=3
cond=804b000d]
8244[a2afa8]:   trying again with next ip address

The same happens with numeric IPv6 addresses (e.g. http://[2001:760:4::1]/ ),
except of course in this case mozilla does not fall back to IPv4 and displays a
connection refused message.
Here is a full log using the new build. I tried to load a webpage from
www.ipng.nl (which I can ping and connect to in IPv6).
Attachment #127823 - Attachment is obsolete: true
Same as before, except connecting to a numeric IPv6 address, which takes DNS
out of the picture.
the error according to NSPR is -5980, which maps to
PR_NETWORK_UNREACHABLE_ERROR.  i'll need to check to see what causes that error
under WIN32 builds of NSPR.
from the source file win32_error.c, it seems this error corresponds to a WINSOCK
connect failure of WSAEHOSTUNREACH.  according to MSDN,

  WSAEHOSTUNREACH
  10065 	
  No route to host.
    A socket operation was attempted to an unreachable host. See WSAENETUNREACH.

are you sure you have your system properly configured?
Yes, pretty sure. I can ping that address, and IE can connect to that site using
IPv6.

I think the problem is at a lower level. DNS resolution is working correctly,
it's the connection itself that fails. This would also explain the fact that I
can't connect to numeric IPv6 addresses either.

Perhaps it's an NSPR problem? Perhaps NSPR is only creating IPv4 sockets, and
then replies "host unreachable" if you pass it an IPv6 address (without actually
calling into the socket API)? In bug 175340 comment 0, wtc cited 2 things to do
for NSPR to support IPv6 on Windows. This bug probably works around #1, but
might #2 still be a problem?
lorenzo: yeah, you're right... there must still be some problem in NSPR.  i
should verify that the address family parameter makes its way from
PR_OpenTCPSocket to the WINSOCK "socket" function.
oh, notice that in
http://lxr.mozilla.org/mozilla/source/nsprpub/pr/src/io/prsocket.c#1267
there are #ifdefs that control how the input address family gets mapped to the
operating system's AF_ value.  in some cases PR_AF_INET6 is silently replaced
with PR_AF_INET.  that is probably happening in WIN32 builds.  i think we want
to have _PR_INET6_PROBE defined for WIN32 NSPR.

... and i just verified that this is indeed the problem.  i'll try defining
_PR_INET6_PROBE in my custom build...
Yes, we need to have _PR_INET6_PROBE defined for WIN32 NSPR.
You should also look at the Unix socket code in
mozilla/nsprpub/pr/src/pthreads/ptio.c.  All the platforms
with IPv6 support in NSPR are Unix platforms, so the
IPv6-related code in ptio.c may be more correct than the
IPv6-related code in prsocket.c.
here's a new build that has _PR_INET6_PROBE defined:

http://friedfish.homeip.net/builds/dns/mozilla-dns-win32-20030718.zip

i've also enabled _PR_HAVE_THREADSAFE_GETHOST per bug 209855, so that WIN32
platforms without getaddrinfo will still benefit from overlapped DNS lookups. 
so, if anyone is able to test this on a system that does not provide getaddrinfo
(e.g., Win98 or Win2k w/o the IPv6 pack) that'd be really great.
Darin, I just tested your 20030718 build and it seems to behave in exactly the
same way:

11888[a1ec90]: nsSocketTransportService::AddToIdleList [handler=1e63b38]
11888[a1ec90]:   active=0 idle=1
11888[a1ec90]:   advancing to STATE_CONNECTING
11888[a1ec90]: nsSocketTransport::SendStatus [this=1e63b38 status=804b0007]
11888[a1ec90]:   trying address: 2001:6e0::250:4ff:fe4a:7708
11888[a1ec90]: ErrorAccordingToNSPR [in=-5980 out=804b000d]
11888[a1ec90]:   after event [this=1e63b38 cond=804b000d]
11888[a1ec90]:   idle [0] { handler=1e63b38 condition=804b000d pollflags=0 }
11888[a1ec90]: nsSocketTransportService::DetachSocket [handler=1e63b38]

Log coming up.

As regards testing it platforms without getaddrinfo, I can try it on WinME. Do I
have to run any special tests, or just generally see if it resolves (multiple
concurrent) addresses correctly?
sigh.. ok.. thanks lorenzo.  as for winme testing, yeah... just spending some
time with the build to confirm that it can still browse the web while a
particular host lookup has yet to complete would be great.  plus, it would be
good to verify that things are stable and generally well behaved ;-)

sounds like i need to put together a custom test app to figure out what's going
on with IPv6.
Builds and runs correctly on OS/2.
The 20030718 binary works on WinME, but if I thrash it a bit (e.g. open all the
links in google news in quick succession and close the browser while it's
working) I crash on exit in xpcom.dll (I think).

Talkback doesn't come up, so I can't get a talkback ID or stack trace. Is there
any way to get talkback to work?
As pointed out by Lorenzo there is a significant performance bug resulting from
disabling DNS pinning:

> I gave the build a spin, and it seems to work fine. Great work!
>
> I did note one issue, though: if I connect to a host that has both IPv6 and
IPv4 addresses but only has a webserver listening on IPv4, moz tries every
connection using IPv6 first and only goes back to IPv4 for that connection. This
means that it tries IPv6 to get the web page, fails and falls back to IPv4, then
tries IPv6 to get the first image in the page, fails and falls back to IPv4,
then tries IPv6 to get the second image in the page, fails and... etc.
>
> If I remember correctly, previous builds on Linux only tried IPv6 for the
first connection, and if it didn't work used IPv4 for all future connections.
This is obviously better for performance because you're only waiting for IPv6 to
fail before making the first connection, not while making all subsequent
connections.
>
> Note: I realize that the above explanation is not very clear. To clarify, I
can send you a packet log, or you can try to connect to the host
"woodstock.uniroma3.6net.garr.it" using the test build on the IPv6 box and see
what happens.
>
> Perhaps this was due to DNS pinning, and the nsDnsService rewrite changed this
behaviour as a side effect? If so, is there a way to work around this? Perhaps a
cache that holds the mapping long enough to download a web page, at least?
>
> This could be a performance problem for people who have IPv6 enabled...
>
>
> Bye and thanks again,
> Lorenzo

He's right on about the cause of the problem.  Instead of bringing back the hash
table in the socket transport service, i'm thinking seriously about adding a
method to nsIDNSRecord so i can store the fact that IPv6 or IPv4 is preferred
for a given host.  Of course, it really needs to be host:port prefers IP address
X since a given host could have a HTTP server on V6 and a IMAP server on V4.  hmm...
Is that a bug in mozilla, or the site, though? If there are multiple A records,
but not all of them have webservers on them, then we should give errors some of
teh time.

What do the RFCs say about AAAA vs A?
Bradley, I'm not sure what the RFC's say, but what is usually done in IPv6-ready
applications, even if you get an AAAA record, is to try IPv6 first and fall back
to IPv4 if something doesn't work. This could happen, for example, if you have
IPv6 set up but no connectivity, or if there is a temporary problem on the IPv6
network, not just if the server doesn't support it.

Itojun's "Implementing AF-independent application", which is the de facto
standard for porting applications to IPv6, suggests:

"Client side program may want to connect to all resolved addresses, as telnet
program does (telnet tries to connect to all resolved addresses, sequentially
until connection is established)."

So I think Mozilla is right not to complain if it can't connect using IPv6.
Fair enough, and I don't think people will object to pinning the ipv6 vs ipv4 type.

We may also want to consider never looking up the AAAA record if creating an
INETV6  socket fails - do we do that already?
>We may also want to consider never looking up the AAAA record if creating an
>INETV6  socket fails - do we do that already?

no, we don't do that.  but, we do (or will with this patch) set the
AI_ADDRCONFIG flag (if the host system supports it), which tells the operating
system that we only care about addresses that the system can actually use.  for
older getaddrinfo implementations (like the one in glibc), AI_ADDRCONFIG isn't
supported, but i think that it basically behaves as if you had specified the flag.
after thinking about this IPv6 fallback to IPv4 problem some more, i think it'd
be best to solve it as a follow up to the current patch.  i think it would be
better to not delay the current patch on account of this relatively minor issue.

lorenzo: can you check whether or not the site you were visiting uses keep-alive
connections?  if not, then an interim solution might be to try to convince the
site admin to enable keep-alive connections.
Darin: you're right, the server is sending a "Connection: close" response
header. If the ping-pong between IPv6 and IPv4 only happens when keepalive is
turned off, then perhaps we can live without a fix for the moment.

Just a thought: if the code is caching all DNS replies, the following might work
as a temporary stopgap. If an IPv6 connection fails, change the order in which
the addresses are stored so the IPv4 address is stored first, and therefore
tried first for each subsequent request. Perhaps this would be easier to
implement than pinning IPv6 vs. IPv4? N.B. I haven't looked at the code, so this
may be complete lunacy. :)

Another thing: does the patch abolish Mozilla's DNS cache completely? If so, it
will work fine on systems like Win2k/XP which have an OS-level DNS cache, but
will it work well on, for example, a Linux machine with a dialup link to the
nameserver? If you attach the latest patch, I can build and test this on Linux
if you like...
> will it work well on, for example, a Linux machine with a dialup link to the
> nameserver?

I wouldn't worry too much about this case since you can install bind on Linux. I
suspect most people do.
I'm not sure most people do install bind, but nscd (Nameserver Caching Daemon)
should work as well here, I guess, and a bunch of Distros (SuSE at least) are
installing that one by default.
Are you suggesting that removing Mozilla's DNS cache is a good idea?

What about people who don't want to install bind because of security issues,
configuration difficulties, resource usage, etc. etc.? And what about people on
other OS's such as Windows 9x/ME? If the DNS cache were removed, these people
would take a big performance hit.
>Just a thought: if the code is caching all DNS replies, the following might 
>work as a temporary stopgap. If an IPv6 connection fails, change the order in 
>which the addresses are stored so the IPv4 address is stored first, and 
>therefore tried first for each subsequent request. Perhaps this would be easier 
>to implement than pinning IPv6 vs. IPv4? N.B. I haven't looked at the code, so
>this may be complete lunacy. :)

the code is caching all DNS replies (for a fixed, relatively short time period).
 however, we don't own the data structure.  we're just holding onto the |struct
addrinfo| linked list that is returned from getaddrinfo.  reording that list is
only an option if we copy the data structures, which we could do, but the result
is not exactly a trivial code change.
Blocks: 214625
fyi, the updated IPv6 stack for XP (Jul 23rd):
http://msdn.microsoft.com/library/default.asp?url=/downloads/list/winxppeer.asp
not sure how it affects Mozilla though as there's no details besides better NAT
support.
punting to 1.6 alpha...

i just haven't had the time lately to drive this, and now that the trunk is
frozen for 1.5 beta, there is no chance that this will make it in for 1.5 final :-(
Target Milestone: mozilla1.5beta → mozilla1.6alpha
Too bad :-(

Question: why are the expire-calculations done in minutes, not in seconds ? See
all calls for NowInMinutes(), mMaxCacheLifetime and kPrefDnsCacheExpiration in
attachment 127562 [details] [diff] [review] . As far as I can seem this isn't necessary.
I was looking how to implement bug 208312, which might require a sub-minute
expiration time for negative lookups (fixes in nsHostResolver::ResolveHost and
nsHostResolver::OnLookupComplete).
>Question: why are the expire-calculations done in minutes, not in seconds ? See
>all calls for NowInMinutes(), mMaxCacheLifetime and kPrefDnsCacheExpiration in
>attachment 127562 [details] [diff] [review] . As far as I can seem this isn't necessary.
>I was looking how to implement bug 208312, which might require a sub-minute
>expiration time for negative lookups (fixes in nsHostResolver::ResolveHost and
>nsHostResolver::OnLookupComplete).

hmm... yeah, makes sense.  i was thinking the same thing actually.  if we use
seconds instead, then the patch to add support for cached negative lookups will
be extremely trivial!  i'll make this change (since anyways there's now some time).
Blocks: 137322
Blocks: 168566
Attached patch v1.3 patchSplinter Review
ok, here's the full patch for the tree minus NSPR changes.  to use this patch,
it is necessary to pull NSPR_GETADDRINFO_BRANCH.
Attachment #126969 - Attachment is obsolete: true
Attachment #126970 - Attachment is obsolete: true
Attachment #127562 - Attachment is obsolete: true
Attachment #127563 - Attachment is obsolete: true
Attachment #127566 - Attachment is obsolete: true
Attachment #127751 - Attachment is obsolete: true
Attachment #130495 - Flags: superreview?(bryner)
Attachment #130495 - Flags: review?(dougt)
Comment on attachment 130495 [details] [diff] [review]
v1.3 patch

+/* XXX move this someplace more generic */
+#define NET_DECL_REFCOUNTED_THREADSAFE

why not use NS_IMPL_THREADSAFE_ADDREF and _RELEASE?
http://lxr.mozilla.org/seamonkey/source/xpcom/glue/nsISupportsImpl.h#689
Blocks: 209729
Comment on attachment 130495 [details] [diff] [review]
v1.3 patch

>-  nsresult rv = nsSSLIOLayerAddToSocket(host, port, proxyHost, proxyPort,
>+  nsresult rv = nsSSLIOLayerAddToSocket(family, host, port, proxyHost, proxyPort,
>                                         sock, info, forSTARTTLS);

Why does |family| need to be a parameter to nsSSLIOLayerAddToSocket?  I can't
see any way that it would be needed.


>Index: netwerk/dns/public/nsIDNSService.idl
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/dns/public/nsIDNSService.idl,v
>retrieving revision 1.9
>diff -p -u -1 -0 -r1.9 nsIDNSService.idl
>--- netwerk/dns/public/nsIDNSService.idl	4 Dec 2002 04:17:51 -0000	1.9
>+++ netwerk/dns/public/nsIDNSService.idl	28 Aug 2003 00:27:53 -0000
>-
>-interface nsIRequest;
>-interface nsIDNSListener;
>-
>-[scriptable, uuid(598f2f80-206f-11d3-9348-00104ba0fd40)]
>+#include "nsIDNSRecord.idl"
>+#include "nsIDNSRequest.idl"
>+#include "nsIDNSListener.idl"
>+

Should these be forward declared instead of included?  I guess maybe you could
assume that anyone including this interface will need the others.

I wish I had more to say, but this all looks fine to me, and I know you've had
it baking for quite awhile.
Attachment #130495 - Flags: superreview?(bryner) → superreview+
>Why does |family| need to be a parameter to nsSSLIOLayerAddToSocket?  I can't
>see any way that it would be needed.

  PR_OpenTCPSocket(PRInt16 af)

basically, the nsISocketProvider needs to know the address family so it can
create a socket that is compatible with the PRNetAddr that we are going to be
passing to PR_Connect later.


>Should these be forward declared instead of included?  I guess maybe you could
>assume that anyone including this interface will need the others.

originally, i had them forward declared, but i found that every consumer of
nsIDNSService simply needed all the other interfaces to, so i decided to just
include them all from nsIDNSService.idl.


thanks for the review brian!
Blocks: 110565
Comment on attachment 130495 [details] [diff] [review]
v1.3 patch

r=dougt.
Attachment #130495 - Flags: review?(dougt) → review+
Has anyone done timing tests? It would be nice to know if this is a 
performance win or loss.
tenthumbs: no, most of the performance information is qualitative.  i have
confirmed that my linux build now does overlapped DNS requests, which can be a
huge user perceptible performance win.  today, without this patch, DNS queries
are serialized on all platforms except windows (and mac classic, which doesn't
count since we don't build it).
ok, this patch has landed on the trunk! :)
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
great work darin!
IPv6 now works! Great work Darin!

There is still a minor problem, though: Moz doesn't fall back to IPv4 if it gets
an error using IPv6. I used Firebird 20030912 on Windows XP and tried to connect
to a host that has an IPv6 address but does not have an IPv6-capable web server.
Firebird tries the IPv6 address and gets a connection refused, but instead of
trying IPv4, it displays a connection refused error message.

In the nsSocketTransport log it says "trying again with next ip address", but
the address it tries is the same IPv6 address that it failed to connect to on
the previous try.

This is particularly serious if the proxy server you have configured has an IPv6
address: in this case, nothing will work.

Should I open another bug for this? In the meantime I'll attach the nspr log.
Note the fact that the log says "trying again with next ip address" but it
tries the same IPv6 address again instead of trying the IPv4 address. It might
be that getaddrinfo() is at fault here; is there any way of finding out what it
is returning?
Attachment #127874 - Attachment is obsolete: true
Attachment #127916 - Attachment is obsolete: true
Attachment #127917 - Attachment is obsolete: true
Lorenzo: please see bug 219088.  thanks for the bug report!
What, exactly, is the preference to be used for disabling pinning?  In reading
through an earlier patch, it appeared to be
"network.socket.addr-pinning-enabled" but I can't find that preference in a
later patch that replaced it, nor does it appear in the about:config of a 9/12
build...
(After re-reading comments several more times.)

Or is pinning simply removed altogether as per comment 27?  (My apologies for
the SPAM here.)
Could someone look over the code changes for this bug relative to retrying
connects?  This appears to be the bug that is causing the regression detailed in
bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=219150>219150</a>.

It appears that failed connection attempts are now retried indefinitely.
Blocks: 135724
Blocks: 220498
Please see bug 220498; it must be caused by the patch for this one.
Attachment #127566 - Flags: review?(dougt)
Attachment #126970 - Flags: review?(dmose)
Attachment #127562 - Flags: review?(gordon)
Attachment #127751 - Flags: review?(dougt)
Blocks: 170128
Blocks: 174733
Depends on: 146769
Blocks: 53967
V: fixed.
New problems to new bugs please.
Status: RESOLVED → VERIFIED
No longer blocks: 53967
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: