+++ This bug was initially created as a clone of Bug #621558 +++
At the moment, many dual-stack environments are wholly or partially broken, presenting problems for seamless IPv6 deployment. Long timeouts are bad for users and make the system appear broken.
http://tools.ietf.org/html/draft-wing-http-new-tech-00 "Happy Eyeballs: Successful Introduction of New Technology to HTTP" suggests overlapping the attempts to resolve names for, and open IPv4 and IPv6 sockets to, HTTP-like network endpoints, as a way to autodetect IPv6 operation in a dual-stack environment without significantly degrading performance if that environment is broken.
Support for this, or something similar, in the core networking code would greatly ease the development of practical IPv6 interoperability in Mozilla products. The Internet-Draft describes a number of heuristics that may be possible to improve upon over time -- if this code were to be implemented centrally, this would also provide a good place to implement these new heuristics.
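To illustrate why the overlapped attempts help, here is a toy model of the race (the names, latencies, and the 300 ms head start are illustrative assumptions, not from the draft): the preferred family starts connecting immediately, the fallback starts after a short head start, and whichever completes first wins. The user-visible delay is bounded by the faster path plus the head start, instead of the full connect timeout of the broken path.

```cpp
// Toy model of the Happy Eyeballs race. Not real socket code; it only
// computes which attempt finishes first given hypothetical connect times.
struct Outcome {
    bool preferredWins;   // did the preferred (e.g. IPv6) attempt win?
    long completionMs;    // when the winning attempt completed
};

// The preferred family starts at t=0; the fallback starts after headStartMs.
Outcome Race(long preferredConnectMs, long fallbackConnectMs, long headStartMs) {
    long preferredDone = preferredConnectMs;
    long fallbackDone = headStartMs + fallbackConnectMs;
    if (preferredDone <= fallbackDone)
        return { true, preferredDone };
    return { false, fallbackDone };
}
```

For example, with a broken IPv6 path that would take 20 seconds to time out and a working IPv4 path that connects in 30 ms, a 300 ms head start means the page loads after roughly 330 ms rather than 20 seconds.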
I'll take a look at this.
It refers to what Firefox and Chrome have implemented.
Interesting. The draft above talks about calling getaddrinfo() to get all the IP addresses. But the same flaky-connectivity problem that can cause problems with TCP can also cause problems with DNS.
Perhaps DNS requests should be issued simultaneously via IPv4 and IPv6, letting the two race against one another, and possibly also act as a hint regarding the quality of IPv6 connectivity?
And then there's the case of TCP-based DNS lookups, which will provide yet another avenue for timeouts...
Unfortunately, we have no way to know which address the system prefers without calling getaddrinfo(), unless the system provides a way to get a prefix policy table.
DNS query response time is affected by dropped DNS query (or response) packets, and by whether the DNS server has cached the answer (that is, whether someone else already asked for that A or AAAA record). It isn't a useful indicator of the viability of the IPv6 path to the server, which is what we really want to know. Also, if we don't use getaddrinfo() and instead craft separate A and AAAA queries inside the application, it's impossible to follow the host's IP address preference -- unless you're on Windows, which provides an API to have the OS re-order IPv6 and IPv4 addresses into the host's preferred order. Perhaps other OSes provide a similar API; if they do, that would be a reasonable solution.
Remember that the question being asked (A vs AAAA) and the IP version of the DNS packets are orthogonal, so you can't reliably infer the quality of IPv6 connectivity just from the DNS lookup response time. For example, A lookups can be made by the resolver over IPv6 or IPv4, and AAAA lookups can be made over either IP version as well. The IP version from the browser to the resolver and the IP version from the resolver to the various authorities may also be different. (The IPv6/IPv4 performance from the browser to the resolver may also not be indicative of general connectivity, as that may be over localhost or over a LAN, while IPv6 network brokenness commonly exists when going beyond the LAN -- especially in cases such as a home gateway that provides dual-stack IPv6 within the home but then uses flaky 6to4 to reach the world.)
If we assume that the browser-to-resolver connectivity is good (since introducing IPv6 sites doesn't change anything here), then this becomes an issue for the resolver. Many resolvers are fairly good about failing over between DNS authorities, so they should be able to cope when some authorities have unreachable AAAA records for some of their NS records.
As such, this comes down to whether getaddrinfo() issues A and AAAA requests in-parallel (which is preferable from a performance perspective) as well as how the OS getaddrinfo() handles fail-over between multiple configured resolvers (which is independent of IPv4 vs IPv6).
At least both connections can share one getaddrinfo() result.
1. Call getaddrinfo() via nsHostResolver::ResolveHost() with address family AF_UNSPEC. It will return addresses in the system preferred order.
2. Pick the first address from the result.
3. Make a primary connection and start a backup timer.
4. If the primary connection has been established before the timer fires, cancel the timer and skip all remaining steps.
5. When the backup timer fires, call nsHostResolver::ResolveHost() with the same address family and flags as step 1. It will return the result from our internal cache without sending DNS queries.
6.1. If there's only one address in the result, choose that address regardless of the address family and go to step 7.
6.2. Search for IPv4 addresses in the result. If no IPv4 address is found, choose the second address from the list regardless of family and go to step 7. If there's only one IPv4 address in the result, choose that address and go to step 7.
6.3. If the first IPv4 address is at the top of the result (i.e., it is used by the primary connection), choose the second IPv4 address from the result and go to step 7.
6.4. Otherwise, choose the first IPv4 address from the result.
7. Make a backup connection using the chosen address.
8. When either connection is established, abort the other one.
Ugh, the above algorithm didn't consider the case when the system prefers IPv4 addresses. Read "IPv4" as "other address family".
Yeah, I was just writing about that. Easily fixed. Instead of "search for the first IPv4 address" in Steps 6.2 and 6.3, it needs to be "search for the 'other' IP address family from the first result".
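Incorporating that correction, the address-selection logic (steps 6.1-6.4) can be sketched as a small helper. This is an illustrative sketch, not the actual patch: `Family`, `PickBackupAddressIndex`, and the vector standing in for the getaddrinfo() result are all hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the address family of each entry in the getaddrinfo()
// result; real code would look at AF_INET / AF_INET6.
enum Family { V4, V6 };

// 'addrs' is the getaddrinfo() result in the system's preferred order;
// addrs[0] is the address the primary connection used. Returns the index
// of the address the backup connection should use.
std::size_t PickBackupAddressIndex(const std::vector<Family>& addrs) {
    // 6.1: only one address -> use it regardless of family.
    if (addrs.size() == 1)
        return 0;
    // 6.2-6.4: prefer the first address whose family differs from the
    // primary's. The primary used addrs[0], so start searching at index 1.
    for (std::size_t i = 1; i < addrs.size(); ++i) {
        if (addrs[i] != addrs[0])
            return i;
    }
    // No other-family address in the list: fall back to the second address.
    return 1;
}
```

Note that with the "other address family" correction, the special case in step 6.3 disappears: the other-family address can never be at index 0, because "other" is defined relative to the primary's family at index 0.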
Created attachment 569507 [details] [diff] [review]
Based on https://bugzilla.mozilla.org/show_bug.cgi?id=679090#c9 and https://bugzilla.mozilla.org/show_bug.cgi?id=621558#c16 this patch does:
- starts the backup timer in the CONNECTING_TO notification, which happens after DNS resolution; this is done just once, although there may be more CONNECTING_TO notifications due to recovery
- cancels the backup timer in the CONNECTED_TO notification; after that there is no need for a backup connection
- when a host cannot be resolved and the DISABLE_IPV6 flag had been set on the transport, drop that flag and try DNS resolution again
- when we have tried all IPs and the DISABLE_IPV6 flag had been set, drop that flag and try DNS resolution again
The first two fixes also mitigate an issue with too many unused connections being opened; I discovered that some time ago during the pipelining review (telemetry for that is missing right now...)
The remaining parts could be optimized: if there was no IPv4 address in the primary transport's list, bypass the second IPv4-including pass. That is vague and racy, but may save some resources. The primary goal is to let an IPv4 transport fall back to IPv4/IPv6 again when it is worthwhile.
Also, there is room to cache the information we obtain about the host. But I'd rather do that separately and stick to the happy-eyeballs spec here.
I'm thinking about preserving NS_ABORT_IF_FALSE(out == mStreamOut, "timer for non existant stream") somehow, if it is still applicable.
Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=76ebad540b33
The tryserver build seems to work for me. However, my IPv6 connection may be too fast to reproduce bug 679090. My ISP provides a native IPv6 connection along with IPv4.
Eduardo, could you confirm that this patch wouldn't regress the issue?
Created attachment 569768 [details] [diff] [review]
- fixed missing check for !mSynTimer before call to SetupBackupTimer
- added logs
Is there a place I can test a build with this patch without building my own from source? I tried the tbpl link above but it says "Loading failed: error".
(In reply to Lorenzo Colitti from comment #12)
> Is there a place I can test a build with this patch without building my own
> from source? I tried the tbpl link above but it says "Loading failed: error".
Here it should be:
The previous one was probably wiped out, there were some problems with the Try server recently.
I'll get to this review Monday or earlier.
Comment on attachment 569768 [details] [diff] [review]
Review of attachment 569768 [details] [diff] [review]:
The nsHttpConnectionMgr changes are good; I see the problem they solve. What do you think about landing them as a separate (first) patch so if the ipv6 specific stuff needs to be backed out we can keep the timer changes?
And the whole patch is fine (and smaller than I thought), so r=mcmanus
It took me a little while to figure out the root problem being solved from biesi's original patch - can you confirm that christian's patch disabled ipv6 on backup connections and therefore a backup connection made to a v6-only host (as opposed to a dual stack host) always failed. nsHalfOpenSocket::OnOutputStreamReady() passes any connection result (good or bad) through to the http stack - so these failed connections were actually being used (this is the part that wasn't obvious to my memory) and resulted in page load errors. Your patch makes the backup connection prefer v4, but still do v6 if the v4 path doesn't connect. So the backup connections will still work with v6 only stacks. Do I have that right?
@@ +1618,5 @@
> + case nsISocketTransport::STATUS_CONNECTING_TO:
> + // Passed DNS resolution, now trying to connect, start the backup timer
> + // only prevent creating another backup transport
> + if (!mBackupTransport && !mSynTimer)
> + SetupBackupTimer();
this is a good change
(In reply to Patrick McManus from comment #15)
> Comment on attachment 569768 [details] [diff] [review]
> Review of attachment 569768 [details] [diff] [review]:
> What do you think about landing them as a separate (first) patch so if the ipv6
> specific stuff needs to be backed out we can keep the timer changes?
Yes, I'll split the patch in two.
> It took me a little while to figure out the root problem being solved from
> biesi's original patch - can you confirm that christian's patch disabled
> ipv6 on backup connections and therefore a backup connection made to a
> v6-only host (as opposed to a dual stack host) always failed.
> nsHalfOpenSocket::OnOutputStreamReady() passes any connection result (good
> or bad) through to the http stack - so these failed connections were
> actually being used (this is the part that wasn't obvious to my memory) and
> resulted in page load errors. Your patch makes the backup connection prefer
> v4, but still do v6 if the v4 path doesn't connect. So the backup
> connections will still work with v6 only stacks. Do I have that right?
That's exactly it. I thought I had written that in the comment with the patch... But this is better; this way you've double-confirmed that the solution is correct.
Comment on attachment 569768 [details] [diff] [review]
Landed as two parts:
Unless you need it for your tracking purposes, the [inbound] status whiteboard entry is not mandatory anymore (we'd prefer not having to handle it, indeed).
*** Bug 669159 has been marked as a duplicate of this bug. ***