Closed Bug 1441256 Opened 2 years ago Closed 2 years ago

Loads randomly fail when using TRR

Categories

(Core :: Networking: DNS, enhancement, P1)

Tracking

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: valentin, Assigned: bagder)

References

Details

(Whiteboard: [necko-triaged][trr])

Attachments

(1 file)

I've been using TRR mode 2 (TRR-first) and sometimes pages fail to load. The same was reported by Daniel, and by :ekr (in TRR-only mode).

From what I can tell, it's probably related to how we cache DNS responses, as I've only seen it happen when navigating on a page I already had open for a long time, never when navigating to a new domain.

I'll try to reproduce it with a domain with a low TTL.
Here's what happened for me in one of these cases (TRR-first):

1. The TRR AAAA response comes back first and is used before the second (A) response comes back (in my case for "accounts.google.com", used for Gmail etc.). Using the first TRR response instead of waiting for both is meant to be an optimization.
2. My host can't speak IPv6 over the wire, as my ISP doesn't provide it and I have no global IPv6 address. It only works locally. So the IPv6-only addresses we get for the host are unusable. :-(
3. The subsequent connect fails and the host name gets added to the TRR blacklist.
4. The backup logic that is supposed to then retry the connection without TRR doesn't seem to work. I can see that Firefox makes a second resolve for the name using the native resolver and it gets both A and AAAA addresses, but the browser is still stuck on an "Unable to connect" page.

If we get the AAAA response first, it is important that we know that IPv6 truly works before we use it immediately. I'm not sure we actually have this information anywhere? I presume the reverse situation could occur for an IPv6-only host.
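One commonly used heuristic for "does IPv6 work on this host" is to ask the OS to route a UDP socket towards a global IPv6 address. This is only a sketch, not something Firefox is known to do here: the function name and the probe address (a public DNS server, chosen arbitrarily) are assumptions, and a successful route lookup does not prove end-to-end connectivity.

```python
import socket


def ipv6_probably_works(probe_addr: str = "2001:4860:4860::8888",
                        port: int = 53) -> bool:
    """Heuristic check: is there a route to a global IPv6 address?

    connect() on a UDP socket sends no packets; it only asks the OS to
    pick a route and a source address. With no global IPv6 address or
    default route it typically fails with an OSError.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((probe_addr, port))
            return True
    except OSError:
        # No IPv6 support in the stack, or no route to the probe address.
        return False


print("IPv6 looks usable:", ipv6_probably_works())
```

A resolver could consult such a check before acting on an early AAAA answer, and fall back to waiting for the A answer when it returns False.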
Step 4 fails because the cache is not bypassed in the second round, so it just reuses the same cached addresses *again* in the second attempt.
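The failure in step 4 can be reproduced with a toy model. This is a minimal sketch and not Firefox code: `ToyDnsCache`, `native_resolve`, `can_connect` and the addresses (taken from the documentation ranges) are all made up for illustration.

```python
from typing import Callable, Dict, List


class ToyDnsCache:
    """Tiny host cache sitting in front of a 'native' resolver (hypothetical)."""

    def __init__(self, native_resolve: Callable[[str], List[str]]) -> None:
        self._entries: Dict[str, List[str]] = {}
        self._native_resolve = native_resolve

    def store(self, host: str, addrs: List[str]) -> None:
        self._entries[host] = list(addrs)

    def resolve(self, host: str, bypass_cache: bool = False) -> List[str]:
        # Without the bypass, an existing entry is returned as-is.
        if not bypass_cache and host in self._entries:
            return list(self._entries[host])
        # With the bypass, do a fresh lookup and overwrite the old entry.
        addrs = self._native_resolve(host)
        self._entries[host] = list(addrs)
        return list(addrs)


def native_resolve(host: str) -> List[str]:
    # The native resolver returns both A and AAAA addresses.
    return ["192.0.2.10", "2001:db8::10"]


def can_connect(addrs: List[str]) -> bool:
    # This host has no working IPv6, so only IPv4 addresses are usable.
    return any(":" not in a for a in addrs)


cache = ToyDnsCache(native_resolve)
cache.store("example.test", ["2001:db8::10"])  # the early TRR AAAA answer

retry_without_bypass = cache.resolve("example.test")
retry_with_bypass = cache.resolve("example.test", bypass_cache=True)

print(can_connect(retry_without_bypass))  # False: same cached AAAA-only data
print(can_connect(retry_with_bypass))     # True: fresh A + AAAA data
```

Note that the bypassing lookup also replaces the cached entry, so later lookups without the bypass see the fresh addresses.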

Step 1 can still use an optimized approach where TRR only returns "early" if the first answer is type A. I'll deal with that separately in bug 1443489.
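One possible reading of that "return early only for type A" idea, as a rough sketch. The function name and the data shape are hypothetical, and what bug 1443489 actually implements is not shown here.

```python
from typing import List, Sequence, Tuple

Response = Tuple[str, List[str]]  # (record type, addresses), hypothetical shape


def addresses_to_use(responses_in_arrival_order: Sequence[Response]) -> List[str]:
    """Return the addresses a caller would be handed.

    If the first response to arrive is type A, hand it out immediately.
    If it is AAAA, keep waiting and hand out everything collected.
    """
    collected: List[str] = []
    for index, (record_type, addrs) in enumerate(responses_in_arrival_order):
        collected.extend(addrs)
        if index == 0 and record_type == "A":
            return collected  # early return: no need to wait for AAAA
    return collected


a_first = addresses_to_use([("A", ["192.0.2.1"]), ("AAAA", ["2001:db8::1"])])
aaaa_first = addresses_to_use([("AAAA", ["2001:db8::1"]), ("A", ["192.0.2.1"])])

print(a_first)     # ['192.0.2.1']
print(aaaa_first)  # ['2001:db8::1', '192.0.2.1']
```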
Assignee: valentin.gosu → daniel
Priority: P2 → P1
Comment on attachment 8956439 [details]
bug 1441256 - bypass cache when retrying connection without TRR

https://reviewboard.mozilla.org/r/225336/#review231280
Attachment #8956439 - Flags: review?(valentin.gosu) → review+
Depends on: 1443489
Rather than bypassing the cache, why don't we purge the entry we know to be invalid?
I believe the result is the same. Bypassing the DNS cache simply means that it won't look for the entry in the cache before looking it up, but the response to the new lookup will be stored in the cache and replace the previous contents for that host.
But there can be other lookups started during that lookup RTT. You want them to pend on the new cache entry, not reuse the old response. If that already happens with the bypass approach then you're good to go, but it's a significant distinction.
Ah yes, that's a very good point.

The existing BYPASS_CACHE functionality does not work like that. It fires up a new resolve while the existing entry is still there so a second resolve on the same host name while the new resolve is in progress will get the (older) cached data back.
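The race described above can be modelled in a few lines. This is a toy sketch, not the Firefox resolver: `ToyResolver`, `start_resolve` and `finish_resolve` are hypothetical names, and the lookup is split into explicit start and finish steps so the in-flight window is visible.

```python
from typing import Dict, List, Optional, Set


class ToyResolver:
    """Models a cache where a bypassing lookup leaves the old entry in place."""

    def __init__(self) -> None:
        self.cache: Dict[str, List[str]] = {}
        self.in_flight: Set[str] = set()

    def start_resolve(self, host: str,
                      bypass_cache: bool = False) -> Optional[List[str]]:
        """Return cached addresses, or None if the caller must wait."""
        if not bypass_cache and host in self.cache:
            return list(self.cache[host])
        self.in_flight.add(host)  # a new lookup starts; old entry untouched
        return None

    def finish_resolve(self, host: str, addrs: List[str]) -> None:
        self.in_flight.discard(host)
        self.cache[host] = list(addrs)  # replaces the previous contents


resolver = ToyResolver()
resolver.cache["example.test"] = ["2001:db8::10"]  # stale, unusable entry

retry = resolver.start_resolve("example.test", bypass_cache=True)
concurrent = resolver.start_resolve("example.test")  # during the new lookup
resolver.finish_resolve("example.test", ["192.0.2.10", "2001:db8::10"])
later = resolver.start_resolve("example.test")

print(retry)       # None: waits for the fresh lookup
print(concurrent)  # ['2001:db8::10']: still served the older cached data
print(later)       # ['192.0.2.10', '2001:db8::10']
```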

I can't say I know exactly what other good purposes we have for using this bit, but it seems like this behavior could be a bit surprising and undesirable even for other use cases.

In the case of this particular bug, fixing this is "just" an optimization, since if there are other resolves that get the cached data in the meantime (until the second resolve is done), they will simply also take the connection retry route with BYPASS_CACHE set. In most cases, that's a rather short time period and a limited number of retries.

I would like to address this shortcoming by introducing yet another bit that works similarly to BYPASS_CACHE but with an added instant cache invalidation of the existing entry (if one is present). Called RES_FLUSH_CACHE perhaps?
I'm testing a take with a new bit called REFRESH_CACHE that can be used together with BYPASS_CACHE and if so, simply invalidates the cache entry when the new resolve starts.
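A toy model of how such a bit could behave when combined with BYPASS_CACHE. This is a sketch under assumptions: the flag values, `ToyResolver` and its methods are invented for illustration and do not reflect the actual patch.

```python
from enum import IntFlag
from typing import Dict, List, Optional


class Flags(IntFlag):
    NONE = 0
    BYPASS_CACHE = 1   # do not consult the cache for this lookup
    REFRESH_CACHE = 2  # with BYPASS_CACHE: also drop the existing entry


class ToyResolver:
    def __init__(self) -> None:
        self.cache: Dict[str, List[str]] = {}
        self.waiters: Dict[str, int] = {}  # host -> callers pending on lookup

    def start_resolve(self, host: str,
                      flags: Flags = Flags.NONE) -> Optional[List[str]]:
        if flags & Flags.BYPASS_CACHE:
            if flags & Flags.REFRESH_CACHE:
                self.cache.pop(host, None)  # invalidate as the lookup starts
        elif host in self.cache:
            return list(self.cache[host])
        self.waiters[host] = self.waiters.get(host, 0) + 1
        return None  # caller pends on the in-progress lookup

    def finish_resolve(self, host: str, addrs: List[str]) -> int:
        self.cache[host] = list(addrs)
        return self.waiters.pop(host, 0)  # number of callers woken up


stale = ["2001:db8::10"]
fresh = ["192.0.2.10", "2001:db8::10"]

with_refresh = ToyResolver()
with_refresh.cache["example.test"] = list(stale)
with_refresh.start_resolve("example.test",
                           Flags.BYPASS_CACHE | Flags.REFRESH_CACHE)
second = with_refresh.start_resolve("example.test")  # pends, no stale data
woken = with_refresh.finish_resolve("example.test", fresh)

bypass_only = ToyResolver()
bypass_only.cache["example.test"] = list(stale)
bypass_only.start_resolve("example.test", Flags.BYPASS_CACHE)
second_bypass_only = bypass_only.start_resolve("example.test")

print(second, woken)       # None 2
print(second_bypass_only)  # ['2001:db8::10']
```

With the extra bit, a concurrent lookup waits for the fresh answer instead of receiving the entry that is already known to be bad.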
Comment on attachment 8956439 [details]
bug 1441256 - bypass cache when retrying connection without TRR

https://reviewboard.mozilla.org/r/225336/#review231676
Attachment #8956439 - Flags: review?(mcmanus) → review+
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/09b580ee681b
bypass cache when retrying connection without TRR r=mcmanus,valentin
https://hg.mozilla.org/mozilla-central/rev/09b580ee681b
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60