Closed Bug 1518730 Opened 5 years ago Closed 5 years ago

TRR: Firefox prefers IPv4 no matter what

Categories

(Core :: Networking: DNS, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox66 --- wontfix
firefox69 --- fixed

People

(Reporter: kubrick, Assigned: valentin)

References

Details

(Whiteboard: [necko-triaged][trr])

Attachments

(5 files)

With
network.trr.mode;3

Even with:
network.http.fast-fallback-to-IPv4;false
network.trr.early-AAAA;true

Firefox refuses to prefer IPv6 if a hostname also has a A record (but it does use IPv6 if no other choice is possible).

Switching to TRR mode 0 (disabled) works as expected and IPv6 is preferred all the time.

Build ID 20190108215840
Arch Linux

Excellent bug report. Thanks!

Assignee: nobody → valentin.gosu
Blocks: 1434852
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
Whiteboard: [necko-triaged][trr]

Hi Francois,

I just got around to testing this. I'm running in mode3 and I see a bunch of connections to ipv6. I tested on google.com
I looked in the devtools Networking panel, and I see resources loaded from IPv6 addresses.
I also looked at about:networking#sockets and I see connections to IPv6 addresses.

Would you mind checking if you can still reproduce the bug with the latest Nightly?
And if you can, it would be great if you could record some logs for us: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Thanks!

Flags: needinfo?(kubrick)

I will check when I get some IPv6 connectivity but if I remember correctly, it's not about not resolving in IPv6, it does the correct lookups, it's just that it prefers IPv4 if it gets both.

Flags: needinfo?(kubrick)
Flags: needinfo?(kubrick)

(In reply to Francois Guerraz from comment #3)

I will check when I get some IPv6 connectivity but if I remember correctly, it's not about not resolving in IPv6, it does the correct lookups, it's just that it prefers IPv4 if it gets both.

That's what I checked, and I easily got it to use IPv6, without setting fast-fallback-to-IPv4;false or network.trr.early-AAAA;true
But what happens sometimes, is that if a connection fails to use IPv6 and falls back to IPv4, then it will remember this preference. You're likely to see that for long-lived HTTP/2 connections. The reason why the IPv6 fails would be interesting to find out. It may be due to the fact that TRR doesn't use ECS information, and the IPv6 address you get from it might have to take a different route.

In any case, if you are able to, please send us the logs. Thanks!

Attached file log.txt-main.14023.xz

Network log with TRR=3. Build ID 20190521094447
I can still reproduce the issue, despite good IPv6 connectivity (sub 30ms for this server) firefox prefers IPv4 to connect to https://www.fgv6.net/

Flags: needinfo?(kubrick)

Just to be clear, this is not just happening with my website. If I ask google what's my IP, I'll get the v4 with TRR and the v6 with normal DNS, same for https://test-ipv6.com/, it says "Your browser has real working IPv6 address - but is avoiding using it. We're concerned about this" with TRR.

Hi Francois,

Thanks for the logs. Did you have network.trr.early-AAAA;true while you got them? Because it seems we sent the IPv4 request first.

2019-05-21 16:52:03.061925 UTC - [Parent 14023: Socket Thread]: D/nsHostResolver TRR Resolve www.fgv6.net type 1 (IPv4)
2019-05-21 16:52:03.061944 UTC - [Parent 14023: Socket Thread]: D/nsHostResolver TRR Resolve www.fgv6.net type 28 (IPv6)

Also, are you using the cloudflare DoH resolver, or another?
And do you have any networking/trr prefs that have been changed?

In any case, it looks like we get the A - IPv4 response first, so we go on using that. At some point we actually record a preference for the IPv4 address, so the IPv6 never gets to be used. I'll keep looking into this. It's possible this is just a normal consequence of the fact that TRR responses arrive separately, as opposed to getaddrinfo, where the OS merges the A and AAAA responses in single one.

// TRR response with IPv4 address is received
2019-05-21 16:52:03.109080 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR:OnStopRequest 0x7f1ea13e7000 www.fgv6.net 1 failed=0 code=0
2019-05-21 16:52:03.109089 UTC - [Parent 14023: Main Thread]: D/nsHostResolver doh decode www.fgv6.net 128 bytes
2019-05-21 16:52:03.109094 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 2 answer records (128 bytes body) www.fgv6.net index=30
2019-05-21 16:52:03.109100 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR::DohDecode CNAME host www.fgv6.net => igor.fgv6.net
2019-05-21 16:52:03.109106 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with record type 5 len 7 index now 49 of 128
2019-05-21 16:52:03.109113 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR asked for www.fgv6.net data but got igor.fgv6.net
2019-05-21 16:52:03.109119 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with record type 1 len 4 index now 65 of 128
2019-05-21 16:52:03.109123 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 0 ns records (128 bytes body)
2019-05-21 16:52:03.109126 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 1 additional resource records (128 bytes body)
2019-05-21 16:52:03.109130 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with additional rr now 128 of 128
2019-05-21 16:52:03.109133 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR: check for CNAME record for igor.fgv6.net within previous response
2019-05-21 16:52:03.109137 UTC - [Parent 14023: Main Thread]: D/nsHostResolver doh decode igor.fgv6.net 128 bytes
2019-05-21 16:52:03.109141 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 2 answer records (128 bytes body) www.fgv6.net index=30
2019-05-21 16:52:03.109144 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR asked for igor.fgv6.net data but got www.fgv6.net
2019-05-21 16:52:03.109148 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with record type 5 len 7 index now 49 of 128
2019-05-21 16:52:03.109163 UTC - [Parent 14023: Main Thread]: D/nsHostResolver DOHresp:Add 151.80.214.201
2019-05-21 16:52:03.109168 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with record type 1 len 4 index now 65 of 128
2019-05-21 16:52:03.109171 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 0 ns records (128 bytes body)
2019-05-21 16:52:03.109174 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR Decode: 1 additional resource records (128 bytes body)
2019-05-21 16:52:03.109178 UTC - [Parent 14023: Main Thread]: D/nsHostResolver done with additional rr now 128 of 128
2019-05-21 16:52:03.109186 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR: Got the CNAME record without asking for it
2019-05-21 16:52:03.109193 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver::CompleteLookup www.fgv6.net 0x7f1eab4968d0 0 trr=1 stillResolving=1
2019-05-21 16:52:03.109197 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR lookup Complete (1) www.fgv6.net OK
2019-05-21 16:52:03.109202 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver record 0x7f1ea5a8c580 new gencnt
2019-05-21 16:52:03.109207 UTC - [Parent 14023: Main Thread]: D/nsHostResolver Caching host [www.fgv6.net] record for 300 seconds (grace 0).
2019-05-21 16:52:03.109211 UTC - [Parent 14023: Main Thread]: D/nsHostResolver CompleteLookup: www.fgv6.net has 151.80.214.201
// We pass the IPv4 address back to the callers
2019-05-21 16:52:03.109215 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver record 0x7f1ea5a8c580 calling back dns users

// We get back the IPv6 response too.
2019-05-21 16:52:03.137361 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver::CompleteLookup www.fgv6.net 0x7f1eab4a9c90 0 trr=28 stillResolving=0
2019-05-21 16:52:03.137393 UTC - [Parent 14023: Main Thread]: D/nsHostResolver TRR lookup Complete (28) www.fgv6.net OK
2019-05-21 16:52:03.137406 UTC - [Parent 14023: Main Thread]: D/nsHostResolver different_rrset www.fgv6.net
2019-05-21 16:52:03.137461 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver record 0x7f1ea5a8c580 new gencnt
2019-05-21 16:52:03.137467 UTC - [Parent 14023: Main Thread]: D/nsHostResolver Caching host [www.fgv6.net] record for 300 seconds (grace 0).
2019-05-21 16:52:03.137473 UTC - [Parent 14023: Main Thread]: D/nsHostResolver nsHostResolver Suppressing callback for second TRR response for www.fgv6.net
2019-05-21 16:52:03.137481 UTC - [Parent 14023: Main Thread]: D/nsHostResolver CompleteLookup: www.fgv6.net has 2001:41d0:a:c44::2
2019-05-21 16:52:03.137508 UTC - [Parent 14023: Main Thread]: D/nsHostResolver CompleteLookup: www.fgv6.net has 151.80.214.201

// preference is recorded. will only use IPv4 from now on.
// Unclear to me why this happens.
2019-05-21 16:52:03.137721 UTC - [Parent 14023: Socket Thread]: V/nsHttp nsConnectionEntry::RecordIPFamilyPreference 0x7f1ea6834660, af=2
2019-05-21 16:52:03.137725 UTC - [Parent 14023: Socket Thread]: V/nsHttp   0x7f1ea6834660 prefer ipv4=1, ipv6=0

I have this issue too. I would much prefer that Firefox always preferred IPv6 when available instead of legacy IPv4.

Attached file log.txt-main.2597.xz

Ok, so I can confirm that I didn't have network.trr.early-AAAA;true for the last test, with that pref, it works sometimes, there must be a race condition or something. See the attached log for www.whatismyip.net, it resolved to IPv4 despite of the option.

(In reply to Francois Guerraz from comment #9)

with that pref, it works sometimes, there must be a race condition or something.

Thanks for confirming. The race is caused by the connection to the DoH server. Even if we send the AAAA request first, there is no guarantee that the response will arrive before the A one.
But there is indeed some room for improvement here. We could at least automatically set early-AAAA to true when we detect the user has IPv6 connectivity.

I wonder what "first" means, if it's one immediately after the other then in practical terms that µs is going to make no difference compared to network jitter, or is it sent over the same TCP connection (and therefore in order?).

(In reply to Francois Guerraz from comment #11)

I wonder what "first" means, if it's one immediately after the other then in practical terms that µs is going to make no difference compared to network jitter, or is it sent over the same TCP connection (and therefore in order?).

They are sent over the same HTTP2 over TCP connection. But the problem is at the server. It needs to perform the two DNS resolutions. If it gets the IPv6 request first, it might send back the IPv6 response first, but that's not guaranteed. All we can do is to send the IPv6 request first when we know it will be used.

Ok, so TRR is not really good for IPv6 transition, unless you're willing to play DNS roulette.
I would suggest doing away with network.trr.early-AAAA as all it gives is a revers probablity to get AAAA rather than A and do what normal resolvers do, fire both requests and wait for both of them. I know it sucks from a performance point of view but I prefer determinism, so I would prefer a network.trr.AAAA-timeout option, my opinion.

This is an optimization. If we detect that the system can't use the IPv6
address, there's no point in making a request for it.

Depends on D33475

This way we preserve the behaviour of getaddrinfo, where both A and AAAA
responses come back at the same time.
Without this Firefox will always be biased, as the first request will usually
be resolved first. So if we requested IPv4 first, we'd mostly be using IPv4.
If we requested IPv6 first, normally we'll wait for the IPv4 response to come
back too, which is functionally equivalent to the new behaviour.
However, if the pref is set network.trr.early-AAAA;true then we'd use the IPv6
response immediately, possibly leading to a failed request if the IPv6
connection fails before we have an IPv4 address to fall back to.

A test for this patch was added in bug 1542561.

Depends on D33476

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/61ee2e790ebd
Disable TRR for connectivity checks r=dragana
https://hg.mozilla.org/integration/autoland/rev/b857157a05c3
Only send AAAA TRR requests when the system has IPv6 connectivity r=dragana
https://hg.mozilla.org/integration/autoland/rev/bf3e5c1163b7
Wait for both A and AAAA responses to come back before notifying the listeners r=dragana

This fix causes a persistent startup crash on my machine, for instance: https://crash-stats.mozilla.org/report/index/f199dbcf-e58e-4fcc-8114-999d50190605

The relevant pieces of the report are:

MOZ_CRASH(Using observer service off the main thread!)
nsObserverService::AddObserver(nsIObserver*, char const*, bool) 	xpcom/ds/nsObserverService.cpp:208
mozilla::net::NetworkConnectivityService::Init() 	netwerk/base/NetworkConnectivityService.cpp:39
mozilla::net::NetworkConnectivityService::GetSingleton() 	netwerk/base/NetworkConnectivityService.cpp:29

All previous callers to this function were xpcom, so I guess they were probably going through the main thread, but this new code is what my instance routinely hits.

Just for a reference:
Setting "network.trr.mode" to "3" causes crash.
Setting "network.trr.mode" to "2" doesn't cause crash.

Regressions: 1556950
Regressions: 1556911
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: