Discord Not Loading with DNS-over-HTTPS Enabled
Categories: Core :: Networking: DNS, defect, P2
People: Reporter: nils, Assigned: kershaw
Details: Whiteboard: [necko-triaged]
Attachments (4 files, 3 obsolete files)
- 142.41 KB, image/png
- 48 bytes, text/x-phabricator-request (pascalc: approval-mozilla-beta+)
- 48 bytes, text/x-phabricator-request (pascalc: approval-mozilla-beta+)
- 48 bytes, text/x-phabricator-request (pascalc: approval-mozilla-beta+)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0
Steps to reproduce:
- Enable DNS-over-HTTPS with any provider
- Load https://discord.com/app
Actual results:
Sometimes the Discord splash page loads; other times the channel/server loads, but the chat window shows greyed-out, unloaded content.
As soon as I disable DoH, everything loads properly. Other sites seem to load fine with DoH, except for Discord.
Expected results:
All content should load normally.
Comment 1•4 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Networking: DNS' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 2•4 years ago (Assignee)
Could you try to get an http log?
Since this might have something to do with websocket, could you also add nsWebSocket:5 to the MOZ_LOG variable? Thanks.
Comment 3•4 years ago (Reporter)
(In reply to Kershaw Chang [:kershaw] from comment #2)
I went ahead and generated a log file with the added variable. The file was fairly large, even though I didn't have any tabs open other than Discord when the log was generated. Because of its size, I had to use an external sharing service. URL below:
Comment 4•4 years ago (Assignee)
Thanks for the log, but I can't find anything wrong in it.
Could you try the steps below and see whether you can still reproduce this (please keep DNS-over-HTTPS enabled)?
- Go to about:config and disable network.dns.use_https_rr_as_altsvc.
- Try to reproduce with a clean profile.
- Disable network.http.spdy.websockets.
Thanks.
Comment 5•4 years ago (Reporter)
(In reply to Kershaw Chang [:kershaw] from comment #4)
I disabled network.dns.use_https_rr_as_altsvc. This allowed Discord to load normally. I am using Cloudflare as the DoH provider.
Comment 6•4 years ago (Assignee)
(In reply to Nils from comment #5)
Thanks! Now I see the problem in the log.
This seems to be related to http3. Here is what happened.
- An HTTPS RR is used to connect to discord.com with http3.
2021-04-12 12:59:34.267000 UTC - [Parent 16268: Socket Thread]: V/nsHttp nsHttpTransaction::OnHTTPSRRAvailable [this=142c1bde800] mActivated=0
2021-04-12 12:59:34.267000 UTC - [Parent 16268: Socket Thread]: V/nsHttp HTTPSSVC: use new routed host (discord.com) and new npnToken (h3-29)
- For some reason, Firefox can't establish the http3 connection, so we try to fall back to h2.
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp nsHttpTransaction::OnFastFallbackTimer [142c1bde800] mConnected=0
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp HTTPSSVC: use new routed host (discord.com) and new npnToken (h2)
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp Init nsHttpConnectionInfo @142be97c200
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp NulHttpTransaction::NullHttpTransaction() mActivityDistributor is active [this=142b37eff80, discord.com]
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp FindCoalescableConnection .S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp FindCoalescableConnection(.S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29) no matching conn
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp GetH2orH3ActiveConn() request for ent 142bfbb1e20 .S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29 did not find an active connection
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp Init nsHttpConnectionInfo @142be97c7a0
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: D/nsHttp Destroying nsHttpConnectionInfo @142be97c7a0
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp OnMsgSpeculativeConnect Transport not created due to existing connection count
- The speculative connection fails to be created because the connection limit is reached (it seems network.http.speculative-parallel-limit is 0 here). In the end, the transaction stays in the pending queue forever.
Dragana, do you have any idea why we failed to establish an http3 connection?
Thanks.
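The three steps in this comment can be condensed into a toy model. This is only a sketch under stated assumptions, not Firefox code: the names TxnState and SimulateTransaction are hypothetical, and the speculative-connection check is reduced to "the limit is non-zero", which is a simplification of the real condition.

```cpp
// Hypothetical model of the sequence described in this comment.
// Nothing here is taken from the Firefox source tree.
enum class TxnState { ConnectedH3, ConnectedH2, StuckPending };

// h3Connects: whether the http3 attempt chosen from the HTTPS RR succeeds.
// speculativeParallelLimit: stands in for
// network.http.speculative-parallel-limit.
TxnState SimulateTransaction(bool h3Connects,
                             unsigned speculativeParallelLimit) {
  // Step 1: the HTTPS RR routes the transaction to http3.
  if (h3Connects) {
    return TxnState::ConnectedH3;
  }
  // Step 2: the fast fallback timer fires and the route is switched to h2.
  // Step 3: the fallback depends on a speculative connection. In this model
  // a limit of 0 means the transport is never created (assumption), so the
  // transaction stays in the pending queue.
  if (speculativeParallelLimit == 0) {
    return TxnState::StuckPending;
  }
  return TxnState::ConnectedH2;
}
```

In this model, SimulateTransaction(false, 0) is the case seen in the log: http3 fails and the fallback never gets a connection.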
Comment 7•4 years ago (Assignee)
(In reply to Nils from comment #5)
Could you make another http log with network.dns.use_https_rr_as_altsvc disabled?
I'm just wondering whether the http3 connection can be made via the alt-svc header. Thanks.
Comment 8•4 years ago (Reporter)
(In reply to Kershaw Chang [:kershaw] from comment #7)
Here's the new log as requested:
https://ufile.io/036w1vg7
Comment 9•4 years ago
This if statement evaluates to false:
if (mNumDnsAndConnectSockets < parallelSpeculativeConnectLimit &&
    ((ignoreIdle &&
      (ent->IdleConnectionsLength() < parallelSpeculativeConnectLimit)) ||
     !ent->IdleConnectionsLength()) &&
    // I think we are failing on the next line.
    !(keepAlive && ent->RestrictConnections()) &&
    // We never reach the next call, because there is no log from it
    // (the function has logging).
    !AtActiveConnectionLimit(ent, aTrans->Caps())) {
RestrictConnections() calls AvailableForDispatchNow, AvailableForDispatchNow calls GetH2orH3ActiveConn. GetH2orH3ActiveConn prints:
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp FindCoalescableConnection .S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp FindCoalescableConnection(.S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29) no matching conn
2021-04-12 12:59:34.318000 UTC - [Parent 16268: Socket Thread]: V/nsHttp GetH2orH3ActiveConn() request for ent 142bfbb1e20 .S......[tlsflags0x00000000]discord.com:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cdiscord.com%29 did not find an active connection
So we probably have a connection or an unconnected DnsAndConnectSocket. That may happen :(
We need to force the creation of a speculative connection and ignore our limits.
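To make the shape of that condition easier to reason about, here is a minimal standalone sketch. It assumes plain values in place of the Firefox objects: SpeculativeState, MayCreateSpeculativeTransport, MayCreateTransport and the forceFallback flag are all hypothetical names, and the forced path is only one possible reading of "ignore our limits", not the patch that landed.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the values read by the quoted condition.
struct SpeculativeState {
  uint32_t numDnsAndConnectSockets;  // sockets still resolving or connecting
  uint32_t idleConnections;          // ent->IdleConnectionsLength()
  uint32_t parallelLimit;            // network.http.speculative-parallel-limit
  bool ignoreIdle;
  bool keepAlive;
  bool restrictConnections;          // ent->RestrictConnections()
  bool atActiveConnectionLimit;      // AtActiveConnectionLimit(...)
};

// Same boolean structure as the quoted if statement.
bool MayCreateSpeculativeTransport(const SpeculativeState& s) {
  return s.numDnsAndConnectSockets < s.parallelLimit &&
         ((s.ignoreIdle && s.idleConnections < s.parallelLimit) ||
          s.idleConnections == 0) &&
         !(s.keepAlive && s.restrictConnections) &&
         !s.atActiveConnectionLimit;
}

// Assumed variant: a fallback that must not be dropped skips the limits.
bool MayCreateTransport(const SpeculativeState& s, bool forceFallback) {
  return forceFallback || MayCreateSpeculativeTransport(s);
}
```

Two things stand out in this form. With an unsigned limit of 0 the first comparison can never be true, so no speculative transport is created regardless of the other clauses. With a non-zero limit, a true RestrictConnections() on a keep-alive request is enough to reject it, which is the clause suspected above.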
Comment 10•4 years ago (Assignee)
Here's the new log as requested:
https://ufile.io/036w1vg7
Thanks for this log!
From the log, I see the http3 connection is made with h3-27, which is different from the previous one (h3-29).
2021-04-14 16:20:53.248000 UTC - [Parent 5852: Socket Thread]: V/nsHttp Creating DnsAndConnectSocket [this=241c11d9480 trans=241c4332050 ent=discord.com key=.S......[tlsflags0x00000000]discord.com:443 <ROUTE-via discord.com:443> {NPN-TOKEN h3-27}^partitionKey=%28https%2Cdiscord.com%29]
The reason is that the alt-svc header from discord.com is alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, but the HTTPS RR is 1 discord.com (alpn="h3-29,h3-28,h3-27,h2" ipv4hint="162.159.128.233, 162.159.135.232, 162.159.136.232, 162.159.137.232, 162.159.138.232").
The spec rfc7838 says:
When multiple values are present, the order of the values reflects
the server's preference (with the first value being the most
preferred alternative).
We use h3-27 when the alt-svc header is used. However, it seems the order of the alpn-id values is not defined in this spec. When an HTTPS RR is available, Firefox chooses the first supported alpn-id (h3-29) to connect.
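The difference between the two logs comes down to list order. The sketch below assumes a simple "first advertised id that the client supports" rule for both sources, which matches what this comment describes; FirstSupportedAlpn and the client support set are hypothetical, and only the two orderings are taken from the header and record quoted above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not Firefox code: returns the first id in
// `advertised` (kept in the server's order) that appears in `supported`,
// or an empty string when nothing matches.
std::string FirstSupportedAlpn(const std::vector<std::string>& advertised,
                               const std::vector<std::string>& supported) {
  for (const std::string& id : advertised) {
    if (std::find(supported.begin(), supported.end(), id) != supported.end()) {
      return id;
    }
  }
  return std::string();
}

// Orderings quoted in this comment.
const std::vector<std::string> kAltSvcOrder = {"h3-27", "h3-28", "h3-29"};
const std::vector<std::string> kHttpsRrOrder = {"h3-29", "h3-28", "h3-27",
                                                "h2"};
// Assumed client support set, for illustration only.
const std::vector<std::string> kClientSupports = {"h2", "h3-27", "h3-28",
                                                  "h3-29"};
```

With these inputs the alt-svc ordering selects h3-27 and the HTTPS RR ordering selects h3-29, the same pair of tokens that appears in the two logs.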
Comment 11•4 years ago (Assignee)
Hi Nils,
Could you check the value of network.http.speculative-parallel-limit on your side?
Thanks.
Comment 12•4 years ago (Reporter)
(In reply to Kershaw Chang [:kershaw] from comment #11)
It's currently set to '0'.
Comment 13•4 years ago (Assignee)
Comment 14•4 years ago (Assignee)
Comment 15•4 years ago (Assignee)
Depends on D112348
Comment 16•4 years ago
Comment 17•4 years ago
Backed out 2 changesets (Bug 1703934) as requested on irc by kershaw for causing a possible regression.
https://hg.mozilla.org/integration/autoland/rev/0751c8ab736b1967556c951435df71357f660480
Comment 18•4 years ago (Assignee)
Comment 19•4 years ago (Assignee)
Comment 20•4 years ago (Assignee)
Depends on D113332
Comment 21•4 years ago
Comment 22•4 years ago
Backed out 3 changesets (Bug 1703934) for causing xpcshell failures in test_http3_fast_fallback.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/33e6726dee20181958342c8197f76cd37818df67
Push with failures, failure log.
Comment 23•4 years ago (Assignee)
Comment 24•4 years ago
Comment 25•4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8c713e2f43e7
https://hg.mozilla.org/mozilla-central/rev/4869f5f35758
https://hg.mozilla.org/mozilla-central/rev/f15ea1f11770
Comment 26•4 years ago (Assignee)
Comment on attachment 9217668 [details]
Bug 1703934 - P3: Make sure we always fast fallback to a non-http3 connection, r=dragana
Beta/Release Uplift Approval Request
- User impact if declined: The fast fallback mechanism for http3 is not working for those users who have set network.http.speculative-parallel-limit to 0.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: N/A
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch has been verified in Nightly for two days and we have tests for this.
- String changes made/needed: N/A
Comment 27•4 years ago
Comment on attachment 9217668 [details]
Bug 1703934 - P3: Make sure we always fast fallback to a non-http3 connection, r=dragana
This baked in Nightly for a week and is covered by tests. We are in early beta, so this seems like a good time to uplift. Thanks.
Comment 28•4 years ago
Kershaw, it seems that you have clear steps to reproduce the bug manually, are you sure that we don't need QA to verify the fix in nightly and beta?
Comment 29•4 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/b52e924fe814
https://hg.mozilla.org/releases/mozilla-beta/rev/b3b6307b6f15
https://hg.mozilla.org/releases/mozilla-beta/rev/5908c2adc03f
Comment 30•4 years ago (Assignee)
(In reply to Pascal Chevrel:pascalc from comment #28)
To reproduce this, we first need an http3 connection to fail, and I think it's not easy to set up a test environment for that.
I don't think we need QA to verify this, since we already have an automated test for it.