Tabs do not load after restart with network.trr.fetch_off_main_thread=true and under specific condition
Categories
(Core :: Networking: DNS, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox73 | --- | unaffected |
firefox74 | --- | unaffected |
firefox75 | --- | fixed |
firefox76 | --- | fixed |
People
(Reporter: Fanolian+BMO, Assigned: kershaw)
References
(Regression)
Details
(Keywords: nightly-community, regression, reproducible, Whiteboard: [necko-triaged])
Attachments
(8 files)
3.48 MB,
video/mp4
|
Details | |
1017.79 KB,
text/plain
|
Details | |
748.39 KB,
text/plain
|
Details | |
249.24 KB,
application/octet-stream
|
Details | |
1.38 MB,
application/octet-stream
|
Details | |
2.01 MB,
video/mp4
|
Details | |
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0
Build ID: 20200307213339
Tabs may not load properly only when right after starting the browser.
Here are the requirements which I cannot reduce any further:
network.trr.fetch_off_main_thread
istrue
- Firefox Private Network extension is logged in (I use it outside US so your result may vary)
- uMatrix is enabled. Settings (and rules) leave as default.
- (Maybe) a slow computer, network, or Firefox profile
Steps to reproduce
- Use Nightly 2020-03-07 build in a new profile. Prepare the above stuff.
- Open https://www.reddit.com. Pinning the tab is not necessary but easier to test since the tab will automatically load right after starting the browser.
** I think any (heavy) site would work as long as it does not load instantly. - Focus the tab, restart browser either by:
a.Ctrl-Alt-R
in Browser Console; or
b. EnableRestore previous session
, close with X and start browser again. - Repeat Step 3 if you fail to reproduce the issue.
Actual result
(Please refer to the attached video.)
The tab fails to load. Reloading it brings everything back to normal.
The issue can be reproduced 100% on my main profile, which I do not want to share, but only occasionally in a new profile, as seen in the attached video.
The maximum number of pinned tabs failed to load I have seen is 3 which is the maximum number of concurrent restores after restarting the browser.
Notes
I cannot reproduce the issue if either requirement 1, 2, or 3 is omitted. Or maybe I did not re-test those conditions enough.
Other sites that fail to load include but not limited to logged-in Gmail, Google Calendar, and Pocket.
From Mozregression,
Last good Nightly: 2020-03-05
First bad Nightly: 2020-03-06
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c991b81a783355827d979bfd7e221fb3c6940662&tochange=d2ac41047c108daa9a8933fa53af59d695063d6f
Autoland bisection:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=beedb2cf04024ae27dc7f891aa65a4d90b97e9cf&tochange=e716e788e64d950f612a30bb9dbcaf71afa8ee44
Regressed by bug 1620300.
Comment 2•5 years ago
|
||
Kershaw, please have a look at this bug.
I toggle network.trr.fetch_off_main_thread
to true
manually and test in previous builds. The regression range is:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=d64dfccd44ed694c4f323ac7c27afbe327f497ca&tochange=4c8620247a5e626d21fa5818d9a636d813d29dd9
This should be regressed by bug 1615335.
Comment 4•5 years ago
|
||
I got this too, now with Build ID 20200308095244. But I use network.trr.mode = 3 (required) with a third-party pihole-based DoH resolver.
Upon restarting after the update, everything failed to load. Looking at the Browser Console window there is a GET for the page but no GET for the DNS to the TRR resolver like there usually is - and the page load times out without DNS resolved. Only sites in network.trr.excluded-domains work.
Conversely when I turn it off and reload the a page, I get an appropriate DNS request showing up (four, in fact - two before the page is loaded and two after).
In case it matters, this is a 2011-era netbook running a dual-core AMD Bobcat E-350. Pretty slow, but it normally works (though, sometimes when loading many tabs, I get an alternating pattern of one tab loading and one not for a while).
network.trr.bootstrapAddress 2a00:d880:5:bf0::7c93
network.trr.uri https://dns.aaflalo.me/dns-query
doesn't seem to matter if DNS requests are GET or POST.
Assignee | ||
Comment 5•5 years ago
|
||
I guess somehow the TRR background thread is not created.
Could you try to capture the http log?
Thanks.
Comment 6•5 years ago
|
||
It seems like something is happening, but when it's broken we get "TRR bad incoming DOH, eject!" and "TRR::On200Response DohDecode 80070057" (ERROR_INVALID_PARAMETER).
Comment 7•5 years ago
|
||
When it's working sometimes there is a "TRR::On200Response DohDecode 80004005" (unspecified error) but later it works.
Maybe there is something different about how the HTTP part is handled, e.g. gzip processing?
This is a log using command line arguments. I start the browser with the profile used in comment 0, and close the browser when I see the favicon changes to a warning sign indicating the page cannot be loaded.
Comment 9•5 years ago
|
||
I think the gzip is the issue. By default network.http.accept-encoding.secure == "gzip, deflate, br", but nsHttpCompresssConv is not invoked in the background thread. If I blank out network.http.accept-encoding.secur, the request succeeds.
Comment 10•5 years ago
|
||
The sizes are different in this case and the response does not contain the characteristic GZ header (1F 8B 08 00).
This trace was a force-refresh to cause DNS to be re-retrieved after setting network.http.accept-encoding.secure=''
Assignee | ||
Comment 11•5 years ago
|
||
(In reply to Laurence "GreenReaper" Parry from comment #10)
Created attachment 9131748 [details]
Log when network.trr.fetch_off_main_thread=true and network.http.accept-encoding.secure='' (working without gzip response)The sizes are different in this case and the response does not contain the characteristic GZ header (1F 8B 08 00).
This trace was a force-refresh to cause DNS to be re-retrieved after setting network.http.accept-encoding.secure=''
Thanks for the log!
Yes, the problem is that TRRServiceChannel
doesn't handle gzip encoding. I'll try to fix this.
Reporter | ||
Comment 12•5 years ago
|
||
Here is an easier way to reproduce that requires Firefox Private Network (FPN) only. This may be the same issue.
Steps to reproduce
- In a new profile in Nightly 2020-03-07 or 08 build, install FPN extension, log in and activate it.
- Go to about:addons.
Actual result
The recommendations cannot be loaded. Please refer to the attached video.
Workaround
Toggle network.trr.fetch_off_main_thread
to false
and reload about:addons.
Notes
Turning FPN off (and enable DoH in about:preferences) while leaving network.trr.fetch_off_main_thread = true
can also fix the issue. The bug, however, does not present in Nightly Mar 05 build which introduced network.trr.fetch_off_main_thread
but was defaulted to false
.
Assignee | ||
Comment 13•5 years ago
|
||
Assignee | ||
Comment 14•5 years ago
|
||
(In reply to Fanolian from comment #12)
Created attachment 9131754 [details]
comment #12: conflict with FPN.mp4Here is an easier way to reproduce that requires Firefox Private Network (FPN) only. This may be the same issue.
Steps to reproduce
- In a new profile in Nightly 2020-03-07 or 08 build, install FPN extension, log in and activate it.
- Go to about:addons.
Actual result
The recommendations cannot be loaded. Please refer to the attached video.
Workaround
Toggle
network.trr.fetch_off_main_thread
tofalse
and reload about:addons.Notes
Turning FPN off (and enable DoH in about:preferences) while leaving
network.trr.fetch_off_main_thread = true
can also fix the issue. The bug, however, does not present in Nightly Mar 05 build which introducednetwork.trr.fetch_off_main_thread
but was defaulted tofalse
.
Thanks! I can reproduce with this STR.
From the log, I saw that the TRRServiceChannel
failed to connect to the proxy, since the proxy returned 407
response.
I think the root cause is that the FPN
addon is unable to add Proxy-Authorization
header to TRRServiceChannel
, since TRRServiceChannel
doesn't send http-on-*
notifications at all.
There could be two workarounds:
- Let
TRRServiceChannel
ignore proxy settings, soTRRServiceChannel
always connects to DoH server directly. - Turn off
network.trr.fetch_off_main_thread
byFPN
addon.
Valentin, Dragana, what do you think?
Comment 15•5 years ago
|
||
Taking a step back here (and apologies if it's not the right place): does it actually make sense to request compression for these (usually) very small requests?
If I'm reasing this correctly, we will now be switching back to the main thread to do decompression, before returning to the TRR thread, only to come back again - with context-switching eating into the gains of threading. And it's likely similar on the other end to compress it, perhaps increasing latency.
Moreover, the header specifying encodings likely increases the amount sent upstream (though maybe less for HTTP/2), which in common home environments can be an order of magnitude lower in bandwidth thank downstream.
Maybe it'd be more sensible to just remove Accept-Encoding from the request, unless there are some specific queries likely to be of a larger size? Or better, to specify a blank Accept-Encoding, as:
"If no Accept-Encoding field is in the request, any content-coding is considered acceptable by the user agent."
...so it could just decide gzip is fine, but:
"An Accept-Encoding header field with a combined field-value that is empty implies that the user agent does not want any Content-Encoding in response. If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding."
(In practice I suspect most servers would just send it plain without Accept-Encoding. But maybe that isn't universal.)
Obviously SHOULD is not MUST, and we should handle the situation where the response has a content encoding. But perhaps it should not be the default that we request.
Assignee | ||
Comment 16•5 years ago
|
||
(In reply to Laurence "GreenReaper" Parry from comment #15)
Taking a step back here (and apologies if it's not the right place): does it actually make sense to request compression for these (usually) very small requests?
If I'm reasing this correctly, we will now be switching back to the main thread to do decompression, before returning to the TRR thread, only to come back again - with context-switching eating into the gains of threading. And it's likely similar on the other end to compress it, perhaps increasing latency.
Moreover, the header specifying encodings likely increases the amount sent upstream (though maybe less for HTTP/2), which in common home environments can be an order of magnitude lower in bandwidth thank downstream.
Maybe it'd be more sensible to just remove Accept-Encoding from the request, unless there are some specific queries likely to be of a larger size? Or better, to specify a blank Accept-Encoding, as:
"If no Accept-Encoding field is in the request, any content-coding is considered acceptable by the user agent."
...so it could just decide gzip is fine, but:
"An Accept-Encoding header field with a combined field-value that is empty implies that the user agent does not want any Content-Encoding in response. If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding."
(In practice I suspect most servers would just send it plain without Accept-Encoding. But maybe that isn't universal.)Obviously SHOULD is not MUST, and we should handle the situation where the response has a content encoding. But perhaps it should not be the default that we request.
This is a good suggestion. Thanks.
I'll update my patch and remove the Accept-Encoding
header.
Comment 17•5 years ago
|
||
Thanks! I can reproduce with this STR.
From the log, I saw that theTRRServiceChannel
failed to connect to the proxy, since the proxy returned407
response.
I think the root cause is that theFPN
addon is unable to addProxy-Authorization
header toTRRServiceChannel
, sinceTRRServiceChannel
doesn't sendhttp-on-*
notifications at all.
There could be two workarounds:
- Let
TRRServiceChannel
ignore proxy settings, soTRRServiceChannel
always connects to DoH server directly.- Turn off
network.trr.fetch_off_main_thread
byFPN
addon.Valentin, Dragana, what do you think?
Turn off network.trr.fetch_off_main_thread. TRR is not really used a lot with FPN. There are just special cases, like WebRTC, that are using it.
Assignee | ||
Comment 18•5 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #17)
Thanks! I can reproduce with this STR.
From the log, I saw that theTRRServiceChannel
failed to connect to the proxy, since the proxy returned407
response.
I think the root cause is that theFPN
addon is unable to addProxy-Authorization
header toTRRServiceChannel
, sinceTRRServiceChannel
doesn't sendhttp-on-*
notifications at all.
There could be two workarounds:
- Let
TRRServiceChannel
ignore proxy settings, soTRRServiceChannel
always connects to DoH server directly.- Turn off
network.trr.fetch_off_main_thread
byFPN
addon.Valentin, Dragana, what do you think?
Turn off network.trr.fetch_off_main_thread. TRR is not really used a lot with FPN. There are just special cases, like WebRTC, that are using it.
I filed an issue for this.
Assignee | ||
Comment 19•5 years ago
|
||
Comment 20•5 years ago
|
||
Assignee | ||
Comment 21•5 years ago
|
||
Leave this open until the fix in FPN extension is verified.
Comment 22•5 years ago
|
||
bugherder |
Assignee | ||
Comment 23•5 years ago
|
||
Comment on attachment 9131755 [details]
Bug 1620824 - Add content encoding support for TRRServiceChannel
Beta/Release Uplift Approval Request
- User impact if declined: DoH is not working if the server returns a response content with gzip encoding.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Most of the patch did is sending an empty Accept-Encoding header, so it's not risky.
The code is already verified by the trr unit test. - String changes made/needed:
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 24•5 years ago
|
||
(In reply to Kershaw Chang [:kershaw] from comment #21)
Leave this open until the fix in FPN extension is verified.
Where/when is that happening?
Assignee | ||
Comment 25•5 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #24)
(In reply to Kershaw Chang [:kershaw] from comment #21)
Leave this open until the fix in FPN extension is verified.
Where/when is that happening?
I am not sure when the FPN extension will be updated.
baku, do you know if we have a nightly version of FPN that can be used to verify if turning off network.trr.fetch_off_main_thread
is working?
Thanks.
Comment 26•5 years ago
|
||
I can provide an unsigned version. I'll share an unsigned version on slack.
Assignee | ||
Comment 27•5 years ago
|
||
Hi Fanolian,
Would you like test the unsigned FPN extension?
I can send it to you via email.
Thanks.
Reporter | ||
Comment 28•5 years ago
|
||
(In reply to Kershaw Chang [:kershaw] from comment #27)
Hi Fanolian,
Would you like test the unsigned FPN extension?
I can send it to you via email.Thanks.
Sure. Thanks. You can send it to my bugzilla account email.
Comment 29•5 years ago
|
||
Fanolian, in order to test the unsigned version, there are a few things to do:
- set pref extensions.experiments.enabled to true
- install the XPI from about:debugging, as a temporary addon.
If you have any question, you can write me directly on slack/matrix/email. Thanks.
Reporter | ||
Comment 30•5 years ago
|
||
The issue in about:addons (see comment 12) is fixed with FPN v21. Thanks.
I cannot test the issue in comment 0. Extension installed in about:debugging is removed after a restart.
However I cannot get to FPN's login page if I try to install it as a permanent extension, with extensions.experiments.enabled=true
and xpinstall.signatures.required=false
.
FPN says Connection timeout, check your internet connection.
when I click the extension icon on toolbar. In Browser Console (with Show Content Messages
checked in the gear icon) there is an error regarding the problem:
Error: Attempt to postMessage on disconnected port view.js:194:24
Assignee | ||
Comment 31•5 years ago
|
||
(In reply to Fanolian from comment #30)
The issue in about:addons (see comment 12) is fixed with FPN v21. Thanks.
I cannot test the issue in comment 0. Extension installed in about:debugging is removed after a restart.However I cannot get to FPN's login page if I try to install it as a permanent extension, with
extensions.experiments.enabled=true
andxpinstall.signatures.required=false
.
FPN saysConnection timeout, check your internet connection.
when I click the extension icon on toolbar. In Browser Console (withShow Content Messages
checked in the gear icon) there is an error regarding the problem:Error: Attempt to postMessage on disconnected port view.js:194:24
Thanks for testing.
I think we can close this bug, since it's verified that network.trr.fetch_off_main_thread
is set to false by FPN
extension and network.trr.fetch_off_main_thread
is the required condition for the issue in comment 0.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 32•5 years ago
|
||
Comment on attachment 9131755 [details]
Bug 1620824 - Add content encoding support for TRRServiceChannel
approved for 75.0b5
Updated•5 years ago
|
Comment 33•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Description
•