Non-tab extension loads do not support DocumentChannel process switches
Categories
(WebExtensions :: General, defect, P3)
Tracking
(Fission Milestone:M7, firefox-esr78 disabled, firefox86 disabled, firefox87 disabled, firefox88 fixed)
People
(Reporter: nika, Assigned: zombie)
References
(Blocks 1 open bug)
Details
(Whiteboard: [ETA ? patches landed but backed out])
Crash Data
Attachments
(1 file)
This means that they will not correctly switch processes during loads using DocumentChannel, and the BrowsingContextGroup selected by the changes in bug 1599579 will not be correct.
These non-tab browsers should be updated to support the APIs added in bug 1640019.
Assignee | ||
Comment 1•4 years ago
|
||
Nika, is this something we need to do when we setup <browser> elements?
Reporter | ||
Comment 2•4 years ago
|
||
Yes, when setting up the <browser>
element, you need to set the maychangeremoteness
attribute to true
.
If you need to handle the process switch in extension code, you can add "WillChangeBrowserRemoteness"
and "DidChangeBrowserRemoteness"
listeners which will fire immediately before and after the process change occurs. If you need to do an async flush of state from the old process first, you can also override the prepareToChangeRemoteness
async method to do some async work, but ideally you won't need that.
Reporter | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 3•4 years ago
|
||
The severity field is not set for this bug.
:mixedpuppy, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 4•4 years ago
|
||
Thanks Nika.
Kashav, can you please confirm how https://phabricator.services.mozilla.com/D80987 works without changes that Nika mentions are needed?
Assignee | ||
Updated•4 years ago
|
The test works because we're still process-switching subframes for remote browsers: https://searchfox.org/mozilla-central/rev/31d8600b73dc85b4cdbabf45ac3f1a9c11700d8e/toolkit/content/widgets/browser-custom-element.js#1859-1865 (and the test only checks that the subframe lands in a difference process than the ContentPage).
I suppose it'd make more sense to use the new attribute rather than relying on something with a FIXME: Remove this?
. Will update.
Edit: Oops, was looking at the wrong test.
(In reply to Tomislav Jovanovic :zombie from comment #4)
Kashav, can you please confirm how https://phabricator.services.mozilla.com/D80987 works without changes that Nika mentions are needed?
Okay, so the test was originally only added to xpcshell.ini
, which means it wasn't even being run with remote content scripts. It fails when moved to xpcshell-common.ini
, as expected. Adding the changes from comment #2 makes it work in all configurations.
Adding "maychangeremoteness" to the ContentPage browser causes test_ext_startup_request_handler.js to fail:
Unexpected Results
------------------
xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_startup_request_handler.js
FAIL test_startup_request_handler - [test_startup_request_handler : 265] A promise chain failed to handle a rejection: No matching message handler - stack: (No stack available.)
Rejection date: Mon Jul 06 2020 15:28:00 GMT-0400 (Eastern Daylight Time) - false == true
resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:265
/testing/xpcshell/head.js:_execute_test:578
-e:null:1
FAIL test_startup_request_handler - [test_startup_request_handler : 352] Extension left running at test shutdown - "running" == "unloaded"
resource://testing-common/ExtensionXPCShellUtils.jsm:ExtensionWrapper/<:352
/testing/xpcshell/head.js:_execute_test/<:641
/testing/xpcshell/head.js:_execute_test:650
-e:null:1
FAIL xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_startup_request_handler.js - xpcshell return code: 0
ERROR Unexpected exception No matching message handler
undefined
ERROR NS_ERROR_ABORT:
_abort_failed_test@/testing/xpcshell/head.js:833:20
do_report_result@/testing/xpcshell/head.js:934:5
Assert<@/testing/xpcshell/head.js:73:21
proto.report@resource://testing-common/Assert.jsm:233:10
equal@resource://testing-common/Assert.jsm:275:8
ExtensionWrapper/<@resource://testing-common/ExtensionXPCShellUtils.jsm:352:24
_execute_test/<@/testing/xpcshell/head.js:641:28
_execute_test@/testing/xpcshell/head.js:650:5
@-e:1:1
The error comes from here.
I've requested a pernosco trace (edit: https://pernos.co/debug/vH7JWRt3imaP2jznXQ-D9g/index.html).
Re-loading the framescript after receiving a "DidChangeBrowserRemoteness" seems to fix things.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
So, the only place we use custom <browsers> is in extension popups and sidebars, but navigating those away from an extension page (to trigger a process change) is not even supported, though we never bothered to prevent it.
Hey Nika, I'm trying to write a meaningful test for this, but I can't think of any extension-facing functionality that doesn't work without this patch. Do you have an idea perhaps, or should I just poke at the sidebar's BC's process ID after navigation to confirm it was changed?
Reporter | ||
Comment 11•4 years ago
|
||
We discussed this a bit on matrix. One situation which may misbehave here is navigating to some website like "https://example.com` which has a same-origin iframe. There is a chance that with fission due to the lack of process switching the subframe will end up in the wrong process.
Assignee | ||
Comment 12•4 years ago
|
||
Comment 13•4 years ago
|
||
Comment 14•4 years ago
|
||
Backed out changeset 6e6788b1853e (Bug 1646817) for causing bc failures in browser_ext_popup_api_injection.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/7f2c74bbf08bf72d37d8153d82ec8601c306d05b
Failure log: https://treeherder.mozilla.org/logviewer?job_id=325373974&repo=autoland&lineNumber=15752
Assignee | ||
Updated•4 years ago
|
Comment 15•4 years ago
|
||
Comment 16•4 years ago
|
||
bugherder |
Comment 17•4 years ago
|
||
Backed out for causing crashes on macOS.
backout: https://hg.mozilla.org/mozilla-central/rev/1926d7641900f10134d1591637fa1bc90815c2cf
Comment 18•4 years ago
|
||
Backed out on suspicion for causing macOS crashes with signature [@ mozilla::dom::Element::GetAttr], e.g. bp-8a56e7c2-5919-486e-9b1b-cfc4f0210118.
Comment 19•4 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #18)
Backed out on suspicion for causing macOS crashes with signature [@ mozilla::dom::Element::GetAttr], e.g. bp-8a56e7c2-5919-486e-9b1b-cfc4f0210118.
Tom, do you know if your DocumentChannel patch caused these macOS crashes? Do you have an ETA for fixing and re-landing your patch?
Comment 20•4 years ago
|
||
We're crashing here, when the embedder element for the top BrowsingContext is null:
At least, that seems to be the only plausible place, given the incomplete stack.
There are some problems with that, though.
The only way for the embedder element to be null there is if the FrameLoader has been destroyed. We don't allow top-level browsing contexts to be created in content processes, and there's no way for either of the <browser> elements that enabled process switching in the patch to be non-top or type=chrome (which would cause Top() to return the BC for the browser window, which obviously has no embedder element). That might be somewhat plausibly related to the fact that the extension sidebar <browser>s are in documents hosted in type="content" <browser>s rather than the usual case of being directly hosted in top-level browser windows.
However. Before we get to the line that calls GetAttr
on the embedder element as linked above, we call ContextCanProcessSwitch
on the same BrowsingContext, and that returns false if there's no embedder element.
And the only even remotely plausible place I can see for the embedder element to disappear between those two places is in the JS-implemented nsIBrowser::GetProcessSwitchBehavior
getter... but I can't think of a way even that could do it. We'd have to hit a JS interrupt callback that triggered a reflow that caused the FrameLoader to be destroyed, which I'm pretty sure we won't.
So, basically, this crash should not actually be possible.
Assignee | ||
Comment 21•4 years ago
|
||
Right, (what's there of the) stack is very confusing on how we get to the crash. I tried running a local build with the patch applied for two weeks, before I realized it only seems to be crashing on OSX. :/
There shouldn't be anything platform specific about this code, except maybe the hidden window we have only on OSX (to keep some UI things alive I believe). Perhaps we're trying to add a sidebar to that window?
Seeing Kris also mentions our setup in the sidebar as a potential avenue, I'm thinking of landing the patch with only popup browser changed, to eliminate at least one of the code paths.
Overall, I don't think this must be fixed in M6c. Yes, some iframes (or just redirects?) in extension popups/sidebars will end up in the wrong process, which we don't wanna, but it shouldn't break many things.
Comment 22•4 years ago
•
|
||
(In reply to Tomislav Jovanovic :zombie from comment #21)
Overall, I don't think this must be fixed in M6c. Yes, some iframes (or just redirects?) in extension popups/sidebars will end up in the wrong process, which we don't wanna, but it shouldn't break many things.
@ zombie: kmag hypothesizes this crash was just a "bad build" from a compiler bug. He suggests we re-land the patch as-is and see if it still crashes.
kmag agrees that it doesn't need to block M6c. Deferring to Fission M7 Beta.
needinfo'ing Nika to file a follow-up bug to block navigation in this case instead of switching processes or abort if code attempts to swap top-level browsing context. We should fail-safe instead of fail-open.
Comment 23•4 years ago
|
||
Since ContextCanProcessSwitch may run scrips, it be better to add some checks there anyhow.
(But it is unlikely that those missing checks are the reason for the crash.)
Updated•4 years ago
|
Assignee | ||
Comment 24•4 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #22)
@ zombie: kmag hypothesizes this crash was just a "bad build" from a compiler bug. He suggests we re-land the patch as-is and see if it still crashes.
I still decided to land in two parts, it doesn't cost us anything, and it may help localize the cause if crashes persist.
leave-open
to land the sidebar part in a few days.
Comment 25•4 years ago
|
||
Comment 26•4 years ago
|
||
bugherder |
Comment 27•4 years ago
|
||
I hit this [@ mozilla::dom::Element::GetAttr ] crash in today's Nightly build 2021-02-27:
Comment 28•4 years ago
|
||
Backed out from central for causing macOS nightly crashes: https://hg.mozilla.org/mozilla-central/rev/97a677952b4b25afd4548562dd06e604c50b373e
Comment 29•4 years ago
|
||
There are crashes from 55+ installations on macOS with the latest Nightly. Nightly updates have been halted until the new Nightly will be available. This is similar to comment 17.
Comment 30•4 years ago
|
||
zombie recommends that this bug still block Fission M7. kmag's fix for bug 1697305 should fix this bug.
Updated•4 years ago
|
Comment 31•4 years ago
|
||
Comment 32•4 years ago
|
||
bugherder |
Assignee | ||
Comment 33•4 years ago
|
||
No crashes for two days, so I'm resolving this.
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Updated•7 months ago
|
Description
•