Closed Bug 1646817 Opened 4 years ago Closed 4 years ago

Non-tab extension loads do not support DocumentChannel process switches

Categories

(WebExtensions :: General, defect, P3)

defect

Tracking

(Fission Milestone:M7, firefox-esr78 disabled, firefox86 disabled, firefox87 disabled, firefox88 fixed)

RESOLVED FIXED
88 Branch
Fission Milestone M7
Tracking Status
firefox-esr78 --- disabled
firefox86 --- disabled
firefox87 --- disabled
firefox88 --- fixed

People

(Reporter: nika, Assigned: zombie)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ETA ? patches landed but backed out])

Crash Data

Attachments

(1 file)

This means that they will not correctly switch processes during loads using DocumentChannel, and the BrowsingContextGroup selected by the changes in bug 1599579 will not be correct.

These non-tab browsers should be updated to support the APIs added in bug 1640019.

Nika, is this something we need to do when we setup <browser> elements?

Flags: needinfo?(nika)

Yes, when setting up the <browser> element, you need to set the maychangeremoteness attribute to true.

If you need to handle the process switch in extension code, you can add "WillChangeBrowserRemoteness" and "DidChangeBrowserRemoteness" listeners which will fire immediately before and after the process change occurs. If you need to do an async flush of state from the old process first, you can also override the prepareToChangeRemoteness async method to do some async work, but ideally you won't need that.

Flags: needinfo?(nika)
Fission Milestone: ? → M6c

The severity field is not set for this bug.
:mixedpuppy, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mixedpuppy)

Thanks Nika.

Kashav, can you please confirm how https://phabricator.services.mozilla.com/D80987 works without changes that Nika mentions are needed?

Flags: needinfo?(mixedpuppy) → needinfo?(kmadan)
Severity: -- → S3
Priority: -- → P3

The test works because we're still process-switching subframes for remote browsers: https://searchfox.org/mozilla-central/rev/31d8600b73dc85b4cdbabf45ac3f1a9c11700d8e/toolkit/content/widgets/browser-custom-element.js#1859-1865 (and the test only checks that the subframe lands in a difference process than the ContentPage).

I suppose it'd make more sense to use the new attribute rather than relying on something with a FIXME: Remove this?. Will update.

Edit: Oops, was looking at the wrong test.

Flags: needinfo?(kmadan)

(In reply to Tomislav Jovanovic :zombie from comment #4)

Kashav, can you please confirm how https://phabricator.services.mozilla.com/D80987 works without changes that Nika mentions are needed?

Okay, so the test was originally only added to xpcshell.ini, which means it wasn't even being run with remote content scripts. It fails when moved to xpcshell-common.ini, as expected. Adding the changes from comment #2 makes it work in all configurations.

Adding "maychangeremoteness" to the ContentPage browser causes test_ext_startup_request_handler.js to fail:

Unexpected Results
------------------
xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_startup_request_handler.js
  FAIL test_startup_request_handler - [test_startup_request_handler : 265] A promise chain failed to handle a rejection: No matching message handler - stack: (No stack available.)
Rejection date: Mon Jul 06 2020 15:28:00 GMT-0400 (Eastern Daylight Time) - false == true
resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:265
/testing/xpcshell/head.js:_execute_test:578
-e:null:1
  FAIL test_startup_request_handler - [test_startup_request_handler : 352] Extension left running at test shutdown - "running" == "unloaded"
resource://testing-common/ExtensionXPCShellUtils.jsm:ExtensionWrapper/<:352
/testing/xpcshell/head.js:_execute_test/<:641
/testing/xpcshell/head.js:_execute_test:650
-e:null:1
  FAIL xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_startup_request_handler.js - xpcshell return code: 0
ERROR Unexpected exception No matching message handler
undefined
ERROR NS_ERROR_ABORT: 
_abort_failed_test@/testing/xpcshell/head.js:833:20
do_report_result@/testing/xpcshell/head.js:934:5
Assert<@/testing/xpcshell/head.js:73:21
proto.report@resource://testing-common/Assert.jsm:233:10
equal@resource://testing-common/Assert.jsm:275:8
ExtensionWrapper/<@resource://testing-common/ExtensionXPCShellUtils.jsm:352:24
_execute_test/<@/testing/xpcshell/head.js:641:28
_execute_test@/testing/xpcshell/head.js:650:5
@-e:1:1

The error comes from here.

I've requested a pernosco trace (edit: https://pernos.co/debug/vH7JWRt3imaP2jznXQ-D9g/index.html).

Re-loading the framescript after receiving a "DidChangeBrowserRemoteness" seems to fix things.

See Also: → 1580811
Assignee: nobody → tomica
Flags: needinfo?(tomica)

So, the only place we use custom <browsers> is in extension popups and sidebars, but navigating those away from an extension page (to trigger a process change) is not even supported, though we never bothered to prevent it.

Hey Nika, I'm trying to write a meaningful test for this, but I can't think of any extension-facing functionality that doesn't work without this patch. Do you have an idea perhaps, or should I just poke at the sidebar's BC's process ID after navigation to confirm it was changed?

Flags: needinfo?(tomica) → needinfo?(nika)

We discussed this a bit on matrix. One situation which may misbehave here is navigating to some website like "https://example.com` which has a same-origin iframe. There is a chance that with fission due to the lack of process switching the subframe will end up in the wrong process.

Flags: needinfo?(nika)
Pushed by tjovanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6e6788b1853e Support DocumentChannel process switching in sidebars and popups r=mixedpuppy
Flags: needinfo?(tomica)
Pushed by tjovanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0b6e13411d3f Support DocumentChannel process switching in sidebars and popups r=mixedpuppy
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
Status: RESOLVED → REOPENED
Flags: needinfo?(tomica)
Resolution: FIXED → ---
Target Milestone: 86 Branch → ---

Backed out on suspicion for causing macOS crashes with signature [@ mozilla::dom::Element::GetAttr], e.g. bp-8a56e7c2-5919-486e-9b1b-cfc4f0210118.

Crash Signature: [@ mozilla::dom::Element::GetAttr]

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #18)

Backed out on suspicion for causing macOS crashes with signature [@ mozilla::dom::Element::GetAttr], e.g. bp-8a56e7c2-5919-486e-9b1b-cfc4f0210118.

Tom, do you know if your DocumentChannel patch caused these macOS crashes? Do you have an ETA for fixing and re-landing your patch?

Whiteboard: [ETA ? patches landed but backed out]

We're crashing here, when the embedder element for the top BrowsingContext is null:

https://searchfox.org/mozilla-central/rev/b2433a832c250c55255e0ee37d05192d04f20427/netwerk/ipc/DocumentLoadListener.cpp#1499-1503

At least, that seems to be the only plausible place, given the incomplete stack.

There are some problems with that, though.

The only way for the embedder element to be null there is if the FrameLoader has been destroyed. We don't allow top-level browsing contexts to be created in content processes, and there's no way for either of the <browser> elements that enabled process switching in the patch to be non-top or type=chrome (which would cause Top() to return the BC for the browser window, which obviously has no embedder element). That might be somewhat plausibly related to the fact that the extension sidebar <browser>s are in documents hosted in type="content" <browser>s rather than the usual case of being directly hosted in top-level browser windows.

However. Before we get to the line that calls GetAttr on the embedder element as linked above, we call ContextCanProcessSwitch on the same BrowsingContext, and that returns false if there's no embedder element.

And the only even remotely plausible place I can see for the embedder element to disappear between those two places is in the JS-implemented nsIBrowser::GetProcessSwitchBehavior getter... but I can't think of a way even that could do it. We'd have to hit a JS interrupt callback that triggered a reflow that caused the FrameLoader to be destroyed, which I'm pretty sure we won't.

So, basically, this crash should not actually be possible.

Right, (what's there of the) stack is very confusing on how we get to the crash. I tried running a local build with the patch applied for two weeks, before I realized it only seems to be crashing on OSX. :/

https://crash-stats.mozilla.org/signature/?build_id=%3E20210118000000&product=Firefox&version=86.0a1&signature=mozilla%3A%3Adom%3A%3AElement%3A%3AGetAttr&date=%3E%3D2021-01-15T00%3A00%3A00.000Z&date=%3C2021-01-18T23%3A59%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#aggregations

There shouldn't be anything platform specific about this code, except maybe the hidden window we have only on OSX (to keep some UI things alive I believe). Perhaps we're trying to add a sidebar to that window?

Seeing Kris also mentions our setup in the sidebar as a potential avenue, I'm thinking of landing the patch with only popup browser changed, to eliminate at least one of the code paths.

Overall, I don't think this must be fixed in M6c. Yes, some iframes (or just redirects?) in extension popups/sidebars will end up in the wrong process, which we don't wanna, but it shouldn't break many things.

Flags: needinfo?(tomica)

(In reply to Tomislav Jovanovic :zombie from comment #21)

Overall, I don't think this must be fixed in M6c. Yes, some iframes (or just redirects?) in extension popups/sidebars will end up in the wrong process, which we don't wanna, but it shouldn't break many things.

@ zombie: kmag hypothesizes this crash was just a "bad build" from a compiler bug. He suggests we re-land the patch as-is and see if it still crashes.

kmag agrees that it doesn't need to block M6c. Deferring to Fission M7 Beta.

needinfo'ing Nika to file a follow-up bug to block navigation in this case instead of switching processes or abort if code attempts to swap top-level browsing context. We should fail-safe instead of fail-open.

Fission Milestone: M6c → M7
Flags: needinfo?(tomica)
Flags: needinfo?(nika)

Since ContextCanProcessSwitch may run scrips, it be better to add some checks there anyhow.
(But it is unlikely that those missing checks are the reason for the crash.)

Attachment #9194553 - Attachment description: Bug 1646817 - Support DocumentChannel process switching in sidebars and popups → Bug 1646817 - Part 1: Support DocumentChannel process switching in sidebars

(In reply to Chris Peterson [:cpeterson] from comment #22)

@ zombie: kmag hypothesizes this crash was just a "bad build" from a compiler bug. He suggests we re-land the patch as-is and see if it still crashes.

I still decided to land in two parts, it doesn't cost us anything, and it may help localize the cause if crashes persist.

leave-open to land the sidebar part in a few days.

Flags: needinfo?(tomica)
Keywords: leave-open
Pushed by tjovanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6a75d5ea4301 Part 1: Support DocumentChannel process switching in sidebars r=mixedpuppy

I hit this [@ mozilla::dom::Element::GetAttr ] crash in today's Nightly build 2021-02-27:

bp-116c38ed-dce9-4332-b8db-0e1340210228

Backed out from central for causing macOS nightly crashes: https://hg.mozilla.org/mozilla-central/rev/97a677952b4b25afd4548562dd06e604c50b373e

Flags: needinfo?(tomica)

There are crashes from 55+ installations on macOS with the latest Nightly. Nightly updates have been halted until the new Nightly will be available. This is similar to comment 17.

Depends on: 1697305

zombie recommends that this bug still block Fission M7. kmag's fix for bug 1697305 should fix this bug.

Attachment #9194553 - Attachment description: Bug 1646817 - Part 1: Support DocumentChannel process switching in sidebars → Bug 1646817 - Support DocumentChannel process switching in sidebars and popups
Pushed by tjovanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e61b5ab8b1f7 Support DocumentChannel process switching in sidebars and popups r=mixedpuppy

No crashes for two days, so I'm resolving this.

Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Flags: needinfo?(tomica)
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Flags: needinfo?(nika)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: