Closed Bug 1589102 Opened 1 year ago Closed 3 months ago

Fix loads that inherit principals - about:blank and about:srcdoc - and make them go via DocumentChannel

Categories

(Core :: DOM: Navigation, defect, P3)

defect

Tracking

()

RESOLVED FIXED
82 Branch
Fission Milestone M6b
Tracking Status
firefox82 --- fixed

People

(Reporter: zombie, Assigned: annyG)

References

(Blocks 1 open bug, Regressed 2 open bugs)

Details

Attachments

(17 files, 2 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

In a mochitest, with a debug build and --enable-fission:

let win = window.open();
win.location = "http://example.net/browser";
setTimeout(() => win.location = "about:blank", 300); // crash

crashes the content process with

Assertion failure: PermissionAvailable(prin, aType), at z:/build/build/src/extensions/permissions/nsPermissionManager.cpp:2376

Fission Milestone: --- → ?
Attachment #9101564 - Attachment is obsolete: true

zombie, is there a testcase for this? I see you marked the attachment as obsolete, so I assume it isn't that.

Flags: needinfo?(tomica)
Priority: -- → P3

https://phabricator.services.mozilla.com/D49427
That's still the test case. I didn't know how to proceed working on this, so I abandoned the revision to get it out of my phabricator dashboard, and apparently that marked it as obsolete here.

Flags: needinfo?(tomica)
Attachment #9101564 - Attachment is obsolete: false

I tried and failed to reproduce.
Does one need to do anything special here to reproduce?

Flags: needinfo?(tomica)

Nothing special, just an (artifact) debug build on Windows. I've updated to the latest m-c, applied the patch, ran

mach test test_fission_navigation_assert.html --enable-fission

and it still reproduces. Note however that, at least for me locally, the crash is not immediately obvious from the test summary:

0 INFO TEST-START | Shutdown
1 INFO Passed:  0
2 INFO Failed:  0

I need to grep the log spam to actually find the crash:

A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
Flags: needinfo?(tomica)

I tried to reproduce this again, but no luck on Linux/debug.

Fission Milestone: ? → M5

(In reply to Olli Pettay [:smaug] from comment #7)

I tried to reproduce this again, but no luck on Linux/debug.

I'm not sure if it's the same issue here, but I hit the same assertion running fresh profile when setting privacy.firstparty.isolate = true in ~/.mozbuild/machrc.

Also reproducible applying D49427 without touching any prefs.

(In reply to Gary Chen [:xeonchen] from comment #8)

I'm not sure if it's the same issue here, but I hit the same assertion running fresh profile when setting privacy.firstparty.isolate = true in ~/.mozbuild/machrc.

I assume that is without Fission being enabled?

Flags: needinfo?(xeonchen)

(In reply to Olli Pettay [:smaug] from comment #9)

(In reply to Gary Chen [:xeonchen] from comment #8)

I'm not sure if it's the same issue here, but I hit the same assertion running fresh profile when setting privacy.firstparty.isolate = true in ~/.mozbuild/machrc.

I assume that is without Fission being enabled?

Correct, fission is disabled by default.

Flags: needinfo?(xeonchen)

(In reply to Gary Chen [:xeonchen] from comment #8)

I hit the same assertion running fresh profile when setting privacy.firstparty.isolate = true in ~/.mozbuild/machrc.

Nika asked me to needinfo her to determine whether this bug is Fission-specific or not.

Debug assertion failures don't block Fission Dogfooding (M5), but this assertion gets in the way of engineers trying to run tests locally with Fission.

Flags: needinfo?(nika)

Nika says this is a bug in about:blank process switching and remote principals. The code assumes that loading about:blank does not process switch, but it currently does. Nika will talk with Matt about using DocumentChannel to load about:blank.

What is the expected principal of about:blank opened in a new popup? Null principal, system principal, or inherit the popup creator's principal? What does the spec say?

Deferring to Fission Nightly (M6) because the fix is not clear and could be a lot of work.

Fission Milestone: M5 → M6

The root of the issue here is that loading about:blank inside of a BrowsingContext loads a document which inherits the principal from the global which triggered the load. This causes an issue, as we don't run about:blank loads through DocumentChannel, which means that we don't trigger a process switch to the correct process, and end up loading an about:blank with a principal which doesn't match the target process.

We need to support performing a process switch in this case so that the final loaded about:blank document is loaded in the current process.

Flags: needinfo?(nika) → needinfo?(matt.woodrow)
Summary: Navigating a fission tab to about:blank crashes in debug builds with Assertion PermissionAvailable → Navigating remote BrowsingContext to `about:blank` doesn't trigger a process switch

Nika says we want all loads to go through DocumentChannel. This code doesn't go through DocumentChannel.

Anny, this about:blank bug is the same issue as srcdoc loads, so this would be a good bug for you to work on after srcdoc.

about:blank should use DocumentChannel (in some cases).

Tracking for Fission Nightly M6b

Fission Milestone: M6 → M6b
Flags: needinfo?(matt.woodrow) → needinfo?(agakhokidze)
Assignee: nobody → agakhokidze
Flags: needinfo?(agakhokidze)

I am changing how about:blank loads (making it go via DocumentChannel mechanism) which means the load for the blank page is now taking longer and is not instantaneous. This seems to lead to some races in the devtools code. E.g. here https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=Fu-VfGsTRU25Z7cfPUnzIg.0&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=4db9b9f2c2a709afd15277078ff5463f60e03e30 we have a leftover devtools:toolbox window, even though https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/devtools/client/framework/test/browser_toolbox_toggle.js#103-106 is supposed to wait for the toolbox to be destroyed.

Tracing through the code shows that by the time “toolbox-destroyed” is emitted, it is not necessarily that the devtools toolbox has been closed, as we don’t really wait for the window to close. It seems that previously about:blank load (https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/devtools/client/framework/toolbox.js#3778) was happening much faster and triggered the unload event earlier and by the time “toolbox-destroyed” was emitted, the window happened to be closed.

I thought I could fix it with these changes https://hg.mozilla.org/try/diff/201ce0b7999a465c43682dd7b35bd75e4f905744/devtools/client/framework/devtools.js but while that works for the test above, it creates many more other failures https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=Uv3TPcDDSBOGqlVIsifhpA.0&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&author=agakhokidze%40mozilla.com

I’m wondering if you have some other suggestions for how I can fix toolbox closing or how I can improve this solution?

Flags: needinfo?(jdescottes)
Flags: needinfo?(daisuke)

Keeping the need info, but I am looking into this.

The toolbox-host-manager uses the unload event from the toolbox window to trigger the host destruction.
Could we explicitly ask the host manager to destroy the host from toolbox's destroy instead? Or do we have codepaths which trigger an unload without going through toolbox destroy (would be surprising)?

From the comments around the about:blank navigation, it seems we rely on this to release the document.
I don't know why closing the host is not enough though. (:ochameau might have more context but he is currently on PTO)
If the navigation is really necessary, it seems like a workaround, and maybe we could use something else?

So for now, I will try to emit an event from the toolbox destroy and listen to it in the toolbox-host-manager in order to replace the "unload" event mechanism. Let's see what try says: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=ac9699f18bbedc852525789314ce663ef7900000

From other try pushes I did a bit earlier I think the only failure left is devtools/client/webconsole/test/browser/browser_webconsole_multiple_windows_and_tabs.js | Uncaught exception - 2147942487 but I am not sure this is linked to the same issue.

I thought I could fix it with these changes
https://hg.mozilla.org/try/diff/201ce0b7999a465c43682dd7b35bd75e4f905744/devtools/client/framework/devtools.js
but while that works for the test above, it creates many more other failures
https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=Uv3TPcDDSBOGqlVIsifhpA.0&

If we add this, we should probably remove the unload handler from the toolbox-host-manager, otherwise we have two potential call sites that can trigger a host destroy, and if both kick in only the first one will run successfully. The second one will fail with can't access property "frame", this.host is null (which I see in a few failed tests in this try run).

Also the additional await here is always worrying, because the toolbox destroy is very fragile and susceptible to races. But overall I think that's the good direction. It's similar to what I am currently doing, but I am using a separate event, fired exactly when we used to navigate to about:blank, in case it matters (as I said, toolbox destroy timing can be very finicky) .

Summary: Navigating remote BrowsingContext to `about:blank` doesn't trigger a process switch → Fix loads that inherit principals - about:blank and about:srcdoc - and make them go via DocumentChannel
Duplicate of this bug: 1644825
Attachment #9101564 - Attachment is obsolete: true
Attachment #9101564 - Attachment is obsolete: false
Depends on: 1654745

Moved the DevTools specific discussion to Bug 1654745.
I pinged Daisuke on the patch attached to this other bug, so clearing the ni? here.

On my try push the only remaining devtools failure seems to be

devtools/client/webconsole/test/browser/browser_webconsole_multiple_windows_and_tabs.js | Uncaught exception - 2147942487

https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=ac9699f18bbedc852525789314ce663ef7900000&selectedTaskRun=CPt74I7xQ5WnJjbVPom_WQ.0

Any idea if this could be related to your change or if this is a devtools issue?

Flags: needinfo?(jdescottes)
Flags: needinfo?(daisuke)

I don't know what the specific cause is, but figured I'd add that that number corresponds to NS_ERROR_ILLEGAL_VALUE (or NS_ERROR_INVALID_ARG, NS_ERROR_INVALID_POINTER, NS_ERROR_NULL_POINTER, NS_ERROR_XPATH_INVALID_ARG - we use that number for a bunch of errors apparently).

I wrote up an explanation of where the error was coming from and then solved it but I'll leave the explanation here!

In the above test we have the following code const tab3 = await addTab(TEST_URI, { window: win2 }); where we call addTab from devtools/client/shared/test/shared-head.js, which calls BrowserTestUtils.addTab where we call tabbrowser.addTab in the last line. There, we have the following call SessionStore.setLastClosedTabCount(window, 1); and in SessionStoreInternal.setLastClosedTabCount we discover that the window does not have __SSi property so we throw NS_ERROR_INVALID_ARG.

Then I figured we are probably not waiting long enough for the newly opened window to load here before we try to add a new tab to it, so I modified the test to use BrowserTestUtils.waitForNewWindow() to wait for the new window to open instead of the "load" event and it worked (at least locally!)

This patch enables sandboxed srcdoc loads to take place via DocumentChannel,
and adds mechanisms for enabling unsandboxed ones.

Both unsandboxed srcdoc, and in subsequent patches, about:blank, loads require
that the triggering principal and the principal to inherit point to the same
instance if the load takes place in the same process as where we are inheriting
those principals from. We save those principals on a target browsing context before
we load the URI, and later, when we are deserializing LoadInfoArgs into
LoadInfo in the content process, we retrieve the saved principals if the
current load identifier of the target BC matches the load identifier saved
along with the principals.

We also need to make sure that during a process switch for about:srcdoc load,
we don't use the original URI for about:srcdoc to determine the remote type and
instead we use channel's result principal.

In process selection logic, ensure that we don't use the original URI for
about:blank and instead use the result principal. If the about:blank load has a
null principal, then revert to using the original URI.

Also, remove an extra about:blank load when an nsFrameLoaderOwner is changing
remoteness to prevent races.

Depends on D85080

Creating more than 2 nested iframes is not allowed and is now enforced for
about:blank loads because they now take place via DocumentChannel.

Depends on D85083

about:blank loads now take place via DocumentChannel and no longer happen
instantenously.

Depends on D85084

Attachment #9101564 - Attachment description: bug 1589102 - Navigating a fission tab to about:blank crashes in debug builds with Assertion PermissionAvailable → Bug 1589102 - Part 9: Add a crashtest for navigating remote page to about:blank, r=mattwoodrow
Pushed by agakhokidze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b187548e258f
Part 1: Enable about:srcdoc loads via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/3b92f7306c64
Part 2: Fix tests that were broken as a result of about:srcdoc going via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/ffe8da473b91
Part 3: Enable about:blank loads to take place via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/ec04e1a20597
Part 4: Allow (de)serialization of nested about: uris, r=kmag
https://hg.mozilla.org/integration/autoland/rev/31949872cc1d
Part 5: Fix tests that relied on about:blank loads happening instantenously, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/d1f6185030af
Part 6: Do not load more than 2 nested frames in browser_browsingContext-01.js test, r=farre
https://hg.mozilla.org/integration/autoland/rev/29ee0fbe855f
Part 7: Fix marionette reftest.js to wait for test url's load events, r=jgraham,marionette-reviewers,maja_zf
https://hg.mozilla.org/integration/autoland/rev/20905f91e2bb
Part 8: Fix test_browsing_context_structured_clone.js to instead schedule precise gc, r=kmag
https://hg.mozilla.org/integration/autoland/rev/14bcaf2a452c
Part 9: Add a crashtest for navigating remote page to about:blank, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/0d22ad297b19
Part 10: Update annotation for iframe-inheritance-about-blank.html test, r=mattwoodrow

After spending too long trying to find out why test testing/web-platform/tests/html/browsers/browsing-the-web/navigating-across-documents/010.html has been timing out (and with the help of my teammates), I figured I'd type up an explanation here, for posterity.

Roughly, the test is supposed to do the following:

  • click on the <a> element (targetting iframe test), which has an onclick handler that will submit a form causing (a) to execute, and an href attribute, set to load href.html, where the JS code will do (b).
  • (a) Note that form's action is javascript: URI and the form is targetting iframe test. The JS code opens an XHR request for blank.html?pipe=trickle(d2), and is expecting a delay of 2 seconds from the server before the response is sent back. If the XHR completes before the test has ended (fail case scenario), post a message to the parent with the value write and do other things.
  • (b) href.html will post a message to the parent with value "href"

The expected scenario is that the XHR will be so slow that href.html will have a chance to be loaded in the iframe before the XHR completes, and thus the message that will be posted to the parent will be sent by script code in href.html.

However, now that non-initial (explicit) about:blank loads are taking place via DocumentChannel, it might take longer for about:blank to load. This results in the following scenario as explained to me by Kris:

  • When the code inside of the form's action URI starts executing, it is being executed inside of the document that has the initial about:blank loaded. Shortly before that, we started loading the explicit about:blank page (happens whenever we have an iframe without a <src> element). The XHR starts, suppress event handling and ends up spinning an event loop.
  • Then we start loading href.html which causes the explicit about:blank load to be cancelled.
  • We end up reusing about:blank's DocumentViewer for href.html load, and because the XHR is associated with the initial about:blank, we don't end up removing it and so it keeps spinning the event loop. If the no-src about:blank load happened faster, the XHR request would have loaded inside of it and when the load for href.html began, we wouldn't have reused the document (because it's not an initial one) thus destroying the document and XHR with it. Removing the XHR would cause the event loop to unblock and to unsupress event handling.

The solution is to make sure we wait for onload event before we execute test steps (i.e. clicking on <a>).

Flags: needinfo?(agakhokidze)
Attachment #9166446 - Attachment description: Bug 1589102 - Part 8: Fix test_browsing_context_structured_clone.js to instead schedule precise gc, r=kmag! → Bug 1589102 - Part 7: Fix test_browsing_context_structured_clone.js to instead schedule precise gc, r=kmag!
Attachment #9101564 - Attachment description: Bug 1589102 - Part 9: Add a crashtest for navigating remote page to about:blank, r=mattwoodrow → Bug 1589102 - Part 8: Add a crashtest for navigating remote page to about:blank, r=mattwoodrow
Attachment #9166447 - Attachment description: Bug 1589102 - Part 10: Update annotation for iframe-inheritance-about-blank.html test, r=mattwoodrow → Bug 1589102 - Part 9: Update annotation for iframe-inheritance-about-blank.html test, r=mattwoodrow
Attachment #9170469 - Attachment description: Bug 1589102 - Part 11: Fix more races in tests that open about:blank pages, r=mattwoodrow → Bug 1589102 - Part 10: Fix tests failing due to not waiting for the window load events, r=mattwoodrow

The race occurs when the parent changes the owner process for a BC, but the
child does not know about it and proceeds to call SetCurrentInnerWindowId on a
BC it no longer owns. To fix this, in child process, whenever we call
SetCurrentInnerWindowId on a BC, check that the BC is in process and that the
docshell has not been notified about an upcoming process change.

Depends on D87933

Attachment #9166445 - Attachment is obsolete: true
Attachment #9170470 - Attachment description: Bug 1589102 - Part 12: Make navigating-across-documents/010.html wait for load before running test steps, r=nika! → Bug 1589102 - Part 11: Make navigating-across-documents/010.html wait for load before running test steps, r=nika!
Attachment #9170470 - Attachment is obsolete: true
Depends on: 1661332
Attachment #9172559 - Attachment description: Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py, r=jgraham! → Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py
Attachment #9172559 - Attachment description: Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py → Bug 1589102 - Part 17: Don't set chrome context in Marionette reftest unit tests.
Attachment #9172559 - Attachment description: Bug 1589102 - Part 17: Don't set chrome context in Marionette reftest unit tests. → Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py, r=jgraham!
Attachment #9172559 - Attachment description: Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py, r=jgraham! → Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py
Attachment #9172559 - Attachment description: Bug 1589102 - Part 17: Don't manually set driver context in tests/unit/test_reftest.py → Bug 1589102 - Part 17: Don't set chrome context in Marionette reftest unit tests
Pushed by agakhokidze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6064f8676546
Part 1: Enable about:srcdoc loads via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/cc5151920b08
Part 2: Fix tests that were broken as a result of about:srcdoc going via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/20664562b914
Part 3: Enable about:blank loads to take place via DocumentChannel, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/e3be5dbce78d
Part 4: Allow (de)serialization of nested about: uris, r=kmag
https://hg.mozilla.org/integration/autoland/rev/baebe0172124
Part 5: Fix tests that relied on about:blank loads happening instantenously, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/60305abadb0b
Part 6: Do not load more than 2 nested frames in browser_browsingContext-01.js test, r=farre
https://hg.mozilla.org/integration/autoland/rev/79d25a5c4998
Part 7: Fix test_browsing_context_structured_clone.js to instead schedule precise gc, r=kmag
https://hg.mozilla.org/integration/autoland/rev/a8992824e237
Part 8: Add a crashtest for navigating remote page to about:blank, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/25825439eae1
Part 9: Update annotation for iframe-inheritance-about-blank.html test, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/32418f54f50d
Part 10: Fix tests failing due to not waiting for the window load events, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/a8edf2857458
Part 11: Make navigating-across-documents/010.html wait for load before running test steps, r=nika
https://hg.mozilla.org/integration/autoland/rev/4f885480d348
Part 12: Fix file_scrollRestoration.html to wait for about:blank to load before proceeding, r=peterv
https://hg.mozilla.org/integration/autoland/rev/ff23c8d322ef
Part 13: Fix a race in process switching code due to reftest.js forcing a process switch, r=kmag
https://hg.mozilla.org/integration/autoland/rev/7f1db08be307
Part 14: Unskip test test_window_cross_origin_props.html and mark as failing, r=hsivonen
https://hg.mozilla.org/integration/autoland/rev/a617c624c24f
Part 15: Make test_continuous_wheel_events.html wait for load event, r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/cbf3b7ea7990
Part 16: Fix devtools/sharedbrowser_saveHeapSnapshot_e10s_01.js to wait for the browser to load, r=kmag
https://hg.mozilla.org/integration/autoland/rev/e763b6e09a49
Part 17: Don't set chrome context in Marionette reftest unit tests r=marionette-reviewers,whimboo
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/25276 for changes under testing/web-platform/tests
Upstream PR merged by moz-wptsync-bot
Regressions: 1661806
Regressions: 1661833
Regressions: 1663238
You need to log in before you can comment on or make changes to this bug.