Fix process allocation for null principal
Categories
(Core :: DOM: Navigation, task, P3)
Tracking
()
People
(Reporter: neha, Assigned: nika)
References
(Blocks 2 open bugs)
Details
(Keywords: perf-alert, Whiteboard: fission-hard-blocker)
Attachments
(9 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
From our Fission meeting discussions:
We currently use a random process for null principal. We should instead allocate the process after DocumentChannel is done resolving the final principal.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
I think it is usually written as "null principal".
Reporter | ||
Updated•4 years ago
|
Comment 2•4 years ago
|
||
Doesn't need to block Fission Nightly. Tracking for our Fission Beta experiment (M7).
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 3•3 years ago
|
||
We should do this before starting Release experiment in M8.
Assignee | ||
Comment 4•3 years ago
|
||
I had some conversations with :annevk about the difficulty of deciding on a process selection strategy for null principals on Slack earlier today. Leaving a ni? for any further thoughts on the issues next week.
One issue is that we likely cannot afford to mint a new process for every single null principal. We probably need to do some sort of strategy for collecting these processes together. This matches what Chrome does already where they appear to track the true site origins of null principals over time, so that when they're loaded in the future, they're loaded within the site which created them. For something like a data:
URI, this would be the site which initiated the load (TriggeringPrincipal
here), and for something like a sandboxed frame, it would be the unsandboxed result principal.
We could try to follow this strategy as well, however it can become quite complicated in some situations, as doing so might require adding tracking to our null principal representation of the origin which created that null principal. We don't do anything like that right now, and it seems like a strange addition to the null principal type, especially when it's only intended to help with tracking process selections. Alternatively, we could do some form of best-effort tracking, and when all else fails use a process fallback.
The most secure option, of course, would be to use a distinct process for every null principal which we load, but that is likely untenable in MVP for memory use and performance reasons.
Assignee | ||
Comment 5•3 years ago
|
||
My current idea is something along the lines of this (ignoring the potential issues around re-using a process for sandboxed loads and unsandboxed loads. This could potentially be adapted to handle that situation by adding an extra sandbox flag which is used during process selection):
- If another global already exists in the BCG with this exact null principal, use the same process as that global.
- If the load is a sandboxed load:
- If the unsandboxed principal is a content principal, use that principal to perform process selection.
- If another global exists in the BCG with the unsandboxed principal, use the same process as that global.
- If the load is a data URI load:
- If the principal to be inherited (as-if it was
about:blank
) was a content principal, use that principal to perform process selection. - If another global exists in the BCG with the principal to be inherited, used the same process as that global.
- If the principal to be inherited (as-if it was
- If none of the previous steps successfully found a process to perform the load in, perform the load in a
web
content process.
I think this should catch most cases, but will miss some such as history navigations to documents created by other documents with null principals. In those cases, the creator document may have been destroyed and so looking up its' null principal for process selection would fail, and we'd end up in the catchall web
content process. We could include details like process selection decisions in history entries to work around this in that specific case, but I wouldn't want to persist this information in sessionstore, as I would prefer not to stabilize process selection decisions or the remote type format.
This also has an odd edge case if somehow two loads with the exact same null principal are triggered at the same time with different triggering principals, where there is a short time between the process selection decision and the document being created during which we could make a different process selection decision for the second null principal. I can't think of a way this would happen in the wild though, and the worst case scenario is just that these two documents cannot communicate when they should be able to.
Comment 6•3 years ago
|
||
Can the final fallback not be a process for the null principal in question? It seems quite easy to end up in 4.
The one alternative I came up with that I'd like us to consider (and maybe urge Chrome to also adopt) is that each site has a parallel sandboxed site that we use for null principals. So https://example.com
gets a process and if https://example.com
ended up sandboxed or did a data URL load, those would end up in the sandboxed https://example.com
process instead. While still not ideal, it seems a lot better than all of that ending up in the same process.
Having some telemetry available here would also be helpful.
Assignee | ||
Comment 7•3 years ago
|
||
(In reply to Anne (:annevk) from comment #6)
Can the final fallback not be a process for the null principal in question? It seems quite easy to end up in 4.
Possibly. There's a risk of ending up with an absurd number of processes if we end up in 4 too often, but it might be OK? Perhaps it can be added as a tweakable preference.
The one alternative I came up with that I'd like us to consider (and maybe urge Chrome to also adopt) is that each site has a parallel sandboxed site that we use for null principals. So
https://example.com
gets a process and ifhttps://example.com
ended up sandboxed or did a data URL load, those would end up in thesandboxed https://example.com
process instead. While still not ideal, it seems a lot better than all of that ending up in the same process.
Yeah, this is what the future steps around sandboxed loads would probably end up looking like for process selection. I think adding support for that is a follow-up issue though, so I'm not focusing on it right now.
Assignee | ||
Comment 8•3 years ago
|
||
After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.
For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.
Depends on D120670
Assignee | ||
Comment 9•3 years ago
|
||
There are races which are more common after these patches where an implicit
about:blank load races with a speculative parent process load's process switch.
In this situation, bad behaviour can result as we process a navigation started
by a process which we process-switched away from. By tracking the explicit
ContentParent which is making the DocumentLoadListener request, we can catch
situations like this and avoid navigations being started from the wrong
processes.
Depends on D120671
Assignee | ||
Comment 10•3 years ago
|
||
This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.
The major changes include:
- The logic is in C++ and has less failure cases, meaning it should be harder
for us to error out unexpectedly and not process switch. - Process selection decisions are more explicit, and tend to rely less on
state such as the current remoteType when possible. This makes reasoning
about where a specific load will complete easier. - Additional checks are made after a "WebContent" behavior is selected to
ensure that if an existing document in the same BCG is found, the load will
finish in the required content process. This should make dynamic checks such
as Android's logged-in site isolation easier to implement. - ProcessIsolation logging is split out from DocumentChannel so that it's
easier to log just the information related to process selection when
debugging. - Null result principal precursors are considered when performing process
selection.
Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.
Depends on D120672
Assignee | ||
Comment 11•3 years ago
|
||
The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.
Depends on D120673
Assignee | ||
Comment 12•3 years ago
|
||
This load was causing a process switch after the changes in this patch.
This caused the view-source load to also process-switch back when loaded
and confused some view-source machinery. The load isn't necessary, and
if skipped the process switch due to navigation will not occur.
Depends on D120674
Comment 13•3 years ago
|
||
This bug is a hard blocker for Fission M8.
Comment 14•3 years ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8bb303f6831a Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/d9eeca699dec Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/c5d267a1907c Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/5ae2b2641484 Part 4: Update various tests with new expectations, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/26ddad079ad3 Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs
Comment 15•3 years ago
•
|
||
Backed out 5 changesets (bug 1650089) for Browser-chrome failures in browser/components/originattributes/test/browser/browser_firstPartyIsolation_aboutPages.js. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer?job_id=346677880&repo=autoland&lineNumber=2242
https://treeherder.mozilla.org/logviewer?job_id=346677486&repo=autoland&lineNumber=10632
Push with failures:
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&revision=26ddad079ad388748a19574ee4e28e51c2c7bec8
Backout:
https://hg.mozilla.org/integration/autoland/rev/bbc5bf72442512d19c1bbe37fe3e5a6ccdec92df
Assignee | ||
Comment 16•3 years ago
|
||
This change makes all browsers which were not created with an initial remote
attribute within a non-useRemoteTabs
window be unable to process-switch, as
otherwise we may attempt to switch loads into a content process. We need to
keep process switching enabled for explicitly-remote browsers loaded in a
non-useRemoteTabs
window as it's relied on for tests and can lead to
assertion failures due to loading remote content in the parent process.
Depends on D120736
Assignee | ||
Comment 17•3 years ago
|
||
Depends on D121285
Assignee | ||
Updated•3 years ago
|
Comment 18•3 years ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/54d12a4333a0 Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/dbdee40ef8a0 Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/091c4efa36a7 Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/ee61b69ba556 Part 4: Update various tests with new expectations, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/ca9ba60010c6 Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs https://hg.mozilla.org/integration/autoland/rev/092451e931ce Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs https://hg.mozilla.org/integration/autoland/rev/4e73beb8872c Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
Comment 19•3 years ago
|
||
Backed out 7 changesets (Bug 1650089) foe causing bustages in ProcessIsolation.cpp
Log: https://treeherder.mozilla.org/logviewer?job_id=347272105&repo=autoland&lineNumber=33323
Backout: https://hg.mozilla.org/integration/autoland/rev/78fc766e179ac5667a233554d259f6c9d1986c10
Comment 20•3 years ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/32e207558b3d Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/a26afdc56d91 Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/37e5185dae14 Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/8fc2f428694d Part 4: Update various tests with new expectations, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/c470e4c65117 Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs https://hg.mozilla.org/integration/autoland/rev/283ba29cdbeb Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs https://hg.mozilla.org/integration/autoland/rev/336d6eb2fc15 Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
Assignee | ||
Updated•3 years ago
|
Comment 21•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/32e207558b3d
https://hg.mozilla.org/mozilla-central/rev/a26afdc56d91
https://hg.mozilla.org/mozilla-central/rev/37e5185dae14
https://hg.mozilla.org/mozilla-central/rev/8fc2f428694d
https://hg.mozilla.org/mozilla-central/rev/c470e4c65117
https://hg.mozilla.org/mozilla-central/rev/283ba29cdbeb
https://hg.mozilla.org/mozilla-central/rev/336d6eb2fc15
Comment 22•3 years ago
|
||
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/5203de061348 Port bug 1650089 - Use NavigationIsolationOptions instead of RemotenessChangeOptions. rs=bustage-fix
Comment 23•3 years ago
|
||
Backout by malexandru@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/07984fd9d7e6 Backed out 7 changesets for causing xpcshell failures in test_ext_cookieBehaviors.js
Comment 24•3 years ago
•
|
||
Backed out 7 changesets (Bug 1650089) for causing xpcshell failures in test_ext_cookieBehaviors.js
Backout links:
https://hg.mozilla.org/mozilla-central/rev/07984fd9d7e67f3fc1daf8e31c738e45e3d945cc
https://hg.mozilla.org/integration/autoland/rev/07984fd9d7e67f3fc1daf8e31c738e45e3d945cc
Push with failures, failure log.
Comment 25•3 years ago
|
||
(In reply to Pulsebot from comment #14)
Pushed by nlayzell@mozilla.com:
...
https://hg.mozilla.org/integration/autoland/rev/26ddad079ad3
Part 5: Skip the unnecessary about:blank load when loading view-source for a
document, r=Gijs
== Change summary for alert #30773 (as of Wed, 04 Aug 2021 10:31:45 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
19% | welcome | loadtime | macosx1015-64-shippable-qr | warm webrender | 21.69 -> 25.79 |
17% | welcome | loadtime | macosx1015-64-shippable-qr | warm webrender | 21.67 -> 25.29 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30773
Comment 26•3 years ago
|
||
(In reply to Dorel Luca [:dluca] from comment #15)
...
Backout:
https://hg.mozilla.org/integration/autoland/rev/bbc5bf72442512d19c1bbe37fe3e5a6ccdec92df
== Change summary for alert #30746 (as of Mon, 02 Aug 2021 11:44:20 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
16% | welcome | loadtime | macosx1015-64-shippable-qr | warm webrender | 25.79 -> 21.58 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30746
Assignee | ||
Comment 27•3 years ago
|
||
Without this change, there was a race where the promiseBrowserLoaded promise
from the loadURL call could return early due to observing the load completing
for the initial about:blank document, rather than for the second explicit
about:blank load, leading to intermittent timeouts due to loads interrupting
one-another. This change instead makes loadContentPage wait for the initial
frame src about:blank load if called with "about:blank".
Depends on D121286
Comment 28•3 years ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e870508c1ddc Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/78c012d4b071 Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/a7a4f37a5d72 Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/1b8b4e939e82 Part 4: Update various tests with new expectations, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/24a5bd7f97dd Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs https://hg.mozilla.org/integration/autoland/rev/bcdd2f5c9840 Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs https://hg.mozilla.org/integration/autoland/rev/24d7898ec4bd Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe https://hg.mozilla.org/integration/autoland/rev/ee763318d378 Part 8: Avoid race when calling loadContentPage with about:blank, r=kmag
Assignee | ||
Updated•3 years ago
|
Comment 29•3 years ago
|
||
Backed out for causing Xpcshell failures on test_ext_redirects.js.
Backout link: https://hg.mozilla.org/integration/autoland/rev/c8c5ee96321f28bab4804ccd2ae5015c920280fd
Failure log: https://treeherder.mozilla.org/logviewer?job_id=347594226&repo=autoland&lineNumber=5819
Assignee | ||
Comment 30•3 years ago
|
||
Depends on D121778
Comment 31•3 years ago
|
||
Pushed by nlayzell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/347bfa0c1d6c Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/4630cee1d75b Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/7eab4924b217 Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/6d77efa0db2a Part 4: Update various tests with new expectations, r=annyG,kmag https://hg.mozilla.org/integration/autoland/rev/c7d3ded0379b Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs https://hg.mozilla.org/integration/autoland/rev/600306a230c7 Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs https://hg.mozilla.org/integration/autoland/rev/0f6db246705a Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe https://hg.mozilla.org/integration/autoland/rev/a1de3e90c71f Part 8: Avoid race when calling loadContentPage with about:blank, r=kmag https://hg.mozilla.org/integration/autoland/rev/a7bfb6ebedc5 Part 9: Report errors back to caller when messaging from outside of a GeckoView controlled window, r=agi
Assignee | ||
Updated•3 years ago
|
Comment 32•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/347bfa0c1d6c
https://hg.mozilla.org/mozilla-central/rev/4630cee1d75b
https://hg.mozilla.org/mozilla-central/rev/7eab4924b217
https://hg.mozilla.org/mozilla-central/rev/6d77efa0db2a
https://hg.mozilla.org/mozilla-central/rev/c7d3ded0379b
https://hg.mozilla.org/mozilla-central/rev/600306a230c7
https://hg.mozilla.org/mozilla-central/rev/0f6db246705a
https://hg.mozilla.org/mozilla-central/rev/a1de3e90c71f
https://hg.mozilla.org/mozilla-central/rev/a7bfb6ebedc5
Comment 33•3 years ago
|
||
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/6364a72f4dc8 Port bug 1650089 - Use NavigationIsolationOptions instead of RemotenessChangeOptions. rs=bustage-fix
Updated•3 years ago
|
Comment 35•3 years ago
|
||
Setting status-firefox92=wontfix because we don't want to uplift this fix to Beta 92. The fix is risky and is not a user-visible behavior change.
Updated•1 year ago
|
Updated•1 year ago
|
Description
•