Closed Bug 1650089 Opened 4 years ago Closed 3 years ago

Fix process allocation for null principal

Categories

(Core :: DOM: Navigation, task, P3)

task

Tracking

()

RESOLVED FIXED
93 Branch
Fission Milestone M8
Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- fixed

People

(Reporter: neha, Assigned: nika)

References

(Blocks 2 open bugs)

Details

(Keywords: perf-alert, Whiteboard: fission-hard-blocker)

Attachments

(9 files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

From our Fission meeting discussions:
We currently use a random process for null principal. We should instead allocate the process after DocumentChannel is done resolving the final principal.

Blocks: fission
Severity: -- → N/A
Type: defect → task
Fission Milestone: --- → M6b
Priority: -- → P3

I think it is usually written as "null principal".

Summary: Fix process allocation for NULL principal → Fix process allocation for null principal
Fission Milestone: M6b → M6c

Doesn't need to block Fission Nightly. Tracking for our Fission Beta experiment (M7).

Fission Milestone: M6c → M7
Depends on: 1671983
Fission Milestone: M7 → MVP

We should do this before starting Release experiment in M8.

Assignee: nobody → nika
Fission Milestone: MVP → M8

I had some conversations with :annevk about the difficulty of deciding on a process selection strategy for null principals on Slack earlier today. Leaving a ni? for any further thoughts on the issues next week.

One issue is that we likely cannot afford to mint a new process for every single null principal. We probably need to do some sort of strategy for collecting these processes together. This matches what Chrome does already where they appear to track the true site origins of null principals over time, so that when they're loaded in the future, they're loaded within the site which created them. For something like a data: URI, this would be the site which initiated the load (TriggeringPrincipal here), and for something like a sandboxed frame, it would be the unsandboxed result principal.

We could try to follow this strategy as well, however it can become quite complicated in some situations, as doing so might require adding tracking to our null principal representation of the origin which created that null principal. We don't do anything like that right now, and it seems like a strange addition to the null principal type, especially when it's only intended to help with tracking process selections. Alternatively, we could do some form of best-effort tracking, and when all else fails use a process fallback.

The most secure option, of course, would be to use a distinct process for every null principal which we load, but that is likely untenable in MVP for memory use and performance reasons.

Flags: needinfo?(annevk)

My current idea is something along the lines of this (ignoring the potential issues around re-using a process for sandboxed loads and unsandboxed loads. This could potentially be adapted to handle that situation by adding an extra sandbox flag which is used during process selection):

  1. If another global already exists in the BCG with this exact null principal, use the same process as that global.
  2. If the load is a sandboxed load:
    • If the unsandboxed principal is a content principal, use that principal to perform process selection.
    • If another global exists in the BCG with the unsandboxed principal, use the same process as that global.
  3. If the load is a data URI load:
    • If the principal to be inherited (as-if it was about:blank) was a content principal, use that principal to perform process selection.
    • If another global exists in the BCG with the principal to be inherited, used the same process as that global.
  4. If none of the previous steps successfully found a process to perform the load in, perform the load in a web content process.

I think this should catch most cases, but will miss some such as history navigations to documents created by other documents with null principals. In those cases, the creator document may have been destroyed and so looking up its' null principal for process selection would fail, and we'd end up in the catchall web content process. We could include details like process selection decisions in history entries to work around this in that specific case, but I wouldn't want to persist this information in sessionstore, as I would prefer not to stabilize process selection decisions or the remote type format.

This also has an odd edge case if somehow two loads with the exact same null principal are triggered at the same time with different triggering principals, where there is a short time between the process selection decision and the document being created during which we could make a different process selection decision for the second null principal. I can't think of a way this would happen in the wild though, and the worst case scenario is just that these two documents cannot communicate when they should be able to.

Can the final fallback not be a process for the null principal in question? It seems quite easy to end up in 4.

The one alternative I came up with that I'd like us to consider (and maybe urge Chrome to also adopt) is that each site has a parallel sandboxed site that we use for null principals. So https://example.com gets a process and if https://example.com ended up sandboxed or did a data URL load, those would end up in the sandboxed https://example.com process instead. While still not ideal, it seems a lot better than all of that ending up in the same process.

Having some telemetry available here would also be helpful.

Flags: needinfo?(annevk)

(In reply to Anne (:annevk) from comment #6)

Can the final fallback not be a process for the null principal in question? It seems quite easy to end up in 4.

Possibly. There's a risk of ending up with an absurd number of processes if we end up in 4 too often, but it might be OK? Perhaps it can be added as a tweakable preference.

The one alternative I came up with that I'd like us to consider (and maybe urge Chrome to also adopt) is that each site has a parallel sandboxed site that we use for null principals. So https://example.com gets a process and if https://example.com ended up sandboxed or did a data URL load, those would end up in the sandboxed https://example.com process instead. While still not ideal, it seems a lot better than all of that ending up in the same process.

Yeah, this is what the future steps around sandboxed loads would probably end up looking like for process selection. I think adding support for that is a follow-up issue though, so I'm not focusing on it right now.

Blocks: 1709660
Depends on: 1715167

After the changes in this bug, about:blank loads triggered by chrome will
finish in a "web" content process, as they have an untrusted null principal
without a precursor. In a few places throughout the codebase, however, we
perform about:blank loads with the explicit expectation that they do not change
processes. This new remoteTypeOverride option allows the intended final process
to be explicitly specified in this situation.

For security & simplicity reasons, this new attribute is limited to only be
usable on system-principal triggered loads of about:blank in toplevel browsing
contexts.

Depends on D120670

There are races which are more common after these patches where an implicit
about:blank load races with a speculative parent process load's process switch.
In this situation, bad behaviour can result as we process a navigation started
by a process which we process-switched away from. By tracking the explicit
ContentParent which is making the DocumentLoadListener request, we can catch
situations like this and avoid navigations being started from the wrong
processes.

Depends on D120671

This is a large refactoring of the DocumentChannel process switch codepath,
with the end goal of being better able to support future process switch
requirements such as dynamic isolation on android, as well as the immediate
requirement of null principal handling.

The major changes include:

  1. The logic is in C++ and has less failure cases, meaning it should be harder
    for us to error out unexpectedly and not process switch.
  2. Process selection decisions are more explicit, and tend to rely less on
    state such as the current remoteType when possible. This makes reasoning
    about where a specific load will complete easier.
  3. Additional checks are made after a "WebContent" behavior is selected to
    ensure that if an existing document in the same BCG is found, the load will
    finish in the required content process. This should make dynamic checks such
    as Android's logged-in site isolation easier to implement.
  4. ProcessIsolation logging is split out from DocumentChannel so that it's
    easier to log just the information related to process selection when
    debugging.
  5. Null result principal precursors are considered when performing process
    selection.

Other uses of E10SUtils for process selection have not yet been migrated to the
new design as they have slightly different requirements. This will be done in
follow-up bugs.

Depends on D120672

The changes in the previous part had a few behaviour changes which are visible
in tests, including cross-origin iframes with sandboxed origins now loading
remotely, and process selection for chrome-triggered null principal loads
behaving differently. In general this caused more process switches.

Depends on D120673

This load was causing a process switch after the changes in this patch.
This caused the view-source load to also process-switch back when loaded
and confused some view-source machinery. The load isn't necessary, and
if skipped the process switch due to navigation will not occur.

Depends on D120674

This bug is a hard blocker for Fission M8.

Whiteboard: fission-hard-blocker
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8bb303f6831a
Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/d9eeca699dec
Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/c5d267a1907c
Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/5ae2b2641484
Part 4: Update various tests with new expectations, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/26ddad079ad3
Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs

This change makes all browsers which were not created with an initial remote
attribute within a non-useRemoteTabs window be unable to process-switch, as
otherwise we may attempt to switch loads into a content process. We need to
keep process switching enabled for explicitly-remote browsers loaded in a
non-useRemoteTabs window as it's relied on for tests and can lead to
assertion failures due to loading remote content in the parent process.

Depends on D120736

Flags: needinfo?(nika)
Regressions: 1723688
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/54d12a4333a0
Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/dbdee40ef8a0
Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/091c4efa36a7
Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/ee61b69ba556
Part 4: Update various tests with new expectations, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/ca9ba60010c6
Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/092451e931ce
Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/4e73beb8872c
Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/32e207558b3d
Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/a26afdc56d91
Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/37e5185dae14
Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/8fc2f428694d
Part 4: Update various tests with new expectations, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/c470e4c65117
Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/283ba29cdbeb
Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/336d6eb2fc15
Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
Flags: needinfo?(nika)
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/5203de061348
Port bug 1650089 - Use NavigationIsolationOptions instead of RemotenessChangeOptions. rs=bustage-fix
Backout by malexandru@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/07984fd9d7e6
Backed out 7 changesets for causing xpcshell failures in test_ext_cookieBehaviors.js
Status: RESOLVED → REOPENED
Flags: needinfo?(nika)
Resolution: FIXED → ---
Target Milestone: 92 Branch → ---

(In reply to Pulsebot from comment #14)

Pushed by nlayzell@mozilla.com:
...
https://hg.mozilla.org/integration/autoland/rev/26ddad079ad3
Part 5: Skip the unnecessary about:blank load when loading view-source for a
document, r=Gijs

== Change summary for alert #30773 (as of Wed, 04 Aug 2021 10:31:45 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
19% welcome loadtime macosx1015-64-shippable-qr warm webrender 21.69 -> 25.79
17% welcome loadtime macosx1015-64-shippable-qr warm webrender 21.67 -> 25.29

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30773

(In reply to Dorel Luca [:dluca] from comment #15)

...
Backout:
https://hg.mozilla.org/integration/autoland/rev/bbc5bf72442512d19c1bbe37fe3e5a6ccdec92df

== Change summary for alert #30746 (as of Mon, 02 Aug 2021 11:44:20 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
16% welcome loadtime macosx1015-64-shippable-qr warm webrender 25.79 -> 21.58

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30746

Without this change, there was a race where the promiseBrowserLoaded promise
from the loadURL call could return early due to observing the load completing
for the initial about:blank document, rather than for the second explicit
about:blank load, leading to intermittent timeouts due to loads interrupting
one-another. This change instead makes loadContentPage wait for the initial
frame src about:blank load if called with "about:blank".

Depends on D121286

Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e870508c1ddc
Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/78c012d4b071
Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/a7a4f37a5d72
Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/1b8b4e939e82
Part 4: Update various tests with new expectations, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/24a5bd7f97dd
Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/bcdd2f5c9840
Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/24d7898ec4bd
Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
https://hg.mozilla.org/integration/autoland/rev/ee763318d378
Part 8: Avoid race when calling loadContentPage with about:blank, r=kmag
Flags: needinfo?(nika)
Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/347bfa0c1d6c
Part 1: Add a remoteTypeOverride option for about:blank loads triggered by chrome, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/4630cee1d75b
Part 2: Track which ContentParent is used to create a DocumentLoadListener, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/7eab4924b217
Part 3: Rework DocumentChannel-triggered process switches to support null principals, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/6d77efa0db2a
Part 4: Update various tests with new expectations, r=annyG,kmag
https://hg.mozilla.org/integration/autoland/rev/c7d3ded0379b
Part 5: Skip the unnecessary about:blank load when loading view-source for a document, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/600306a230c7
Part 6: Don't set maychangeremoteness on non-e10s browsers, r=Gijs
https://hg.mozilla.org/integration/autoland/rev/0f6db246705a
Part 7: Avoid process-switching in devtools test to keep it passing, r=nchevobbe
https://hg.mozilla.org/integration/autoland/rev/a1de3e90c71f
Part 8: Avoid race when calling loadContentPage with about:blank, r=kmag
https://hg.mozilla.org/integration/autoland/rev/a7bfb6ebedc5
Part 9: Report errors back to caller when messaging from outside of a GeckoView controlled window, r=agi
Flags: needinfo?(nika)
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/6364a72f4dc8
Port bug 1650089 - Use NavigationIsolationOptions instead of RemotenessChangeOptions. rs=bustage-fix
No longer regressions: 1723688
Regressions: 1725270
Regressions: 1725405

Setting status-firefox92=wontfix because we don't want to uplift this fix to Beta 92. The fix is risky and is not a user-visible behavior change.

Regressions: 1755490
Whiteboard: fission-hard-blocker → fission-hard-blocker [sp3]
Whiteboard: fission-hard-blocker [sp3] → fission-hard-blocker
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: