Open Bug 1476101 Opened 5 years ago Updated 8 months ago

Optimize Windows sandbox policy construction

Categories

(Core :: Security: Process Sandboxing, enhancement, P2)

x86_64
Windows
enhancement

Tracking

()

Performance Impact medium

People

(Reporter: jld, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf:responsiveness, perf:startup)

Attachments

(2 files, 1 obsolete file)

I profiled content process launch on Windows, and found that we spend about half the total ContentParent::LaunchSubprocess time constructing the sandbox policy: https://perfht.ml/2zM88PS

Specifically, we spend a lot of time adding the filesystem rules, and more specifically in sandbox::IsReparsePoint and sandbox::ConvertToLongPath.

But the policy is mostly (or entirely?) the same for every process, so ideally we could just create the common policy once and copy it per-process like we did on B2G, or maybe cache that work some other way.


Data-driven warning: We don't have telemetry data on what portion of launch time this is for the median user.  The CONTENT_PROCESS_LAUNCH_TIME_MS metric has some bugs (it's collecting times from two different points in the process), but I'm seeing a median around 8ms on the system where I profiled that vs. 34ms on Beta, so that profile isn't exactly representative.  Bug 1474991 would help with this.
Priority: -- → P2
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1:f67]

Hey bobowen, I hear you work on the Windows sandbox! :) What are the chances this can be prioritized sometime this quarter?

Flags: needinfo?(bobowencode)
Whiteboard: [qf:p1:f67] → [qf:p2:responsiveness][fxperf]

After reviewing the sandbox policy code, I don't see an obvious reason why we can't share them (when the policy settings are the same of course). In fact the policy code itself appears to have been created with that in mind.
Some of the other chromium sandbox code that uses it pretty much assumes a single process per policy, but it doesn't look like it will require big changes.

I'll try and take a stab at a patch soon.
Leaving the NI to nag me.

(In reply to Bob Owen (:bobowen) from comment #2)

I'll try and take a stab at a patch soon.
Leaving the NI to nag me.

Any progress to report here, bobowen? Can we assign this one to you?

Whiteboard: [qf:p2:responsiveness][fxperf] → [qf:p2:responsiveness][fxperf:p2]

I'm not actively working on this, but I will try to when I get a chance.

Things aren't quite so simple as I first thought in comment 2, as there is some information in the policy that changes per process, but it doesn't look too bad.

Hello, I'm an Outreachy applicant and I will start working on this bug.

See Also: → 1541246
Assignee: nobody → emalysz

Proposed plan:

  1. Change policy by splitting out dynamic parts and passing those by another means to SpawnTarget
  2. Check use of policy object and avoid any sharing problems.
  3. Let SandboxBroker hold 1 common process

To be clear, this ought to have no impact on startup time, yes? As we'll still have to do the work for the first content process, we'll just be avoiding it for subsequent launches? I imagine there's no way we have in mind for reducing this time for the first content process launch of a session, correct?

Do we know how often UI changes are blocked on content process creation, vs when we are able to complete process creation in the background?

Attachment #9091594 - Attachment is obsolete: true

(In reply to Doug Thayer [:dthayer] from comment #9)

To be clear, this ought to have no impact on startup time, yes? As we'll still have to do the work for the first content process, we'll just be avoiding it for subsequent launches? I imagine there's no way we have in mind for reducing this time for the first content process launch of a session, correct?

Hm, but if we can share some of the policy between normal content processes, the privileged content process, the gpu process and the networking process, it might benefit startup...

(In reply to :Gijs (he/him) from comment #11)

(In reply to Doug Thayer [:dthayer] from comment #9)

To be clear, this ought to have no impact on startup time, yes? As we'll still have to do the work for the first content process, we'll just be avoiding it for subsequent launches? I imagine there's no way we have in mind for reducing this time for the first content process launch of a session, correct?

Hm, but if we can share some of the policy between normal content processes, the privileged content process, the gpu process and the networking process, it might benefit startup...

The only sharing that can be done here (at least with the planned changes) is between the normal content processes and the privileged content process (even this might cause problems, in the future).
Given that some of these are changes to the chromium sandbox code, I've asked if we can quantify that benefit, once we have a working prototype.
(Ideally we would get changes upstreamed, but that goes for quite a lot of the changes we already carry. As always it's a matter of priorities.)

The GPU process isn't sandbox yet, but hopefully will be soon, but the policy will be different.
The networking process is not sandboxed yet, but again the policy will be different.

Flags: needinfo?(bobowencode)

emalysz - before we go through another round of reviews, I'm assuming you have a working version now. Does this seem to be actually giving us the performance benefits we were hoping for?

Flags: needinfo?(emalysz)

We're currently unable to get a symbolicated profile due to Bug 1594097, so we cannot demonstrate performance benefits (if any) until that is resolved

Flags: needinfo?(emalysz)

bobowen- I'll comment profiles once I figure out how to track the sandbox code. But for right now, I added markers within ContentParent.cpp LaunchSubprocessInternal() and saw that the second and third time the marker is hit it is reduced from ~14 ms to ~7ms.
Without patch: https://perfht.ml/33QQtRM
With patch: https://perfht.ml/2QjCaRF

I'm having trouble profiling the "IPCLaunchThread" (something about naming), but here are preliminary results.

Flags: needinfo?(bobowencode)

(In reply to Emma Malysz from comment #15)

bobowen- I'll comment profiles once I figure out how to track the sandbox code. But for right now, I added markers within ContentParent.cpp LaunchSubprocessInternal() and saw that the second and third time the marker is hit it is reduced from ~14 ms to ~7ms.

Great work, I'm on PTO for some of Monday, so I'll try to get to the patch on Tuesday.
As has been mentioned this might not improve initial start-up, because that might rely more on the first process start.
It may well help with fission though, particularly when restoring multiple tabs.

However, I've been thinking about initial start-up and as the IPC Launch is single threaded on Windows, we could possibly post a task to that thread after the GPU and socket threads have started, to create the policy ready for the first content process launch.
Something for a follow-up though.

Without patch: https://perfht.ml/33QQtRM
With patch: https://perfht.ml/2QjCaRF

I'm having trouble profiling the "IPCLaunchThread" (something about naming), but here are preliminary results.

Looks like the name of that thread is "IPC Launch" [1].
I think it is the name that the profiler needs in the settings.

[1] https://searchfox.org/mozilla-central/rev/cac340f10816707e91c46db6d285f80259420f07/ipc/glue/GeckoChildProcessHost.cpp#856

Flags: needinfo?(bobowencode)

Here are two more profiles, this time gathered with the IPC Launch Thread:
with patch: http://bit.ly/37laVvZ
without patch: http://bit.ly/2r2zrkY

In both cases, I loaded two tabs. With the patch, it has reduced the time spent from 17 ms to 4 ms in SetSecurityLevelForContentProcess.
The marker set in this method showed a small decrease (7.2 to 6.9) on the first instance, but the subsequent look ups were significantly reduced: 7.5-10ms to 0.

The sandbox team has a few concerns with the current implementation and wants to investigate policy interactions more to see if this is still a viable option. :bobowen also suggested that we could cache the reparse point lookup, as this is what causes the performance issue. This eliminates the risk of the policy and use of it not being thread-safe.

There also can be performance wins when the policy is first constructed if file paths are similar in the rules.

Assignee: emalysz → nobody

Just to bring things up to speed in this bug.
After discussions with the chromium sandbox team, it looks like we can probably remove the checks that cause a fair amount of the I/O during sandbox setup. (See https://phabricator.services.mozilla.com/D112285#3844519)
There are some complications to getting that working for symlinks (bug 1695556), although for some of the possible performance benefits, we might just be able to remove the check.

However after very quick testing of this, the performance benefits of this alone are possibly somewhat masked by other mitigations we have in place, like the pre-allocated process. It looks like it might be a bit more noticeable for fission.

I hope to be looking at getting these changes made this half.

Performance Impact: --- → P2
Whiteboard: [qf:p2:responsiveness][fxperf:p2] → [fxperf:p2]
Keywords: perf:startup
Whiteboard: [fxperf:p2]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.