[Mac] Start the content sandbox earlier

RESOLVED FIXED in Firefox 65

Status

()

enhancement
P1
normal
RESOLVED FIXED
2 years ago
6 months ago

People

(Reporter: haik, Assigned: haik)

Tracking

58 Branch
mozilla65
Unspecified
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox65 fixed)

Details

(Whiteboard: sb+)

Attachments

(8 attachments, 2 obsolete attachments)

46 bytes, text/x-phabricator-request
Alex_Gaynor
: review+
Details | Review
46 bytes, text/x-phabricator-request
Alex_Gaynor
: review+
Details | Review
46 bytes, text/x-phabricator-request
Alex_Gaynor
: review+
Details | Review
46 bytes, text/x-phabricator-request
Alex_Gaynor
: review+
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
4.66 MB, text/plain
Details
There's a significant time window between content process startup and when the content sandbox is enabled. One downside to this is that some Mac services that we don't grant access to in our sandbox rules are used before enabling the sandbox and, as a result, are accessible from content processes after the sandbox is enabled. By starting the sandbox earlier, we might be able to 1) reduce the OS services content processes have access to and 2) understand which implicitly allowed services we depend on.
Assignee: nobody → haftandilian
Priority: -- → P1
Whiteboard: sb+
Blocks: 1468163
Duplicate of this bug: 1468163
Depends on D5099
Simplify the content sandbox policy by removing APP_BINARY_PATH and APP_DIR Mac sandbox parameters and their associated rules in the policy. Keep APP_PATH which is a parent directory of APP_BINARY_PATH and APP_DIR.

Depends on D6717
Pass sandbox parameters to content processes on the command
line allowing for early sandbox startup. Limited to Nightly
until confirmed to be stable and ready to ride the trains.

Enable early sandbox startup by default on Nightly and use
pref "security.sandbox.content.mac.earlyinit" to disable
early startup for debugging purposes.

Once early startup is stable, the original sandbox startup
code can be removed.

Depends on D6719
ASSERT that the sandbox is already enabled when receiving the SetProcessSandbox message.

Depends on D6720
Attachment #9010480 - Attachment is obsolete: true
The posted patches start the Mac content process early during process startup using sandbox parameters passed on the command line. I considered using shared memory to pass the parameters, but shared memory is setup much later in process startup and command line parameters let us start the sandbox as early as possible without pulling in any dependencies.

Using lsmp to compare with/without the fix, this stops plugin-container connecting to Dock, coreservicesd, and distnoted.

The changes include the pref "security.sandbox.content.mac.earlyinit" which is set to true by default. This enables early sandbox start for the content process and is meant as a temporary pref to help debug any issues that arise after landing on Nightly. My plan is to remove the pref and the original code paths for starting the content sandbox later after these changes are known to be stable.

I'd like to apply the same approach to the Mac GMP process in a follow-up bug.

The sandbox parameter list built in ContentParent is not cached in this implemenation. I plan to file a follow-up bug to cache the parameters that are constant for the life of the browser instance to make content process startup more efficient.
Comment on attachment 9011623 [details]
Bug 1431441 - Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r=Alex_Gaynor

Alex Gaynor [:Alex_Gaynor] has approved the revision.
Attachment #9011623 - Flags: review+
Comment on attachment 9011626 [details]
Bug 1431441 - Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r=Alex_Gaynor

Alex Gaynor [:Alex_Gaynor] has approved the revision.
Attachment #9011626 - Flags: review+
Only allow access to "com.apple.windowserver.active" when the pref
"security.sandbox.content.mac.disconnect-windowserver" is set to true.

Depends on D6721
Attachment #9011623 - Attachment description: Bug 1431441 - Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r?Alex_Gaynor → Bug 1431441 - Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r=Alex_Gaynor
Attachment #9011626 - Attachment description: Bug 1431441 - Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r?Alex_Gaynor → Bug 1431441 - Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r=Alex_Gaynor
Comment on attachment 9011627 [details]
Bug 1431441 - Part 3 - Start the Mac content sandbox earlier r?Alex_Gaynor

Alex Gaynor [:Alex_Gaynor] has approved the revision.
Attachment #9011627 - Flags: review+
Comment on attachment 9011628 [details]
Bug 1431441 - Part 4 - ASSERT the sandbox is already enabled r?Alex_Gaynor

Alex Gaynor [:Alex_Gaynor] has approved the revision.
Attachment #9011628 - Flags: review+
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac51f86f5cac
Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/f61ec0f140c2
Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/db6b7ee04187
Part 3 - Start the Mac content sandbox earlier r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/be7ec7438701
Part 4 - ASSERT the sandbox is already enabled r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/94a1d1d67191
Part 5 - Parameterize access to the windowserver in the Mac content sandbox policy r=Alex_Gaynor
Backout by toros@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/d770ea2a1b25
Backed out 5 changesets for failing devtools at client/debugger/new/test/mochitest/browser_dbg_rr_breakpoints-01.js on OSX opt a=backout
Backed out 5 changesets for failing devtools at client/debugger/new/test/mochitest/browser_dbg_rr_breakpoints-01.js on OSX opt

Backout push: https://hg.mozilla.org/mozilla-central/rev/d770ea2a1b257febbbbe07a38163d66bdc47e1fd

Failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=94a1d1d67191e9115a4058160397a2e47156f738&selectedJob=202986416
Status: RESOLVED → REOPENED
Flags: needinfo?(haftandilian)
Resolution: FIXED → ---
Target Milestone: mozilla64 → ---
Thanks. Brian, could you take a look and see if this is a missing redirection or something similar? I'm not seeing anything specific in the failure log (https://treeherder.mozilla.org/logviewer.html#?job_id=202986416&repo=autoland&lineNumber=8860). I'll also try and debug further.

I hit failures with this test intermittently (I thought) before, but couldn't get it to work on a clean central build and filed these bugs:

Bug 1495574 browser_dbg_rr_breakpoints-01.js test failure due to EXC_BAD_ACCESS crash
Bug 1495457 browser_dbg_rr_breakpoints-01.js Timeout and Assert Failure

And we have

Bug 1485566 Intermittent devtools/client/debugger/new/test/mochitest/browser_dbg_rr_breakpoints-01.js | Test timed out -
Flags: needinfo?(haftandilian) → needinfo?(bhackett1024)
(And thanks Alex for referring me to those bugs.)
Unfortunately, the problem here is more complicated than a missing redirection.  That failure log actually does indicate the problem: a bind() call which we make in recordreplay::parent::OpenChannel fails, presumably because the sandbox has been initialized and has disallowed this call.

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000011b98b3ad XUL`mozilla::recordreplay::parent::OpenChannel(aMiddlemanPid=44255, aChannelId=0, aConnection=0x00007fff57e41598) at Channel.cpp:49
    frame #1: 0x000000011b98b8ac XUL`mozilla::recordreplay::Channel::Channel(this=0x0000000108488020, aId=0, aMiddlemanRecording=true, aHandler=0x00007fff57e417c0)> const&) at Channel.cpp:93
    frame #2: 0x000000011b98be24 XUL`mozilla::recordreplay::Channel::Channel(this=0x0000000108488020, aId=0, aMiddlemanRecording=true, aHandler=0x00007fff57e417c0)> const&) at Channel.cpp:69
    frame #3: 0x000000011b9c86d6 XUL`mozilla::recordreplay::parent::ChildProcessInfo::LaunchSubprocess(this=0x0000000108486020, aRecordingProcessData=0x00007fff57e418c0) at ChildProcess.cpp:487
    frame #4: 0x000000011b9c8443 XUL`mozilla::recordreplay::parent::ChildProcessInfo::ChildProcessInfo(this=0x0000000108486020, aRole=UniquePtr<mozilla::recordreplay::parent::ChildRole, mozilla::DefaultDelete<mozilla::recordreplay::parent::ChildRole> > @ 0x00007fff57e418a8, aRecordingProcessData=0x00007fff57e418c0) at ChildProcess.cpp:59
    frame #5: 0x000000011b9c8e7d XUL`mozilla::recordreplay::parent::ChildProcessInfo::ChildProcessInfo(this=0x0000000108486020, aRole=<unavailable>, aRecordingProcessData=0x00007fff57e418c0) at ChildProcess.cpp:47
    frame #6: 0x000000011b9d3973 XUL`mozilla::recordreplay::parent::SpawnRecordingChild(aRecordingProcessData=0x00007fff57e41908) at ParentIPC.cpp:554
    frame #7: 0x000000011b9d36c3 XUL`mozilla::recordreplay::parent::InitializeMiddleman(aArgc=31, aArgv=0x00007fff57e42288, aParentPid=44251, aPrefsHandle=0x00007fff57e41bd8, aPrefMapHandle=0x00007fff57e41b08) at ParentIPC.cpp:901
    frame #8: 0x0000000118aad00a XUL`mozilla::dom::ContentProcess::Init(this=0x0000000108417800, aArgc=31, aArgv=0x00007fff57e42288) at ContentProcess.cpp:286
    frame #9: 0x000000011ba1d063 XUL`XRE_InitChildProcess(aArgc=31, aArgv=0x00007fff57e42288, aChildData=0x00007fff57e420f8) at nsEmbedFunctions.cpp:739
    frame #10: 0x000000011ba28c27 XUL`mozilla::BootstrapImpl::XRE_InitChildProcess(this=0x00000001084041b0, argc=34, argv=0x00007fff57e42288, aChildData=0x00007fff57e420f8) at Bootstrap.cpp:69
    frame #11: 0x0000000107dbe32a plugin-container`content_process_main(bootstrap=0x00000001084041b0, argc=34, argv=0x00007fff57e42288) at plugin-container.cpp:50
    frame #12: 0x0000000107dbe612 plugin-container`main(argc=35, argv=0x00007fff57e42288) at MozillaRuntimeMain.cpp:37
    frame #13: 0x00007fffc3f22235 libdyld.dylib`start + 1
    frame #14: 0x00007fffc3f22235 libdyld.dylib`start + 1

Web Replay relies on the current sandbox behavior in a couple of ways:

- Middleman processes need to spawn their own recording child process during recordreplay::parent::InitializeMiddleman.  This can't be done by asking e.g. the ContentParent to spawn the process on the middleman's behalf, because the recording child needs to be started up before connecting with the ContentParent (the ContentParent initializes state in both the middleman and the recording child).

- Recording and replaying child processes open a file handle for their recording during recordreplay::Initialize, and soon after recordreplay::child::InitRecordingOrReplayingProcess connects to the middleman process via some IPC that might be allowed by the sandbox, though I'm not sure.

As a workaround you could check for gProcessKindOption when initializing the sandbox, or avoid passing the early initialize command line parameter when the ContentParent is recording/replaying, but this doesn't fit in with the longer term plans in comment 7.  Do you know where the services referred to in comment 0 are first used?  It should be possible to initialize the sandbox very early in a recording/replaying process' startup (recordreplay::Initialize is called at the start of XRE_InitChildProcess), but it would be hard architecturally to do the same for middleman processes (which FWIW never interact with web content).
Flags: needinfo?(bhackett1024)
Thanks, Brian. I'll work on fixing this. I think we'll want to start the sandbox later in the middleman process, and change the sandbox rules for the record/replay child processes to allow access to the recording file. I need to setup a build on an earlier OS version (than Mojave) in order to test/debug.
Depends on: 1497716
I'll post updated patches soon that include two new patches to address the WebReplay test failure that caused the backout: part 6 to start the sandbox later for middleman processes and part 7 from Brian to change WebReplay code to permit all the fcntl options when thread events are being passed through.

We also needed to add redirections for realpath calls (including realpath$DARWIN_EXTSN which is used on older 10.12 and earlier Mac systems.) That was addressed with bug 1497716.

I want to experiment with starting the WebReplay sandbox earlier with a more permission sandbox ruleset to eventually unify how we start the sandbox for content, GMP, and WebReplay middleman processes. I don't plan to change how the Flash sandbox is started in order to minimize any changes to Flash given it's EOL roadmap.
Patch provided by Brian Hackett <bhackett@mozilla.com>.

Only allow a limited set of commands to be used when events are not
passed through and we are recording/replaying the outputs.

Depends on D8474
Attachment #9016423 - Attachment is obsolete: true
(In reply to Haik Aftandilian [:haik] from comment #26)
> Created attachment 9016479 [details]
> Bug 1431441 - Part 7 - Relax WebReplay fcntl rules to avoid sqlite crash
> r?froydnj

To provide more explanation, without this patch the replay process was crashing in a call to fcntl from libsqlite3.dylib from somewhere within AppKit. See attachment for full test output including a crash stack trace.
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b9cfda58fed3
Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/262da0be2fed
Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/2a635530dfa3
Part 3 - Start the Mac content sandbox earlier r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/1e2bb579b824
Part 4 - ASSERT the sandbox is already enabled r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/b59b1651fc15
Part 5 - Parameterize access to the windowserver in the Mac content sandbox policy r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/fb1a4ddbf9bf
Part 6 - Start middleman WebReplay process sandbox later r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/1dfdc7ba998d
Part 7 - Relax WebReplay fcntl rules to avoid sqlite crash r=froydnj
Depends on: 1498869
Backed out as per :haik request on https://bugzilla.mozilla.org/show_bug.cgi?id=1498840#c6
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla64 → ---
Backout by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/556f31a5e223
Backed out 7 changesets as per haik`s request.
Before the backout, we noticed these big performance regressions:

== Change summary for alert #16766 (as of Fri, 12 Oct 2018 13:23:55 GMT) ==

Regressions:

 16%  sessionrestore osx-10-10 opt e10s stylo                     640.67 -> 745.17
 13%  ts_paint osx-10-10 opt e10s stylo                           707.67 -> 802.83
 13%  sessionrestore_no_auto_restore osx-10-10 opt e10s stylo     675.29 -> 764.67
 13%  ts_paint_webext osx-10-10 opt e10s stylo                    712.33 -> 803.92

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16766
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #33)
> Before the backout, we noticed these big performance regressions:
> 
> == Change summary for alert #16766 (as of Fri, 12 Oct 2018 13:23:55 GMT) ==
> 
> Regressions:
> 
>  16%  sessionrestore osx-10-10 opt e10s stylo                     640.67 ->
> 745.17
>  13%  ts_paint osx-10-10 opt e10s stylo                           707.67 ->
> 802.83
>  13%  sessionrestore_no_auto_restore osx-10-10 opt e10s stylo     675.29 ->
> 764.67
>  13%  ts_paint_webext osx-10-10 opt e10s stylo                    712.33 ->
> 803.92
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=16766

I suspect this is due to the same root cause as bug 1498869. Debugging needed.
The issue reported in bug 1498840 (while this fix was on Nightly before being backed out) was due to a bug in patch 2 "Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters". When early startup is enabled, GetAppPath() is called in the parent and returns the path to the outer .app dir (e.g., Firefox.app). But when the earlyinit pref was set to false, GetAppPath is called from the child process an returns the path to the inner plugin-container.app. For the sandbox rule set, we need to always use the parent .app path. I've just posted a fix to patch 2 for this problem. I've also changed patch 3 to disable early startup by default by setting security.sandbox.content.mac.earlyinit to false. Once the remaining issues are resolved, we can enable this by default.

The "Not Responding" in Activity Monitor issue reported in bug 1498845 was due to not calling CGSShutdownServerConnections() with early startup. And the UI issues reported in bug 1498869 appear to be caused by shader cache issues from the content process. With my debugging so far, both of these issues appear to be fixable. I'm choosing to land the feature in the disabled state now while I work to resolve the these issues.
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a26abadaa0f9
Part 1 - Move GetAppPaths and GetDirectoryPath to nsMacUtilsImpl as static methods r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/9bb3261dd6ac
Part 2 - Remove rules for APP_BINARY_PATH and APP_DIR Mac sandbox parameters r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/03cb6cfc053d
Part 3 - Start the Mac content sandbox earlier r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/abbae9f25834
Part 4 - ASSERT the sandbox is already enabled r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/cd9c1a610dd7
Part 5 - Parameterize access to the windowserver in the Mac content sandbox policy r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/c70d57adec82
Part 6 - Start middleman WebReplay process sandbox later r=Alex_Gaynor
https://hg.mozilla.org/integration/autoland/rev/178100c1204c
Part 7 - Relax WebReplay fcntl rules to avoid sqlite crash r=froydnj
Blocks: 1501126
Depends on: 1504188
Blocks: 1505438
Blocks: 1505507
Blocks: 1505573
Blocks: 1506103
See Also: → 1525086
You need to log in before you can comment on or make changes to this bug.