Closed Bug 1498624 Opened 6 years ago Closed 5 years ago

Implement win/osx sandboxing for new RDD process

Categories

(Core :: Audio/Video: Playback, enhancement, P2)

58 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 + fixed

People

(Reporter: mjf, Assigned: mjf)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

While some windows sandboxing was required to make Shmem work in the RDD process (which is based on the GPU process), true linux/win/osx sandboxing implementation is needed.
Initial patch. This include a fix from mjf and a dummy fopen call that must fail
to show the sandbox is enabled.

This applies on top of the patches in 1471535.
Assignee: nobody → mfroman
Priority: -- → P2
I commented on this in the other bug, but Linux is going to be more complicated if the process needs to allocate shared memory, because the GMP policy won't work as-is there.

My vague idea here is to move the file broker hooks from ContentSandboxPolicy to SandboxPolicyCommon, but have it deny the operations if a broker client isn't set (so that the GMP policy isn't changed).  Then the RDD parent side can create a broker policy with only the rule for shared memory, or none if it's not necessary (see also bug 1440203), and the RDD child side can inherit from SandboxPolicyCommon and add rules as needed.

It might make more sense, at least for Linux, to have someone on the sandboxing side deal with this.  (See also GMP: it was landed unsandboxed and I think either preffed off or config'ed off, and bugs were filed against whatever component sandboxing was in back then to create the policies before it was shipped.)  In addition to the refactoring mentioned above, the process for adjusting a seccomp-bpf policy needs some familiarity with the system call interface, and it's somewhat tedious if you're not used to it; it might be easier for everyone not to try delegating all of that.
I'm going to write some stuff up for Windows and let Bob correct me if needed.  For Windows we want to start from the GMP sandbox and then harden it.

The GMP sandbox lockdown is here: https://searchfox.org/mozilla-central/rev/e0c879c86b95bdc752b1dbff6088169735674e4a/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#920

The two things we want to add are ACG and win32k.sys lockdown.

win32k.sys lockdown is present in the GMP sandbox code, but you want to remove the test for sGmpWin32kDisable and just check IsWin10OrLater().

ACG doesn't have any example code; but I believe all you'll need to do is add 
> mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE
Flags: needinfo?(bobowencode)
The macOS sandbox policy used in this patch looks good (I didn't review the patch itself :-)).

If there are windowserver connections, we should also make the calls described here: https://searchfox.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1541-1556
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #2)
> I commented on this in the other bug, but Linux is going to be more
> complicated if the process needs to allocate shared memory, because the GMP
> policy won't work as-is there.
> 
> My vague idea here is to move the file broker hooks from
> ContentSandboxPolicy to SandboxPolicyCommon, but have it deny the operations
> if a broker client isn't set (so that the GMP policy isn't changed).  Then
> the RDD parent side can create a broker policy with only the rule for shared
> memory, or none if it's not necessary (see also bug 1440203), and the RDD
> child side can inherit from SandboxPolicyCommon and add rules as needed.
> 
> It might make more sense, at least for Linux, to have someone on the
> sandboxing side deal with this.  (See also GMP: it was landed unsandboxed
> and I think either preffed off or config'ed off, and bugs were filed against
> whatever component sandboxing was in back then to create the policies before
> it was shipped.)  In addition to the refactoring mentioned above, the
> process for adjusting a seccomp-bpf policy needs some familiarity with the
> system call interface, and it's somewhat tedious if you're not used to it;
> it might be easier for everyone not to try delegating all of that.

Any thoughts on when/who will tackle the linux changes you detail here?
Flags: needinfo?(jld)
(In reply to Michael Froman [:mjf] from comment #5)
> Any thoughts on when/who will tackle the linux changes you detail here?

I can take this, once the basic RDD support lands.  For reference, the refactoring I mentioned is bug 1500297.

Something to think about is how we'll enable/disable the sandbox.  I don't think we want a separate MOZ_RDD_SANDBOX config flag (see also bug 1375863), but we probably ought to have an environment variable for troubleshooting/debugging.

On Linux there's a bunch of extra baggage in SandboxInfo around MOZ_DISABLE_{CONTENT,GMP}_SANDBOX, but that's mainly to allow special handling of the case where the kernel doesn't support sandboxing but it wasn't explicitly disabled (on desktop to turn off GMP, and on B2G to refuse to boot on newer devices).  I think we can skip copying any of that for RDD; it should be enough to just check PR_GetEnv("MOZ_DISABLE_RDD_SANDBOX") when starting the sandbox.
Flags: needinfo?(jld)
(In reply to Tom Ritter [:tjr] from comment #3)
> I'm going to write some stuff up for Windows and let Bob correct me if
> needed.  For Windows we want to start from the GMP sandbox and then harden
> it.
> 
> The GMP sandbox lockdown is here:
> https://searchfox.org/mozilla-central/rev/
> e0c879c86b95bdc752b1dbff6088169735674e4a/security/sandbox/win/src/
> sandboxbroker/sandboxBroker.cpp#920
> 
> The two things we want to add are ACG and win32k.sys lockdown.
> 
> win32k.sys lockdown is present in the GMP sandbox code, but you want to
> remove the test for sGmpWin32kDisable and just check IsWin10OrLater().
> 
> ACG doesn't have any example code; but I believe all you'll need to do is
> add 
> > mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE

Yes that all sounds good, thanks.
In addition, we should always be able to use USER_LOCKDOWN and lose all of the rules after the gecko-crash-server-pipe one.

Is any shared memory being shared from the RDD process or always to it?
If it is aharing memory you will need at least one of the SUBSYS_HANDLES "section" rules that the content process has.

I'm also guessing that we won't need the Output Protection Manager proxying, so we can use FAKE_USER_GDI_INIT instead of IMPLEMENT_OPM_APIS for the SUBSYS_WIN32K_LOCKDOWN rule.
Flags: needinfo?(bobowencode)
On Mac, for content processes, we're changing the sandbox to be started earlier in the content process lifetime because that results in fewer system services being accessible. This is in Nightly and was just pref'd on with the fix for bug 1501126. The parameters needed for starting the sandbox are passed to content processes on the command line and then we enable the sandbox just after the port exchange by calling EarlyStartMacSandboxIfEnabled() from XRE_InitChildProcess().

Bug 1498742 is filed to take the same approach with the GMP process (and I hope unify as much of the code to avoid duplication.)

We'll want to also do this for the RDD process, but don't consider it a requirement. I'd be happy to work on this at the same time as bug 1498742. If it's something we wanted to do in this bug, I'm available to help out.
Keywords: leave-open
Pushed by mfroman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da37c7e24c90
pt1 - Implement OSX sandbox for RDD process. r=haik
(In reply to Bob Owen (:bobowen) from comment #7)
> (In reply to Tom Ritter [:tjr] from comment #3)
> > I'm going to write some stuff up for Windows and let Bob correct me if
> > needed.  For Windows we want to start from the GMP sandbox and then harden
> > it.
> > 
> > The GMP sandbox lockdown is here:
> > https://searchfox.org/mozilla-central/rev/
> > e0c879c86b95bdc752b1dbff6088169735674e4a/security/sandbox/win/src/
> > sandboxbroker/sandboxBroker.cpp#920
> > 
> > The two things we want to add are ACG and win32k.sys lockdown.
> > 
> > win32k.sys lockdown is present in the GMP sandbox code, but you want to
> > remove the test for sGmpWin32kDisable and just check IsWin10OrLater().
> > 
> > ACG doesn't have any example code; but I believe all you'll need to do is
> > add 
> > > mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE
> 
> Yes that all sounds good, thanks.
> In addition, we should always be able to use USER_LOCKDOWN and lose all of
> the rules after the gecko-crash-server-pipe one.
> 
> Is any shared memory being shared from the RDD process or always to it?
> If it is aharing memory you will need at least one of the SUBSYS_HANDLES
> "section" rules that the content process has.
> 
> I'm also guessing that we won't need the Output Protection Manager proxying,
> so we can use FAKE_USER_GDI_INIT instead of IMPLEMENT_OPM_APIS for the
> SUBSYS_WIN32K_LOCKDOWN rule.

I've pushed the Win sandbox code to review here: https://phabricator.services.mozilla.com/D13101

The RDD Win sandbox code is very closely based on GMP, but there are two issues that I need help with:
1. Calling SetJobLevel w/ sandbox::JOB_LOCKDOWN causes the sandbox to kill the RDD process.  Currently I'm using sandbox::JOB_RESTRICTED.
2. The SUBSYS_WIN32K_LOCKDOWN (w/ either IMPLEMENT_OPM_APIS or FAKE_USER_GDI_INIT) causes the sandbox to kill the RDD process.  Currently it is #if 0'd out, but included in the patch to show what I tried.

Any help you can provide on these to issues would be very much appreciated!  Once resolved, I'll push some cleanup to review for the final patch.
Flags: needinfo?(bobowencode)
(In reply to Michael Froman [:mjf] from comment #13)
...
> The RDD Win sandbox code is very closely based on GMP, but there are two
> issues that I need help with:
> 1. Calling SetJobLevel w/ sandbox::JOB_LOCKDOWN causes the sandbox to kill
> the RDD process.  Currently I'm using sandbox::JOB_RESTRICTED.

We currently use JOB_LOCKDOWN for the content process, so I'm surprised this causes problems.
JOB_LOCKDOWN only adds the JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION flag, so that would seem to point to a problem you have in the RDD process.
If you debug with windbg, you can attach to child processes automatically and it will hopefully catch the crash as it happens.

> 2. The SUBSYS_WIN32K_LOCKDOWN (w/ either IMPLEMENT_OPM_APIS or
> FAKE_USER_GDI_INIT) causes the sandbox to kill the RDD process.  Currently
> it is #if 0'd out, but included in the patch to show what I tried.

I'm less surprised at this, I guess we're still making win32k calls somewhere.
Again debugging using windbg to catch the problem at process start-up would be my first thought.
Flags: needinfo?(bobowencode)
(In reply to Michael Froman [:mjf] from comment #13)
> (In reply to Bob Owen (:bobowen) from comment #7)
> > (In reply to Tom Ritter [:tjr] from comment #3)
> > > I'm going to write some stuff up for Windows and let Bob correct me if
> > > needed.  For Windows we want to start from the GMP sandbox and then harden
> > > it.
> > > 
> > > The GMP sandbox lockdown is here:
> > > https://searchfox.org/mozilla-central/rev/
> > > e0c879c86b95bdc752b1dbff6088169735674e4a/security/sandbox/win/src/
> > > sandboxbroker/sandboxBroker.cpp#920
> > > 
> > > The two things we want to add are ACG and win32k.sys lockdown.
> > > 
> > > win32k.sys lockdown is present in the GMP sandbox code, but you want to
> > > remove the test for sGmpWin32kDisable and just check IsWin10OrLater().
> > > 
> > > ACG doesn't have any example code; but I believe all you'll need to do is
> > > add 
> > > > mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE
> > 
> > Yes that all sounds good, thanks.
> > In addition, we should always be able to use USER_LOCKDOWN and lose all of
> > the rules after the gecko-crash-server-pipe one.
> > 
> > Is any shared memory being shared from the RDD process or always to it?
> > If it is aharing memory you will need at least one of the SUBSYS_HANDLES
> > "section" rules that the content process has.
> > 
> > I'm also guessing that we won't need the Output Protection Manager proxying,
> > so we can use FAKE_USER_GDI_INIT instead of IMPLEMENT_OPM_APIS for the
> > SUBSYS_WIN32K_LOCKDOWN rule.
> 
> I've pushed the Win sandbox code to review here:
> https://phabricator.services.mozilla.com/D13101
> 
> The RDD Win sandbox code is very closely based on GMP, but there are two
> issues that I need help with:
> 1. Calling SetJobLevel w/ sandbox::JOB_LOCKDOWN causes the sandbox to kill
> the RDD process.  Currently I'm using sandbox::JOB_RESTRICTED.
> 2. The SUBSYS_WIN32K_LOCKDOWN (w/ either IMPLEMENT_OPM_APIS or
> FAKE_USER_GDI_INIT) causes the sandbox to kill the RDD process.  Currently
> it is #if 0'd out, but included in the patch to show what I tried.
> 
> Any help you can provide on these to issues would be very much appreciated! 
> Once resolved, I'll push some cleanup to review for the final patch.

I'd suggest landing the windows sandbox code you have for 65 with these two features removed. My team can work on hardening further next year.
Happy to see that 

> mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE;

didn't cause general issues. That's the first time we've turned that on for anything. This is a fairly new Windows 10 feature, so make sure PI peeps know to test on the latest win10 revision.
A couple more comments - 

> pref("security.sandbox.rdd.level", 1);

Do we need this? We have levels in processes where we don't control the content they load for easy fall back. The RDD feels more understood and controlled by us. I'd suggest just hardcoding the sandbox on and running with it.

You can also enable - 

MITIGATION_IMAGE_LOAD_PREFER_SYS32

Which we have on for most everything now.
(In reply to Jim Mathies [:jimm] from comment #16)
> Happy to see that 
> 
> > mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE;
> 
> didn't cause general issues. That's the first time we've turned that on for
> anything. This is a fairly new Windows 10 feature, so make sure PI peeps
> know to test on the latest win10 revision.

Given this would be first use for this, I think I'd rather remove it and let that come with the additional hardening later.
Bob, I've removed the two features that were causing issues as Jim suggested and pushed the change up for final review.  Please let me know what you think.
Flags: needinfo?(bobowencode)
I'll reply here rather than in phabricator, so this conversation doesn't get split up.
(Just goes to demonstrate what we lose by not integrating comments ...)

(In reply to Jim Mathies [:jimm] from comment #15)
...
> I'd suggest landing the windows sandbox code you have for 65 with these two
> features removed. My team can work on hardening further next year.

I agree over win32k disable, but I think the fact that JOB_LOCKDOWN (i.e. JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION) is causing a problem, probably needs resolving first.
It's quite possible that one of the other settings is causing something to fail and JOB_LOCKDOWN just makes that more obvious.

(In reply to Michael Froman [:mjf] from comment #18)
> (In reply to Jim Mathies [:jimm] from comment #16)
> > Happy to see that 
> > 
> > > mitigations |= sandbox::MITIGATION_DYNAMIC_CODE_DISABLE;
> > 
> > didn't cause general issues. That's the first time we've turned that on for
> > anything. This is a fairly new Windows 10 feature, so make sure PI peeps
> > know to test on the latest win10 revision.
> 
> Given this would be first use for this, I think I'd rather remove it and let
> that come with the additional hardening later.

MITIGATION_DYNAMIC_CODE_DISABLE is available on Win8.1 and later.
It's the per-thread opt-out that was added some time in Win10.
I really think we should turn this on if it doesn't seem to cause problems.

As Jim says MITIGATION_IMAGE_LOAD_PREFER_SYS32 is already on for the content process, so should be fine for this process.
As indeed should MITIGATION_IMAGE_LOAD_NO_REMOTE and MITIGATION_IMAGE_LOAD_NO_LOW_LABEL, if we're not running from a network drive:
https://searchfox.org/mozilla-central/rev/f2028b4c38bff2a50ed6aa1763f6dc5ee62b0cc4/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#462

However, given that we're using USER_LOCKDOWN and INTEGRITY_LEVEL_UNTRUSTED, once we start the sandbox we're effectively blocking any DLL loading. Looks like your doing that pretty early on, so I think we can wait and add these image load mitigations later.

On another note I noticed that you did add the two rules for duplicating shared memory handles.
Is the RDD process sharing memory to other processes?
From a very quick look at the code it looked like the content process was sharing to the RDD process.
Flags: needinfo?(bobowencode) → needinfo?(mfroman)
(In reply to Bob Owen (:bobowen) from comment #20)
> I agree over win32k disable, but I think the fact that JOB_LOCKDOWN (i.e.
> JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION) is causing a problem, probably
> needs resolving first.
> It's quite possible that one of the other settings is causing something to
> fail and JOB_LOCKDOWN just makes that more obvious.
My issue is getting this figured out prior to the 65 merge.  I don't think I can, and we need this in 65.  I'd more than willing to file a follow-up bug that we could then uplift.  Otherwise, I'll need help debugging this since win debugging experience is limited.


> MITIGATION_DYNAMIC_CODE_DISABLE is available on Win8.1 and later.
> It's the per-thread opt-out that was added some time in Win10.
> I really think we should turn this on if it doesn't seem to cause problems.
I'll add this back.

> As Jim says MITIGATION_IMAGE_LOAD_PREFER_SYS32 is already on for the content
> process, so should be fine for this process.
I'll add this and do a quick test to make sure things still work.

> On another note I noticed that you did add the two rules for duplicating
> shared memory handles.
> Is the RDD process sharing memory to other processes?
> From a very quick look at the code it looked like the content process was
> sharing to the RDD process.
RDD also shares memory back to content as decoded media.
Flags: needinfo?(mfroman)
(In reply to Michael Froman [:mjf] from comment #21)
> (In reply to Bob Owen (:bobowen) from comment #20)
> > I agree over win32k disable, but I think the fact that JOB_LOCKDOWN (i.e.
> > JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION) is causing a problem, probably
> > needs resolving first.
> > It's quite possible that one of the other settings is causing something to
> > fail and JOB_LOCKDOWN just makes that more obvious.
> My issue is getting this figured out prior to the 65 merge.  I don't think I
> can, and we need this in 65.  I'd more than willing to file a follow-up bug
> that we could then uplift.  Otherwise, I'll need help debugging this since
> win debugging experience is limited.

OK let's do that and maybe we can look at this in Orlando.

> > MITIGATION_DYNAMIC_CODE_DISABLE is available on Win8.1 and later.
> > It's the per-thread opt-out that was added some time in Win10.
> > I really think we should turn this on if it doesn't seem to cause problems.
> I'll add this back.
> 
> > As Jim says MITIGATION_IMAGE_LOAD_PREFER_SYS32 is already on for the content
> > process, so should be fine for this process.
> I'll add this and do a quick test to make sure things still work.

Thanks on both of those.

> > On another note I noticed that you did add the two rules for duplicating
> > shared memory handles.
> > Is the RDD process sharing memory to other processes?
> > From a very quick look at the code it looked like the content process was
> > sharing to the RDD process.
> RDD also shares memory back to content as decoded media.

OK we should only need the HANDLES_DUP_ANY rule then not the HANDLES_DUP_BROKER one.
Pushed by mfroman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6fa8908256b
pt2 - Implement Win sandbox for RDD process. r=bobowen
Michael do you want to keep this open until bug 1506291 is fixed, or can we close this one (as the patches have landed)?
Do we have a follow up bug for flipping the pref for OSX?
Flags: needinfo?(mfroman)
I guess I can change the name of the bug and then close it since there is a separate bug for the linux sandbox.
Flags: needinfo?(mfroman)
Summary: Implement linux/win/osx sandboxing for new RDD process → Implement win/osx sandboxing for new RDD process
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Depends on: 1515826
Attachment #9017869 - Attachment is obsolete: true
Blocks: RDD
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: