Closed Bug 985252 Opened 6 years ago Closed 5 years ago

Sandbox Gecko Media Plugins (including OpenH264) for Windows

Categories

(Core :: Security, defect, P1)

x86_64
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: mreavy, Assigned: TimAbraldes)

References

(Blocks 2 open bugs)

Details

(Keywords: sec-want, Whiteboard: [est:?, p=.25, s=fx33])

Attachments

(1 file, 12 obsolete files)

24.97 KB, patch
TimAbraldes
: review+
Details | Diff | Splinter Review
For security reasons we want to sandbox the OpenH264 plug (which runs in separate process so it should be straightforward).
Assignee: nobody → tabraldes
Tim and I had a meeting with Josh today and he asked that we wait a week before starting on this. He said things would be in a better build-able state for us to play with after a week.

He mentioned that it would be similar to NPAPI plugins though, which should be pretty close to the e10s content process stuff that we've been sandboxing.

Tim and I will work together in the meantime to play around with the sandbox and the sandbox settings.
Blocks: 925570
No longer blocks: 880004
Attached patch patch v1 (WIP) (obsolete) — Splinter Review
This patch sandboxes GMP plugins on Windows. It appears to successfully initialize the process and enter a message loop, but I noticed that our `RecvInitEncode`, `RecvInitDecode`, `Encode`, and `Decode` functions are not being called so that might indicate an issue that I need to track down.

I think it will make sense to break this patch into two patches, one for m-c (default) and one for the gmp branch. Below are the changes made in this patch organized by which location I think those changes should be applied.

m-c:
  This patch updates `SandboxBroker` to allow consumers to ask for certain levels of security. Specifically, this patch adds `SandboxBroker::SetSecurityLevelForContentProcess`, `SandboxBroker::SetSecurityLevelForPluginProcess`, and `SandboxBroker::SetSecurityLevelForIPDLUnitTestProcess`. Each of those functions sets up the `SandboxBroker` object so that a subsequent call to `SandboxBroker::LaunchApp` will launch the process with the permissions corresponding to the `SetSecurityLevelFor*` function  that was called.
  This patch updates `GeckoChildProcessHost` to call a `SandboxBroker::SetSecurityLevelFor*` function corresponding to the type of process being created.

gmp:
  This patch adds a `SandboxBroker::SetSecurityLevelForGMPlugin` function, calls that function from `GeckoChildProcessHost`, and updates `GMPChild` to call `mozilla::SandboxTarget::Instance()->StartSandbox()` when it is finished loading the specified GMP DLL. I would have liked to have `GeckoChildProcessHost` be responsible for calling `SandboxTarget::StartSandbox`. Specifically, I tried placing the call in `XRE_InitChildProcess` after it calls `process->Init()` but before it calls `uiMessageLoop.MessageLoop::Run()`. This actually seemed to work for GMPlugin processes and for regular plugin processes, but failed for content processes (bug 880808).
Attached patch patch v2 (WIP) (obsolete) — Splinter Review
With this patch applied:
  - The plugin process is created and initialized
  - The plugin process enters its message loop
  - `RecvInitEncode` and related functions are all called in the plugin process

A TODO item has been added to deal with the fact that `USER_INTERACTIVE` appears to be the lowest token we can use and still have the plugin function: Ideally we would use `USER_LOCKDOWN` instead.
Attachment #8411968 - Attachment is obsolete: true
Whiteboard: [est:?, p=.25, s=fx32]
Target Milestone: --- → mozilla32
Depends on: 1007971
Maire - I'm working on sandboxing the OpenH264 plugin on Windows. Do we also want to enable process sandboxing on other platforms? I don't have a linux or mac machine set up and from speaking with :kang it sounds like sandboxing might be a lot of additional work on other platforms.
Status: NEW → ASSIGNED
Flags: needinfo?(mreavy)
Hi Tim, We do need it for Linux and Mac.  Can you and/or :kang capture your concerns?  

Ekr -- I believe you've thought about this some for Linux.  Can you provide some guidance?

Thanks.
Flags: needinfo?(tabraldes)
Flags: needinfo?(mreavy)
Flags: needinfo?(gdestuynder)
Flags: needinfo?(ekr)
So, we certainly do need this for Mac and Linux.

My impression from a brief conversation with Sid was that at least on
Linux we could use the seccomp stuff and it would mostly be a matter of
setting the right policies. Mac may be more difficult. I suggest we
land the Windows stuff and then Maire and I can work with DougT and Sid to round
someone up to look at the other platforms in turn.
Flags: needinfo?(ekr)
I don't think any work has been done on implementing sandboxing on Mac yet.
On Linux if the plugin attempts to access file paths and the like, the filter may be a little complicated until we have a nicer framework for it, otherwise its pretty straight forward.
Flags: needinfo?(gdestuynder)
(In reply to Eric Rescorla (:ekr) from comment #6)
> So, we certainly do need this for Mac and Linux.
> 
> My impression from a brief conversation with Sid was that at least on
> Linux we could use the seccomp stuff and it would mostly be a matter of
> setting the right policies. Mac may be more difficult. I suggest we
> land the Windows stuff and then Maire and I can work with DougT and Sid to
> round
> someone up to look at the other platforms in turn.

Sounds good

(In reply to Guillaume Destuynder [:kang] (use NEEDINFO!) from comment #7)
> I don't think any work has been done on implementing sandboxing on Mac yet.
> On Linux if the plugin attempts to access file paths and the like, the
> filter may be a little complicated until we have a nicer framework for it,
> otherwise its pretty straight forward.

The plugin initially needs to access the library that it is going to be loading (in this case the openH264 library), but after that it shouldn't need to do any file access. We can make the call to `LowerToken`/`StartSandbox` after the library is loaded.
Flags: needinfo?(tabraldes)
Attached patch Patch v3 (obsolete) — Splinter Review
This patch:
  - Enables sandboxbroker to set different policies for different process types
  - Adds a "SetProcessSandbox" message to the GMP protocol so that the parent process can tell the child process to call `LowerToken` - this mimics the behavior of `ContentParent`/`ContentChild`
  - Causes the Chromium sandboxing code to be built on Windows regardless of whether MOZ_CONTENT_SANDBOX (--enable-content-sandbox) is set
  - Makes it so that GMPlugin processes are always sandboxed (this bug), regular plugin processes are never sandboxed (bug 934476), and content processes are sandboxed based on whether MOZ_CONTENT_SANDBOX (--enable-content-sandbox) is set
Attachment #8415722 - Attachment is obsolete: true
Attachment #8422160 - Flags: review?(netzen)
Attached patch Patch v4 (obsolete) — Splinter Review
Same patch, updated to apply over bug 957928 attachment 8421228 [details] [diff] [review] ("fix v1.9")
Attachment #8422160 - Attachment is obsolete: true
Attachment #8422160 - Flags: review?(netzen)
Attachment #8422663 - Flags: review?(netzen)
Attached patch Patch v5 (obsolete) — Splinter Review
Same patch, rebased onto bug 957928 attachment 8423200 [details] [diff] [review] ("fix v2.0")

This ran through try:
  https://tbpl.mozilla.org/?tree=Try&rev=f67326e0d5c2

I'll investigate the "tests should have failed" issue
Attachment #8422663 - Attachment is obsolete: true
Attachment #8422663 - Flags: review?(netzen)
Attachment #8423340 - Flags: review?(netzen)
Comment on attachment 8423340 [details] [diff] [review]
Patch v5

Review of attachment 8423340 [details] [diff] [review]:
-----------------------------------------------------------------

Looking very good.
Do you know the installer size increase? I think that should be checked before giving an r+.
Also I think we should post about enabling building all of this on dev-platform as a heads up.

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +778,5 @@
>  
> +#if defined(XP_WIN)
> +  bool shouldSandboxCurrentProcess = false;
> +  switch (mProcessType) {
> +    case GeckoProcessType_Content:

I think this whole case GeckoProcessType_Content should be behind the MOZ_CONTENT_SANDBOX define.

::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ +110,5 @@
> +  //       creation to fail. Perhaps we're calling `StartSandbox` too early?
> +  mPolicy->SetTokenLevel(sandbox::USER_RESTRICTED_SAME_ACCESS,
> +                         sandbox::USER_RESTRICTED_SAME_ACCESS);
> +  mPolicy->SetDelayedIntegrityLevel(sandbox::INTEGRITY_LEVEL_UNTRUSTED);
> +  mPolicy->SetAlternateDesktop(true);

We may want to set even more things to restrict, but we can tweak that later.
Attachment #8423340 - Flags: review?(netzen) → feedback+
I opened Bug 1012949 and Bug 1012951 for Mac and Linux, respectively, since all the Windows work has been done here.  I'll copy Comment 13 into the new Mac bug.
Summary: Sandbox the OpenH264 plugin → Sandbox the OpenH264 plugin for Windows
Does the OpenH264 plugin run in a separate process on Windows?  Does it have its own "app" (like plugin-container.exe)?
Updating the title and dependencies of this bug to reflect the fact that we're actually working on sandboxing "Gecko Media Plugins" which includes the OpenH264 plugin, but could potentially include many others as well.

(In reply to Steven Michaud from comment #15)
> Does the OpenH264 plugin run in a separate process on Windows?  Does it have
> its own "app" (like plugin-container.exe)?

Yes, GMP plugins run in a separate process. There is a dedicated GMP thread that selects a particular GMP plugin from the ones it knows about, then launches a child process with command-line args telling it which GMP plugin to load. The process that is launched is plugin-container.exe. The child process is responsible for loading the plugin - a dynamic library that implements an interface specified by GMP - and calling the appropriate GMP functions in response to IPC from the parent.

Aside: I know what the 'P' in "GMP" stands for but I'm still going to say "GMP plugin" for the same reason that I say "ATM machine" :)
Depends on: 957928
Summary: Sandbox the OpenH264 plugin for Windows → Sandbox Gecko Media Plugins (including OpenH264) for Windows
It would be good to have a pref that can disable the sandbox to easily determine whether the reason why code in the plugin is failing is because of the sandbox.
(In reply to Chris Pearce (:cpearce) from comment #17)
> It would be good to have a pref that can disable the sandbox to easily
> determine whether the reason why code in the plugin is failing is because of
> the sandbox.

I'm kind of opposed to a pref because users can be tricked into enabling/disabling prefs. We have a build flag "--enable-content-sandbox" that controls whether content processes are sandboxed. How would you feel about adding a similar build flag like "--disable-gmp-sandbox"?
Totally understanable. How about we only add the pref to toggle the sandbox if --disable-gmp-sandbox is set?
(In reply to Chris Pearce (:cpearce) from comment #19)
> Totally understanable. How about we only add the pref to toggle the sandbox
> if --disable-gmp-sandbox is set?

Sounds good to me, but then we should call it something more like "--enable-gmp-sandbox-pref"
Depends on: 1009452
Depends on: 1014002
Keywords: sec-want
(In reply to Brian R. Bondy [:bbondy] from comment #12)
> Looking very good.

Thanks!

> Do you know the installer size increase? I think that should be checked
> before giving an r+.

Comparing the size of the installer generated in my Try build against the installer created by other Try builds, it looks like the installer size increase is about 61K or 0.18%

> Also I think we should post about enabling building all of this on
> dev-platform as a heads up.

I'm writing this up now; I'll link it here when I've posted.

> ::: ipc/glue/GeckoChildProcessHost.cpp
> @@ +778,5 @@
> >  
> > +#if defined(XP_WIN)
> > +  bool shouldSandboxCurrentProcess = false;
> > +  switch (mProcessType) {
> > +    case GeckoProcessType_Content:
> 
> I think this whole case GeckoProcessType_Content should be behind the
> MOZ_CONTENT_SANDBOX define.

I've made this change locally and will include it in the next patch I post here.

> ::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
> @@ +110,5 @@
> > +  //       creation to fail. Perhaps we're calling `StartSandbox` too early?
> > +  mPolicy->SetTokenLevel(sandbox::USER_RESTRICTED_SAME_ACCESS,
> > +                         sandbox::USER_RESTRICTED_SAME_ACCESS);
> > +  mPolicy->SetDelayedIntegrityLevel(sandbox::INTEGRITY_LEVEL_UNTRUSTED);
> > +  mPolicy->SetAlternateDesktop(true);
> 
> We may want to set even more things to restrict, but we can tweak that later.

Indeed. I'll file followup bugs blocking bug 1011491 for individual policy changes we want to make to the gmp plugin process. We can ratchet down the policy once we have the initial code landed on m-c.
> > Also I think we should post about enabling building all of this on
> > dev-platform as a heads up.
> 
> I'm writing this up now; I'll link it here when I've posted.

https://groups.google.com/forum/#!topic/mozilla.dev.platform/t68-2VIR5ck
Attached patch Patch v6 (obsolete) — Splinter Review
This patch addresses the review comments from patch 5 and additionally addresses the fact that we'd like to be able to disable sandboxing of GMP processes at runtime (comment 17 and subsequent discussion).

This patch causes `GeckoChildProcessHost` to check for an environment variable `MOZ_DISABLE_GMP_SANDBOX`. If the environment variable is present, then GMP plugins will not be sandboxed. This mimics the behavior of the `MOZ_DISABLE_CONTENT_SANDBOX` environment variable.

This is currently running through try:
  https://tbpl.mozilla.org/?tree=Try&rev=73454b0a0be5

I'll investigate any issues that come up from the try run.
Attachment #8423340 - Attachment is obsolete: true
Attachment #8430344 - Flags: review?(netzen)
Comment on attachment 8430344 [details] [diff] [review]
Patch v6

Review of attachment 8430344 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ -59,5 @@
> -  mPolicy->SetAlternateDesktop(false);
> -
> -  // Set stdout and stderr, to allow inheritance for logging.
> -  mPolicy->SetStdoutHandle(::GetStdHandle(STD_OUTPUT_HANDLE));
> -  mPolicy->SetStderrHandle(::GetStdHandle(STD_ERROR_HANDLE));

These lines will need to be copied into any of the SetSecurityLevelFor* functions where we want the processes to have their output redirected to log files on tinderbox.
Attached patch Patch v7 (obsolete) — Splinter Review
(In reply to Bob Owen (:bobowen) from comment #24)
> Comment on attachment 8430344 [details] [diff] [review]
> Patch v6
> 
> Review of attachment 8430344 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
> @@ -59,5 @@
> > -  mPolicy->SetAlternateDesktop(false);
> > -
> > -  // Set stdout and stderr, to allow inheritance for logging.
> > -  mPolicy->SetStdoutHandle(::GetStdHandle(STD_OUTPUT_HANDLE));
> > -  mPolicy->SetStderrHandle(::GetStdHandle(STD_ERROR_HANDLE));
> 
> These lines will need to be copied into any of the SetSecurityLevelFor*
> functions where we want the processes to have their output redirected to log
> files on tinderbox.

Apologies for removing those lines! I added it back in `LaunchApp`. If we eventually want to have different stdout/stderr for each type of process we can move those lines into the various `SetSecurityLevelFor*Process` functions.

Hopefully the logging issues weren't obscuring any real problems but I'm running this new version through Try just to be sure:
  https://tbpl.mozilla.org/?tree=Try&rev=7f36d93a075a
Attachment #8430344 - Attachment is obsolete: true
Attachment #8430344 - Flags: review?(netzen)
Attachment #8430953 - Flags: review?(netzen)
Comment on attachment 8430953 [details] [diff] [review]
Patch v7

Review of attachment 8430953 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for posting to dev.platform and for taking this work on!

The patch looks great to me.
You may want to seek out a super review though because these changes have a big effect.
Also because there are various files in here that I'm not supposed to review.
Attachment #8430953 - Flags: review?(netzen) → review+
Bob Owen mentioned that bug 1014002 might be an issue with this patch and it does indeed seem to manifest itself in debug builds:
  https://tbpl.mozilla.org/?tree=Try&rev=80292c145faa

I'll see if I can help Bob track down the issues in that bug (see other bug for investigation/discussion). We can't land the patch in this bug until bug 1014002 is resolved (though we could land the patch for bug 1014002 as part of the patch for this bug)
Whiteboard: [est:?, p=.25, s=fx32] → [est:?, p=.25, s=fx33]
Target Milestone: mozilla32 → mozilla33
Comment on attachment 8430953 [details] [diff] [review]
Patch v7

I think bug 1014002 will be out of the way pretty soon, so I'm going to move forward with asking for reviews for this patch.

Trying to minimize the number of reviewers: I think we can get by with just 2!

bbondy (already r+):
 * browser/ (including browser/installer/windows)
 * toolkit/
 * toolkit/library

bent:
 * content/media/gmp
 * dom/ipc
 * ipc/app
 * ipc/glue

unknown:
 * security/sandbox/win

bbondy - are you comfortable giving review for the security/sandbox/win piece? The modules page doesn't mention any owners or peers for that part of the tree.
Attachment #8430953 - Flags: review?(bent.mozilla)
Priority: -- → P1
:bent - review ping
Comment on attachment 8430953 [details] [diff] [review]
Patch v7

Review of attachment 8430953 [details] [diff] [review]:
-----------------------------------------------------------------

This looks mostly fine but I'd like to see the changes you make here:

::: content/media/gmp/GMPParent.cpp
@@ +75,5 @@
>      return NS_ERROR_FAILURE;
>    }
>  
> +  if (!SendSetProcessSandbox()) {
> +    UnloadProcess();

Have you tested this? Please do, because process shutdown is complicated and usually error-prone.

Here, UnloadProcess() will do absolutely nothing since mState is still GMPStateNotLoaded.

I think you might actually need to do something similar to ContentParent::KillHard() if this call fails. Closing the pipe doesn't guarantee that the child will commit suicide.

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +808,5 @@
> +      }
> +      break;
> +    case GeckoProcessType_Default:
> +    default:
> +      NS_ASSERTION(false, "Bad process type in GeckoChildProcessHost");

This should probably just be MOZ_CRASH().

::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
@@ +16,5 @@
>  
>  SandboxBroker::SandboxBroker()
>  {
> +  // XXX: This is not thread-safe! Two threads could simultaneously try
> +  // to set `sBrokerService`

Is there a bug filed for this?

@@ +110,5 @@
> +
> +  mPolicy->SetJobLevel(sandbox::JOB_RESTRICTED, 0);
> +  // TODO: We should be able to set the delayed token
> +  //       to `sandbox::USER_LOCKDOWN`, but somehow that causes channel
> +  //       creation to fail. Perhaps we're calling `StartSandbox` too early?

Is there a bug filed for this?
Attachment #8430953 - Flags: review?(bent.mozilla) → review-
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #30)
> Comment on attachment 8430953 [details] [diff] [review]
> Patch v7
> 
> Review of attachment 8430953 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks mostly fine but I'd like to see the changes you make here:
> 
> ::: content/media/gmp/GMPParent.cpp
> @@ +75,5 @@
> >      return NS_ERROR_FAILURE;
> >    }
> >  
> > +  if (!SendSetProcessSandbox()) {
> > +    UnloadProcess();
> 
> Have you tested this? Please do, because process shutdown is complicated and
> usually error-prone.

I have not; good catch!

> Here, UnloadProcess() will do absolutely nothing since mState is still
> GMPStateNotLoaded.
>
> I think you might actually need to do something similar to
> ContentParent::KillHard() if this call fails. Closing the pipe doesn't
> guarantee that the child will commit suicide.

I've moved this hunk from where it is to after where we set `mState`, which should at least alleviate this issue. I'm going to test how well this works and take a look at `ContentParent::KillHard` as well.
 
> ::: ipc/glue/GeckoChildProcessHost.cpp
> @@ +808,5 @@
> > +      }
> > +      break;
> > +    case GeckoProcessType_Default:
> > +    default:
> > +      NS_ASSERTION(false, "Bad process type in GeckoChildProcessHost");
> 
> This should probably just be MOZ_CRASH().

Fixed locally. Will post updated patch.

> ::: security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
> @@ +16,5 @@
> >  
> >  SandboxBroker::SandboxBroker()
> >  {
> > +  // XXX: This is not thread-safe! Two threads could simultaneously try
> > +  // to set `sBrokerService`
> 
> Is there a bug filed for this?

Filed bug 1027916.

> @@ +110,5 @@
> > +
> > +  mPolicy->SetJobLevel(sandbox::JOB_RESTRICTED, 0);
> > +  // TODO: We should be able to set the delayed token
> > +  //       to `sandbox::USER_LOCKDOWN`, but somehow that causes channel
> > +  //       creation to fail. Perhaps we're calling `StartSandbox` too early?
> 
> Is there a bug filed for this?

Filed bug 1027906 and removed this comment locally.
Attached patch Patch v8 (obsolete) — Splinter Review
I've tested all the race/hang/termination conditions I can think of, and I think I've finally got `GMPParent` in a place where I feel faintly confident that it won't leave a sub-process lying around or try to access objects after they've been deleted.

I've tried to comment the areas that seemed tricky or unintuitive.
Attachment #8430953 - Attachment is obsolete: true
Attachment #8450722 - Flags: review?(bent.mozilla)
Comment on attachment 8450722 [details] [diff] [review]
Patch v8

Review of attachment 8450722 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/gmp/GMPParent.cpp
@@ +12,5 @@
>  #include "nsCharSeparatedTokenizer.h"
>  #include "nsThreadUtils.h"
>  #include "nsIRunnable.h"
>  #include "mozIGeckoMediaPluginService.h"
> +#include "chrome/common/process_watcher.h"

This is unused, I'll remove it in a subsequent patch

@@ +97,5 @@
> +
> +  if (!SendSetProcessSandbox()) {
> +    ipc::MessageChannel* channel = GetIPCChannel();
> +    if (channel) {
> +      channel->CloseWithError();

Note: I'm using `CloseWithError` here because the implementation of `ContentParent` seems to care whether we close the channel with an error or not. I'm not totally clear as to why that is; if we don't need to do this I'll happily remove the extra lines of code that this creates

@@ +244,5 @@
>    nsRefPtr<GMPVideoDecoderParent> vdp = static_cast<GMPVideoDecoderParent*>(pvdp);
>    mVideoDecoders.AppendElement(vdp);
>    vdp.forget(aGMPVD);
>  
> +  mKungFuDeathGrip = this;

I'll add a comment saying the following:
"When this encoder/decoder is destroyed, it will result in a call to `GMPParent::VideoDecoderDestroyed` which tries to modify member variables. Thus, it is not safe to destroy this `GMPParent` until all of its encoders/decoders have been destroyed"

@@ +266,5 @@
>    nsRefPtr<GMPVideoEncoderParent> vep = static_cast<GMPVideoEncoderParent*>(pvep);
>    mVideoEncoders.AppendElement(vep);
>    vep.forget(aGMPVE);
>  
> +  mKungFuDeathGrip = this;

I'll add the same comment as above

@@ +282,5 @@
> +  // If the child process gets terminated, the encoders and decoders will be destroyed, but before the call to `MaybeUnload`, we will enter this block
> +  mState = GMPStateNotLoaded;
> +  ipc::MessageChannel* channel = GetIPCChannel();
> +  if (channel) {
> +    channel->CloseWithError();

Same `CloseWithError` vs `Close` question from above

@@ +297,5 @@
> +  MessageLoop::current()->PostDelayedTask(
> +     FROM_HERE,
> +     NewRunnableFunction(KillGMPProcess,
> +                         mProcess),
> +     5000);

I'm using 5s here. Maybe this should come from a pref?

::: ipc/app/plugin-container.exe.manifest
@@ -29,5 @@
> -  <ms_asmv3:application xmlns:ms_asmv3="urn:schemas-microsoft-com:asm.v3">
> -    <ms_asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
> -      <dpiAware>true</dpiAware>
> -    </ms_asmv3:windowsSettings>
> -  </ms_asmv3:application>

This is from some experimentation I was doing to ratchet down our process permissions. I'll revert this change before landing.
Attached patch Patch v9 (obsolete) — Splinter Review
Updated to address the comments I made in the previous patch, and to apply cleanly to trunk. Also I resolved a build failure in debug builds.

Still to do:
  1) Track down why some compilers/OSes fail to use the `RunnableMethodTraits` defined in GMPProcessParent.h - causing issues like these [1]
  2) Decide whether 5s is a valid timeout for killing child processes or whether that value should be configurable or something (this could be a follow-up bug)
  3) Determine whether calling `CloseWithError` on the channel actually matters vs just calling `Close` (this could also be addressed in a follow-up)

There's some urgency in trying to get this landed before the next uplift, so I'm adding a couple more reviewers to see if someone can get to it:
  - :bent can review everything, as mentioned in comment 28
  - :cjones can review (or delegate) the IPC pieces
  - :bz can review (or delegate) the content and dom pieces
Also mentioned in comment 28 are which pieces already have review from :bbondy

[1] https://tbpl.mozilla.org/?tree=Try&rev=45eb9748b6a2
Attachment #8450722 - Attachment is obsolete: true
Attachment #8450722 - Flags: review?(bent.mozilla)
Attachment #8454620 - Flags: review?(cjones.bugs)
Attachment #8454620 - Flags: review?(bzbarsky)
Attachment #8454620 - Flags: review?(bent.mozilla)
Duplicate of this bug: 1007971
Finally getting around to updating the bug component to security (which is more accurate).
Component: WebRTC → Security
Comment on attachment 8454620 [details] [diff] [review]
Patch v9

Looks like Chris is the one who's mostly been involved in the GMPChild/Parent code.
Attachment #8454620 - Flags: review?(bzbarsky) → review?(cpearce)
I'm lead to believe bent is about to go out on PTO, so you might want to ask someone else instead of him, like nical maybe?
(In reply to Chris Pearce (:cpearce) from comment #38)
> I'm lead to believe bent is about to go out on PTO, so you might want to ask
> someone else instead of him, like nical maybe?

The people I asked for review in addition to :bent are in case he can't get to this review. I don't know :nical but from looking him up it seems like he's a graphics person. Which pieces should I ask him to review?
Comment on attachment 8454620 [details] [diff] [review]
Patch v9

Review of attachment 8454620 [details] [diff] [review]:
-----------------------------------------------------------------

nical can review IPC related things too I've been led to believe.

I'm only qualified to r+ stuff in content/media here, and not really qualified to r+ the stuff doing IPC in content/media for that matter.

::: content/media/gmp/GMPParent.h
@@ +43,5 @@
>    GMPParent();
>  
>    nsresult Init(nsIFile* aPluginDir);
>    nsresult LoadProcess();
> +  void CleanupEncodersAndDecoders();

I think this should be called CleanupActors() or CleanupAPIObjects(), as soon there will be things other than encoders and decoders here, like decryptors, storage services, and others.
Attachment #8454620 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #40)
> Comment on attachment 8454620 [details] [diff] [review]
> Patch v9
> 
> Review of attachment 8454620 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> nical can review IPC related things too I've been led to believe.
> 
> I'm only qualified to r+ stuff in content/media here, and not really
> qualified to r+ the stuff doing IPC in content/media for that matter.
> 
> ::: content/media/gmp/GMPParent.h
> @@ +43,5 @@
> >    GMPParent();
> >  
> >    nsresult Init(nsIFile* aPluginDir);
> >    nsresult LoadProcess();
> > +  void CleanupEncodersAndDecoders();
> 
> I think this should be called CleanupActors() or CleanupAPIObjects(), as
> soon there will be things other than encoders and decoders here, like
> decryptors, storage services, and others.

OK I'll change it to `CleanupActors` unless there are objections. Thanks!
Comment on attachment 8454620 [details] [diff] [review]
Patch v9

Review of attachment 8454620 [details] [diff] [review]:
-----------------------------------------------------------------

I rebased my EME GMP implementation on top of your patch, and it seems that the sandbox is turned on *after* GMPChild::Init() is called. GMPChild::Init() calls GMPChild::LoadPluginLibrary() which loads the DLLs (which can the arbitrary code in the DLL's DllMain() right?) but also GMPChild::LoadPluginLibrary() calls the GMPInit() and GMPGetAPI() functions of the plugin, with full privileges. This is bad!

We should initiate the calls to GMPInit() and GMPGetAPI() from the parent after the sandbox has been turned on. I moved them into GMPChild::RecvSetProcessSandbox() after the StartSandbox() call, and it seemed to work. Please make that (or an equivalent) change.

Once I moved the GMP function calls to after when the sandbox was engaged however, the DllMain() was run *after* the StartSandbox() call too, by Windows on a non-main thread, so I'm not sure what's going on there. Maybe it's timing dependent? Or is this expected? If you want to test this, ping me and I'll send you my EME/GMP patches.

Also, what exactly does this restrict? With the sandbox turned on my GMP was able to successfully start up Windows Media Foundation and do software H.264 decoding, including successfully calling LoadLibraryA("msmpeg2vdec.dll"). Is it expected that we can load libraries inside the sandbox? I was unable to open a file handle after the sandbox was started (bravo!!).
(In reply to Chris Pearce (:cpearce) from comment #42)
> Comment on attachment 8454620 [details] [diff] [review]
> Patch v9
> 
> Review of attachment 8454620 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I rebased my EME GMP implementation on top of your patch, and it seems that
> the sandbox is turned on *after* GMPChild::Init() is called.
> GMPChild::Init() calls GMPChild::LoadPluginLibrary() which loads the DLLs
> (which can the arbitrary code in the DLL's DllMain() right?)

Right!

> but also
> GMPChild::LoadPluginLibrary() calls the GMPInit() and GMPGetAPI() functions
> of the plugin, with full privileges. This is bad!

If we're dealing with a malicious plugin, then this is definitely bad. Are we trying to protect users from malicious plugins though? I would expect that we trust the plugin until it starts processing data, at which point we stop trusting it because it could have been hijacked by some malicious input.

> We should initiate the calls to GMPInit() and GMPGetAPI() from the parent
> after the sandbox has been turned on. I moved them into
> GMPChild::RecvSetProcessSandbox() after the StartSandbox() call, and it
> seemed to work. Please make that (or an equivalent) change.

Can do! Let me know what your thoughts are on the above point.
 
> Once I moved the GMP function calls to after when the sandbox was engaged
> however, the DllMain() was run *after* the StartSandbox() call too, by
> Windows on a non-main thread, so I'm not sure what's going on there. Maybe
> it's timing dependent? Or is this expected? If you want to test this, ping
> me and I'll send you my EME/GMP patches.

From [1]:
 "[...] after a call to LoadLibrary, the system scans the list of loaded DLLs for the process. For each DLL that has not already been called with the DLL_PROCESS_ATTACH value, the system calls the DLL's entry-point function. This call is made in the context of the thread that caused the process address space to change, such as [...] the thread that called LoadLibrary."

It sounds like `GMPInit` was called on a non-main thread. `GMPInit` called `LoadLibrary` which caused `DllMain` to be called on the same non-main thread. Does this description match the behavior you saw?

> Also, what exactly does this restrict? With the sandbox turned on my GMP was
> able to successfully start up Windows Media Foundation and do software H.264
> decoding, including successfully calling LoadLibraryA("msmpeg2vdec.dll"). Is
> it expected that we can load libraries inside the sandbox? I was unable to
> open a file handle after the sandbox was started (bravo!!).

As you noticed, the sandbox isn't super effective yet. Bug 1011491 exists to track our ratcheting down of permissions after landing the initial implementation.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms682583%28v=vs.85%29.aspx
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #43)
> > but also
> > GMPChild::LoadPluginLibrary() calls the GMPInit() and GMPGetAPI() functions
> > of the plugin, with full privileges. This is bad!
> 
> If we're dealing with a malicious plugin, then this is definitely bad. Are
> we trying to protect users from malicious plugins though?

I wouldn't go so far as calling EME plugins "malicious", but we gone to great lengths with the design of our EME implementation to ensure user's privacy (that is, their identity) is protected. Since EME plugins are expected to be closed source, in order to ensure there's no scope for shenanigans, and to ensure our claims are credible, the sandbox must be engaged before *any* code in the CDM can run.


> > We should initiate the calls to GMPInit() and GMPGetAPI() from the parent
> > after the sandbox has been turned on. I moved them into
> > GMPChild::RecvSetProcessSandbox() after the StartSandbox() call, and it
> > seemed to work. Please make that (or an equivalent) change.
> 
> Can do! Let me know what your thoughts are on the above point.

The sandbox must be engaged before *any* code in GMPs can run.



>  
> > Once I moved the GMP function calls to after when the sandbox was engaged
> > however, the DllMain() was run *after* the StartSandbox() call too, by
> > Windows on a non-main thread, so I'm not sure what's going on there. Maybe
> > it's timing dependent? Or is this expected? If you want to test this, ping
> > me and I'll send you my EME/GMP patches.
> 
> From [1]:
>  "[...] after a call to LoadLibrary, the system scans the list of loaded
> DLLs for the process. For each DLL that has not already been called with the
> DLL_PROCESS_ATTACH value, the system calls the DLL's entry-point function.
> This call is made in the context of the thread that caused the process
> address space to change, such as [...] the thread that called LoadLibrary."
> 
> It sounds like `GMPInit` was called on a non-main thread. `GMPInit` called
> `LoadLibrary` which caused `DllMain` to be called on the same non-main
> thread. Does this description match the behavior you saw?

What I saw didn't match that. DLLMain() was called on a non main thread, while I was stepping through GMPInit() on the main thread.
Comment on attachment 8454620 [details] [diff] [review]
Patch v9

Review of attachment 8454620 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/gmp/GMPParent.cpp
@@ +22,5 @@
> +  // that its pointer has its `GMPProcessParent::Delete` method
> +  // called. Note that this function can be called after all
> +  // `GMPParent`s have been deleted.
> +  void
> +  KillGMPProcess(GMPProcessParent* aProcess)

This code requires a RunnableMethodTraits<GMPProcessParent> specialization to compile. However the RunnableMethodTraits<GMPProcessParent> specialization is defined in GMPProcessParent.cpp, so it's sometimes visible in unified builds, but not always.

So this code can fail to compile in unified builds, and will fail in non unified builds.

So should you should move the RunnableMethodTraits<GMPProcessParent> definition out of GMPProcessParent.cpp and into GMPProcessParent.h.
(In reply to Chris Pearce (:cpearce) from comment #45)
> Comment on attachment 8454620 [details] [diff] [review]
> Patch v9
> 
> Review of attachment 8454620 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/gmp/GMPParent.cpp
> @@ +22,5 @@
> > +  // that its pointer has its `GMPProcessParent::Delete` method
> > +  // called. Note that this function can be called after all
> > +  // `GMPParent`s have been deleted.
> > +  void
> > +  KillGMPProcess(GMPProcessParent* aProcess)
> 
> This code requires a RunnableMethodTraits<GMPProcessParent> specialization
> to compile. However the RunnableMethodTraits<GMPProcessParent>
> specialization is defined in GMPProcessParent.cpp, so it's sometimes visible
> in unified builds, but not always.
> 
> So this code can fail to compile in unified builds, and will fail in non
> unified builds.
> 
> So should you should move the RunnableMethodTraits<GMPProcessParent>
> definition out of GMPProcessParent.cpp and into GMPProcessParent.h.

That solves problem 1 from comment 34, thanks!


(In reply to Chris Pearce (:cpearce) from comment #44)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #43)
> > > but also
> > > GMPChild::LoadPluginLibrary() calls the GMPInit() and GMPGetAPI() functions
> > > of the plugin, with full privileges. This is bad!
> > 
> > If we're dealing with a malicious plugin, then this is definitely bad. Are
> > we trying to protect users from malicious plugins though?
> 
> I wouldn't go so far as calling EME plugins "malicious", but we gone to
> great lengths with the design of our EME implementation to ensure user's
> privacy (that is, their identity) is protected. Since EME plugins are
> expected to be closed source, in order to ensure there's no scope for
> shenanigans, and to ensure our claims are credible, the sandbox must be
> engaged before *any* code in the CDM can run.

Fair enough! This makes sense in the context of EME
 
> > > We should initiate the calls to GMPInit() and GMPGetAPI() from the parent
> > > after the sandbox has been turned on. I moved them into
> > > GMPChild::RecvSetProcessSandbox() after the StartSandbox() call, and it
> > > seemed to work. Please make that (or an equivalent) change.
> > 
> > Can do! Let me know what your thoughts are on the above point.
> 
> The sandbox must be engaged before *any* code in GMPs can run.

I'll update the patch

> > > Once I moved the GMP function calls to after when the sandbox was engaged
> > > however, the DllMain() was run *after* the StartSandbox() call too, by
> > > Windows on a non-main thread, so I'm not sure what's going on there. Maybe
> > > it's timing dependent? Or is this expected? If you want to test this, ping
> > > me and I'll send you my EME/GMP patches.
> > 
> > From [1]:
> >  "[...] after a call to LoadLibrary, the system scans the list of loaded
> > DLLs for the process. For each DLL that has not already been called with the
> > DLL_PROCESS_ATTACH value, the system calls the DLL's entry-point function.
> > This call is made in the context of the thread that caused the process
> > address space to change, such as [...] the thread that called LoadLibrary."
> > 
> > It sounds like `GMPInit` was called on a non-main thread. `GMPInit` called
> > `LoadLibrary` which caused `DllMain` to be called on the same non-main
> > thread. Does this description match the behavior you saw?
> 
> What I saw didn't match that. DLLMain() was called on a non main thread,
> while I was stepping through GMPInit() on the main thread.

OK I'll see if I can track down locally what's going on here. I'll ping you if I can't repro with my test case.
Attached patch patch v10 (obsolete) — Splinter Review
This patch:
  1) moves the `RunnableMethodTraits<GMPProcessParent>` specialization from GMPProcessParent.cpp to GMPProcessParent.h
  2) removes `GMPChild::RecvSetProcessSandbox` and moves sandbox initialization to `GMPChild::Init` ahead of the call to `GMPChild::LoadPluginLibrary`

Only the ipc/* pieces still require review so I'm requesting review from either :cjones or :nical - feel free to delegate!

> > > > Once I moved the GMP function calls to after when the sandbox was engaged
> > > > however, the DllMain() was run *after* the StartSandbox() call too, by
> > > > Windows on a non-main thread, so I'm not sure what's going on there. Maybe
> > > > it's timing dependent? Or is this expected? If you want to test this, ping
> > > > me and I'll send you my EME/GMP patches.
> > > 
> > > From [1]:
> > >  "[...] after a call to LoadLibrary, the system scans the list of loaded
> > > DLLs for the process. For each DLL that has not already been called with the
> > > DLL_PROCESS_ATTACH value, the system calls the DLL's entry-point function.
> > > This call is made in the context of the thread that caused the process
> > > address space to change, such as [...] the thread that called LoadLibrary."
> > > 
> > > It sounds like `GMPInit` was called on a non-main thread. `GMPInit` called
> > > `LoadLibrary` which caused `DllMain` to be called on the same non-main
> > > thread. Does this description match the behavior you saw?
> > 
> > What I saw didn't match that. DLLMain() was called on a non main thread,
> > while I was stepping through GMPInit() on the main thread.
> 
> OK I'll see if I can track down locally what's going on here. I'll ping you
> if I can't repro with my test case.

With the attached patch, I saw that the `DllMain` of the openh264 plugin ran as expected on the main thread of plugin-container.exe. Here's the stack I saw:

00 openh264!DllMain(void * hDllHandle = 0x618e0000, unsigned long dwReason = 1, void * lpreserved = 0x00000000)
01 openh264!__DllMainCRTStartup(void * hDllHandle = 0x618e0000, unsigned long dwReason = 1, void * lpreserved = 0x00000000)+0x6c
02 openh264!_DllMainCRTStartup(void * hDllHandle = 0x618e0000, unsigned long dwReason = 1, void * lpreserved = 0x00000000)+0x1e
03 ntdll!LdrxCallInitRoutine+0x16
04 ntdll!LdrpCallInitRoutine+0x43
05 ntdll!LdrpInitializeNode+0x11d
06 ntdll!LdrpInitializeGraph+0x59
07 ntdll!LdrpPrepareModuleForExecution+0x121
08 ntdll!LdrpLoadDll+0x392
09 ntdll!LdrLoadDll+0x67
0a KERNELBASE!LoadLibraryExW+0xc2
0b nss3!pr_LoadLibraryByPathname(char * name = 0x00fdf434 "C:\src\gmp-openh264\openh264.dll", int flags = 0n1)+0x1e9
0c nss3!PR_LoadLibraryWithFlags(struct PRLibSpec libSpec = struct PRLibSpec, int flags = 0n1)+0x32
0d nss3!PR_LoadLibrary(char * name = 0x00fdf434 "C:\src\gmp-openh264\openh264.dll")+0x30
0e xul!mozilla::gmp::GMPChild::LoadPluginLibrary(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * aPluginPath = 0x00fdf4b4)+0x21d
0f xul!mozilla::gmp::GMPChild::Init(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * aPluginPath = 0x00fdf4b4, void * aParentProcessHandle = 0x000001e0, class MessageLoop * aIOLoop = 0x018ffb60, class IPC::Channel * aChannel = 0x01702190)+0x21
10 xul!mozilla::gmp::GMPProcessChild::Init(void)+0x80
11 xul!XRE_InitChildProcess(int aArgc = 0n6, char ** aArgv = 0x0170e700, GeckoProcessType aProcess = GeckoProcessType_GMPlugin (0n4))+0x659
12 plugin_container!NS_internal_main(int argc = 0n8, char ** argv = 0x0170e700)+0x147
13 plugin_container!wmain(int argc = 0n9, wchar_t ** argv = 0x015c3ed8)+0x119
14 plugin_container!__tmainCRTStartup(void)+0x122
15 KERNEL32!BaseThreadInitThunk+0xe
16 ntdll!__RtlUserThreadStart+0x20
17 ntdll!_RtlUserThreadStart+0x1b

So roughly `GMPChild::Init` -> `GMPChild::LoadPluginLibrary` -> `LoadLibraryExW` -> `DllMain`

Let me know if you see the same weird behavior from before with this patch applied.
Attachment #8454620 - Attachment is obsolete: true
Attachment #8454620 - Flags: review?(cjones.bugs)
Attachment #8454620 - Flags: review?(bent.mozilla)
Attachment #8455453 - Flags: review?(nical.bugzilla)
Attachment #8455453 - Flags: review?(cjones.bugs)
I just want to observe the for OpenH264, we are not primarily concerned that the plugin is malicious, but rather about containing potential compromise, so it's OK to let the plugin run stuff as long as it doesn't take any external input in doing so.
(In reply to Eric Rescorla (:ekr) from comment #48)
> I just want to observe the for OpenH264, we are not primarily concerned that
> the plugin is malicious, but rather about containing potential compromise,
> so it's OK to let the plugin run stuff as long as it doesn't take any
> external input in doing so.

Right. The original design trusted the plugin only until the plugin started receiving input, which makes sense for OpenH264.

For EME, if I understand correctly, we have a requirement not to trust the plugin even before it receives input.

Are we OK with locking down OpenH264 (and other GMP plugin) processes before they are loaded? If so, I think we can treat GMP and EME plugin processes identically. Otherwise we'll have to write some logic that chooses when to lock down based on which type of plugin we're loading.
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #49)
> (In reply to Eric Rescorla (:ekr) from comment #48)
> > I just want to observe the for OpenH264, we are not primarily concerned that
> > the plugin is malicious, but rather about containing potential compromise,
> > so it's OK to let the plugin run stuff as long as it doesn't take any
> > external input in doing so.
> 
> Right. The original design trusted the plugin only until the plugin started
> receiving input, which makes sense for OpenH264.
> 
> For EME, if I understand correctly, we have a requirement not to trust the
> plugin even before it receives input.
> 
> Are we OK with locking down OpenH264 (and other GMP plugin) processes before
> they are loaded? If so, I think we can treat GMP and EME plugin processes
> identically. Otherwise we'll have to write some logic that chooses when to
> lock down based on which type of plugin we're loading.

Yes, this is fine. I just don't want to block landing on this, so if the
EME version is going to be hard...
(In reply to Eric Rescorla (:ekr) from comment #50)
> (In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #49)
> > Are we OK with locking down OpenH264 (and other GMP plugin) processes before
> > they are loaded? If so, I think we can treat GMP and EME plugin processes
> > identically. Otherwise we'll have to write some logic that chooses when to
> > lock down based on which type of plugin we're loading.
> 
> Yes, this is fine. I just don't want to block landing on this, so if the
> EME version is going to be hard...

... then it's fine to defer the hard bit until later, since we're targeting Fx35 for EME on desktop, whereas OpenH264's target is much closer. But initializing the sandbox before calling the plugin's GMPInit() seems an easy first step that we can do here.
Comment on attachment 8455453 [details] [diff] [review]
patch v10

Review of attachment 8455453 [details] [diff] [review]:
-----------------------------------------------------------------

I hate to minus again but I have more concerns here (mostly about the shutdown sequence)

::: content/media/gmp/GMPParent.cpp
@@ +82,5 @@
>    }
>  
>    mProcess = new GMPProcessParent(path.get());
>    if (!mProcess->Launch(30 * 1000)) {
> +    KillGMPSubProcess();

Why would you wait an additional 5 seconds in KillGMPSubProcess if this failed?

@@ +87,5 @@
>      return NS_ERROR_FAILURE;
>    }
>  
> +  if (!Open(mProcess->GetChannel(), mProcess->GetChildProcessHandle())) {
> +    KillGMPSubProcess();

Here too.

@@ +114,5 @@
> +    // All of the encoders and decoders have been destroyed, so it
> +    // is no longer dangerous to delete this `GMPParent`. Note that
> +    // this might be the last reference, which means the destructor
> +    // is about to run.
> +    mKungFuDeathGrip = nullptr;

I don't understand the point of this extra reference... The GMPService should strongly own the GMPParent through its mPlugins array, right? And there are only two places I see where GMPParent objects are removed:

1. http://mxr.mozilla.org/mozilla-central/source/content/media/gmp/GMPService.cpp#424
2. http://mxr.mozilla.org/mozilla-central/source/content/media/gmp/GMPService.cpp#261

[1] looks unused since it only has one caller (PathRunnable::Run() via GeckoMediaPluginService::RemovePluginDirectory()), and I can't find any caller of RemovePluginDirectory.

[2] is only called at shutdown (via observer service), and it first calls CleanupActors() on the GMPParent.

So I don't quite see where we need another ref. What caused you to add it?

@@ +119,5 @@
>    }
>  }
>  
>  void
> +GMPParent::CleanupActors()

So you've removed the eager Close() call here. That means if we ever get called here from shutdown (via GMPService::UnloadPlugins) then... Well, I don't know what will happen exactly, but this doesn't look like it will force the process to die.

@@ +278,5 @@
> +  // enter this block
> +  mState = GMPStateNotLoaded;
> +  ipc::MessageChannel* channel = GetIPCChannel();
> +  if (channel) {
> +    channel->CloseWithError();

What's the rationale for calling CloseWithError here? ActorDestroy is called for normal (clean) shutdown as well as crashes.

@@ +280,5 @@
> +  ipc::MessageChannel* channel = GetIPCChannel();
> +  if (channel) {
> +    channel->CloseWithError();
> +  } else {
> +    Close();

Won't this be called more than once in some cases? MaybeUnloadProcess() may call Close() which should trigger ActorDestroy(), so calling Close() again should probably cause an abort here: http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp?force=1#1676

Have you tested that code path?
Attachment #8455453 - Flags: review?(nical.bugzilla)
Attachment #8455453 - Flags: review?(cjones.bugs)
Attachment #8455453 - Flags: review-
Also, feel free to catch me on irc tomorrow if you can so we can discuss this.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #52)
> Comment on attachment 8455453 [details] [diff] [review]
> patch v10
> 
> Review of attachment 8455453 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I hate to minus again but I have more concerns here (mostly about the
> shutdown sequence)

Don't feel bad; that's why we have reviews!

> ::: content/media/gmp/GMPParent.cpp
> @@ +82,5 @@
> >    }
> >  
> >    mProcess = new GMPProcessParent(path.get());
> >    if (!mProcess->Launch(30 * 1000)) {
> > +    KillGMPSubProcess();
> 
> Why would you wait an additional 5 seconds in KillGMPSubProcess if this
> failed?

I suppose because I don't know whether the process will die of natural causes. It sounds like the child process will only terminate itself if the channel gets set up properly and then we close the channel. I'll rework this so that there's no wait.

> @@ +87,5 @@
> >      return NS_ERROR_FAILURE;
> >    }
> >  
> > +  if (!Open(mProcess->GetChannel(), mProcess->GetChildProcessHandle())) {
> > +    KillGMPSubProcess();
> 
> Here too.

Same as above
 
> @@ +114,5 @@
> > +    // All of the encoders and decoders have been destroyed, so it
> > +    // is no longer dangerous to delete this `GMPParent`. Note that
> > +    // this might be the last reference, which means the destructor
> > +    // is about to run.
> > +    mKungFuDeathGrip = nullptr;
> 
> I don't understand the point of this extra reference... The GMPService
> should strongly own the GMPParent through its mPlugins array, right? And
> there are only two places I see where GMPParent objects are removed:
> 
> 1.
> http://mxr.mozilla.org/mozilla-central/source/content/media/gmp/GMPService.
> cpp#424
> 2.
> http://mxr.mozilla.org/mozilla-central/source/content/media/gmp/GMPService.
> cpp#261
> 
> [1] looks unused since it only has one caller (PathRunnable::Run() via
> GeckoMediaPluginService::RemovePluginDirectory()), and I can't find any
> caller of RemovePluginDirectory.
> 
> [2] is only called at shutdown (via observer service), and it first calls
> CleanupActors() on the GMPParent.
> 
> So I don't quite see where we need another ref. What caused you to add it?

The [2] that you identified looks like (after applying this patch):
    for (uint32_t i = 0; i < mPlugins.Length(); i++) {
      mPlugins[i]->CleanupActors();
    }
    mPlugins.Clear();
As you noted, we call `CleanupActors` before releasing the `GMPParent` (in the call to `Clear`). `GMPParent::CleanupActors` calls the `DecodingComplete` method of each of the `GMPParent`'s encoders and decoders, but that does not immediately cause `GMPParent::VideoDecoderDestroyed` or `GMPParent::VideoEncoderDestroyed` to be called; those methods get called later from the message loop. Thus, the `mPlugins.Clear()` call could be releasing the last reference to a `GMPParent` whose encoders and decoders have not yet run their destruction sequence. When we later enter `GMPParent::Video*Destroyed`, disaster could strike.

> @@ +119,5 @@
> >    }
> >  }
> >  
> >  void
> > +GMPParent::CleanupActors()
> 
> So you've removed the eager Close() call here. That means if we ever get
> called here from shutdown (via GMPService::UnloadPlugins) then... Well, I
> don't know what will happen exactly, but this doesn't look like it will
> force the process to die.

The call to `Close` should happen after all the encoders and decoders have been destroyed - see `GMPParent::VideoDecoderDestroyed` and `GMPParent::VideoEncoderDestroyed` where we dispatch a call to `GMPParent::MaybeUnloadProcess` which in turn calls `Close` and `KillGMPSubProcess`

> @@ +278,5 @@
> > +  // enter this block
> > +  mState = GMPStateNotLoaded;
> > +  ipc::MessageChannel* channel = GetIPCChannel();
> > +  if (channel) {
> > +    channel->CloseWithError();
> 
> What's the rationale for calling CloseWithError here? ActorDestroy is called
> for normal (clean) shutdown as well as crashes.

In a clean shutdown, we will have set `mState` to `GMPStateNotLoaded` in `GMPParent::MaybeUnloadProcess` so the `if` at the beginning of our `ActorDestroy` method will prevent us from entering this block. I could add some comments to make this more clear, or I'd be happy to rearrange the logic if there's a cleaner/more obvious way to go about this.

> @@ +280,5 @@
> > +  ipc::MessageChannel* channel = GetIPCChannel();
> > +  if (channel) {
> > +    channel->CloseWithError();
> > +  } else {
> > +    Close();
> 
> Won't this be called more than once in some cases? MaybeUnloadProcess() may
> call Close() which should trigger ActorDestroy(), so calling Close() again
> should probably cause an abort here:
> http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.
> cpp?force=1#1676
> 
> Have you tested that code path?

Yes, `MaybeUnloadProcess` will trigger `ActorDestroy` by calling `Close`, but before it does that it will set `mState` to `GMPStateNotLoaded` which will prevent the body of `ActorDestroy` from doing anything. Again, this is the cleanest way I was able to come up with to accomplish the desired behavior, but I'm happy to reorganize or add comments to make this easier to read.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #53)
> Also, feel free to catch me on irc tomorrow if you can so we can discuss
> this.

Thanks for the offer! I'll ping you when I get to the office tomorrow
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #54)
> When we later enter `GMPParent::Video*Destroyed`, disaster could
> strike.

Yeah, I see. I think this is getting a whole lot more complicated than it needs to be...

> The call to `Close` should happen after all the encoders and decoders have
> been destroyed - see `GMPParent::VideoDecoderDestroyed` and
> `GMPParent::VideoEncoderDestroyed` where we dispatch a call to
> `GMPParent::MaybeUnloadProcess` which in turn calls `Close` and
> `KillGMPSubProcess`

Right, but there's no guarantee that the plugin will respond. Also, since you're no longer blocking shutdown of the parent process (spinning the event loop) there's no guarantee that the parent process won't just exit and leak everything. Then the child processes will just crash.

> In a clean shutdown, we will have set `mState` to `GMPStateNotLoaded` in
> `GMPParent::MaybeUnloadProcess` so the `if` at the beginning of our
> `ActorDestroy` method will prevent us from entering this block.

Hm, that's pretty subtle.

So can you just not change *anything* in the shutdown sequence? My original objection was that your shutdown was wonky if the call to SendSetProcessSandbox failed. Since that happens automatically now at process startup before any third-party code is loaded I don't think that's such a worry...
Attached patch Patch v11 (obsolete) — Splinter Review
This patch reverts the changes to the shutdown sequence. I kept the piece that kills the child process; this matches what `ContentParent` does better and provides a stronger guarantee that we don't leave a plugin-container process sitting around.
Attachment #8455453 - Attachment is obsolete: true
Attachment #8456307 - Flags: review?(bent.mozilla)
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #57)
> Created attachment 8456307 [details] [diff] [review]
> Patch v11
> 
> This patch reverts the changes to the shutdown sequence. I kept the piece
> that kills the child process; this matches what `ContentParent` does better
> and provides a stronger guarantee that we don't leave a plugin-container
> process sitting around.

I should mention though that I'd be happy to do this in a separate patch/bug also.
Comment on attachment 8456307 [details] [diff] [review]
Patch v11

Review of attachment 8456307 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/gmp/GMPParent.cpp
@@ +22,5 @@
> +  // that its pointer has its `GMPProcessParent::Delete` method
> +  // called. Note that this function can be called after all
> +  // `GMPParent`s have been deleted.
> +  void
> +  KillGMPProcess(GMPProcessParent* aProcess)

I think you can revert all this (including the RunnableTraits stuff) and just add Join() to GMPProcessParent::Delete(), right?
Comment on attachment 8456307 [details] [diff] [review]
Patch v11

Review of attachment 8456307 [details] [diff] [review]:
-----------------------------------------------------------------

Will wait for the next patch.
Attachment #8456307 - Flags: review?(bent.mozilla)
Attached patch Patch v12 (obsolete) — Splinter Review
This patch reverts the pieces that kill the gmp subprocess, opting instead to just `Join` inside `GMPProcessParent::Delete`
Attachment #8456307 - Attachment is obsolete: true
Attachment #8456476 - Flags: review?(bent.mozilla)
Comment on attachment 8456476 [details] [diff] [review]
Patch v12

Review of attachment 8456476 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with these fixed:

::: content/media/gmp/GMPParent.cpp
@@ +68,5 @@
>      mProcess = nullptr;
>      return NS_ERROR_FAILURE;
>    }
>  
> +  if (!Open(mProcess->GetChannel(), mProcess->GetChildProcessHandle())) {

Nit: I'd revert this so you don't have blame for the line.

::: ipc/app/MozillaRuntimeMain.cpp
@@ +87,5 @@
>      for (int i = 1; i < argc; i++) {
> +        isNuwa = isNuwa || (strcmp(argv[i], "-nuwa") == 0);
> +#if defined(XP_WIN)
> +        gIsSandboxEnabled = gIsSandboxEnabled
> +                         || (strcmp(argv[i], "-sandbox") == 0);

Nit: I'd revert these |= changes to avoid blame.

::: ipc/glue/GeckoChildProcessHost.h
@@ +166,5 @@
>  
>  #ifdef XP_WIN
>    void InitWindowsGroupID();
>    nsString mGroupId;
> +  mozilla::SandboxBroker mSandboxBroker;

Nit: You're already inside the mozilla namespace so you shouldn't need to qualify like this.
Attachment #8456476 - Flags: review?(bent.mozilla) → review+
Attached patch Patch v13Splinter Review
Addressed review nits, updated patch comment to make it ready to land

I pushed patch v12 to try but hg.mozilla.org seems to be having trouble and I can't see my results. If that succeeds, I'll post the results here and land.
Attachment #8456476 - Attachment is obsolete: true
Attachment #8456562 - Flags: review+
The try build for some reason didn't build:
  https://tbpl.mozilla.org/?tree=Try&rev=02765f458c74

Self-serve tells me that "Revision 02765f458c74 not found on branch try"

Trying a new build with the most recent patch:
  https://tbpl.mozilla.org/?tree=Try&rev=b7e761bd73c4
Blocks: 1040016
https://hg.mozilla.org/mozilla-central/rev/fbd06fa70b84
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Blocks: 1040939
No longer blocks: 1027906
Depends on: 1045533
You need to log in before you can comment on or make changes to this bug.