Closed Bug 641685 (e10s-plugin-ipc) Opened 13 years ago Closed 10 years ago

Have multiple content processes "share" plugin subprocesses

Categories

(Core Graveyard :: Plug-ins, defect, P1)

defect

Tracking

(e10sm3+)

RESOLVED FIXED
mozilla36
Tracking Status
e10s m3+ ---

People

(Reporter: cjones, Assigned: billm)

References

Details

Attachments

(4 files, 13 obsolete files)

5.73 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
3.23 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
19.38 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
122.74 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
IMHO this is probably the biggest TODO item for multi-process firefox.  I think what we approximately want is
 - each plugin subprocess is launched by the chrome process
 - PPluginModule is divested of plugin instance creation/maintenance
 - each content process creates an IPC channel with the plugin(s) it needs to load instances of, using IPDL "bridges"
 - the side-channel protocol looks a lot like PPluginModule does currently

Another option is to route all communication with plugins through the chrome process in order to create fewer IPC pipe/sockets, but that would add a lot of complexity.

Also need to figure out how to support windowed plugins in this world, which fennec doesn't support currently.  This work should allow us to remove the deadlock-avoidance code in ipc/glue.
Assignee: nobody → jones.chris.g
Here are some issues that have arisen so far

How to synchronize MIME-type-handler changes across content processes (i.e., know which plugin to load for which MIME type)
 - this is a general problem.  Not sure how this works in fennec atm.

How will about:plugins work
 - shouldn't be much different from current code, except that (i) we'll need to keep around nsINPAPI* stuff in the chrome process; (ii) about:plugins will need to be a "local" (in-process) tab

Will plugin-check Just Work
 - I believe so

Creating the content->plugin "bridge" will be asynchronous, which will force plugin instantiation to be asynchronous sometimes.  Are there backwards-compat issues here?  E.g. what if script asks for a plugin JS object before onload.
 - this is the most worrying part of this work so far

Will we need to support no-OOP-content-but-OOPP mode?
 - yes, this is part of the e10s "exit strategy".  Shouldn't be too hard.

Plugin crash detection is all in chrome.  Plugin-crashed UI is in each content process.  How does crash submission work?
 - probably mostly the same as today; each content process can detect plugin crashes, and we just need to keep around a bit more state in chrome for crash submission.  We also need to fire a notification back to content on submission so that the UI can go away.

Will chrome be allowed to instantiate plugins?
 - boy I hope not.  Let's plan assuming "no", and make this part of the e10s-compat requirements for addons.  Unless there are compelling uses I'm overlooking.
(In reply to comment #1)
> Creating the content->plugin "bridge" will be asynchronous, which will force
> plugin instantiation to be asynchronous sometimes.  Are there
> backwards-compat issues here?  E.g. what if script asks for a plugin JS
> object before onload.
>  - this is the most worrying part of this work so far

I think we do need synchronous instantiation here.
Why?  (Not trying to be argumentative, just wondering if you know of relevant specs and/or existing compat issues.)

True synchronous instantiation, handing back a fully-capable JS object, is going to require something akin to spinning a nested event loop.  That hurts.  Maybe we can get away with something short of that.
Our tests depend on it for sure.

I can't remember off the top of my head sites that rely on it in the wild. Maybe there aren't any, but I get the feeling there are some. Ask Josh maybe?
The tests I'm familiar with start touching plugins after onload or MozReftestInvalidate.

Josh?

Real-world concerns trump spec'ery, but I'll try to find something covering this.  Will also see what chrome does.
Plugin instantiation has to be synchronous: youtube and other sites break if it is not. I don't think that will require spinning an event loop, though: can't it just be an IPDL sync message to the chrome process that initializes a bridge? Can we initialize a bridge asynchronously?
Synchronous instantiation is OK.  Synchronous instantiation *before* onload is not OK, as plans stand.

Bridging has to be asynchronous in general, but if we need to hack for backwards-compat the plugin bridge can be created synchronously.  I'm not sure whether I prefer hacking that in or spinning the event loop, but we can cross that bridge when we come to it (ha!).

But we should find out whether sites require synchronous instantiation before onload.  Anyone know?
(In reply to comment #7)
> Synchronous instantiation is OK.  Synchronous instantiation *before* onload
> is not OK, as plans stand.

Actually, never mind --- sites can dynamically create objects/embeds of unknowable MIME type after load, so we need to support this.

Crap.
Yes they do: many sites check immediately after the <embed> is inserted into the DOM tree to see whether it has plugin properties so they can provide scriptable fallback content or do version-checking while document.write is still possible.
Plan of attack

 - Create PPlugin; move NP_* stuff for managing modules into PPlugin
 - Make PPluginModule a managed protocol of PPlugin
 - Create PPluginModule in chrome at a convenient time
Land on m-c.  Should be relatively easy, ~1 week's work.

 - Add PContentPluginBridge with PPluginModule managee
 - Add content-process code to request a PContentPluginBridge from chrome
 - Add chrome code to create PPlugin at arbitrary times
 - Figure out v0 approach to maintaining synchronous-instantiation illusion.  v0 might be spinning event loop.
Land on m-c.  Content-process code will inherit current pref'ing-off.  There some unknowns here, but maybe 2-3 week's work.

At this point, windowless plugins should mostly work with OOP content when the pref is flipped on.  The rest of the work (figure out how to test OOPP+content, shippable impl of synchronous-instantiation illusion, windowed plugins) can go to followups before enabling by default.
I'd started work on this in bug 997483, probably should have checked if it had been filed!

Stealing.
Alias: plugin-ipc-e10s
Assignee: cjones.bugs → jschoenick
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: x86_64 → All
Blocks: 890059
As long as a content process on Linux needs to be able fork/exec its own plugin process, rather than asking the chrome process to do so, that will be a significant problem for content process sandboxing.
Priority: -- → P1
See Also: → 1057908
Moving old M2 P1 bugs to M3.
Required IPDL fix for plugin bridges to compile
Another IPDL fix needed to compile WIP patches
Not for landing:
Somewhat-helpful debug spam for tracing where the IPDL compiler is exploding on your bridge statements
Part 1 - Adds the IPDL layout for making PPluginInstance top level:

> protocol PPluginModule { }
> protocol PPluginInstance { bridges PContentPluginSystem, PPluginModule }
> protocol PPlugin { manager PContentPluginSystem }
> protocol PContentPluginSystem { parent spawns PPluginModule; manages PPlugin }
> 
>                  /\                    ||               ^
>                  ::                    ||               |
>                  :: Bridged protocol   || Owns/spawns   | Protocol
>                  ::                    ||               |
>                  \/                    \/               v
> 
>                                   { Chrome process }
>                                    +--------------+
>                              ++===== PluginSystem ====++
>                +-------------\/----+-+---------+--+---\/--------------------+
>                | PPluginModuleParent |         | PContentPluginSystemParent | [ per content ]
>                +---||-------------^--+         +--+---------\/----+------||-+
>                    ||             *~~~~~~~~~~~~~~~~ PPluginParent |      ||
>                    ||                             +--------^------+      ||
>                    ||    [ per module per content ] -^     |             ||
>                    ||                                      |             ||
>                    ||                                      |             ||
>                    ||                                      |             ||
> { Plugin Process } ||                                      |             || { Content Process }
>      +-------------\/-----+                                |     +-------\/------------------+
>      | PPluginModuleChild |                                |     | PContentPluginSystemChild |
>      +---------+----------+-----------+                +---v-----+----+----------------------+
>                | PPluginInstanceChild |                | PPluginChild |
>                +------------------/\--+          +-----+--------------+--+
>                                   :::::::::::::::> PPluginInstanceParent |
>                                                  +-----------------------+
>
Alias: plugin-ipc-e10s → e10s-plugin-ipc
Blocks: 1079247
Blocks: 1059775
Attachment #8491767 - Attachment is obsolete: true
Attachment #8491793 - Attachment is obsolete: true
Attachment #8491768 - Attachment is obsolete: true
Attachment #8491769 - Attachment is obsolete: true
Attached patch refactor-objectmap (obsolete) — Splinter Review
This patch just moves the object map to a per-process singleton.
Attachment #8508379 - Flags: review?(benjamin)
Attached patch special-powersSplinter Review
This patch makes the setTestPluginEnabledState function work in tests. Unfortunately, this function is synchronous. The enabled state is determined by a preference. There isn't really any way to synchronously set a pref from a content process as far as I know. So I spun a nested event loop :-(.
Attachment #8508380 - Flags: review?(benjamin)
Attached patch plugin-work (obsolete) — Splinter Review
This patch has most of the work in it. I probably would make sense to split it into two pieces: one that mirrors the plugin tags in the content process and another that handles the bridging and actor stuff.

I've made some progress on splitting up the PluginModuleParent code into chrome and content parts. However, it's all still part of PPluginModule. I'm still not entirely sure that we want two protocols there. If we do decide to do that, I think it should be done as a refactoring on top of a patch like this one. The way this patch is structured right now makes it clear what parts of the code are changing. If I break things up more, I'm afraid it will be much harder to distinguish code movement from actual changes during review. That's what happened with my initial patch, which tried to use separate protocols.

Jim, I'd be interested to hear how this works for you. You'll need to base these patches on top of bug 1081353 (already landed but not yet merged) and bug 1085710.
Attachment #8508387 - Flags: feedback?(benjamin)
Attached patch refactor-objectmap v2 (obsolete) — Splinter Review
New version that fixes a few bugs.
Attachment #8508379 - Attachment is obsolete: true
Attachment #8508379 - Flags: review?(benjamin)
Attachment #8508461 - Flags: review?(benjamin)
Attached patch plugin-work v2 (obsolete) — Splinter Review
Also fixed some bugs here. We're now passing all the plugin mochitests except the ones I specifically disabled in the manifest (one of which also fails on me normally).

The main thing I still want to fix is how shutdown works. First I need to think more about what we want to happen though.
Attachment #8508387 - Attachment is obsolete: true
Attachment #8508387 - Flags: feedback?(benjamin)
Attachment #8508462 - Flags: feedback?(jmathies)
Attachment #8508462 - Flags: feedback?(benjamin)
Attached patch windows - build fixes (obsolete) — Splinter Review
Minor tweaks to get builds and windowless mode working on Windows.
On Windows windowed mode + non-e10s is working fine. Windowed mode w/e10s isn't, digging into that currently.
Windowed mode with e10s is failing creating the GeckoPluginWindow widget, which we try to do in the child process here - 

http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginInstanceOwner.cpp#2444

Looks like this widget creation needs to happen in the parent. It can't succeed in the child using a  PuppetWidget.
Assignee: jschoenick → wmccloskey
Note there's some pretty heavy bitrot on mc tip here from bug 818307.
Attached patch plugin-work v3 (obsolete) — Splinter Review
Here's a rebased version that incorporates Jim's fixes.
Attachment #8508462 - Attachment is obsolete: true
Attachment #8508462 - Flags: feedback?(jmathies)
Attachment #8508462 - Flags: feedback?(benjamin)
Attachment #8509995 - Flags: feedback?(benjamin)
Comment on attachment 8509995 [details] [diff] [review]
plugin-work v3

Review of attachment 8509995 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginModuleParent.h
@@ +307,5 @@
> +     * Used for chrome hang annotations.
> +     */
> +    void
> +    OnHangUIContinue();
> +#endif // XP_WIN

Had to move this section to public to compile on windows as it is called by PluginHangUIParent.
Attachment #8508660 - Attachment is obsolete: true
Attached file start up sequence txt (obsolete) —
Looking at this windowed mode widget call, What I'm thinking we need here is a widget proxy in the content process which forwards calls sync over to a native parent widget. The proxy wouldn't be a full implementation of nsIWidget, more like a utility proxy with calls for Create, Show, Destroy, Invalidate, and probably a few helper apis for methods like GetNetscapeWindow, ShowNativeContextMenu, and ConvertPoint.

Not sure how this fits in with the mac work currently taking place, need to chat with Josh.

http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginInstanceOwner.cpp#2444

Billm, thoughts?
Attached file start up sequence txt (obsolete) —
Attachment #8510500 - Attachment is obsolete: true
Comment on attachment 8510532 [details]
start up sequence txt

I've invaded billm's bug here for basic support, moving the windowed mode discussion down to the proper bug - bug 669200.
Attachment #8510532 - Attachment is obsolete: true
Attached patch ipdl-changesSplinter Review
Splitting out the IPDL changes. This patch solves a silly problem introduced with bridging. We're now creating a PPluginModuleParent inside of PContentChild. That's kind of new. The problem is that we're not generating the necessary #includes and forward decls for PPluginModuleParent since it's from a different |side| than PPluginChild (parent versus child). This patch fixes that in several ways:

- The only time the #include file issue seems to come up is in the PContentChild.cpp code that receives the PPluginModuleMsgStart message and creates the new protocol. That code requires we #include "PPluginModuleParent.h". The patch adds the #include from makeHandlerCase. This function is only called when one protocol opens another, which is somewhat rare. So we're not adding too many extra #include files.

- There's also a problem when we declare AllocPPluginModuleParent in PContentChild. We lack a forward declaration for mozilla::plugins::PPluginModuleParent as well as a typedef for it (which eliminates the need for the explicit scope). So this patch just adds typedefs and forward decls of both sides of every protocol that gets included in another protocol. This is kinda overkill, but it's an easy way to avoid dupes, and I doubt that these decls are that expensive for the compiler to process.

The diff here is pretty boring: just some extra #include statements, forward decls, and typedefs.
Attachment #8511370 - Flags: review?(benjamin)
Attached patch plugin-work v4 (obsolete) — Splinter Review
This is ready for review. I think I've fixed everything we discussed on Friday. It's green on try except for some content process leaks under e10s. I'd like to fix these, but they're pretty intermittent and I'd rather land with e10s debug plugin tests disabled initially.

For the setexception issue, I couldn't find a nice way to keep track of what's on the stack. So I'm preserving the existing behavior in the single-process case and ignoring the exception for multiple processes.
Attachment #8509995 - Attachment is obsolete: true
Attachment #8509995 - Flags: feedback?(benjamin)
Attachment #8511776 - Flags: review?(benjamin)
Attachment #8508380 - Flags: review?(benjamin) → review+
Comment on attachment 8508461 [details] [diff] [review]
refactor-objectmap v2

I'd really like to see the commit message you're going to use for this.

>diff --git a/dom/plugins/ipc/PluginScriptableObjectChild.cpp b/dom/plugins/ipc/PluginScriptableObjectChild.cpp

>+nsTHashtable<PluginScriptableObjectChild::NPObjectData> PluginScriptableObjectChild::sObjectMap;

Static hashtables aren't a good idea. This should be a pointer/lazy in some way. This is especially true since the main process contains this code but won't ever need it.
Attachment #8508461 - Flags: review?(benjamin) → review-
Attachment #8511370 - Flags: review?(benjamin) → review+
Attachment #8508461 - Attachment is obsolete: true
Attachment #8512764 - Flags: review?(benjamin)
Comment on attachment 8511776 [details] [diff] [review]
plugin-work v4

This is really close. I'm mainly marking r- for the reloadplugins and setexception issues. The rest is just comments/rearranging/asserts.

>diff --git a/dom/ipc/PContent.ipdl b/dom/ipc/PContent.ipdl

>+    intr LoadPlugin(uint32_t id);

Does this magically do a bridging thing that we can't see in the IPDL signature? Needs a doccomment.

>+    sync FindPlugins(uint32_t pluginEpoch) returns (PluginTag[] plugins, uint32_t newPluginEpoch);

I think this needs a comment. In particular what happens if the passed-in epoch matches the "current" epoch? I believe that this finds the new plugins if necessary, and returns []/0 if not, but it's not clear from the signature.

>diff --git a/dom/plugins/base/nsPluginHost.cpp b/dom/plugins/base/nsPluginHost.cpp

Because we're sharing the nsPluginHost c++ object for both content and chrome processes, I think it would be valuable to add assertions to the top of most methods of this file which should be called in only-chrome or only-content.

>+nsresult
>+nsPluginHost::GetPluginForContentProcess(uint32_t aPluginId, nsNPAPIPlugin** aPlugin)
>+{

e.g. assert chrome-only here

>+nsPluginTag*
>+nsPluginHost::FirstPluginWithId(uint32_t aId)

"first" seems a little odd here. Since IDs are sequential, isn't it a guarantee that there is only a single plugin tag with any ID?

>@@ -1978,42 +2069,99 @@ nsresult nsPluginHost::LoadPlugins()
>   // do not do anything if it is already done
>   // use ReloadPlugins() to enforce loading
>   if (mPluginsLoaded)
>     return NS_OK;
> 
>   if (mPluginsDisabled)
>     return NS_OK;
> 
>+  mPluginEpoch++;
>+
>   bool pluginschanged;
>   nsresult rv = FindPlugins(true, &pluginschanged);
>   if (NS_FAILED(rv))
>     return rv;

It seems odd that we are incrementing the epoch even if the plugins haven't changed. Is that intentional? If so it deserves a explanatory comment.

>+    // If there's already a plugin with the same modification date, we skip adding it.

A lot of this complexity appears to related to the "fact" that plugin tags are sorted by modification date. I don't think that is actually useful any more, because we now select the "preferred" plugin tag by picking the highest version number, not by modification date:

http://hg.mozilla.org/mozilla-central/annotate/c0ddb1b098ec/dom/plugins/base/nsPluginHost.cpp#l1142

Also I don't understand the comment here about the modification date: why is that the important bit of this equation, instead of just using the ID?

>+bool
>+mozilla::plugins::FindPluginsForContent(uint32_t aPluginEpoch,
>+                                        nsTArray<PluginTag>* aPlugins,
>+                                        uint32_t* aNewPluginEpoch)
>+{
>+    nsRefPtr<nsPluginHost> host = nsPluginHost::GetInst();

nit, 2-space indent here

>+void
>+nsPluginHost::FindPluginsForContent(uint32_t aPluginEpoch,
>+                                    nsTArray<PluginTag>* aPlugins,
>+                                    uint32_t* aNewPluginEpoch)
>+{
>+  // Load plugins so that the epoch is correct.
>+  LoadPlugins();
>+
>+  *aNewPluginEpoch = mPluginEpoch;
>+  if (aPluginEpoch == mPluginEpoch) {
>+    return;
>+  }

This doesn't set aNewPluginEpoch, which I would normally consider an outparam bug. Is this intentional, and if it is where is the contract documented?

>diff --git a/dom/plugins/base/nsPluginHost.h b/dom/plugins/base/nsPluginHost.h

>+  // This epoch increases each time we load the list of plugins from disk.
>+  uint32_t mPluginEpoch;
>+
>+  // This is the most recent value of the chrome process's mPluginEpoch that
>+  // we've observed.
>+  uint32_t mPluginEpochChrome;

Let me double-check my understanding:
mPluginEpoch is only used in chrome processes.
mPluginEpochChrome is only used in content processes.

If that is correct, can the variables be combined?

>diff --git a/dom/plugins/ipc/PPluginModule.ipdl b/dom/plugins/ipc/PPluginModule.ipdl

>+  async NotifyContentModuleDestroyed();

Needs a doc-comment.

>diff --git a/dom/plugins/ipc/PluginBridge.h b/dom/plugins/ipc/PluginBridge.h
>new file mode 100644
>--- /dev/null
>+++ b/dom/plugins/ipc/PluginBridge.h
>@@ -0,0 +1,29 @@
>+/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-

New file, please use 2-space indent. I know existing code in dom/plugins/ipc is a terrible mix, but new code can use standard style.

>diff --git a/dom/plugins/ipc/PluginModuleChild.cpp b/dom/plugins/ipc/PluginModuleChild.cpp

> PluginModuleChild::~PluginModuleChild()
> {
>-    NS_ASSERTION(gInstance == this, "Something terribly wrong here!");
>+    if (mTransport) {
>+        // For some reason IPDL doesn't autmatically delete the channel for a
>+        // bridged protocol. So we have to do it ourselves.
>+        XRE_GetIOMessageLoop()->PostTask(FROM_HERE, new DeleteTask<Transport>(mTransport));

Followup bug filed? Something feels fishy here. Is mTransport only set for bridged protocols, or does this risk dual-deleting the transport in some cases?

>+PluginModuleChild::CommonInit(base::ProcessHandle aParentProcessHandle,
>+                              MessageLoop* aIOLoop,
>+                              IPC::Channel* aChannel)
> {
>     PLUGIN_LOG_DEBUG_METHOD;
> 
>-    GetIPCChannel()->SetAbortOnError(true);

I believe that we should continue to abort-on-error on all the relevant channels, not just the chrome channel.

>     // Request Windows message deferral behavior on our channel. This
>     // applies to the top level and all sub plugin protocols since they
>     // all share the same channel.
>     GetIPCChannel()->SetChannelFlags(MessageChannel::REQUIRE_DEFERRED_MESSAGE_PROTECTION);

File a followup: this may not be required or useful for e10s plugins.

> void
> _reloadplugins(NPBool aReloadPages)
> {
>     PLUGIN_LOG_DEBUG_FUNCTION;
>     ENSURE_PLUGIN_THREAD_VOID();
> 
>-    PluginModuleChild::current()->SendNPN_ReloadPlugins(!!aReloadPages);
>+    // Send the reload message to all modules. Chrome will need to reload from
>+    // disk and content will need to request a new list of plugin tags from
>+    // chrome.
>+    for (size_t i = 0; i < gAllInstances->Length(); i++) {
>+        (*gAllInstances)[i]->SendNPN_ReloadPlugins(!!aReloadPages);
>+    }

This still doesn't feel right. I'm about 99% sure that this message should go only to the chrome process, and if reloading causes the plugin list to change that should be broadcast or trigger. It should essentially be the same thing that happens if any content process calls navigator.plugins.refresh()

> void
> _setexception(NPObject* aNPObj,
>               const NPUTF8* aMessage)

>+    for (size_t i = 0; i < gAllInstances->Length(); i++) {
>+        (*gAllInstances)[i]->SendNPN_SetException(NullableString(aMessage));
>     }

I don't think this can be right. We should either do stack-tracking (painful) or just give up on this and make the API a no-op. Let's try no-op and see what breaks?

>diff --git a/dom/plugins/ipc/PluginModuleParent.cpp b/dom/plugins/ipc/PluginModuleParent.cpp

>+/* static */ PluginModuleContentParent*
>+PluginModuleContentParent::Create(mozilla::ipc::Transport* aTransport,
>+                                  base::ProcessId aOtherProcess)
>+{
>+    nsAutoPtr<PluginModuleContentParent> parent(new PluginModuleContentParent());
>+    ProcessHandle handle;
>+    if (!base::OpenProcessHandle(aOtherProcess, &handle)) {
>+        // XXX need to kill |aOtherProcess|, it's boned

Make sure a followup is filed.

>+    DebugOnly<bool> ok = parent->Open(aTransport, handle, XRE_GetIOMessageLoop(),
>+                                      mozilla::ipc::ParentSide);
>+    MOZ_ASSERT(ok);

Should this really be DebugOnly? There's no better error handling to do here?

> #if defined(XP_WIN) || defined(XP_MACOSX)
> nsresult
> PluginModuleParent::NP_GetEntryPoints(NPPluginFuncs* pFuncs, NPError* error)
> {
>     NS_ASSERTION(pFuncs, "Null pointer!");
> 
>-    // We need to have the child process update its function table
>-    // here by actually calling NP_GetEntryPoints since the parent's
>-    // function table can reflect nullptr entries in the child's table.
>-    if (!CallNP_GetEntryPoints(error)) {
>-        return NS_ERROR_FAILURE;
>-    }
>-    else if (*error != NPERR_NO_ERROR) {
>-        return NS_OK;
>+    if (IsChrome()) {
>+        // We need to have the child process update its function table
>+        // here by actually calling NP_GetEntryPoints since the parent's
>+        // function table can reflect nullptr entries in the child's table.
>+        if (!CallNP_GetEntryPoints(error)) {
>+            return NS_ERROR_FAILURE;
>+        }
>+        else if (*error != NPERR_NO_ERROR) {
>+            return NS_OK;
>+        }

The comment here should be outside the if block, since it applies to both chrome and content plugin modules, but the actual work is done in SetPluginFuncs, not CallNP_GetEntryPoints.
Attachment #8511776 - Flags: review?(benjamin) → review-
Attachment #8512764 - Flags: review?(benjamin) → review+
Will post a new patch in a moment. Anything not mentioned here was fixed.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #38)
> >diff --git a/dom/plugins/base/nsPluginHost.cpp b/dom/plugins/base/nsPluginHost.cpp
> 
> Because we're sharing the nsPluginHost c++ object for both content and
> chrome processes, I think it would be valuable to add assertions to the top
> of most methods of this file which should be called in only-chrome or
> only-content.

I added assertions where possible. It's a bit difficult because lots of the functions in nsPluginHost are for use by "plugin consumers", which can be the chrome process for non-remote pages and the content process for remote pages.

> >+nsPluginTag*
> >+nsPluginHost::FirstPluginWithId(uint32_t aId)
> 
> "first" seems a little odd here. Since IDs are sequential, isn't it a
> guarantee that there is only a single plugin tag with any ID?

Yeah, First doesn't make sense. I'm not too familiar with Gecko naming, so I just dropped First.

> It seems odd that we are incrementing the epoch even if the plugins haven't
> changed. Is that intentional? If so it deserves a explanatory comment.

Not intentional.

> >+    // If there's already a plugin with the same modification date, we skip adding it.
> 
> A lot of this complexity appears to related to the "fact" that plugin tags
> are sorted by modification date. I don't think that is actually useful any
> more, because we now select the "preferred" plugin tag by picking the
> highest version number, not by modification date:
> 
> http://hg.mozilla.org/mozilla-central/annotate/c0ddb1b098ec/dom/plugins/base/
> nsPluginHost.cpp#l1142
> 
> Also I don't understand the comment here about the modification date: why is
> that the important bit of this equation, instead of just using the ID?

I just coped the comment from the existing code. I'm not sure what it meant. I removed the sorting so that we just add each new plugin to the head of the list.

> >+void
> >+nsPluginHost::FindPluginsForContent(uint32_t aPluginEpoch,
> >+                                    nsTArray<PluginTag>* aPlugins,
> >+                                    uint32_t* aNewPluginEpoch)
> >+{
> >+  // Load plugins so that the epoch is correct.
> >+  LoadPlugins();
> >+
> >+  *aNewPluginEpoch = mPluginEpoch;
> >+  if (aPluginEpoch == mPluginEpoch) {
> >+    return;
> >+  }
> 
> This doesn't set aNewPluginEpoch, which I would normally consider an
> outparam bug. Is this intentional, and if it is where is the contract
> documented?

I think you missed the line |*aNewPluginEpoch = mPluginEpoch;|.

> Let me double-check my understanding:
> mPluginEpoch is only used in chrome processes.
> mPluginEpochChrome is only used in content processes.
> 
> If that is correct, can the variables be combined?

Yes, I've combined them. I'm not if this is simpler, but I added a bunch of assertions, so it should be reasonably clear.

> >diff --git a/dom/plugins/ipc/PluginModuleChild.cpp b/dom/plugins/ipc/PluginModuleChild.cpp
> 
> > PluginModuleChild::~PluginModuleChild()
> > {
> >-    NS_ASSERTION(gInstance == this, "Something terribly wrong here!");
> >+    if (mTransport) {
> >+        // For some reason IPDL doesn't autmatically delete the channel for a
> >+        // bridged protocol. So we have to do it ourselves.
> >+        XRE_GetIOMessageLoop()->PostTask(FROM_HERE, new DeleteTask<Transport>(mTransport));
> 
> Followup bug filed? Something feels fishy here. Is mTransport only set for
> bridged protocols, or does this risk dual-deleting the transport in some
> cases?

mTransport is indeed only set for bridged instances. I filed bug 1090570 to simplify this.

> >+PluginModuleChild::CommonInit(base::ProcessHandle aParentProcessHandle,
> >+                              MessageLoop* aIOLoop,
> >+                              IPC::Channel* aChannel)
> > {
> >     PLUGIN_LOG_DEBUG_METHOD;
> > 
> >-    GetIPCChannel()->SetAbortOnError(true);
> 
> I believe that we should continue to abort-on-error on all the relevant
> channels, not just the chrome channel.

I don't agree. One use case I would like to work is when the content process crashes. In this case I'd really like the plugin process to continue functioning. Especially when we have multiple content processes this will be important. The current code makes this work.

I've read that this abort code was created to solve a problem where Flash hangs and the plugin process somehow never gets killed even after Firefox exits. I think that, as long as the chrome process's channel has abort on error, we'll kill the Flash process when FF exits.

> >     GetIPCChannel()->SetChannelFlags(MessageChannel::REQUIRE_DEFERRED_MESSAGE_PROTECTION);
> 
> File a followup: this may not be required or useful for e10s plugins.

Filed bug 1090573.

> > void
> > _reloadplugins(NPBool aReloadPages)
> 
> This still doesn't feel right. I'm about 99% sure that this message should
> go only to the chrome process, and if reloading causes the plugin list to
> change that should be broadcast or trigger. It should essentially be the
> same thing that happens if any content process calls
> navigator.plugins.refresh()

Filed bug 1090576 and converted the code to send the reload message to the chrome process only.

> > void
> > _setexception(NPObject* aNPObj,
> >               const NPUTF8* aMessage)
> 
> I don't think this can be right. We should either do stack-tracking
> (painful) or just give up on this and make the API a no-op. Let's try no-op
> and see what breaks?

OK, I make this a no-op. Now there's a good deal of dead code for handling these exceptions. I filed bug 1090589 to remove this code once we're more sure that this is the right approach.

> >+        // XXX need to kill |aOtherProcess|, it's boned
> 
> Make sure a followup is filed.

Bug 1090578.

> >+    DebugOnly<bool> ok = parent->Open(aTransport, handle, XRE_GetIOMessageLoop(),
> >+                                      mozilla::ipc::ParentSide);
> >+    MOZ_ASSERT(ok);
> 
> Should this really be DebugOnly? There's no better error handling to do here?

Also bug 1090578.
Attached patch plugin-work v5Splinter Review
See comment above.
Attachment #8511776 - Attachment is obsolete: true
Attachment #8513074 - Flags: review?(benjamin)
Comment on attachment 8513074 [details] [diff] [review]
plugin-work v5

I'm still worried about the logic in nsPluginHost::FindPluginsInContent with the timestamps. I don't understand why we're comparing timestamps and IDs there. I'm going to mark r+ here because I don't think it actually matters, but I suspect it's sufficient just to compare IDs and remove all of the comments about timestamps.
Attachment #8513074 - Flags: review?(benjamin) → review+
FYI, the final landing here seems to have broken e10s plugins on Windows. Nightly currently displays a gray rect in place of plugin content.

Note the last rev of this I had applied (comment 35'ish) was working locally for me.
Depends on: 1091621
Depends on: 1092008
See Also: → 1088840
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.