Flash settings menu doesn't work on windowless plugins with low integrity sandbox

RESOLVED FIXED in Firefox 41

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bobowen, Assigned: bobowen)

Tracking

41 Branch
mozilla43
All
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 fixed, firefox42 fixed, firefox43 fixed)

Details

Attachments

(4 attachments)

The right-click Flash settings menu doesn't work on windowless plugins, if you have the low integrity sandbox turned on.

This is because it still creates a surrogate window [1], which I think is just for this menu.

[1] http://hg.mozilla.org/mozilla-central/file/2c91d57441fd/dom/plugins/ipc/PluginInstanceChild.cpp#l1947
https://treeherder.mozilla.org/#/jobs?repo=try&revision=341cf57ff324

Wasn't working with e10s, because I thought I was working with |PluginWidgetProxy|s, but they were |PuppetWidget|s.
Depends on: 1185472
Bug 1182411: Change winless popup surrogate to have its parent set in the chrome process. r=jimm

The creation of the surrogate native window in the child NPAPI process was
failing when then sandbox was at low integrity, because the parent is from the
chrome process, so at medium integrity.
Instead of making an IPC call to get the parent, we now create the window and
send it in an IPC message to be parented in the chrome process.
Attachment #8636004 - Flags: review?(jmathies)
Bit of extra context as I couldn't see how to add it to Review Board.

In nsPluginInstanceOwner I refactored GetNetscapeWindow with two new helpers, one as a private method (GetContainingWidgetIfOffset) because it required to two members and one as a static (GetRootWidgetForPluginFrame) as that made sense with just the member passed in.

This allowed me to re-use this logic in SetNetscapeWindowAsParent.

The rest is just IPC plumbing, generally following the existing stuff for NPN_GetValue_NPNVnetscapeWindow.
Blocks: 1185532
https://reviewboard.mozilla.org/r/13607/#review12339

::: dom/ipc/TabParent.h:222
(Diff revision 1)
> +    bool RecvSetNativeChildOfShareableWindow(const uintptr_t& childWindow) override;

no virtual use here? seems odd.

::: widget/nsIWidget.h:115
(Diff revision 1)
> +#define NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW 105

This is in use by NS_NATIVE_PLUGIN_ID below.

::: dom/plugins/base/nsPluginInstanceOwner.cpp:742
(Diff revision 1)
> +  nsIWidget* offsetWidget = GetContainingWidgetIfOffset();

so is this the non-e10s case? I don't see you adding NS_NATIVE_CHILD_WINDOW to PuppetWidget. I do see you adding NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW though.

::: dom/plugins/ipc/PPluginInstance.ipdl:218
(Diff revision 1)
> +  // If this is not sync it causes a hang.

I'm kinda surprised by this. Looking at the code, we invoke this in a SetWindow call so something like:

content process -> send sync SetWindow -> plugin process -> AnswerNPP_SetWindow ->
CreateWinlessPopupSurrogate -> send sync SetNetscapeWindowAsParent -> content process ->
nsPluginInstanceOwner::SetNetscapeWindowAsParent -> PuppetWidget's SetNativeData ->
send sync SetNativeChildOfShareableWindow -> browser process ->
nsWindow::SetNativeData() -> modifying the parent of a child window

^^ the whole system here is caught up in sync sends. Very dangerous, can we async'ify this in some way?

A few questions here, lets discuss a bit. I'm very nervous about the way this messaging wraps around through all of the processes. I'm also surprised that making some of this async locks things up! I'd be happy to apply this and do some testing on my own as well if you like.
Attachment #8636004 - Flags: review?(jmathies)
Thanks for the review Jim.
Some answers and more questions below, but it looks like I've also broken ActionScript 2 keyboard interaction in protected mode with my patch for bug 1165903 (new bug 1185529).
So, I might have to back that out if I can't fix quickly ... and this will have to wait.

(In reply to Jim Mathies [:jimm] from comment #4)

> > +    bool RecvSetNativeChildOfShareableWindow(const uintptr_t& childWindow) override;
> 
> no virtual use here? seems odd.

This is the new style guideline, but you're right it looks out of place amongst all the others.
 
> ::: widget/nsIWidget.h:115
> (Diff revision 1)
> > +#define NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW 105
> 
> This is in use by NS_NATIVE_PLUGIN_ID below.

Oh yes, hadn't spotted that, thanks.
It feels like this should move to number 13 or something, as it's not platform specific?

> ::: dom/plugins/base/nsPluginInstanceOwner.cpp:742
> (Diff revision 1)
> > +  nsIWidget* offsetWidget = GetContainingWidgetIfOffset();
> 
> so is this the non-e10s case? I don't see you adding NS_NATIVE_CHILD_WINDOW
> to PuppetWidget. I do see you adding NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW
> though.

My understanding was that PuppetWidget is only used in e10s in the content process.
And because of the |!XRE_IsContentProcess()| in GetContainingWidgetIfOffset it will always return null there.

> ::: dom/plugins/ipc/PPluginInstance.ipdl:218
> (Diff revision 1)
> > +  // If this is not sync it causes a hang.
> 
> I'm kinda surprised by this. Looking at the code, we invoke this in a
> SetWindow call so something like:
> 

So was I ... when I had this as async, if I remember correctly it was locking up in non-e10s mode because flash started calling for the Netscape window itself.
Although, I couldn't really work out why.
If I fix bug 1185529, I'll re-look at this.
It's looking very unlikely that I'll get this into 40 now anyway.

> content process -> send sync SetWindow -> plugin process ->
> AnswerNPP_SetWindow ->
> CreateWinlessPopupSurrogate -> send sync SetNetscapeWindowAsParent ->
> content process ->
> nsPluginInstanceOwner::SetNetscapeWindowAsParent -> PuppetWidget's
> SetNativeData ->
> send sync SetNativeChildOfShareableWindow -> browser process ->
> nsWindow::SetNativeData() -> modifying the parent of a child window
> 
> ^^ the whole system here is caught up in sync sends. Very dangerous, can we
> async'ify this in some way?

The call I replaced NPN_GetValue_NPNVnetscapeWindow, effectively has the same call pattern I think, so I didn't think I'd be making anything worse.
Or is an intr message not a synchronous one?
> > ::: widget/nsIWidget.h:115
> > (Diff revision 1)
> > > +#define NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW 105
> > 
> > This is in use by NS_NATIVE_PLUGIN_ID below.
> 
> Oh yes, hadn't spotted that, thanks.
> It feels like this should move to number 13 or something, as it's not
> platform specific?

Yeah that sounds right. Just find a new value that isn't in use.

> 
> > ::: dom/plugins/base/nsPluginInstanceOwner.cpp:742
> > (Diff revision 1)
> > > +  nsIWidget* offsetWidget = GetContainingWidgetIfOffset();
> > 
> > so is this the non-e10s case? I don't see you adding NS_NATIVE_CHILD_WINDOW
> > to PuppetWidget. I do see you adding NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW
> > though.
> 
> My understanding was that PuppetWidget is only used in e10s in the content
> process.
> And because of the |!XRE_IsContentProcess()| in GetContainingWidgetIfOffset
> it will always return null there.

Ah ok, sorry didn't see that process check under the fold.

> > ::: dom/plugins/ipc/PPluginInstance.ipdl:218
> > (Diff revision 1)
> > > +  // If this is not sync it causes a hang.
> > 
> > I'm kinda surprised by this. Looking at the code, we invoke this in a
> > SetWindow call so something like:
> > 
> 
> So was I ... when I had this as async, if I remember correctly it was
> locking up in non-e10s mode because flash started calling for the Netscape
> window itself.
> Although, I couldn't really work out why.

Flash asks for the parent HWND a lot which is why we cache it in the instance. (Actually I think there's a bug here, I don't think that cached value gets reset when you drag a tab out of a window.) Flash uses this for all sorts of things including interpreting mouse input.

> If I fix bug 1185529, I'll re-look at this.
> It's looking very unlikely that I'll get this into 40 now anyway.

no problem.

> > content process -> send sync SetWindow -> plugin process ->
> > AnswerNPP_SetWindow ->
> > CreateWinlessPopupSurrogate -> send sync SetNetscapeWindowAsParent ->
> > content process ->
> > nsPluginInstanceOwner::SetNetscapeWindowAsParent -> PuppetWidget's
> > SetNativeData ->
> > send sync SetNativeChildOfShareableWindow -> browser process ->
> > nsWindow::SetNativeData() -> modifying the parent of a child window
> > 
> > ^^ the whole system here is caught up in sync sends. Very dangerous, can we
> > async'ify this in some way?
> 
> The call I replaced NPN_GetValue_NPNVnetscapeWindow, effectively has the
> same call pattern I think, so I didn't think I'd be making anything worse.
> Or is an intr message not a synchronous one?

intr is sync, it's also interruptible for incalls. We might avoid this in the current code since we grab the netscape window at a different time, but I could be wrong. Either way we want to try to avoid this kind of call trace if we can. I'd like to see if we can create this surrogate context menu window and set its parent async before someone actually right-clicks the plugin.
Attachment #8636004 - Attachment description: MozReview Request: Bug 1182411: Change winless popup surrogate to have its parent set in the chrome process. r=jimm → MozReview Request: Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r?jimm
Attachment #8636004 - Flags: review?(jmathies)
Comment on attachment 8636004 [details]
MozReview Request: Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r?jimm

Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r?jimm
Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

The creation of the surrogate native window in the child NPAPI process was
failing when then sandbox was at low integrity, because the parent is from the
chrome process, so at medium integrity.
Instead of making an IPC call to get the parent, we now create the window upfront
and send it in an IPC message to be parented in the chrome process.
This is done with asynchronous messaging.
Attachment #8643876 - Flags: review?(jmathies)
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

Arrrgh!

With some more testing today I'm getting the hang again on some windowless content when using async.

Here's what appears to be happening:

After the async content->plugin CreateChildPopupSurrogate ... async plugin->content SetNetscapeWindowAsParent ... async content->parent SetNativeChildOfShareableWindow

I think the SetParent in the main process causes a sync Windows message, back to the plugin process.
Which in turn at some point triggers InvalidateRectDelayed (not sure exactly what triggers this).
Which further down ends up calling setwindow on the actual plugin.
Which triggers the plugin to request the NPNVnetscapeWindow.
Which causes a sync plugin->content NPN_GetValue_NPNVnetscapeWindow.
Which causes a sync content->parent GetWidgetNativeData.

... and we have deadlock.

When I make this all sync I'm guessing the Windows messages are getting deferred, so things run smoothly.
But I need to roll back to test that.
It may be that the fix I had previously (making the sync call from the plugin process) is the only way to get the Windows messages deferred, but maybe I can come up with something else.

In the old world our code calls NPN_GetValue_NPNVnetscapeWindow itself from the plugin process and I think you mentioned that this gets cached.

Did I mention that I hate NPAPI.
Attachment #8643876 - Flags: review?(jmathies)
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

OK, I have a fix that means we can keep most of this async and avoid the hang.

I grab the netscape window (which will cause a sync call form content->parent, with e10s) and pass it down with the async call to create the surrogate, so this will only happen once per plugin.

This means it is already cached and we don't get the subsequent deadlock.
In the old world we were already doing a sync call from plugin->content content->parent to get the netscape window in CreateWinlessPopupSurrogate, so I think this is still an improvement.


One thing that struck me while I was writing this ...
When we are caching things in the plugin child, perhaps we should do a similar thing and send them down (from the parent or content) as soon as they are created.
This way we would avoid any potential sync calls for this stuff from the plugin process, even the first one.
Because we sometimes have little control over when these are requested even this might save some hangs.

If the cached things change in the parent or content, we could send down the new version (which as you mentioned before is an existing bug I think).
We might even get away with making some of these calls async.
Attachment #8643876 - Flags: review?(jmathies)
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

The creation of the surrogate native window in the child NPAPI process was
failing when then sandbox was at low integrity, because the parent is from the
chrome process, so at medium integrity.
Instead of making an IPC call to get the parent, we now create the window upfront
and send it in an IPC message to be parented in the chrome process.
This is done with asynchronous messaging.
https://reviewboard.mozilla.org/r/13607/#review12339

> no virtual use here? seems odd.

Put the virtual back in and also in some other places where I'd strayed from the surrounding style.

> This is in use by NS_NATIVE_PLUGIN_ID below.

Moved NS_NATIVE_PLUGIN_ID to 13 as it's not platform specific, so I think it belongs there.

> I'm kinda surprised by this. Looking at the code, we invoke this in a SetWindow call so something like:
> 
> content process -> send sync SetWindow -> plugin process -> AnswerNPP_SetWindow ->
> CreateWinlessPopupSurrogate -> send sync SetNetscapeWindowAsParent -> content process ->
> nsPluginInstanceOwner::SetNetscapeWindowAsParent -> PuppetWidget's SetNativeData ->
> send sync SetNativeChildOfShareableWindow -> browser process ->
> nsWindow::SetNativeData() -> modifying the parent of a child window
> 
> ^^ the whole system here is caught up in sync sends. Very dangerous, can we async'ify this in some way?

I moved this to a new message to create the surrogate before we even call SetWindow.
This matches what bug 1185529 does for the real child plugin window and this is one-off set-up.

Additionally my idea was that if I had to keep it sync then:
sync content -> child (returning child window)
sync content -> parent (to reparent child)

before the call to SetWindow would be better.

However, it seems that making it async works.
I also thought that even with it async, initiating this before we call SetWindow gives it a fighting chance to get done before it's needed.

> so is this the non-e10s case? I don't see you adding NS_NATIVE_CHILD_WINDOW to PuppetWidget. I do see you adding NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW though.

Discussed in bugzilla comment 6.

Hopefully these patches address the issues.
I've also done some tidying up from bug 1185529 and before, including getting rid of that nasty cast I had to get the nsPluginNativeWindow, now that I know the code a bit better.
Also, made use of the fact that we've got the quirks available in the parent now.
Just realised that the way I published some of the stuff from MozReview makes it a bit confusing.

Comment 15 is my reply to your original review.
Comment 11 is the problem I found with the hang after making things async.
Comment 13 is my explanation over the fix for the hang and an idea about how we might cache these sorts of things.
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

https://reviewboard.mozilla.org/r/15147/#review13903

::: dom/ipc/TabParent.cpp:2544
(Diff revision 2)
> +    widget->SetNativeData(NS_NATIVE_CHILD_OF_SHAREABLE_WINDOW, aChildWindow);

lets add a comment here about this call generating sync native messages to the plugin process.

::: dom/plugins/ipc/PluginInstanceChild.cpp:1158
(Diff revision 2)
> -    if (!CreatePluginWindow()) {
> +    *aChildPluginWindow = mPluginWindowHWND;

nit - assert on a bad pointer here.

::: dom/plugins/ipc/PluginInstanceParent.cpp:2086
(Diff revision 2)
> +    owner->SetWidgetWindowAsParent(mChildPluginHWND);

comment about risk of sync sent native messages here too.
Attachment #8643876 - Flags: review?(jmathies) → review+
Attachment #8636004 - Flags: review?(jmathies) → review+
url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/79726e2a1f1d891260ac1ffdfec8317ea30d4fe7
changeset:  79726e2a1f1d891260ac1ffdfec8317ea30d4fe7
user:       Bob Owen <bobowencode@gmail.com>
date:       Wed Aug 12 16:00:25 2015 +0100
description:
Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r=jimm

url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/b96e95b7b4b946b5522374444ec94d6b163e0db3
changeset:  b96e95b7b4b946b5522374444ec94d6b163e0db3
user:       Bob Owen <bobowencode@gmail.com>
date:       Wed Aug 12 16:00:26 2015 +0100
description:
Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r=jimm

The creation of the surrogate native window in the child NPAPI process was
failing when then sandbox was at low integrity, because the parent is from the
chrome process, so at medium integrity.
Instead of making an IPC call to get the parent, we now create the window upfront
and send it in an IPC message to be parented in the chrome process.
This is done with asynchronous messaging.
https://hg.mozilla.org/mozilla-central/rev/79726e2a1f1d
https://hg.mozilla.org/mozilla-central/rev/b96e95b7b4b9
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Comment on attachment 8636004 [details]
MozReview Request: Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r?jimm

Approval Request Comment
[Feature/regressing bug #]:
Bug 1123759 was the original for allowing a low integrity NPAPI sandbox policy to be set.
We need this fix on Fx41 to be able to set that sandbox policy by default (bug 1185532).

[User impact if declined]:
Users using the 64-bit version of Firefox will have no sandbox protecting the Flash Player plugin. This is because Flash Player Protected mode is not available for 64-bit.

[Describe test coverage new/current, TreeHerder]:
There is some test coverage in-tree for NPAPI, but because of the wide variety of plugins, regressions can slip through.

The patches apply cleanly to Aurora but not Beta, I'll upload new patches for Beta.
On Beta I have tested:
* a few flash games and video on BBC, Amazon and TwitchTV
* Amazon and Netflix using Silverlight.
* Java as used on IBM documentation site for iSeries
* a Unity game

[Risks and why]: 
Medium: These are not trivial changes, but the changes to fix the menu are limited to Flash and SilverLight. The other changes are refactoring to allow this that shouldn't change the way plugins are actually initialised.

[String/UUID change made/needed]:
None
Attachment #8636004 - Flags: approval-mozilla-beta?
Attachment #8636004 - Flags: approval-mozilla-aurora?
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

See comment 20.
Attachment #8643876 - Flags: approval-mozilla-beta?
Attachment #8643876 - Flags: approval-mozilla-aurora?
Some context changes and a change to the way members are accessed on plugin tags that is only on Fx42.
Again some context changes and also a refactoring in TabParent that is only available on Fx42, so had to revert to original code.
Comment on attachment 8636004 [details]
MozReview Request: Bug 1182411 Part 1: Make plugin quirks available to the Parent as well as the Child. r?jimm

Patch has been in Nightly for a few days, seems safe to uplift to Beta and Aurora.
Attachment #8636004 - Flags: approval-mozilla-beta?
Attachment #8636004 - Flags: approval-mozilla-beta+
Attachment #8636004 - Flags: approval-mozilla-aurora?
Attachment #8636004 - Flags: approval-mozilla-aurora+
Comment on attachment 8643876 [details]
MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to have its parent set in the chrome process. r?jimm

Aurora+, Beta+
Attachment #8643876 - Flags: approval-mozilla-beta?
Attachment #8643876 - Flags: approval-mozilla-beta+
Attachment #8643876 - Flags: approval-mozilla-aurora?
Attachment #8643876 - Flags: approval-mozilla-aurora+
(In reply to Ritu Kothari (:ritu) from comment #25)
> Comment on attachment 8643876 [details]
> MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to
> have its parent set in the chrome process. r?jimm
> 
> Aurora+, Beta+

Why are we uplifting sandbox patches to beta?
(In reply to Jim Mathies [:jimm] from comment #26)
> (In reply to Ritu Kothari (:ritu) from comment #25)
> > Comment on attachment 8643876 [details]
> > MozReview Request: Bug 1182411 Part 2: Change winless popup surrogate to
> > have its parent set in the chrome process. r?jimm
> > 
> > Aurora+, Beta+
> 
> Why are we uplifting sandbox patches to beta?

Because we want to turn on the Windows NPAPI sandbox for Flash on 64-bit by default if we can.

64-bit doesn't have Flash's own protected mode sandbox.
We are going to test Plug-in compatibility today on sevral OSes including Windows 7 x64 (https://goo.gl/fTTk7q). Are there any other particular scenarios to verify for this fix?
Flags: needinfo?(bobowen.code)
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #30)
> We are going to test Plug-in compatibility today on sevral OSes including
> Windows 7 x64 (https://goo.gl/fTTk7q). Are there any other particular
> scenarios to verify for this fix?

This fix is only for when our low integrity sandbox is turned on, which it isn't by default.

However, it has changed the way the set-up works for the Flash and SilverLight right-click menus on windowless plugins, so any testing around that would be good.

QuickTime set-up also changed slightly, because in some circumstances we no longer make a call to the NPAPI process that did nothing.
Flags: needinfo?(bobowen.code)
Running the test from https://bugzilla.mozilla.org/show_bug.cgi?id=1197803#c7 is showing that the popup menu is sometimes not showing. I attached a debugger and found that mWinlessPopupSurrogateHWND is not always being set.
Flags: needinfo?(bobowen.code)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #32)
> Running the test from
> https://bugzilla.mozilla.org/show_bug.cgi?id=1197803#c7 is showing that the
> popup menu is sometimes not showing. I attached a debugger and found that
> mWinlessPopupSurrogateHWND is not always being set.

This is because for async init we bail here:
https://dxr.mozilla.org/mozilla-central/rev/7f987c38bd3e5ac9a834981e85378bdb02338e9d/dom/plugins/ipc/PluginModuleParent.cpp#2547

So InitQuirksModes below that doesn't get hit, which we use to decide if we need to call the NPAPI child to create the winless surrogate.
I'll file another bug and work something out.
Flags: needinfo?(bobowen.code)
Depends on: 1202024
Depends on: 1207766
Depends on: 1209235
You need to log in before you can comment on or make changes to this bug.