Closed Bug 615417 Opened 14 years ago Closed 14 years ago

Races in async plugin painting code

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(2 files, 1 obsolete file)

I found a couple of races in our async plugin painting code that were causing us to fail plugin-painting reftests in my Fedora 13 VM. Patches coming.
Actually the problems all boil down to one issue in our test plugin. During painting it calls NPN_UserAgent, which is an RPC. During that call we can receive messages from the host, including SetWindow calls. It is possible that during a call to PluginInstanceChild::ShowPluginFrame, we process a reentrant SetWindow call that sets the clipRect to empty. This means the PluginInstanceChild::PaintRectToSurface call ends up not painting anything --- it's all clipped out --- but we still send the resulting surface via SendShow. This means we paint garbage.

This patch fixes the problem by not calling NPN_UserAgent while painting.
Attachment #493861 - Flags: review?(benjamin)
Plugins do actually call RPC methods while painting (silverlight can call dozens of them). Are we wallpapering over a real problem, or is this just some testing artifact?
I don't know. It's possible for a plugin to safely handle a reentrant SetWindow call while painting. Our test plugin currently doesn't. It's not very hard to do though (another way would be to make a copy of the instanceData's NPWindow when painting begins and don't refer to the original again during this paint cycle).

In general I have no idea whether plugins that do NPN_ calls during painting are set up to handle callbacks from the host during those RPCs.
(In reply to comment #3)
> In general I have no idea whether plugins that do NPN_ calls during painting
> are set up to handle callbacks from the host during those RPCs.

Of course, it's not just NPN_ calls during painting that could cause problems, either. Dispatching any NPP call to the plugin while it's calling an NPN API could cause unexpected reentrancy. I have no idea what our policy is for that in general.
I had lot of problems caused with this call during paint... but I though this call is required and covering some real use-case, and tried to make it works..
Which call? NPN_SetWindow is part of the painting sequence, but I don't think we should deliver it re-entrantly while we're inside itself or the painting NPP_HandleEvent.

The plugin wins RPC races, so in general, even if we received a NPP_ call from the browser we wouldn't deliver it during the paint cycle.

I'd be interested to see the stack where we deliver NPP_SetWindow within the paint, and we should probably fix that case.
Comment on attachment 493861 [details] [diff] [review]
Don't call NPN_UserAgent during painting in test plugin

The NPN_UserAgent call is important to one of the mochitests which tests for race behavior, so I believe we should leave it in and fix the behavior in the host.
Attachment #493861 - Flags: review?(benjamin) → review-
Here's the stack that's setting clipRect to 0x0 while we're painting:

Breakpoint 1, pluginDoSetWindow (instanceData=0x7ffff11ea000, newWindow=0x7ffff11362c8)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest_gtk2.cpp:269
269	  instanceData->window = *newWindow;
#0  pluginDoSetWindow (instanceData=0x7ffff11ea000, newWindow=0x7ffff11362c8)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest_gtk2.cpp:269
#1  0x00007fffe99fd9f1 in NPP_SetWindow (instance=0x7ffff11362b8, window=0x7ffff11362c8)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest.cpp:952
#2  0x00007ffff60fb4c9 in mozilla::plugins::PluginInstanceChild::UpdateWindowAttributes (this=
    0x7ffff1136260, aForceSetWindow=true)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2469
#3  0x00007ffff60fac7e in mozilla::plugins::PluginInstanceChild::RecvAsyncSetWindow (this=0x7ffff1136260, 
    aSurfaceType=@0x7fffffffbc8c, aWindow=...)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2195
#4  0x00007ffff62244f3 in mozilla::plugins::PPluginInstanceChild::OnMessageReceived (this=0x7ffff1136260, 
    __msg=...) at PPluginInstanceChild.cpp:1276
#5  0x00007ffff621a5b5 in mozilla::plugins::PPluginModuleChild::OnMessageReceived (this=0x7ffff111c830, 
    __msg=...) at PPluginModuleChild.cpp:458
#6  0x00007ffff611a195 in mozilla::ipc::AsyncChannel::OnDispatchMessage (this=0x7ffff111c840, msg=...)
    at ../../../mozilla-central/ipc/glue/AsyncChannel.cpp:262
#7  0x00007ffff61218b1 in mozilla::ipc::RPCChannel::Call (this=0x7ffff111c840, msg=0x7ffff11f40b0, reply=
    0x7fffffffc020) at ../../../mozilla-central/ipc/glue/RPCChannel.cpp:244
#8  0x00007ffff6219859 in mozilla::plugins::PPluginModuleChild::CallNPN_UserAgent (this=0x7ffff111c830, 
    userAgent=0x7ffff111cb08) at PPluginModuleChild.cpp:245
#9  0x00007ffff6102cdc in mozilla::plugins::PluginModuleChild::GetUserAgent (this=0x7ffff111c830)
    at ../../../mozilla-central/dom/plugins/PluginModuleChild.cpp:589
#10 0x00007ffff6103e35 in mozilla::plugins::child::_useragent (aNPP=0x7ffff11362b8)
    at ../../../mozilla-central/dom/plugins/PluginModuleChild.cpp:1162
#11 0x00007fffe99feb07 in NPN_UserAgent (instance=0x7ffff11362b8)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest.cpp:1390
#12 0x00007fffe9a034af in pluginDrawWindow (instanceData=0x7ffff11ea000, gdkWindow=
    0x7ffff112a150 [GdkPixmap], invalidRect=...)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest_gtk2.cpp:185
#13 0x00007fffe9a03eff in pluginHandleEvent (instanceData=0x7ffff11ea000, event=0x7fffffffc320)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest_gtk2.cpp:396
#14 0x00007fffe99fe51c in NPP_HandleEvent (instance=0x7ffff11362b8, event=0x7fffffffc320)
    at ../../../../../mozilla-central/modules/plugin/test/testplugin/nptest.cpp:1223
#15 0x00007ffff60fb63b in mozilla::plugins::PluginInstanceChild::PaintRectToPlatformSurface (this=
    0x7ffff1136260, aRect=..., aSurface=0x7ffff1110440)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2541
#16 0x00007ffff60fb877 in mozilla::plugins::PluginInstanceChild::PaintRectToSurface (this=0x7ffff1136260, 
    aRect=..., aSurface=0x7ffff1110440, aColor=...)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2611
#17 0x00007ffff60fc06b in mozilla::plugins::PluginInstanceChild::ShowPluginFrame (this=0x7ffff1136260)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2701
#18 0x00007ffff60fc6fa in mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed (this=0x7ffff1136260)
    at ../../../mozilla-central/dom/plugins/PluginInstanceChild.cpp:2803
#19 0x00007ffff60ff06c in DispatchToMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)()> (obj=0x7ffff1136260, method=
    (void (mozilla::plugins::PluginInstanceChild::*)(mozilla::plugins::PluginInstanceChild *)) 0x7ffff60fc68a <mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed()>, arg=...)
    at ../../../mozilla-central/ipc/chromium/src/base/tuple.h:383
#20 0x00007ffff60fefbc in RunnableMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)(), Tuple0>::Run (this=0x7ffff1110400)
    at ../../../mozilla-central/ipc/chromium/src/base/task.h:307
#21 0x00007ffff63ba418 in MessageLoop::RunTask (this=0x7fffffffdaa0, task=0x7ffff1110400)
    at ../../../mozilla-central/ipc/chromium/src/base/message_loop.cc:343

So per comment #6, I guess there's a race between the NPN_UserAgent message from the plugin and the AsyncSetWindow message from the host. Is it a bug in the plugin IPC code that the AsyncSetWindow message is being delivered while the plugin waits for a response to NPN_UserAgent?
Karl says that since AsyncSetWindow is 'async', we just deliver that message during any rpc from the plugin to the host, it's not subject to race control.

Assuming that's true, what's the best way to ensure that AsyncSetWindow is only delivered when we're not in a nested plugin call?
Yes, Karl is correct.

There are several threads in this discussion
 - browser and plugin RPCs race all the time
 - by default, the plugin wins races, except for messages related to painting
 - so, in the old days of synchronous painting, we would re-enter plugin RPCs with SetWindow all the time.  They seem to be coping.
 - the new ingredient here is that with async painting, we dispatch a paint task to the plugin, entirely in the plugin subprocess, and the browser doesn't know about it.  This is causing the new re-entry roc sees, painting-thing re-entering another painting-thing
 - I don't think we should expect plugins to deal with this because it wouldn't have happened with synchronous painting (v. our test plugin)

I don't have a deep understanding of the new async-painting protocol, so I don't know whether this behavior is buggy.  I suspect it's not.  Can someone deeper into this code confirm?

If this isn't buggy, then the best way to deal with it is probably to defer delivery of AsyncSetWindow() to a new event loop iteration if it's received while the plugin is in ShowPluginFrame().  Could just set a flag before invoking ShowPluginFrame().
(In reply to comment #10)
> Yes, Karl is correct.
> 
> There are several threads in this discussion
>  - browser and plugin RPCs race all the time
>  - by default, the plugin wins races, except for messages related to painting
>  - so, in the old days of synchronous painting, we would re-enter plugin RPCs
> with SetWindow all the time.  They seem to be coping.

Yeah, except we wouldn't do this during painting, since the browser was paused while painting.

>  - the new ingredient here is that with async painting, we dispatch a paint
> task to the plugin, entirely in the plugin subprocess, and the browser doesn't
> know about it.  This is causing the new re-entry roc sees, painting-thing
> re-entering another painting-thing
> I don't have a deep understanding of the new async-painting protocol, so I
> don't know whether this behavior is buggy.  I suspect it's not.  Can someone
> deeper into this code confirm?

Thanks for the explanation.

I don't think there's anything wrong with the async-painting protocol here.

> > If this isn't buggy, then the best way to deal with it is probably to defer
> delivery of AsyncSetWindow() to a new event loop iteration if it's received
> while the plugin is in ShowPluginFrame().  Could just set a flag before
> invoking ShowPluginFrame().

OK.

AsyncSetWindow and WindowPosChanged are both 'async' and can therefore be delivered during a plugin paint. An rpc SetWindow call would win the race, but hopefully we won't get one since we're supposed to be using AsyncSetWIndow instead...
Oh, WindowPosChanged is only delivered for non-layers rendering so that's not a problem either.
Is it possible for us to choose synchronous rendering for some instances of the same plugin but async for others?  This could be an issue if plugins are modifying module-wide state during paints.
Oh nm, that's always possible with windowed plugins (may they rot in hell).  Guess we'll have to see how that plays out in the field.
This seems to work.

I cargo-culted the task code. Maybe some of the lines aren't necessary.
Attachment #493861 - Attachment is obsolete: true
Attachment #494948 - Flags: review?(benjamin)
Is it possible to write a test for this?  If we had a way to reliably force an AsyncSetWindow in the browser, the test-plugin side would be relatively straightforward.
I'm pretty sure that this bug is causing 580160-1.html to fail intermittently on Tinderbox (see bug 611164). Do you want a more reliable test than that?
If we can force dispatch of AsyncSetWindow, then the test I had in mind would be something like: plugin waits for delivery of paint request, notifies browser it's in the paint task, spins a nested event loop, then asks browser to force delivery of AsyncSetWindow and checks that the paint function isn't re-entered until after the browser confirms that the AsyncSetWindow message was received by the plugin subprocess (and therefore wasn't dispatched).  Depending on what mechanism we used to force AsyncSetWindow, we could potentially catch a larger class of async-paint-reentry bugs with such a test.

We obviously don't expect plugins to spin a nested event loop during painting (or care if bad things happen if they do), but in our testplugin it's an easy reentrancy check.
We could just have the plugin sleep(longtime) when it receives NPP_SetWindow, and script can keep changing the plugin size in the meantime.
Comment on attachment 494948 [details] [diff] [review]
Postpone AsyncSetWindow processing

Although NewRunnableMethod may be correct, it worries me that we might be passing these by reference instead of by value. Can we call NewRunnableMethod<PluginInstanceChild, gfxSurfaceType, NPRemoteWindow> explicitly so that we know the compiler isn't storing references to stack values in the runnable? r=me with that change
Attachment #494948 - Flags: review?(benjamin) → review+
Good idea.
Whiteboard: [needs landing]
(In reply to comment #19)
> We could just have the plugin sleep(longtime) when it receives NPP_SetWindow,
> and script can keep changing the plugin size in the meantime.

I'm not sure how this helps.

I've tried to write a reliable test but it didn't fail reliably without the patch. I probably could fix it, but I'm not sure how much effort to put into it given an existing test already fails intermittently.
Comment on attachment 494948 [details] [diff] [review]
Postpone AsyncSetWindow processing

Fixes intermittent Tinderbox test failure.
Attachment #494948 - Flags: approval2.0?
Attachment #494948 - Flags: approval2.0? → approval2.0+
Attached patch v2Splinter Review
Incorporates Benjamin's comment.
Yes, thanks.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing]
Blocks: 803473
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: