Closed Bug 1111892 Opened 9 years ago Closed 9 years ago

Content process hangs when right-clicking windowed Flash object on linux

Categories: Core Graveyard :: Plug-ins, defect
Platform: x86_64, Linux
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla45
Tracking: e10s m8+, firefox45 fixed
People: Reporter: jld, Assigned: gw280

Attachments

(2 files, 1 obsolete file)

STR: Display some Flash content (e.g., from https://www.adobe.com/products/flashplayer.html) in an e10s tab and try to right-click on it.

Expected: Flash player context menu.

Actual: content process hangs and generally stops doing things.

Non-STR: This works as expected in a non-e10s tab.

This might be a duplicate of bug 1111541 but I'm not sure, so I'm filing it separately.
Jed, you filed this for Linux, is that correct? If so what flavor?

WFM on Windows.

windowed flash test case: http://helpx.adobe.com/flash-player.html
Flags: needinfo?
Flags: needinfo? → needinfo?(jld)
Yes, this is on Linux.  Debian, but I doubt that matters much.

The content process's main thread is blocked here:

#7  0x00007f5363e20ca2 in mozilla::plugins::PPluginInstanceParent::CallNPP_HandleEvent

And here's what's happening in the plugin process's main thread:

#0  0x00007f1c305b718d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f1c2df2aee4 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f1c2df2b272 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f1c26b400c1 in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#4  0x00007f1c26b40abf in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#5  0x00007f1c26b40c48 in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#6  0x00007f1c2697c465 in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#7  0x00007f1c26ad8a96 in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#8  0x00007f1c26ae126b in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#9  0x00007f1c26ade879 in ?? () from /usr/lib/mozilla/plugins/libflashplayer.so
#10 0x00007f1c331a7192 in mozilla::plugins::PluginInstanceChild::AnswerNPP_HandleEvent (
    this=0x7f1c28616200, event=..., handled=0x7fff9bafc7b0)
    at /home/jld/src/gecko-dev/dom/plugins/ipc/PluginInstanceChild.cpp:764
#11 0x00007f1c32316c19 in mozilla::plugins::PPluginInstanceChild::OnCallReceived (
    this=0x7f1c28616200, __msg=..., __reply=@0x7fff9bafc910: 0x0)
    at /home/jld/obj/gecko-dev/obj-x86_64-unknown-linux-gnu/ipc/ipdl/PPluginInstanceChild.cpp:2096
#12 0x00007f1c32138a77 in mozilla::ipc::MessageChannel::DispatchInterruptMessage (
    this=this@entry=0x7f1c28645060, aMsg=..., stackDepth=stackDepth@entry=0)
    at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:1198

And for whatever this is worth:

(gdb) p/x $rsi
$2 = 0x3
(gdb) p/x ((struct pollfd*)($rdi))[0]
$3 = {fd = 0x4, events = 0x1, revents = 0x0}
(gdb) p/x ((struct pollfd*)($rdi))[1]
$4 = {fd = 0xc, events = 0x1, revents = 0x0}
(gdb) p/x ((struct pollfd*)($rdi))[2]
$5 = {fd = 0x5, events = 0x1, revents = 0x0}

lrwx------ 1 jld jld 64 Dec 16 14:29 /proc/6380/fd/4 -> anon_inode:[eventfd]
lrwx------ 1 jld jld 64 Dec 16 14:29 /proc/6380/fd/12 -> socket:[136828]
lr-x------ 1 jld jld 64 Dec 16 14:29 /proc/6380/fd/5 -> pipe:[141711]

The other end of the pipe (same process):

l-wx------ 1 jld jld 64 Dec 16 14:29 /proc/6380/fd/6 -> pipe:[141711]

I assume this is the other end of the socket (4531 is the parent), although I thought the inode numbers of the two ends of a socketpair were adjacent integers:

lrwx------ 1 jld jld 64 Dec 16 14:38 /proc/4531/fd/79 -> socket:[136828]
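
A note for anyone decoding the gdb output above: on Linux, events = 0x1 is POLLIN, and revents = 0x0 means nothing is pending on that descriptor. So the plugin's GLib loop appears to be waiting for input on the eventfd, the socket and the pipe, with none of them ready. Below is a minimal C++ sketch of that decoding. DescribePollEvents, IsIdle and CountIdle are hypothetical helper names used only for illustration; they are not part of Gecko or GLib.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <string>

// Render a pollfd event mask as a '|'-separated list of flag names.
std::string DescribePollEvents(short aEvents)
{
    struct Flag { short bit; const char* name; };
    static const Flag kFlags[] = {
        {POLLIN, "POLLIN"},   {POLLPRI, "POLLPRI"}, {POLLOUT, "POLLOUT"},
        {POLLERR, "POLLERR"}, {POLLHUP, "POLLHUP"}, {POLLNVAL, "POLLNVAL"},
    };
    std::string out;
    for (const Flag& f : kFlags) {
        if (aEvents & f.bit) {
            if (!out.empty()) {
                out += "|";
            }
            out += f.name;
        }
    }
    return out.empty() ? "0" : out;
}

// True when the kernel has reported no event for this entry.
bool IsIdle(const pollfd& aFd)
{
    return aFd.revents == 0;
}

// Count how many entries of a pollfd array have nothing pending.
std::size_t CountIdle(const pollfd* aFds, std::size_t aCount)
{
    assert(aFds != nullptr || aCount == 0);
    std::size_t idle = 0;
    for (std::size_t i = 0; i < aCount; ++i) {
        if (IsIdle(aFds[i])) {
            ++idle;
        }
    }
    return idle;
}
```

Fed the three entries printed by gdb (fds 4, 12 and 5, each with events = 0x1 and revents = 0x0), CountIdle would report all three as idle.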
Flags: needinfo?(jld)
Thanks, if you can, would you mind posting the chrome and content process stacks? That would help in diagnosing what's locking up. Looks like the plugin is sitting in a modal loop. I'm guessing chrome is in a sync call, and content is..? no idea, but I'm curious.
Flags: needinfo?(jld)
The content process is in CallNPP_HandleEvent (see top of comment #2).  Here's more of the stack:

#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f40a966c580 in PR_WaitCondVar (cvar=0x7f408f307940, timeout=4294967295)
    at /home/jld/src/gecko-dev/nsprpub/pr/src/pthreads/ptsynch.c:385
#2  0x00007f40a4aac1c2 in mozilla::CondVar::Wait (this=0x7f4092560df0, aInterval=4294967295)
    at /home/jld/src/gecko-dev/xpcom/glue/BlockingResourceBase.cpp:501
#3  0x00007f40a4d331f7 in Wait (aInterval=4294967295, this=<optimized out>)
    at ../../dist/include/mozilla/Monitor.h:40
#4  mozilla::ipc::MessageChannel::WaitForSyncNotify (this=0x7f408f42a860)
    at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:1332
#5  0x00007f40a4d3be10 in WaitForInterruptNotify (this=0x7f408f42a860)
    at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:1342
#6  mozilla::ipc::MessageChannel::Call (this=0x7f408f42a860, aMsg=<optimized out>, 
    aReply=aReply@entry=0x7ffff3d54570) at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:845
#7  0x00007f40a4f27700 in mozilla::plugins::PPluginInstanceParent::CallNPP_HandleEvent (
    this=this@entry=0x7f408f308690, event=..., handled=handled@entry=0x7ffff3d5460e)
    at /home/jld/obj/gecko-dev/obj-x86_64-unknown-linux-gnu/ipc/ipdl/PPluginInstanceParent.cpp:451
#8  0x00007f40a5db442b in mozilla::plugins::PluginInstanceParent::NPP_HandleEvent (
    this=0x7f408f308690, event=0x7ffff3d547c0)
    at /home/jld/src/gecko-dev/dom/plugins/ipc/PluginInstanceParent.cpp:1297
#9  0x00007f40a5d95f98 in nsNPAPIPluginInstance::HandleEvent (this=0x7f408fa16160, 
    event=0x7ffff3d547c0, result=0x7ffff3d547b0, aSafeToReenterGecko=<optimized out>)
    at /home/jld/src/gecko-dev/dom/plugins/base/nsNPAPIPluginInstance.cpp:683
#10 0x00007f40a5d980ca in nsPluginInstanceOwner::ProcessEvent (this=0x7f408fa160c0, anEvent=...)
    at /home/jld/src/gecko-dev/dom/plugins/base/nsPluginInstanceOwner.cpp:2265
#11 0x00007f40a5d9e0df in nsPluginInstanceOwner::ProcessMouseDown (this=this@entry=0x7f408fa160c0, 
    aMouseEvent=aMouseEvent@entry=0x7f408f34bca0)
    at /home/jld/src/gecko-dev/dom/plugins/base/nsPluginInstanceOwner.cpp:1565
#12 0x00007f40a5d9e1cc in nsPluginInstanceOwner::HandleEvent (this=0x7f408fa160c0, 
    aEvent=0x7f408f34bca0)
    at /home/jld/src/gecko-dev/dom/plugins/base/nsPluginInstanceOwner.cpp:1636
#13 0x00007f40a5af076b in mozilla::EventListenerManager::HandleEventSubType (
    this=this@entry=0x7f408f488b30, aListener=<optimized out>, aListener@entry=0x7f408f428ca8, 
    aDOMEvent=0x7f408f34bca0, aCurrentTarget=aCurrentTarget@entry=0x7f409a4fb270)
    at /home/jld/src/gecko-dev/dom/events/EventListenerManager.cpp:976
#14 0x00007f40a5af0eac in mozilla::EventListenerManager::HandleEventInternal (this=0x7f408f488b30, 
    aPresContext=aPresContext@entry=0x7f40958e0800, aEvent=aEvent@entry=0x7ffff3d552c8, 
    aDOMEvent=aDOMEvent@entry=0x7ffff3d54d10, aCurrentTarget=0x7f409a4fb270, 
    aEventStatus=aEventStatus@entry=0x7ffff3d54d18)
    at /home/jld/src/gecko-dev/dom/events/EventListenerManager.cpp:1122
#15 0x00007f40a5af69bf in HandleEvent (aEventStatus=0x7ffff3d54d18, aCurrentTarget=0x7f409a4fb270, 
    aDOMEvent=0x7ffff3d54d10, aEvent=0x7ffff3d552c8, aPresContext=0x7f40958e0800, 
    this=<optimized out>) at ../../dist/include/mozilla/EventListenerManager.h:330
#16 mozilla::EventTargetChainItem::HandleEvent (this=0x7f409577f008, aVisitor=..., aCd=...)
    at /home/jld/src/gecko-dev/dom/events/EventDispatcher.cpp:209
#17 0x00007f40a5af11e0 in mozilla::EventTargetChainItem::HandleEventTargetChain (aChain=..., 
    aVisitor=..., aCallback=aCallback@entry=0x7ffff3d54e50, aCd=...)
    at /home/jld/src/gecko-dev/dom/events/EventDispatcher.cpp:299
#18 0x00007f40a5af1a9f in mozilla::EventDispatcher::Dispatch (aTarget=<optimized out>, 
    aPresContext=<optimized out>, aEvent=aEvent@entry=0x7ffff3d552c8, 
    aDOMEvent=aDOMEvent@entry=0x0, aEventStatus=aEventStatus@entry=0x7ffff3d55184, 
    aCallback=aCallback@entry=0x7ffff3d54e50, aTargets=0x0)
    at /home/jld/src/gecko-dev/dom/events/EventDispatcher.cpp:634
#19 0x00007f40a62044d7 in PresShell::HandleEventInternal (this=this@entry=0x7f4095669c00, 
    aEvent=aEvent@entry=0x7ffff3d552c8, aStatus=aStatus@entry=0x7ffff3d55184)
    at /home/jld/src/gecko-dev/layout/base/nsPresShell.cpp:8252
#20 0x00007f40a62046a7 in PresShell::HandlePositionedEvent (this=this@entry=0x7f4095669c00, 
    aTargetFrame=aTargetFrame@entry=0x7f409255f2a8, aEvent=aEvent@entry=0x7ffff3d552c8, 
    aEventStatus=aEventStatus@entry=0x7ffff3d55184)
    at /home/jld/src/gecko-dev/layout/base/nsPresShell.cpp:7958
#21 0x00007f40a62062e7 in PresShell::HandleEvent (this=0x7f4095669c00, aFrame=<optimized out>, 
    aEvent=0x7ffff3d552c8, aDontRetargetEvents=<optimized out>, aEventStatus=0x7ffff3d55184)
    at /home/jld/src/gecko-dev/layout/base/nsPresShell.cpp:7758
#22 0x00007f40a5f9f538 in nsViewManager::DispatchEvent (this=<optimized out>, 
    aEvent=aEvent@entry=0x7ffff3d552c8, aView=aView@entry=0x7f40957b2890, 
    aStatus=aStatus@entry=0x7ffff3d55184) at /home/jld/src/gecko-dev/view/nsViewManager.cpp:774
#23 0x00007f40a5f9ce9e in nsView::HandleEvent (this=<optimized out>, aEvent=0x7ffff3d552c8, 
    aUseAttachedEvents=<optimized out>) at /home/jld/src/gecko-dev/view/nsView.cpp:1097
#24 0x00007f40a5face4d in mozilla::widget::PuppetWidget::DispatchEvent (this=0x7f4097a6ea10, 
    event=0x7ffff3d552c8, aStatus=@0x7ffff3d5527c: nsEventStatus_eIgnore)
    at /home/jld/src/gecko-dev/widget/PuppetWidget.cpp:332
#25 0x00007f40a5e367fd in mozilla::dom::TabChildBase::DispatchWidgetEvent (this=<optimized out>, 
    event=...) at /home/jld/src/gecko-dev/dom/ipc/TabChild.cpp:642
#26 0x00007f40a5e39d36 in mozilla::dom::TabChild::RecvRealMouseEvent (this=0x7f4097831800, 
    event=...) at /home/jld/src/gecko-dev/dom/ipc/TabChild.cpp:2291
#27 0x00007f40a4dbfecb in mozilla::dom::PBrowserChild::OnMessageReceived (this=<optimized out>, 
    __msg=...)
    at /home/jld/obj/gecko-dev/obj-x86_64-unknown-linux-gnu/ipc/ipdl/PBrowserChild.cpp:2523
#28 0x00007f40a4d32ff7 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0x7f409b21c890, 
    aMsg=...) at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:1133
#29 0x00007f40a4d39669 in mozilla::ipc::MessageChannel::DispatchMessage (
    this=this@entry=0x7f409b21c890, aMsg=...)
    at /home/jld/src/gecko-dev/ipc/glue/MessageChannel.cpp:1073


The chrome process isn't waiting on anything that's obvious from its main thread's stack:

#0  0x00007f1ee65ef18d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f1ee1a1781f in PollWrapper (ufds=0x7f1ec9b4f140, nfsd=7, timeout_=-1)
    at /home/jld/src/gecko-dev/widget/gtk/nsAppShell.cpp:44
#2  0x00007f1eddf67ee4 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f1eddf67ffc in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007f1ee1a17865 in nsAppShell::ProcessNextNativeEvent (this=<optimized out>, 
    mayWait=<optimized out>) at /home/jld/src/gecko-dev/widget/gtk/nsAppShell.cpp:156
#5  0x00007f1ee19f0a2a in nsBaseAppShell::DoProcessNextNativeEvent (
    this=this@entry=0x7f1ed1aaf240, mayWait=<optimized out>, recursionDepth=recursionDepth@entry=0)
    at /home/jld/src/gecko-dev/widget/nsBaseAppShell.cpp:140
#6  0x00007f1ee19f0b5b in nsBaseAppShell::OnProcessNextEvent (this=0x7f1ed1aaf240, 
    thr=0x7f1ee6367a90, mayWait=true, recursionDepth=0)
    at /home/jld/src/gecko-dev/widget/nsBaseAppShell.cpp:298
#7  0x00007f1ee04d7570 in nsThread::ProcessNextEvent (this=0x7f1ee6367a90, 
    aMayWait=<optimized out>, aResult=0x7fffd0873b2f)
    at /home/jld/src/gecko-dev/xpcom/threads/nsThread.cpp:801
#8  0x00007f1ee04f7aad in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>)
    at /home/jld/src/gecko-dev/xpcom/glue/nsThreadUtils.cpp:265
#9  0x00007f1ee077849c in mozilla::ipc::MessagePump::Run (this=0x7f1ed583ba00, 
    aDelegate=0x7f1ed58246a0) at /home/jld/src/gecko-dev/ipc/glue/MessagePump.cpp:140
#10 0x00007f1ee075bf61 in MessageLoop::RunInternal (this=this@entry=0x7f1ed58246a0)
    at /home/jld/src/gecko-dev/ipc/chromium/src/base/message_loop.cc:233
#11 0x00007f1ee075bf92 in RunHandler (this=0x7f1ed58246a0)
    at /home/jld/src/gecko-dev/ipc/chromium/src/base/message_loop.cc:226
#12 MessageLoop::Run (this=0x7f1ed58246a0)
    at /home/jld/src/gecko-dev/ipc/chromium/src/base/message_loop.cc:200
Flags: needinfo?(jld)
I can reproduce this after allowing mixed content on the page. I guess our nested event loop handling isn't working somehow.
QA Contact: lhenry
I can't reproduce this on Mac OS X 10.10. It's working fine with Nightly and Flash 16.0.0.235 in both e10s and non-e10s tabs.
not tracking at this point until we get better str / additional reports.
(In reply to Jim Mathies [:jimm] from comment #7)
> not tracking at this point until we get better str / additional reports.

Have the original STR failed anyone on Linux yet?
Yeah, we definitely should be tracking this. It's Linux-specific, but we still need to fix it eventually. It's an e10s regression.
Assignee: nobody → jmathies
Bug 1117524 could be merged with this issue. There are crash reports on that bug in case someone is interested.
Summary: Content process hangs when right-clicking on Flash object in e10s tab → Content process hangs when right-clicking windowed Flash object on linux
See Also: 1111541
Assignee: jmathies → gwright
So I'm seeing a couple of different ways in which we end up in a hung state in the content process. The first is when I visit http://www.ted.com/talks/kailash_satyarthi_how_to_make_peace_get_angry and right click on the video twice.

After the first right click we cycle through these four messages indefinitely:

[time:1443067414873502][10402->10369][PPluginModuleChild] Sending Msg_ProcessSomeEvents([TODO])
[time:1443067414873633][10369<-10402][PPluginModuleParent] Received Msg_ProcessSomeEvents([TODO])
[time:1443067414875087][10369->10402][PPluginModuleParent] Sending reply Reply_ProcessSomeEvents([TODO])
[time:1443067414875156][10402<-10369][PPluginModuleChild] Received reply Reply_ProcessSomeEvents([TODO])

After the second right click, we get the following:

[time:1443067414943893][10369->10402][PPluginInstanceParent] Sending Msg_NPP_HandleEvent([TODO])
[time:1443067414944048][10402<-10369][PPluginInstanceChild] Received Msg_NPP_HandleEvent([TODO])
[time:1443067414944075][10402->10369][PPluginInstanceChild] Sending reply Reply_NPP_HandleEvent([TODO])
[time:1443067414944142][10369<-10402][PPluginInstanceParent] Received reply Reply_NPP_HandleEvent([TODO])
[time:1443067414944475][10369->10402][PPluginInstanceParent] Sending Msg_NPP_HandleEvent([TODO])
[time:1443067414944774][10402<-10369][PPluginInstanceChild] Received Msg_NPP_HandleEvent([TODO])


Then we immediately start cycling through these 16 messages forever, and we lose the calls to ProcessSomeEvents (which presumably is why the content process is now hung?):

[time:1443067415017914][10402->10369][PPluginInstanceChild] Sending Msg_NPN_GetValue_NPNVWindowNPObject([TODO])
[time:1443067415018061][10369<-10402][PPluginInstanceParent] Received Msg_NPN_GetValue_NPNVWindowNPObject([TODO])
[time:1443067415018132][10369->10402][PPluginInstanceParent] Sending reply Reply_NPN_GetValue_NPNVWindowNPObject([TODO])
[time:1443067415018206][10402<-10369][PPluginInstanceChild] Received reply Reply_NPN_GetValue_NPNVWindowNPObject([TODO])
[time:1443067415018224][10402->10369][PPluginInstanceChild] Sending Msg_NPN_PushPopupsEnabledState([TODO])
[time:1443067415018298][10369<-10402][PPluginInstanceParent] Received Msg_NPN_PushPopupsEnabledState([TODO])
[time:1443067415018338][10369->10402][PPluginInstanceParent] Sending reply Reply_NPN_PushPopupsEnabledState([TODO])
[time:1443067415018386][10402<-10369][PPluginInstanceChild] Received reply Reply_NPN_PushPopupsEnabledState([TODO])
[time:1443067415018403][10402->10369][PPluginScriptableObjectChild] Sending Msg_NPN_Evaluate([TODO])
[time:1443067415018446][10369<-10402][PPluginScriptableObjectParent] Received Msg_NPN_Evaluate([TODO])
[time:1443067415049802][10369->10402][PPluginScriptableObjectParent] Sending reply Reply_NPN_Evaluate([TODO])
[time:1443067415050122][10402<-10369][PPluginScriptableObjectChild] Received reply Reply_NPN_Evaluate([TODO])
[time:1443067415050202][10402->10369][PPluginInstanceChild] Sending Msg_NPN_PopPopupsEnabledState([TODO])
[time:1443067415050382][10369<-10402][PPluginInstanceParent] Received Msg_NPN_PopPopupsEnabledState([TODO])
[time:1443067415050453][10369->10402][PPluginInstanceParent] Sending reply Reply_NPN_PopPopupsEnabledState([TODO])
[time:1443067415050595][10402<-10369][PPluginInstanceChild] Received reply Reply_NPN_PopPopupsEnabledState([TODO])

The second way has the same STR. After the first click I get the following, the same as in the case above:

[time:1443067847778670][10547->10511][PPluginModuleChild] Sending Msg_ProcessSomeEvents([TODO])
[time:1443067847778752][10511<-10547][PPluginModuleParent] Received Msg_ProcessSomeEvents([TODO])
[time:1443067847778786][10511->10547][PPluginModuleParent] Sending reply Reply_ProcessSomeEvents([TODO])
[time:1443067847778867][10547<-10511][PPluginModuleChild] Received reply Reply_ProcessSomeEvents([TODO])

Then after the second right click:

[time:1443067847831179][10511->10547][PPluginInstanceParent] Sending Msg_NPP_HandleEvent([TODO])
[time:1443067847831317][10547<-10511][PPluginInstanceChild] Received Msg_NPP_HandleEvent([TODO])
[time:1443067847831349][10547->10511][PPluginInstanceChild] Sending reply Reply_NPP_HandleEvent([TODO])
[time:1443067847831460][10511<-10547][PPluginInstanceParent] Received reply Reply_NPP_HandleEvent([TODO])
[time:1443067847831726][10511->10547][PPluginInstanceParent] Sending Msg_NPP_HandleEvent([TODO])
[time:1443067847831823][10547<-10511][PPluginInstanceChild] Received Msg_NPP_HandleEvent([TODO])

After which we simply hang.

In both cases, it looks like we're somehow losing the ProcessSomeEvents call from the plugin process, which I have a feeling may be the root cause of this issue. As for why the context menu doesn't show all the time, I still have no idea. Flash is weird.
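
To check claims like "we lose the calls to ProcessSomeEvents" against a long IPC log, it can help to tally message names mechanically. Here is a rough C++ sketch that parses lines in the format shown above. The IpcLogEntry struct and the ParseIpcLogLine and CountSentMessages functions are hypothetical names for illustration, and the regular expression only assumes the line shape seen in these excerpts.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct IpcLogEntry {
    std::string actor;     // e.g. "PPluginModuleChild"
    std::string direction; // "Sending" or "Received"
    std::string message;   // e.g. "Msg_ProcessSomeEvents"
};

// Parse one log line of the form
//   [time:N][PID->PID][Actor] Sending Msg_Name([TODO])
// Returns false if the line does not match that shape.
bool ParseIpcLogLine(const std::string& aLine, IpcLogEntry* aOut)
{
    assert(aOut != nullptr);
    static const std::regex kPattern(
        R"re(\[time:\d+\]\[\d+(?:->|<-)\d+\]\[(\w+)\] (Sending|Received) (?:reply )?(\w+)\()re");
    std::smatch m;
    if (!std::regex_search(aLine, m, kPattern)) {
        return false;
    }
    aOut->actor = m[1].str();
    aOut->direction = m[2].str();
    aOut->message = m[3].str();
    return true;
}

// Count how often each message or reply name appears on a "Sending" line.
std::map<std::string, std::size_t>
CountSentMessages(const std::vector<std::string>& aLines)
{
    std::map<std::string, std::size_t> counts;
    for (const std::string& line : aLines) {
        IpcLogEntry entry;
        if (ParseIpcLogLine(line, &entry) && entry.direction == "Sending") {
            ++counts[entry.message];
        }
    }
    return counts;
}
```

Running CountSentMessages separately over the log before and after the second right click would show whether Msg_ProcessSomeEvents really drops to zero in the later window.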
The IPC logs from a non-e10s session (still with OOP plugins) show the following when a right-click event happens:

[time:1443068711554306][11113->10962][PPluginInstanceChild] Sending reply Reply_NPP_HandleEvent([TODO])
[time:1443068711554369][10962<-11113][PPluginInstanceParent] Received reply Reply_NPP_HandleEvent([TODO])
[time:1443068711555183][11113->10962][PPluginInstanceChild] Sending Msg_Show([TODO])
[time:1443068711556079][10962->11113][PPluginScriptableObjectParent] Sending Msg_HasProperty([TODO])
[time:1443068711556126][10962<-11113][PPluginInstanceParent] Received Msg_Show([TODO])
[time:1443068711556248][10962->11113][PPluginInstanceParent] Sending reply Reply_Show([TODO])
[time:1443068711556301][11113<-10962][PPluginInstanceChild] Received reply Reply_Show([TODO])
[time:1443068711556340][11113<-10962][PPluginScriptableObjectChild] Received Msg_HasProperty([TODO])
[time:1443068711556366][11113->10962][PPluginScriptableObjectChild] Sending reply Reply_HasProperty([TODO])
[time:1443068711556411][10962<-11113][PPluginScriptableObjectParent] Received reply Reply_HasProperty([TODO])
[time:1443068711556428][10962->11113][PPluginScriptableObjectParent] Sending Msg_HasMethod([TODO])
[time:1443068711556479][11113<-10962][PPluginScriptableObjectChild] Received Msg_HasMethod([TODO])
[time:1443068711556517][11113->10962][PPluginScriptableObjectChild] Sending reply Reply_HasMethod([TODO])
[time:1443068711556606][10962<-11113][PPluginScriptableObjectParent] Received reply Reply_HasMethod([TODO])
Flags: needinfo?(jmathies)
(In reply to George Wright (:gw280) from comment #14)
> So I'm seeing a couple of different ways in which we end up in a hung state
> in the content process. The first is when I visit
> http://www.ted.com/talks/kailash_satyarthi_how_to_make_peace_get_angry and
> right click on the video twice.
> 
> After the first right click we cycle through these four messages
> indefinitely:
> 
> [time:1443067414873502][10402->10369][PPluginModuleChild] Sending
> Msg_ProcessSomeEvents([TODO])
> [time:1443067414873633][10369<-10402][PPluginModuleParent] Received
> Msg_ProcessSomeEvents([TODO])
> [time:1443067414875087][10369->10402][PPluginModuleParent] Sending reply
> Reply_ProcessSomeEvents([TODO])
> [time:1443067414875156][10402<-10369][PPluginModuleChild] Received reply
> Reply_ProcessSomeEvents([TODO])

What process do we execute this code in (with e10s enabled and e10s disabled)?

http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#2778
Flags: needinfo?(jmathies) → needinfo?(gwright)
e10s case:

PPluginModuleChild = plugin process
PPluginModuleParent = content process

non-e10s case:

PPluginModuleChild = plugin process
PPluginModuleParent = parent process
Flags: needinfo?(gwright)
Setting needinfo to make sure :jimm saw that :)
Flags: needinfo?(jmathies)
(In reply to George Wright (:gw280) (:gwright) from comment #17)
> e10s case:
> 
> PPluginModuleChild = plugin process
> PPluginModuleParent = content process
> 
> non-e10s case:
> 
> PPluginModuleChild = plugin process
> PPluginModuleParent = parent process

> PPluginModuleParent = content process

I think this is the problem right here: in the e10s case we want this native event processing to execute in the chrome process. The plugin process has a connection to the chrome process under e10s. See PluginModuleChild::GetChrome(). Maybe we can send this over to PluginModuleChromeParent when e10s is active.
Flags: needinfo?(jmathies) → needinfo?(gwright)
I already tried sending ProcessSomeEvents to both the content and the chrome processes, and although I verified that AnswerProcessSomeEvents was executing in the chrome process, it didn't stop the hang. I think the problem lies elsewhere?
Flags: needinfo?(gwright)
Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
So here's the patch you asked for, Jim. Basically, we send ProcessSomeEvents to both the content and the chrome process, and it still hangs using the STR in comment 14.
Attached patch ungrab.patch (obsolete) — Splinter Review
This isn't really a final patch; I want to solicit ideas for how best we can get this code to execute in the right process.

Basically, we need to ensure that gdk_pointer_ungrab() is called in the same process that receives the button event in the first place. In the non-e10s case, PluginInstanceParent lives in the chrome process, which receives the button event, so that's fine. In the e10s case, PluginInstanceParent lives in the content process, so the ungrab happens in the wrong process and we end up hanging.

I couldn't find a direct route between PluginInstanceParent and PluginModuleChromeParent, so I ended up having to route the IPC call via PluginModuleChild's channel to PluginModuleChromeParent (gChromeInstance). I would like to know if there's a more direct route I can take?

In any case, this patch definitely fixes the issue, so at least we know where the issue lies.

Thanks to :karlt for helping to debug this!
Attachment #8682307 - Flags: review?(jmathies)
Comment on attachment 8682307 [details] [diff] [review]
ungrab.patch

Review of attachment 8682307 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginInstanceParent.cpp
@@ +1234,5 @@
>  #  ifdef MOZ_WIDGET_GTK
>          // GDK attempts to (asynchronously) track whether there is an active
>          // grab so ungrab through GDK.
> +
> +        CallNPP_UngrabPointer(npevent->xbutton.time);

You have to account for non-e10s here; we can't break the old OOPP path. So your change needs to be wrapped in an XRE_IsContentProcess check. Also, instead of going through the plugin process, you should be able to go straight to the ContentParent (browser) by grabbing the ContentChild singleton and using its protocol. Something like:

#include "mozilla/dom/ContentChild.h"

dom::ContentChild* cp = dom::ContentChild::GetSingleton();
cp->Call_UngrabPointer();
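
The routing suggested here (ungrab directly in the non-e10s case, forward to the chrome process when running in a content process) could be modelled roughly as follows. This is a self-contained sketch, not Gecko code: ProcessType and UngrabRouter are hypothetical, and the two callbacks merely stand in for gdk_pointer_ungrab() and for a sync IPC message from the content process to the chrome process.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the result of a check like XRE_IsContentProcess().
enum class ProcessType { Chrome, Content };

struct UngrabRouter {
    ProcessType processType;
    // Stands in for calling gdk_pointer_ungrab() in this process.
    std::function<void(uint32_t)> ungrabLocally;
    // Stands in for a sync IPC message asking the chrome process to ungrab.
    std::function<bool(uint32_t)> sendToChrome;

    // Release the pointer grab in whichever process owns it.
    bool Ungrab(uint32_t aTime) const
    {
        assert(ungrabLocally && sendToChrome);
        if (processType == ProcessType::Content) {
            // e10s: the chrome process received the button event, so the
            // ungrab has to happen there.
            return sendToChrome(aTime);
        }
        // Non-e10s: this already is the chrome process.
        ungrabLocally(aTime);
        return true;
    }
};
```

Keeping the process check in one place means the old OOPP path is untouched when e10s is off, which is the constraint raised in this review.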

::: dom/plugins/ipc/PluginModuleChild.cpp
@@ +531,4 @@
>                 "not canceled before returning to main event loop!");
>  
>      pmc->CallProcessSomeEvents();
> +    gChromeInstance->CallProcessSomeEvents();

Do you still need this?

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +2795,5 @@
>  
>  bool
> +PluginModuleParent::AnswerUngrabPointer(const uint32_t& time)
> +{
> +    gdk_pointer_ungrab(time);

When you move this to the content proto, note that we don't #ifdef ipdl files, so this will be defined on all platforms. You'll need to deal with non-GTK platforms with some ifdefing and a NOT_REACHED assert. Search around; there should be some examples.
Attachment #8682307 - Flags: review?(jmathies) → review-
Comment on attachment 8683338 [details] [diff] [review]
0001-Bug-1111892-Ensure-gdk_ungrab_pointer-is-always-call.patch

Review of attachment 8683338 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.cpp
@@ +1178,5 @@
>  bool
> +ContentParent::RecvUngrabPointer(const uint32_t& aTime)
> +{
> +#if !defined(MOZ_WIDGET_GTK)
> +    NS_RUNTIMEABORT("This message only makes sense on GTK platforms");

I think you have to return something here or you'll get an error. A push to try will confirm.
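
For illustration, the shape being asked for (every preprocessor branch ends in a return) might look like the sketch below. It is standalone and makes several assumptions: SKETCH_WIDGET_GTK is a hypothetical macro playing the role of MOZ_WIDGET_GTK, FakeGdkPointerUngrab stands in for gdk_pointer_ungrab(), and std::abort() stands in for NS_RUNTIMEABORT. It is not the code that landed.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical macro standing in for MOZ_WIDGET_GTK. It is defined here only
// so that the sketch takes the GTK branch when compiled on its own.
#define SKETCH_WIDGET_GTK 1

// Hypothetical stand-in for gdk_pointer_ungrab(); records the timestamp so
// the call can be observed without GTK headers.
static uint32_t gLastUngrabTime = 0;
static void FakeGdkPointerUngrab(uint32_t aTime)
{
    gLastUngrabTime = aTime;
}

bool RecvUngrabPointerSketch(const uint32_t& aTime)
{
#if !defined(SKETCH_WIDGET_GTK)
    // The message handler exists on all platforms, so guard against use
    // where it makes no sense.
    std::fprintf(stderr, "This message only makes sense on GTK platforms\n");
    std::abort();
    return false; // Not reached, but keeps every path returning a value.
#else
    FakeGdkPointerUngrab(aTime);
    return true;
#endif
}
```

Whether the explicit return after the abort is strictly required depends on how the abort macro is declared; adding it is harmless either way.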

::: dom/ipc/PContent.ipdl
@@ +1134,4 @@
>      sync GetDeviceStorageLocation(nsString type)
>          returns (nsString path);
>  
> +    sync UngrabPointer(uint32_t time);

nit - add a comment please
Attachment #8683338 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/mozilla-central/rev/3111b73e96d8
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Product: Core → Core Graveyard