Closed Bug 1145838 Opened 9 years ago Closed 8 years ago

(e10s) Failures in browser_tab_dragdrop.js when both e10s and Asynchronous Plugin Initialization are enabled

Categories

(Core Graveyard :: Plug-ins, defect)

defect
Not set
normal

Tracking

(e10s-)

RESOLVED WONTFIX
Tracking Status
e10s - ---

People

(Reporter: bugzilla, Unassigned)

References

Details

For some reason the results are different on our test infra vs running the test locally. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d947323956f for an example of the former.

When run locally, everything hangs and I end up with these callstacks:

Chrome Process:

xul!mozilla::ipc::MessageChannel::WaitForSyncNotify+0x449
xul!mozilla::ipc::MessageChannel::Send+0x61d
xul!mozilla::jsipc::PJavaScriptParent::SendGet+0x145
xul!mozilla::jsipc::JavaScriptBase<mozilla::jsipc::PJavaScriptParent>::SendGet+0x36
xul!mozilla::jsipc::WrapperOwner::get+0x254
xul!CPOWProxyHandler::get+0x67
xul!js::Proxy::get+0x119
xul!js::proxy_GetProperty+0x1c
xul!js::GetProperty+0x38
xul!js::DirectProxyHandler::get+0x84
xul!js::CrossCompartmentWrapper::get+0xcb
xul!xpc::AddonWrapper<js::CrossCompartmentWrapper>::get+0xc5
xul!js::Proxy::get+0x119
xul!js::proxy_GetProperty+0x1c
xul!js::GetProperty+0x38
xul!GetPropertyOperation+0x2e0
xul!Interpret+0x7ab2
xul!js::RunScript+0x1f1
xul!js::Invoke+0x43f
xul!js::Invoke+0x264
xul!js::DirectProxyHandler::call+0xa9
xul!js::CrossCompartmentWrapper::call+0x161
xul!js::Proxy::call+0xc7
xul!js::proxy_Call+0xaf
xul!js::CallJSNative+0x66
xul!js::Invoke+0x275
xul!Interpret+0x8e00
xul!js::RunScript+0x1f1
xul!js::Invoke+0x43f
xul!js::CallOrConstructBoundFunction+0x3a7
xul!js::CallJSNative+0x66
xul!js::Invoke+0x32e
xul!js::Invoke+0x264
xul!js::DirectProxyHandler::call+0xa9
xul!js::CrossCompartmentWrapper::call+0x161
xul!js::Proxy::call+0xc7
xul!js::proxy_Call+0xaf
xul!js::CallJSNative+0x66
xul!js::Invoke+0x275
xul!js::Invoke+0x264
xul!JS::Call+0x84
xul!mozilla::dom::AnyCallback::Call+0x219
xul!mozilla::dom::AnyCallback::Call+0x8c
xul!mozilla::dom::WrapperPromiseCallback::Call+0x123
xul!mozilla::dom::PromiseCallbackTask::Run+0x398
xul!mozilla::dom::Promise::PerformMicroTaskCheckpoint+0xaf
xul!nsXPConnect::AfterProcessNextEvent+0x88
xul!nsThread::ProcessNextEvent+0x703
xul!NS_ProcessNextEvent+0x62
xul!mozilla::ipc::MessagePump::Run+0x23f
xul!MessageLoop::RunInternal+0x4d
xul!MessageLoop::RunHandler+0x82
xul!MessageLoop::Run+0x1d
xul!nsBaseAppShell::Run+0x50
xul!nsAppShell::Run+0x17
xul!nsAppStartup::Run+0x6a
xul!XREMain::XRE_mainRun+0x129e
xul!XREMain::XRE_main+0x2e7
xul!XRE_main+0x35

Content process:

ntdll!NtWaitForSingleObject+0xc
KERNELBASE!WaitForSingleObjectEx+0x99
KERNELBASE!WaitForSingleObject+0x12
nss3!_PR_MD_WAIT_CV+0x7f
nss3!_PR_WaitCondVar+0xe7
nss3!PR_WaitCondVar+0x7f
xul!mozilla::CondVar::Wait+0x52
xul!mozilla::Monitor::Wait+0x16
xul!mozilla::ipc::MessageChannel::WaitForSyncNotify+0x120
xul!mozilla::ipc::MessageChannel::WaitForInterruptNotify+0x78
xul!mozilla::ipc::MessageChannel::WaitForIncomingMessage+0x78
xul!mozilla::plugins::PluginAsyncSurrogate::WaitForInit+0x25a
xul!mozilla::plugins::PluginAsyncSurrogate::ScriptableHasProperty+0x11c
xul!NPObjWrapper_Resolve+0xba
xul!js::CallResolveOp+0x98
xul!js::LookupOwnPropertyInline<1>+0x1ab
xul!NativeGetPropertyInline<1>+0x76
xul!js::NativeGetProperty+0x1e
xul!js::GetProperty+0x61
xul!js::DirectProxyHandler::get+0x84
xul!js::Proxy::get+0x119
xul!js::proxy_GetProperty+0x1c
xul!js::GetProperty+0x38
xul!js::DirectProxyHandler::get+0x84
xul!js::CrossCompartmentWrapper::get+0xcb
xul!xpc::WaiveXrayWrapper::get+0x25
xul!js::Proxy::get+0x119
xul!js::proxy_GetProperty+0x1c
xul!js::GetProperty+0x38
xul!JS_ForwardGetPropertyTo+0x4f
xul!mozilla::jsipc::WrapperAnswer::RecvGet+0x307
xul!mozilla::jsipc::JavaScriptBase<mozilla::jsipc::PJavaScriptChild>::RecvGet+0x3b
xul!mozilla::jsipc::PJavaScriptChild::OnMessageReceived+0x1af3
xul!mozilla::dom::PContentChild::OnMessageReceived+0x64
xul!mozilla::ipc::MessageChannel::DispatchSyncMessage+0x1b8
xul!mozilla::ipc::MessageChannel::DispatchMessageW+0x56
xul!mozilla::ipc::MessageChannel::OnMaybeDequeueOne+0x171
xul!DispatchToMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void)>+0x9
xul!RunnableMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void),Tuple0>::Run+0x2a
xul!mozilla::ipc::MessageChannel::RefCountedTask::Run+0x1a
xul!mozilla::ipc::MessageChannel::DequeueTask::Run+0x19
xul!MessageLoop::RunTask+0x44
xul!MessageLoop::DeferOrRunPendingTask+0x36
xul!MessageLoop::DoWork+0xcd
xul!mozilla::ipc::DoWorkRunnable::Run+0x74
xul!nsThread::ProcessNextEvent+0x592
xul!NS_ProcessNextEvent+0x62
xul!mozilla::ipc::MessagePump::Run+0x23f
xul!mozilla::ipc::MessagePumpForChildProcess::Run+0x1cf
xul!MessageLoop::RunInternal+0x4d
xul!MessageLoop::RunHandler+0x82
xul!MessageLoop::Run+0x1d
xul!nsBaseAppShell::Run+0x50
xul!nsAppShell::Run+0x17
xul!XRE_RunAppShell+0x7a
xul!mozilla::ipc::MessagePumpForChildProcess::Run+0x85
xul!MessageLoop::RunInternal+0x4d
xul!MessageLoop::RunHandler+0x82
xul!MessageLoop::Run+0x1d
xul!XRE_InitChildProcess+0x907
plugin_container!content_process_main+0x194
plugin_container!NS_internal_main+0x10
plugin_container!wmain+0x11e
plugin_container!__tmainCRTStartup+0xfe
KERNEL32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b


Plugin process:

ntdll!NtWaitForMultipleObjects+0xc
KERNELBASE!WaitForMultipleObjectsEx+0xcc
USER32!MsgWaitForMultipleObjectsEx+0x163
xul!mozilla::widget::WinUtils::WaitForMessage+0x5a
xul!base::MessagePumpForUI::WaitForWork+0x2a
xul!base::MessagePumpForUI::DoRunLoop+0x10f
xul!base::MessagePumpWin::RunWithDispatcher+0x5f
xul!base::MessagePumpWin::Run+0x15
xul!MessageLoop::RunInternal+0x4d
xul!MessageLoop::RunHandler+0x58
xul!MessageLoop::Run+0x1d
xul!XRE_InitChildProcess+0x907
plugin_container!content_process_main+0x194
plugin_container!NS_internal_main+0x10
plugin_container!wmain+0x11e
plugin_container!__tmainCRTStartup+0xfe
KERNEL32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b
I think what we're seeing here is this sequence:

1) plugin starts initializing asynchronously. Chrome process is asynchronously starting plugin process;
2) Property access on plugin object requires CPOW to reach into content;
3) Accessing a property on an async init plugin instance forces a synchronous wait for the plugin to complete loading;
4) We deadlock because content is waiting on chrome to finish loading the plugin, but chrome isn't doing that right now because it's waiting on the sync message for the CPOW.

Mike, does this seem like a reasonable analysis to you? Is the solution to just fix that part of the test, or I need to investigate more serious mitigations? Feel free to redirect as necessary...
Flags: needinfo?(mconley)
That sounds extremely plausible, but billm would probably have a definitive answer.
Flags: needinfo?(mconley) → needinfo?(wmccloskey)
Looks like we're going to have to change the WaitForIncomingMessage call somehow. It's not compatible with CPOWs.

It would be better to send a sync message instead. Do you think that's possible Aaron?
Flags: needinfo?(wmccloskey)
I think that the fundamental problem is that, when asyncInit is turned on, the plugin process and NP_Initialize are asynchronous in the chrome process. Under normal conditions everything works fine, but as soon as the chrome process hits a CPOW for a plugin, we need to synchronize on those async operations before the CPOW calls into content.

The easy solution is to disable the asynchronous stuff that runs in the chrome process and just leave it to run synchronously. I'd prefer not to do that because (a) it will negate some of the biggest jank mitigations that asyncInit offers; and (b) it's only when encountering CPOWs that this is a problem.

Bill, is there any way that asyncInit could somehow be notified when a CPOW is about to call into content so that I could only force the synchronization when it is absolutely necessary?
Flags: needinfo?(wmccloskey)
I was thinking of a different approach. You could have PluginAsyncSurrogate::WaitForInit in the content process send a high-priority sync message to the parent. In the parent you could call PluginAsyncSurrogate::WaitForInit() (or something similar if the surrogate doesn't exist there) and then return. I think that would solve this problem. The flow would look like this:

Chrome              Content
CPOW ----------->
                    Run some JS
     <------------- PluginAsyncSurrogate::WaitForInit()
Wait for plugin
Reply  ----------->
     <------------- CPOW reply

So the WaitForInit message would nest inside the CPOW message. That works much more like the traditional CPOW message flow.
Flags: needinfo?(wmccloskey)
Depends on: 1160298
I think that I have a working patch for the CPOW hang, however I'm still seeing the browser_tab_dragdrop.js timeout.

I've been able to reproduce it locally. The plugin is initializing correctly but its corresponding tab just sits there with the throbber spinning away.
Depends on: 1213454
Depends on: 1211642
async init is on hold, so WONTFIX for now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.