async plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError

RESOLVED FIXED in Firefox 39

Status

()

Core
Plug-ins
--
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Robert Kaiser, Assigned: aklotz)

Tracking

({crash})

Trunk
mozilla40
Unspecified
Windows NT
crash
Points:
---

Firefox Tracking Flags

(firefox38 wontfix, firefox39+ fixed, firefox40+ fixed)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-b238393b-0cac-4164-97c1-45ea32150422.
=============================================================

This is the third-largest hang on Dev Edition 39 with async plugin init - the signature is almost non-existent with Nightly 40 or Beta 38, even though both of those versions have more Win64 build users, and this is a signature we see on Win64 builds only.


Top frames:
0 	ntdll.dll 	NtWaitForMultipleObjects 	
1 	kernelbase.dll 	RtlAnsiStringToUnicodeString 	
2 	xul.dll 	mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result, IPC::Message const&, char const*) 	ipc/glue/MessageChannel.cpp
3 	xul.dll 	mozilla::ipc::MessageChannel::WaitForInterruptNotify() 	ipc/glue/WindowsMessageLoop.cpp
4 	xul.dll 	mozilla::ipc::MessageChannel::Call(IPC::Message*, IPC::Message*) 	ipc/glue/MessageChannel.cpp
5 	xul.dll 	mozilla::plugins::PPluginInstanceChild::CallNPN_GetValue_NPNVdocumentOrigin(nsCString*, short*) 	obj-firefox/ipc/ipdl/PPluginInstanceChild.cpp
6 	xul.dll 	mozilla::plugins::PluginInstanceChild::NPN_GetValue(NPNVariable, void*) 	dom/plugins/ipc/PluginInstanceChild.cpp
7 	xul.dll 	mozilla::plugins::child::_getvalue 	dom/plugins/ipc/PluginModuleChild.cpp
8 	npswf64_17_0_0_169.dll 	F2086112263____________________________________________________________ 	F1737606425____________________________________________________________:275
9 	xul.dll 	mozilla::plugins::child::_getvalue 	dom/plugins/ipc/PluginModuleChild.cpp
10 	npswf64_17_0_0_169.dll 	NPP_New 	F17927733___________________________________________________________:674
[...]
Tracking this crash for 39+ as we'll need the fix on 40 as well once it moves to Aurora. Given the low volume on 38, I have marked 38 as wontfix.

Aaron - Another one for you.
status-firefox38: --- → wontfix
status-firefox40: --- → affected
tracking-firefox39: ? → +
tracking-firefox40: --- → +
Flags: needinfo?(aklotz)
(Reporter)

Updated

2 years ago
Summary: asyn plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError → async plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError
Blocks: 1116806
Flags: needinfo?(aklotz)
Looks like this crash is still happening on 39 only with 559 crashes in the last week, all on Windows 7. At this point 40 does not seem to be affected. 

Aaron, might this be from a fix on another bug that we could uplift to 39?
Flags: needinfo?(aklotz)
(Reporter)

Comment 3

2 years ago
There at least seem to be crashes with this signature that have happened in build IDs after bug 1156800 landed, yes.
I think there are some further improvements that I can make here. Patch forthcoming.
Flags: needinfo?(aklotz)
Created attachment 8602504 [details] [diff] [review]
Patch

Based on the stack in KaiRo's report, it's clear that there is cascading going on (notice the multiple invocations of PluginModuleChild::RecvAsyncNPP_New on the stack. That can cause all sorts of interesting reentry scenarios. Executing DoNPP_New in its own task will eliminate that cascade.
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Attachment #8602504 - Flags: review?(jmathies)

Comment 6

2 years ago
Comment on attachment 8602504 [details] [diff] [review]
Patch

Delaying the call to DoNPP_New() more seems a bit scary to me but I don't know the code that well. You are clearly comfortable with it. Can you assure me there's no risk here?
Attachment #8602504 - Flags: review?(jmathies) → review+
(In reply to Jim Mathies [:jimm] from comment #6)
> Comment on attachment 8602504 [details] [diff] [review]
> Patch
> 
> Delaying the call to DoNPP_New() more seems a bit scary to me but I don't
> know the code that well. You are clearly comfortable with it. Can you assure
> me there's no risk here?

I can confirm that.

We only post the task in RecvAsyncNPP_New which is the async init variant. It is defined as an async function in the protocol. The non-asyncInit variant is still fully synchronous at the IPC level and runs immediately, just like it always has.

Since the browser in asyncInit mode is already sending an async message to trigger NPP_New, it is fully prepared to deal with the consequences of delays in the actual execution of NPP_New inside the child process.
https://hg.mozilla.org/integration/mozilla-inbound/rev/262f4f70c56d
Comment on attachment 8602504 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: async plugin init
[User impact if declined]: hangs
[Describe test coverage new/current, TreeHerder]: Tested locally and on try with asyncInit enabled.
[Risks and why]: Low, simple patch that eliminates excessively complex call stacks in plugin-container.
[String/UUID change made/needed]: None
Attachment #8602504 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/262f4f70c56d
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox40: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8602504 [details] [diff] [review]
Patch

Approved for uplift to aurora.
Attachment #8602504 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8602504 [details] [diff] [review]
Patch

[Triage Comment]

This should be uplifted to beta (39) - it was already on aurora (40). My mistake
Attachment #8602504 - Flags: approval-mozilla-aurora+ → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/6a1a9d282be2
status-firefox39: affected → fixed
You need to log in before you can comment on or make changes to this bug.