Closed Bug 1157237 Opened 4 years ago Closed 4 years ago

async plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError

Categories

(Core :: Plug-ins, defect, critical)

Unspecified
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- wontfix
firefox39 + fixed
firefox40 + fixed

People

(Reporter: kairo, Assigned: aklotz)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-b238393b-0cac-4164-97c1-45ea32150422.
=============================================================

This is the third-largest hang on Dev Edition 39 with async plugin init - the signature is almost non-existent with Nightly 40 or Beta 38, even though both of those versions have more Win64 build users, and this is a signature we see on Win64 builds only.


Top frames:
0 	ntdll.dll 	NtWaitForMultipleObjects 	
1 	kernelbase.dll 	RtlAnsiStringToUnicodeString 	
2 	xul.dll 	mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result, IPC::Message const&, char const*) 	ipc/glue/MessageChannel.cpp
3 	xul.dll 	mozilla::ipc::MessageChannel::WaitForInterruptNotify() 	ipc/glue/WindowsMessageLoop.cpp
4 	xul.dll 	mozilla::ipc::MessageChannel::Call(IPC::Message*, IPC::Message*) 	ipc/glue/MessageChannel.cpp
5 	xul.dll 	mozilla::plugins::PPluginInstanceChild::CallNPN_GetValue_NPNVdocumentOrigin(nsCString*, short*) 	obj-firefox/ipc/ipdl/PPluginInstanceChild.cpp
6 	xul.dll 	mozilla::plugins::PluginInstanceChild::NPN_GetValue(NPNVariable, void*) 	dom/plugins/ipc/PluginInstanceChild.cpp
7 	xul.dll 	mozilla::plugins::child::_getvalue 	dom/plugins/ipc/PluginModuleChild.cpp
8 	npswf64_17_0_0_169.dll 	F2086112263____________________________________________________________ 	F1737606425____________________________________________________________:275
9 	xul.dll 	mozilla::plugins::child::_getvalue 	dom/plugins/ipc/PluginModuleChild.cpp
10 	npswf64_17_0_0_169.dll 	NPP_New 	F17927733___________________________________________________________:674
[...]
Tracking this crash for 39+ as we'll need the fix on 40 as well once it moves to Aurora. Given the low volume on 38, I have marked 38 as wontfix.

Aaron - Another one for you.
Flags: needinfo?(aklotz)
Summary: asyn plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError → async plugin init spike on Win64 of hang in mozilla::ipc::MessageChannel::MaybeHandleError
Flags: needinfo?(aklotz)
Looks like this crash is still happening on 39 only with 559 crashes in the last week, all on Windows 7. At this point 40 does not seem to be affected. 

Aaron, might this be from a fix on another bug that we could uplift to 39?
Flags: needinfo?(aklotz)
There at least seem to be crashes with this signature that have happened in build IDs after bug 1156800 landed, yes.
I think there are some further improvements that I can make here. Patch forthcoming.
Flags: needinfo?(aklotz)
Attached patch PatchSplinter Review
Based on the stack in KaiRo's report, it's clear that there is cascading going on (notice the multiple invocations of PluginModuleChild::RecvAsyncNPP_New on the stack. That can cause all sorts of interesting reentry scenarios. Executing DoNPP_New in its own task will eliminate that cascade.
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Attachment #8602504 - Flags: review?(jmathies)
Comment on attachment 8602504 [details] [diff] [review]
Patch

Delaying the call to DoNPP_New() more seems a bit scary to me but I don't know the code that well. You are clearly comfortable with it. Can you assure me there's no risk here?
Attachment #8602504 - Flags: review?(jmathies) → review+
(In reply to Jim Mathies [:jimm] from comment #6)
> Comment on attachment 8602504 [details] [diff] [review]
> Patch
> 
> Delaying the call to DoNPP_New() more seems a bit scary to me but I don't
> know the code that well. You are clearly comfortable with it. Can you assure
> me there's no risk here?

I can confirm that.

We only post the task in RecvAsyncNPP_New which is the async init variant. It is defined as an async function in the protocol. The non-asyncInit variant is still fully synchronous at the IPC level and runs immediately, just like it always has.

Since the browser in asyncInit mode is already sending an async message to trigger NPP_New, it is fully prepared to deal with the consequences of delays in the actual execution of NPP_New inside the child process.
Comment on attachment 8602504 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: async plugin init
[User impact if declined]: hangs
[Describe test coverage new/current, TreeHerder]: Tested locally and on try with asyncInit enabled.
[Risks and why]: Low, simple patch that eliminates excessively complex call stacks in plugin-container.
[String/UUID change made/needed]: None
Attachment #8602504 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/262f4f70c56d
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8602504 [details] [diff] [review]
Patch

Approved for uplift to aurora.
Attachment #8602504 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8602504 [details] [diff] [review]
Patch

[Triage Comment]

This should be uplifted to beta (39) - it was already on aurora (40). My mistake
Attachment #8602504 - Flags: approval-mozilla-aurora+ → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.