Last Comment Bug 747055 - crash in RtlEnterCriticalSection | PR_Lock | mozilla::ipc::RPCChannel::WaitForNotify
: crash in RtlEnterCriticalSection | PR_Lock | mozilla::ipc::RPCChannel::WaitFo...
Status: RESOLVED FIXED
: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: 13 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla16
Assigned To: Georg Fritzsche [:gfritzsche]
:
Mentors:
Depends on:
Blocks: 726734
  Show dependency treegraph
 
Reported: 2012-04-19 10:01 PDT by Robert Kaiser
Modified: 2012-08-10 07:37 PDT (History)
13 users (show)
ryanvm: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
wontfix
+
fixed
verified


Attachments
Callstack dumps of plugin-container with Adobe Reader plugin displaying messagebox (9.25 KB, text/plain)
2012-05-11 03:50 PDT, Georg Fritzsche [:gfritzsche]
no flags Details
Callstack for PluginInstanceParent destruction (1.64 KB, text/plain)
2012-05-29 21:32 PDT, Georg Fritzsche [:gfritzsche]
no flags Details
fix v1.0 (1.68 KB, patch)
2012-05-30 02:47 PDT, Georg Fritzsche [:gfritzsche]
no flags Details | Diff | Splinter Review
Fix #2 - delay NotifyMaybeChannelError() if still in IPC call (1.45 KB, patch)
2012-06-01 08:16 PDT, Georg Fritzsche [:gfritzsche]
cjones.bugs: feedback-
Details | Diff | Splinter Review
WIP test-case (9.21 KB, text/plain)
2012-06-07 06:56 PDT, Georg Fritzsche [:gfritzsche]
no flags Details
Callstack for premature CleanupFromTimeout() (3.16 KB, text/plain)
2012-06-08 07:34 PDT, Georg Fritzsche [:gfritzsche]
no flags Details
Possible fix (653 bytes, patch)
2012-06-08 21:09 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
benjamin: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Robert Kaiser 2012-04-19 10:01:21 PDT
This bug was filed from the Socorro interface and is 
report bp-3028b5dc-0643-42a1-b083-d884e2120419 .
============================================================= 

Top 10 Frames:

0 	ntdll.dll 	RtlEnterCriticalSection 	
1 	nspr4.dll 	PR_Lock 	nsprpub/pr/src/threads/combined/prulock.c:233
2 	xul.dll 	mozilla::ipc::RPCChannel::WaitForNotify 	ipc/glue/WindowsMessageLoop.cpp:910
3 	xul.dll 	mozilla::ipc::RPCChannel::Call 	ipc/glue/RPCChannel.cpp:202
4 	xul.dll 	mozilla::plugins::PluginModuleParent::NPP_New 	dom/plugins/ipc/PluginModuleParent.cpp:904
5 	xul.dll 	nsNPAPIPluginInstance::InitializePlugin 	dom/plugins/base/nsNPAPIPluginInstance.cpp:432
6 	xul.dll 	nsNPAPIPluginInstance::Initialize 	dom/plugins/base/nsNPAPIPluginInstance.cpp:159
7 	xul.dll 	nsPluginHost::TrySetUpPluginInstance 	dom/plugins/base/nsPluginHost.cpp:1341
8 	xul.dll 	nsPluginHost::SetUpPluginInstance 	dom/plugins/base/nsPluginHost.cpp:1221
9 	xul.dll 	nsPluginHost::InstantiateFullPagePlugin 	dom/plugins/base/nsPluginHost.cpp:1149
10 	xul.dll 	nsObjectLoadingContent::InstantiatePluginInstance 	content/base/src/nsObjectLoadingContent.cpp:635


More signatures are at https://crash-stats.mozilla.com/report/list?signature=RtlEnterCriticalSection%20|%20PR_Lock%20|%20mozilla%3A%3Aipc%3A%3ARPCChannel%3A%3AWaitForNotify%28%29
Comment 1 Scoobidiver (away) 2012-04-19 11:42:50 PDT
It's #36 top browser crasher in 13.0a2 and #209 in 14.0a1.
The spike started in 13.0a2/20120327042012. The regression range for the spike is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=e14e81432a54&tochange=5c4189019bc1
Comment 2 Alex Keybl [:akeybl] 2012-04-30 15:18:27 PDT
Top crasher, so tracking for release. Benjamin - can you take a look since IPC appears to be in your wheelhouse?
Comment 3 Benjamin Smedberg [:bsmedberg] 2012-05-03 08:39:03 PDT
Nothing to do with IPC, this is a plugin bug where we instantiate a plugin which, while it is being instantiated, pops up some kind of modal dialog. This causes us to spin the event loop which ends up re-instantiating a plugin (I think it's the same one). A regression from bug 726734, apparently, although that surprises me. I would have expected bug 90268.
Comment 4 Georg Fritzsche [:gfritzsche] 2012-05-11 03:48:49 PDT
This is apparently an issue with full-page instantiations of the Adobe Reader plugin.
I haven't been able to repro so far - tried bringing up modal dialogs from different 
phases in a plugins initialization, from primary and secondary thread, windowed and non-windowed. 

Possibly related: I have got my old Reader plugin (9.0) in a state where it shows an empty messagebox
and afterwards at least drawing appears to be broken. 
Apparently the messagebox handling is custom though, with what i assume to be
a modal message loop on a secondary thread (see thread 6 in callstack_msgbox_pdf.txt).
Comment 5 Georg Fritzsche [:gfritzsche] 2012-05-11 03:50:11 PDT
Created attachment 623096 [details]
Callstack dumps of plugin-container with Adobe Reader plugin displaying messagebox
Comment 6 Georg Fritzsche [:gfritzsche] 2012-05-16 06:32:33 PDT
We actually have different cases here, with the most common (28 out of 41 samples) being full-page instantiations and apparently non-recursive, e.g.:
https://crash-stats.mozilla.com/report/index/29f8f73f-2b69-4644-a2c1-435c12120516

Crash-addresses are usually in the range 0x20 - 0x30.

Other cases are:

Recursive plugin-instantiation, e.g.: 
https://crash-stats.mozilla.com/report/index/cc7dfa29-4ef9-4d98-9d4c-b1eab2120516

Crash on non-recursive, embedded plugin instantiation, e.g.:
https://crash-stats.mozilla.com/report/index/ee1895e4-6704-40d8-8d11-7a0522120516

Crash on plugin event-handling, e.g.:
https://crash-stats.mozilla.com/report/index/3b860c4c-6780-409d-b56a-1496d2120516
Comment 7 Alex Keybl [:akeybl] 2012-05-16 17:06:41 PDT
Scoobidiver/Georg - Do we know what version of Adobe reader to target? We could add the qawanted keyword to try to find STR if so. Any other tips (like looking at a full page PDF) for trying to reproduct would be helpful.
Comment 8 Georg Fritzsche [:gfritzsche] 2012-05-16 18:11:08 PDT
The version is unknown so far. The majority are full-page instantiations and the comments point to this happening with Adobe Reader.
Comment 9 Alex Keybl [:akeybl] 2012-05-18 16:31:57 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #8)
> The version is unknown so far. The majority are full-page instantiations and
> the comments point to this happening with Adobe Reader.

Adding qawanted to test opening PDFs in the browser with the latest version of Adobe Reader on Win7, since we don't yet know affected versions.
Comment 10 Robert Kaiser 2012-05-21 05:44:49 PDT
(In reply to Alex Keybl [:akeybl] from comment #7)
> Scoobidiver/Georg - Do we know what version of Adobe reader to target?

Unfortunately, those crashes are on the browser process and we don't load the plugin into that one so we don't see its version there.
Comment 11 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-22 13:29:26 PDT
1. Firefox 13.0b4 on Windows 7 64-bit.
2. Did a search on Google for "pdf" 
3. Opened the first 20 pdf documents in new tabs
4. Randomly switched display modes, scrolling, otherwise playing around with documents in quick succession
5. Restarted Firefox and triggered session restore to load all at once

Adobe Reader 10.1.3: would hang once in a while but never for more than 30 seconds, and never crashed.
Adobe Reader 9.5.0: did not experience any performance degradation, nor crashes.

Is there something more specific I can do to test? Please advise.
Comment 12 Georg Fritzsche [:gfritzsche] 2012-05-22 14:13:05 PDT
> Is there something more specific I can do to test? Please advise.

Nothing more specific standing out from the crash-stats so far, sorry.
Comment 13 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-22 15:02:54 PDT
Removing QAWANTED as per comment 12. Please re-add if there is something more QA can do to help.
Comment 14 Georg Fritzsche [:gfritzsche] 2012-05-25 10:39:07 PDT
Crash-dumps show that for the most common crash-path the lock (from AsyncChannel::mMonitor) is null, so apparently we have a tear-down occuring from the event-loop spinning. 

Looking through the call-path however the tear-down protection seems proper (PluginDestructionGuard present and apparently checked against in the relevant places, reference to nsPluginInstanceOwner being held).
Comment 15 Alex Keybl [:akeybl] 2012-05-25 11:34:26 PDT
We're not close enough to resolution here to land a fix in FF13. We'll continue the investigation for FF14.
Comment 16 Georg Fritzsche [:gfritzsche] 2012-05-25 11:47:25 PDT
Found STR for the less common crash-path involving plugin event handling with FF13.0b4 & Flash 11.2.202.235:
1. go to http://practicefusion.com/
2. choose free signup
3. view license agreement
4. click print
5. click print
6. choose print to file
7. click ok
8. choose filepath
9. click ok

Results: 
(1) FF hangs completely for some time, crash-reporter comes up
    -> https://crash-stats.mozilla.com/report/index/bp-282dab94-2c2f-48c3-93cd-5385f2120525
(2) FF hangs completely for some time, crash-reporter submits a hang-report for Flash:
    -> https://crash-stats.mozilla.com/report/index/bp-f9cea563-dce4-475c-84ab-11b582120525
Comment 17 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-25 15:29:12 PDT
(In reply to Georg Fritzsche [:gfritzsche] from comment #16)
> Found STR for the less common crash-path involving plugin event handling
> with FF13.0b4 & Flash 11.2.202.235

This is not reproducible for me using Windows 7 64-bit, Flash 11.2.202.235 64-bit, and Firefox 13.0b4 or 13.0b5. There was a hang for a couple of seconds in Beta 4, no hang at all in Beta 5. Granted, I tested this on a new profile and the only print-to-file printer I have installed is the Microsoft XPS writer. 

Is there any more details you can give Georg? Is your testcase 100% reproducible for you?
Comment 18 Georg Fritzsche [:gfritzsche] 2012-05-25 16:45:26 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #17)
> Is there any more details you can give Georg? Is your testcase 100%
> reproducible for you?

I just checked and found that i don't get the hang/crash with the XPS document writer either. I left the default choice on my HP Deskjet D2400 and enabled the checkbox "print to file". 
With that the hang is 100% reproducible for me with 13.0bX, Window 7 x64 SP1 on 2 different machines - either the plugin-hang-detector kills Flash or the browser crashes.
A user comment in the reports mentions having this a "Brother Fax 2.1 sender":
https://crash-stats.mozilla.com/report/index/54f259cf-46dc-4f55-9361-d7e142120519
Comment 19 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-28 12:48:13 PDT
Repeated the steps in comment 16 with the latest Nightly, a new profile, and an HP Deskjet D2400 installed. I could not reproduce this crash. Doing the same with Firefox 13.0b5 resulted in a hang for about 30 seconds, but no crash. While some set of external users can trigger this crash, it is not internally reproducible. Any other suggestions for a reproducible case are greatly appreciated.
Comment 20 Robert Kaiser 2012-05-29 05:43:14 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #19)
> Doing the
> same with Firefox 13.0b5 resulted in a hang for about 30 seconds, but no
> crash.

Well, if your machine is slower for some reason, then this might just be it. IIRC, this is the plugin process hanging, and if it hangs longer than 45 seconds, we kill it and send a hang report with the crash reporter. That could be the (2) case of comment #16.
Comment 21 Georg Fritzsche [:gfritzsche] 2012-05-29 21:32:38 PDT
Created attachment 628209 [details]
Callstack for PluginInstanceParent destruction

Managed to reproduce with the aforementioned steps on a custom build, though less reliably. 
Destruction of the PluginInstanceParent is happening from a processed message (see callstack), afterwards it is crashing on accessing the destroyed AsyncChannel::mMonitor here:
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#877
Comment 22 Georg Fritzsche [:gfritzsche] 2012-05-29 21:38:30 PDT
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #20)
> IIRC, this is the plugin process hanging, and if it hangs longer than 45
> seconds, we kill it and send a hang report with the crash reporter. That
> could be the (2) case of comment #16.

Yes, apparently the plugin is killed due to hanging in both cases - either killing it like intended in case (2) or hitting the crash bug in (1).
Comment 23 Georg Fritzsche [:gfritzsche] 2012-05-30 02:47:19 PDT
Created attachment 628282 [details] [diff] [review]
fix v1.0

AsyncChannel::Close() & ::NotifyMaybeChannelError() may get called from a task that's processed in the nested event loop in RPCChannel::WaitForNotify() while being in a sync plugin IPC call. This leads to AsyncChannel::Clear() clearing out AsyncChannel::mMonitor and ::mListener.

The patch should assure that the RPCChannel methods in the crash-paths can be properly exited after such a task occured. It fixes the issue for me locally on trunk.

Possible other issue that is not handled by this patch: PluginInstanceParent gets deleted by the aforementioned task, so other issues could occur when the call-paths unwind to it's methods.
Comment 24 Georg Fritzsche [:gfritzsche] 2012-05-30 02:51:15 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #19)
> Repeated the steps in comment 16 with the latest Nightly, a new profile, and
> an HP Deskjet D2400 installed. I could not reproduce this crash. Doing the
> same with Firefox 13.0b5 resulted in a hang for about 30 seconds, but no
> crash. While some set of external users can trigger this crash, it is not
> internally reproducible. Any other suggestions for a reproducible case are
> greatly appreciated.

How about the following:
 * Reduce dom.ipc.plugins.timeoutSecs until the plugin regularly gets killed for hanging
 * keep clicking into the text area of the plugin (increased the crash-reliability for me)
Comment 25 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-31 12:26:07 PDT
I was able to reproduce a plug-in hang/crash by reducing the timeout to 5s and clicking Print, printing the document, and repeatedly clicking and dragging in the text area. Though I had to repeat the process about 10 times before it crashed (I'm guessing it's an issue that builds up over time).

In Firefox 13.0b5 it manifested as a plugin crash/hang
https://crash-stats.mozilla.com/report/index/bp-955eeff6-b685-4dd5-9232-598782120531
https://crash-stats.mozilla.com/report/index/bp-36fe12ce-3d1a-49d1-b7e4-dd0f32120531

In Firefox 13.0b6 it manifested as a browser crash
https://crash-stats.mozilla.com/report/index/bp-358d2286-cf1f-4f77-b8dc-b5a3e2120531

In Firefox 15.0a1 2012-05-31 it manifested as a plugin crash/hang
https://crash-stats.mozilla.com/report/pending/7ac98e9f-9978-4ef8-8091-679ee2120531
https://crash-stats.mozilla.com/report/index/f4698b2d-7318-4d6e-9e76-414f52120531
Comment 26 Georg Fritzsche [:gfritzsche] 2012-06-01 08:16:26 PDT
Created attachment 629195 [details] [diff] [review]
Fix #2 - delay NotifyMaybeChannelError() if still in IPC call

Patch which delays the channel error notification after a channel timeout if we are still in an IPC call.
While AsyncChannel::OnNotifyMaybeChannelError() was already protected against this, the following call-path was not:
 * PluginModuleParent::CleanupFromTimeout()
 * PPluginModuleParent::Close()
 * AsyncChannel::Close() 

The task which invokes this is created here:
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#205

I did not have success on a mochitest for this so far.
Comment 27 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-01 19:52:18 PDT
Comment on attachment 629195 [details] [diff] [review]
Fix #2 - delay NotifyMaybeChannelError() if still in IPC call

From what I can tell this is a logic error in PluginModuleParent.  It can't Close() the channel with C++ code on the stack.  "Spinning" the event loop like this isn't acceptable for *Channel code.

I would suggest writing a test first.  Find the sequence of events/npapi calls that reproduce this bug, then add some gross code to our testplugin to reproduce whatever the plugin is doing here.  There's code in testplugin already that does this, so you can use that as an example.  Then call that code from a mochitest.  MOZ_IPC_MESSAGE_LOG can be helpful in seeing what the plugin is doing.
Comment 28 Georg Fritzsche [:gfritzsche] 2012-06-06 12:25:18 PDT
The sequence leading up to this for me is starting with a plugin mouse event, from which a modal dialog is being brought up (leading to spinning the internal event loop), which subsequently hangs.
In the internal event loop, further mouse events are processed (nested), which trigger the hang-after-timeout-cleanup and thus the CleanupFromTimeout() task:

PluginModuleParent::ShouldContinueFromReplyTimeout() 
SyncChannel::ShouldContinueFromTimeout() 
RPCChannel::Call() 
PPluginInstanceParent::CallNPP_HandleEvent() 
PluginInstanceParent::NPP_HandleEvent() 
PluginModuleParent::NPP_HandleEvent() 
nsNPAPIPluginInstance::HandleEvent() 
nsPluginInstanceOwner::ProcessEvent() 
[...]
RPCChannel::SpinInternalEventLoop() 
RPCChannel::WaitForNotify() 
RPCChannel::Call() 
PPluginInstanceParent::CallNPP_HandleEvent() 
PluginInstanceParent::NPP_HandleEvent() 
PluginModuleParent::NPP_HandleEvent() 
nsNPAPIPluginInstance::HandleEvent() 
nsPluginInstanceOwner::ProcessEvent()

... which may be processed from the internal event loop (timing-sensitive?):

AsyncChannel::Close() 
PluginModuleParent::CleanUpFromTimeout()
MessageLoop::RunTask() 
[...]
RPCChannel::SpinInternalEventLoop() 
RPCChannel::WaitForNotify() 
RPCChannel::Call() 
PPluginInstanceParent::CallNPP_HandleEvent()

... which then leads to a crash when exiting the HandleEvent call-path.
This still isn't reproducible for me in a mochitest, either due to the timing sensitivity or the event loop not being flooded with mouse events the same way when synthesized from JS.
Comment 29 Jim Mathies [:jimm] 2012-06-06 12:50:52 PDT
Jet, anyone we can cc in from the reader group? Not sure what this dialog is for, it never has a chance to display any information.

https://twitter.com/GeorgFritzsche/status/190522573963014145/photo/1
Comment 30 Georg Fritzsche [:gfritzsche] 2012-06-07 06:56:16 PDT
Created attachment 630958 [details]
WIP test-case

WIP test-case which has the same basic behaviour i'm seeing in the reproducible case, but so far fails to reproduce the crash due to stepping out of the hanging RPC call before the PluginModuleParent::CleanUpFromTimeout() task is run.
Comment 31 Georg Fritzsche [:gfritzsche] 2012-06-08 07:34:10 PDT
Created attachment 631387 [details]
Callstack for premature CleanupFromTimeout()

As per yesterdays discussion i tested posting the CleanupFromTimeout() task via PostNonNestableTask() instead in ShouldContinueFromReplyTimeout():
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#205

That however doesn't fix the issue for me locally, with tracing showing that CleanupFromTimeout() is still being processed from SpinInternalEventLoop() (see callstack).

This is because the methods involved here:
  MessageLoop::RunTask() 
  MessageLoop::DeferOrRunPendingTask() 
  MessageLoop::DoWork() 
... are not actually increasing the run depth of the message loop (MessageLoop::state::run_depth).
Comment 32 Georg Fritzsche [:gfritzsche] 2012-06-08 07:46:29 PDT
Do we have any alternatives for posting non-nestable tasks?
Or should a call-path from ipc::DoWorkRunnable::Run() possibly increase the run depth?
Comment 33 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-08 21:09:04 PDT
Created attachment 631609 [details] [diff] [review]
Possible fix

Georg, does this patch fix the crash?
Comment 34 Georg Fritzsche [:gfritzsche] 2012-06-11 11:45:00 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #33)
> Created attachment 631609 [details] [diff] [review]
> Possible fix
> 
> Georg, does this patch fix the crash?

It fixes the crash, but instead doesn't do any cleanup in the previously crashing case.
Reposting the task from CleanupFromTimeout() seems to fix that, i can test that more tomorrow.
Comment 35 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-11 12:12:32 PDT
The error notification should still be delivered after all the plugin code is off the C++ stack, and that should kick off cleanup.
Comment 36 Georg Fritzsche [:gfritzsche] 2012-06-13 09:38:13 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #35)
> The error notification should still be delivered after all the plugin code
> is off the C++ stack, and that should kick off cleanup.

Tracing shows that AsyncChannel::PostErrorNotifyTask(), and hence AsyncChannel::OnNotifyMaybeChannelError(), are not being called in this case.
Comment 37 Georg Fritzsche [:gfritzsche] 2012-06-15 09:46:47 PDT
I can't reproduce the missing error notification anymore - can we push the patch to try so QA can verify independently to exclude possible quirks on my end?
Comment 38 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-15 15:21:02 PDT
Try results coming in at 

https://tbpl.mozilla.org/?tree=Try&rev=665e54d54c40
Comment 39 Georg Fritzsche [:gfritzsche] 2012-06-15 17:24:27 PDT
Adding qawanted for validation of the try build:
 * http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cjones@mozilla.com-665e54d54c40/try-win32/
 * Test-URL: https://static.practicefusion.com/apps/ehr/?signup=true
 * STR as per comment 25 / above comments
 * Expected: Either plugin is killed for hanging or functions normally
 * Not expected & should be fixed: browser crash

The "print to file" feature is broken for me in Flash 11.3, but works in 11.2.
Older Flash versions here: http://helpx.adobe.com/flash-player/kb/archived-flash-player-versions.html
Uninstalling newer Flash versions: http://helpx.adobe.com/flash-player/kb/uninstall-flash-player-windows.html
Comment 40 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-19 15:12:38 PDT
Print-to-File works fine for me in Flash 11.3. With the Try build I get a plugin crash, repeating my steps in comment 25. Firefox is no longer taken down by this bug it would seem.
Comment 41 Scoobidiver (away) 2012-06-20 06:30:19 PDT
It's #35 top browser crasher in 13.0.1, #39 in 14.0b7 and #67 in 15.0a2.
Comment 42 Georg Fritzsche [:gfritzsche] 2012-06-20 10:09:22 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

Requesting review for Chris fix.
Comment 43 Benjamin Smedberg [:bsmedberg] 2012-06-21 08:43:26 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

It is clear that this patch will make things better than they are currently. I am a little worried about the fact that we're just not calling Close() any more. Does anyone remember why we needed to call Close() in the first place? Since the call to Close() may be racing with the actual killing of the child process, is it possible that we'd end up with a different `why` at http://hg.mozilla.org/mozilla-central/annotate/3df1de4c9592/dom/plugins/ipc/PluginModuleParent.cpp#l226 and we'll end up with double minidumps (one from the hang and another from ActorDestroy)?

Should we be reposting the message as we do lower at http://hg.mozilla.org/mozilla-central/annotate/3df1de4c9592/dom/plugins/ipc/PluginModuleParent.cpp#l272 ?
Comment 44 Ryan VanderMeulen [:RyanVM] 2012-06-21 17:48:45 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/1cd36362b11b

Should this have a test?
Comment 45 Ed Morley [:emorley] 2012-06-22 03:43:52 PDT
https://hg.mozilla.org/mozilla-central/rev/1cd36362b11b
Comment 46 Georg Fritzsche [:gfritzsche] 2012-06-22 05:46:50 PDT
(In reply to Ryan VanderMeulen from comment #44)
> Should this have a test?
So far i failed to reproduce that in a test (possibly due to different timing or event queuing). Even the STR above are not fully consistent.
Comment 47 Georg Fritzsche [:gfritzsche] 2012-06-22 05:49:39 PDT
Reopening as we want to track this on Nightly for a week to see wether it fixes all or at least a subset of the crashes with this signature.
Comment 48 Benjamin Smedberg [:bsmedberg] 2012-06-22 06:47:00 PDT
Standard practice is to keep this closed while verifying and then mark VERIFIED later. This is pretty low volume on trunk, so it's going to be hard to verify based on trunk crash-stats. I think we should begin uplift to Aurora ASAP.
Comment 49 Georg Fritzsche [:gfritzsche] 2012-06-22 07:14:55 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Timing issue leading to premature teardown of plugin handling code.
User impact if declined: High browser-crash volume for a subset of FF users.
Testing completed (on m-c, etc.): QA verified a try-build for the patch.
Risk to taking this patch (and alternatives if risky): Possible risk of moving crashes to another location instead of fixing them.
String or UUID changes made by this patch: None.
Comment 50 Alex Keybl [:akeybl] 2012-06-24 12:42:16 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

[Triage Comment]
Approving for Aurora 15, with the possibility of uplifting if it does resolve the crash.
Comment 52 Scoobidiver (away) 2012-07-01 00:05:26 PDT
With combined signatures, it's #19 top browser crasher in 13.0.1 and #30 in 14.0b8.
Comment 53 Georg Fritzsche [:gfritzsche] 2012-07-04 05:17:41 PDT
In the 7-day view for FF15, there are no crash-signatures containing mozilla::ipc::RPCChannel::WaitForNotify() in the top-crashers - only in the 14-day view.
Comment 54 Georg Fritzsche [:gfritzsche] 2012-07-05 08:30:12 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Timing issue leading to premature teardown of plugin handling code.
User impact if declined: High browser-crash volume for a subset of FF users (top-crasher).
Testing completed (on m-c, etc.): QA verified a try-build for the patch. Watched crash-signature on Aurora and Nightly; no new crashes with this signature in builds after the checkin.
Risk to taking this patch (and alternatives if risky): Possible risk of moving crashes to another location instead of fixing them.
String or UUID changes made by this patch: None.
Comment 55 Alex Keybl [:akeybl] 2012-07-05 16:58:16 PDT
Comment on attachment 631609 [details] [diff] [review]
Possible fix

[Triage Comment]
Top crash fix with only risk of moving the signature around. Approved for Beta 14.

Note You need to log in before you can comment on or make changes to this bug.