[OOPP] Trigger child shutdown on parent time outs

RESOLVED FIXED in Firefox 9

Status

()

Core
Plug-ins
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jimm, Assigned: jimm)

Tracking

Trunk
mozilla9
x86_64
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox9- fixed, firefox10-, firefox11-)

Details

(Whiteboard: [qa-])

Attachments

(2 attachments, 1 obsolete attachment)

Comment hidden (empty)
(Assignee)

Comment 1

6 years ago
Created attachment 551907 [details] [diff] [review]
experimental patch
(Assignee)

Comment 2

6 years ago
Been experimenting with this today. This actually works pretty well at getting out from under some of the parent side hangs I have test cases for. (bug 675645 & bug 611826 are good examples).

I started with just using NS_ABORT(), but that triggered a messy set of assertions in the parent that didn't look good. So I came up with this, which shuts down the module. The parent handles this fine and the child doesn't actually have to crash. Down side to this is we don't get crash stacks for either process. Also looked for a way to get a bit of data over to the parent async but so far no luck. I'll dig into that a bit more tomorrow.

I also experimented with destroying and re-creating plugin child windows which are often the root of the problem. Unfortunately flash did not like that at all, crashing in most cases. (I believe plugin api consumers are supposed to support swapping out to new windows, but apparently flash doesn't support it.)
(Assignee)

Comment 3

6 years ago
One other note - the 45 second time out in the child case is way too long. If we're going to do this, we should set that lower, at least for the child. If the browser is hung, I don't think any user is going to sit around staring at a frozen window for almost a minute. Plus, we should be proactive about freeing up the browser anyway. I've been testing with a five second time out which works quite well. That might be a little short.. maybe the number should be somewhere in between that and the parent hang time.
We really should just NS_RUNTIMEABORT. What assertions do you see in the parent?
(Assignee)

Comment 5

6 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> We really should just NS_RUNTIMEABORT. What assertions do you see in the
> parent?

http://pastebin.mozilla.org/1295747

This also brings up the "plugin container problem" Windows dialog in debug mode on Windows. I didn't like that at all. I'd rather it get shutdown by the parent cleanly.. but I'm open to whatever. I was hoping a could get some stack data over to the parent somehow, was planning on looking into that today. It looks like we have a independent crash reporter pipe open for communication I might be able to use.
###!!! ABORT: terminating child process: file f:/Mozilla/firefox/mc/dom/plugins/ipc/PluginModuleChild.cpp, line 532 is expected, that's what NS_RUNTIMEABORT prints.

###!!! ASSERTION: Mismatched RPC stack frames: 'this == mChannel->mTopFrame', file f:/Mozilla/firefox/mc/ipc/glue/WindowsMessageLoop.cpp, line 612
This sounds like it ought to be investigated.

###!!! ASSERTION: IPDL error:: 'Error', file f:/Mozilla/firefox/MC-DBG/ipc/ipdl/PPluginScriptableObjectChild.cpp, line 1226
This doesn't make much sense: once you issue a NS_RUNTIMEABORT, the child ought to be dead, but it's still running and issuing assertions?

###!!! [Parent][RPCChannel] Error: Channel error: cannot send/recv
expected, this is normal

Also the windows crash dialog is normal if you don't have crash reporting enabled: if it is enabled you should be getting a normal minidump.
It might not work properly if you're killing the parent process. plugin-container doesn't write its own minidumps, it asks the chrome process to do so, and if you've killed that it will probably just give up and you'll get the Windows dialog.
We're not killing the parent process; the plugin process is just committing suicide.
(Assignee)

Comment 9

6 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> ###!!! ASSERTION: Mismatched RPC stack frames: 'this ==
> mChannel->mTopFrame', file
> f:/Mozilla/firefox/mc/ipc/glue/WindowsMessageLoop.cpp, line 612
> This sounds like it ought to be investigated.
> 
> ###!!! ASSERTION: IPDL error:: 'Error', file
> f:/Mozilla/firefox/MC-DBG/ipc/ipdl/PPluginScriptableObjectChild.cpp, line
> 1226
> This doesn't make much sense: once you issue a NS_RUNTIMEABORT, the child
> ought to be dead, but it's still running and issuing assertions?

In debug mode, the child process gets hung up by the windows error collection dialog, so I think stuff underneath continues to run. I can debug a bit and see what's going on. This spews while the windows error reporting dialog is visible and is "checking for solutions".
(Assignee)

Comment 10

6 years ago
Note, even with the use of runtime abort, we still don't get minidumps:

WARNING: [PluginModuleParent::ActorDestroy] abnormal shutdown without minidump!

This is in ActorDestroy of PluginModuleParent. Not sure yet if there's a way to go ahead and trigger some sort of report. Whether we shut down the module or just crash, it seems some work will need to be done in getting crash data stored away.
(Assignee)

Comment 11

6 years ago
Created attachment 552165 [details] [diff] [review]
patch v.1

Ok, with the crash reporter enabled this is working out pretty well.
Assignee: nobody → jmathies
Attachment #551907 - Attachment is obsolete: true
Attachment #552165 - Flags: review?(benjamin)
(Assignee)

Comment 12

6 years ago
Hmm, I have an unused timeoutMs var in the prefs observer, I'll remove that.
(Assignee)

Comment 13

6 years ago
release builds:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.com-b017b3bf473d
Comment on attachment 552165 [details] [diff] [review]
patch v.1

Please mark tracking-firefoxN on this when it lands, since we'll probably want to track this very carefully. And file a followup on getting a parent-side stack when we do this child-side abort, since we really want a hang-pair in this case.
Attachment #552165 - Flags: review?(benjamin) → review+
(Assignee)

Updated

6 years ago
Blocks: 679238
(Assignee)

Comment 15

6 years ago
Landed on inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/bef7bcdcd41e

Note to drivers: we probably want to disable this feature in releases. This will depend on the frequency stacks generated by these changes show up in crash stats.

To disable - the following release pref should be set to 0:

+// How long a plugin process will wait for a response from the parent
+// to a synchronous request before terminating itself. After this
+// point the child assumes the parent is hung.
+pref("dom.ipc.plugins.parentTimeoutSecs", 15);
status-firefox8: --- → fixed
tracking-firefox8: --- → ?
(Assignee)

Comment 16

6 years ago
This isn't going to make the last 8 inbound merge, so when tracking 9 flags arrive in bugzilla they need to be set on this bug.
status-firefox8: fixed → ---
tracking-firefox8: ? → ---
http://hg.mozilla.org/mozilla-central/rev/bef7bcdcd41e
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
(Assignee)

Updated

6 years ago
Blocks: 680130
(Assignee)

Updated

6 years ago
tracking-firefox9: --- → ?

Comment 18

6 years ago
Is this an approval request? This is already in 9, I presume. If it is an approval request, please unset the tracking flag and set the approval request in the patch with an explanation of the risk / reward.
(Assignee)

Comment 19

6 years ago
(In reply to Asa Dotzler [:asa] from comment #18)
> Is this an approval request? This is already in 9, I presume. If it is an
> approval request, please unset the tracking flag and set the approval
> request in the patch with an explanation of the risk / reward.

We want this disabled in releases for now. I'll post a patch to disable this in 9 now that we've branched.
(Assignee)

Comment 20

6 years ago
Created attachment 563783 [details] [diff] [review]
disable for aurora and up patch
(Assignee)

Updated

6 years ago
Attachment #563783 - Flags: approval-mozilla-aurora?

Updated

6 years ago
Attachment #563783 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 21

6 years ago
(In reply to Jim Mathies [:jimm] from comment #20)
> Created attachment 563783 [details] [diff] [review] [diff] [details] [review]
> disable for aurora and up patch

https://hg.mozilla.org/releases/mozilla-aurora/rev/c4862aaec55b
(Assignee)

Updated

6 years ago
status-firefox9: --- → fixed
Is there something QA can do to verify this fix?
Whiteboard: [qa?]

Updated

6 years ago
tracking-firefox9: ? → -
[Triage Comment]
Please provide STR here by tomorrow at 12:00PM PT 12/14. We will be holding our FF9 sign-offs later that afternoon, and need to be able to verify. Thanks!
(Assignee)

Comment 24

6 years ago
(In reply to Alex Keybl [:akeybl] from comment #23)
> [Triage Comment]
> Please provide STR here by tomorrow at 12:00PM PT 12/14. We will be holding
> our FF9 sign-offs later that afternoon, and need to be able to verify.
> Thanks!

This code is only enabled on mc, so you don't have to worry about verifying anything. To verify it's disabled, you can check the pref in 'disable for aurora and up patch' is properly set.
Thanks Jim. So to be completely clear...

dom.ipc.plugins.parentTimeoutSecs == 0 in Firefox 9 means this is FIXED?
(Assignee)

Comment 26

6 years ago
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #25)
> Thanks Jim. So to be completely clear...
> 
> dom.ipc.plugins.parentTimeoutSecs == 0 in Firefox 9 means this is FIXED?

Disabled really. There's no need to confirm the feature is "fixed", the reports this generates have been showing up in crash stats since the first patch landed. I don't see the stacks from this in aurora builds so it's properly disabled. So yes, basically, confirmed fixed on mc and confirmed disabled on aurora. :)

The crash stack in question is:

mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()

We may enabled in the future for all releases, but we'll file a new bug on that.
Okay, thanks Jim. I'm going to mark this one qa-. We can verify the new bug when it gets enabled.
Whiteboard: [qa?] → [qa-]
Looks like this isn't fixed on FF10..

dom.ipc.plugins.parentTimeoutSecs=15

.. and is thus suspected to be causing bug 683967.

Jim - can you prepare the patch for FF10/11?
Blocks: 683967
tracking-firefox10: --- → +
tracking-firefox11: --- → +
Looks like bsmedberg will actually create the disable patch in 683967. Untracking here.
tracking-firefox10: + → -
tracking-firefox11: + → -
You need to log in before you can comment on or make changes to this bug.