Closed
Bug 1132874
Opened 10 years ago
Closed 10 years ago
Content process aborts on PPluginWidget::Msg_ParentShutdown
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(e10sm5+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed)
VERIFIED
FIXED
mozilla40
People
(Reporter: jimm, Assigned: jimm)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(1 file, 10 obsolete files)
24.88 KB,
patch
|
Details | Diff | Splinter Review |
Waiting to see where this settles in the top crash list.
http://www.mathies.com/mozilla/bug1116884.txt
requesting: mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)
product: Firefox
rev: 38.0a1
process: content
build date range: >=20150211000000 && <=20150213900000
processing 100 reports.
Child protocol abort error messages:
-------------------------------------
1 (msgtype=0x48000D,name=???) Route error: message sent to unknown actor ID
97 (msgtype=0x840009,name=???) Route error: message sent to unknown actor ID
0x84 = PPluginWidget
0x0009 = Msg_ParentShutdown
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → jmathies
Assignee | ||
Updated•10 years ago
|
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)]
Assignee | ||
Comment 1•10 years ago
|
||
#3 top crasher overall, #2 top content crasher, plugin tear down related -> this should block m5.
Assignee | ||
Updated•10 years ago
|
Assignee | ||
Comment 2•10 years ago
|
||
Assignee | ||
Comment 3•10 years ago
|
||
Assignee | ||
Comment 4•10 years ago
|
||
Assignee | ||
Comment 5•10 years ago
|
||
Similar to the patch in bug 1127378 that was backed out while hopefully avoiding any additional regressions.
Attachment #8565143 -
Attachment is obsolete: true
Attachment #8565557 -
Flags: review?(aklotz)
Comment 6•10 years ago
|
||
Comment on attachment 8565557 [details] [diff] [review]
patch (checked in)
Review of attachment 8565557 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/plugins/ipc/PluginWidgetChild.h
@@ +5,5 @@
> #ifndef mozilla_plugins_PluginWidgetChild_h
> #define mozilla_plugins_PluginWidgetChild_h
>
> #include "mozilla/plugins/PPluginWidgetChild.h"
> +#include "mozilla/plugins/PluginWidgetParent.h"
Is this needed here?
Attachment #8565557 -
Flags: review?(aklotz) → review+
Assignee | ||
Comment 7•10 years ago
|
||
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #6)
> Comment on attachment 8565557 [details] [diff] [review]
> patch
>
> Review of attachment 8565557 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/plugins/ipc/PluginWidgetChild.h
> @@ +5,5 @@
> > #ifndef mozilla_plugins_PluginWidgetChild_h
> > #define mozilla_plugins_PluginWidgetChild_h
> >
> > #include "mozilla/plugins/PPluginWidgetChild.h"
> > +#include "mozilla/plugins/PluginWidgetParent.h"
>
> Is this needed here?
Nope, thanks. Moved to the cpp.
Assignee | ||
Comment 8•10 years ago
|
||
Comment 9•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox38:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Assignee | ||
Comment 10•10 years ago
|
||
Unfortunately this work made things worse -
https://crash-stats.mozilla.com/report/list?product=Firefox&range_value=7&range_unit=days&date=2015-02-25&signature=mozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=Firefox%3A39.0a1#tab-graph
Assignee | ||
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 11•10 years ago
|
||
Bill, asking for your review, also wondering if you might have some alternative ideas here. The problem seems to be that PPluginWidget can get torn down from either side of the connection including starting a tear down from both sides simultaneously. There are some comments explaining the two ways PPluginWidget tears down here [1]. We're still running into a situation where the browser ends up with a ParentShutdown message on the child side after the child actor is gone. I'd like to avoid the resulting abort since it's a passive error that can be ignored.
[1] http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginWidgetParent.cpp#44
Attachment #8569904 -
Flags: review?(wmccloskey)
Well, I don't think we should do it this way :-). Dealing with actor deletion, especially when both sides can delete, is a pretty tricky part of IPDL. I'll try explain how I understand it, in theory.
1. Both the parent and the child need to keep a state variable, mDestroyed or something, that says whether they're allowed to send messages or not.
2. When one side (say A) wants to destroy, it sends a Destroy message and sets its own mDestroyed flag.
3. When side B gets the Destroy message, it sends the __delete__ message back to side A to complete the actor's destruction.
4. Every time either side is going to send a message, it checks mDestroyed first and doesn't send if it's set.
5. If either side gets an ActorDestroy message, it sets mDestroyed but doesn't send the Destroy message.
The really tricky part is what happens if we tear down the parent protocol. In your case that happens when the TabParent sends a PBrowser::Destroy message to TabChild. It looks like you already have code to deal with that. All that code needs to do, in TabParent, is set the mDestroyed flag on all the widgets. That way they won't be allowed to send anymore. Once the child receives the PBrowser::Destroy message, it will send a PBrowser::__delete__ message to the parent, which will cause the widget actor to be deleted as well.
It looks like the code we have already sorta implements something like this scheme. I'm hoping you just need to bring it a little closer to what I described. Let me know if you run into any problems or if this doesn't match your understanding.
Attachment #8569904 -
Flags: review?(wmccloskey)
Assignee | ||
Updated•10 years ago
|
Attachment #8569904 -
Attachment is obsolete: true
Assignee | ||
Comment 13•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #12)
> Well, I don't think we should do it this way :-). Dealing with actor
> deletion, especially when both sides can delete, is a pretty tricky part of
> IPDL. I'll try explain how I understand it, in theory.
>
> 1. Both the parent and the child need to keep a state variable, mDestroyed
> or something, that says whether they're allowed to send messages or not.
> 2. When one side (say A) wants to destroy, it sends a Destroy message and
> sets its own mDestroyed flag.
> 3. When side B gets the Destroy message, it sends the __delete__ message
> back to side A to complete the actor's destruction.
> 4. Every time either side is going to send a message, it checks mDestroyed
> first and doesn't send if it's set.
> 5. If either side gets an ActorDestroy message, it sets mDestroyed but
> doesn't send the Destroy message.
>
> The really tricky part is what happens if we tear down the parent protocol.
> In your case that happens when the TabParent sends a PBrowser::Destroy
> message to TabChild. It looks like you already have code to deal with that.
> All that code needs to do, in TabParent, is set the mDestroyed flag on all
> the widgets. That way they won't be allowed to send anymore. Once the child
> receives the PBrowser::Destroy message, it will send a PBrowser::__delete__
> message to the parent, which will cause the widget actor to be deleted as
> well.
>
> It looks like the code we have already sorta implements something like this
> scheme. I'm hoping you just need to bring it a little closer to what I
> described. Let me know if you run into any problems or if this doesn't match
> your understanding.
Thanks for the details response. I think I have most of this in place, although mDestroyed is mWidget in this case. Hopefully through bug 1136375 I can reproduce this on try so I can test a final fix.
Assignee | ||
Comment 14•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Attachment #8565557 -
Attachment description: patch → patch (checked in)
Assignee | ||
Comment 15•10 years ago
|
||
Currently trying to reproduce this using pgo builds on try. Not having much luck though.
mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6
w/patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #15)
> Currently trying to reproduce this using pgo builds on try. Not having much
> luck though.
>
> mc tip:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6
got one here, which confirms we can reproduce on try.
https://treeherder.mozilla.org/logviewer.html#?job_id=5417017&repo=try
Assignee | ||
Comment 17•10 years ago
|
||
This is looking pretty stable on try, so lets see if it helps.
patch w/retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f80d8078c82
patch with logging and retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6
Attachment #8572664 -
Attachment is obsolete: true
Attachment #8574277 -
Flags: review?(aklotz)
Updated•10 years ago
|
Attachment #8574277 -
Flags: review?(aklotz) → review+
Assignee | ||
Comment 18•10 years ago
|
||
Comment 19•10 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c8fbb68f9fe1 - one of this or bug 1138181 was crashing like https://treeherder.mozilla.org/logviewer.html#?job_id=7335810&repo=mozilla-inbound
Assignee | ||
Comment 20•10 years ago
|
||
This patch tweaked something on Linux and introduced a shutdown crash. Throwing stuff at try to try and figure out what happened.
Assignee | ||
Comment 21•10 years ago
|
||
A change in the order of calls in PluginWidgetParent::KillWidget() seems to have solved this.
Attachment #8574277 -
Attachment is obsolete: true
Assignee | ||
Comment 22•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Assignee | ||
Comment 24•10 years ago
|
||
Unfortunately I'm going to have to reopen this again since we still do not have a good fix here.
Child protocol error abort messages:
------------------------------------
293 '(msgtype=0x8A0009,name=???) Route error: message sent to unknown actor ID'
http://www.mathies.com/mozilla/bug1116884.txt
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 25•10 years ago
|
||
This signature is by far the #1 on Nightly right now with 15% of the total submitted crashes.
Is bug 1130734 a dupe of this one?
Assignee | ||
Comment 26•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> This signature is by far the #1 on Nightly right now with 15% of the total
> submitted crashes.
>
> Is bug 1130734 a dupe of this one?
bug 1130734 is a meta bug that currently tracks the main signature for these aborts. These can be triggered on any sub-protocol, it just happens that currently the only abort we have in large numbers related to a plugin shutdown msg.
Assignee | ||
Updated•10 years ago
|
Attachment #8575229 -
Attachment description: patch → patch (checked in)
Assignee | ||
Comment 27•10 years ago
|
||
Attachment #8565557 -
Attachment is obsolete: true
Attachment #8575229 -
Attachment is obsolete: true
Assignee | ||
Comment 28•10 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #27)
> Created attachment 8588099 [details] [diff] [review]
> wip
Unfortunately this just swaps the shutdown message crash for the async window update crash. But at least things are bit simpler.
Assignee | ||
Comment 29•10 years ago
|
||
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetChild::ProxyShutdown()
PluginWidgetChild::ActorDestroy(1)
PluginWidgetChild::KillWidget()
PluginWidgetChild::~PluginWidgetChild()
PluginWidgetParent::ActorDestroy(1)
PluginWidgetParent::KillWidget() widget=0BD35800
###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0x980008,name=???) Route error: message sent to unknown actor ID
PluginWidgetParent::~PluginWidgetParent()
Assignee | ||
Comment 30•10 years ago
|
||
What I might be able to do here is make that __delete__ message sent to the parent sync. ipc should process any async update messages it receives while waiting on the response to __delete__. Once the child gets that response back, the child shouldn't receive any more message from the parent.
Assignee | ||
Comment 31•10 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #30)
> What I might be able to do here is make that __delete__ message sent to the
> parent sync. ipc should process any async update messages it receives while
> waiting on the response to __delete__. Once the child gets that response
> back, the child shouldn't receive any more message from the parent.
ipdl pre-processing barfs on this. :/
Assignee | ||
Comment 32•10 years ago
|
||
Ah, I think I have a fix - lets move that last remaining async UpdateWindow api on the child side to a different protocol. The abort on late delivery problem no longer exists.
Assignee | ||
Comment 33•10 years ago
|
||
This seems to work pretty well.
Attachment #8588099 -
Attachment is obsolete: true
Assignee | ||
Comment 34•10 years ago
|
||
per billm's suggestion, I've added parent protocol state checks around all the send calls.
Assignee | ||
Updated•10 years ago
|
Attachment #8588205 -
Attachment is obsolete: true
Assignee | ||
Comment 35•10 years ago
|
||
Comment on attachment 8588221 [details] [diff] [review]
patch v.1
Review of attachment 8588221 [details] [diff] [review]:
-----------------------------------------------------------------
try push looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6ec1f40e3cf
Attachment #8588221 -
Flags: review?(aklotz)
Assignee | ||
Comment 36•10 years ago
|
||
Following another suggestion from billm, skip the async Send__delete__() call on the parent side, which works - both sides continue to tear down correctly. This effectively removes all async apis on the child side. *awesome*
Attachment #8588221 -
Attachment is obsolete: true
Attachment #8588221 -
Flags: review?(aklotz)
Attachment #8588337 -
Flags: review?(aklotz)
Comment 37•10 years ago
|
||
Comment on attachment 8588337 [details] [diff] [review]
patch v.2
Review of attachment 8588337 [details] [diff] [review]:
-----------------------------------------------------------------
Mostly nits re: alphabetical ordering for #includes (I like to ensure we're not making the situation worse than it is, at least). There is one nullptr check that needs to be fixed though.
::: dom/ipc/ContentChild.cpp
@@ +49,5 @@
> #include "mozilla/layers/PCompositorChild.h"
> #include "mozilla/layers/SharedBufferManagerChild.h"
> #include "mozilla/net/NeckoChild.h"
> #include "mozilla/plugins/PluginModuleParent.h"
> +#include "mozilla/plugins/PluginInstanceParent.h"
Nit: move up a line to preserve alphabetical ordering
::: dom/plugins/ipc/PluginWidgetChild.cpp
@@ +3,5 @@
> * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>
> #include "mozilla/plugins/PluginWidgetChild.h"
> #include "mozilla/plugins/PluginWidgetParent.h"
> +#include "mozilla/dom/TabChild.h"
Nit: add blank line under PluginWidgetChild.h include and then move this include above PluginWidgetParent.
@@ +8,2 @@
> #include "PluginWidgetProxy.h"
> +#include "mozilla/unused.h"
Nit: move down a line
::: dom/plugins/ipc/PluginWidgetParent.cpp
@@ +6,5 @@
> #include "mozilla/dom/TabParent.h"
> #include "mozilla/dom/ContentParent.h"
> #include "nsComponentManagerUtils.h"
> #include "nsWidgetsCID.h"
> +#include "mozilla/unused.h"
Nit: move down a line
::: widget/PluginWidgetProxy.cpp
@@ +118,5 @@
>
> void*
> PluginWidgetProxy::GetNativeData(uint32_t aDataType)
> {
> + auto tab = static_cast<mozilla::dom::TabChild*>(mActor->Manager());
You're using mActor but it is not checked for nullptr until the following line. Please fix!
::: widget/windows/nsWindowGfx.cpp
@@ +18,5 @@
> **************************************************************/
>
> #include "mozilla/plugins/PluginInstanceParent.h"
> using mozilla::plugins::PluginInstanceParent;
> +#include "mozilla/dom/ContentParent.h"
Nit: please move this 2 lines up to make this alphabetical. At least that way we won't worsen the include ordering situation in this file. :-P
Attachment #8588337 -
Flags: review?(aklotz) → review+
Assignee | ||
Comment 38•10 years ago
|
||
Ok, updated per comments, lets hope for good results.
Attachment #8588337 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Comment 39•10 years ago
|
||
Do we need to backport anything to 38/39 or should we call them wontfix? This bug has become a PITA for tracking purposes because of how things landed :\
Comment 40•10 years ago
|
||
Keywords: checkin-needed
Assignee | ||
Comment 41•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #39)
> Do we need to backport anything to 38/39 or should we call them wontfix?
> This bug has become a PITA for tracking purposes because of how things
> landed :\
This shouldn't show up in any build other than nightly. AFAICT, that's what we're seeing. There are crashes on crashstats for 39 and 38, but they all appear to be nightly channel builds. I don't think you need to track this at all.
nightly channel 38 crashes:
https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
non-nightly channel 38a1 crashes: 0
https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&release_channel=!nightly&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Flags: needinfo?(jmathies) → needinfo?(ryanvm)
Updated•10 years ago
|
Comment 42•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Assignee | ||
Comment 44•10 years ago
|
||
Yesterday's nightly build:
Bug 1130734 Child protocol error abort messages:
----------------------------------------------------
1 '(msgtype=0x0,name=???) Route error: message sent to unknown actor ID'
1 '(msgtype=0x52000D,name=???) Route error: message sent to unknown actor ID'
woot!
Status: RESOLVED → VERIFIED
Updated•3 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•