Content process aborts on PPluginWidget::Msg_ParentShutdown

VERIFIED FIXED in Firefox 40

Status

()

VERIFIED FIXED
4 years ago
4 years ago

People

(Reporter: jimm, Assigned: jimm)

Tracking

(Blocks: 1 bug, {crash, topcrash})

Trunk
mozilla40
x86_64
Windows 7
crash, topcrash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10sm5+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed)

Details

(crash signature)

Attachments

(1 attachment, 10 obsolete attachments)

(Assignee)

Description

4 years ago
Waiting to see where this settles in the top crash list.

http://www.mathies.com/mozilla/bug1116884.txt

requesting: mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)
 product: Firefox
 rev: 38.0a1
 process: content
 build date range: >=20150211000000 && <=20150213900000
processing 100 reports.

Child protocol abort error messages:
-------------------------------------
1 (msgtype=0x48000D,name=???) Route error: message sent to unknown actor ID
97 (msgtype=0x840009,name=???) Route error: message sent to unknown actor ID

0x84 = PPluginWidget
0x0009 = Msg_ParentShutdown
(Assignee)

Updated

4 years ago
Assignee: nobody → jmathies
(Assignee)

Updated

4 years ago
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)]
(Assignee)

Comment 1

4 years ago
#3 top crasher overall, #2 top content crasher, plugin tear down related -> this should block m5.
tracking-e10s: ? → m5+
(Assignee)

Updated

4 years ago
Keywords: crash, topcrash
(Assignee)

Comment 5

4 years ago
Created attachment 8565557 [details] [diff] [review]
patch (checked in)

Similar to the patch in bug 1127378 that was backed out while hopefully avoiding any additional regressions.
Attachment #8565143 - Attachment is obsolete: true
Attachment #8565557 - Flags: review?(aklotz)
Comment on attachment 8565557 [details] [diff] [review]
patch (checked in)

Review of attachment 8565557 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginWidgetChild.h
@@ +5,5 @@
>  #ifndef mozilla_plugins_PluginWidgetChild_h
>  #define mozilla_plugins_PluginWidgetChild_h
>  
>  #include "mozilla/plugins/PPluginWidgetChild.h"
> +#include "mozilla/plugins/PluginWidgetParent.h"

Is this needed here?
Attachment #8565557 - Flags: review?(aklotz) → review+
(Assignee)

Comment 7

4 years ago
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #6)
> Comment on attachment 8565557 [details] [diff] [review]
> patch
> 
> Review of attachment 8565557 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/plugins/ipc/PluginWidgetChild.h
> @@ +5,5 @@
> >  #ifndef mozilla_plugins_PluginWidgetChild_h
> >  #define mozilla_plugins_PluginWidgetChild_h
> >  
> >  #include "mozilla/plugins/PPluginWidgetChild.h"
> > +#include "mozilla/plugins/PluginWidgetParent.h"
> 
> Is this needed here?

Nope, thanks. Moved to the cpp.
https://hg.mozilla.org/mozilla-central/rev/e19b2d82fee0
Status: NEW → RESOLVED
Last Resolved: 4 years ago
status-firefox38: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(Assignee)

Updated

4 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 11

4 years ago
Created attachment 8569904 [details] [diff] [review]
avoid aborts patch

Bill, asking for your review, also wondering if you might have some alternative ideas here. The problem seems to be that PPluginWidget can get torn down from either side of the connection including starting a tear down from both sides simultaneously. There are some comments explaining the two ways PPluginWidget tears down here [1]. We're still running into a situation where the browser ends up with a ParentShutdown message on the child side after the child actor is gone. I'd like to avoid the resulting abort since it's a passive error that can be ignored.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginWidgetParent.cpp#44
Attachment #8569904 - Flags: review?(wmccloskey)
Well, I don't think we should do it this way :-). Dealing with actor deletion, especially when both sides can delete, is a pretty tricky part of IPDL. I'll try explain how I understand it, in theory.

1. Both the parent and the child need to keep a state variable, mDestroyed or something, that says whether they're allowed to send messages or not.
2. When one side (say A) wants to destroy, it sends a Destroy message and sets its own mDestroyed flag.
3. When side B gets the Destroy message, it sends the __delete__ message back to side A to complete the actor's destruction.
4. Every time either side is going to send a message, it checks mDestroyed first and doesn't send if it's set.
5. If either side gets an ActorDestroy message, it sets mDestroyed but doesn't send the Destroy message.

The really tricky part is what happens if we tear down the parent protocol. In your case that happens when the TabParent sends a PBrowser::Destroy message to TabChild. It looks like you already have code to deal with that. All that code needs to do, in TabParent, is set the mDestroyed flag on all the widgets. That way they won't be allowed to send anymore. Once the child receives the PBrowser::Destroy message, it will send a PBrowser::__delete__ message to the parent, which will cause the widget actor to be deleted as well.

It looks like the code we have already sorta implements something like this scheme. I'm hoping you just need to bring it a little closer to what I described. Let me know if you run into any problems or if this doesn't match your understanding.
(Assignee)

Updated

4 years ago
Blocks: 1136375
(Assignee)

Updated

4 years ago
Attachment #8569904 - Attachment is obsolete: true
(Assignee)

Comment 13

4 years ago
(In reply to Bill McCloskey (:billm) from comment #12)
> Well, I don't think we should do it this way :-). Dealing with actor
> deletion, especially when both sides can delete, is a pretty tricky part of
> IPDL. I'll try explain how I understand it, in theory.
> 
> 1. Both the parent and the child need to keep a state variable, mDestroyed
> or something, that says whether they're allowed to send messages or not.
> 2. When one side (say A) wants to destroy, it sends a Destroy message and
> sets its own mDestroyed flag.
> 3. When side B gets the Destroy message, it sends the __delete__ message
> back to side A to complete the actor's destruction.
> 4. Every time either side is going to send a message, it checks mDestroyed
> first and doesn't send if it's set.
> 5. If either side gets an ActorDestroy message, it sets mDestroyed but
> doesn't send the Destroy message.
> 
> The really tricky part is what happens if we tear down the parent protocol.
> In your case that happens when the TabParent sends a PBrowser::Destroy
> message to TabChild. It looks like you already have code to deal with that.
> All that code needs to do, in TabParent, is set the mDestroyed flag on all
> the widgets. That way they won't be allowed to send anymore. Once the child
> receives the PBrowser::Destroy message, it will send a PBrowser::__delete__
> message to the parent, which will cause the widget actor to be deleted as
> well.
> 
> It looks like the code we have already sorta implements something like this
> scheme. I'm hoping you just need to bring it a little closer to what I
> described. Let me know if you run into any problems or if this doesn't match
> your understanding.

Thanks for the details response. I think I have most of this in place, although mDestroyed is mWidget in this case. Hopefully through bug 1136375 I can reproduce this on try so I can test a final fix.
(Assignee)

Comment 14

4 years ago
Created attachment 8572664 [details] [diff] [review]
harden parent side patch
(Assignee)

Updated

4 years ago
Attachment #8565557 - Attachment description: patch → patch (checked in)
(Assignee)

Comment 15

4 years ago
Currently trying to reproduce this using pgo builds on try. Not having much luck though.

mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6

w/patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
(Assignee)

Comment 16

4 years ago
(In reply to Jim Mathies [:jimm] from comment #15)
> Currently trying to reproduce this using pgo builds on try. Not having much
> luck though.
> 
> mc tip:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6

got one here, which confirms we can reproduce on try. 

https://treeherder.mozilla.org/logviewer.html#?job_id=5417017&repo=try
(Assignee)

Comment 17

4 years ago
Created attachment 8574277 [details] [diff] [review]
patch

This is looking pretty stable on try, so lets see if it helps.

patch w/retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f80d8078c82
patch with logging and retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6
Attachment #8572664 - Attachment is obsolete: true
Attachment #8574277 - Flags: review?(aklotz)
Attachment #8574277 - Flags: review?(aklotz) → review+
(Assignee)

Comment 20

4 years ago
This patch tweaked something on Linux and introduced a shutdown crash. Throwing stuff at try to try and figure out what happened.
(Assignee)

Comment 21

4 years ago
Created attachment 8575229 [details] [diff] [review]
patch (checked in)

A change in the order of calls in PluginWidgetParent::KillWidget() seems to have solved this.
Attachment #8574277 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/0dbdf8b62e2f
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
(Assignee)

Comment 24

4 years ago
Unfortunately I'm going to have to reopen this again since we still do not have a good fix here.

Child protocol error abort messages:
------------------------------------
293	'(msgtype=0x8A0009,name=???) Route error: message sent to unknown actor ID'


http://www.mathies.com/mozilla/bug1116884.txt
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 25

4 years ago
This signature is by far the #1 on Nightly right now with 15% of the total submitted crashes.

Is bug 1130734 a dupe of this one?
(Assignee)

Comment 26

4 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> This signature is by far the #1 on Nightly right now with 15% of the total
> submitted crashes.
> 
> Is bug 1130734 a dupe of this one?

bug 1130734 is a meta bug that currently tracks the main signature for these aborts. These can be triggered on any sub-protocol, it just happens that currently the only abort we have in large numbers related to a plugin shutdown msg.
(Assignee)

Updated

4 years ago
Attachment #8575229 - Attachment description: patch → patch (checked in)
(Assignee)

Comment 27

4 years ago
Created attachment 8588099 [details] [diff] [review]
wip
Attachment #8565557 - Attachment is obsolete: true
Attachment #8575229 - Attachment is obsolete: true
(Assignee)

Comment 28

4 years ago
(In reply to Jim Mathies [:jimm] from comment #27)
> Created attachment 8588099 [details] [diff] [review]
> wip

Unfortunately this just swaps the shutdown message crash for the async window update crash. But at least things are bit simpler.
(Assignee)

Comment 29

4 years ago
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetChild::ProxyShutdown()
PluginWidgetChild::ActorDestroy(1)
PluginWidgetChild::KillWidget()
PluginWidgetChild::~PluginWidgetChild()
PluginWidgetParent::ActorDestroy(1)
PluginWidgetParent::KillWidget() widget=0BD35800
###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0x980008,name=???) Route error: message sent to unknown actor ID
PluginWidgetParent::~PluginWidgetParent()
(Assignee)

Comment 30

4 years ago
What I might be able to do here is make that __delete__ message sent to the parent sync. ipc should process any async update messages it receives while waiting on the response to __delete__. Once the child gets that response back, the child shouldn't receive any more message from the parent.
(Assignee)

Comment 31

4 years ago
(In reply to Jim Mathies [:jimm] from comment #30)
> What I might be able to do here is make that __delete__ message sent to the
> parent sync. ipc should process any async update messages it receives while
> waiting on the response to __delete__. Once the child gets that response
> back, the child shouldn't receive any more message from the parent.

ipdl pre-processing barfs on this. :/
(Assignee)

Comment 32

4 years ago
Ah, I think I have a fix - lets move that last remaining async UpdateWindow api on the child side to a different protocol. The abort on late delivery problem no longer exists.
(Assignee)

Comment 33

4 years ago
Created attachment 8588205 [details] [diff] [review]
wip

This seems to work pretty well.
Attachment #8588099 - Attachment is obsolete: true
(Assignee)

Comment 34

4 years ago
Created attachment 8588221 [details] [diff] [review]
patch v.1

per billm's suggestion, I've added parent protocol state checks around all the send calls.
(Assignee)

Updated

4 years ago
Attachment #8588205 - Attachment is obsolete: true
(Assignee)

Comment 35

4 years ago
Comment on attachment 8588221 [details] [diff] [review]
patch v.1

Review of attachment 8588221 [details] [diff] [review]:
-----------------------------------------------------------------

try push looks good:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6ec1f40e3cf
Attachment #8588221 - Flags: review?(aklotz)
(Assignee)

Comment 36

4 years ago
Created attachment 8588337 [details] [diff] [review]
patch v.2

Following another suggestion from billm, skip the async Send__delete__() call on the parent side, which works - both sides continue to tear down correctly.  This effectively removes all async apis on the child side. *awesome*
Attachment #8588221 - Attachment is obsolete: true
Attachment #8588221 - Flags: review?(aklotz)
Attachment #8588337 - Flags: review?(aklotz)
Comment on attachment 8588337 [details] [diff] [review]
patch v.2

Review of attachment 8588337 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly nits re: alphabetical ordering for #includes (I like to ensure we're not making the situation worse than it is, at least). There is one nullptr check that needs to be fixed though.

::: dom/ipc/ContentChild.cpp
@@ +49,5 @@
>  #include "mozilla/layers/PCompositorChild.h"
>  #include "mozilla/layers/SharedBufferManagerChild.h"
>  #include "mozilla/net/NeckoChild.h"
>  #include "mozilla/plugins/PluginModuleParent.h"
> +#include "mozilla/plugins/PluginInstanceParent.h"

Nit: move up a line to preserve alphabetical ordering

::: dom/plugins/ipc/PluginWidgetChild.cpp
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "mozilla/plugins/PluginWidgetChild.h"
>  #include "mozilla/plugins/PluginWidgetParent.h"
> +#include "mozilla/dom/TabChild.h"

Nit: add blank line under PluginWidgetChild.h include and then move this include above PluginWidgetParent.

@@ +8,2 @@
>  #include "PluginWidgetProxy.h"
> +#include "mozilla/unused.h"

Nit: move down a line

::: dom/plugins/ipc/PluginWidgetParent.cpp
@@ +6,5 @@
>  #include "mozilla/dom/TabParent.h"
>  #include "mozilla/dom/ContentParent.h"
>  #include "nsComponentManagerUtils.h"
>  #include "nsWidgetsCID.h"
> +#include "mozilla/unused.h"

Nit: move down a line

::: widget/PluginWidgetProxy.cpp
@@ +118,5 @@
>  
>  void*
>  PluginWidgetProxy::GetNativeData(uint32_t aDataType)
>  {
> +  auto tab = static_cast<mozilla::dom::TabChild*>(mActor->Manager());

You're using mActor but it is not checked for nullptr until the following line. Please fix!

::: widget/windows/nsWindowGfx.cpp
@@ +18,5 @@
>   **************************************************************/
>  
>  #include "mozilla/plugins/PluginInstanceParent.h"
>  using mozilla::plugins::PluginInstanceParent;
> +#include "mozilla/dom/ContentParent.h"

Nit: please move this 2 lines up to make this alphabetical. At least that way we won't worsen the include ordering situation in this file. :-P
Attachment #8588337 - Flags: review?(aklotz) → review+
(Assignee)

Comment 38

4 years ago
Created attachment 8589089 [details] [diff] [review]
patch (r=aklotz)

Ok, updated per comments, lets hope for good results.
Attachment #8588337 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
Do we need to backport anything to 38/39 or should we call them wontfix? This bug has become a PITA for tracking purposes because of how things landed :\
status-firefox38: fixed → affected
status-firefox39: fixed → affected
status-firefox40: --- → affected
Flags: needinfo?(jmathies)
Target Milestone: mozilla38 → ---
(Assignee)

Comment 41

4 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #39)
> Do we need to backport anything to 38/39 or should we call them wontfix?
> This bug has become a PITA for tracking purposes because of how things
> landed :\

This shouldn't show up in any build other than nightly. AFAICT, that's what we're seeing. There are crashes on crashstats for 39 and 38, but they all appear to be nightly channel builds. I don't think you need to track this at all.


nightly channel 38 crashes:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports

non-nightly channel 38a1 crashes: 0

https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&release_channel=!nightly&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Flags: needinfo?(jmathies) → needinfo?(ryanvm)
status-firefox38: affected → wontfix
status-firefox39: affected → wontfix
Flags: needinfo?(ryanvm)
(Assignee)

Updated

4 years ago
Blocks: 1130838
https://hg.mozilla.org/mozilla-central/rev/24bb85e9dced
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
status-firefox40: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
(Assignee)

Updated

4 years ago
No longer blocks: 1130838
Duplicate of this bug: 1130838
(Assignee)

Comment 44

4 years ago
Yesterday's nightly build:

Bug 1130734 Child protocol error abort messages:
----------------------------------------------------
1       '(msgtype=0x0,name=???) Route error: message sent to unknown actor ID'
1       '(msgtype=0x52000D,name=???) Route error: message sent to unknown actor ID'

woot!
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.