Closed Bug 1132874 Opened 9 years ago Closed 9 years ago

Content process aborts on PPluginWidget::Msg_ParentShutdown

Categories

(Core Graveyard :: Plug-ins, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(e10sm5+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed)

VERIFIED FIXED
mozilla40
Tracking Status
e10s m5+ ---
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed

People

(Reporter: jimm, Assigned: jimm)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(1 file, 10 obsolete files)

Waiting to see where this settles in the top crash list.

http://www.mathies.com/mozilla/bug1116884.txt

requesting: mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)
 product: Firefox
 rev: 38.0a1
 process: content
 build date range: >=20150211000000 && <=20150213900000
processing 100 reports.

Child protocol abort error messages:
-------------------------------------
1 (msgtype=0x48000D,name=???) Route error: message sent to unknown actor ID
97 (msgtype=0x840009,name=???) Route error: message sent to unknown actor ID

0x84 = PPluginWidget
0x0009 = Msg_ParentShutdown
Assignee: nobody → jmathies
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*)]
#3 top crasher overall, #2 top content crasher, plugin tear down related -> this should block m5.
Keywords: crash, topcrash
Attached patch patch (obsolete) — Splinter Review
Attached patch patch (checked in) (obsolete) — Splinter Review
Similar to the patch in bug 1127378 that was backed out while hopefully avoiding any additional regressions.
Attachment #8565143 - Attachment is obsolete: true
Attachment #8565557 - Flags: review?(aklotz)
Comment on attachment 8565557 [details] [diff] [review]
patch (checked in)

Review of attachment 8565557 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginWidgetChild.h
@@ +5,5 @@
>  #ifndef mozilla_plugins_PluginWidgetChild_h
>  #define mozilla_plugins_PluginWidgetChild_h
>  
>  #include "mozilla/plugins/PPluginWidgetChild.h"
> +#include "mozilla/plugins/PluginWidgetParent.h"

Is this needed here?
Attachment #8565557 - Flags: review?(aklotz) → review+
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #6)
> Comment on attachment 8565557 [details] [diff] [review]
> patch
> 
> Review of attachment 8565557 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/plugins/ipc/PluginWidgetChild.h
> @@ +5,5 @@
> >  #ifndef mozilla_plugins_PluginWidgetChild_h
> >  #define mozilla_plugins_PluginWidgetChild_h
> >  
> >  #include "mozilla/plugins/PPluginWidgetChild.h"
> > +#include "mozilla/plugins/PluginWidgetParent.h"
> 
> Is this needed here?

Nope, thanks. Moved to the cpp.
https://hg.mozilla.org/mozilla-central/rev/e19b2d82fee0
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch avoid aborts patch (obsolete) — Splinter Review
Bill, asking for your review, also wondering if you might have some alternative ideas here. The problem seems to be that PPluginWidget can get torn down from either side of the connection including starting a tear down from both sides simultaneously. There are some comments explaining the two ways PPluginWidget tears down here [1]. We're still running into a situation where the browser ends up with a ParentShutdown message on the child side after the child actor is gone. I'd like to avoid the resulting abort since it's a passive error that can be ignored.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginWidgetParent.cpp#44
Attachment #8569904 - Flags: review?(wmccloskey)
Well, I don't think we should do it this way :-). Dealing with actor deletion, especially when both sides can delete, is a pretty tricky part of IPDL. I'll try explain how I understand it, in theory.

1. Both the parent and the child need to keep a state variable, mDestroyed or something, that says whether they're allowed to send messages or not.
2. When one side (say A) wants to destroy, it sends a Destroy message and sets its own mDestroyed flag.
3. When side B gets the Destroy message, it sends the __delete__ message back to side A to complete the actor's destruction.
4. Every time either side is going to send a message, it checks mDestroyed first and doesn't send if it's set.
5. If either side gets an ActorDestroy message, it sets mDestroyed but doesn't send the Destroy message.

The really tricky part is what happens if we tear down the parent protocol. In your case that happens when the TabParent sends a PBrowser::Destroy message to TabChild. It looks like you already have code to deal with that. All that code needs to do, in TabParent, is set the mDestroyed flag on all the widgets. That way they won't be allowed to send anymore. Once the child receives the PBrowser::Destroy message, it will send a PBrowser::__delete__ message to the parent, which will cause the widget actor to be deleted as well.

It looks like the code we have already sorta implements something like this scheme. I'm hoping you just need to bring it a little closer to what I described. Let me know if you run into any problems or if this doesn't match your understanding.
Blocks: 1136375
Attachment #8569904 - Attachment is obsolete: true
(In reply to Bill McCloskey (:billm) from comment #12)
> Well, I don't think we should do it this way :-). Dealing with actor
> deletion, especially when both sides can delete, is a pretty tricky part of
> IPDL. I'll try explain how I understand it, in theory.
> 
> 1. Both the parent and the child need to keep a state variable, mDestroyed
> or something, that says whether they're allowed to send messages or not.
> 2. When one side (say A) wants to destroy, it sends a Destroy message and
> sets its own mDestroyed flag.
> 3. When side B gets the Destroy message, it sends the __delete__ message
> back to side A to complete the actor's destruction.
> 4. Every time either side is going to send a message, it checks mDestroyed
> first and doesn't send if it's set.
> 5. If either side gets an ActorDestroy message, it sets mDestroyed but
> doesn't send the Destroy message.
> 
> The really tricky part is what happens if we tear down the parent protocol.
> In your case that happens when the TabParent sends a PBrowser::Destroy
> message to TabChild. It looks like you already have code to deal with that.
> All that code needs to do, in TabParent, is set the mDestroyed flag on all
> the widgets. That way they won't be allowed to send anymore. Once the child
> receives the PBrowser::Destroy message, it will send a PBrowser::__delete__
> message to the parent, which will cause the widget actor to be deleted as
> well.
> 
> It looks like the code we have already sorta implements something like this
> scheme. I'm hoping you just need to bring it a little closer to what I
> described. Let me know if you run into any problems or if this doesn't match
> your understanding.

Thanks for the details response. I think I have most of this in place, although mDestroyed is mWidget in this case. Hopefully through bug 1136375 I can reproduce this on try so I can test a final fix.
Attached patch harden parent side patch (obsolete) — Splinter Review
Attachment #8565557 - Attachment description: patch → patch (checked in)
Currently trying to reproduce this using pgo builds on try. Not having much luck though.

mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6

w/patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
(In reply to Jim Mathies [:jimm] from comment #15)
> Currently trying to reproduce this using pgo builds on try. Not having much
> luck though.
> 
> mc tip:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6

got one here, which confirms we can reproduce on try. 

https://treeherder.mozilla.org/logviewer.html#?job_id=5417017&repo=try
Attached patch patch (obsolete) — Splinter Review
This is looking pretty stable on try, so lets see if it helps.

patch w/retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f80d8078c82
patch with logging and retriggers:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e6c41b44a9d
mc tip:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b56a61c964b6
Attachment #8572664 - Attachment is obsolete: true
Attachment #8574277 - Flags: review?(aklotz)
Attachment #8574277 - Flags: review?(aklotz) → review+
This patch tweaked something on Linux and introduced a shutdown crash. Throwing stuff at try to try and figure out what happened.
Attached patch patch (checked in) (obsolete) — Splinter Review
A change in the order of calls in PluginWidgetParent::KillWidget() seems to have solved this.
Attachment #8574277 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/0dbdf8b62e2f
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Unfortunately I'm going to have to reopen this again since we still do not have a good fix here.

Child protocol error abort messages:
------------------------------------
293	'(msgtype=0x8A0009,name=???) Route error: message sent to unknown actor ID'


http://www.mathies.com/mozilla/bug1116884.txt
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This signature is by far the #1 on Nightly right now with 15% of the total submitted crashes.

Is bug 1130734 a dupe of this one?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> This signature is by far the #1 on Nightly right now with 15% of the total
> submitted crashes.
> 
> Is bug 1130734 a dupe of this one?

bug 1130734 is a meta bug that currently tracks the main signature for these aborts. These can be triggered on any sub-protocol, it just happens that currently the only abort we have in large numbers related to a plugin shutdown msg.
Attachment #8575229 - Attachment description: patch → patch (checked in)
Attached patch wip (obsolete) — Splinter Review
Attachment #8565557 - Attachment is obsolete: true
Attachment #8575229 - Attachment is obsolete: true
(In reply to Jim Mathies [:jimm] from comment #27)
> Created attachment 8588099 [details] [diff] [review]
> wip

Unfortunately this just swaps the shutdown message crash for the async window update crash. But at least things are bit simpler.
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetParent::SendAsyncUpdate()
PluginWidgetChild::ProxyShutdown()
PluginWidgetChild::ActorDestroy(1)
PluginWidgetChild::KillWidget()
PluginWidgetChild::~PluginWidgetChild()
PluginWidgetParent::ActorDestroy(1)
PluginWidgetParent::KillWidget() widget=0BD35800
###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0x980008,name=???) Route error: message sent to unknown actor ID
PluginWidgetParent::~PluginWidgetParent()
What I might be able to do here is make that __delete__ message sent to the parent sync. ipc should process any async update messages it receives while waiting on the response to __delete__. Once the child gets that response back, the child shouldn't receive any more message from the parent.
(In reply to Jim Mathies [:jimm] from comment #30)
> What I might be able to do here is make that __delete__ message sent to the
> parent sync. ipc should process any async update messages it receives while
> waiting on the response to __delete__. Once the child gets that response
> back, the child shouldn't receive any more message from the parent.

ipdl pre-processing barfs on this. :/
Ah, I think I have a fix - lets move that last remaining async UpdateWindow api on the child side to a different protocol. The abort on late delivery problem no longer exists.
Attached patch wip (obsolete) — Splinter Review
This seems to work pretty well.
Attachment #8588099 - Attachment is obsolete: true
Attached patch patch v.1 (obsolete) — Splinter Review
per billm's suggestion, I've added parent protocol state checks around all the send calls.
Attachment #8588205 - Attachment is obsolete: true
Comment on attachment 8588221 [details] [diff] [review]
patch v.1

Review of attachment 8588221 [details] [diff] [review]:
-----------------------------------------------------------------

try push looks good:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6ec1f40e3cf
Attachment #8588221 - Flags: review?(aklotz)
Attached patch patch v.2 (obsolete) — Splinter Review
Following another suggestion from billm, skip the async Send__delete__() call on the parent side, which works - both sides continue to tear down correctly.  This effectively removes all async apis on the child side. *awesome*
Attachment #8588221 - Attachment is obsolete: true
Attachment #8588221 - Flags: review?(aklotz)
Attachment #8588337 - Flags: review?(aklotz)
Comment on attachment 8588337 [details] [diff] [review]
patch v.2

Review of attachment 8588337 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly nits re: alphabetical ordering for #includes (I like to ensure we're not making the situation worse than it is, at least). There is one nullptr check that needs to be fixed though.

::: dom/ipc/ContentChild.cpp
@@ +49,5 @@
>  #include "mozilla/layers/PCompositorChild.h"
>  #include "mozilla/layers/SharedBufferManagerChild.h"
>  #include "mozilla/net/NeckoChild.h"
>  #include "mozilla/plugins/PluginModuleParent.h"
> +#include "mozilla/plugins/PluginInstanceParent.h"

Nit: move up a line to preserve alphabetical ordering

::: dom/plugins/ipc/PluginWidgetChild.cpp
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "mozilla/plugins/PluginWidgetChild.h"
>  #include "mozilla/plugins/PluginWidgetParent.h"
> +#include "mozilla/dom/TabChild.h"

Nit: add blank line under PluginWidgetChild.h include and then move this include above PluginWidgetParent.

@@ +8,2 @@
>  #include "PluginWidgetProxy.h"
> +#include "mozilla/unused.h"

Nit: move down a line

::: dom/plugins/ipc/PluginWidgetParent.cpp
@@ +6,5 @@
>  #include "mozilla/dom/TabParent.h"
>  #include "mozilla/dom/ContentParent.h"
>  #include "nsComponentManagerUtils.h"
>  #include "nsWidgetsCID.h"
> +#include "mozilla/unused.h"

Nit: move down a line

::: widget/PluginWidgetProxy.cpp
@@ +118,5 @@
>  
>  void*
>  PluginWidgetProxy::GetNativeData(uint32_t aDataType)
>  {
> +  auto tab = static_cast<mozilla::dom::TabChild*>(mActor->Manager());

You're using mActor but it is not checked for nullptr until the following line. Please fix!

::: widget/windows/nsWindowGfx.cpp
@@ +18,5 @@
>   **************************************************************/
>  
>  #include "mozilla/plugins/PluginInstanceParent.h"
>  using mozilla::plugins::PluginInstanceParent;
> +#include "mozilla/dom/ContentParent.h"

Nit: please move this 2 lines up to make this alphabetical. At least that way we won't worsen the include ordering situation in this file. :-P
Attachment #8588337 - Flags: review?(aklotz) → review+
Attached patch patch (r=aklotz)Splinter Review
Ok, updated per comments, lets hope for good results.
Attachment #8588337 - Attachment is obsolete: true
Keywords: checkin-needed
Do we need to backport anything to 38/39 or should we call them wontfix? This bug has become a PITA for tracking purposes because of how things landed :\
Flags: needinfo?(jmathies)
Target Milestone: mozilla38 → ---
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #39)
> Do we need to backport anything to 38/39 or should we call them wontfix?
> This bug has become a PITA for tracking purposes because of how things
> landed :\

This shouldn't show up in any build other than nightly. AFAICT, that's what we're seeing. There are crashes on crashstats for 39 and 38, but they all appear to be nightly channel builds. I don't think you need to track this at all.


nightly channel 38 crashes:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports

non-nightly channel 38a1 crashes: 0

https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Dmozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3Adom%3A%3AContentChild%3A%3AProcessingError%28mozilla%3A%3Aipc%3A%3AHasResultCodes%3A%3AResult%2C+char+const*%29&version=38.0a1&release_channel=!nightly&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Flags: needinfo?(jmathies) → needinfo?(ryanvm)
Blocks: 1130838
https://hg.mozilla.org/mozilla-central/rev/24bb85e9dced
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
No longer blocks: 1130838
Yesterday's nightly build:

Bug 1130734 Child protocol error abort messages:
----------------------------------------------------
1       '(msgtype=0x0,name=???) Route error: message sent to unknown actor ID'
1       '(msgtype=0x52000D,name=???) Route error: message sent to unknown actor ID'

woot!
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: