[EME] Async shutdown hang in mozilla::gmp::GeckoMediaPluginService::Observe(...) - Add crash report annotations to track GMPParent shutdown

RESOLVED FIXED in Firefox 40

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: gerald, Assigned: gerald)

Tracking

(Blocks 1 bug)

unspecified
mozilla41
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed, firefox41 fixed, firefox42 fixed)

Details

(crash signature)

Attachments

(1 attachment)

Another follow-up to bug 1173634:

Shutdown-hang crash reports show that we were waiting for plugin to asynchronously shutdown, but the expected timeout in GMPParent didn't happen!

A closer look at the code shows that when GMPService handles profile-change-teardown, for each plugin it calls GMPParent::CloseActive, which does a SendCloseActive; Later each GMPParent should receive 1+ RecvPGMPContentChildDestroyed and eventually should call CloseIfUnused, and only then would start a timer for the async shutdown...
So if the blockage happened during the GMPParent::CloseActive portion, this would explain why the GMPParent async-shutdown timer didn't fire in reports: It was never started!

To test this theory, we should add crash report annotations showing the current parent-side state of each plugin being shutdown.
Later we can take other measures (like adding another timer in GMPParent; and/or a timer in GMPService, see bug 1175765.)
Added crash report annotations to track GMPParent shutdown steps, so in case of shutdown hangs waiting for GMP async shutdown, we can see where some plugins are stuck.
Attachment #8625383 - Flags: review?(cpearce)
Attachment #8625383 - Flags: review?(cpearce) → review+
https://hg.mozilla.org/mozilla-central/rev/ec338746b491
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Please consider uplifting to 40.
Flags: needinfo?(cpearce)
(In reply to Gerald Squelart [:gerald] from comment #5)
> Please consider uplifting to 40.

Wrong number, this patch is already in 41 (now Aurora).
Please uplift to 42 (Beta).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=164bf7c61327
I thought I made a mistake, but I was wrong.

So: Patch is in 41 (Aurora), should be uplifted to 40 (Beta).
[Tracking Requested - why for this release]: We should get this patch uplifted ASAP, so we can debug the shutdown hangs ASAP.
Comment on attachment 8625383 [details] [diff] [review]
1175783-gmpparent-shutdown-annotations.patch

Approval Request Comment
[Feature/regressing bug #]: EME

[User impact if declined]: We're seeing crash reports which are shutdown hangs on Beta40 waiting for either EME plugins or OpenH264 to shutdown. This patch adds further annotations, so that we can debug and fix these shutdown hangs.

Denying uplift on this patch will make it take longer for us to fix the shutdown hangs.

[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests

[Risks and why]: Pretty low; this just adds crash reporter annotations.

[String/UUID change made/needed]: None.
Attachment #8625383 - Flags: approval-mozilla-beta?
Comment on attachment 8625383 [details] [diff] [review]
1175783-gmpparent-shutdown-annotations.patch

Taking it to help debugging.
Attachment #8625383 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
In general I recommend using fixed (not printf) annotation keys. When every process writes a different key due to `this` pointers, you lose the ability to search on that key or otherwise look at the data in aggregate.
(In reply to David Major [:dmajor] from comment #12)
> In general I recommend using fixed (not printf) annotation keys. When every
> process writes a different key due to `this` pointers, you lose the ability
> to search on that key or otherwise look at the data in aggregate.

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #13)
> Backed out at dmajor's request.
> https://hg.mozilla.org/releases/mozilla-beta/rev/c332d0157b9f

Sorry for the backout. I'm worried that Socorro won't be happy with the explosion in keys (in particular I'm thinking of things like the "non-searchable keys" list on the admin page). I see that the change has been around for a while but I'm worried about greater exposure from the beta channel.

Could I convince you to write these in an alternative form, perhaps something like:
AnnotateCrashReport("AsyncPluginShutdown-name", ...);
AnnotateCrashReport("AsyncPluginShutdown-this", ...);
AnnotateCrashReport("AsyncPluginShutdown-message", ...);
Flags: needinfo?(gsquelart)
Blocks: 1183433
(In reply to David Major [:dmajor] from comment #14)
> (In reply to David Major [:dmajor] from comment #12)
> > In general I recommend using fixed (not printf) annotation keys. When every
> > process writes a different key due to `this` pointers, you lose the ability
> > to search on that key or otherwise look at the data in aggregate.
> 
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #13)
> > Backed out at dmajor's request.
> > https://hg.mozilla.org/releases/mozilla-beta/rev/c332d0157b9f
> 
> Sorry for the backout. I'm worried that Socorro won't be happy with the
> explosion in keys (in particular I'm thinking of things like the
> "non-searchable keys" list on the admin page). I see that the change has
> been around for a while but I'm worried about greater exposure from the beta
> channel.

Thanks for finding this potential issue, makes sense.

> Could I convince you to write these in an alternative form, perhaps
> something like:
> AnnotateCrashReport("AsyncPluginShutdown-name", ...);
> AnnotateCrashReport("AsyncPluginShutdown-this", ...);
> AnnotateCrashReport("AsyncPluginShutdown-message", ...);

The problem is that multiple plugins could each be used multiple times, and I was hoping to track the state of each use when a shutdown-hang happens.

Spawned bug 1183433 to work on an alternative.
Flags: needinfo?(gsquelart)
Flags: needinfo?(cpearce)
See Also: → 1183971
See Also: → 1183972
Won't uplift to beta/40 for now, waiting for bug 1183433 to get to aurora/41 first.
Please uplift the following patches together to beta/40: bug 1175783, bug 1175765, bug 1183433, bug 1185392.

BETA try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b70f2bca3b28

(Note: This patch here requires a slight rebase, should be easy to do manually or you may use the patch from the try run)
Flags: needinfo?(cpearce)
Comment on attachment 8625383 [details] [diff] [review]
1175783-gmpparent-shutdown-annotations.patch

Approval Request Comment

[Feature/regressing bug #]: EME

[User impact if declined]: We're seeing crash reports which are shutdown hangs on Beta40 waiting for either EME plugins or OpenH264 to shutdown. This patch adds further annotations, so that we can debug and fix these shutdown hangs.
Denying uplift on this patch (and its follow-up bug 1183433 and bug 1185392) will make it take longer for us to fix the shutdown hangs.

[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests.

[Risks and why]: Pretty low; this just adds crash reporter annotations.

[String/UUID change made/needed]: None.
Attachment #8625383 - Flags: approval-mozilla-beta+ → approval-mozilla-beta?
Comment on attachment 8625383 [details] [diff] [review]
1175783-gmpparent-shutdown-annotations.patch

I asked rillian and jesup to review the series of patches for EME as well because we're late in the 40 cycle. Their suggestion is to take the changes as they put us in a better state that we were in previously and because waiting an additional ~2.5 weeks for 41 to hit Beta will be detrimental. I'm approving for Beta. We need to watch the beta8 build to ensure to the best of our ability that we have not regressed anything related to WebRTC or EME. Beta+
Attachment #8625383 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(cpearce)
You need to log in before you can comment on or make changes to this bug.