Closed Bug 1266286 Opened 8 years ago Closed 8 years ago

Ensure GMP crash reports are being reported for EME GMPs with e10s enabled

Categories

(Core :: Audio/Video: GMP, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla49
Tracking Status
e10s m9+ ---
firefox47 --- fixed
firefox48 --- fixed
firefox49 --- verified

People

(Reporter: cpearce, Assigned: u480271)

References

Details

Attachments

(1 file, 3 obsolete files)

We discovered via FHR telemetry in Bug 1265815 that we were getting millions of GMP crashes, yet we have a much smaller number of crashes reported in crash-stats:

https://crash-stats.mozilla.com/search/?gmp_plugin=!__null__&date=%3E%3D2016-01-01&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

You can cause a GMP to crash by creating or toggling the pref media.gmp.plugin.crash=true.

Submitting crash reports is opt-in, but we need to make sure users are being prompted to submit them. I am prompted when gmp-clearkey is crashed in beta, but not when widevine is crashed in my Nightly build.

We need to ensure that we're getting crash reports for the unencrypted decoding case. You can force unencrypted decoding by disabling Windows Media Foundation by setting the pref media.wmf.enabled = false.

I think the problem may be that we're not setting a CrashedEventTarget on the unencrypted decode path. Basically, we set-up the link to the window to dispatch the crahsed event to in the EME case here:
http://mxr.mozilla.org/mozilla-central/source/dom/media/eme/MediaKeys.cpp#405

but I can't see where we're doing this in the unencrypted media playback decode path.
I also note I'm getting the crash report UI if e10s is disabled, but not if it is enabled.
Summary: Ensure GMP crash reports are being reported for unencrypted GMP decoding → Ensure GMP crash reports are being reported for EME with e10s enabled or for unencrypted GMP decoding
See also bug 1146955 where GMP crash reports were made to work with e10s on. Must have been regressed somewhere along the way.
See Also: → 1146955
And see also bug 1135541 for the non WebRTC GMP crash reporting.
See Also: → 1135541
Hey Mike, you worked on bug 1146955, any insight you can offer here?
Flags: needinfo?(mconley)
Hey cpearce, do you have some steps-to-reproduce for me? I used https://mozilla.github.io/webrtc-landing/pc_test.html with H.264 video to get a GMP running, and toggling media.gmp.plugin.crash caused the crash reporter UI to be displayed with e10s enabled.
Flags: needinfo?(mconley) → needinfo?(cpearce)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #5)
> Hey cpearce, do you have some steps-to-reproduce for me? I used
> https://mozilla.github.io/webrtc-landing/pc_test.html with H.264 video to
> get a GMP running, and toggling media.gmp.plugin.crash caused the crash
> reporter UI to be displayed with e10s enabled.

Note that OpenH264/WebRTC and EME use different mechanisms to figure out which windows to fire the PluginCrashedEvent at. So you can't repro this failure with WebRTC/OpenH264.

You can repro using these STR:

1. Ensure e10s is enabled.
2. Open http://people.mozilla.org/~cpearce/mse-clearkey/
3. Wait for playback to start.
4. Toggle media.gmp.plugin.crash.
5. Observe playback fails, but no crash report UI pops up.

The EME/GMP crash reporting code works by calling GMPService::AddPluginCrashedEventTarget() to add a crash handler in MediaKeySystemAccess::NotifyObservers():
http://mxr.mozilla.org/mozilla-central/source/dom/media/eme/MediaKeys.cpp#405

This sets up the window to which the PluginCrashedEvent is fired at.

The handler is called from the GMPParent's crash handler here:
http://mxr.mozilla.org/mozilla-central/source/dom/media/gmp/GMPParent.cpp#652

I'm guessing that the window we're firing the PluginCrashedEvent is not the correct one in the e10s case.

For the unencrypted decoding via GMP case, we're not yet calling GMPService::AddPluginCrashedEventTarget(). Once we do that, we'll have the same problem in the e10s case.
Flags: needinfo?(cpearce)
(In reply to Chris Pearce (:cpearce) from comment #6)
> (In reply to Mike Conley (:mconley) - Needinfo me! from comment #5)
> > Hey cpearce, do you have some steps-to-reproduce for me? I used
> > https://mozilla.github.io/webrtc-landing/pc_test.html with H.264 video to
> > get a GMP running, and toggling media.gmp.plugin.crash caused the crash
> > reporter UI to be displayed with e10s enabled.
> 
> Note that OpenH264/WebRTC and EME use different mechanisms to figure out
> which windows to fire the PluginCrashedEvent at. So you can't repro this
> failure with WebRTC/OpenH264.
> 
> You can repro using these STR:
> 
> 1. Ensure e10s is enabled.
> 2. Open http://people.mozilla.org/~cpearce/mse-clearkey/
> 3. Wait for playback to start.
> 4. Toggle media.gmp.plugin.crash.
> 5. Observe playback fails, but no crash report UI pops up.

Great, thanks - I can successfully reproduce with these steps.

> I'm guessing that the window we're firing the PluginCrashedEvent is not the
> correct one in the e10s case.
> 

That's a good guess. Setting a breakpoint in onPluginCrashed in PluginContent.jsm, I do not see the PluginCrashedEvent being handled in the e10s case.

Looks like this is where it's supposed to be fired: http://mxr.mozilla.org/mozilla-central/source/dom/media/gmp/GMPService.cpp#210

Assuming we're hitting that code, I think your conclusion is right - we must be firing it on the wrong window.
This is an important bug for Widevine. Amazon is testing Widevine with Firefox 47+ users, but we don't have much insight without Widevine CDM crash reports.
Maire, this appears to be something specific to the widevine decoder / gmp plugin and doesn't appear to be e10s generic. Do you have anyone who can look at this?
Flags: needinfo?(mreavy)
This affects GMPs that the playback team own. We'll handle it.
Flags: needinfo?(mreavy)
Or more specifically, this only affects how the playback teams GMPs' call site is doing crash reporting. The WebRTC GMP's call site's crash reporting code is working with e10s.
Crash reporting for GMPs being used from the EME call site are not generating
crash reports because they depend on the MediaKeys object calling
GMPService::AddPluginCrashHandler() to associate a window to which the
PluginCrashedEvent is fired. This doesn't work with e10s enabled because the
GMPParent which causes the plugin crash handlers to run is in the chrome
process, but the MediaKeys which adds the handler is in the child process. So
the crash handler is on the GMPServiceChild, but we only run the crash handlers
that were added to the GMPServiceParent in the chrome/parent process.

The solution is to broadcast a message from the chrome process to all the
content processes when a GMP has crashed that causes the GMPServiceChild to
also run its crash handlers.


MozReview-Commit-ID: 8Lek16G9ZGb
Attachment #8745851 - Flags: review?(mconley)
Assignee: dglastonbury → cpearce
Status: NEW → ASSIGNED
Let's split out the task of connecting unencrypted video decoding via GMPs from playback code up to crash reporting to a separate bug. Dan will take that, and address any review comments on this patch, and get it landed.
Summary: Ensure GMP crash reports are being reported for EME with e10s enabled or for unencrypted GMP decoding → Ensure GMP crash reports are being reported for EME GMPs with e10s enabled
Comment on attachment 8745851 [details] [diff] [review]
Ensure crash reports work for GMP used by EME code

Review of attachment 8745851 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/content.js
@@ +1447,5 @@
>  
>  addEventListener("MozApplicationManifest", OfflineApps, false);
>  addMessageListener("OfflineApps:StartFetching", OfflineApps);
> +
> +if (Cc["@mozilla.org/childprocessmessagemanager;1"]) {

This probably makes more sense in toolkit/content/browser-child.js, which is only ever loaded in remote browsers. You won't need the CPMM check then.

Also, what about the gmp-plugin-crash handler that's been defined in PeerConnection.js at https://hg.mozilla.org/mozilla-central/file/10f66b316457/dom/media/PeerConnection.js#l57 ? Is this superfluous now, or do we still need it?

::: dom/media/gmp/mozIGeckoMediaPluginService.idl
@@ +61,5 @@
>     */
>    readonly attribute nsIThread thread;
>  
> +  void RunPluginCrashCallbacks(in unsigned long pluginId, in ACString pluginName);
> +  

Whitespace. Would be good to document this method too.
Attachment #8745851 - Flags: review?(mconley) → review-
Crash reporting for GMPs being used from the EME call site are not generating
crash reports because they depend on the MediaKeys object calling
GMPService::AddPluginCrashHandler() to associate a window to which the
PluginCrashedEvent is fired. This doesn't work with e10s enabled because the
GMPParent which causes the plugin crash handlers to run is in the chrome
process, but the MediaKeys which adds the handler is in the child process. So
the crash handler is on the GMPServiceChild, but we only run the crash handlers
that were added to the GMPServiceParent in the chrome/parent process.

The solution is to broadcast a message from the chrome process to all the
content processes when a GMP has crashed that causes the GMPServiceChild to
also run its crash handlers.

MozReview-Commit-ID: 8Lek16G9ZGb
Attachment #8748502 - Flags: review?(mconley)
Attachment #8745851 - Attachment is obsolete: true
Comment on attachment 8748502 [details] [diff] [review]
Ensure crash reports work for GMP used by EME code. r?mconley

Review of attachment 8748502 [details] [diff] [review]:
-----------------------------------------------------------------

Hey - I made a mistake with my earlier review, sorry about that.

::: toolkit/content/browser-child.js
@@ +644,5 @@
> +// Handle invoking crash reporter for crashing gecko media plugins.
> +let mm = Cc["@mozilla.org/childprocessmessagemanager;1"]
> +           .getService(Ci.nsIMessageListenerManager);
> +
> +mm.addMessageListener("gmp-plugin-crash", msg => {

I have to backpedal here - I've just realized that browser-child.js is actually _not_ the place where we want to put this. This framescript is loaded for every tab, which means that every tab that is open is going to be alerted about the gmp-plugin-crash, and then every tab will then call RunPluginCrashCallbacks, which is probably not what we want.

What we want is a singleton in the content process to hear this message and respond to it, as far as I can tell.

The right place for this, I believe, is toolkit/content/process-content.js, which is loaded just once per content process.

Sorry for the churn here.
Attachment #8748502 - Flags: review?(mconley) → review-
Crash reporting for GMPs being used from the EME call site are not generating
crash reports because they depend on the MediaKeys object calling
GMPService::AddPluginCrashHandler() to associate a window to which the
PluginCrashedEvent is fired. This doesn't work with e10s enabled because the
GMPParent which causes the plugin crash handlers to run is in the chrome
process, but the MediaKeys which adds the handler is in the child process. So
the crash handler is on the GMPServiceChild, but we only run the crash handlers
that were added to the GMPServiceParent in the chrome/parent process.

The solution is to broadcast a message from the chrome process to all the
content processes when a GMP has crashed that causes the GMPServiceChild to
also run its crash handlers.

MozReview-Commit-ID: 8Lek16G9ZGb
Attachment #8748980 - Flags: review?(mconley)
Attachment #8748502 - Attachment is obsolete: true
Mike, if this is r+ can you please mark with "need checkin". thanks.
Flags: needinfo?(mconley)
Crash reporting for GMPs being used from the EME call site are not generating
crash reports because they depend on the MediaKeys object calling
GMPService::AddPluginCrashHandler() to associate a window to which the
PluginCrashedEvent is fired. This doesn't work with e10s enabled because the
GMPParent which causes the plugin crash handlers to run is in the chrome
process, but the MediaKeys which adds the handler is in the child process. So
the crash handler is on the GMPServiceChild, but we only run the crash handlers
that were added to the GMPServiceParent in the chrome/parent process.

The solution is to broadcast a message from the chrome process to all the
content processes when a GMP has crashed that causes the GMPServiceChild to
also run its crash handlers.

MozReview-Commit-ID: 8Lek16G9ZGb
Assignee: cpearce → mconley
Assignee: mconley → dglastonbury
Flags: needinfo?(mconley)
Attachment #8748980 - Attachment is obsolete: true
Attachment #8748980 - Flags: review?(mconley)
Comment on attachment 8749305 [details] [diff] [review]
Ensure crash reports work for GMP used by EME code

I had one last suggestion for kamidphish (to use Services.cpmm and drop the message name check which is redundant), but I took the liberty of just doing that myself and skipping another review cycle.

I'll land this.
Attachment #8749305 - Flags: review+
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #21)
> Comment on attachment 8749305 [details] [diff] [review]
> [...] drop the message name check which is redundant)

I spoke to :cpearce about that and the redundant check may have been copied from another example.

Thanks for landing.
https://hg.mozilla.org/mozilla-central/rev/051ea6ad172d
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
cpearce, should we uplift this crash reporter fix to Aurora 48 and Beta 47? e10s is enabled by default in Aurora and for a small number (~20%) of Beta 47 users.
Flags: needinfo?(cpearce)
Comment on attachment 8749305 [details] [diff] [review]
Ensure crash reports work for GMP used by EME code

Approval Request Comment
[Feature/regressing bug #]: EME, including Widevine EME and Adobe EME/Netflix
[User impact if declined]: Without this patch, we won't get GMP crash reports when e10s is enabled. So since the e10s team is running e10s experiments in beta, and we're trying to ship Widevine EME in 47, we're flying blind as to whether GMPs are crashing in the field or not.
[Describe test coverage new/current, TreeHerder]: We don't have coverage for EME GMP crash reporting, especially with e10s enabled. We're going to fix that!
[Risks and why]: Low; just passing a message between processes
[String/UUID change made/needed]: Non.
Attachment #8749305 - Flags: approval-mozilla-beta?
Attachment #8749305 - Flags: approval-mozilla-aurora?
Comment on attachment 8749305 [details] [diff] [review]
Ensure crash reports work for GMP used by EME code

GMP crashes in e10s mode are not getting reported, we need accurate stability data, Aurora48+, Beta47+
Attachment #8749305 - Flags: approval-mozilla-beta?
Attachment #8749305 - Flags: approval-mozilla-beta+
Attachment #8749305 - Flags: approval-mozilla-aurora?
Attachment #8749305 - Flags: approval-mozilla-aurora+
Hi Chris, in cases like this (no automated test coverage), before requesting uplift to Aurora/Beta, it would be good if you could manually test and verify that GMP crash submission works in e10s mode in Nightly. Manual testing is a good substitute in such cases. Also this maybe the only week the e10s experiment continues and therefore getting the timing right on these uplifts (without needing additional uplifts) is key for Beta47 cycle! Thanks.
Verified fixed in Nightly.
Status: RESOLVED → VERIFIED
(In reply to Chris Pearce (:cpearce) from comment #30)
> Verified fixed in Nightly.

Awesome! Thanks.
(In reply to Chris Pearce (:cpearce) from comment #33)
> https://hg.mozilla.org/releases/mozilla-beta/rev/7ce8790265c6

This changed dom/media/gmp/mozIGeckoMediaPluginService.idl which changed a UUID. I gave myself binary approval to change this and land it.
Flags: needinfo?(cpearce)
Hi Jorge, fyi for the UUID change mentioned in comment 34 by Chris. He said there shouldn't be any add-ons that use this UUID (that we know of). Please let us know if that is not the case.
Flags: needinfo?(jorge)
Yes, should be fine.
Flags: needinfo?(jorge)
You need to log in before you can comment on or make changes to this bug.