Closed Bug 1209385 Opened 4 years ago Closed 4 years ago

[EME] Crash unresponsive GMPs

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

defect

Tracking

()

RESOLVED WONTFIX
mozilla44
Tracking Status
firefox42 --- fixed
firefox43 --- fixed
firefox44 --- wontfix

People

(Reporter: cpearce, Assigned: cpearce)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Sometimes Adobe's GMP doesn't call the GMPVideoDecoderCallback::ResetComplete() callback when we call it's GMPVideoDecoder. I believe this is caused by a deadlock in their H.264 decoder. Firefox's media stack seems to lock up, and the only way Netflix can be gotten to work is by restarting Firefox.

One solution is to crash the plugin when we detect this. Then at least Netflix can be gotten to work again via a tab refresh.
Adobe have a problem where they sometimes don't call GMPVideoDecoderCallback::ResetComplete() after we call GMPVideoDecoder. This patch crashes when they don't, in the hope that the crash report it generates will help them debug the failure. It also means that users will be able to get Netflix to work again just by refreshing the page.
Attachment #8667110 - Flags: review?(jwwang)
Comment on attachment 8667110 [details] [diff] [review]
Patch v1: Crash GMPs that don't respond to GMPVideoDecoder::Reset()

Review of attachment 8667110 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/gmp/PGMPService.ipdl
@@ +23,5 @@
>                      bool inPrivateBrowsing)
>      returns (nsCString id);
>  
>    async UpdateGMPTrialCreateState(nsString keySystem, uint32_t status);
> +  

space
Attachment #8667110 - Flags: review?(jwwang) → review+
Just in case killing hung plugins makes debugging hard for Adobe...
Attachment #8667199 - Flags: review?(jwwang)
Attachment #8667199 - Flags: review?(jwwang) → review+
https://hg.mozilla.org/mozilla-central/rev/417dab86a413
https://hg.mozilla.org/mozilla-central/rev/6078e8b4878b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Aurora Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d96adc84f158

Beta Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e657c22d317

Rebased patches can be found in these pushes.
Flags: needinfo?(cpearce)
Comment on attachment 8667110 [details] [diff] [review]
Patch v1: Crash GMPs that don't respond to GMPVideoDecoder::Reset()

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: Misbehaving EME plugins can prevent Firefox from being able to play Netflix/EME. This patch detects when plugins are not responding, and terminates them. So without this patch, users who experience this won't be able to get Firefox to resume playing Netflix unless they restart Firefox.
[Describe test coverage new/current, TreeHerder]: Lots of EME mochitests
[Risks and why]: Low; we only terminate the EME plugin child process, and only when it's not working already.
[String/UUID change made/needed]: None.
Attachment #8667110 - Flags: approval-mozilla-beta?
Attachment #8667110 - Flags: approval-mozilla-aurora?
Comment on attachment 8667199 [details] [diff] [review]
Patch 2: Pref to disable killing hungg plugins

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: This patch adds a pref to disable the kill-hung-plugins behaviour. This is to make it easier for plugin authors to debug their broken plugins.
[Describe test coverage new/current, TreeHerder]: Lots of EME mochitests
[Risks and why]: Low; we only terminate the EME plugin child process, and only when it's not working already.
[String/UUID change made/needed]: None.
Attachment #8667199 - Flags: approval-mozilla-beta?
Attachment #8667199 - Flags: approval-mozilla-aurora?
Comment on attachment 8667110 [details] [diff] [review]
Patch v1: Crash GMPs that don't respond to GMPVideoDecoder::Reset()

Improve EME behavior, taking it. Should be in 42.0b3.
Attachment #8667110 - Flags: approval-mozilla-beta?
Attachment #8667110 - Flags: approval-mozilla-beta+
Attachment #8667110 - Flags: approval-mozilla-aurora?
Attachment #8667110 - Flags: approval-mozilla-aurora+
Attachment #8667199 - Flags: approval-mozilla-beta?
Attachment #8667199 - Flags: approval-mozilla-beta+
Attachment #8667199 - Flags: approval-mozilla-aurora?
Attachment #8667199 - Flags: approval-mozilla-aurora+
Flags: needinfo?(cpearce)
We should back this out, including from Aurora. I was able to reproduce the particular CDM hang that was causing issue, and the functionality added in this bug was not able to crash the GMP when it was hung.

This is because the CDM had hung its IPC message channel thread. If we're going to do this, we should do like how Firefox does it; have a "watchdog" thread which runs in the child process itself and periodically checks to see whether we're responsive, and crash if not.
Attached patch Patch: Aurora backout (obsolete) — Splinter Review
Approval Request Comment
[Feature/regressing bug #]: Requesting to backout this bug from Aurora.
[User impact if declined]: This bug added the ability to crash EME plugins when they're not responsive. However I was able to reproduce an unresponsive plugin, and my patch here was unable to crash it. We may however unnecessarily crash a plugin when the CPU us under heavy load. In this case the GMP may have been able to respond, and we don't need to crash.
[Describe test coverage new/current, TreeHerder]: We have lots of EME tests.
[Risks and why]: Low; this is a backout.
[String/UUID change made/needed]: None.
Attachment #8673462 - Flags: review+
Attachment #8673462 - Flags: approval-mozilla-aurora?
Attached patch Patch: Beta backout (obsolete) — Splinter Review
Approval Request Comment
[Feature/regressing bug #]: Requesting to backout this bug from Beta.
[User impact if declined]: This bug added the ability to crash EME plugins when
they're not responsive. However I was able to reproduce an unresponsive plugin,
and my patch here was unable to crash it. We may however unnecessarily crash a
plugin when the CPU us under heavy load. In this case the GMP may have been
able to respond, and we don't need to crash.
[Describe test coverage new/current, TreeHerder]: We have lots of EME tests.
[Risks and why]: Low; this is a backout.
[String/UUID change made/needed]: None.
Attachment #8673466 - Flags: review+
Attachment #8673466 - Flags: approval-mozilla-beta?
Chris, would you mind creating a new bug for the backout? 
Not to add more work for you but it is easier for both release management & sheriff.
Otherwise, it can mess up with the queries.
Thanks and sorry for the extra work.
Backed out of m-c before I read comment 16. Filed bug Bug 1214505 for aurora/beta backout.
Attachment #8673462 - Attachment is obsolete: true
Attachment #8673462 - Flags: approval-mozilla-aurora?
Attachment #8673466 - Attachment is obsolete: true
Attachment #8673466 - Flags: approval-mozilla-beta?
Check the flags once backout merges
Flags: needinfo?(cpearce)
(In reply to Pulsebot from comment #20)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/48da6c2f495a

This was incorrectly labeled as bug 1209385, it should have been bug 1202683.
Backed out in 44. Will be backed on in 42 and 43 in bug 1214505.
Flags: needinfo?(cpearce)
Resolution: FIXED → WONTFIX
You need to log in before you can comment on or make changes to this bug.