Resume OOP video decoding after a GPU process crash

RESOLVED FIXED in Firefox 52

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

Tracking

Trunk
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 fixed)

Details

Attachments

(1 attachment)

Assignee

Description

3 years ago
The current code has the MediaDataDecoder implementation make a call to Error (on the MediaDataDecoderCallback) if the GPU process crashes.

This stops the video from playing (as if the video was corrupt), which is better than crashing, but not ideal.

If you refresh the page, then the video can be played again.

Ideally we'd make it so the video just resumed along with the gpu process (possibly from the next keyframe).

One bit of complexity is that we might not be able to resume on the GPU process. If there have been too many crashes (exact definition still TBA) then we'll fall back to software compositing in the UI process, so we'll have to do in-process decoding.

The way things currently work is we have RemoteVideoDecoder (an implementation of MediaDataDecoder) that dispatches all input to an IDPL thread where it's handled by VideoDecoderChild (an IPDL actor).

Some options:

If we get a crash (reported to the VideoDecoderChild on the IPDL thread), rather than sending Error() back to the MediaDataDecoderCallback, we could silently attempt to recreate the actor and then notify the RemoteVideoDecoder to update the actor pointer and start sending input to the new one.

We'd probably want to discard all incoming coded frames until the next keyframe, and just return NeedInput() so that data keeps getting fed in. We also need to deal with races where incoming frames get sent to the old VideoDecoderChild before we manage to notify the RemoteVideoDecoder that we've switched to a new one (since they operate on different threads).

This doesn't work if we can't use the GPU process any more, but I guess RemoteVideoDecoder could switch to a different mode where it just operates as a wrapper around an in-process decoder.



Another alternative is to add a new error code that signifies the actual problem (decoder crashed, we should try again) and return that to the MediaDataDecoderCallback.

That way the caller could be responsible for destroying the decoder entirely and creating a new one (which would handle the case where the GPU process is no longer available).

I assume MediaFormatReader could fairly easily attempt to create the decoder again, and probably skip-to-next-keyframe without too much work.

I'm not sure if this latter option has much overlap with Dan's work to resume decoders or not.

Does anyone have any preferences or better ideas?
Flags: needinfo?(jyavenard)
Flags: needinfo?(dglastonbury)
Flags: needinfo?(cpearce)
All you've described above is already implemented. 

If Error() is called with a non fatal error and we have a keyframe available in the future, then the decoder will be flushed, reset and it will attempt to resume decoding. 
Currently it can handle two consécutive errors until it gives up and make the error fatal. 

Currently only a NS_MEDIA_DECODE_ERR is considered non fatal. So when the gpu process crash you can either call Error() with a decode error; or create a new error code that will be considered non fatal (you need then to amend HasFatalError in mediaFormatReader.h)

This later approach is probably best so we can detect GPU crash errors and fallback to software decoding automatically.
Flags: needinfo?(jyavenard)
Assignee

Comment 2

3 years ago
This adds a new error type that doesn't have a limit (since we already have code for limiting the number of GPU process crashes and duplicating the logic seems unnecessary).
Assignee: nobody → matt.woodrow
Flags: needinfo?(dglastonbury)
Flags: needinfo?(cpearce)
Attachment #8807929 - Flags: review?(jyavenard)
Comment on attachment 8807929 [details] [diff] [review]
Add new non-fatal error type for this

Review of attachment 8807929 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaFormatReader.h
@@ +336,5 @@
> +        return mNumOfConsecutiveError > mMaxConsecutiveError;
> +      } else if (mError.ref() == NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER) {
> +        // If the caller asked for a new decoder we shouldn't treat
> +        // it as fatal.
> +        return false;

so do we allow the GPU decoder to crash indefinitely?
how is the fallback to software decoding handled?
Attachment #8807929 - Flags: review?(jyavenard) → review+
We could shutdown all the time. It would allow to simplify the various MediaDataDecoder handling of errors / flush
Assignee

Comment 5

3 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #3)
 
> so do we allow the GPU decoder to crash indefinitely?
> how is the fallback to software decoding handled?

The GPU process only restarts "layers.gpu-process.dev.max_restarts" times, after that we fall back to using a software compositor in the UI process.

Unfortunately MediaFormatReader caches the old LayerManager and so won't actually notice the change so it'll keep trying to use the GPU decoder (and it'll keep failing). I'm going to write a patch to fix that tomorrow.

Comment 6

3 years ago
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ee070ee6dad
Add new non-fatal media error so that we recreate decoders when the GPU process crashes. r=jya

Comment 8

3 years ago
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d76e0d2be865
Add new non-fatal media error so that we recreate decoders when the GPU process crashes. r=jya

Comment 9

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d76e0d2be865
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52

Comment 10

3 years ago
Matt Woodrow (:mattwoodrow)
Jean-Yves Avenard [:jya]

Either this particular bug 1315144 or related bug 1315510 (or both) has caused 
a serious regression in breaking the HTML5 version of BBC iPlayer TV (when e10s is ON). 
I lack the skills to file this in a proper fashion, but everything needed to file a 
new bug has been posted over at the Nightly Testers Mailing List: 

https://mail.mozilla.org/pipermail/nightly-testers/2017-January/004176.html

Please fix, preferably before the 52.0a2 cycle is over...
Thanks!

Comment 11

3 years ago
Anthony Hughes (:ashughes) has filed bug 1329386 on my behalf 
(thanks again Anthony :-) ). No need to dupe...
You need to log in before you can comment on or make changes to this bug.