Closed Bug 1626364 Opened 5 years ago Closed 5 years ago

Intermittent h264 decode hang on Amazon Chime

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

Product:

Component:

Type:

defect

Priority:

P1

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla78

Tracking Flags:

Tracking

Status

firefox78

---

fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

Attachments

(2 files)

Bug 1626364: Report errors in VideoToolkit h264 decode PlatformCallback. r?jya 5 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1626364: Reinit decoder on error in WebrtcMediaDataDecoder. 5 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review

Byron Campen [:bwc]

Assignee

Description

•

5 years ago

We're waiting forever here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

This callstack has acquired this lock:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/trunk/webrtc/modules/video_coding/video_receiver.cc#261

Several other threads are waiting on this lock, including one that has also locked here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/trunk/webrtc/video/call_stats.cc#168

Several threads are stuck waiting on this lock, including STS and main.

When examining the plugin-container for openh264, there is no sign of deadlock, although no decode work is being carried out. It seems likely that a decode operation has silently failed (or simply been lost somehow), resulting in a hang in the content process. There does not appear to be anything particularly special about the Amazon Chime use-case here, so this is likely a deadlock that will happen in other applications that use webrtc h264.

Byron Campen [:bwc]

Assignee

Comment 1

•

5 years ago

•

I should note that there are a few error-cases in openh264's OpenH264VideoDecoder::Decode_m that are not propagated back to the GMP parent. Hitting any one of these could result in the reported deadlock:

https://github.com/cisco/openh264/blob/6fe15a6b82d492bebe388c55b7ee5131208e7334/module/gmp-openh264.cpp#L888
https://github.com/cisco/openh264/blob/6fe15a6b82d492bebe388c55b7ee5131208e7334/module/gmp-openh264.cpp#L895
https://github.com/cisco/openh264/blob/6fe15a6b82d492bebe388c55b7ee5131208e7334/module/gmp-openh264.cpp#L908

All of these should probably be doing error handling something like this:

https://github.com/cisco/openh264/blob/6fe15a6b82d492bebe388c55b7ee5131208e7334/module/gmp-openh264.cpp#L861

Byron Campen [:bwc]

Assignee

Comment 2

•

5 years ago

From a discussion on Slack, it seems that we are not using GMP for decode any longer. I'm guessing that on OS X, we're using this:

https://searchfox.org/mozilla-central/source/dom/media/platforms/apple/AppleVTDecoder.cpp#82

Something that jumps out at me is this:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#187

Does the synchronous dropping of a frame end up resulting in a callback to PlatformCallback here?

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#268

I would be a little surprised if this were the case.

Byron Campen [:bwc]

Assignee

Comment 3

•

5 years ago

jya, should we be resolving mPromise with an empty frame when VTDecompressionSessionDecodeFrame fails due to synchronously dropping the frame, similar to how we do here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#313

Flags: needinfo?(jyavenard)

Byron Campen [:bwc]

Assignee

Comment 4

•

5 years ago

FWIW, I tried making this modification, and it did not prevent the deadlock. Must be some other reason in this case.

Byron Campen [:bwc]

Assignee

Comment 5

•

5 years ago

Ok, I'm definitely seeing cases where a successful call to VTDecompressionSessionDecodeFrame is not resulting in a call to PlatformCallback. Not sure yet why this might be happening.

Byron Campen [:bwc]

Assignee

Comment 6

•

5 years ago

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Byron Campen [:bwc]

Assignee

Comment 7

•

5 years ago

Yeah, when we hang like this, neither VTDecompressionSessionDecodeFrame's return code nor the infoFlags outparam are different from the case where it is working normally. There does not appear to be any way to detect this failure from that callsite.

FWIW, I see signs that others have run into this problem.

VLC: https://github.com/videolan/vlc/blob/e89d40a501e60d64441807585bd8d9388ae23dbb/modules/codec/videotoolbox.c#L1765

Byron Campen [:bwc]

Assignee

Comment 8

•

5 years ago

Is it possible we have missed a case in H264ChangeMonitor::CheckForChange() that VideoToolbox cannot cope with? Maybe we should allow Decode promises to be rejected with NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, and have MediaChangeMonitor reset the decoder when that occurs?

Byron Campen [:bwc]

Assignee

Comment 9

•

5 years ago

So I see a few things we can do here:

Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.
Detect this kind of failure in AppleVTDecoder and either:
2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
2b. Mark the decoder as dead, and reject all subsequent calls to decode
Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

Thoughts?

Flags: needinfo?(bvandyk)

Byron Campen [:bwc]

Assignee

Comment 10

•

5 years ago

Attached file Bug 1626364: Report errors in VideoToolkit h264 decode PlatformCallback. r?jya — Details

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 11

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #9)

So I see a few things we can do here:

Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.

Sounds ideal, but as you say, may not be (easily) doable.

Detect this kind of failure in AppleVTDecoder and either:
2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
2b. Mark the decoder as dead, and reject all subsequent calls to decode

Do I understand correctly that 2b is fatal?

Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

This sounds good, but do you think it's feasible? From my perspective this seams tricky for the same reasons as 1 in that it's hard to know what exactly is upsetting the platform decoder.

2a sounds reasonable to me. Does the stall happen often enough that you have a good idea on if the prototype is fixing it?

Flags: needinfo?(bvandyk)

Byron Campen [:bwc]

Assignee

Comment 12

•

5 years ago

(In reply to Bryce Seager van Dyk (:bryce) from comment #11)

(In reply to Byron Campen [:bwc] from comment #9)

So I see a few things we can do here:

Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.

Sounds ideal, but as you say, may not be (easily) doable.

Detect this kind of failure in AppleVTDecoder and either:
2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
2b. Mark the decoder as dead, and reject all subsequent calls to decode

Do I understand correctly that 2b is fatal?

Yes; it would mean no more frames will be decoded. 2a tries to recover. It may be a good idea to put some limits on 2a (eg; only reinit the decoder once per some unit of time).

Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

This sounds good, but do you think it's feasible? From my perspective this seams tricky for the same reasons as 1 in that it's hard to know what exactly is upsetting the platform decoder.

2a sounds reasonable to me. Does the stall happen often enough that you have a good idea on if the prototype is fixing it?

It does; I've seen the error with some debug logging happen, and the prototype seems happy.

Byron Campen [:bwc]

Assignee

Comment 13

•

5 years ago

What kind of testing do I need to run for this (both on try, and manual if necessary)?

Flags: needinfo?(bvandyk)

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 14

•

5 years ago

I typically run

crashtests
gtests
mochitest-media
reftest
web-platform-tests

on try to see if I've busted anything. In this case I wouldn't expect any interaction with the EME case, so don't think we need any local testing there.

Flags: needinfo?(bvandyk)

Jean-Yves Avenard [:jya]

Comment 15

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #7)

Yeah, when we hang like this, neither VTDecompressionSessionDecodeFrame's return code nor the infoFlags outparam are different from the case where it is working normally. There does not appear to be any way to detect this failure from that callsite.

FWIW, I see signs that others have run into this problem.

VLC: https://github.com/videolan/vlc/blob/e89d40a501e60d64441807585bd8d9388ae23dbb/modules/codec/videotoolbox.c#L1765

this is just to provide details on the error.
We have a catch all case.

Seems to me that all that's needed is to no longer ignore the error when a frame is dropped, or handle that error type.

everything else will false into place through the normal error handling

Flags: needinfo?(jyavenard)

Jean-Yves Avenard [:jya]

Comment 16

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

Byron Campen [:bwc]

Assignee

Comment 17

•

5 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

Byron Campen [:bwc]

Assignee

Comment 18

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #17)

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

I checked and a flush doesn't repair the VideoToolbox decoder. Is there anything else we could try to return the VideoToolbox decoder back to functionality?

You prefer not to handle this error in MediaChangeMonitor, and suggest WebrtcMediaDataDecoder instead. Does anything else use AppleVTDecoder besides webrtc? This really does look like a bug in VideoToolbox, and it would be a good idea to have a workaround in place for everything that uses it. I suppose we could workaround inside AppleVTDecoder itself, would that be acceptable?

There is another possibility here that might be worth checking. WebrtcMediaDataDecoder is a synchronous decoder; it passes in a frame at a time, and waits for the decoder to spit out either a decoded frame or an error before feeding the next one:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

Flags: needinfo?(jyavenard)

Byron Campen [:bwc]

Assignee

Comment 19

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #18)

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

I tried this, and after 3 more frames, the call to VTDecompressionSessionDecodeFrame itself blocked forever.

He's dead Jim!

Phabricator Automation

Updated

•

5 years ago

Attachment #9137859 - Attachment description: Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and re-init the decoder. → Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and report the need for a new decoder.

Byron Campen [:bwc]

Assignee

Comment 20

•

5 years ago

Attached file Bug 1626364: Reinit decoder on error in WebrtcMediaDataDecoder. — Details

Depends on D69443

Byron Campen [:bwc]

Assignee

Comment 21

•

5 years ago

Ok, so this is a version of the previous patch, where the NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER error is handled by the WebrtcMediaDataDecoder instead of the MediaChangeMonitor. I have some follow-up questions though:

If we are asserting that the end consumer of this interface handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, does that mean MediaChangeMonitor should no longer handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER (it would continue returning this error code when necessary, but the work would be left up to the end consumer)?
If the answer to 1 is "yes", do we need to implement this handling in any other places before removing the decoder reinit logic from MediaChangeMonitor?
If the answer to 1 is "yes", do we need to modify this decoder restart handling in WebrtcMediaDataDecoder, MediaFormatReader, and perhaps others to remember the frame that caused the error, and attempt to decode it again once the decoder is reinitted?
If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

Flags: needinfo?(bvandyk)

Byron Campen [:bwc]

Assignee

Comment 22

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=afa76055ff6fa5638e3fdf0468e982f883e4ddf4

Jean-Yves Avenard [:jya]

Comment 23

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #18)

(In reply to Byron Campen [:bwc] from comment #17)

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

I checked and a flush doesn't repair the VideoToolbox decoder. Is there anything else we could try to return the VideoToolbox decoder back to functionality?

You prefer not to handle this error in MediaChangeMonitor, and suggest WebrtcMediaDataDecoder instead. Does anything else use AppleVTDecoder besides webrtc? This really does look like a bug in VideoToolbox, and it would be a good idea to have a workaround in place for everything that uses it. I suppose we could workaround inside AppleVTDecoder itself, would that be acceptable?

well, normal media playback use this decoder.

Yes, doing the workaround inside the AppleVTDecoder was also my suggestion in the latest review comments. That way the AppleVTDecoder behaves like all the other decoders.

The change in the webrtc client however can be made to be restarted if it receives the message for a new decoder, at least it will handle if the Windows GPU decoder crashed

There is another possibility here that might be worth checking. WebrtcMediaDataDecoder is a synchronous decoder; it passes in a frame at a time, and waits for the decoder to spit out either a decoded frame or an error before feeding the next one:

I only wrote it that way, because that's how the GMP decoder did it.
It could be made fully asynchronous. I had a local change done here to test, and I saw no ill effect.

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

It would need to get another keyframe before it can do whatever it needs to do, and for that you need to return an error. However, it is my experience that even when you send the message back to webrtc that you need a new keyframe, the code continues to send p-pframe for a while. This didn't play well with some decoders.

Flags: needinfo?(jyavenard)

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 24

•

5 years ago

:jya, do you want to field the questions in comment 21?

Flags: needinfo?(bvandyk) → needinfo?(jyavenard)

Byron Campen [:bwc]

Assignee

Comment 25

•

5 years ago

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Byron Campen [:bwc]

Assignee

Comment 26

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #25)

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()?

Just tried doing this; it does not help.

Jean-Yves Avenard [:jya]

Comment 27

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

Flags: needinfo?(jyavenard)

Jean-Yves Avenard [:jya]

Comment 28

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #21)

Ok, so this is a version of the previous patch, where the NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER error is handled by the WebrtcMediaDataDecoder instead of the MediaChangeMonitor. I have some follow-up questions though:

If we are asserting that the end consumer of this interface handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, does that mean MediaChangeMonitor should no longer handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER (it would continue returning this error code when necessary, but the work would be left up to the end consumer)?

MediaChangeMonitor job is to monitor the bytestream to see if the content change: that is the resolution change, or the H264 SPS/PPS change.
When this happens, it extracts the extradata, and rebuild the VideoInfo as needed. That's all it does.
It doesn't care about any errors returned by the decoder, it will be passed to the consumer of the decoder (MediaFormatReader or WebrtcMediaDataDecoderCodec)

If the answer to 1 is "yes", do we need to implement this handling in any other places before removing the decoder reinit logic from MediaChangeMonitor?

The re-init logic is needed when the binary stream has changed and a change of content has been detected. It has nothing to do with handling decoder errors.
Any errors will be passed back to the consumer.
MediaChangeMonitor is a wrapper around a MediaDataDecoder.

There are two types of H264 decoder, those that takes AnnexB (Android, Windows WMF, FFmpeg) and those that take AVCC (AppleVTDecoder). The annexB one can handle change of content by themselves. So if the wrapped decoder is one that takes annexb, the MediaChangeMonitor does nothing but rebuild the VideoInfo. It doesn't manage the decoder lifecyle as the decoder itself can do so.

For the AVCC decoder (AppleVT) it waits until a SPS/PPS NAL are found, and only then create the decoder. After that it's pass-through.

If the answer to 1 is "yes", do we need to modify this decoder restart handling in WebrtcMediaDataDecoder, MediaFormatReader, and perhaps others to remember the frame that caused the error, and attempt to decode it again once the decoder is reinitted?

A frame can only be re-decoded if it's a keyframe. When an error occur, the next step is to Flush() the MediaDataDecoder, this will make the MediaChangeMonitor wait for a new keyframe and all frames will be dropped until then.

If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

There's not a lot of places, and from what I can see, this only occurs in the Apple VT decoder , for h264 content, so really, pretty unused all things considered.

The issue with MediaChangeMonitor is that it behaves differently depending on the type of the decoder wrapped (annexb vs avcc).

To me, if there are any errors, the WebrtcMediaDataDecoderCodec shouldn't be handled them, they should be returned to libwebrtc and let it do whatever it needs to do if an error occur. This will signal the remote to send a new SPS/PPS and send a new keyframe.

Attempting to retry something that already fails is doomed to fail again. H264 is very deterministic, particularly the Apple VT decoder. If it failed before, it will fail again. It doesn't randomly work and sometimes not.

May be worth capturing the content and lodge a bug with the Apple media team rather than attempting to get around it.

Byron Campen [:bwc]

Assignee

Comment 29

•

5 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

I suspect that the root cause of all this is a change that MediaChangeMonitor misses, which AppleVTDecoder then trips over. Resetting the entire chain fixes things, perhaps because that forces this VideoInfo stuff to be recomputed from scratch. Resetting inside AppleVTDecoder does not.

Byron Campen [:bwc]

Assignee

Comment 30

•

5 years ago

•

(In reply to Jean-Yves Avenard [:jya] from comment #28)

(In reply to Byron Campen [:bwc] from comment #21)

If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

There's not a lot of places, and from what I can see, this only occurs in the Apple VT decoder , for h264 content, so really, pretty unused all things considered.

The issue with MediaChangeMonitor is that it behaves differently depending on the type of the decoder wrapped (annexb vs avcc).

To me, if there are any errors, the WebrtcMediaDataDecoderCodec shouldn't be handled them, they should be returned to libwebrtc and let it do whatever it needs to do if an error occur. This will signal the remote to send a new SPS/PPS and send a new keyframe.

webrtc.org might have an error code that will cause it to re-create the decoder. I can look into it. But what should we do about the non-webrtc cases?

Attempting to retry something that already fails is doomed to fail again. H264 is very deterministic, particularly the Apple VT decoder. If it failed before, it will fail again. It doesn't randomly work and sometimes not.

It is not random, but how thorough we are in resetting things determines whether we end up in the "keeps failing" case or not. If our reset destroys and re-creates the AppleVTDecoder, we're fine from then on. If we just reset the VT session, then we keep failing/resetting. It looks like there's something in the bitstream that has changed in a way that is incompatible with the const state in AppleVTDecoder, and for whatever reason MediaChangeMonitor has not noticed, or has not decided to reset the decoder.

May be worth capturing the content and lodge a bug with the Apple media team rather than attempting to get around it.

In addition to what we're doing here, yes. But we still need a workaround, because Apple is probably going to be slow rolling out a fix.

Do we already have a way to capture this content?

Byron Campen [:bwc]

Assignee

Updated

•

5 years ago

Flags: needinfo?(jyavenard)

Byron Campen [:bwc]

Assignee

Comment 31

•

5 years ago

Dan, is there a way to capture a video stream inside webrtc.org?

Flags: needinfo?(dminor)

Byron Campen [:bwc]

Assignee

Comment 32

•

5 years ago

I should note that we may need to file bugs on openh264's encoder; it may be doing something invalid (or weird and unnecessary). Dumping and analyzing the stream is going to be important here, I think, because we could trip up lots of implementations besides Firefox.

Dan Minor [:dminor]

Comment 33

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #31)

Dan, is there a way to capture a video stream inside webrtc.org?

There is nothing built in as far as I can tell. Assuming you want to capture the encoded stream, you could probably dump what's coming from Openh264 in WebrtcGmpVideoEncoder::Encoded, but I'm not sure if that would result in a valid stream.

It looks like Openh264 can be built with an ENABLE_FRAME_DUMP define, but that is normally only enabled for unit testing. If you build openh264 locally, you just need to copy it to your profile directory to use it. That might do what you want as well.

Flags: needinfo?(dminor)

Byron Campen [:bwc]

Assignee

Comment 34

•

5 years ago

That might work, although I'd prefer to see this on the receiving end; it might be that packet loss or reordering is part of this problem.

Byron Campen [:bwc]

Assignee

Comment 35

•

5 years ago

There does not seem to be any simple way to cause webrtc.org to re-create a decoder for us. The simplest solution by far is to do this in WebrtcMediaDataDecoder.

Dan Minor [:dminor]

Comment 36

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #34)

That might work, although I'd prefer to see this on the receiving end; it might be that packet loss or reordering is part of this problem.

On the receive side, as far as I know everything h264 goes through H264SpsPpsTracker::CopyAndFixBitstream, so that would be a good place to try.

Byron Campen [:bwc]

Assignee

Comment 37

•

5 years ago

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

Flags: needinfo?(bvandyk)

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 38

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I don't believe we have anything like this, though I know folks have experimented with local changes to do this (not I). Sounds good to me, and if we do pre-decode I think it should be fine (post would need to avoid doing so for EME). Having the series of frames leading to a decode error seems nifty for various bugs.

Flags: needinfo?(bvandyk)

Phabricator Automation

Updated

•

5 years ago

Attachment #9137859 - Attachment description: Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and report the need for a new decoder. → Bug 1626364: Detect fatal errors in VideoToolkit h264 decode, and ask for reinit. r?jya

Phabricator Automation

Updated

•

5 years ago

Attachment #9138181 - Attachment description: Bug 1626364: (WIP) Handle decoder reinit in WebrtcMediaDataDecoder. → Bug 1626364: Handle decoder reinit in WebrtcMediaDataDecoder.

Byron Campen [:bwc]

Assignee

Comment 39

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fdaca3ad57d14f9fbfad59709d5f0872706bea8e

Jean-Yves Avenard [:jya]

Comment 40

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #29)

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

That can't be, you must not be resetting the same thing or in the same fashion.

Like are you reporting the error? if doing so this will cause libwebrtc to request a new keyframe.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

I suspect that the root cause of all this is a change that MediaChangeMonitor misses, which AppleVTDecoder then trips over. Resetting the entire chain fixes things, perhaps because that forces this VideoInfo stuff to be recomputed from scratch. Resetting inside AppleVTDecoder does not.

MediaChangeMonitor looks for SPS/PPS and decode it. That's all it does. The scenario you describe isn't plausible.

Flags: needinfo?(jyavenard)

Jean-Yves Avenard [:jya]

Comment 41

•

5 years ago

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I have patches that extract the data coming over Media Source Extension, but that data is already muxed in an mp4 or a webm, and it's much easier to reconstruct.

This is raw h264 you're getting. It will need to be remuxed in something usable.

I guess from those raw frames you could rebuild something making sense with ffmpeg but it's not going to be easy.

My guess is that the easiest would be to reuse the MediaRecorder code, except you bypass the encoder as what you have is already the compressed content.

Is it easy to access the stream causing those failures?

Jean-Yves Avenard [:jya]

Comment 42

•

5 years ago

BTW, AppleVT doesn't care about the VideoInfo, all it uses from it is the extradata bit

If you played that stream under linux with ffmpeg and enabled verbose logs, maybe something would show such as errors. This would provide a clue if the content is incorrect or the MediaChangeMonitor returned something invalid.

Byron Campen [:bwc]

Assignee

Comment 43

•

5 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #40)

(In reply to Byron Campen [:bwc] from comment #29)

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

That can't be, you must not be resetting the same thing or in the same fashion.

Like are you reporting the error? if doing so this will cause libwebrtc to request a new keyframe.

Yes, in both cases WEBRTC_VIDEO_CODEC_ERROR is returned to webrtc.org. Yes, different stuff is being reset in a different way, because when the reset is done in WebrtcMediaDataDecoder, the const data (including AppleVTDecoder::mExtraData) is also reset, along with the MediaChangeObserver, and anything else that's in the chain. When the reset is handled inside AppleVTDecoder, none of this stuff happens.

If you think that shouldn't be enough to make a difference, I guess there might be another bug lurking here. I can attach the patch that I used to test this, if you like.

(In reply to Jean-Yves Avenard [:jya] from comment #41)

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I have patches that extract the data coming over Media Source Extension, but that data is already muxed in an mp4 or a webm, and it's much easier to reconstruct.

This is raw h264 you're getting. It will need to be remuxed in something usable.

I guess from those raw frames you could rebuild something making sense with ffmpeg but it's not going to be easy.

My guess is that the easiest would be to reuse the MediaRecorder code, except you bypass the encoder as what you have is already the compressed content.

Is it easy to access the stream causing those failures?

So, if we're going to watch for decode errors in the webrtc case, it is easiest to do that somewhere between WebrtcMediaDataDecoder and AppleVTDecoder, because webrtc.org swallows those errors, and I don't want to make modifications to webrtc.org since we're already in a world of hurt from doing that. If that's what we end up doing, we're stuck with raw h264. I've looked around, and it seems easy to do this with ffmpeg at first glance, although I doubt any of these cases are working with output from openh264, which may make things harder. However, I'm not so interested in playing this stuff back (although that might be useful for making test cases), what I really want to do is load it up in some sort of analysis tool. Being totally new to this, I don't know what the available tools are, or how easily they deal with raw h264.

Byron Campen [:bwc]

Assignee

Comment 44

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a0d0804ae7b27082eeac84189acc98475585051

Byron Campen [:bwc]

Assignee

Comment 45

•

5 years ago

•

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a23821f543d714d47c0a0a7adcadd24fcdcb6ebb Edit: Seeing some intermittents on this push. Here's the same try push on the base revision: https://treeherder.mozilla.org/#/jobs?repo=try&revision=26d0ca37029a8ff68ee7be9f212b127e43b96839

Byron Campen [:bwc]

Assignee

Comment 46

•

5 years ago

Yeah, base revision looks the same.

Phabricator Automation

Updated

•

5 years ago

Attachment #9137859 - Attachment description: Bug 1626364: Detect fatal errors in VideoToolkit h264 decode, and ask for reinit. r?jya → Bug 1626364: Report errors in VideoToolkit h264 decode PlatformCallback. r?jya

Phabricator Automation

Updated

•

5 years ago

Attachment #9138181 - Attachment description: Bug 1626364: Handle decoder reinit in WebrtcMediaDataDecoder. → Bug 1626364: Reinit decoder on error in WebrtcMediaDataDecoder.

Comment 47

•

5 years ago

Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/df0f49e9c06e Report errors in VideoToolkit h264 decode PlatformCallback. r=jya https://hg.mozilla.org/integration/autoland/rev/671ea69f58e4 Reinit decoder on error in WebrtcMediaDataDecoder. r=jya

Raul Gurzau (:RaulG)

Comment 48

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/df0f49e9c06e
https://hg.mozilla.org/mozilla-central/rev/671ea69f58e4

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox78: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla78

Andrew McCreight [:mccr8]

Updated

•

5 years ago

Regressions: 1636615

You need to log in before you can comment on or make changes to this bug.