Closed Bug 1626364 Opened 3 months ago Closed 2 months ago

Intermittent h264 decode hang on Amazon Chime

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox78 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

Attachments

(2 files)

We're waiting forever here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

This callstack has acquired this lock:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/trunk/webrtc/modules/video_coding/video_receiver.cc#261

Several other threads are waiting on this lock, including one that has also locked here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/trunk/webrtc/video/call_stats.cc#168

Several threads are stuck waiting on this lock, including STS and main.

When examining the plugin-container for openh264, there is no sign of deadlock, although no decode work is being carried out. It seems likely that a decode operation has silently failed (or simply been lost somehow), resulting in a hang in the content process. There does not appear to be anything particularly special about the Amazon Chime use-case here, so this is likely a deadlock that will happen in other applications that use webrtc h264.

From a discussion on Slack, it seems that we are not using GMP for decode any longer. I'm guessing that on OS X, we're using this:

https://searchfox.org/mozilla-central/source/dom/media/platforms/apple/AppleVTDecoder.cpp#82

Something that jumps out at me is this:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#187

Does the synchronous dropping of a frame end up resulting in a callback to PlatformCallback here?

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#268

I would be a little surprised if this were the case.

jya, should we be resolving mPromise with an empty frame when VTDecompressionSessionDecodeFrame fails due to synchronously dropping the frame, similar to how we do here:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/dom/media/platforms/apple/AppleVTDecoder.cpp#313

Flags: needinfo?(jyavenard)

FWIW, I tried making this modification, and it did not prevent the deadlock. Must be some other reason in this case.

Ok, I'm definitely seeing cases where a successful call to VTDecompressionSessionDecodeFrame is not resulting in a call to PlatformCallback. Not sure yet why this might be happening.

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Yeah, when we hang like this, neither VTDecompressionSessionDecodeFrame's return code nor the infoFlags outparam are different from the case where it is working normally. There does not appear to be any way to detect this failure from that callsite.

FWIW, I see signs that others have run into this problem.

VLC: https://github.com/videolan/vlc/blob/e89d40a501e60d64441807585bd8d9388ae23dbb/modules/codec/videotoolbox.c#L1765

Is it possible we have missed a case in H264ChangeMonitor::CheckForChange() that VideoToolbox cannot cope with? Maybe we should allow Decode promises to be rejected with NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, and have MediaChangeMonitor reset the decoder when that occurs?

So I see a few things we can do here:

  1. Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.

  2. Detect this kind of failure in AppleVTDecoder and either:
    2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
    2b. Mark the decoder as dead, and reject all subsequent calls to decode

  3. Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

Thoughts?

Flags: needinfo?(bvandyk)

(In reply to Byron Campen [:bwc] from comment #9)

So I see a few things we can do here:

  1. Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.

Sounds ideal, but as you say, may not be (easily) doable.

  1. Detect this kind of failure in AppleVTDecoder and either:
    2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
    2b. Mark the decoder as dead, and reject all subsequent calls to decode

Do I understand correctly that 2b is fatal?

  1. Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

This sounds good, but do you think it's feasible? From my perspective this seams tricky for the same reasons as 1 in that it's hard to know what exactly is upsetting the platform decoder.

2a sounds reasonable to me. Does the stall happen often enough that you have a good idea on if the prototype is fixing it?

Flags: needinfo?(bvandyk)

(In reply to Bryce Seager van Dyk (:bryce) from comment #11)

(In reply to Byron Campen [:bwc] from comment #9)

So I see a few things we can do here:

  1. Figure out why these frames are killing the decoder, and detect the problem in H264ChangeMonitor::CheckForChange. Unfortunately, it seems doubtful that we can make H264ChangeMonitor::CheckForChange truly complete given that VideoToolbox is a black box.

Sounds ideal, but as you say, may not be (easily) doable.

  1. Detect this kind of failure in AppleVTDecoder and either:
    2a. Have MediaChangeMonitor re-init the decoder (I have some prototype code that does this) or
    2b. Mark the decoder as dead, and reject all subsequent calls to decode

Do I understand correctly that 2b is fatal?

Yes; it would mean no more frames will be decoded. 2a tries to recover. It may be a good idea to put some limits on 2a (eg; only reinit the decoder once per some unit of time).

  1. Figure out whether these frames are truly invalid (or at least avoidable on the encode side), and if so fix the bug in openh264.

This sounds good, but do you think it's feasible? From my perspective this seams tricky for the same reasons as 1 in that it's hard to know what exactly is upsetting the platform decoder.

2a sounds reasonable to me. Does the stall happen often enough that you have a good idea on if the prototype is fixing it?

It does; I've seen the error with some debug logging happen, and the prototype seems happy.

What kind of testing do I need to run for this (both on try, and manual if necessary)?

Flags: needinfo?(bvandyk)

I typically run

  • crashtests
  • gtests
  • mochitest-media
  • reftest
  • web-platform-tests

on try to see if I've busted anything. In this case I wouldn't expect any interaction with the EME case, so don't think we need any local testing there.

Flags: needinfo?(bvandyk)

(In reply to Byron Campen [:bwc] from comment #7)

Yeah, when we hang like this, neither VTDecompressionSessionDecodeFrame's return code nor the infoFlags outparam are different from the case where it is working normally. There does not appear to be any way to detect this failure from that callsite.

FWIW, I see signs that others have run into this problem.

VLC: https://github.com/videolan/vlc/blob/e89d40a501e60d64441807585bd8d9388ae23dbb/modules/codec/videotoolbox.c#L1765

this is just to provide details on the error.
We have a catch all case.

Seems to me that all that's needed is to no longer ignore the error when a frame is dropped, or handle that error type.

everything else will false into place through the normal error handling

Flags: needinfo?(jyavenard)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

(In reply to Byron Campen [:bwc] from comment #17)

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

I checked and a flush doesn't repair the VideoToolbox decoder. Is there anything else we could try to return the VideoToolbox decoder back to functionality?

You prefer not to handle this error in MediaChangeMonitor, and suggest WebrtcMediaDataDecoder instead. Does anything else use AppleVTDecoder besides webrtc? This really does look like a bug in VideoToolbox, and it would be a good idea to have a workaround in place for everything that uses it. I suppose we could workaround inside AppleVTDecoder itself, would that be acceptable?

There is another possibility here that might be worth checking. WebrtcMediaDataDecoder is a synchronous decoder; it passes in a frame at a time, and waits for the decoder to spit out either a decoded frame or an error before feeding the next one:

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

Flags: needinfo?(jyavenard)

(In reply to Byron Campen [:bwc] from comment #18)

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

I tried this, and after 3 more frames, the call to VTDecompressionSessionDecodeFrame itself blocked forever.

He's dead Jim!

Attachment #9137859 - Attachment description: Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and re-init the decoder. → Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and report the need for a new decoder.

Ok, so this is a version of the previous patch, where the NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER error is handled by the WebrtcMediaDataDecoder instead of the MediaChangeMonitor. I have some follow-up questions though:

  1. If we are asserting that the end consumer of this interface handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, does that mean MediaChangeMonitor should no longer handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER (it would continue returning this error code when necessary, but the work would be left up to the end consumer)?

  2. If the answer to 1 is "yes", do we need to implement this handling in any other places before removing the decoder reinit logic from MediaChangeMonitor?

  3. If the answer to 1 is "yes", do we need to modify this decoder restart handling in WebrtcMediaDataDecoder, MediaFormatReader, and perhaps others to remember the frame that caused the error, and attempt to decode it again once the decoder is reinitted?

  4. If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

Flags: needinfo?(bvandyk)

(In reply to Byron Campen [:bwc] from comment #18)

(In reply to Byron Campen [:bwc] from comment #17)

(In reply to Jean-Yves Avenard [:jya] from comment #16)

(In reply to Byron Campen [:bwc] from comment #6)

This seems to happen immediately after we get a PlatformCallback with status == -8969. We currently treat this (and any other status besides noErr) as an error on a given frame, and keep trying to use the decoder, but maybe an error status means we need to stop using the decoder entirely?

Want to handle the ability to recreate decoders several times after an error like the MediaFormatReader does.

this shouldn't be done in the MediaChangeMonitor but in the WebrtcMediaDataDecoderCodec.cpp code.

It would be surprising if you couldn't continue to use the existing VT decoder however, a reset (a call to Flush()) is likely all is needed to do so.

After this error happens PlatformCallback is never called again. It is possible VTDecompressionSessionWaitForAsynchronousFrames will cause those callbacks to resume, so I can try it.

I checked and a flush doesn't repair the VideoToolbox decoder. Is there anything else we could try to return the VideoToolbox decoder back to functionality?

You prefer not to handle this error in MediaChangeMonitor, and suggest WebrtcMediaDataDecoder instead. Does anything else use AppleVTDecoder besides webrtc? This really does look like a bug in VideoToolbox, and it would be a good idea to have a workaround in place for everything that uses it. I suppose we could workaround inside AppleVTDecoder itself, would that be acceptable?

well, normal media playback use this decoder.

Yes, doing the workaround inside the AppleVTDecoder was also my suggestion in the latest review comments. That way the AppleVTDecoder behaves like all the other decoders.

The change in the webrtc client however can be made to be restarted if it receives the message for a new decoder, at least it will handle if the Windows GPU decoder crashed

There is another possibility here that might be worth checking. WebrtcMediaDataDecoder is a synchronous decoder; it passes in a frame at a time, and waits for the decoder to spit out either a decoded frame or an error before feeding the next one:

I only wrote it that way, because that's how the GMP decoder did it.
It could be made fully asynchronous. I had a local change done here to test, and I saw no ill effect.

https://searchfox.org/mozilla-central/rev/fa2df28a49883612bd7af4dacd80cdfedcccd2f6/media/webrtc/signaling/src/media-conduit/WebrtcMediaDataDecoderCodec.cpp#115

It may be that this usage pattern does not play well with VideoToolbox running in async mode. Perhaps VideoToolbox needs multiple encoded frames before it can recover from whatever error condition it has encountered? Does this sound likely to you?

It would need to get another keyframe before it can do whatever it needs to do, and for that you need to return an error. However, it is my experience that even when you send the message back to webrtc that you need a new keyframe, the code continues to send p-pframe for a while. This didn't play well with some decoders.

Flags: needinfo?(jyavenard)

:jya, do you want to field the questions in comment 21?

Flags: needinfo?(bvandyk) → needinfo?(jyavenard)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

(In reply to Byron Campen [:bwc] from comment #25)

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()?

Just tried doing this; it does not help.

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

Flags: needinfo?(jyavenard)

(In reply to Byron Campen [:bwc] from comment #21)

Ok, so this is a version of the previous patch, where the NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER error is handled by the WebrtcMediaDataDecoder instead of the MediaChangeMonitor. I have some follow-up questions though:

  1. If we are asserting that the end consumer of this interface handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, does that mean MediaChangeMonitor should no longer handle NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER (it would continue returning this error code when necessary, but the work would be left up to the end consumer)?

MediaChangeMonitor job is to monitor the bytestream to see if the content change: that is the resolution change, or the H264 SPS/PPS change.
When this happens, it extracts the extradata, and rebuild the VideoInfo as needed. That's all it does.
It doesn't care about any errors returned by the decoder, it will be passed to the consumer of the decoder (MediaFormatReader or WebrtcMediaDataDecoderCodec)

  1. If the answer to 1 is "yes", do we need to implement this handling in any other places before removing the decoder reinit logic from MediaChangeMonitor?

The re-init logic is needed when the binary stream has changed and a change of content has been detected. It has nothing to do with handling decoder errors.
Any errors will be passed back to the consumer.
MediaChangeMonitor is a wrapper around a MediaDataDecoder.

There are two types of H264 decoder, those that takes AnnexB (Android, Windows WMF, FFmpeg) and those that take AVCC (AppleVTDecoder). The annexB one can handle change of content by themselves. So if the wrapped decoder is one that takes annexb, the MediaChangeMonitor does nothing but rebuild the VideoInfo. It doesn't manage the decoder lifecyle as the decoder itself can do so.

For the AVCC decoder (AppleVT) it waits until a SPS/PPS NAL are found, and only then create the decoder. After that it's pass-through.

  1. If the answer to 1 is "yes", do we need to modify this decoder restart handling in WebrtcMediaDataDecoder, MediaFormatReader, and perhaps others to remember the frame that caused the error, and attempt to decode it again once the decoder is reinitted?

A frame can only be re-decoded if it's a keyframe. When an error occur, the next step is to Flush() the MediaDataDecoder, this will make the MediaChangeMonitor wait for a new keyframe and all frames will be dropped until then.

  1. If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

There's not a lot of places, and from what I can see, this only occurs in the Apple VT decoder , for h264 content, so really, pretty unused all things considered.

The issue with MediaChangeMonitor is that it behaves differently depending on the type of the decoder wrapped (annexb vs avcc).

To me, if there are any errors, the WebrtcMediaDataDecoderCodec shouldn't be handled them, they should be returned to libwebrtc and let it do whatever it needs to do if an error occur. This will signal the remote to send a new SPS/PPS and send a new keyframe.

Attempting to retry something that already fails is doomed to fail again. H264 is very deterministic, particularly the Apple VT decoder. If it failed before, it will fail again. It doesn't randomly work and sometimes not.

May be worth capturing the content and lodge a bug with the Apple media team rather than attempting to get around it.

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

I suspect that the root cause of all this is a change that MediaChangeMonitor misses, which AppleVTDecoder then trips over. Resetting the entire chain fixes things, perhaps because that forces this VideoInfo stuff to be recomputed from scratch. Resetting inside AppleVTDecoder does not.

(In reply to Jean-Yves Avenard [:jya] from comment #28)

(In reply to Byron Campen [:bwc] from comment #21)

  1. If we're going to need to implement this logic in lots of places, and especially if the answer to 3 is "yes", should we have a single class that handles this as a passthrough, similar to how MediaChangeMonitor handles (most) detection of frames that require a decoder reinit?

There's not a lot of places, and from what I can see, this only occurs in the Apple VT decoder , for h264 content, so really, pretty unused all things considered.

The issue with MediaChangeMonitor is that it behaves differently depending on the type of the decoder wrapped (annexb vs avcc).

To me, if there are any errors, the WebrtcMediaDataDecoderCodec shouldn't be handled them, they should be returned to libwebrtc and let it do whatever it needs to do if an error occur. This will signal the remote to send a new SPS/PPS and send a new keyframe.

webrtc.org might have an error code that will cause it to re-create the decoder. I can look into it. But what should we do about the non-webrtc cases?

Attempting to retry something that already fails is doomed to fail again. H264 is very deterministic, particularly the Apple VT decoder. If it failed before, it will fail again. It doesn't randomly work and sometimes not.

It is not random, but how thorough we are in resetting things determines whether we end up in the "keeps failing" case or not. If our reset destroys and re-creates the AppleVTDecoder, we're fine from then on. If we just reset the VT session, then we keep failing/resetting. It looks like there's something in the bitstream that has changed in a way that is incompatible with the const state in AppleVTDecoder, and for whatever reason MediaChangeMonitor has not noticed, or has not decided to reset the decoder.

May be worth capturing the content and lodge a bug with the Apple media team rather than attempting to get around it.

In addition to what we're doing here, yes. But we still need a workaround, because Apple is probably going to be slow rolling out a fix.

Do we already have a way to capture this content?

Flags: needinfo?(jyavenard)

Dan, is there a way to capture a video stream inside webrtc.org?

Flags: needinfo?(dminor)

I should note that we may need to file bugs on openh264's encoder; it may be doing something invalid (or weird and unnecessary). Dumping and analyzing the stream is going to be important here, I think, because we could trip up lots of implementations besides Firefox.

(In reply to Byron Campen [:bwc] from comment #31)

Dan, is there a way to capture a video stream inside webrtc.org?

There is nothing built in as far as I can tell. Assuming you want to capture the encoded stream, you could probably dump what's coming from Openh264 in WebrtcGmpVideoEncoder::Encoded, but I'm not sure if that would result in a valid stream.

It looks like Openh264 can be built with an ENABLE_FRAME_DUMP define, but that is normally only enabled for unit testing. If you build openh264 locally, you just need to copy it to your profile directory to use it. That might do what you want as well.

Flags: needinfo?(dminor)

That might work, although I'd prefer to see this on the receiving end; it might be that packet loss or reordering is part of this problem.

There does not seem to be any simple way to cause webrtc.org to re-create a decoder for us. The simplest solution by far is to do this in WebrtcMediaDataDecoder.

(In reply to Byron Campen [:bwc] from comment #34)

That might work, although I'd prefer to see this on the receiving end; it might be that packet loss or reordering is part of this problem.

On the receive side, as far as I know everything h264 goes through H264SpsPpsTracker::CopyAndFixBitstream, so that would be a good place to try.

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

Flags: needinfo?(bvandyk)

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I don't believe we have anything like this, though I know folks have experimented with local changes to do this (not I). Sounds good to me, and if we do pre-decode I think it should be fine (post would need to avoid doing so for EME). Having the series of frames leading to a decode error seems nifty for various bugs.

Flags: needinfo?(bvandyk)
Attachment #9137859 - Attachment description: Bug 1626364: (WIP) Detect fatal errors in VideoToolkit h264 decode, and report the need for a new decoder. → Bug 1626364: Detect fatal errors in VideoToolkit h264 decode, and ask for reinit. r?jya
Attachment #9138181 - Attachment description: Bug 1626364: (WIP) Handle decoder reinit in WebrtcMediaDataDecoder. → Bug 1626364: Handle decoder reinit in WebrtcMediaDataDecoder.

(In reply to Byron Campen [:bwc] from comment #29)

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

That can't be, you must not be resetting the same thing or in the same fashion.

Like are you reporting the error? if doing so this will cause libwebrtc to request a new keyframe.

Why would amending TrackInfo->GetAsVideoInfo() work any better?

if the binary stream change, the MediaChangeMonitor will pick it up and recreate it as needed.

I suspect that the root cause of all this is a change that MediaChangeMonitor misses, which AppleVTDecoder then trips over. Resetting the entire chain fixes things, perhaps because that forces this VideoInfo stuff to be recomputed from scratch. Resetting inside AppleVTDecoder does not.

MediaChangeMonitor looks for SPS/PPS and decode it. That's all it does. The scenario you describe isn't plausible.

Flags: needinfo?(jyavenard)

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I have patches that extract the data coming over Media Source Extension, but that data is already muxed in an mp4 or a webm, and it's much easier to reconstruct.

This is raw h264 you're getting. It will need to be remuxed in something usable.

I guess from those raw frames you could rebuild something making sense with ffmpeg but it's not going to be easy.

My guess is that the easiest would be to reuse the MediaRecorder code, except you bypass the encoder as what you have is already the compressed content.

Is it easy to access the stream causing those failures?

BTW, AppleVT doesn't care about the VideoInfo, all it uses from it is the extradata bit

If you played that stream under linux with ffmpeg and enabled verbose logs, maybe something would show such as errors. This would provide a clue if the content is incorrect or the MediaChangeMonitor returned something invalid.

(In reply to Jean-Yves Avenard [:jya] from comment #40)

(In reply to Byron Campen [:bwc] from comment #29)

(In reply to Jean-Yves Avenard [:jya] from comment #27)

(In reply to Byron Campen [:bwc] from comment #25)

I've just tried teaching AppleVTDecoder to re-init its session when it gets a fatal error, but this results in extremely frequent reinits. It seems that part (all?) of the problem is related to some of the const configuration set at the construction of AppleVTDecoder:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/dom/media/platforms/apple/AppleVTDecoder.h#85-87

Should we try to update these using MediaRawData::mTrackInfo->GetAsVideoInfo()? Or should we just allow the reinit to be carried out somewhere else (eg; WebrtcMediaDataDecoder, MediaFormatReader)?

Restarting the decoder within AppleVTDecoder, or returning need_new_decoder and handling that anywhere else would end up in the exact same outcome.
The differences are only conceptual and where the logic should be handled.

This is not consistent with what I've observed. Resetting the decoder inside AppleVTDecoder results in continuous resetting. Resetting in WebrtcMediaDataDecoder results in a single reset, after which it works fine.

That can't be, you must not be resetting the same thing or in the same fashion.

Like are you reporting the error? if doing so this will cause libwebrtc to request a new keyframe.

Yes, in both cases WEBRTC_VIDEO_CODEC_ERROR is returned to webrtc.org. Yes, different stuff is being reset in a different way, because when the reset is done in WebrtcMediaDataDecoder, the const data (including AppleVTDecoder::mExtraData) is also reset, along with the MediaChangeObserver, and anything else that's in the chain. When the reset is handled inside AppleVTDecoder, none of this stuff happens.

If you think that shouldn't be enough to make a difference, I guess there might be another bug lurking here. I can attach the patch that I used to test this, if you like.

(In reply to Jean-Yves Avenard [:jya] from comment #41)

(In reply to Byron Campen [:bwc] from comment #37)

@bryce, is there already a way to do stream capture by setting a pref or something? If not, would something like this be useful? Say, keep a ringbuffer of encoded frames, and dump that ringbuffer to a file if a decode error is observed?

I have patches that extract the data coming over Media Source Extension, but that data is already muxed in an mp4 or a webm, and it's much easier to reconstruct.

This is raw h264 you're getting. It will need to be remuxed in something usable.

I guess from those raw frames you could rebuild something making sense with ffmpeg but it's not going to be easy.

My guess is that the easiest would be to reuse the MediaRecorder code, except you bypass the encoder as what you have is already the compressed content.

Is it easy to access the stream causing those failures?

So, if we're going to watch for decode errors in the webrtc case, it is easiest to do that somewhere between WebrtcMediaDataDecoder and AppleVTDecoder, because webrtc.org swallows those errors, and I don't want to make modifications to webrtc.org since we're already in a world of hurt from doing that. If that's what we end up doing, we're stuck with raw h264. I've looked around, and it seems easy to do this with ffmpeg at first glance, although I doubt any of these cases are working with output from openh264, which may make things harder. However, I'm not so interested in playing this stuff back (although that might be useful for making test cases), what I really want to do is load it up in some sort of analysis tool. Being totally new to this, I don't know what the available tools are, or how easily they deal with raw h264.

Yeah, base revision looks the same.

Attachment #9137859 - Attachment description: Bug 1626364: Detect fatal errors in VideoToolkit h264 decode, and ask for reinit. r?jya → Bug 1626364: Report errors in VideoToolkit h264 decode PlatformCallback. r?jya
Attachment #9138181 - Attachment description: Bug 1626364: Handle decoder reinit in WebrtcMediaDataDecoder. → Bug 1626364: Reinit decoder on error in WebrtcMediaDataDecoder.
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/df0f49e9c06e
Report errors in VideoToolkit h264 decode PlatformCallback. r=jya
https://hg.mozilla.org/integration/autoland/rev/671ea69f58e4
Reinit decoder on error in WebrtcMediaDataDecoder. r=jya
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
Regressions: 1636615
You need to log in before you can comment on or make changes to this bug.