Closed Bug 1068151 Opened 5 years ago Closed 4 years ago

Video playback halts with a malformed MPEG-4 file

Categories

(Core :: Audio/Video: Playback, defect, P2)

32 Branch
x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: chris, Assigned: ayang)

References

(Blocks 2 open bugs)

Details

(Keywords: compat)

Attachments

(1 file, 14 obsolete files)

54.14 KB, patch
ayang
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140911151253

Steps to reproduce:

http://chris.improbable.org/experiments/browser/video/waters_of_destiny.html

has a simple <video> which points to an MPEG-4 file which is malformed with a corrupt frame just after 44 seconds into the stream.



Actual results:

In Firefox Nightly on OS X, video playback hangs at 44s but audio continues
In Firefox 32 on Windows 7, video playback halts at 44s

In neither case is anything logged to the console or an error event dispatched


Expected results:

Safari, Internet Explorer and Chrome on Windows 7 continue playing with a brief flash of corrupted data, although no errors are reported.

Chrome on OS X hangs.

Given the prevalence of broken video in the wild and the degree to which media players silently compensate, I think Firefox will be unfairly blamed if a video does not play. In all cases, it would also make sense to report the error so a user who looks at the console could learn why playback stopped (given the obtrusiveness, an in-browser alert might even be appropriate) and triggering a JavaScript error would make it easier for site owners to learn of problems.
Flags: needinfo?(cpearce)
Retesting with FF 35, the symptoms are similar (video hands, audio continues) except that the video element now shows the same spinner as for a slow network connection. The good news is that scrubbing the slider past the problem point will also restart the video playback.
Flags: needinfo?(cpearce)
Component: Audio/Video → Audio/Video: Playback
Tested 44: Windows plays through, Mac OSX stops at 44 seconds.
Status: UNCONFIRMED → NEW
Ever confirmed: true
It decodes error on a non-key frame.

2016-03-16 02:06:45.472508 UTC - [MediaPDecoder #1]: D/PlatformDecoderModule AppleVTDecoder: Error -12909 VTDecompressionSessionDecodeFrame

Instead of returning a fatal error, a non-key frame error should be recoverable IMHO.
Flags: needinfo?(cpearce)
(In reply to Alfredo Yang (:alfredo) from comment #3)
> It decodes error on a non-key frame.
> 
> 2016-03-16 02:06:45.472508 UTC - [MediaPDecoder #1]: D/PlatformDecoderModule
> AppleVTDecoder: Error -12909 VTDecompressionSessionDecodeFrame
> 
> Instead of returning a fatal error, a non-key frame error should be
> recoverable IMHO.

I agree. Want to write us a patch? :)
Flags: needinfo?(cpearce)
Assignee: nobody → ayang
(In reply to Alfredo Yang (:alfredo) from comment #3)
> It decodes error on a non-key frame.
> 
> 2016-03-16 02:06:45.472508 UTC - [MediaPDecoder #1]: D/PlatformDecoderModule
> AppleVTDecoder: Error -12909 VTDecompressionSessionDecodeFrame
> 
> Instead of returning a fatal error, a non-key frame error should be
> recoverable IMHO.

so you then ignore all frames until the next keyframe? the frame that fail to decode may be a B or P frame. Additionally, you just don't know if the decoder can recover.
additionally, the issue isn't just with the Apple VT decoder, but any decoder (we have reports of the AAC decoder sometimes failing to decode a frame).

So the solution to be taken isn't to be unique to Apple VT IMHO, but a generic one (so typically to be done in MediaFormatReader).
(In reply to Jean-Yves Avenard [:jya] from comment #5)
> (In reply to Alfredo Yang (:alfredo) from comment #3)
> > It decodes error on a non-key frame.
> > 
> > 2016-03-16 02:06:45.472508 UTC - [MediaPDecoder #1]: D/PlatformDecoderModule
> > AppleVTDecoder: Error -12909 VTDecompressionSessionDecodeFrame
> > 
> > Instead of returning a fatal error, a non-key frame error should be
> > recoverable IMHO.
> 
> so you then ignore all frames until the next keyframe? the frame that fail
> to decode may be a B or P frame. Additionally, you just don't know if the
> decoder can recover.

I think we don't need to skip to next key frame. We should keep feeding data to decoder as much as possible. The worst case will be the decoder fails to output any frame within corrupted GOP but it should be recover at next I frame.
(In reply to Jean-Yves Avenard [:jya] from comment #6)
> additionally, the issue isn't just with the Apple VT decoder, but any
> decoder (we have reports of the AAC decoder sometimes failing to decode a
> frame).
> 
> So the solution to be taken isn't to be unique to Apple VT IMHO, but a
> generic one (so typically to be done in MediaFormatReader).

Agree.
(In reply to Alfredo Yang (:alfredo) from comment #7)

> I think we don't need to skip to next key frame. We should keep feeding data
> to decoder as much as possible. The worst case will be the decoder fails to
> output any frame within corrupted GOP but it should be recover at next I
> frame.

we need to be careful however that we just simply attempt to demux in a loop until we get a valid frame.
We need to limit the amount of errors we allow.
A reasonable first pass might simply be to raise some sort of JavaScript error so people would see things in the console or, better yet, window.onerror handlers could capture them.
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> (In reply to Alfredo Yang (:alfredo) from comment #7)
> 
> > I think we don't need to skip to next key frame. We should keep feeding data
> > to decoder as much as possible. The worst case will be the decoder fails to
> > output any frame within corrupted GOP but it should be recover at next I
> > frame.
> 
> we need to be careful however that we just simply attempt to demux in a loop
> until we get a valid frame.
> We need to limit the amount of errors we allow.

My current plan is:
1. Stop decoding immediately when key frame is corrupted.
2. Ignore corrupted data few times for current GOP. When corrupted data count exceeds the limitation, stop decoding.
3. Reset the corrupted data count at next GOP.

Any suggestion?
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Draft.

There are other errors in this video.
14:17, 3 errors in this GOP. Firefox stops playing at this case.
22:16, audio corrupted. This patch also handles it and plays silence at that time.
See Also: → 1258871
See Also: 1258871
GMP crashes at eme-adobe.dll while playing the test video on Windows.
It should be same as bug 1258871.
ffmpeg can't recover from malformat video.
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8733274 - Attachment is obsolete: true
Attachment #8735742 - Flags: review?(jyavenard)
Blocks: 1260667
Comment on attachment 8735742 [details] [diff] [review]
ignore_corrupted_data

Review of attachment 8735742 [details] [diff] [review]:
-----------------------------------------------------------------

sorry for the delay in doing the review, it had skipped my mind.

::: dom/media/MediaFormatReader.cpp
@@ +1319,5 @@
> +    task = NS_NewRunnableFunction([aTrack, self] () {
> +      MOZ_ASSERT(self->OnTaskQueue());
> +
> +      // Check if it exceeds corrupted data limitation in this GOP.
> +      if (++self->mNumOfCorruptedDataCount > self->mMaxCorruptedDataCount) {

I would prefer this to simply increase the error count if needed per decoder type, and have the process of handling error count in Update instead.

::: dom/media/MediaFormatReader.h
@@ +437,5 @@
> +  // mMaxCorruptedDataCount defines the max allowed error count per GOP.
> +  // mNumOfCorruptedDataCount is the current error count, it is reset at each
> +  // GOP.
> +  uint32_t mMaxCorruptedDataCount;
> +  uint32_t mNumOfCorruptedDataCount;

this should be set per decoder track. not global so there's no overlap between the audio and the video decoder.

Skipping audio decoding is I believe more important than skipping over video error as they are more common, especially with radio streams.

I have my doubt that any video decoders will be able to recover after an error, especially ffmpeg or apple VT.
Those are heavily multi-threaded, if one frame fail, that means you'll get as many frames failing as there are threads if you're lucky.

to be honest, I'm not sure attempting to recover from a decoding error simply by feeding more frames is going to work. With most h264 or vp9, all frames in a GOP depends on the previous one. If one fail, it's the whole GOP that is unrecoverable. 
You need to skip to the next keyframe is I believe the only way recovering is going to work.

Doing so for audio means that you attempt to decode the next frame immediately.

So when encountering an error, drain the decoder, and kick the skip to next keyframe logic. all the infrastructure is already in place to do so easily.

::: dom/media/platforms/PlatformDecoderModule.h
@@ +111,5 @@
>  // MediaFormatReader.
>  // Implementation is threadsafe, and can be called on any thread.
>  class MediaDataDecoderCallback {
>  public:
> +  enum DecodeErrorReason {

Please call this something else.
Just Error would be enough

and make it enum class
so you must do: Error::GENERIC_ERROR to avoid all ambiguities.

@@ +114,5 @@
>  public:
> +  enum DecodeErrorReason {
> +    GENERIC_ERROR,
> +    DECODE_ERROR
> +  };

I'm a bit unclear on what is a GENERIC_ERROR and a DECODE_ERROR.

How they are used appears rather random at times.

It appears to me that you treat DECODE_ERROR as a recoverable error, while GENERIC_ERROR is fatal.

So can't you use FATAL_ERROR instead? seems more appropriate to me.

::: dom/media/platforms/agnostic/OpusDecoder.cpp
@@ +125,5 @@
>  void
>  OpusDataDecoder::Decode(MediaRawData* aSample)
>  {
>    if (DoDecode(aSample) == -1) {
> +    mCallback->Error(MediaDataDecoderCallback::DecodeErrorReason::GENERIC_ERROR);

Why would that be a generic_error ?

for all audio decoder, you're far more likely going to be able to recover from an error than you can with video.

returning DECODING_ERROR for all audio decoder makes more sense to me.

::: dom/media/platforms/agnostic/VPXDecoder.cpp
@@ +177,5 @@
>  void
>  VPXDecoder::DecodeFrame(MediaRawData* aSample)
>  {
>    if (DoDecodeFrame(aSample) == -1) {
> +    mCallback->Error(MediaDataDecoderCallback::DecodeErrorReason::GENERIC_ERROR);

same here, why is this a GENERIC_ERROR ?

::: dom/media/platforms/apple/AppleVTDecoder.cpp
@@ +250,5 @@
>                                           &infoFlags);
>    if (rv != noErr && !(infoFlags & kVTDecodeInfo_FrameDropped)) {
>      LOG("AppleVTDecoder: Error %d VTDecompressionSessionDecodeFrame", rv);
>      NS_WARNING("Couldn't pass frame to decoder");
> +    mCallback->Error(MediaDataDecoderCallback::DECODE_ERROR);

why would VT be recoverable, but not VDA? I don't follow the logic.
Attachment #8735742 - Flags: review?(jyavenard) → review-
(In reply to Jean-Yves Avenard [:jya] from comment #16)
> Comment on attachment 8735742 [details] [diff] [review]
> ignore_corrupted_data
> 
> Review of attachment 8735742 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> sorry for the delay in doing the review, it had skipped my mind.
> 
> ::: dom/media/MediaFormatReader.cpp
> @@ +1319,5 @@
> > +    task = NS_NewRunnableFunction([aTrack, self] () {
> > +      MOZ_ASSERT(self->OnTaskQueue());
> > +
> > +      // Check if it exceeds corrupted data limitation in this GOP.
> > +      if (++self->mNumOfCorruptedDataCount > self->mMaxCorruptedDataCount) {
> 
> I would prefer this to simply increase the error count if needed per decoder
> type, and have the process of handling error count in Update instead.
> 
> ::: dom/media/MediaFormatReader.h
> @@ +437,5 @@
> > +  // mMaxCorruptedDataCount defines the max allowed error count per GOP.
> > +  // mNumOfCorruptedDataCount is the current error count, it is reset at each
> > +  // GOP.
> > +  uint32_t mMaxCorruptedDataCount;
> > +  uint32_t mNumOfCorruptedDataCount;
> 
> this should be set per decoder track. not global so there's no overlap
> between the audio and the video decoder.

I remove the mMaxCorruptedDataCount limitation in my patch.
The reason is the error could be variant according to its format, it is hard to decide a generic value for all formats.
Another decoder should try to decode as much as possible if demuxer thinks the data is valid. IMHO, it should be demuxer to stop playing, not decoder.
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8735742 - Attachment is obsolete: true
Attachment #8740812 - Flags: review?(jyavenard)
Comment on attachment 8740812 [details] [diff] [review]
ignore_corrupted_data

Review of attachment 8740812 [details] [diff] [review]:
-----------------------------------------------------------------

much simpler.

There are still few things that aren't right:
1- Lack of consistency in the severity of an error across platforms. On OMX, Windows and Android they are fatal. On other only OOM error are fatal. It needs to be consistent.
2- There's no threshold in how many errors you allow.

With 2, you could potentially end up demuxing and attempting to decode every single corrupted frame of a media to the end. I don't think that's right. We need to limit the amount of error we allow. Maybe limit the number of consecutive errors allowed? say 3 for audio and just 2 for video (so if we fail the first video frame, and then the next keyframe one we stop)

::: dom/media/MediaFormatReader.cpp
@@ +1129,5 @@
> +  if (decoder.mDecodeError) {
> +    LOG("%s decoded error count %d", TrackTypeToStr(aTrack));
> +    Flush(aTrack);
> +    if (aTrack == TrackType::kVideoTrack) {
> +      SkipVideoDemuxToNextKeyFrame(decoder.mLastSampleTime.ref());

decoder.mLastSampleTime will only ever be set if we have returned a sample to the MDSM.
So if the error occurs on the first frame this would cause an assert as mLastSampleTime.isNothing() is true.

So maybe mLastSampleTime.refOr(TimeUnit())

but having said that... this raises the question of what would happen if the error occurs on the first frame. This would cause us to jump on the next keyframe. so the first video frame returned wouldn't be at 0 while the MDSM and the MediaDecoderReaderWrapper is still calculating what the start time will be.

Couldn't this lead to broken A/V sync there? JW could answer that question. I think not, but just to be safe

@@ +1322,5 @@
> +      decoder.mDecodeError = true;
> +    });
> +  } else {
> +    task = NS_NewRunnableMethodWithArg<TrackType>(this,
> +                                                  &MediaFormatReader::NotifyError,

why don't you simply past aError as a new argument to NotifyError ? coud make it optional and by default be FATAL_ERROR.
And if aError is DECODE_ERROR just schedule an update.

would simplify the code and keep consistent with the handling of the existing error.

::: dom/media/platforms/PlatformDecoderModule.h
@@ +13,5 @@
>  #include "mozilla/layers/LayersTypes.h"
>  #include "nsTArray.h"
>  #include "mozilla/RefPtr.h"
>  #include <queue>
> +#include <functional>

what's this header for?

@@ +112,5 @@
>  // MediaFormatReader.
>  // Implementation is threadsafe, and can be called on any thread.
>  class MediaDataDecoderCallback {
>  public:
> +  enum class DecoderError {

that makes for unreasonable long enum name:
like MediaDataDecoderCallback::DecoderError::DECODER_ERROR

a simple enum would be easier.

MediaDataDecoderCallback::FATAL_ERROR is good enough

::: dom/media/platforms/android/AndroidDecoderModule.cpp
@@ +393,5 @@
>      if (State() == kDrainDecoder) { \
>        INVOKE_CALLBACK(DrainComplete); \
>        State(kDecoding); \
>      } \
> +    INVOKE_CALLBACK(Error, MediaDataDecoderCallback::DecoderError::FATAL_ERROR); \

any particular reason you want an error on android to be fatal?

In all, it makes different platform behave differently. I think it's not what we really want to do with gecko. We should aim at having the most consistent experience across all platform. that firefox should be the same, no matter if you're on windows, mac, android, linux

::: dom/media/platforms/apple/AppleATDecoder.cpp
@@ +186,5 @@
>  
>    if (rv == NS_OK) {
>      for (size_t i = 0; i < mQueuedSamples.Length(); i++) {
>        if (NS_FAILED(DecodeSample(mQueuedSamples[i]))) {
> +        mCallback->Error(MediaDataDecoderCallback::DecoderError::DECODE_ERROR);

i don't see the point of not returning early here.

you will attempt to continue decoding which will likely return multiple error, only to clear the queue

(having said that the typical use is a single frame in the queue)

so I think it's best to just return the error code, and do just like before

::: dom/media/platforms/apple/AppleVDADecoder.cpp
@@ +321,5 @@
>    }
>    MOZ_ASSERT(mQueuedSamples);
>    mQueuedSamples--;
>  
> +  if (aStatus != noErr) {

aImage will be null if you got an error, you can remove all this

@@ +330,3 @@
>    if (!aImage) {
>      // Image was dropped by decoder.
>      return NS_OK;

mind you, it's very frequent for the mac decoder on some hardware to return no frame and a status != NoErr

currently, all those errors were ignored.

your changes could potentially make them permanent.
If you do intend to mark those as error then you could do here: mCallback->Error(MediaDataDecoderCallback::DecoderError::DECODE_ERROR);

but I think it's going to cause us to regress

@@ +353,5 @@
>      // Lock the returned image data.
>      CVReturn rv = CVPixelBufferLockBaseAddress(aImage, kCVPixelBufferLock_ReadOnly);
>      if (rv != kCVReturnSuccess) {
>        NS_ERROR("error locking pixel data");
> +      mCallback->Error(MediaDataDecoderCallback::DecoderError::FATAL_ERROR);

I think this could be a recoverable error.

::: dom/media/platforms/apple/AppleVDADecoder.h
@@ +88,5 @@
>  
>    // Access from the taskqueue and the decoder's thread.
>    // OutputFrame is thread-safe.
>    nsresult OutputFrame(CVPixelBufferRef aImage,
> +                       AppleFrameRef aFrameRef,

unnecessary, see comment in AppleVDADecoder

::: dom/media/platforms/apple/AppleVTDecoder.cpp
@@ +174,5 @@
>    } else {
>      MOZ_ASSERT(CFGetTypeID(image) == CVPixelBufferGetTypeID(),
>        "VideoToolbox returned an unexpected image type");
>    }
> +  decoder->OutputFrame(image, *frameRef, status);

remove

::: dom/media/platforms/omx/OmxDataDecoder.cpp
@@ +449,5 @@
>  void
>  OmxDataDecoder::NotifyError(OMX_ERRORTYPE aError, const char* aLine)
>  {
>    LOG("NotifyError %d at %s", aError, aLine);
> +  mCallback->Error(MediaDataDecoderCallback::DecoderError::FATAL_ERROR);

same as with android, now everything on OMX is a fatal error, while other platform have more leeway.

any reasons for this choice?

::: dom/media/platforms/wmf/WMFMediaDataDecoder.cpp
@@ +159,5 @@
>        mCallback->InputExhausted();
>      }
>    } else if (FAILED(hr)) {
>      NS_WARNING("WMFMediaDataDecoder failed to output data");
> +    mCallback->Error(MediaDataDecoderCallback::DecoderError::FATAL_ERROR);

that would make all decoding on windows fatal for bother audio or video, shouldn't this be DECODE_ERROR instead?
Attachment #8740812 - Flags: review?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #19)
> ::: dom/media/platforms/android/AndroidDecoderModule.cpp
> @@ +393,5 @@
> >      if (State() == kDrainDecoder) { \
> >        INVOKE_CALLBACK(DrainComplete); \
> >        State(kDecoding); \
> >      } \
> > +    INVOKE_CALLBACK(Error, MediaDataDecoderCallback::DecoderError::FATAL_ERROR); \
> 
> any particular reason you want an error on android to be fatal?

It is for generic error in Android decoder. For decode error, it is on https://dxr.mozilla.org/mozilla-central/rev/67ac40fb8f680ea5e03805552187ba1b5e8392a1/dom/media/platforms/android/AndroidDecoderModule.cpp#632.
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8740812 - Attachment is obsolete: true
Attachment #8743103 - Flags: review?(jyavenard)
Comment on attachment 8743103 [details] [diff] [review]
ignore_corrupted_data

Review of attachment 8743103 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments addressed.

::: dom/media/MediaFormatReader.cpp
@@ +736,5 @@
>    // run of input than we input, decoders fire an "input exhausted" callback,
>    // which overrides our "few more samples" threshold.
>    return
>      !aDecoder.mDraining &&
> +    !aDecoder.HasFatalError() &&

NeedInput won't be called if you have a fatal error. it's an unnecessary check

@@ +1138,5 @@
> +    LOG("%s decoded error count %d", TrackTypeToStr(aTrack),
> +                                     decoder.mNumOfConsecutiveError);
> +    Flush(aTrack);
> +    if (aTrack == TrackType::kVideoTrack) {
> +      SkipVideoDemuxToNextKeyFrame(decoder.mLastSampleTime.refOr(TimeUnit()));

Previously, you could only call SkipVideoDemuxToNextKeyFrame if you had a pending video request promise.
However, following your changes with decode ahead when you get there that may not always be true and you would hit the assertion: 
MOZ_ASSERT(mVideo.HasPromise());

You need to update the range of asserts in SkipVideoDemuxToNextKeyFrame as you've changed the conditions during which SkipVideoDemuxToNextKeyFrame can be called

@@ +1290,5 @@
>         aSample->mKeyframe, aSample->mDuration);
>  
>    if (!aSample) {
>      NS_WARNING("MediaFormatReader::Output() passed a null sample");
> +    Error(aTrack, MediaDataDecoderError::FATAL_ERROR);

the default is FATAL_ERROR, do you need to specify it again there when you haven't specify it elsewhere under similar circumstances?

@@ +1380,5 @@
>      mDecoder->NotifyDecodedFrames(aSkipped, 0, aSkipped);
>    }
>    mVideo.mNumSamplesSkippedTotal += aSkipped;
>    mVideo.mNumSamplesSkippedTotalSinceTelemetry += aSkipped;
> +  MOZ_ASSERT(!mVideo.HasFatalError()); // We have flushed the decoder, no frame could

It should be mVideo.mError.isNothing()

you have called Flush() prior, which would have reset mVideo.mError. The comment is still valid.
Attachment #8743103 - Flags: review?(jyavenard) → review+
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8743103 - Attachment is obsolete: true
Attachment #8744821 - Flags: review+
Attached patch test_corrupted_video_playback (obsolete) — Splinter Review
Attachment #8745157 - Flags: review?(jyavenard)
Attached patch remove_invalid_test (obsolete) — Splinter Review
Attachment #8745159 - Flags: review?(jyavenard)
Comment on attachment 8745159 [details] [diff] [review]
remove_invalid_test

Review of attachment 8745159 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not certain why we should accept those files when they are clearly invalid and against the spec.

This is not about an invalid frame here, but an invalid file.

I'll pass the decision to :kinetik as he knows far more about webm than I do.

But my gut feel is that the reader should still reject those and mark those errors as fatal. The test isn't invalid.
Attachment #8745159 - Flags: review?(jyavenard) → review?(kinetik)
Comment on attachment 8745157 [details] [diff] [review]
test_corrupted_video_playback

Review of attachment 8745157 [details] [diff] [review]:
-----------------------------------------------------------------

r- on the provisio that you now have a test that gives different behaviour according to the platform, when I believe we should be driving to a behaviour and user experience independent of the platform used.

::: dom/media/test/mochitest.ini
@@ +622,5 @@
>  [test_closing_connections.html]
>  [test_constants.html]
>  [test_controls.html]
> +[test_corrupted_file_play.html]
> +skip-if = os == 'mac'

why would you not run this test on mac ?

::: dom/media/test/test_corrupted_file_play.html
@@ +9,5 @@
> +  <script type="text/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
> +  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css" />
> +</head>
> +<body>
> +<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=975978">Mozilla Bug 1068151</a>

wrong link, it's pointing to bug 975978

@@ +17,5 @@
> +
> +SimpleTest.waitForExplicitFinish();
> +
> +var resource =
> +  { name:"water_corrupted_video.mp4", type:"video/mp4", corrupttime:16};

can you please add a comment on why 16?

@@ +26,5 @@
> +  v.currentTime = resource.corrupttime;
> +}
> +
> +v.addEventListener("error", function(ev) {
> +  ok(false, "decoder should be able to play this corrutped file.")

corrupted file

@@ +31,5 @@
> +  SimpleTest.finish();
> +}, false);
> +
> +v.addEventListener("seeked", function(ev) {
> +  ok(true, "decoder played corrutped file successfully.")

corrupted
Attachment #8745157 - Flags: review?(jyavenard)
Comment on attachment 8745159 [details] [diff] [review]
remove_invalid_test

These are files with invalid constructions that we have explicitly decided to reject, mainly to ensure muxers producing these files are spec compliant.  I don't understand why we would want to remove these tests or accept these files.
Attachment #8745159 - Flags: review?(kinetik) → review-
(In reply to Matthew Gregan [:kinetik] from comment #32)
> Comment on attachment 8745159 [details] [diff] [review]
> remove_invalid_test
> 
> These are files with invalid constructions that we have explicitly decided
> to reject, mainly to ensure muxers producing these files are spec compliant.
> I don't understand why we would want to remove these tests or accept these
> files.

This patch makes MediaFormatReader to ignore decoder error to a degree so the browser keeps playing as much as possible.
And I found opus decoder returns error at [1] and [2] when playing these files.
I can modify patch, so it returns fatal error at these two conditions and other errors in OpusDataDecoder::DoDecode are tolerable.
Does that make sense?

[1] https://dxr.mozilla.org/mozilla-central/rev/77cead2cd20300623eea2416bc9bce4d5021df09/dom/media/platforms/agnostic/OpusDecoder.cpp#232
[2] https://dxr.mozilla.org/mozilla-central/rev/77cead2cd20300623eea2416bc9bce4d5021df09/dom/media/platforms/agnostic/OpusDecoder.cpp#246
Flags: needinfo?(kinetik)
(In reply to Alfredo Yang (:alfredo) from comment #33)
> This patch makes MediaFormatReader to ignore decoder error to a degree so
> the browser keeps playing as much as possible.
> And I found opus decoder returns error at [1] and [2] when playing these
> files.
> I can modify patch, so it returns fatal error at these two conditions and
> other errors in OpusDataDecoder::DoDecode are tolerable.
> Does that make sense?

If that's sufficient to reject those invalid files such that test_invalid_reject_play passes, sure.
Flags: needinfo?(kinetik)
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ccc979527c3e&selectedJob=20417146
Attachment #8744821 - Attachment is obsolete: true
Attachment #8745157 - Attachment is obsolete: true
Attachment #8745159 - Attachment is obsolete: true
Attachment #8749547 - Flags: review+
See Also: → 1270747
Blocks: 1270748
Keywords: checkin-needed
Keywords: checkin-needed
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Rebase.
Attachment #8749547 - Attachment is obsolete: true
Attachment #8749564 - Flags: review+
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Rebase.
Attachment #8749564 - Attachment is obsolete: true
Attachment #8749565 - Flags: review+
Keywords: checkin-needed
sorry had to back this out for assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27373708&repo=mozilla-inbound
Flags: needinfo?(ayang)
The assertion occurs because I removed the check for mVideo.mError in SkipVideoNextKeyFram (or something like that) which would have immediately rejected the MediaDataPromiae. Having mError set when Request*Data is called would typically have been an abuse of the MediaDecoderReader API as all errors were fatal prior this bug. 

Changing this assert so that it only checks that the error isn't a fatal one should be sufficient. Or remove the assert altogether as the error will be handled during the next run of Update.
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Assert only if fatal error.
Attachment #8749565 - Attachment is obsolete: true
Flags: needinfo?(ayang)
Attachment #8750167 - Flags: review+
Keywords: checkin-needed
As per IRC, theres an issue I had overlooked:

you also need to change in MediaFormatReader::SkipVideoDemuxToNextKeyFrame ; that you only exit early if the error is fatal
https://hg.mozilla.org/integration/mozilla-inbound/file/tip/dom/media/MediaFormatReader.cpp#l1413
otherwise, the behaviour will be racy. it could be that at the time the SkipVideoDemuxeToNExtKeyframe kicks in, you have a recoverable error pending and it will exit, when really it shouldn't
I also think there there's another problem at hand
Say an error just occurred, and mVideo.mError is set to a recoverable error. if SkipVideoDemuxToNextKeyFrame happens to be called right at that time; it will flush the decoder. But that leave mError set so when you get to handle mError, you will flush the decoder and skip to the next key frame one more time so you end up skipping two keyframes.
As it is, if SkipVideoDemuxToNextKeyFrame is triggered, your non-fatal error are really becoming fatal as the media element will receive an error
Keywords: checkin-needed
You can probably not do anything at all when mError is set ; you do not want to reject the promise if the error is recoverable.
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8750167 - Attachment is obsolete: true
Attachment #8750234 - Flags: feedback?(jyavenard)
Comment on attachment 8750234 [details] [diff] [review]
ignore_corrupted_data

Review of attachment 8750234 [details] [diff] [review]:
-----------------------------------------------------------------

Please fix commit message "malformat video" doesn't mean much.

What about: continue decoding when encountering recoverable error.

In addition to my earlier review.

OOM and arithmetic overflow errors should be made fatal. Same for audio errors where the channel configuration was found to be invalid. We have no chance of recovering from that, and some of the later code expect that those errors were terminal.

So please review all the call to Error to ensure that an OOM and overflow have been made fatal.

::: dom/media/MediaFormatReader.cpp
@@ +1185,5 @@
> +      return;
> +    }
> +    LOG("%s decoded error count %d", TrackTypeToStr(aTrack),
> +                                     decoder.mNumOfConsecutiveError);
> +    decoder.Flush();

this should be if (aTrack == TrackType::kVideoTrack) { ... } else { decoder.Flush() } as SkipVideoDemuxerToNextKeyFrame will already flush the decoder.

@@ +1446,5 @@
>      mDecoder->NotifyDecodedFrames(aSkipped, 0, aSkipped);
>    }
>    mVideo.mNumSamplesSkippedTotal += aSkipped;
>    mVideo.mNumSamplesSkippedTotalSinceTelemetry += aSkipped;
> +  MOZ_ASSERT(!mVideo.HasFatalError()); // We have flushed the decoder, no frame could

it is fine to leave the assert, because the decoder has been flush and we should never reach here if mError.isSome() is true.

::: dom/media/MediaFormatReader.h
@@ +309,5 @@
> +    uint32_t mNumOfConsecutiveError;
> +    uint32_t mMaxConsecutiveError;
> +
> +    Maybe<MediaDataDecoderError> mError;
> +    bool HasFatalError()

this function should be const

@@ +311,5 @@
> +
> +    Maybe<MediaDataDecoderError> mError;
> +    bool HasFatalError()
> +    {
> +      if (mError.isSome() && mError.ref() == MediaDataDecoderError::FATAL_ERROR) {

return mError.isSome() && mError.ref() == MediaDataDecoderError::FATAL_ERROR;

@@ +397,5 @@
>        mNumSamplesInput = 0;
>        mNumSamplesOutput = 0;
>        mSizeOfQueue = 0;
>        mNextStreamSourceID.reset();
> +      if (mError.isSome() && mError.ref() == MediaDataDecoderError::DECODE_ERROR) {

if (!HasFatalError())

::: dom/media/platforms/agnostic/OpusDecoder.cpp
@@ +145,5 @@
> +  DecodeError err = DoDecode(aSample);
> +  if (err == DecodeError::FATAL_ERROR) {
> +    mCallback->Error(MediaDataDecoderError::FATAL_ERROR);
> +  } else {
> +    if (err == DecodeError::DECODE_ERROR) { 

trailing space

@@ +194,5 @@
>    // A valid Opus packet must be between 2.5 and 120 ms long (48kHz).
>    int32_t frames = frames_number*samples;
>    if (frames < 120 || frames > 5760) {
>      OPUS_DEBUG("Invalid packet frames: %ld", frames);
> +    return DECODE_ERROR;

this should be a fatal error; it's incorrectly muxed

@@ +233,5 @@
>    }
>  
>    if (aDiscardPadding < 0) {
>      // Negative discard padding is invalid.
> +    // Send fatal error to make sure reader rejects this file.

I think those extra comments are redundant. The type of error is rather explicit

::: dom/media/platforms/agnostic/gmp/GMPAudioDecoder.cpp
@@ +30,5 @@
>    MOZ_ASSERT(IsOnGMPThread());
>  
>    if (aRate == 0 || aChannels == 0) {
>      NS_WARNING("Invalid rate or num channels returned on GMP audio samples");
> +    mCallback->Error(MediaDataDecoderError::DECODE_ERROR);

those are fatal errors. we can't recover from that.

@@ +38,5 @@
>    size_t numFrames = aPCM.Length() / aChannels;
>    MOZ_ASSERT((aPCM.Length() % aChannels) == 0);
>    AlignedAudioBuffer audioData(aPCM.Length());
>    if (!audioData) {
> +    mCallback->Error(MediaDataDecoderError::DECODE_ERROR);

this is an OOM, not sure if there's any point retrying from that.

@@ +51,5 @@
>      mAudioFrameSum = 0;
>      auto timestamp = UsecsToFrames(aTimeStamp, aRate);
>      if (!timestamp.isValid()) {
>        NS_WARNING("Invalid timestamp");
> +      mCallback->Error(MediaDataDecoderError::DECODE_ERROR);

that too should be fatal, we have an overflow error, we can't recover from this ; and all it means is that the next packet may succeed because the overflow has already occurred.

@@ +69,5 @@
>  
>    auto duration = FramesToUsecs(numFrames, aRate);
>    if (!duration.isValid()) {
>      NS_WARNING("Invalid duration on audio samples");
> +    mCallback->Error(MediaDataDecoderError::DECODE_ERROR);

this should be fatal too. as the error is an overflow.

@@ +204,5 @@
>    MOZ_ASSERT(IsOnGMPThread());
>  
>    RefPtr<MediaRawData> sample(aSample);
>    if (!mGMP) {
> +    mCallback->Error(MediaDataDecoderError::FATAL_ERROR);

this is an OOM. can't recover

::: dom/media/platforms/ffmpeg/FFmpegAudioDecoder.cpp
@@ +129,5 @@
>      if (decoded) {
>        uint32_t numChannels = mCodecContext->channels;
>        AudioConfig::ChannelLayout layout(numChannels);
>        if (!layout.IsValid()) {
> +        mCallback->Error(MediaDataDecoderError::DECODE_ERROR);

this is a fatal error. we have no chance of recovering from that.
Attachment #8750234 - Flags: feedback?(jyavenard)
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Attachment #8750234 - Attachment is obsolete: true
Attachment #8750732 - Flags: feedback?(jyavenard)
Attached patch ignore_corrupted_data (obsolete) — Splinter Review
Rebase
Attachment #8750732 - Attachment is obsolete: true
Attachment #8750732 - Flags: feedback?(jyavenard)
Attachment #8751082 - Flags: feedback?(jyavenard)
Comment on attachment 8751082 [details] [diff] [review]
ignore_corrupted_data

Review of attachment 8751082 [details] [diff] [review]:
-----------------------------------------------------------------

see comments. r=me with the comments applied.

We really need mochitest to test the behaviour... for both audio and video.

Please provide some before committing, or open a new follow up bug... It's too much changes without properly testing that.

::: dom/media/MediaFormatReader.cpp
@@ +512,5 @@
>    if (!HasVideo()) {
>      LOG("called with no video track");
>      return MediaDataPromise::CreateAndReject(DECODE_ERROR, __func__);
>    }
>  

MozReview is much nicer as I could add the comment in line where it needs to be.

See comments in SkipVideoDemuxToNextK.

You'll want to not even call SkipVideoDemuxToNextK

something like:
  if (!mVideo.mError && !mVideo.HasInternalSeekPending() &&
      ShouldSkip(aSkipToNextKeyframe, timeThreshold)) {
    RefPtr<MediaDataPromise> p = mVideo.EnsurePromise(__func__);
    SkipVideoDemuxToNextKeyFrame(timeThreshold);
    return p;
  }

@@ -738,5 @@
>    // run of input than we input, decoders fire an "input exhausted" callback,
>    // which overrides our "few more samples" threshold.
>    return
>      !aDecoder.mDraining &&
> -    !aDecoder.mError &&

shouldn't this stay but have !aDecoder.HasFatalError()

@@ +1188,5 @@
> +                                     decoder.mNumOfConsecutiveError);
> +    if (aTrack == TrackType::kVideoTrack) {
> +      SkipVideoDemuxToNextKeyFrame(decoder.mLastSampleTime.refOr(TimeUnit()));
> +    } else {
> +      Reset(aTrack);

Do you have regression with files containing audio errors?

Because here I don't see how it's going to work for the audio track and resume from an error.

For video it's fine.

But for audio, you're calling reset. which will reset the decoder state and flush it. This will also set mDecodingRequested and mOutputRequested to false. So from no one, there will be no more decoding happening.

Also, as you're returning right after, the MFR will stop doing any new tasks for the audio track.

After Reset(aTrack) you need to request new data, you do this with a call to NotifyDecodingRequested(aTrack);. then you can return.

@@ +1426,5 @@
>      LOGV("Internal Seek pending, cancelling it");
>    }
>    Reset(TrackInfo::kVideoTrack);
>  
>    if (mVideo.mError) {

Upon thinking about it more, this is not going to work either.

Reset will clear mNeedOutput and mDecodingRequested.
so from that point on, decoding will stop, and it won't restart.

so it's likely best to remove this test completely, and in RequestVideoData, if we're in error, we just won't even bother with skipping to the next keyframe (it will be done once Update get to run anyway)

::: dom/media/platforms/agnostic/OpusDecoder.cpp
@@ +142,5 @@
>  void
>  OpusDataDecoder::Decode(MediaRawData* aSample)
>  {
> +  DecodeError err = DoDecode(aSample);
> +  if (err == DecodeError::FATAL_ERROR) {

switch (err) please...

::: dom/media/platforms/apple/AppleVDADecoder.cpp
@@ +493,5 @@
>                                   frameInfo);
>  
>    if (rv != noErr) {
>      NS_WARNING("AppleVDADecoder: Couldn't pass frame to decoder");
> +    mCallback->Error(MediaDataDecoderError::FATAL_ERROR);

shouldn't this be DECODE_ERROR ? so it's like with VT
Attachment #8751082 - Flags: feedback?(jyavenard) → feedback+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7e6f2545d6e3
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Depends on: 1277212
Depends on: 1307013
See Also: → 1364872
You need to log in before you can comment on or make changes to this bug.