Closed Bug 1576990 Opened 4 months ago Closed 3 months ago

Slack video frames get stuck

Categories

(Core :: Audio/Video: Playback, defect, P1)

Unspecified
Windows
defect

Tracking

()

VERIFIED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 + wontfix
firefox70 + verified
firefox71 --- verified

People

(Reporter: cpeterson, Assigned: bryce)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

[Tracking Requested - why for this release]:

Jean-Yves, this is a regression in Firefox 69 from Gfycat bug 1560440. I bisected this regression to this pushlog in 70 Nightly, though the fix (and this regression) were also uplifted to 69 Beta:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3179c40a2a25fdba731bed824fb4352bb8209e3b&tochange=c52127007a0162c72b87a8d4613400d3741bcd5f

STR:

  1. Play this btnn.mp4 video in Slack: https://app.slack.com/client/T027LFU12/C027LFU20/files/FMRGE3E7J

Expected Result:

The video will play smoothly, showing a screen capture of Firefox for Android scrolling from left to right through the "WELCOME", "PRIVACY", and "ACCOUNT" pages.

Actual Result:

In Firefox 69 and 70 on Windows, the video starts playback at time 0:01 instead of 0:00 and then gets stuck on a frame showing the "PRIVACY" page from time 0:04 to the end time 0:08.

Flags: needinfo?(jyavenard)

Bryce, do you have cycles to take a look at this?

Flags: needinfo?(bvandyk)
Priority: -- → P1

Acknowledging that I see this and that it's next on my queue. Holding the NI.

Attached video btnn.mp4

Attaching video to the bug to facilitate more testing.

Chris, do you know the source of the video?

Flags: needinfo?(cpeterson)

Notes as I debug.

Comparing inputs and outputs with current code or with the regressing code backed out, first the current state (note we don't log duration on input because it's no longer passed -- this is due to the regressing bug):

2019-09-05 20:58:03.406000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 0
2019-09-05 20:58:03.439000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 0, duration: 33366
2019-09-05 20:58:03.440000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 1388433
2019-09-05 20:58:03.448000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 1388433, duration: 1388433
2019-09-05 20:58:03.449000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 1401811
2019-09-05 20:58:03.455000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 1401811, duration: 13378
2019-09-05 20:58:03.456000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3630844
2019-09-05 20:58:03.463000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 3630844, duration: 2229033
2019-09-05 20:58:04.839000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3645955
2019-09-05 20:58:04.857000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 3645955, duration: 15111
2019-09-05 20:58:06.227000 UTC - [GPU 88320: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3661277
...

Backing out changes:

2019-09-05 21:08:39.838000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 0, duration: 1388433
2019-09-05 21:08:39.879000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 0, duration: 1388433
2019-09-05 21:08:39.880000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 1388433, duration: 13378
2019-09-05 21:08:39.888000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 1388433, duration: 13378
2019-09-05 21:08:39.889000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 1401811, duration: 2229033
2019-09-05 21:08:39.894000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 1401811, duration: 2229033
2019-09-05 21:08:39.895000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3630844, duration: 15111
2019-09-05 21:08:39.902000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 3630844, duration: 15111
2019-09-05 21:08:41.279000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3645955, duration: 15322
2019-09-05 21:08:41.290000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Output Time: 3645955, duration: 15322
2019-09-05 21:08:41.292000 UTC - [GPU 89220: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 3661277, duration: 16556
...

where the time value being logged is the display timestamp. The display timestamps are all the same, but the durations on the outputs differ -- it looks like we're getting the same durations, offset by one sample.

Not sure what to make of this yet.

Chris, do you know the source of the video?

Vlad, what tool did you use to create the video demonstrating Fennec's new Onboarding Sign-in buttons? A code change in Firefox 69 causes this video to get stuck during playback.

Flags: needinfo?(cpeterson) → needinfo?(vlad.baicu)

Hello, I used Android Studio's video recording tool

Flags: needinfo?(vlad.baicu)

Thanks.

Thinking about comment 4, my hypothesis is that the decoder is attempting to infer durations from previous samples, which would cause the off-by-one behaviour. I.e. if sample 0 should have duration d0, sample 1 duration d1, and so on, then when the metadata is correct and we feed it in, each sample has the correct duration on output. However, if we do not feed that information in, the decoder instead infers durations from the spacing between samples. When decoding sample 1 it can look at the delta between sample 0 and sample 1 -- which is equivalent to d0, the duration of sample 0 -- and it then assumes this is a sensible duration to give sample 1. Thus each sample has an incorrect duration that appears shifted by one.

This inference works in bug 1560440 because all the samples in that mp4 have roughly the same duration, but it will be problematic for mp4s that do not have uniform sample durations.

Ideally we can find an approach that works for both this bug and bug 1560440. I'm going to look more into what's going on with that regressing bug. It looks to me like we're getting some odd values from our metadata parsing. Specifically, we're producing results like this:

...
2019-09-06 17:53:40.714000 UTC - [GPU 82856: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 419326, duration: 0
2019-09-06 17:53:40.749000 UTC - [GPU 82856: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 384382, duration: 34944
2019-09-06 17:53:40.783000 UTC - [GPU 82856: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 349438, duration: 34944
2019-09-06 17:53:40.820000 UTC - [GPU 82856: MediaPDecoder #1]: D/PlatformDecoderModule Input Time: 419326, duration: 34944
...

^ this logging starts from sample 10 of the media from bug 1560440. We've got a 0 duration sample at presentation time 419326, but later we get another sample at that same presentation time with a non-0 duration. Seems odd to me; will investigate further.

I believe comment 7 summarizes why this regression took place. Based on that information I believe we should feed duration information to the WMF decoder, in order to support files such as this one.

However, in order to support files such as those in bug 1560440, we will need to adjust how we handle durations. Feeding the WMF decoder a 0 duration appears to lead to unpredictable results. Bug 1560440's fix works because it avoids feeding a 0 duration into the decoder and instead lets the decoder infer a duration. The media in that bug has metadata that appears to me to be bogus: it leads to multiple samples having the same presentation timestamp. Running the following (where the video file is a copy of the video from the bug with only the video track)

ffprobe.exe 1560440Video-video-only.mp4 -show_frames

I get output such as:

...
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=7464
pkt_pts_time=0.419326
pkt_dts=7464
pkt_dts_time=0.419326
best_effort_timestamp=7464
best_effort_timestamp_time=0.419326
pkt_duration=622
pkt_duration_time=0.034944
pkt_pos=107342
pkt_size=1439
width=640
height=802
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=B
coded_picture_number=13
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=7464
pkt_pts_time=0.419326
pkt_dts=7465
pkt_dts_time=0.419382
best_effort_timestamp=7465
best_effort_timestamp_time=0.419382
pkt_duration=622
pkt_duration_time=0.034944
pkt_pos=85733
pkt_size=13333
width=640
height=802
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=P
coded_picture_number=10
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
...

Note that the presentation timestamps on those samples/frames are the same. Note also that the duration in the metadata doesn't make a lot of sense, in that those durations would cause the frames to overlap. Note also that libavcodec's best effort timestamp moves the second frame one time unit later so they don't exactly overlap.

When we parse the metadata and calculate the end composition time, we end up overlapping these samples. This results in us calculating a 0 composition range, and thus a 0 duration further down the pipeline. When we feed that 0 duration into the WMF decoder we get the odd behaviour observed in bug 1560440.

:kinetik, :jya, are there cases where it makes sense to have samples exactly overlapping? My thought to address this is that we could, in the mp4 metadata parser, detect exactly overlapping samples and offset them by 1 time unit, as libavcodec appears to do with its best effort timestamp.

Flags: needinfo?(kinetik)

Sounds like this is an issue with specific video files and not a more widespread issue. Taking this off the Fx69 dot release radar.

(In reply to Bryce Seager van Dyk (:bryce) from comment #8)

:kinetik, :jya, are there cases where it makes sense to have samples exactly overlapping?

I'm not aware of one, but I don't really know the answer to this.

My thought to address this is that we could, in the mp4 metadata parser, detect exactly overlapping samples and offset them by 1 time unit, as libavcodec appears to do with its best effort timestamp.

That sounds reasonable. Do you know if the WMF decoder behaves correctly if we feed it samples with a "corrected"/offset time and a very short (1 unit) duration?

Flags: needinfo?(kinetik)

(In reply to Matthew Gregan [:kinetik] from comment #10)

(In reply to Bryce Seager van Dyk (:bryce) from comment #8)

:kinetik, :jya, are there cases where it makes sense to have samples exactly overlapping?

I'm not aware of one, but I don't really know the answer to this.

Thanks. The only case I could think of is that some codecs may wish to have frames that are never displayed but are referenced by other frames. In that case my 1 duration approach below would be non-ideal, but possibly a worthwhile trade-off: it would avoid bustages while only showing such frames for a very small amount of time.

My thought to address this is that we could, in the mp4 metadata parser, detect exactly overlapping samples and offset them by 1 time unit, as libavcodec appears to do with its best effort timestamp.

That sounds reasonable. Do you know if the WMF decoder behaves correctly if we feed it samples with a "corrected"/offset time and a very short (1 unit) duration?

In the case of the two bugs here, this does seem to work. I.e. if I change duration 0 samples to duration 1 in bug 1560440 then that file seems to work (even without changing timestamps, so we still have overlapping samples). I'd considered just hacking the WMF decoder with that change as a fix. My thinking is that updating the metadata parsing to offset timestamps is a more hygienic extension of that idea.

I've been testing solutions for getting both of the cases detailed above working and have arrived at a simple, dumb, and hopefully effective solution: set the duration on samples, detect 0 duration samples before feeding the decoder, and give those samples a 1 microsecond duration. In most cases this should work fine and we won't have 0 durations; in the cases where we do have 0 duration samples, it is better that we make the intentional decision to limit those samples to a very small duration, rather than letting the decoder decide and give us a possibly large duration. Patch incoming.
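The clamping step of that solution is tiny; a minimal sketch (simplified -- the real patch operates on the sample objects in the WMF decoder module, and this function name is made up):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the fix described above: before feeding a sample to the WMF
// decoder, replace a 0 duration with 1 microsecond so the decoder never
// falls back to estimating durations itself.
int64_t SanitizeDurationUs(int64_t aDurationUs) {
  return aDurationUs == 0 ? int64_t(1) : aDurationUs;
}
```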

For posterity, I'd also considered some other things:

  • Detecting 0 duration samples/overlapping samples in rustmp4parse and determining a heuristic duration for them so that every sample gets a duration. FFmpeg appears to do something like this with its best effort timestamp on AVFrame. I think this could work, but I was rabbit holing on different edge cases (what happens if multiple 0 duration samples are stacked on each other?).
  • Trying to match input and output samples from the decoder, so we can reset the duration afterwards. Chromium's DXVA decoder does this by overloading the timestamp on input samples, using it to store a unique ID, then matching the timestamp on output samples. This seems to work, though they have at least one report of the decoder altering the timestamp (in a remote desktop environment) and breaking their pipeline.

I suppose if my attempted fix is not suitable we may need to consider something like the above.
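For reference, the Chromium-style input/output matching described in the second bullet could be sketched like this (hypothetical types and names; not Chromium's or Gecko's actual code):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Overload the input timestamp as a unique ID, then restore the real
// timestamp/duration when the matching output sample comes back from
// the decoder. Fails if the decoder alters the timestamp in flight.
struct SampleTiming {
  int64_t timestampUs;
  int64_t durationUs;
};

class TimingTracker {
 public:
  // Record the real timing; return the ID to feed to the decoder in
  // place of the timestamp.
  int64_t OnInput(const SampleTiming& aTiming) {
    int64_t id = mNextId++;
    mPending[id] = aTiming;
    return id;
  }

  // Look up the real timing for an output sample by its ID. Returns
  // false if the decoder changed the timestamp and broke the mapping.
  bool OnOutput(int64_t aId, SampleTiming& aOut) {
    auto it = mPending.find(aId);
    if (it == mPending.end()) {
      return false;
    }
    aOut = it->second;
    mPending.erase(it);
    return true;
  }

 private:
  int64_t mNextId = 0;
  std::unordered_map<int64_t, SampleTiming> mPending;
};
```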

Flags: needinfo?(jyavenard)
Flags: needinfo?(bvandyk)

This is a back out of changeset c52127007a01, i.e. reverts bug 1560440. That bug
was itself reverting a prior bug where we started setting the duration.

There are two scenarios involved, each with their own issues:

  • We don't set the duration when feeding the decoder (what we're doing prior to
    this change). In this case the decoder will estimate durations for all
    samples. This works in many cases, but leads to issues with files that have
    non-uniform durations - as seen in bug 1576990 (this bug).
  • We do set the duration when feeding the decoder (what will happen after this
    change). In this case the decoder will not estimate durations, except it still
    seems to do so when fed 0 duration samples. Since our demuxing pipeline can
    create 0 duration samples, this leads to issues with such files (as seen in bug
    1560440).

This patch fixes the former of the above, and the next patch in the chain will
address the latter.

Set a trivial duration (1us) on 0 duration samples before feeding them to the
WMF MFT decoder to prevent it from estimating the duration for such samples.

Depends on D46515

Assignee: nobody → bvandyk
Pushed by bvandyk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3f2fc282b9c7
Set durations on frames for the WMF decoder. r=dminor
https://hg.mozilla.org/integration/autoland/rev/6028bf4f8a43
Stop the WMF decoder estimating durations on 0 duration samples. r=dminor
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
Flags: qe-verify+

I have reproduced the issue in Beta v70.0b9 and verified the fix in Nightly v71.0a1 from 2019-09-24 on Windows 10. It does not occur on macOS.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(bvandyk)

Comment on attachment 9094014 [details]
Bug 1576990 - Set durations on frames for the WMF decoder. r?dminor,kinetik

Beta/Release Uplift Approval Request

  • User impact if declined: MP4s with metadata that indicates variable durations for samples may fail to play correctly on Windows. We've seen this manifest as the durations being off by one, such that the (n+1)th frame gets the duration of the nth frame -- see the video on this bug in affected versions.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Just the other patch on this bug.
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): There is moderate risk here: this fix is a hack, and while I'm confident that it does the right thing in the case of this bug and bug 1560440, I think the risk remains of unintended changes in behaviour. This is because the Windows decoder we're working with here often behaves as a black box, and does not provide an ideal API to us -- we'd like to be able to track samples through the decoder, but it doesn't let us.

Due to the diversity of mp4 content it's difficult to be sure this doesn't regress anything. On the other hand, this does fix a known regression, without introducing any known issues.

  • String changes made/needed: None.
Flags: needinfo?(bvandyk)
Attachment #9094014 - Flags: approval-mozilla-beta?
Attachment #9094015 - Flags: approval-mozilla-beta?

Comment on attachment 9094014 [details]
Bug 1576990 - Set durations on frames for the WMF decoder. r?dminor,kinetik

Moderate risk but I think worth taking for beta 10.
If we see regressions (especially if they're worse than this issue) then we may need to back out before 70 release.

Attachment #9094014 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9094015 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

I have also verified this fix in Beta v70.0b10.

Flags: qe-verify+

Setting the "qe-verify+" tag back in case this gets uplifted to ESR68.

Flags: qe-verify+

Seems edge-case and risky enough that we can skip uplifting for ESR68.
