Closed Bug 1722171 Opened 6 months ago Closed 4 months ago

YouTube videos intermittently stutter

Categories

(Core :: Audio/Video: Playback, defect, P2)

Firefox 91
Desktop
Windows 10
defect

Tracking

()

VERIFIED FIXED
95 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- verified
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- verified
firefox95 --- verified

People

(Reporter: kael, Assigned: alwu)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [media-youtube])

Attachments

(6 files)

Attached file about:support

For the last week or two I've been experiencing frequent video stutters on YouTube, typically one right at the start of playback and a couple over the course of the video. The video display hitches for 200-1500ms, while the audio keeps working, and then it skips the dropped frames and keeps going.

I've verified that this happens on my machine in a fresh profile (within 3 seconds of starting the first video I tried), even at a low quality level (at the time youtube was playing 240p, but I see it on 1080p and 4k videos). Happens on both of my monitors.

I don't think this is associated with a video driver update, though I'll try installing the latest nv drivers because I think there are new ones.

Attaching about:support from the test profile

According to YouTube's 'stats for nerds' the codecs in use are vp9 and opus. It actually seems to happen consistently every time I open a video, even if it's the same one (i.e. it should be in cache). This could be a buffering issue somehow related to EME, but if so I would expect the audio to drop too, and my connection is very fast (gigabit, 288000kbps according to Youtube) so I wouldn't expect buffering issues especially since it sometimes happens during playback later on. When these hitches happen mid-playback I don't see the YouTube buffering indicator either.

I do see some CORS failures in the debug console that seem like they could be related, but I'm not sure:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://r5---sn-n4v7knls.googlevideo.com/videoplayback?expire=1627166976&ei=oET8YO-MJ-aRsfIP-e-Y4Ak&ip=8.17.89.28&id=o-ANxo-4FFWAfc1gp0xhxs3fsQt7AJ5OVzPZi2E5kaRp26&itag=303&aitags=133%2C134%2C135%2C136%2C160%2C242%2C243%2C244%2C247%2C278%2C298%2C299%2C302%2C303&source=youtube&requiressl=yes&mh=Sh&mm=31%2C29&mn=sn-n4v7snee%2Csn-n4v7knls&ms=au%2Crdu&mv=m&mvi=5&pl=21&ctier=A&pfa=5&initcwndbps=2935000&hightc=yes&vprv=1&mime=video%2Fwebm&ns=4xFjfaRU6iI_qarMSrQGmIkG&gir=yes&clen=601534414&dur=1444.583&lmt=1627104652716481&mt=1627145082&fvip=5&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=6432432&n=ZvfXLBpxIZE6vg&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cctier%2Cpfa%2Chightc%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIgNl9IaYragSZdQqVpHWE_uihv-Wo7v9LkR4LsXz396NMCIQDoRUbxnXR7OPnF5A1r7VAaUe5VwhbFBl7IC5sVG454GA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgRgqxYBN0ej7IsK9Na151p_Pi7RswvuYRqWVmxvLvgDoCIQCkm7EqUj2roXY0tCn1PeMsE35ik8hC0QZ5kyXZ016Acg%3D%3D&alr=yes&cpn=jOVb_X8owB7yB8Vt&cver=2.20210721.00.00&fallback_count=1&range=1121381-1529131&rn=4&rbuf=2197. (Reason: CORS request did not succeed).

For the last week or two I've been experiencing frequent video stutters on YouTube, typically one right at the start of playback and a couple over the course of the video. The video display hitches for 200-1500ms, while the audio keeps working, and then it skips the dropped frames and keeps going.

So, this issue started only recently? That could imply a regression in Firefox. Did it correspond with an upgraded Firefox version, and if so, can you reproduce it with older ones? If not, please try https://mozilla.github.io/mozregression/ to narrow down when exactly the regression was introduced.

Alternatively, has anything else about your system changed? I don't think this should have anything to do with EME if it's occurring on regular YouTube videos, but you can rule that out by disabling EME: go to about:addons > Plugins and set "Widevine Content Decryption Module" to "Never Activate". Please let me know if that changes anything.

Flags: needinfo?(kg)

Don't they use EME to stitch together segments of all their videos for streaming (adaptive bitrate, etc)?

While it's not 100% reproducible over the long term, mozregression did turn up something of a regression window. I picked a random high-motion clip from my youtube recommendations and picked a timestamp where I knew footage was moving (so a hitch was obvious), https://www.youtube.com/watch?v=77n9fexl8K4&t=3574s, and when a hitch happened it always happened at the same spot (~59:43) a few seconds into playback, which is interesting. Suggests to me that it could have something to do with the video being stitched together at certain points since those points would be the same across sessions in most cases. I did this using the same profile (in clone mode) so YT should be selecting the same quality level, etc.

Sadly there are no log messages in mozregression's log that look related. Are there some prefs or command line flags I should set to try and get more info here?

Tested mozilla-central build: 2021-06-24 (verdict: g)
Tested mozilla-central build: 2021-06-26 (verdict: g)
Tested mozilla-central build: 2021-06-27 (verdict: b)
Tested mozilla-central build: 2021-07-02 (verdict: b)

2021-07-27T22:13:40.286000: INFO : Narrowed integration regression window from [813c6084, 8a74f3d5] (3 builds) to [813c6084, 3f9e9848] (2 builds) (~1 steps left)
2021-07-27T22:13:40.298000: DEBUG : Starting merge handling...
2021-07-27T22:13:40.298000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3f9e98489ae52f5dfc8549d8595910e13ae65c52&full=1
2021-07-27T22:13:40.298000: DEBUG : redo: attempt 1/3
2021-07-27T22:13:40.298000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3f9e98489ae52f5dfc8549d8595910e13ae65c52&full=1',), kwargs: {}, attempt #1
2021-07-27T22:13:40.300000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2021-07-27T22:13:40.782000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=3f9e98489ae52f5dfc8549d8595910e13ae65c52&full=1 HTTP/1.1" 200 None
2021-07-27T22:13:40.887000: DEBUG : Found commit message:
Bug 1703692 - Use ChildOffset to build the offset path for flushes, r=nika

We can't rely on the Children list since it may have been cleared on shutdown.
Since we don't clear parent edges, walking the parent chain and using
mChildOffset works.

Differential Revision: https://phabricator.services.mozilla.com/D118384

2021-07-27T22:13:40.887000: DEBUG : Did not find a branch, checking all integration branches
2021-07-27T22:13:40.889000: INFO : The bisection is done.
2021-07-27T22:13:40.891000: INFO : Stopped

Flags: needinfo?(kg)

I ran another mozregression over that general time window just to be sure, and the results are slightly different. In case it's relevant, my repro steps are to let it launch that URL with my fixed profile (I created a blank profile earlier when testing this, so it's a default profile logged into my YouTube account) and then click the 'theater mode' icon which expands the video view and begins playing automatically. I always use theater mode (but I doubt that has any impact here). I think the fact that this isn't 100% means the time window is going to be slightly unreliable no matter how much i mozregression it, but it's nice that it always hitches at the same spot for me.

Tested mozilla-central build: 2021-06-30 (verdict: g)
Tested mozilla-central build: 2021-07-01 (verdict: b)
...
Tested autoland build: 7b43d864 (verdict: b)
Tested autoland build: 2e7e91c0 (verdict: g)

2021-07-27T22:20:41.095000: INFO : Narrowed integration regression window from [bbd7f5ee, 7b43d864] (3 builds) to [2e7e91c0, 7b43d864] (2 builds) (~1 steps left)
2021-07-27T22:20:41.111000: DEBUG : Starting merge handling...
2021-07-27T22:20:41.112000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=7b43d8647aa46af5edf47cb1fb70685adf1d6331&full=1
2021-07-27T22:20:41.113000: DEBUG : redo: attempt 1/3
2021-07-27T22:20:41.113000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=7b43d8647aa46af5edf47cb1fb70685adf1d6331&full=1',), kwargs: {}, attempt #1
2021-07-27T22:20:41.118000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2021-07-27T22:20:41.719000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=7b43d8647aa46af5edf47cb1fb70685adf1d6331&full=1 HTTP/1.1" 200 None
2021-07-27T22:20:41.799000: DEBUG : Found commit message:
Bug 1717185 - Fix mBufferedAmount and return value of send in TCPSocket. r=valentin

Previously, mBufferedAmount is only updated when copy complete. If I keep calling send, NotifyCopyComplete is only called after the last send.

This patch changes to update mBufferedAmount in every send call, so that it returns false correctly if buffer is full.

Differential Revision: https://phabricator.services.mozilla.com/D118258

2021-07-27T22:20:41.799000: DEBUG : Did not find a branch, checking all integration branches
2021-07-27T22:20:41.800000: INFO : The bisection is done.
2021-07-27T22:20:41.802000: INFO : Stopped

Attached video Screen recording of bug

Ignore most of the previous two comments because I refined my testing approach... This is definitely intermittent. I did some desktop recording while reproducing it on a specific build (using mozregression's 'retry' option) and while it happened 3 out of 3 times, the third time happened at a later time offset in the video. I started doing 3 retries of any "good" build before continuing the bisect to try and avoid flagging a bad one as good just because it worked once, which moved the time window backward. I'll stop messing around for now, we at least know it's in this general area of the timeline I think. I imagine with some detailed logging or config flags it's possible to bisect this more accurately.

I should also note that some of the 'good' builds I'm hitting still have a problem on this video, which is that they often drop a few frames at the point where 'bad' builds hitch up. According to YouTube's Stats For Nerds, FF is dropping like 7-14 frames of the video in the first 700 frames, and they all come around where the bad builds hitch up. Other attempts no frames are dropped though. So it's possible this is a latent issue that got worse.

I'm attaching a screen recording of me reproducing the bug during mozregression. You can see the framedrop stats in the corner of the playback viewport (from youtube's debug overlay). Obviously in this case it's dropping tons of frames, but at that point in good builds it often drops some frames there (other runs it drops 0). I saw a hitch a few moments later in one test run instead of at this point.

Since hardware decoding is enabled and there are frame drops showing up here sometimes even in "good" builds I suppose this could be impacted by things like the clock of my GPU, current system load, etc. Hopefully not. I did verify that unlike some past video playback issues I've had, this occurs on either of my monitors.


Tested mozilla-central build: 2021-06-21 (verdict: r)
Tested mozilla-central build: 2021-06-21 (verdict: r)
Tested mozilla-central build: 2021-06-21 (verdict: g)
Tested mozilla-central build: 2021-06-23 (verdict: b)
Tested mozilla-central build: 2021-06-22 (verdict: r)
Tested mozilla-central build: 2021-06-22 (verdict: r)
Tested mozilla-central build: 2021-06-22 (verdict: g)
...
Tested autoland build: 997eb0fe (verdict: r)
Tested autoland build: 997eb0fe (verdict: r)
Tested autoland build: 997eb0fe (verdict: g)
Tested autoland build: 7c64fbd1 (verdict: r)
Tested autoland build: 7c64fbd1 (verdict: r)
Tested autoland build: 7c64fbd1 (verdict: g)
Tested autoland build: ed020228 (verdict: b)

(In reply to Katelyn Gadd (:kael) from comment #3)

Don't they use EME to stitch together segments of all their videos for streaming (adaptive bitrate, etc)?

I don't think so. EME is typically only used for encrypted content (Netflix, Hulu, etc.). I don't have any issues watching free YouTube videos (including ads) with widevine disabled.

Alastor, I think you have some familiarity with this area. Any thoughts?

Flags: needinfo?(alwu)

Hi, Sorry for my late reply.
Would you mind to capture the log by using MOZ_LOG=MediaDecoder:5,MediaFormatReader:5? In addition, would you mind to download the Firefox Nightly and see if this issue still exists on the latest version?
Thank you so much.

Severity: -- → S4
Flags: needinfo?(alwu) → needinfo?(kg)
Priority: -- → P3
Whiteboard: [media-youtube]

This bug was gone for a while, but it's back now. Just figured I'd call that out, I'll grab nightly and try to record a log. The fact that it seemed to go away and come back suggests it could definitely be something to do with youtube's frontend or backend

Attached file Temp.zip

Here's a log, the hitch happened near the end of the run. I waited until it recovered and used 'Exit' to close Firefox Nightly.

I'll note that while looking at youtube's stats for nerds, I noticed that the 'frames dropped' instantly jumps up by 200+ frames as soon as the pause starts, instead of climbing over time or climbing once the pause ends. This suggests that somehow YT finds out that we're about to drop a ton of frames (and finds out exactly how many) before the hitch is even over.

Flags: needinfo?(kg)

i think i has same problem too. i used firefox 94.0a1. i can't find way to reproduce this. because if i restart firefox problem will gone for a while.
this is profiler that i took when i encounter this problem.
https://share.firefox.dev/3nrolRE
https://share.firefox.dev/3C5jBoK

in profiler if you go to Renderer thread at GPU process you can see that when video freeze (but audio still play normally).
it has no CPU usage from Graphic category (no blue / green legend)
and if you click at Renderer thread and click Marker Chart. you can see that CompositorScreenshot (700ms - 2000ms) is usage too long (When video still play normally it usage around 16.4 - 16.8 ms)

Attached image gpu.png

and when video freeze GPU 3D and Video Decoder usage will drop to zero too.

By checking the profiled result in comment11, it seems like bug1692881 jumping in. So what bug1692881 did is, if we detect the incoming decoded video frame is already later than the audio clock, which means the video decoding is too slow and couldn't catch up with the audio that would cause A/V unsync, then we would seek to the next key frame in order to decode the future video frame in advance.

When the problem happens, will it happen on almost every videos? or it would only affect some videos? and what's the frequency of this problem? Is it easily happening in a regular interval, or it happens without an expectable pattern?

Thank you.

Flags: needinfo?(mix5003)

no when problem happens i try another video and it not happen to other video (i check codec it same codex vp9 (303) and opus (251) in stats for nerd) and actually in same video when i refresh problem may gone too.

in 3 days i think i found this issue at least 1 time. it not so frequently. as far as i known it so randomly. i dind't find step to reproduce.

Flags: needinfo?(mix5003)

today my internet sucks. and i found something that you might interest.

i think this issue most likely happen when video switch resolution.
i didn't take video but you can seen in Katelyn video that when video resume current resolution change (you can see in stats for nerd in "Current / Optimal Res")

but i still not sure because i try to set video to 144p and set back to auto it can not reproduce this, so it may need another condition to make it happen too.

i think i can create a reproduce step. can you try it?
1. open https://ishare.in.th/tmp/bug-1729281/new.html and wait until start button appear
2. click start button and watch the video  (please don't speed up or seek in this page. this page should load around 30s)
3. wait until video play to 0:15 - 0:20  you can see that video stop playing but you can hear audio normally

i try mozregression to find a commit but i not sure is this commit relevant
```
2021-09-23T17:58:32.365000: INFO : application_buildid: 20210623203346
2021-09-23T17:58:32.366000: INFO : application_changeset: ed0202284d7e59612f8f64f58df913a293381d7f
2021-09-23T17:58:32.366000: INFO : application_display_name: Firefox Nightly
2021-09-23T17:58:32.366000: INFO : application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
2021-09-23T17:58:32.366000: INFO : application_name: Firefox
2021-09-23T17:58:32.366000: INFO : application_remotingname: firefox
2021-09-23T17:58:32.366000: INFO : application_repository: https://hg.mozilla.org/integration/autoland
2021-09-23T17:58:32.366000: INFO : application_vendor: Mozilla
2021-09-23T17:58:32.366000: INFO : application_version: 91.0a1
2021-09-23T17:58:32.366000: INFO : platform_buildid: 20210623203346
2021-09-23T17:58:32.366000: INFO : platform_changeset: ed0202284d7e59612f8f64f58df913a293381d7f
2021-09-23T17:58:32.366000: INFO : platform_repository: https://hg.mozilla.org/integration/autoland
2021-09-23T17:58:32.366000: INFO : platform_version: 91.0a1
2021-09-23T17:58:58.923000: INFO : Narrowed integration regression window from [7c64fbd1, 1f8cbba9] (3 builds) to [7c64fbd1, ed020228] (2 builds) (~1 steps left)
2021-09-23T17:58:58.928000: DEBUG : Starting merge handling...
2021-09-23T17:58:58.928000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=ed0202284d7e59612f8f64f58df913a293381d7f&full=1
2021-09-23T17:58:58.929000: DEBUG : redo: attempt 1/3
2021-09-23T17:58:58.929000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=ed0202284d7e59612f8f64f58df913a293381d7f&full=1',), kwargs: {}, attempt #1
2021-09-23T17:58:58.933000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2021-09-23T17:59:00.677000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=ed0202284d7e59612f8f64f58df913a293381d7f&full=1 HTTP/1.1" 200 None
2021-09-23T17:59:00.708000: DEBUG : Found commit message:
Bug 1716474 - remove the assertion for checking whether `aTimeThreshold` is zero or positive. r=bryce

Reader proxy will adjust the time threshold [1] based on the media's start time that is retrieved from metadata, which is possible to be negative. Eg. [2]

Therefore, we should remove that assetion.

[1] https://searchfox.org/mozilla-central/rev/308ea44d0d60b391b031ccee695920bd543f7d2f/dom/media/ReaderProxy.cpp#75-77

[2] https://searchfox.org/mozilla-central/rev/308ea44d0d60b391b031ccee695920bd543f7d2f/dom/media/autoplay/test/mochitest/AutoplayTestUtils.js#7

Differential Revision: https://phabricator.services.mozilla.com/D118403

2021-09-23T17:59:00.709000: DEBUG : Did not find a branch, checking all integration branches
2021-09-23T17:59:00.714000: INFO : The bisection is done.
2021-09-23T17:59:00.718000: INFO : Stopped```

```

i think it most likely from bug1692881. my example original from https://www.youtube.com/watch?v=n3U1-2UhZZc.
and i think i can create page that can replicate it https://ishare.in.th/tmp/bug-1729281/new.html.

i use command for check keyframe times. the result is each keyframe has 5.12 second long.

$ ffprobe -show_frames -show_entries frame=key_frame,pkt_pts_time -print_format csv  -i n3U1-2UhZZc-303.webm | grep "^frame,1"
ffprobe version 4.2.4-1ubuntu0.1 Copyright (c) 2007-2020 the FFmpeg developers
...
Input #0, matroska,webm, from 'n3U1-2UhZZc-303.webm':
  Metadata:
    encoder         : google/video-file
  Duration: 00:07:20.24, start: 0.000000, bitrate: 1651 kb/s
    Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, bt709), 1920x1080, SAR 1:1 DAR 16:9, 50 fps, 50 tbr, 1k tbn, 1k tbc (default)
frame,1,0.000000
frame,1,5.120000
frame,1,10.240000
frame,1,15.360000
frame,1,20.480000
frame,1,25.600000
frame,1,30.720000

and from the log that i can get

...
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::RequestDemuxSamples: Requesting extra demux Video
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::OnVideoDemuxCompleted: 1 video samples demuxed (sid:2)
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::ScheduleUpdate: SchedulingUpdate(Video)
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::Update: Processing update for Video
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::Update: Update(Video) ni=1 no=1 in:102 out:99 qs=3 decoding:0 flushing:0 desc:MediaChangeMonitor decoder (pending) pending:0 waiting:0 eos:0 ds:0 sid:2 waitcdm:0
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::HandleDemuxedSamples: Input:17400000 (dts:17400000 kf:0)
[Child 71652: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::NotifyNewOutput: Received new Video sample time:17340000 duration:20000
[Child 71652: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::NotifyNewOutput: Done processing new Video samples
[Child 71652: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::ScheduleUpdate: SchedulingUpdate(Video)
[Child 71652: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::Update: Processing update for Video
[Child 71652: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::ReturnOutput: Resolved data promise for Video [17340000, 17360000]
[Child 71652: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::ReturnOutput: change of video display size (1280x720->1920x1080)
[Child 71652: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::Update: Update(Video) ni=0 no=1 in:103 out:100 qs=3 decoding:0 flushing:0 desc:MediaChangeMonitor decoder (pending) pending:0 waiting:0 eos:0 ds:0 sid:2 waitcdm:0
[Child 71652: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::Update: No need for additional input (pending:0)
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[7f29850cc000] Decoder=7f2986dca000 OnVideoDecoded [17340000,17360000]
[Child 71652: MediaDecoderStateMachine #1]: D/MediaDecoder MediaDecoderStateMachine[7f29850cc000] state=DECODING video 17360000 is too late (current=17513270)
[Child 71652: AudioIPC0]: V/MediaDecoder AudioSink=7f297aa0ac00 playing audio at time=17734500 offset=629 length=331
[Child 71652: AudioIPC0]: V/MediaDecoder AudioSink=7f297aa0ac00 playing audio at time=17754500 offset=0 length=960
[Child 71652: AudioIPC0]: V/MediaDecoder AudioSink=7f297aa0ac00 playing audio at time=17774500 offset=0 length=308
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[7f29850cc000] Decoder=7f2986dca000 Queueing audio task - queued=73, decoder-queued=0
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[7f29850cc000] Decoder=7f2986dca000 Queueing video task - queued=2, decoder-queued=3, stime=17513270
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder AudioSink=7f297aa0ac00 AudioStream has used an audio packet.
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder AudioSink=7f297aa0ac00 AudioStream has used an audio packet.
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::RequestAudioData: 
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::ScheduleUpdate: SchedulingUpdate(Audio)
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::RequestVideoData: RequestVideoData(17513270), requestNextKeyFrame=1
[Child 71652: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::SkipVideoDemuxToNextKeyFrame: Skipping up to 20480000
[Child 71652: MediaDecoderStateMachine #1]: V/MediaDecoder VideoSink=7f2985320880 discarding video frame mTime=17340000 clock_time=17535770
[Child 71652: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::OnVideoSkipCompleted: Skipping succeeded, skipped 153 frames
[Child 71652: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::Reset: Reset(Video) BEGIN
[Child 71652: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[7f29851da000] ::Reset: Reset(Video) END
[Child 71652: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[7f29851da000] ::ScheduleUpdate: SchedulingUpdate(Video)
...

i think problem this problem may happen when i try to appendBuffer with different video dimension. that make DECODING video is too late at 17513270 and it will skip to closest keyframe. but closest keyframe is 20480000. so it freeze about 3 second before next keyframe come.

i think the problem is ShouldSkip function. because i goto about:config and set media.decoder.skip-to-next-key-frame.enabled to false. it can reduce stuttering.
https://searchfox.org/mozilla-central/rev/f953948db2ac26698487f5fc5c03046c29922f26/dom/media/MediaFormatReader.cpp#1322

Duplicate of this bug: 1731709
Duplicate of this bug: 1731437

It seems that we have to further relax the constraint of skipping to next keyframe. If we completely remove it, then it would make some 4K video almost frozen all the time during playback. We had relax that constraint once in bug1717171 before, but it seems that wasn't enough.

Assignee: nobody → alwu
Severity: S4 → S3
Priority: P3 → P2

(In reply to mix5003 from comment #16)

i think i can create a reproduce step. can you try it?

  1. open https://ishare.in.th/tmp/bug-1729281/new.html and wait until start
    button appear
  2. click start button and watch the video (please don't speed up or seek in
    this page. this page should load around 30s)
  3. wait until video play to 0:15 - 0:20 you can see that video stop playing
    but you can hear audio normally

Thank you, yes I can reproduce the issue on this page as well!

Just to confirm, the repro from #16 is identical to the behavior I see on youtube, and it repros here too.

Here is a profiled result I captured while reproducing the issue on comment16. Around 6.5s to 7.5s, check the MDSM thread, then we can see that there is a request-data (see marker) taking very very long time to complete (0.469s for one video frame!)

Before that, we also did some data trimming for MSE on MediaSupervisor#1. During that 0.469s, I did see MFR already got many video frames (see marker, MediaSupervisor thread), but it didn't return any video frame to MDSM, which is weird. For the investigation so far, the problem might be in the MFR, which didn't give video frames to MDSM faster enough so that triggering the skip-to-next-keyframe. I'm still going to dig this deeper, and will update more information later.

Duplicate of this bug: 1732820
Duplicate of this bug: 1732602
Duplicate of this bug: 1732609

In bug1692881, we introduced a mechanism to skip to next key frame if video decoding is too slow, which aims to help slow decoding issue on high resolution videos. Eg. 4k+.

However, there are some cases which would also lead to slow decoding. Eg. when internal seek (seek on one track only) happens on a place where we can't seek to directly, we need to seek to the previous keyframe and then decode all of them until we get the target frame.

That could lead to spending huge amount of time if the target frame is far from the keyframe, and causes decoded video frame later than the media time because audio won't stop playing during video seek. However, in above example, there would only be few frames later than the media time, and the video will soon catch up with the time.

As we only want to trigger this mechanism for continually slow decoding problem, and completely removing this mechanism would lead us fallback to the situation where some 4K+ videos would be frozen all the time during the playback.

In this patch, we introduce a method which still keep this mechanism but won't perform that as aggressive as before, this uses the time threshold to seek to the next keyframe when video has been late for too long.

I'm fairly late to the party, but FWIW and clarity, I think the instances of EME (encrypted media extensions) in some of the comments here may be meant to refer to MSE (media source extensions). MSE is indeed the tech used for adaptive streaming.

Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/27327612fb66
use time threshold to determine if we need to skip to next key frame. r=bryce
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch

Hi, Would you might help me try the latest Nightly and see if the issue gets fixed?
Thank you!

Flags: needinfo?(mix5003)
Duplicate of this bug: 1733817

i think this issue is already fixed. today i not found this issue on youtube.

Flags: needinfo?(mix5003)

Comment on attachment 9243350 [details]
Bug 1722171 - use time threshold to determine if we need to skip to next key frame.

Beta/Release Uplift Approval Request

  • User impact if declined: Video playback might be stutter a couple seconds (3 to 5 seconds) while changing the resolution.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See comment16.
    Note. video would still be frozen a little bit (less than one second) in that example, but way shorter than before
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce any new mechanism, but modify the rate of triggering the current mechanism of skipping to next key frame. It was too easy to tirgger that before (higher than our expectation, and those situtation are actually not fit our expectation) so it causes the stuttering.

If we trigger that mechanism less, the worst case is that users will stilll see the longe frozen video frames, if they have very very slow machine (but that case is way more rare considering current stuttering cases)

  • String changes made/needed: no
Attachment #9243350 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9243350 [details]
Bug 1722171 - use time threshold to determine if we need to skip to next key frame.

Approved for 94.0b3. We may want to get this onto ESR91 eventually too, but we can afford to wait a bit first and let it bake more.

Attachment #9243350 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Reproduced the issue in Release 92, using Win10.
Verified - Fixed in Beta 94.0b3 (build id: 20211007185900) and latest Nightly 95.0a1 (build id: 20211007215152). Indeed there is still a small freeze but less than one second.

Duplicate of this bug: 1735302

Please nominate this for ESR91 uplift.

Flags: needinfo?(alwu)

Comment on attachment 9243350 [details]
Bug 1722171 - use time threshold to determine if we need to skip to next key frame.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Video playback might be stutter a couple seconds (3 to 5 seconds) while changing the resolution.
  • User impact if declined: As above.
  • Fix Landed on Version: 94
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce any new mechanism, but modify the rate of triggering the current mechanism of skipping to next key frame. It was too easy to trigger that before (higher than our expectation, and those situtation are actually not fit our expectation) so it causes the stuttering.
  • String or UUID changes made by this patch: No
Flags: needinfo?(alwu)
Attachment #9243350 - Flags: approval-mozilla-esr91?

Comment on attachment 9243350 [details]
Bug 1722171 - use time threshold to determine if we need to skip to next key frame.

Approved for 91.4esr.

Attachment #9243350 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Verified - Fixed in ESR 91.4.0 (build id: 20211111144221). There is still a small freeze but less than one second.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.