Open Bug 1820370 Opened 2 years ago Updated 3 months ago

Auto scrolling on a YouTube 4K video stutters

Categories

(Core :: Graphics, defect, P3)

Firefox 112
Desktop
Windows
defect

Tracking

()

ASSIGNED

People

(Reporter: nissan4321, Assigned: sotaro)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Steps to reproduce:

  1. Open this 4K video on YouTube: https://www.youtube.com/watch?v=MMb7WfxKxp0
    Leave it in a Default view.
  2. Change the video to 4K60 fps and start the video
  3. Try to auto scroll up and down while the video is playing - the scrolling stutters

Please mind the following:

  1. I have tried this with a new profile and it's the same results.
  2. If I set media.hardware-video-decoding.enabled=false there is less stuttering, but some remains while auto scrolling, so it's not entirely related to hardware decoding.

Actual results:

Auto scrolling stutters while the video is playing

Expected results:

Auto scrolling should be as smooth as it is without the video playing

  1. Please see the following profiles:
  2. I am not sure if this is somehow related to Bug 1477170 or not

The Bugbug bot thinks this bug should belong to the 'Core::Audio/Video: Playback' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core

By the way, Chrome Canary auto scrolls this page smoothly, without any hiccups, when hardware decoding is disabled, so there must be something beyond hardware decoding that affects the auto scrolling of this page.

Attached file about_support.txt

Attaching about:support page

OS: Unspecified → Windows
Hardware: Unspecified → Desktop

More info:

  1. Per https://forums.mozillazine.org/viewtopic.php?p=14951224#p14951224: with hardware decoding enabled (media.hardware-video-decoding.enabled = true) and gfx.webrender.dcomp-video-overlay-win set to false, there are still stutters.
  2. I have been told that a similar issue with scrolling on YouTube was fixed recently, and that my problem might be related to the fact that I am not using recent drivers (but these are the officially supported drivers that Microsoft delivers via Windows Update on Windows 10, so there are probably a lot of users like me with these drivers).
    I am reluctant to update them because other bugs made me revert to this version before, so it would be great if anyone could test whether the latest drivers behave better than the NVIDIA 516.94 drivers.

(In reply to Mikel from comment #5)

> More info:
>
>   1. Per https://forums.mozillazine.org/viewtopic.php?p=14951224#p14951224: with hardware decoding enabled (media.hardware-video-decoding.enabled = true) and gfx.webrender.dcomp-video-overlay-win set to false, there are still stutters.
>   2. I have been told that a similar issue with scrolling on YouTube was fixed recently, and that my problem might be related to the fact that I am not using recent drivers (but these are the officially supported drivers that Microsoft delivers via Windows Update on Windows 10, so there are probably a lot of users like me with these drivers).
>     I am reluctant to update them because other bugs made me revert to this version before, so it would be great if anyone could test whether the latest drivers behave better than the NVIDIA 516.94 drivers.

Correction for #2: it looks like the latest NVIDIA drivers from Windows Update for Windows 10 are actually 526.98 (not 516.94 as I mentioned), but for some reason I can't understand, Windows Update doesn't find them when searching for updates, so this might be an issue with my computer, for what it's worth.

Blocks: video-perf

Hi Mikel, thank you for reporting. Can you capture a profile with the Firefox Profiler using the "Media" preset and with media.hardware-video-decoding.enabled=true?

Flags: needinfo?(nissan4321)

Here is a profile recording of the "Media" preset with "media.hardware-video-decoding.enabled=true":
https://share.firefox.dev/3mvMnMY

I started the recording, then auto scrolled the page down, then auto scrolled back up, and once the video was in view I stopped the video and then the profiler.

Flags: needinfo?(nissan4321)

The media part of the profile in comment 8 looks very normal: Firefox didn't drop any video frames and video decoding was fast enough. However, when checking the Compositor thread in the GPU process, I see a gap in UpdateCompositedFrame. Does that mean the compositor stopped compositing at that time?

Also, if the stuttering only happens during scrolling, this looks more like a gfx issue to me. Moving this to the gfx component; feel free to move it back if you determine this is a media issue after all. Thanks.

Blocks: gfx-triage
Severity: -- → S3
Component: Audio/Video: Playback → Graphics
Flags: needinfo?(sotaro.ikeda.g)
Priority: -- → P3

(In reply to Alastor Wu [:alwu] from comment #9)

> The media part of the profile in comment 8 looks very normal: Firefox didn't drop any video frames and video decoding was fast enough. However, when checking the Compositor thread in the GPU process, I see a gap in UpdateCompositedFrame. Does that mean the compositor stopped compositing at that time?
>
> Also, if the stuttering only happens during scrolling, this looks more like a gfx issue to me. Moving this to the gfx component; feel free to move it back if you determine this is a media issue after all. Thanks.

Thanks. It does seem that painting for some reason becomes very slow on YouTube videos as the resolution goes up, and it manifests specifically on 60 FPS videos.

Do you need me to provide any more details to try and debug this?

Thank you. Can you check whether the problem happens with the following prefs on the latest Nightly?

  • pref gfx.webrender.compositor = false
  • pref media.hardware-video-decoding.enabled = true
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(nissan4321)

(In reply to Sotaro Ikeda [:sotaro] from comment #11)

> Thank you. Can you check whether the problem happens with the following prefs on the latest Nightly?
>
>   • pref gfx.webrender.compositor = false
>   • pref media.hardware-video-decoding.enabled = true

Unfortunately the issue is still reproducible with these prefs on the latest nightly.

Flags: needinfo?(nissan4321)

@botond: Is auto-scrolling APZ?

Flags: needinfo?(botond)

I'll try repro - I can partially confirm already on Intel 11th gen with 4K60 and will try on RTX 3080 Ti with high refresh rate.

Flags: needinfo?(ahale)

So far I can reproduce this on the 360hz monitor reliably with NVIDIA RTX 3080 Ti or AMD RX 6900 XT, I have not tried hooking up the Intel laptop to that monitor.

Having the 360hz monitor enabled and running at 360hz causes the frame time in the WR profiler to spike as high as 31ms regularly, regardless of which monitor the browser is on and regardless of whether the 360hz monitor is the primary or secondary monitor. This mostly happens on that video in particular.

With only 60hz monitors (1920x1080@60, 2560x1600@60, 2560x1600@60, or even just the 1920x1080@60 by itself) the WR profiler shows frame times of 0.67ms (max 1.54ms).

With a 360hz monitor in the mix (1920x1080@360, 2560x1600@60, 2560x1600@60), WR profiler says average 3.7ms (max 23ms), sometimes the max spikes as high as 31ms or 64ms.
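For context, the frame budget is one vsync interval, that is 1000 ms divided by the refresh rate. The following is only a rough arithmetic sketch over the WR profiler figures quoted above; the helper names are made up for illustration and nothing here reflects how WebRender actually schedules frames.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame: one vsync interval, in milliseconds."""
    return 1000.0 / refresh_hz


def fits_budget(frame_time_ms: float, refresh_hz: float) -> bool:
    """True if a frame of this duration finishes within one vsync interval."""
    return frame_time_ms <= frame_budget_ms(refresh_hz)


# WR profiler figures quoted in this comment, in milliseconds.
reported = {
    "60hz monitors only, average": 0.67,
    "60hz monitors only, max": 1.54,
    "360hz monitor in the mix, average": 3.7,
    "360hz monitor in the mix, max": 23.0,
}

for label, ms in reported.items():
    for hz in (60, 360):
        verdict = "fits" if fits_budget(ms, hz) else "misses"
        print(f"{label}: {ms} ms {verdict} the {frame_budget_ms(hz):.2f} ms budget at {hz}hz")
```

Under these assumptions, the 3.7ms average already exceeds the roughly 2.78ms budget of a 360hz display, and the 23ms maximum exceeds even the roughly 16.67ms budget of a 60hz display.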

It doesn't seem to matter whether I have the compositor enabled or disabled, and does not seem to matter whether I have zero-copy video enabled.

Recorded a profile of scrolling up and down with the video visible at all times using auto-scroll on the 360hz monitor on the AMD RX 6900 XT - https://share.firefox.dev/3z4DLzO

Recorded a profile of scrolling on the NVIDIA RTX 3080 Ti for comparison, with the monitor set to 144hz to more closely match the report - this seems to have very different contention over EnterCriticalSection if I am reading correctly, vs waiting for the compositor notification channel on AMD. https://share.firefox.dev/3K5EcjZ

(In reply to Kelsey Gilbert [:jgilbert] from comment #13)

> @botond: Is auto-scrolling APZ?

Yes, auto-scrolling is supported in APZ, so during auto-scrolling the scroll offset that WebRender samples from APZ every frame in update_document() should be changing.

Which is just to say that looking at WR frame times seems to be barking up the right tree.

Flags: needinfo?(botond)

@ahale, thx for testing and confirming this!

@sotaro,
Is there anything that needs to be dug into more deeply?
How can we move forward with fixing it?

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(ahale)

This looks very similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1749647 so this may be a bug that has been around for a year.

I'll look into gpuview and etw for analyzing the lock contention and finding out what stack unlocked these locks. Will sync with Jeff on how to use these tools.

Flags: needinfo?(ahale)

I can reproduce this in a variety of scenarios; it actually makes windows other than the YouTube one stutter when scrolling as well (autoscroll, mouse wheel scroll, arrow keys). I'll look into the gpuview and etw tools soon.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(ahale)

I've explored this in gpuview (part of the Windows Performance Toolkit) and found that display lists do not appear to be built for the scrolling when the stutters occur, so WebRender isn't given anything to do. As far as I can tell, something is holding a lock for an extended period of time. Because the ANGLE code uses a time limit on the wait, it isn't entirely clear which thread is unlocking the lock it is waiting on, as the timing doesn't line up.

Some interesting properties of this bug:

  • Having a 4K60 video playing in any window causes stuttering in scrolling in all windows.
  • If two monitors have a common factor in their refresh rates, the stuttering is rare or not obvious, for example:
    • 60hz and 360hz is fine almost all of the time, but can occasionally get out of phase for a while and cause stuttering of nearly two frames at 60hz where the 60hz is missing its beat.
    • 60hz and 144hz is always stuttering. These frequencies almost never match up, and it can cause stutters on both monitors (it's not simply waiting on the wrong monitor or something like that).
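One way to read the refresh-rate observation is through the greatest common divisor of the two rates. The sketch below is an idealized model, assuming integer refresh rates and two displays that start exactly in phase, which real hardware does not guarantee (the 60hz/360hz case above drifting out of phase shows as much). The function names are hypothetical and this is not how Firefox or Windows schedules vsync.

```python
from math import gcd


def coinciding_vsyncs_per_second(rate_a_hz: int, rate_b_hz: int) -> int:
    """In the idealized in-phase model, both displays tick together gcd(a, b) times per second."""
    return gcd(rate_a_hz, rate_b_hz)


def aligned_fraction_of_slower(rate_a_hz: int, rate_b_hz: int) -> float:
    """Fraction of the slower display's vsyncs that land exactly on a vsync of the faster one."""
    return gcd(rate_a_hz, rate_b_hz) / min(rate_a_hz, rate_b_hz)


for pair in ((60, 360), (60, 144)):
    together = coinciding_vsyncs_per_second(*pair)
    fraction = aligned_fraction_of_slower(*pair)
    print(f"{pair[0]}hz + {pair[1]}hz: {together} shared vsyncs/s, "
          f"{fraction:.0%} of the slower display's vsyncs aligned")
```

In this model every 60hz vsync lines up with a 360hz vsync, while only one in five 60hz vsyncs lines up with a 144hz vsync, which is at least consistent with the behavior described in the list.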

I think this bug is related to the artifacting we see on Windows 10 with NVIDIA GPUs ( https://bugzilla.mozilla.org/show_bug.cgi?id=1638709 ), but something is preventing the artifacts on Windows 11; instead we get this out-of-phase anomaly causing stuttering.

(In reply to Ashley Hale [:ahale] from comment #23)

> I've explored this in gpuview (part of the Windows Performance Toolkit) and found that display lists do not appear to be built for the scrolling when the stutters occur, so WebRender isn't given anything to do. As far as I can tell, something is holding a lock for an extended period of time. Because the ANGLE code uses a time limit on the wait, it isn't entirely clear which thread is unlocking the lock it is waiting on, as the timing doesn't line up.
>
> Some interesting properties of this bug:
>
>   • Having a 4K60 video playing in any window causes stuttering in scrolling in all windows.
>   • If two monitors have a common factor in their refresh rates, the stuttering is rare or not obvious, for example:
>     • 60hz and 360hz is fine almost all of the time, but can occasionally get out of phase for a while and cause stuttering of nearly two frames at 60hz where the 60hz is missing its beat.
>     • 60hz and 144hz is always stuttering. These frequencies almost never match up, and it can cause stutters on both monitors (it's not simply waiting on the wrong monitor or something like that).
>
> I think this bug is related to the artifacting we see on Windows 10 with NVIDIA GPUs ( https://bugzilla.mozilla.org/show_bug.cgi?id=1638709 ), but something is preventing the artifacts on Windows 11; instead we get this out-of-phase anomaly causing stuttering.

Hey Ashley,
Thanks for the drill down of your testing, but this bug is for Windows 10 (without any artifacts) with an NVIDIA GPU and a single 144hz screen.
I guess that this issue might be related somehow to the multi monitor bug you mentioned (https://bugzilla.mozilla.org/show_bug.cgi?id=1638709) as well, but the fact that it's reproducible with a single screen and on Windows 10 without artifacts means that it's not the same as that bug, right?

Does this bug manifest differently on Windows 11 than on Windows 10?
I don't have a Windows 11 machine ATM to check unfortunately.

Flags: needinfo?(ahale)

I will attempt to reproduce this on a single 144hz screen on Windows 10 when I get a chance (this isn't my top priority at the moment) and analyze it in gpuview.

I was able to get the bug to go away by turning on the pref media.wmf.zero-copy-nv12-textures-force-enabled in about:config, so I think this is possibly a thread holding a lock while waiting for a GPU fence, which then blocks other threads trying to grab that lock. Sotaro would know more about that, since it goes away with his new zero-copy implementation.

There seemed to be too many GPU-related tasks during the STR. The tasks during the STR can be roughly categorized as follows.

  • [1] Hardware video decoding
  • [2] video frame copy
  • [3] Sync object in RenderCompositorANGLE::BeginFrame()
  • [4] GPU tasks during scrolling at WebRender
  • [5] Video rendering GPU tasks at WebRender
  • [6] Access GPU Query in RenderCompositorANGLE::GetLastCompletedFrameId() and in RenderCompositorANGLE::WaitForPreviousGraphicsCommandsFinishedQuery()
  • [7] mDevice->GetDeviceRemovedReason() in RenderCompositorANGLE::IsContextLost()

[2] could be removed by zero-copy video frames.
It seems that [3] could be reduced.
I am not sure if [4] could be reduced.

Flags: needinfo?(sotaro.ikeda.g)

:gw, :nical, is it possible to reduce "[4] GPU tasks during scrolling at WebRender"?

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(gwatson)

There might be some cases where we invalidate more than needed, or possibly calculate picture cache slices inefficiently, but it's hard to say without investigating and profiling the specific case here.

Flags: needinfo?(gwatson)
Severity: S3 → S2

Ashley, when you have cycles, could you please investigate how difficult it might be to pick up the zero-copy work from Sotaro?

Assignee: nobody → ahale

I'm spinning up my knowledge on this now.

Flags: needinfo?(nical.bugzilla)

An interesting bit that I just discovered: the page scrolls totally smoothly if I have a game like Valorant minimized in the background while scrolling the page from this bug at 1440p60, and there are only very slight, almost invisible hiccups at the 2160p60 (4K) setting. So the higher clocks (or should I say state) of the GPU seem to affect this issue.

It makes sense that the GPU operating at higher clocks would perform the video copy faster and not provoke a visible stutter. That's not really a solution, but it is valuable to know. Thanks.

NeedInfo - Sotaro, can you identify anything we can do to avoid waiting on locks in AMD and NVIDIA drivers on these large video copies?

My guess is it's locking a texture to update and having to wait for the GPU to stop using it first, which makes us miss the refresh window for the scrolling.

Profiles from comment 16 and comment 17:

Flags: needinfo?(ahale) → needinfo?(sotaro.ikeda.g)

I think that waiting on the lock in SyncObjectD3D11Host::Synchronize() could be reduced as follows.

  • Call SyncObjectD3D11Host::Synchronize() only when there is an ID3D11Texture2D that was created by a different ID3D11Device and does not have a keyed mutex.
Flags: needinfo?(sotaro.ikeda.g)
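The condition proposed above can be modeled as a simple predicate. This is a toy Python model, not Gecko code: TextureInfo, needs_synchronize and the device labels are hypothetical stand-ins, and the real check would operate on ID3D11Texture2D and ID3D11Device objects inside the compositor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TextureInfo:
    """Hypothetical stand-in for an ID3D11Texture2D used in the current frame."""
    creator_device: str    # label for the ID3D11Device that created the texture
    has_keyed_mutex: bool  # True if the texture carries its own keyed mutex


def needs_synchronize(textures: Iterable[TextureInfo], compositor_device: str) -> bool:
    """Return True only if some texture came from another device and has no keyed mutex."""
    return any(
        t.creator_device != compositor_device and not t.has_keyed_mutex
        for t in textures
    )


frame = [
    TextureInfo("compositor", has_keyed_mutex=False),  # same device as the compositor
    TextureInfo("decoder", has_keyed_mutex=True),      # other device, but has a keyed mutex
]
print(needs_synchronize(frame, "compositor"))  # False: the sync call could be skipped

frame.append(TextureInfo("decoder", has_keyed_mutex=False))
print(needs_synchronize(frame, "compositor"))  # True: other device and no keyed mutex
```

The idea, as far as the comment describes it, is that the expensive synchronization is skipped for frames where every texture is either owned by the compositing device or already guarded by its own keyed mutex.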

(In reply to Ashley Hale [:ahale] from comment #25)

> I will attempt to reproduce this on a single 144hz screen on Windows 10 when I get a chance (this isn't my top priority at the moment) and analyze it in gpuview.
>
> I was able to get the bug to go away by turning on the pref media.wmf.zero-copy-nv12-textures-force-enabled in about:config, so I think this is possibly a thread holding a lock while waiting for a GPU fence, which then blocks other threads trying to grab that lock. Sotaro would know more about that, since it goes away with his new zero-copy implementation.

Turning on media.wmf.zero-copy-nv12-textures-force-enabled does help for 1440p 60FPS, but there are still horrible stutters for the 4k 60 FPS.
Should this be better with the potential fixes for the locking issues?

Flags: needinfo?(ahale)

In my personal use of Firefox I ran into an interesting case when I changed the primary monitor from a 360hz monitor to a 60hz monitor while Firefox was running, and a 1080p video started dropping a lot of frames on the 360hz monitor, and continued doing so when I moved it to a 60hz monitor. Restarting Firefox fixed that. There may be more clues I can dig into based on that repro case I just stumbled upon.

I'm not entirely sure how to move this forward in the short term.

(In reply to Sotaro Ikeda [:sotaro] from comment #34)

> I think that waiting on the lock in SyncObjectD3D11Host::Synchronize() could be reduced as follows.
>
>   • Call SyncObjectD3D11Host::Synchronize() only when there is an ID3D11Texture2D that was created by a different ID3D11Device and does not have a keyed mutex.

Bug 1847665 reduces calls to SyncObjectD3D11Host::Synchronize() where possible.

Depends on: 1847665

This may be fixed - I will attempt repro today.

(In reply to Ashley Hale [:ahale] from comment #38)

> This may be fixed - I will attempt repro today.

While 1440p60fps is much better (still a bit stuttery), 4k60fps still stutters visibly and badly.

I can see a good correlation with the initial P state (clock state) of the GPU when auto scrolling starts, as follows:
**** Note: the video is playing in 4k60fps during the entire repro. ****

  • Scenario 1:

---------------------------

Starting to auto scroll downwards from the top of the page, where the 4k60fps video is playing: the GPU goes into the highest P state (max clocks) and auto scrolling is smooth until the GPU steps down to lower P states as you keep auto scrolling down the page. Once it reaches the lowest P state, scrolling stutters badly until you scroll back up to the video, where the P state returns to the highest one (max clocks). Auto scrolling is then smooth again, and stays smooth until the P state drops once more and the stutter returns.

  • Scenario 2:

---------------------------

Starting to auto scroll downwards (away from the video) from the middle of the page, where the video is not visible: the initial P state observed is the lowest and auto scrolling is smooth.
Auto scrolling stays smooth until you scroll back up to the video (or the P state rises to the highest state for some other reason) and then auto scroll downwards again. The scrolling is then smooth until the P state drops to the lowest one (the same one as at the start of auto scrolling), at which point it starts stuttering badly.
For some reason, if you keep auto scrolling a while longer (maybe 5-10 seconds), the auto scrolling gets back in sync and scrolls smoothly for the rest of that auto scrolling session. This reproduces consistently.

I hope this helps to pinpoint and reproduce the issue better in the code.

Profiled this on an Intel 11th gen laptop as well: https://share.firefox.dev/3sp0zKI (see comment #33 for my AMD and NVIDIA profiles). I went over it with another developer; it is hitting a pending frame limit, which is not a good sign, though it needs more debugging.

A few things are quirky about this bug:

  • autoscrolling is not a common user action on YouTube to my knowledge (this is the scroll widget that appears in Windows apps when you middle click)
  • YouTube has miniature video previews on the right side, so a lot of these are created via JS while scrolling, which means they might all be decoding several frames immediately upon loading after creation in the DOM, so this could be a decent chunk of GPU activity.
  • we have to complete all rendering in the time between vsync events to be smooth - but we seem to be starting late in the frame in the circumstances in this bug.
    • starting late in the frame has been observed frequently in my repro attempts, which means we have as little as 1-2ms to do the whole video frame decode even though the vsync interval is 16ms, so the browser usually misses the vsync trigger and stuttering happens. Paradoxically, this seems to keep happening on the following frames, which shouldn't be the case: the next video frame decode should begin immediately after this one finishes, so that we maintain a pre-buffered number of decoded frames to display. This may be partly an interaction with our video decoding API usage, because we always ask for a realtime context (buffering at most 3 frames, which is intended for WebRTC situations) even when decoding content like YouTube, where prebuffering 16 frames would be much more appropriate for smooth playback.
    • using zero-copy video decoding substantially reduces the time taken to decode video, so we meet the vsync timing more often, but the fact we started several milliseconds after the vsync trigger means we can still miss the next vsync, which seems like it is the most essential problem here.
  • using higher refresh rate monitors (especially 144hz, 240hz, 300hz, 360hz) causes significantly worse stuttering than 60hz monitors, which seems very surprising in practice - we take longer to draw the frames at higher refresh rates, for unclear reasons, and this leads to a lot of stuttering where frames take 23ms+ (drops refresh to 30hz) or even spiking to 43ms at times (which drops refresh to 20hz), yet at 60hz the autoscrolling is completely smooth with many videos on YouTube.
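The 30hz and 20hz figures in the last bullet follow from missed vsyncs: a frame that overruns one vsync interval is shown at the next one. The sketch below reproduces that arithmetic under the assumption that presentation is quantized to a 60hz vsync, which is an inference from the quoted numbers and not something stated in this bug; effective_refresh_hz is a made-up helper name.

```python
import math


def effective_refresh_hz(frame_time_ms: float, display_hz: float) -> float:
    """Effective update rate when every frame takes frame_time_ms to produce.

    A frame that overruns a vsync interval waits for the next vsync, so the
    frame time is rounded up to a whole number of intervals.
    """
    intervals = max(1, math.ceil(frame_time_ms * display_hz / 1000.0))
    return display_hz / intervals


for frame_time_ms in (10.0, 23.0, 43.0):
    rate = effective_refresh_hz(frame_time_ms, 60)
    print(f"{frame_time_ms} ms frames on a 60hz vsync -> {rate:.0f}hz effective")
```

With these assumptions, 23ms frames occupy two 60hz intervals (30hz) and 43ms frames occupy three (20hz), matching the figures above.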

This bug seems like it should be downgraded to S3 because autoscrolling is (I believe) an infrequent action on YouTube, however the video decoding late in the frame is worthy of S2, so if we downgrade this bug to S3 we still need a separate S2 bug to cover that.

Severity: S2 → S3

ahale will file a new bug for the late-frame-start.

Indeed I will. I'm still trying to characterize it a bit more so I can write a meaningful description in the new bug; for example, https://bugzilla.mozilla.org/show_bug.cgi?id=1865542 may be the same bug (and doesn't even involve video).

(In reply to Ashley Hale [:ahale] from comment #44)

> Indeed I will. I'm still trying to characterize it a bit more so I can write a meaningful description in the new bug; for example, https://bugzilla.mozilla.org/show_bug.cgi?id=1865542 may be the same bug (and doesn't even involve video).

Hey Ashley,
Do we have a new S2 bug for the late decoding to follow?

I just need to take the time to characterize this better and file one. This bug is closely related to the high refresh rate bugs (tracked in meta bug 1749376) as high refresh rates seem to cause the same missed-vsync timings with media playback.

Sotaro, I'm having trouble finding enough time to work on this, and you are familiar with it, so I'm sending it back to you for further investigation if you can find some time to look into it.

I can say that this needs a deep understanding of the lock chains involved with media decode and layout autoscrolling.

Assignee: ahale → sotaro.ikeda.g
Status: NEW → ASSIGNED
Flags: needinfo?(ahale) → needinfo?(sotaro.ikeda.g)

OK, I'll take the bug!

Flags: needinfo?(sotaro.ikeda.g)
See Also: → hw-ffvpx

If Bug 1893427 is implemented, we could implement D3DFence correctly for hardware-decoded video frames.

(In reply to Sotaro Ikeda [:sotaro] from comment #49)

> If Bug 1893427 is implemented, we could implement D3DFence correctly for hardware-decoded video frames.

Would you mind providing more details about that? Why would using ffmpeg make it possible to implement D3DFence? Does the current WMF architecture not allow us to do so? Thanks!

Flags: needinfo?(sotaro.ikeda.g)

When the D3D11 video API is used directly for hardware video decoding, we could implement D3D11 Fence the way Chromium does. ffmpeg also seems to use the D3D11 video API.

In Chromium, the D3D11 Fence of a video frame is controlled by the following.

It ends up as follows:

D3D11VideoDecodeImageRepresentation is created by DefaultTexture2DWrapper::GpuResources

Its write access is started by DefaultTexture2DWrapper::BeginSharedImageAccess()

DefaultTexture2DWrapper::BeginSharedImageAccess() is called before ID3D11VideoContext::DecoderBeginFrame() call in D3D11VideoDecoderWrapperImpl::WaitForFrameBegins() by D3D11PictureBuffer::AcquireOutputView().

VideoDecodeImageRepresentation::ScopedWriteAccess is released by DefaultTexture2DWrapper::ProcessTexture()

The ProcessTexture() is called by D3D11VideoDecoder::OutputResult().

So Chromium uses the D3D11 video API directly, and it adds D3D11 Fence support to hardware-decoded video.

ffmpeg also seems to use the low-level D3D11 video API, so I thought that D3D11 Fence handling could be added if we use ffmpeg.

Flags: needinfo?(sotaro.ikeda.g)

With the current WMF API, it does not seem possible to get the correct timing for D3D11 Fence as described in comment 51.

Also, as in bug 1890622, video decoding performance might differ between the current WMF API and the low-level D3D11 video API.

It also seems worth comparing the performance of the current native compositor (multiple DirectComposition layers) against a single framebuffer with overlay, as in Chromium.

Pretty sure this scroll stutter was root caused to be a Microsoft issue by one of the vendors.

No longer blocks: gfx-triage