Closed Bug 1532961 Opened 5 years ago Closed 11 months ago

WebRender scrolling jankier on intel machines on Windows

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

RESOLVED WORKSFORME

People

(Reporter: bas.schouten, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(2 files)

Attached video CNN Scrolling

On the Intel machines that I've tested on, WebRender's scrolling experience appears to be jankier than the standard D3D11/D2D combination. Looking at profiles, this appears to be due to more long frames when compositing. I've tested this on three different low-end to mid-range devices. I realize we're not shipping on Intel yet, but it concerns me that this does not appear to be visible in any of the stats on the WebRender dashboard for Intel.

I haven't tested on any other hardware. Included is a crude video comparing WebRender-On on the right to WebRender-Off on the left; I tested two websites, but the issue appears to be perceivable on most sites. I've included profiles for the sites shown in the video.

Note the effect is more pronounced when the window is maximized, but for the purposes of the video, side-by-side is better. Sadly my HDMI capture card has been sent for RMA; once it's back I will make a maximized side-by-side video.

Profiles including WR threads where applicable:

WR:
nu.nl: https://perfht.ml/2SM1fST
cnn.com: https://perfht.ml/2SLZN31
Non-WR:
nu.nl: https://perfht.ml/2SO09G1
cnn.com: https://perfht.ml/2SITbCh

Attached video NU.nl Scrolling
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmuizelaar)

Can you confirm whether it's dependent on the content of the page at all? I.e., do you see the problem when scrolling very simple pages?

Flags: needinfo?(jmuizelaar) → needinfo?(bas)
Blocks: wr-intel

In the video, WebRender-Off is on the left side (confirmed with Bas).

This does show up on the dashboard a bit, in the graph that shows rate of COMPOSITE_TIME being >=16ms [1]. Unfortunately the error bars for beta make the graph a bit hard to read, but it's clearly higher for WebRender enabled.

The other issue is that COMPOSITE_TIME just measures the CPU time spent during a composite (not GPU time), and so 16ms may not be the right threshold.

The best outcome would be to find a way to more directly measure whether we are dropping frames due to slow composites.

D3D11 offers time queries to measure the total GPU time spent, but this time doesn't occur sequentially after the CPU time, so I don't think adding them together is useful (though maybe it's still an interesting data point).

We could also try to use IDXGISwapChain::GetFrameStatistics [2] to see if that can reliably show when we miss a frame, though admittedly I haven't fully understood it yet.

There may also be other ways to get a more useful measurement, any ideas Bas?

[1] https://metrics.mozilla.com/webrender/dashboard_intel.html#composite_time
[2] https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgiswapchain-getframestatistics

Flags: needinfo?(matt.woodrow)

Looking at the profiles, most of the composites are well under 16ms (usually sub 8ms), so it seems unlikely that the GPU time would be long enough to make us miss a frame.

There are, however, a small number of very slow composites (100ms+). Most of these look to be due to texture upload, though I also see a 343ms composite in the nu.nl profile due to compiling a shader.

These slow composites should be contributing to the COMPOSITE_TIME>16ms graph on the dashboard, though maybe we're under-representing the severity by using a boolean condition rather than factoring in how far over 16ms we were.

It also appears that we're attempting to composite every ~33ms (30fps), not every 16. Is that expected for your configuration?

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

> Can you confirm if it's dependent on the content of the page at all? i.e do you see the problem when scrolling very simple pages?

The effect does seem to be worse on somewhat more complicated pages, although as far as I can tell it becomes visible fairly soon; I'd prefer something more objective than that, though. I should be able to do HDMI captures again by tomorrow. Do you have some test pages you would like me to examine and upload videos for?

(In reply to Matt Woodrow (:mattwoodrow) from comment #3)

> In the video, WebRender-Off is on the left side (confirmed with Bas).
>
> This does show up on the dashboard a bit, in the graph that shows rate of COMPOSITE_TIME being >=16ms [1]. Unfortunately the error bars for beta make the graph a bit hard to read, but it's clearly higher for WebRender enabled.
>
> The other issue is that COMPOSITE_TIME just measures the CPU time spent during a composite (not GPU time), and so 16ms may not be the right threshold.
>
> The best outcome would be to find a way to more directly measure whether we are dropping frames due to slow composites.
>
> D3D11 offers time queries to measure the total GPU spent, but this time isn't sequentially after the CPU time, so I don't think adding them together is useful (but maybe it's still an interesting data point).
>
> We could also try to use IDXGISwapChain::GetFrameStatistics [2] to see if that can reliably show when we miss a frame, though admittedly I haven't fully understood it yet.
>
> There may also be other ways to get a more useful measurement, any ideas Bas?
>
> [1] https://metrics.mozilla.com/webrender/dashboard_intel.html#composite_time
> [2] https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgiswapchain-getframestatistics

This is a really tricky problem and I don't necessarily have great ideas. On these shared memory architectures it's also hard to predict how workloads on other threads may affect the GPU work, or vice versa, another thing we sadly have no indication of in the dashboard.

(In reply to Matt Woodrow (:mattwoodrow) from comment #4)

> Looking at the profiles, most of the composites are well under 16ms (usually sub 8ms), so it seems unlikely that the GPU time would be long enough to make us miss frame.
>
> There are however, a small number of very slow composites (100ms+). Most of these look to be due to texture upload, though I also see a 343ms composite in the nu.nl profile due to compiling a shader.
>
> These slow composites should be contributing to the COMPOSITE_TIME>16ms graph on the dashboard, though maybe we're under-representing that severity by using a boolean condition, rather than factoring in how far over 16ms we were.
>
> It also appears that we're attempting to composite every ~33ms (30fps), not every 16. Is that expected for your configuration?

I feel one of the main problems is that there are far more composites than there are 'content frames', e.g. during scrolling and async animations. I.e., if you make content frame time 10ms faster but composite time 3ms slower, you're likely still causing vastly more total work to be done, particularly on pages with few content-driven paints. These content-driven paints will also probably often be the fast ones.

Basically, if you have a fairly complicated page with no content changes and you just scroll down, the 'paints' when the viewport changes may take 100ms. Quite often that will be fine: by the time the scroll gets there, the content will be ready. Making those paints take 20ms is great, but if it comes at the expense of composites going over 16ms twice as often, the experience is much worse. Nothing like that is captured on the dashboard at the moment. It's not hard to make us checkerboard on these machines if you try, and when doing similar things WR will jank at comparable times to when non-WR checkerboards. Which of the two experiences is better is always a complicated question, but the difference is certainly very noticeable and in no way reflected in our data.

Flags: needinfo?(bas)

(In reply to Matt Woodrow (:mattwoodrow) from comment #4)

> It also appears that we're attempting to composite every ~33ms (30fps), not every 16. Is that expected for your configuration?

Umm, I can't see why, I used clean profiles on this device at least! But perhaps there's something device specific.

Priority: -- → P2

Note that over the course of the investigation I also found several sites where WebRender scrolling feels 'better' than D3D11, although the difference isn't as pronounced.

It also appears WebRender scrolling compares more favorably once the site has finished loading and the machine is under low load. On these devices with fewer cores this may be related to the larger number of threads involved when compositing with WebRender.

Can you record video with slow frame indicator turned on? "gfx.webrender.debug.slow-frame-indicator"

Flags: needinfo?(bas)

So I did some load testing on the Quantum reference laptop yesterday.

The main test I did was under memory bandwidth load:
https://github.com/jrmuizel/jrmuizel-membench

Using ./jrmuizel-membench mem 6

WebRender performed noticeably better than non-WebRender: WebRender was usually able to get around 29fps, whereas non-WebRender tended to be around 10-20fps.

I also saw similar numbers with CPU load testing:
./jrmuizel-membench cpuall 7

I'd be curious to see what you see in this scenario. (Remember to build with cargo build --release)

To clarify this was on the 2018 reference device.

On the 2017 reference device both Wr and Non-Wr hit 60fps, but I see noticeably more checkerboarding with Non-Wr when running with jrmuizel-membench mem 7

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

> Can you record video with slow frame indicator turned on? "gfx.webrender.debug.slow-frame-indicator"

I can!

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)

> So I did some load testing on the Quantum reference laptop yesterday.
>
> The main test I did was under memory bandwidth load:
> https://github.com/jrmuizel/jrmuizel-membench
>
> Using ./jrmuizel-membench mem 6
>
> WebRender performed noticeably better than non-WebRender. WebRender was usually able to get around 29fps where as non-WebRender tended to be around 10-20fps
>
> I also saw similar numbers with cpu load testing
> ./jrmuizel-membench cpuall 7
>
> I'd be curious to see what you see in this scenario. (Remember to build with cargo build --release)

I'd also be curious to know what you see while there is still stuff on the page loading?

The video above, for the record, was taken on a Surface Go device (just because the touch screen makes it easier to compare 'scrolling feel'), which has an Intel Pentium Gold 4415Y containing a 615 GPU, and an 1800 x 1200 screen. Since I can only compare the feel of the scroll well with the touch screen, it's possible that WebRender performed worse in those scenarios due to GPU load caused by 'regular Firefox'; I'll have a look.

I should note that it didn't seem like the 'average' frame rate was worse with WebRender, it was just that it felt like there were more moments of jank, which is what we appear to see in the profile as well. (The profile was taken on the aforementioned Surface Go).

Flags: needinfo?(bas)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

> Can you record video with slow frame indicator turned on? "gfx.webrender.debug.slow-frame-indicator"

What am I supposed to see when doing this? I'm seeing lots of jank but I never see any kind of indicator. Having said that, the 2018 reference laptop seems to have very different performance characteristics from one point in time to another, so I'm having trouble getting a decent run.

I got a profile during a 'good' period, when janks were only occasional: https://perfht.ml/2Y9KsNg shows slow frames, but I didn't see any visual indicators. Also note that at the moment the machine is performing particularly badly for me, without WebRender as well, so it's hard to say whether this is 'worse' than without WebRender.

(In reply to Bas Schouten (:bas.schouten) from comment #13)

> (In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
>
> > Can you record video with slow frame indicator turned on? "gfx.webrender.debug.slow-frame-indicator"
>
> What am I supposed to see when doing this right? I'm seeing lots of jank but I never see any kind of indicator. Having said that, the 2018 reference laptop seems to have very different performance characteristics from one point of time to another so I'm having trouble getting a decent run.

At the top left of the window there should be a small red bar that moves every time there's a jank. Unfortunately, it doesn't show up when the window is maximized.

Bas, can you confirm that you don't see this anymore because of picture caching?

Flags: needinfo?(bas)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)

> Bas, can you confirm that you don't see this anymore because of picture caching?

I will retest this week with both picture caching and D3D11 double buffering enabled!

Flags: needinfo?(bas)

Changing the integrated graphics to high performance mode when plugged in also results in a decent speed boost.
This can be done in the Intel Graphics Control Panel.

Bas, do you see this still now that we have DirectComposition scrolling?

Flags: needinfo?(bas)

I tested cnn.com and nu.nl on a low-end Atom Surface tablet with a low-end Intel GPU, running on battery (without direct composition), and scrolling has improved quite a bit since the bug was reported, although it still isn't as smooth as non-WebRender (a bit more jank every now and then, but scrolling is generally good).
Render backend and renderer times are pretty reasonable on these two pages.

I'll make this bug specific to Windows, since there is a chance that direct composition helps and we have an equivalent bug on Linux.

See Also: → 1583881
Summary: WebRender scrolling jankier on intel machines → WebRender scrolling jankier on intel machines on Windows

@Nical: Could you do another test and see if we would be able to ship with this? Also, can you check if DC is being used now?

Flags: needinfo?(nical.bugzilla)

On the low-end Windows device I own, we don't use DC. The CPU and GPU are weak enough that WebRender is noticeably slower than the layers backend as soon as a significant portion of the page invalidates. Render backend times are not great but not too bad; renderer times are pretty bad, with a lot of time spent in ANGLE and drivers.

Hand-waving at some areas where we could seek wins that would help here:

  • improve picture caching heuristics so that we invalidate less
  • coming up with ways to reduce the ANGLE & driver overhead
  • better overlap CPU and GPU work (bug 1660116)
  • long tail of frame building perf improvements to give the renderer thread a bit more budget

Realistically I don't think we will be able to push any one of these far enough for WebRender not to be a noticeable regression (on the low-end device I own), so I think we'll need a mix of as much of the above as we can get, especially picture caching invalidation and driver overhead.

Right now my understanding is that we want to avoid taking risks and shipping regressions, so my suggestion is to focus on shipping other configurations while continuously improving performance and reevaluate low end intel when we are in a better spot and run out of configurations to ship. Or we could do a focused effort on this soon. In any case I would prefer not shipping as is.

Flags: needinfo?(nical.bugzilla)
Severity: normal → S3
Depends on: 1660116
Status: NEW → RESOLVED
Closed: 11 months ago
Flags: needinfo?(bas)
Resolution: --- → WORKSFORME