Closed Bug 1836063 Opened 11 months ago Closed 9 months ago

Poor performance with translucent rounded-rectangles with box shadows

Categories

(Core :: Graphics: WebRender, defect)

Tracking

RESOLVED FIXED
117 Branch
Tracking Status
firefox117 --- fixed

People

(Reporter: david.turner, Assigned: david.turner)

Attachments

(11 files)

Attached file w3schools.htm

Steps to reproduce:

I'm running on a Raspberry Pi 4 with Wayland, using GPU-accelerated WebRender. I open the attached reproducer (stripped down from the w3schools.com homepage; the attached CSS file also needs to be in the same directory) in Firefox, then scroll up and down quickly and repeatedly.

Actual results:

I see poor scrolling performance, dropping down to 10fps and looking very juddery.

The WebRender debug profiler shows about 10,000 vertices when scrolling on this page, which seems excessive. The vertex count is consistently this high when scrolling up and down on this page. If I use developer tools to disable the box shadows OR the opacity setting of 0.9, then scrolling performance is fine and the vertex count is under 1000, so it seems like the combination of translucency and box shadows is causing the problem. Disabling the rounded corners on the boxes halves the vertex count and improves performance, but it is still not great.

The high vertex count can also be seen with stable Firefox on x86_64, but even my laptop Intel GPU is powerful enough that there is no performance drop.

I have tentatively put this in the WebRender category because the scene display-list looks sane to me.

Expected results:

Decent scrolling performance. Chromium scrolls smoothly on this page on the same Raspberry Pi 4.

Attached file w3schools31.css
Attached file about_support.txt

Attached about:support from Firefox Stable (I can also reproduce the problem on a fresh build from mozilla-central)

Can you share a screenshot with the webrender debug profiler on screen?

Flags: needinfo?(david.turner)
Attached image with_shadows.png

I couldn't fit the full default profiler on my display. Let me know if there are other graphs it would be useful to see.

Flags: needinfo?(david.turner)
Attached image without_shadows.png

And for comparison, this is what the profiles look like when I disable the box-shadows.

Can you turn on gfx.webrender.debug.gpu-time-queries and redo the screenshots (provided the gpu times show up with that option set)?

Flags: needinfo?(david.turner)
Attached image gpu-time-queries.png

I'm afraid that gpu-time-queries isn't working on Raspberry Pi, the items in the key are reordering/updating but nothing actually appears on the chart. It works correctly on the Pi if I force software-webrender so it seems like there's a separate bug going on here (can make a second ticket to delve into that if you want).

Flags: needinfo?(david.turner)
Attached image gpu-time-queries-sw.png

This is how gpu-time-queries appears when I force software WebRender, with box-shadows enabled. Given that I see the same performance difference with/without box-shadows under software WebRender, this might still be useful for debugging.

For completeness, this is gpu-time-queries with software WebRender and the box-shadows disabled.

(In reply to David Turner from comment #7)

> I'm afraid that gpu-time-queries isn't working on Raspberry Pi, the items in the key are reordering/updating but nothing actually appears on the chart. It works correctly on the Pi if I force software-webrender so it seems like there's a separate bug going on here (can make a second ticket to delve into that if you want).

It's probably worth doing that. Does the hardware/driver support EXT_disjoint_timer_query?

Flags: needinfo?(david.turner)

gw, any guesses as to what might be going on here? Is it expected for the vertex count to get that high?

Flags: needinfo?(gwatson)

> It's probably worth doing that. Does the hardware/driver support EXT_disjoint_timer_query?

The Mesa driver for V3D (the GPU on Pi 4) does not appear to support that extension. If that's a dealbreaker then I could ask our Mesa people if it's possible to add. But if gpu-time-queries should work without GL_EXT_disjoint_timer_query (we do have GL_KHR_debug but not GL_EXT_debug_marker) then I'll make another ticket for that.
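For anyone wanting to check for the extension discussed above, here is a minimal sketch in Rust. It assumes the driver's extension list is already available as a single space-separated string (for example, copied from about:support); the function name `has_extension` is hypothetical and is not taken from Firefox or WebRender.

```rust
/// Returns true if `wanted` appears as a whole token in a space-separated
/// GL extension string. Whole-token matching avoids false positives from
/// extensions whose names merely share a prefix.
fn has_extension(extension_string: &str, wanted: &str) -> bool {
    extension_string
        .split_whitespace()
        .any(|name| name == wanted)
}

/// Hypothetical helper: decides whether GPU timer queries can be expected
/// to work, based only on the advertised extensions.
fn timer_queries_advertised(extension_string: &str) -> bool {
    has_extension(extension_string, "GL_EXT_disjoint_timer_query")
}
```

Calling `timer_queries_advertised` with a string that contains GL_KHR_debug but not GL_EXT_disjoint_timer_query returns false, which matches the situation described for the V3D driver.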

Flags: needinfo?(david.turner)
Assignee: nobody → gwatson
Flags: needinfo?(gwatson)

gpu-time-queries wasn't working because the VideoCore Mesa driver doesn't implement the TIME_ELAPSED query, so all the timers were coming back as zero. So we can't blame Firefox for that one!

I've been doing some digging into the original performance issue, and I'm now pretty sure it is to do with picture cache invalidation. With gfx.webrender.debug.picture-caching enabled on the test page, I constantly see red tiles when scrolling, indicating invalidation, even though nothing is changing on the page. If I disable the box shadows or the translucency, then no picture invalidation occurs when scrolling (all tiles stay green) and performance is good.

Looking at the render graphs (render-tasks-2-0.svg, see attached screenshot), when there are both box shadows and translucency there is an additional Picture task for each box. I assume that each box with a box-shadow is getting drawn to a temporary texture, which is then drawn to the picture cache tiles with translucency applied. I can get the same behaviour without box shadows by adding text inside the boxes, so I suspect the issue applies to any non-trivial translucent object. It seems like something about these intermediate Pictures used for translucency is causing erroneous invalidation of picture cache tiles.

I'll keep digging to see if I can work out in more detail what's going wrong.

The severity field is not set for this bug.
:gw, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(gwatson)

When processing a picture-cache-tile we find the lowest common ancestor
clip of each primitive in the cache-tile and set that as the clip root.
Not only does this save separately applying the clip to each primitive,
it also means we actually draw the parts of the cache tile which are out
of view. This means we don't have to redraw the cache tile every time
more of it scrolls into view.
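As an illustration of the "lowest common ancestor clip" idea described above, here is a minimal sketch in Rust. It is not WebRender's code: the `ClipNode` type, the index-based tree, and both function names are assumptions made for this example, and it assumes all clips belong to a single tree.

```rust
/// A node in a hypothetical clip tree; `parent` is None for the root.
struct ClipNode {
    parent: Option<usize>,
}

/// Path of node indices from the root down to `node` (inclusive).
fn path_from_root(nodes: &[ClipNode], node: usize) -> Vec<usize> {
    let mut path = vec![node];
    let mut current = node;
    while let Some(parent) = nodes[current].parent {
        path.push(parent);
        current = parent;
    }
    path.reverse();
    path
}

/// Lowest common ancestor of the clips used by a set of primitives.
/// Returns None if `prim_clips` is empty or the clips share no ancestor.
fn common_ancestor_clip(nodes: &[ClipNode], prim_clips: &[usize]) -> Option<usize> {
    let mut iter = prim_clips.iter();
    let mut shared = path_from_root(nodes, *iter.next()?);
    for &clip in iter {
        let path = path_from_root(nodes, clip);
        // Keep only the prefix of the path that both primitives share.
        let common_len = shared
            .iter()
            .zip(path.iter())
            .take_while(|(a, b)| a == b)
            .count();
        shared.truncate(common_len);
    }
    shared.last().copied()
}
```

In this sketch, a clip that every primitive already shares ends up at or above the returned node, so it can be applied once for the whole tile instead of per primitive.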

This mechanism doesn't work when we have Picture primitives inside a
picture-cache-tile, e.g. for applying a filter. Primitives in the
sub-Picture still had the viewport clip applied and so when the
sub-Picture intersected with the viewport edge we had to redraw the
cache tile on every scroll event.

This diff copies the common-ancestor-clip logic to sub-Picture
primitives. On pages with lots of opacity filtered areas (e.g.
w3schools.com) this eliminates unnecessary cache-tile invalidation and
massively improves scrolling performance on systems with a weak GPU.
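The invalidation behaviour described in this commit message can be shown with a toy model. The Rust sketch below is an assumption-laden simplification, not WebRender code: it works in one dimension (vertical scrolling only), and `Span`, `cached_span`, and `count_invalidations` are hypothetical names. It treats a tile as needing a redraw whenever the area a sub-picture covers in tile space changes between two scroll offsets.

```rust
/// A vertical extent in picture-cache space (1-D for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    top: f32,
    bottom: f32,
}

impl Span {
    /// Intersection of two spans; collapses to zero height if disjoint.
    fn intersect(self, other: Span) -> Span {
        let top = self.top.max(other.top);
        let bottom = self.bottom.min(other.bottom).max(top);
        Span { top, bottom }
    }
}

/// Extent a sub-picture covers in the tile for a given scroll offset.
/// The content does not move in tile space; only the viewport clip does.
fn cached_span(content: Span, viewport_height: f32, scroll_y: f32, viewport_clip: bool) -> Span {
    if viewport_clip {
        let viewport = Span { top: scroll_y, bottom: scroll_y + viewport_height };
        content.intersect(viewport)
    } else {
        content
    }
}

/// Counts scroll steps where the cached extent changes (a forced redraw).
fn count_invalidations(content: Span, viewport_height: f32, offsets: &[f32], viewport_clip: bool) -> usize {
    offsets
        .windows(2)
        .filter(|pair| {
            cached_span(content, viewport_height, pair[0], viewport_clip)
                != cached_span(content, viewport_height, pair[1], viewport_clip)
        })
        .count()
}
```

With a sub-picture that straddles the bottom edge of the viewport, every scroll step changes the clipped extent when the viewport clip is applied, while the extent stays constant once that clip is excluded.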

Severity: -- → S3
Flags: needinfo?(gwatson)
Assignee: gwatson → david.turner

Add a regression test for picture cache invalidation caused by scrolling
content with opacity-filtered stacking contexts. This test ensures that
picture cache tiles are not invalidated by scrolling when such
sub-pictures are present, which was the problem in bug 1836063.

Depends on D181664

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59cfe3846fff
Exclude common clip in subpictures r=gfx-reviewers,gw
https://hg.mozilla.org/integration/autoland/rev/cd0a65056113
Add invalidation regression test r=gfx-reviewers,gw

Backed out for causing reftest failures on bg-fixed-in-css-filter.html

Backout link

Push with failures

Failure log

Flags: needinfo?(david.turner)

Looking into this failure. Weirdly it looks like I broke the reference page rather than the actual test.

Flags: needinfo?(david.turner)
Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8ebdbb15927a
Exclude common clip in subpictures r=gfx-reviewers,gw
https://hg.mozilla.org/integration/autoland/rev/6c723e14e86b
Add invalidation regression test r=gfx-reviewers,gw
Status: UNCONFIRMED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch
QA Whiteboard: [qa-117b-p2]
Regressions: 1854062
See Also: → 1871784