Poor performance with translucent rounded-rectangles with box shadows
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox117 | --- | fixed |
People
(Reporter: david.turner, Assigned: david.turner)
Attachments
(11 files)
37.28 KB, text/html
62.25 KB, text/css
24.31 KB, text/plain
244.58 KB, image/png
207.11 KB, image/png
64.85 KB, image/png
84.64 KB, image/png
49.23 KB, image/png
131.52 KB, image/png
48 bytes, text/x-phabricator-request
48 bytes, text/x-phabricator-request
Steps to reproduce:
I'm running on a Raspberry Pi 4 with Wayland, using GPU-accelerated webrender. I open the attached reproducer (stripped down from the w3schools.com homepage; the attached CSS file also needs to be in the same directory) in Firefox, then scroll up and down quickly and repeatedly.
Actual results:
I see poor scrolling performance, dropping down to 10fps and looking very juddery.
The webrender debug profiler shows about 10,000 vertices when scrolling on this page, which seems excessive. The vertex count is consistently this high when scrolling up and down on this page. If I use developer tools to disable the box shadows OR the opacity setting of 0.9 then scrolling performance is fine and the vertex count is under 1000, so it seems like the combination of translucency and box-shadows is causing the problem. Disabling the rounded corners on the boxes halves the vertex count and improves performance, but it is still not great.
The high vertex count can also be seen with stable Firefox on x86_64, but even my laptop Intel GPU is powerful enough that there is no performance drop.
I have tentatively put this in the WebRender category because the scene display-list looks sane to me.
Expected results:
Decent scrolling performance. Chromium scrolls smoothly on this page on the same Raspberry Pi 4.
Comment 1•2 years ago (Assignee)
Comment 2•2 years ago (Assignee)
Attached about:support from Firefox Stable (I can also reproduce the problem on a fresh build from mozilla-central)
Comment 3•2 years ago
Can you share a screenshot with the webrender debug profiler on screen?
Comment 4•2 years ago (Assignee)
I couldn't fit the full default profiler on my display; let me know if there are other graphs that would be useful to see.
Comment 5•2 years ago (Assignee)
And for comparison, this is what the profiles look like when I disable the box-shadows.
Comment 6•2 years ago
Can you turn on gfx.webrender.debug.gpu-time-queries and redo the screenshots (provided the gpu times show up with that option set)?
Comment 7•2 years ago (Assignee)
I'm afraid that gpu-time-queries isn't working on Raspberry Pi, the items in the key are reordering/updating but nothing actually appears on the chart. It works correctly on the Pi if I force software-webrender so it seems like there's a separate bug going on here (can make a second ticket to delve into that if you want).
Comment 8•2 years ago (Assignee)
This is how gpu-time-queries appears when I force software webrender, with box-shadows enabled. Given that I see the same performance difference with/without box-shadows with software webrender, this might still be useful for debugging.
Comment 9•2 years ago (Assignee)
For completeness, GPU-time-queries with software webrender with the box-shadows disabled.
Comment 10•2 years ago
(In reply to David Turner from comment #7)
> I'm afraid that gpu-time-queries isn't working on Raspberry Pi, the items in the key are reordering/updating but nothing actually appears on the chart. It works correctly on the Pi if I force software-webrender so it seems like there's a separate bug going on here (can make a second ticket to delve into that if you want).
It's probably worth doing that. Does the hardware/driver support EXT_disjoint_timer_query?
Comment 11•2 years ago
gw, any guesses as to what might be going on here? Is it expected for the vertex count to get that high?
Comment 12•2 years ago (Assignee)
> It's probably worth doing that. Does the hardware/driver support EXT_disjoint_timer_query?
The Mesa driver for V3D (the GPU on Pi 4) does not appear to support that extension. If that's a dealbreaker then I could ask our Mesa people if it's possible to add. But if gpu-time-queries should work without GL_EXT_disjoint_timer_query (we do have GL_KHR_debug but not GL_EXT_debug_marker) then I'll make another ticket for that.
Comment 13•2 years ago (Assignee)
The reason gpu-time-queries wasn't working is that the VideoCore Mesa driver doesn't implement the TIME_ELAPSED query, so all the timers were coming back as zero. So can't blame Firefox for that one!
I've been doing some digging into the original performance issue, and I'm now pretty sure it is to do with picture cache invalidation. With gfx.webrender.debug.picture-caching enabled on the test page, I constantly see red tiles when scrolling, indicating invalidation, even though nothing is changing on the page. If I disable the box shadows or the translucency then no picture invalidation occurs when scrolling (all tiles stay green) and performance is good.
Looking at the render graphs (render-tasks-2-0.svg, see attached screenshot), when there are both box shadows and translucency there is an additional Picture task for each box. I assume that each box with box-shadow is getting drawn to a temporary texture which is then drawn to the picture cache tiles with translucency applied. I can get the same behaviour without box shadows by adding text inside the boxes, so I suspect the issue applies to any non-trivial translucent object. It seems like something about these intermediate Pictures used for translucency is causing erroneous invalidation of picture cache tiles.
I'll keep digging to see if I can work out in more detail what's going wrong.
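To make the suspected mechanism concrete, here is a toy one-dimensional model. It is only a sketch and not WebRender code: `Span`, `viewport`, `cached_extent` and `tile_stays_valid` are hypothetical names invented for illustration. It assumes a tile is invalidated whenever the extent of a sub-picture it depends on differs from the previous frame.

```rust
// Toy 1D model of the invalidation described above. Not WebRender code:
// every type and function name here is hypothetical.

/// A half-open vertical span [top, bottom) in content (page) space.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    top: i32,
    bottom: i32,
}

impl Span {
    /// Intersection of two spans; an empty result is normalised to top == bottom.
    fn intersect(self, other: Span) -> Span {
        let top = self.top.max(other.top);
        let bottom = self.bottom.min(other.bottom).max(top);
        Span { top, bottom }
    }
}

/// The viewport expressed in content space for a given scroll offset.
fn viewport(scroll_y: i32, height: i32) -> Span {
    Span { top: scroll_y, bottom: scroll_y + height }
}

/// Extent of a sub-picture that a cached tile depends on when the
/// sub-picture is clipped by `clip`.
fn cached_extent(sub_picture: Span, clip: Span) -> Span {
    sub_picture.intersect(clip)
}

/// In this model a tile stays valid only if the extent it depends on is
/// unchanged between two frames.
fn tile_stays_valid(prev: Span, next: Span) -> bool {
    prev == next
}
```

In this model, a box spanning 500..700 that is clipped to a 600-pixel-high viewport has a different extent at scroll offsets 0 and 50, so the tile is invalidated. Clipping the same box to a scroll-independent span (for example the tile's own bounds) gives an identical extent in both frames.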
Comment 14•2 years ago
The severity field is not set for this bug.
:gw, could you have a look please?
For more information, please visit BugBot documentation.
Comment 15•2 years ago (Assignee)
When processing a picture-cache-tile we find the lowest common ancestor
clip of each primitive in the cache-tile and set that as the clip root.
Not only does this save separately applying the clip to each primitive,
it also means we actually draw the parts of the cache tile which are out
of view. This means we don't have to redraw the cache tile every time
more of it scrolls into view.
This mechanism doesn't work when we have Picture primitives inside a
picture-cache-tile, e.g. for applying a filter. Primitives in the
sub-Picture still had the viewport clip applied and so when the
sub-Picture intersected with the viewport edge we had to redraw the
cache tile on every scroll event.
This diff copies the common-ancestor-clip logic to sub-Picture
primitives. On pages with lots of opacity-filtered areas (e.g.
w3schools.com) this eliminates unnecessary cache-tile invalidation and
massively improves scrolling performance on systems with a weak GPU.
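
The "lowest common ancestor clip" idea above can be sketched roughly as follows. This is an illustrative model only, not the code in the patch: `ClipTree`, `lca` and `shared_clip_root` are hypothetical names, and the clip tree is assumed to be a simple parent-linked tree with a single root.

```rust
// Sketch of "lowest common ancestor clip" selection. Hypothetical names only;
// the real WebRender clip tree is more involved.

/// A clip tree stored as parent links: `parents[i]` is the parent of node `i`,
/// and the single root has no parent.
struct ClipTree {
    parents: Vec<Option<usize>>,
}

impl ClipTree {
    /// Number of edges between `node` and the root.
    fn depth(&self, mut node: usize) -> usize {
        let mut d = 0;
        while let Some(p) = self.parents[node] {
            node = p;
            d += 1;
        }
        d
    }

    /// Lowest common ancestor of two clip nodes.
    fn lca(&self, mut a: usize, mut b: usize) -> usize {
        let (mut da, mut db) = (self.depth(a), self.depth(b));
        // Lift the deeper node until both sit at the same depth.
        while da > db {
            a = self.parents[a].expect("non-root node has a parent");
            da -= 1;
        }
        while db > da {
            b = self.parents[b].expect("non-root node has a parent");
            db -= 1;
        }
        // Walk both nodes up in lockstep until they meet.
        while a != b {
            a = self.parents[a].expect("nodes share a root");
            b = self.parents[b].expect("nodes share a root");
        }
        a
    }

    /// Shared clip root for a set of primitives, given each primitive's
    /// leaf clip node. Returns `None` when there are no primitives.
    fn shared_clip_root(&self, leaves: &[usize]) -> Option<usize> {
        let (&first, rest) = leaves.split_first()?;
        Some(rest.iter().fold(first, |acc, &leaf| self.lca(acc, leaf)))
    }
}
```

Under this model, the change described above amounts to computing `shared_clip_root` for the primitives of a sub-Picture as well as for those of the cache tile, so that neither depends on the viewport clip.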
Comment 16•2 years ago (Assignee)
Add a regression test for picture cache invalidation caused by scrolling
content with opacity-filtered stacking contexts. This test ensures that
picture cache tiles are not invalidated by scrolling when such
sub-pictures are present, which was the problem in bug 1836063.
Depends on D181664
Comment 17•2 years ago
Comment 18•2 years ago
Backed out for causing reftest failures on bg-fixed-in-css-filter.html
Comment 19•2 years ago (Assignee)
Looking into this failure. Weirdly it looks like I broke the reference page rather than the actual test.
Comment 20•2 years ago
Comment 21•2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8ebdbb15927a
https://hg.mozilla.org/mozilla-central/rev/6c723e14e86b