Closed Bug 1661135 Opened 5 years ago Closed 5 years ago

Zoom stutters with Webrender on Mali-G71

Categories

(Core :: Graphics: WebRender, defect)

defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox81 --- verified
firefox82 --- fixed

People

(Reporter: evilpies, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I have a Galaxy S8 (SM-G950F). Since bug 1659499 enabled Webrender for my device zooming has been stuttering. I can easily reproduce this on https://cgit.freedesktop.org/mesa/mesa/log/ and other pages.

Setting gfx.webrender.force-disabled to true fixes this stuttering for me. Everything is smooth and I get a continuous zoom.

:jnicol, can you comment to the bug?

Flags: needinfo?(jnicol)

Thanks for the report and the profile Tom.

Most of the time is spent in eglSwapBuffers, which indicates the GPU is busy.

I can also reproduce this on my Galaxy S9 (which has a Mali G72 vs the S8's G71). In fact, it's even more pronounced: ~90% of time in swapbuffers vs your ~30%! I cannot reproduce on any of my Adreno devices, however.

Tom, I'd be really interested to know specifically which other pages you see this on, along with profiles for bonus points! I don't doubt we have performance issues when zooming on many (or maybe even most) pages. But in most cases that I'm aware of the profiles show lots of time in glClear or perhaps texture upload, and I can reproduce to varying degrees on Adreno devices too. This is the first instance I've seen of the time being spent in eglSwapBuffers, and the fact I cannot reproduce on Adreno indicates there's something specific on this page that the Mali GPU struggles with.

Kris, this should definitely block us letting webrender on Mali ride the trains.

Flags: needinfo?(jnicol) → needinfo?(evilpies)

Zooming definitely feels worse on basically every website that can be zoomed like Hacker News. Mesa is probably the worst page I have come across so far however. I will enable WebRender again and report back if I find another particularly bad site.

Flags: needinfo?(evilpies)

Great, thanks. I think we have an underlying issue which affects basically all websites, which is probably bug 1658205. And then any websites which are especially bad probably have something else going on too, such as this one. A profile of hackernews on your device would be useful so I can verify if it is indeed the same as bug 1658205, and then yes profiles of any other particularly bad websites. I understand it's not pleasant to use it when zooming is so slow, so the help is much appreciated!

As for the mesa cgit log page, I've been playing around with the testcase and found out that changing the CSS font-size hugely affects the performance. It's set to 10pt on the page. Anything up to 18pt suffers the performance issue, but 19pt or higher seems to run smoothly.

This is for https://hg.mozilla.org/integration/autoland/summary, which is similar bad to Mesa: https://hg.mozilla.org/integration/autoland/shortlog https://share.firefox.dev/2YzkdSA
Hacker News only stutters when zooming for the first time, afterwards it's actually mostly ok, except maybe for one very short stutter when zooming stops: https://share.firefox.dev/2YBf1gS

Severity: -- → S2

Thanks for the profiles! The first profile looks similar to this one. (Is that for the "summary" page, or the "shortlog"? I'm guessing summary because I can reproduce the slowness there and not on shortlog, which makes sense given what I'm about to explain.) And the hackernews profile indeed looks like bug 1658205, so let's ignore that for now.

The reason zooming on this page (and hg.mozilla/org/integration/autoland/summary) is so slow is because we're rerasterizing and uploading every glyph for each slight change in zoom level each frame. (This happens on Adreno too, but our texture upload path is just about quick enough that it's not noticeable. I think it's worth investigating whether we are hitting a slow path on Mali. I'll file a separate bug for that.)

We attempt to avoid this by calling rasterizing the glyphs at a raster scale chosen by calling PicturePrimitive::get_raster_space(). That function checks whether the Picture's spatial node or one of its ancestors is pinch zooming.

Spatial node 4 is the one being pinched zoomed, but the text run's pic.spatial_node_index == 0. This means that get_raster_space() sees that Spatial Node 0 is not pinch zooming, so we use the Screen raster space and have a bad time.

The reason that increasing the font size prevents the bug is that when the page is vertically scrollable at the default zoom level, pic.spatial_node_index == 5 (which is a child of 4), so we use a Local raster space. That's the same reason why I can reproduce on hg.mozilla.org "summary" pages, but not "shortlog" ones.

Kats, Glenn, do you know why the default-zoom-level scrollability of the page affects whether the picture is attached to Spatial Node 0 vs 5? If that's the expected behaviour, then what is the solution here? Could get_raster_space() be a property of each primitive rather than their picture?

This reminds me of bug 1603637 which had fallen off the bottom of my todo list. Similar problem but for fixed-position content. Ideally the fix for this will fix that bug too.

Flags: needinfo?(kats)
Flags: needinfo?(gwatson)

Kats, Glenn, do you know why the default-zoom-level scrollability of the page affects whether the picture is attached to Spatial Node 0 vs 5?

I guess happens because SpatialTree::find_scroll_root() ignores scroll frames which have no scrollable area (and this is based off of the default zoom, ie moving the visual viewport when zoomed in does not count as scrolling here). So when the page is not scrollable, the tile cache is built with spatial node 0 as the scroll root, and then the pictures are based on the tile cache somehow?

I should probably try to fill in the gaps in my knowledge there about how pictures are created, but I'm leaning towards the idea that get_raster_space() should be per-primitive rather than per-picture.

See Also: → 1661528

When zooming, webrender overrides the raster space used to render text, so that
we do not expensively rerasterize the glyphs for every fractional change in zoom
level. Previously we chose to do this when any ancestor of the picture's spatial
node was being zoomed. This worked on most pages, because the scroll frame which
is used as the main picture caching slice is a descendent of the zooming
reference frame.

However, on pages without a scrollable frame, or for fixed-position content, the
picture's spatial node will not be a descendent of the zooming reference
frame. This meant that we did not detect that we were zooming and rendered the
text in screen raster space rather than the overridden local space, leading to
poor zooming performance.

To fix this, check whether the primitive's spatial node (rather than the
picture's) is a descendent of the zooming frame.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED

Looks like you have found a good solution to this.

Flags: needinfo?(gwatson)
Flags: needinfo?(kats)
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a2087453d11a Use primitive's spatial node rather than its picture's when deciding raster space. r=gw
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch

Even though webrender is only enabled for Mali on Nightly only, once I knew what I was looking for I was able to reproduce on Adreno, and also on Desktop because desktop-zooming is enabled. So we should uplift this fix.

Comment on attachment 9172494 [details]
Bug 1661135 - Use primitive's spatial node rather than its picture's when deciding raster space. r?gw

Beta/Release Uplift Approval Request

  • User impact if declined: Stuttery zooming on certain pages when webrender is enabled (on Android or desktop touchpad/touchscreen zooming)
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very minor change - uses a mechanism to improve zooming performance which has been in place for a year. Just fixes an edge case on some pages where we used the wrong value.
  • String changes made/needed: N/A
Attachment #9172494 - Flags: approval-mozilla-beta?

Comment on attachment 9172494 [details]
Bug 1661135 - Use primitive's spatial node rather than its picture's when deciding raster space. r?gw

Approved for 81.0b5.

Attachment #9172494 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hello, we did a check, issue is fixed on Firefox Preview Beta 81.1.0-beta.1 with Samsung Galaxy S9 (Android 8) - Mali 72

Reproducing video

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: