Closed Bug 1642308 Opened 5 years ago Closed 5 years ago

Full screen picture-cache invalidation occurs whenever a new display list is received when scrolling whilst zoomed

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: jnicol, Assigned: aosmond)

References

(Blocks 3 open bugs)

Details

Attachments

(1 file, 1 obsolete file)

Scrolling while zoomed used to cause full screen picture cache invalidation on every frame. This was because due to the zoom != 1.0, the async scroll offset was changing by a non-whole number every frame. This meant the subpixel position of the picture cache tiles kept changing, causing them to be invalidated. In bug 1589669 we ensured that the scroll offset was snapped, then in bug 1620014, we did the same for the visual viewport offset. With both these methods of async scrolling snapped (the scroll offset and visual viewport offset) the subpixel position remains constant between frames, meaning we do not invalidate.

However, as noted in bug 1620014 comment 16, there is still more picture cache invalidation than desired. Rather than occur on every frame, however, it now only occurs whenever a new display list is received. This might be every 128 or 256 pixels, depending on the platform. My understanding is that while we are now snapping the async scroll offsets, the main-thread scroll offset is not snapped. Each item in the new display list will therefore have moved by a fractional amount, causing everything to be invalidated.

Currently this is probably only an issue on android, but when desktop zooming is enabled it will be an issue on desktop too.

Probably a naive question but what is the benefit of scrolling in fractional amounts in the first place? I feel like if we only scrolled in whole pixels that would solve this issue (and would make snapping the visual viewport offset in webrender unnecessary, which would be nice as I keep causing new regressions with each attempt to fix that!)

(In reply to Jamie Nicol [:jnicol] from comment #0)

However, as noted in bug 1620014 comment 16, there is still more picture cache invalidation than desired. Rather than occur on every frame, however, it now only occurs whenever a new display list is received. This might be every 128 or 256 pixels, depending on the platform. My understanding is that while we are now snapping the async scroll offsets, the main-thread scroll offset is not snapped. Each item in the new display list will therefore have moved by a fractional amount, causing everything to be invalidated.

My understanding is that main thread does at least make an attempt to snap its scroll offset: https://searchfox.org/mozilla-central/rev/5e4d4827aa005d031580d2d17a01bae1af138b2e/layout/generic/nsGfxScrollFrame.cpp#2774

(In reply to Jamie Nicol [:jnicol] from comment #1)

Probably a naive question but what is the benefit of scrolling in fractional amounts in the first place? I feel like if we only scrolled in whole pixels that would solve this issue (and would make snapping the visual viewport offset in webrender unnecessary, which would be nice as I keep causing new regressions with each attempt to fix that!)

There are probably other reasons as well, but at least one reason these days is that the web specs are moving in the direction of the web-exposed scroll offsets being fractional (see Rendering Independent Scroll Offsets).

It's also not entirely clear to me what "only scrolling in whole pixels" would mean. If it meant "only whole CSS pixels", the scroll offset in screen pixels could still be fractional when zoomed. If it meant "only whole screen pixels", then the scroll offset in CSS pixels would have to change (to a different fractional value) when zooming to yield a whole screen pixel at the current zoom level.

Thanks Botond. "Scrolling in whole pixels" probably isn't the answer, but I'm not sure what is.

I've refreshed my memory a bit more on what the problem is in webrender. It's wrong to say "the subpixel position of picture cache tiles kept changing", it's more that the subpixel position of all the glyphs changes:

When we are checking for invalidation of picture cache tiles, every TextRun gets marked invalid because the prim UIDs do not match the previous frame's. A new prim UID means the interner lookup failed and a new prim was created, and the interner lookup failed because the fractional position of each GlyphInstance in the TextRunKey has changed. The position is set here, derived from the spatial node's (unsnapped) external_scroll_offset and the (snapped?) prim_info.rect.

So each new display list updates the external scroll offset to a new fractional value, which changes every glyph position, which causes (almost) full screen invalidation (only the tiles containing text runs).

Andrew, do you have any ideas about how to solve this?

Flags: needinfo?(aosmond)
Severity: -- → S3
Priority: -- → P3

Hmmmm. The glyph position is supposed to be stable during animations. For normal animations, we do this:

https://searchfox.org/mozilla-central/rev/559b25eb41c1cbffcb90a34e008b8288312fcd25/layout/painting/nsDisplayList.cpp#8008

to request we rasterize the glyphs in local space during the animation. Doing the same thing for the zoom animation would probably help, given from what you linked to me before, it follows a different path:

https://searchfox.org/mozilla-central/rev/559b25eb41c1cbffcb90a34e008b8288312fcd25/layout/painting/nsDisplayList.cpp#6352

To be clear, we only use local raster space while the animation is happened -- before it starts and after it is complete, we must switch away from local raster space.

Flags: needinfo?(aosmond)

@jamie: desktop zooming is default enabled on 81 now, are you seeing this issue on desktop and how bad is it?

Flags: needinfo?(jnicol)

I can reproduce on desktop very easily. If you enable gfx.webrender.debug.picture-caching, zoom in slightly to any page, and then scroll continuously, you will see the entire screen flash red pretty frequently.

On simple pages my laptop is powerful enough to render the entire screen fairly quickly, so I wouldn't notice anything without the picture caching debug overlay open. But on more complex pages or less powerful machines it might be noticeable.

Flags: needinfo?(jnicol)
Flags: needinfo?(aosmond)
Blocks: gfx-83
No longer blocks: gfx-triage

Assigning to Andrew per discussion in #gfx. The patch above apparently helps with some pages but not all so there's some more work needed.

Assignee: nobody → aosmond
Attachment #9177412 - Attachment description: Bug 1642308 - Calculate tile rect size based on suppressed fract offset. → Bug 1642308 - Fix some snapping related picture cache tile rect calculations.

I'm not sure this will solve all of the invalidation concerns, but I think it will solve the ones related to fractional offsets and snapping highlighted in this bug.

Flags: needinfo?(aosmond)
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/37708d37f554 Fix some snapping related picture cache tile rect calculations. r=gw

(In reply to Narcis Beleuzu [:NarcisB] from comment #13)

Backed out for dt failures on browser_aboutdebugging_persist_connection.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/9e61e67ae323fc6b32a5a8a0a3e367899a5b0afd
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=317238591&repo=autoland&lineNumber=2782

This code doesn't run on that (non-WR) target.

Flags: needinfo?(aosmond)
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1b7bc7ee74b4 Fix some snapping related picture cache tile rect calculations. r=gw

This is troubling in that these tests should be running without WR but the logs suggest that may not be the case....

Flags: needinfo?(aosmond)

This seems wrong: https://searchfox.org/mozilla-central/rev/7ef5cefd0468b8f509efe38e0212de2398f4c8b3/gfx/config/gfxConfigManager.cpp#237

The test harness should be setting MOZ_WEBRENDER=0 to force-disable WR, but we're apparently only respecting it in some cases?

Previously we respected it except if we were in the marionette test that specifically exercises the rollout code: https://hg.mozilla.org/mozilla-central/rev/7ba7a04cfa43596cdd87041c50563adc55445dd3#l10.442

Andrew: Are you still working on this, we have a dependency on this for DTZ

Flags: needinfo?(aosmond)

I can't land without the (unrelated) assert in the test case getting fixed.....

Flags: needinfo?(aosmond)

I think I found the cause of the assert, although I couldn't repro locally.

The way that tiles get recycled between scenes with new display lists can make storing prev_device_valid_rect in Tile sometimes be incorrect. I moved it to be in the TileDescriptor struct and try runs seem to pass OK with that. The TileDescriptor struct can be relied on to correctly store information from a previous frame for invalidation purposes.

Pending try with attached patch is https://treeherder.mozilla.org/#/jobs?repo=try&revision=992c653dbc5678b8fbe3f2f6c0de3dbf3967cb75

Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5303394c527b Fix some snapping related picture cache tile rect calculations. r=gw
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

If I'm following things correctly, gw fixed the assertion problem in bug 1672077, so this can reland now?

Depends on: 1672077

I'm also assuming for now that we're not going to try to uplift this (and dependencies) to Fx83.

Blocks: desktop-zoom-post
No longer blocks: desktop-zoom-release
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/404ed30949a4 Fix some snapping related picture cache tile rect calculations. r=gw
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
Regressions: 1674480
Status: RESOLVED → REOPENED
Flags: needinfo?(aosmond)
Resolution: FIXED → ---
Target Milestone: 84 Branch → ---
Blocks: 1675030
Attachment #9181721 - Attachment is obsolete: true
Flags: needinfo?(aosmond)

jrmuizel tested the two different parts of this patch and confirmed the fractional offset piece has no regression, and the valid rect piece does. I've filed bug 1675030 to track the valid rect piece of this patch and will reland the fractional offset piece.

Attachment #9177412 - Attachment description: Bug 1642308 - Fix some snapping related picture cache tile rect calculations. → Bug 1642308 - Fix fract offset calculations for picture cache tile invalidation.
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/af9de2112747 Fix fract offset calculations for picture cache tile invalidation. r=gw
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: