Closed Bug 1648641 Opened 4 years ago Closed 4 years ago

Part of screen goes blank when pinch-zoomed

Categories

(Core :: Panning and Zooming, defect, P3)

Desktop
Windows 10
defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox82 --- fixed

People

(Reporter: ktaeleman, Assigned: kats)

References

(Regression)

Details

(Keywords: regression)

Attachments

(5 files, 5 obsolete files)

Attached image image.png

STR:
Go to https://digg.com/2020/risk-covid-19-next-18-months-visulalized
Pinch-Zoom in the middle of the page until you see part of the image going blank on the right side.
Notice how the image and text stays blank in that area even releasing zoom and panning.

Attached file About:Support

On mac I wasn't able to get the blank to stay on my screen for a long time, but I did see that previously painted areas would get drawn as white and then get painted properly.

Markus suspects this is a regression from bug 1635472.

See Also: → 1648923
See Also: 1648923
See Also: → 1649191

I'm not 100% sure if I'm reproducing the same issue or a different one -- on Linux, the blankness is only temporary, and only occurs at high zoom levels -- but what I'm seeing is not a regression from bug 1635472, it has been in place ever since the STR became possible (with relevant pref flips) in early 2019.

The severity field is not set for this bug.
:botond, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(botond)

S3 for an issue that only affects a configuration that is currently not enabled by default.

Severity: -- → S3
Flags: needinfo?(botond)
Priority: -- → P3

I was planning to try and get back to bug 1649191 after Botond landed his cleanup patches in bug 1519285 as they would help make better sense of the metrics that I was seeing. And my hope was that this would turn out to be the same underlying problem as that one, so fixing that would automatically fix this. However it would be good to try to verify that, by having Kris record some logging while reproducing this problem. (I wasn't able to reproduce it myself). ni? to myself as a reminder to figure out which logs I need.

Flags: needinfo?(kats)

Kris, can you run with MOZ_LOG="apz.inputqueue:5,apz.controller:5,apz.displayport:5" and collect the logs from reproducing this scenario? If, after reproducing the problem, you see a checkerboard report for it show up in about:checkerboard, then attaching the raw log for it to this bug would also be useful.

Flags: needinfo?(kats) → needinfo?(ktaeleman)

@Kats: Log file attached

Flags: needinfo?(ktaeleman) → needinfo?(kats)

I see the checkerboarding in the log but it's not clear to me why the displayport is coming out the way it is. I'll probably need to add more logging for the main-thread displayport calculation code.

Assignee: nobody → kats
Flags: needinfo?(kats)

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=75624337e3323d2d9f32062c366a883c216ae903 has more logging for the main-thread displayport calculation. Kris, when you have a chance, can you run this and collect the logs? Same MOZ_LOG as in comment 8, but note that this build will probably run a bit slower due to the extra log volume.

Forgot the flag...

Flags: needinfo?(ktaeleman)
Attached file kats-Sep10-logs.zip

Logs attached, let me know if that didn't work

Flags: needinfo?(ktaeleman) → needinfo?(kats)

Thanks, it has the logs. I'm hoping this is enough for me to figure out the problem. The displayport on the page is definitely mispositioned, and I think it's because we're not accounting for the visual viewport offset properly. Seems similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1649191#c10

In this case I see for example:

[Child 16488: Main Thread]: V/apz.displayport Setting base rect (x=0, y=0, w=75888, h=41328) for scrollId=5
...
[Child 16488: Main Thread]: V/apz.displayport SetDisplayPortMargins (l=-1224.337402, t=856.524048, r=2805.337402, b=3448.476074) on scrollId=5, newDp=(x=0, y=-23706, w=50100, h=62946)

The base rect is basically the size of the screen, which seems correct. There is a negative left margin because the user is zoomed in and scrolled over to the right of the layout viewport. But then the computed newDp has an x=0 and narrow width, so it doesn't cover the area the user is looking at. The x=0 in particular seems like the problem. I'll need to step through the displayport computation and see why it's coming out that way.

Flags: needinfo?(kats)

Turns out Markus was right, as usual - this appears to be a regression from bug 1635472. At least as far as I can tell - I can't reproduce the exact STR (depends on screen size) but I made a reduced test case which sets the displayport multiplier prefs to zero and can reproduce it there when WR is enabled.

We use negative left displayport margins to shift the displayport over for visual viewport positioning. But the WR position alignment effectively undoes that. I think there is a fundamental flaw in the position alignment code because it tries to align the position to a larger alignment, while keeping the displayport size the same, but doing this means that the final aligned displayport may not contain the unaligned displayport, and it needs to do that in order to avoid checkerboarding in these scenarios.

Regressed by: 1635472
Has Regression Range: --- → yes

Kris, can you try pulling a build from the try push at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=c77c43541f14b8df410bf0807eedae94a3480814 and seeing if you can reproduce the problem there? I'm expecting that to not have the problem, so if you do see it, collecting logs again (same MOZ_LOG as comment 8) would be helpful.

^

Flags: needinfo?(ktaeleman)

No functional changes here, just refactoring code that is already duplicated
in a bunch of tests.

Depends on D90499

This ensures that even with the displayport margins dropped to zero we don't
get any checkerboarding. This currently fails with WR enabled, because the
displayport alignment code moves the displayport so it doesn't enclose the
unaligned displayport, and that's bad.

Depends on D90500

The next patch requires gfxVars::UseWebRender to work in gtests, so we need
to make sure gfxPlatform is initialized correctly for all the tester classes.

Depends on D90501

The displayport is aligned to screen pixels by an alignment number; there was
code to increase that alignment number for the WebRender codepath. This is
desirable because it causes the displayport to move less frequently, which
provides better performance characteristics with WebRender. However, the
way that was implemented, it was possible for the aligned displayport to not
completely enclose the unaligned displayport, which could then result in
perma-checkerboarding.

This patch removes that code and replaces it with a simpler approach to scale
up the existing alignment number by a factor based on the size of the scroller's
base rect (roughly how much of the scroller is visible to the user).

This does make the displayports larger, more so for scrollers with more visible
area. This has negative performance implications, so we mitigate that by also
scaling down the displayport margin multiplier by the same scaling factor. This
isn't perfect, but seems to work well enough in practice.

Depends on D90502

Comment on attachment 9176192 [details]
Bug 1648641 - Add a bunch of verbose-level displayport logging. r?nical

Revision D90499 was moved to bug 1654836. Setting attachment 9176192 [details] to obsolete.

Attachment #9176192 - Attachment is obsolete: true

Comment on attachment 9176193 [details]
Bug 1648641 - Extract helper method for checking for checkerboards. r?nical

Revision D90500 was moved to bug 1654836. Setting attachment 9176193 [details] to obsolete.

Attachment #9176193 - Attachment is obsolete: true

Comment on attachment 9176194 [details]
Bug 1648641 - Add a test for checkerboarding with zero displayport multipliers. r?nical

Revision D90501 was moved to bug 1654836. Setting attachment 9176194 [details] to obsolete.

Attachment #9176194 - Attachment is obsolete: true

Comment on attachment 9176195 [details]
Bug 1648641 - Initialize gfxVars in all APZ gtests. r?nical

Revision D90502 was moved to bug 1654836. Setting attachment 9176195 [details] to obsolete.

Attachment #9176195 - Attachment is obsolete: true

Comment on attachment 9176196 [details]
Bug 1648641 - Redo how we expand displayport alignment for webrender. r?nical

Revision D90503 was moved to bug 1654836. Setting attachment 9176196 [details] to obsolete.

Attachment #9176196 - Attachment is obsolete: true

(Sorry for the noise, I meant to attach those patches to bug 1654836)

@kats: Sorry, I missed this :/

It fixes the issue as described, the only remaining issue is that there is a fraction of a second of checkerboarding on the image when zooming out, but I would not consider that as a blocker.

Flags: needinfo?(ktaeleman)

Great, we can close this bug as fixed by 1654836 and use bug 1654290 or one of the other ones for transient checkerboarding.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Ok, I'm going crazy. the fix has caused a worse issue where after a while, the issue appears.

Zoom out, all looks good, look away from the screen, half the image is gone. I'll record with the logging.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

While trying to repro the image half disappearing, the originial issue showed again, but is harder to repro. log file is the original issue reappearing.

I waded through the logs and while I see the checkerboarding and raw displayport numbers I got different numbers when I walked through the code myself. It's kinda tricky because I have to line up logs from two different processes and do a bunch of unit conversion manually so it's quite likely I made mistake somewhere. Nonetheless, there does appear to still be a problem where the aligned displayport can not contain the unaligned displayport. This can happen with all the alignment codepaths, not just WR. Consider:

alignment = (128, 128)
rect = (x=127, y=127, w=127, h=127)

Then the code here will emit: x=0, y=0, w=128, h=128 which doesn't contain the original rect. As :nical said in bug 1654836 comment 0 we actually want to round out the whole rect rather than rounding the position and size separately. I'm not yet sure if that will fix this bug but it's easy enough to make a build with that and try it out. And if not we should get a log with timestamps enabled so I can more easily correlate logs across processes.

One more try build for you, Kris: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=e738d7a4344d5a5386e342ea613aabf132cfaadd

If this doesn't fix the issue please run with MOZ_LOG="apz.inputqueue:5,apz.controller:5,apz.displayport:5,timestamp" and attach logs. (Note the MOZ_LOG here include "timestamp" which I didn't have before, but which would be useful).

Flags: needinfo?(ktaeleman)

Looking good, can no longer reproduce the issue!

Flags: needinfo?(ktaeleman)

Great! Just need to clean that up a bit and sure we handle some boundary cases without resizing the displayport.

Now I'm confused again... I misread the code originally.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #35)
Nonetheless, there does appear to still be a problem where the aligned displayport can not contain the unaligned displayport. This can happen with all the alignment codepaths, not just WR. Consider:

alignment = (128, 128)
rect = (x=127, y=127, w=127, h=127)

Then the code here will emit: x=0, y=0, w=128, h=128 which doesn't contain the original rect.

This is wrong. The code actually emits x=0, y=0, w=256, h=256 which does contain the original rect. So the patch that I wrote would have made the displayport unnecessarily bigger in some cases. But it shouldn't have changed any scenarios such that the aligned DP contained the unaligned one where it didn't before. So probably my patch was just papering over a deeper problem.

I guess I need to dig through the logs a bit more to figure out what's really going on here. Sigh.

Per our discussion here's some pointers to the displayport code:

This is where APZ computes the desired displayport and figures out the margins that it wants the main-thread to apply: https://searchfox.org/mozilla-central/rev/89d33e1c3b0a57a9377b4815c2f4b58d933b7c32/gfx/layers/apz/src/AsyncPanZoomController.cpp#3750
This happens as part of the repaint request which is sent to the main thread and arrives in the content process here (for the root scrollable, which is the case in this bug). The code here does some adjustments to the incoming margins based on whether or not the scroll offset APZ requested could be applied, and then stores it as a property on the nsIContent, here.

Then later, at paint-time, various pieces of code that need to use the displayport obtain it by calling nsLayoutUtils::GetDisplayPort, the meat of which is in the GetDisplayPortFromMarginsData function. This is what takes the margins, applies it to the base rect, does the alignment and all the other things that need doing, and returns a final rect.

The base rect calculation mostly happens here. Searching for sDisplayportLog can turn up some other slightly-relevant bits of code that deal with the displayport.

Per discussion on #apz yesterday, Kris discovered this is actually fixed now, and the fix range includes bug 1654836. So let's say it was fixed by that.

Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: