Open Bug 1758352 Opened 2 years ago Updated 2 years ago

On Nvidia WebRender, Page Down causes pages to fail to paint

Categories

(Core :: Panning and Zooming, defect, P3)

Firefox 97
defect

Tracking Status
firefox-esr91 --- unaffected
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- wontfix

People

(Reporter: nyanpasu64, Unassigned)

Details

(Keywords: regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0

Steps to reproduce:

This occurs on a freshly refreshed profile (and also with partial present forcibly disabled), but not in Troubleshoot mode (software WebRender).

Actual results:

The bottom of the page appears white (fails to paint) until I move my mouse over it. Similar glitches appear when pressing Page Up. If you don't see the bug, try resizing your window, optionally scrolling to the top, refreshing the page, and trying again.

Expected results:

The page does not fail to paint sections.

The screenshot was taken with X11 font DPI set to 120. If I change DPI to 96 and restart Firefox, I still get the same rendering bug.

Is this related to Bug 1726841?

In case the contents of the link change, I was also able to reproduce the bug on this saved file.

The Bugbug bot thinks this bug should belong to the 'Core::Graphics: WebRender' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Although I can't reproduce with Page Down/Page Up, I can reproduce by holding the down or up arrow. I get this regression range:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=1f1717cebe55e231d82db0277692f49f6ba65401&tochange=788cf4242ab0fef16b67d917e15d4b11cb749c4c

-> bug 1730998

Status: UNCONFIRMED → NEW
Component: Graphics: WebRender → Panning and Zooming
Ever confirmed: true
Flags: needinfo?(hikezoe.birchill)
Regressed by: 1730998

Set release status flags based on info from the regressing bug 1730998

Possibly related: Firefox Android has a white flickering issue when scrolling long pages quickly (https://github.com/mozilla-mobile/fenix/issues/24161).

Has Regression Range: --- → yes

Though bug 1730998 did regress this case, there seem to be underlying issues on our layout side. On my Linux box, if the browser window is not very tall, there are regions where the content is outside of the browser window; that is, there are regions that cannot be made visible, and at that moment the vertical scrollbar thumb goes outside of the viewport as well. Is this the right behavior? Anyway, I will try to create a simplified test case.

I am not sure whether what I am seeing on my Linux box is the same issue. What I am seeing is that painting of the content is delayed for a while; it eventually gets painted without any mouse movement. The regression range for the issue I am seeing is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c25899d7b631470b983650ee254177381a62eaad&tochange=573659457e9c755806b3503803c05f45871fbab7 and bug 1667475 looks quite suspicious in that range.

(EDITED) Bug 1667475 is unrelated. The issue I was seeing is not reproducible on the latest nightly, which means I tracked down a different issue that seems to have been fixed already.

I went down a rabbit hole trying to understand why this issue happens only with keyboard scrolling. The reason is that, in this case, PresShell::ScrollLine (or Page) is invoked by pressing the keys in question. And the reason bug 1730998 caused this issue is that PresShell::ScrollLine gets called in between nsRefreshDriver tick calls. So what actually happens is:

  1. a RepaintRequest is queued as an early runner of nsRefreshDriver
  2. PresShell::ScrollLine gets called
  3. the PresShell::ScrollLine ends up appending a pure relative ScrollPositionUpdate into ScrollFrameHelper::mScrollUpdates
  4. When the early runner gets invoked in nsRefreshDriver::Tick, we fail to use the displayport margins for the RepaintRequest because of the APZCCallbackHelper::IsScrollInProgress check; IsScrollInProgress returns true because of step 3

So, I'd say our keyboard scrolling has been relying on the assumption that a RepaintRequest coming from APZ is processed before keyboard events are handled, which is unfortunate. We will have to eliminate that assumption.
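
The ordering problem in steps 1 to 4 can be sketched as a toy model. This is a minimal Python simulation under stated assumptions, not Gecko code: `ScrollFrame`, `scroll_line`, `handle_repaint_request` and `simulate` are hypothetical stand-ins, the margin value is arbitrary, and the real APZCCallbackHelper::IsScrollInProgress check is reduced here to "is there any pending scroll update".

```python
from dataclasses import dataclass, field


@dataclass
class ScrollFrame:
    """Toy stand-in for a scroll frame (hypothetical, not ScrollFrameHelper)."""
    scroll_updates: list = field(default_factory=list)  # models mScrollUpdates
    displayport_margins: int = 0

    def is_scroll_in_progress(self) -> bool:
        # Assumption: any pending relative update counts as "in progress".
        return len(self.scroll_updates) > 0


def scroll_line(frame: ScrollFrame, delta: int) -> None:
    """Models step 3: a key press appends a pure relative scroll update."""
    frame.scroll_updates.append(("relative", delta))


def handle_repaint_request(frame: ScrollFrame, margins: int) -> bool:
    """Models step 4: margins are dropped while a scroll is in progress."""
    if frame.is_scroll_in_progress():
        return False
    frame.displayport_margins = margins
    return True


def simulate(events):
    """Replay events in order; return whether each request's margins were used."""
    frame = ScrollFrame()
    queued = []   # repaint requests waiting as early runners (step 1)
    results = []
    for event in events:
        if event == "queue_repaint_request":
            queued.append(100)  # arbitrary margin value for illustration
        elif event == "key_press":
            scroll_line(frame, delta=1)  # steps 2 and 3
        elif event == "refresh_tick":
            while queued:
                results.append(handle_repaint_request(frame, queued.pop(0)))
            # Assumption: the tick consumes the pending scroll updates.
            frame.scroll_updates.clear()
    return results
```

In this model, calling `simulate` with the key press placed between `"queue_repaint_request"` and `"refresh_tick"` reproduces the failure (the margins are not applied), while handling the tick before the key press applies them. That is the ordering assumption described above.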

Flags: needinfo?(hikezoe.birchill)
Flags: needinfo?(hikezoe.birchill)

https://phabricator.services.mozilla.com/D106757 seems to fix this for me!

I don't remember all the issues surrounding that patch, but from memory I was working on another similar bug where holding the down arrow would cause checkerboarding, and I landed a patch to fix that. I found other issues while debugging, and https://phabricator.services.mozilla.com/D106757 was meant to fix one of those. The only reason I didn't land it is that I was working on a test to go with it, but other higher-priority things came along, since I didn't have a testcase showing that the patch fixed anything user-visible. The test is in https://phabricator.services.mozilla.com/D107043 but I don't think I got it working (i.e. failing without the patch, passing with it).

Depends on: 1695598

The bug seems to not occur on that build on my computer.

Flags: needinfo?(nyanpasu64)

Thanks for testing!

Thank you Timothy! I haven't tried https://phabricator.services.mozilla.com/D106757 but indeed it should fix this issue. I will try to make https://phabricator.services.mozilla.com/D107043 work.

Flags: needinfo?(hikezoe.birchill)
See Also: → 1728252

The severity field is not set for this bug.
:botond, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(botond)
Severity: -- → S3
Flags: needinfo?(botond)
Priority: -- → P2

Hey nyanpasu64, would you mind double-checking that the issue you saw is no longer reproducible on the latest nightly? Bug 1695598 landed a couple of days ago. Thanks!

Flags: needinfo?(nyanpasu64)

...oops, I'd hoped it had been fixed. It's not. The bug still occurs on 100.0a1 (2022-03-28) (64-bit) with a clean profile (I deleted the contents of ~/.mozilla/firefox/g99crzda.default-nightly, but the program somehow wouldn't start if I deleted the folder entirely).

Tips for reproducing the bug:

  • Keep your mouse in the right of the screen so it doesn't hover the posts, triggering redraws and preventing the bug from showing up.
  • For some reason, the SingleFile page capture has a fixed viewport size, even if your browser window is smaller or bigger.

Flags: needinfo?(nyanpasu64)

Your screenshot confused me initially, but it made me realize which problem you are talking about. In short, there seem to be two different issues.

One was observable on the original site, https://bugreports.qt.io/browse/QTBUG-98720, and has already been fixed by bug 1695598 as far as I can tell. The other is still observable using the file in comment 1.

When I wrote comment 8, I was able to see the former issue locally, but I wasn't able to reproduce the latter. Now I can reproduce the latter on the latest nightly, and I tried to find the regression range. It was a bit hard. Initially it pointed to bug 1732358, which doesn't make any sense. So I ran mozregression with the --pref "fission.autostart:true" option, and then the range was https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2cd00fdd2f2a5fe2a72155745b471084e1eff723&tochange=e2e967977e4bc5f9da2bc35f18c6bec772ae1760 . That range contains bug 1675547, which turned on apz.wr.activate_all_scroll_frames_when_fission. Indeed, setting apz.wr.activate_all_scroll_frames_when_fission to false solves the issue, at least on my Linux box.
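
For anyone who wants to repeat this check, the two pref settings mentioned above can be written down mechanically. The sketch below is only an illustration: the helper names `format_user_pref` and `mozregression_pref_arg` are hypothetical, and the `name:value` form for mozregression is assumed from the `--pref "fission.autostart:true"` example in this comment. It covers boolean and integer prefs only.

```python
import json


def format_user_pref(name: str, value) -> str:
    """Render one user.js line for a boolean or integer pref.

    json.dumps turns Python True/False into the JavaScript literals
    true/false, which is the form user.js expects.
    """
    if not isinstance(value, (bool, int)):
        raise TypeError("this sketch only handles bool and int prefs")
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def mozregression_pref_arg(name: str, value) -> str:
    """Build the name:value string passed after --pref (assumed format)."""
    if not isinstance(value, (bool, int)):
        raise TypeError("this sketch only handles bool and int prefs")
    return f"{name}:{json.dumps(value)}"


# The pref that made the remaining issue go away on the commenter's Linux box.
workaround_line = format_user_pref(
    "apz.wr.activate_all_scroll_frames_when_fission", False
)

# The pref used to get a meaningful regression range.
fission_arg = mozregression_pref_arg("fission.autostart", True)
```

Placing `workaround_line` in a `user.js` file in a test profile is one way to apply the workaround without going through about:config. It is a diagnostic aid, not a fix.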

So maybe the latter is not a regression at all, and it has been there since Fission? I will try mozregression again later with the pref on.

I'm not sure what you mean by the two distinct issues. But I was able to reproduce a 10% invisible panel on Linux Nightly with https://bugreports.qt.io/browse/QTBUG-98720 as well as my attachment. Interestingly, the invisible text only appears on my "07 Mar '22 13:00" comment, and this is the only comment which grows its own scrollbar when the window is narrow enough. And disabling apz.wr.activate_all_scroll_frames_when_fission avoids the problem.

Is the latter issue related to https://github.com/mozilla-mobile/fenix/issues/24161? This issue affects https://gitlab.com/exotracker/exotracker-cpp/-/blob/dev/src/gui/pattern_editor.cpp, which contains a code block which (on mobile browsers) also grows a nested horizontal scrollbar. I found out I could also reproduce the white flickering on desktop. However, disabling apz.wr.activate_all_scroll_frames_when_fission (on Android or PC) or enabling desktop mode (on Android) does not avoid that bug, so perhaps it's different.

I also found that enabling apz.force_disable_desktop_zooming_scrollbars fixes both this bug and the Fenix bug report on desktop, but doesn't fix the Fenix issue on mobile.

(Do Firefox core/graphics developers read the Fenix GitHub bug tracker? Or is it staffed by triagers and stale-bot, acting as a black hole of bug reports complete with bureaucracy and triaging but no actual action taken?)

Today, I can no longer reproduce the latter issue I mentioned in comment 17 unfortunately. :/

(In reply to nyanpasu64 from comment #18)

I'm not sure what you mean by the two distinct issues. But I was able to reproduce a 10% invisible panel on Linux Nightly with https://bugreports.qt.io/browse/QTBUG-98720 as well as my attachment. Interestingly, the invisible text only appears on my "07 Mar '22 13:00" comment, and this is the only comment which grows its own scrollbar when the window is narrow enough. And disabling apz.wr.activate_all_scroll_frames_when_fission avoids the problem.

So the problem still persists on the latest nightly and is still reproducible on those sites, but it's harder to reproduce, right? Anyway, the remaining problem here is definitely related to "apz.wr.activate_all_scroll_frames_when_fission".

Is the latter issue related to https://github.com/mozilla-mobile/fenix/issues/24161? This issue affects https://gitlab.com/exotracker/exotracker-cpp/-/blob/dev/src/gui/pattern_editor.cpp, which contains a code block which (on mobile browsers) also grows a nested horizontal scrollbar. I found out I could also reproduce the white flickering on desktop. However, disabling apz.wr.activate_all_scroll_frames_when_fission (on Android or PC) or enabling desktop mode (on Android) does not avoid that bug, so perhaps it's different.

Yeah, that sounds like a different issue we are tracking now.

(Do Firefox core/graphics developers read the Fenix GitHub bug tracker? Or is it staffed by triagers and stale-bot, acting as a black hole of bug reports complete with bureaucracy and triaging but no actual action taken?)

I don't think core/graphics members normally read the GitHub issues. As far as I can tell, those issues are handled by the mobile team first; they then file a bug in this Bugzilla if they think it's a Gecko bug.

See Also: → 1675547

:botond, is someone going to actively work on this for 101?

Flags: needinfo?(botond)

(In reply to Barret Rennie [:barret] (they/them) from comment #20)

:botond, is someone going to actively work on this for 101?

We discussed this in our APZ meeting today. Summary:

  • The part of this bug which is a regression from bug 1730998 has been fixed by bug 1695598.
  • There is a remaining issue which is more difficult to reproduce. That may also be a regression, but if so, from a much older change. The remaining issue seems more like a P3, and we are unlikely to fix it in 101.

Flags: needinfo?(botond)
Priority: P2 → P3
No longer regressed by: 1730998