Scroll is jittery on sites with virtual scrolling
Categories
(Core :: Panning and Zooming, defect, P2)
Tracking
()
People
(Reporter: brawaru, Assigned: hiro)
References
(Regression)
Details
(Keywords: regression)
Attachments
(5 files, 2 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0
Steps to reproduce:
- Open a site that heavily relies on virtual scrolling like https://bsky.app (can also be reproduced to a lesser degree on Mastodon, https://discord.com/app (channels / messages)).
- Scroll down or up
Actual results:
Scroll is very jittery compared to what it used to be in 120.
Expected results:
Page scrolls as normal and no jitter is observed or it's caused by the implementation of virtual scrolling itself, not the browser.
Reporter | ||
Comment 1•11 months ago
|
||
Notes:
- Firefox <=120.0.1 had a different bug that affected scrolling on Bluesky, Firefox 121 it made so severe that the site is no longer usable in Firefox. I personally had to upgrade to 120.0.1 and hope this can be fixed soon. This bug affects all sites (to some degree) that use virtual scrolling.
- The demonstration video was recorded in clear Firefox 121 profile, no settings were changed.
Comment 2•11 months ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Panning and Zooming' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•11 months ago
|
||
Thanks for the video it made reproducing much easier.
Comment 4•11 months ago
|
||
As expected, setting layout.css.scroll-anchoring.absolute-update=false fixes the bug.
Comment 5•11 months ago
|
||
(In reply to Sasha Sorokin [:brawaru] from comment #1)
- Firefox <=120.0.1 had a different bug that affected scrolling on Bluesky,
There are a lot of comments there so it wasn't immediately clear to me, does this older bug still exist (say in 120)? If so, are the steps to reproduce? Thanks.
Comment 6•11 months ago
|
||
Set release status flags based on info from the regressing bug 1856088
Reporter | ||
Comment 7•11 months ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #5)
does this older bug still exist (say in 120)? If so, are the steps to reproduce? Thanks.
I think it's a different bug, because it affected only Bluesky, and all other sites seemed to be alright. The reproduction steps are similar — you just scroll the feed down, eventually your scroll seems to jump randomly (mostly scroll you back a bit, as if you had a broken mouse wheel), it also may stick to the very bottom of the page, continuing loading of the feed (see video in my comment on GitHub).
And yes, I just reproduced this in 120.0.1. Important observation is that turning off smooth scrolling seems to make it occur less. It also happens slightly less often now after Bluesky removed the sliding feeds bar at the top (which was replaced with the fixed tabs bar for the current feed).
I personally believe this one may actually be Bluesky's own bug, but they rejected it on the merits of non-reproduction in Chromium, which only had the feed blinking (you can also see me trying to reproduce this in Chromium via above link). This is why I filled it in Webcompat rather than immediately heading to this tracker.
Reporter | ||
Comment 8•11 months ago
|
||
Updating and disabling layout.css.scroll-anchoring.absolute-update
fixed the jitter, but not the the bug reported on Webcompat: there are still weird scrollbacks as described above.
Comment 9•11 months ago
|
||
:hiro, since you are the author of the regressor, bug 1856088, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 10•10 months ago
|
||
I can't since I don't have accounts on bluesky. Can someone provide a local version of the site I can reproduce locally even if it's gigantic?
Reporter | ||
Comment 11•10 months ago
|
||
I can give you the invite code. Do I use your Bugzilla email to send it?
Assignee | ||
Comment 12•10 months ago
|
||
No worries. Timothy has already sent an invitation to me. Though as of now I am not going to use it since I don't want to have accounts on the site. I'd rather wait to local test cases. :)
That's said, it looks like https://github.com/bluesky-social/social-app is the source code of the bluesky web site? If so, it's worth trying?
Comment 14•10 months ago
|
||
If we can, I think we should try to fix this during this cycle.
Comment 15•10 months ago
|
||
Discussed during today's APZ meeting, marking as S2 based on user impact.
Comment 16•10 months ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
If we can, I think we should try to fix this during this cycle.
We are in the final week of beta for Fx122, do thing you'll have a fix in time or will this go with Fx123?
Comment 17•10 months ago
|
||
We should at least flip the pref to nightly-only (or disabled for now...)
Comment 18•10 months ago
|
||
oh, we shipped this in 121...
Comment 19•10 months ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #16)
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
If we can, I think we should try to fix this during this cycle.
We are in the final week of beta for Fx122, do thing you'll have a fix in time or will this go with Fx123?
Sorry, I'm not really in a position to answer this, I don't know the code affected by the regressing patch well enough.
Comment 20•10 months ago
|
||
Sorry, redirecting my NI to :botond - if there were any discussions on the next steps, etc. re: comment 15
We are near the end of the beta cycle. 122.0b9 builds on 2024-01-12. At this point, it might be too late?
Comment 21•10 months ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #20)
Sorry, redirecting my NI to :botond - if there were any discussions on the next steps, etc. re: comment 15
We are near the end of the beta cycle. 122.0b9 builds on 2024-01-12. At this point, it might be too late?
(We'll discuss again during today's meeting and I'll post any updates here.)
Updated•10 months ago
|
Comment 22•10 months ago
|
||
Discussed this today:
- Both Hiro and I have tried to reproduce this, with no success so far. We'll keep trying but for this reason an immediate fix is not likely to be forthcoming.
- With regards to the suggestion in comment 17 to flip the pref added in bug 1856088 (the regressing bug): it's a tough call, since flipping the pref will bring back the regression fixed in bug 1856088. We think that may be even more impactful as it potentially affects a more widely used site (Facebook) and a more common scrolling method (touchpad, vs. autoscroll in this bug). For this reason, we are leaning towards leaving the pref as is, even if that means shipping this regression in 122.
Reporter | ||
Comment 23•10 months ago
|
||
(In reply to Botond Ballo [:botond] from comment #22)
vs. autoscroll in this bug
Just to clarify, this bug is perfectly reproducible with regular scrolling, and is even more noticeable by disabling smooth scrolling (pretty sure it's not worsened, just easier to see). I used auto-scrolling in demonstration to speed up the reproduction. Sorry for causing the confusion with that.
Comment 24•10 months ago
|
||
I was able to reproduce on the test bluesky account using a touchpad on macos. The str I used were:
scroll down a large number of pages
scroll up using a fling that travels about 2 pages.
observe the page during the momentum phase. after doing this several times I can observe a jump in the page contents.
Assignee | ||
Comment 25•10 months ago
|
||
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Comment 26•10 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #25)
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Why do you think there are two different bugs? Where did you get the information that the pref flip fixes one of the bugs but not the other one?
Updated•10 months ago
|
Updated•10 months ago
|
Assignee | ||
Comment 27•10 months ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #26)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #25)
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Why do you think there are two different bugs? Where did you get the information that the pref flip fixes one of the bugs but not the other one?
Oop, I forgot mentioning the information source. It's comment 8.
Comment 28•10 months ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #24)
I was able to reproduce on the test bluesky account using a touchpad on macos. The str I used were:
scroll down a large number of pages
scroll up using a fling that travels about 2 pages.
observe the page during the momentum phase. after doing this several times I can observe a jump in the page contents.
I can reproduce this on Linux with these STR. The issue is fixed by setting layout.css.scroll-anchoring.absolute-update=false
.
Comment 29•10 months ago
|
||
Comment 30•10 months ago
|
||
Comment 31•10 months ago
|
||
I added the following code to log composited scroll offsets and their timestamps:
diff --git a/gfx/layers/apz/src/AsyncPanZoomController.cpp b/gfx/layers/apz/src/AsyncPanZoomController.cpp
--- a/gfx/layers/apz/src/AsyncPanZoomController.cpp
+++ b/gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ -4819,16 +4819,26 @@ bool AsyncPanZoomController::AdvanceAnim
// smooth. If an animation frame is requested, it is the compositor's
// responsibility to schedule a composite.
bool requestAnimationFrame = false;
nsTArray<RefPtr<Runnable>> deferredTasks;
{
RecursiveMutexAutoLock lock(mRecursiveMutex);
{ // scope lock
+ float yOffset = GetEffectiveScrollOffset(
+ AsyncTransformConsumer::eForCompositing, lock)
+ .y.value;
+ if (yOffset != 0) { // exclude things that aren't scrolling
+ printf_stderr("[SAMPLE] %f, %f\n",
+ (aSampleTime.Time() - TimeStamp::ProcessCreation())
+ .ToMilliseconds(),
+ yOffset);
+ }
+
CSSRect visibleRect = GetVisibleRect(lock);
MutexAutoLock lock2(mCheckerboardEventLock);
// Update RendertraceProperty before UpdateAnimation() call, since
// the UpdateAnimation() updates effective ScrollOffset for next frame
// if APZFrameDelay is enabled.
if (mCheckerboardEvent) {
mCheckerboardEvent->UpdateRendertraceProperty(
CheckerboardEvent::UserVisible, visibleRect);
and graphed the results while reproducing Timothy's STR (from comment 24) with both values of layout.css.scroll-anchoring.absolute-updates
. The jumps are fairly clear in the first graph.
Assignee | ||
Comment 32•10 months ago
|
||
This is just a guess though, in bug 1856088 we introduced ScrollUpdateType::MergeableAbsolute
and we merged the delta into both sampled offsets, but we should merge the delta only in the latest offset?
Comment 33•10 months ago
|
||
Note sure if Comment 32 was intended for :botond? Adding needinfo
Comment 34•10 months ago
|
||
For regression tracking purposes, I'm going to suggest that this be marked firefox122: wontfix
:
- Not backing out the regressing change (including by flipping the default value of the
layout.css.scroll-anchoring.absolute-update
pref`) is probably still the right call. While we now understand this issue is not specific to autoscroll but also affects e.g. touchpad scrolling, we're still talking about potentially regressing a similar issue on another popular site (Facebook) vs. keeping this regression on Bluesky. - Given the tricky nature of the code involved, uplifting a fix (that's not just a backout) to 122 would be risky at this stage. We will continue pursuing a fix and hopefully have one to uplift early in the 123 cycle.
(Regarding the change described in comment 32, I did try it locally and it doesn't seem to fix the issue. A deeper investigation is needed to come up with a working fix.)
Updated•10 months ago
|
Assignee | ||
Comment 35•10 months ago
|
||
Sasha, would you mind providing a performance profile result while the jump happens?
My hypothesis is that when a scroll anchor adjustment caused by some DOM node deletions/additions happened if reflowing the contents and building display list for the change takes some amount of time, say 50ms or a couple of VSync frames, in the meantime APZ can scroll further, then when APZ received the updated contents, APZ updates the scroll position where the adjustment tries to set since bug 1856088, it will have some gap between them, so it causes a small jump.
Reporter | ||
Comment 36•10 months ago
|
||
Like this? https://share.firefox.dev/3tYKLiO (scrolling back and forth causing jumps) / https://share.firefox.dev/3u07uLu (scrolling only down causing jumps)
Assignee | ||
Comment 37•10 months ago
|
||
Sasha, thanks for the profile results.
Both results have indeed some janky frames on the content's main-thread. Though it's hard to tell whether the frames happened just before the scroll jumps.
Anyways, I've implemented a mitigating way based on my hypothesis. Here is a try run with the way. You can download a Windows version of the binaries via the link of "target.zip" in the "Artifacts and Debugging Tools" pane when you clicked the "Windows 2012 opt B" symbol. (this is the direct link to the "target.zip", FWIW).
Sasha, would you mind trying the binary to see whether this bug is mitigated or not? Thanks!
Assignee | ||
Comment 38•10 months ago
|
||
Assignee | ||
Comment 39•10 months ago
|
||
Depends on D199313
Assignee | ||
Comment 40•10 months ago
|
||
I pushed the changes of the way for references.
I should note two interesting observations I experienced while I was trying it locally;
- I saw not-smooth scrolling while I was trying to see whether the change doesn't have any side-effects on bsky.app, it may be able to called "jump", but I saw it with enabling/disabling
layout.css.scroll-anchoring.absolute-update
, so I suppose it's something else - I suspected the mitigation regresses the wpt added in bug 1856088, but it doesn't
Reporter | ||
Comment 41•10 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #37)
Here is a try run with the way.
I've tried this build and unfortunately the scrolling issues are still there. Here's the screen recording (not sure why it got recorded with lower FPS, but you can still see it being jerky and getting in the way a couple of times, as well as rubber banding during auto-scrolling).
Comment 42•9 months ago
|
||
I've debugged this a bit locally and I believe I understand what's causing the jumps that I'm seeing (which are shown in the graph in comment 29). The explanation is simpler than I thought:
- Due to the asynchronicity between the web content main thread and APZ, the current scroll position seen by the main thread can be one or more frames behind the scroll position seen by APZ.
- (The APZ frame delay mitigates this such that in the case where it's no more than 1 frame behind, scroll linked effects should remain in sync. However, besides scroll linked effects, other aspects of correctness (such as not having scroll position jumps like the ones observed in this bug) should not depend on the main thread's ability to keep up with APZ.)
- Scroll anchoring happens on the main thread, and triggers a scroll position update calculated based on the main thread scroll position.
- By the time this update reaches APZ, if there's a scroll animation in progress that updates the scroll offset every frame (such as a fling), the scroll position in APZ will have advanced by one or more frames relative to the one the main thread based its computation on.
- Prior to bug 1856088, the scroll position update triggered by scroll anchoring was a relative one, so it got combined gracefully with the additional frames of scrolling that happen in APZ, i.e. the composited scroll offset reflected both the additional frames of the fling, and the scroll anchor adjustment.
- Since bug 1856088, the scroll position update triggered by scroll anchoring is an absolute one, so it effectively "overwrites" the additional frames of scrolling that have happened in APZ, creating the appearance of "jump" backwards relative to the direction of scrolling.
Based on this diagnosis, I'm fairly sure the site isn't doing anything wrong and this is just an APZ bug; I'm accordingly removing the [apz-needsdiagnosis] flag. I think a fix will need to be along the lines of finding a way to fix the scenario in bug 1856088 without using absolute updates for scroll anchoring.
Assignee | ||
Comment 43•9 months ago
|
||
Thanks Botond. I've started fixing bug 1856088 in a different way which is an alternative approach in bug 1856088 comment 2. That is "We might want to defer the clamp later?" in the comment. It means we mainly revert bug 1779404 and bug 1852818. There was a concern about the approach, but I can't recall it at all. So I'd take the approach here and "let's see what happens".
FWIW, here's a try run;https://treeherder.mozilla.org/jobs?repo=try&revision=14a6ab0edf0f3a14fd2af4491e683c8e80966db9
Assignee | ||
Comment 44•9 months ago
|
||
This change mainly reverts bug 1779404 and bug 1852818.
Updated•9 months ago
|
Assignee | ||
Comment 45•9 months ago
|
||
Two test cases attached in bug 1856088 now work with the previous commit and
work without MergeableAbsolute
.
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Comment 46•9 months ago
|
||
Comment 47•9 months ago
|
||
Backed out for causing Wr failures on window-scrollBy-display-change.html
Assignee | ||
Comment 48•9 months ago
|
||
The reason of the window-scrollBy-display-change.html failure is that it's the first paint case, and in D199840 we didn't clamp the scroll offset in the first paint case. I've updated the fixed version.
That's said, I realized that with D199840 and D199841 after-scrollable-range-shrinkage-004.html is flaky. I need to find out the flaky reason.
Assignee | ||
Comment 49•9 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #48)
The reason of the window-scrollBy-display-change.html failure is that it's the first paint case, and in D199840 we didn't clamp the scroll offset in the first paint case. I've updated the fixed version.
That's said, I realized that with D199840 and D199841 after-scrollable-range-shrinkage-004.html is flaky. I need to find out the flaky reason.
Okay the reason is bug 1845646. With replacing these InvalidateFrame stuff with RecvDidUnsuppressPainting
, I can no longer see failure locally. But still I have no idea why the original change for bug 1856088 could avoid the race.
Assignee | ||
Comment 50•9 months ago
|
||
That's unfortunate.
Because of bug 1845646, this scrollBy call (which is to scroll the anchor node initially) is omitted in APZ. This case also happens with the original change for bug 1856088, but it doesn't matter since the original change handles scroll position change by scroll anchoring as absolute position change.
I am going to add a workaround in the test just like we did in bug 1841305. And I will land it after the next soft freeze finished just in case.
Assignee | ||
Comment 51•9 months ago
|
||
I've started implementing bug 1845646 to fix the test failure since even with the workaround the test in question is quite flaky.
A good news is that fixing bug 1845646 wasn't so hard.
Assignee | ||
Comment 52•9 months ago
|
||
Oops, I forgot mentioning an important thing. Without fixing the race, the test, after-scrollable-range-shrinkage-004.html is stable only with the fix for bug 1856088.
In other words, we can't discard any relative scroll position updates to make the test stable. That means, even with the alternative approach Botond suggested in a review comment, the test is also flaky.
Comment 53•8 months ago
|
||
Comment 54•8 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1a5755ffb35b
https://hg.mozilla.org/mozilla-central/rev/8be226e74ca5
Assignee | ||
Comment 55•8 months ago
|
||
Finally! Thanks Botond for your dedicated reviews and advice, etc.!
Comment 56•8 months ago
|
||
The patch landed in nightly and beta is affected.
:hiro, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox124
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 57•8 months ago
|
||
This bug is important, but I'd bake it on nightly for a while.
Updated•8 months ago
|
Description
•