Scroll is jittery on sites with virtual scrolling
Categories
(Core :: Panning and Zooming, defect, P2)
Tracking
()
People
(Reporter: brawaru, Assigned: hiro)
References
(Regression)
Details
(Keywords: regression)
Attachments
(5 files, 2 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0
Steps to reproduce:
- Open a site that heavily relies on virtual scrolling like https://bsky.app (can also be reproduced to a lesser degree on Mastodon, https://discord.com/app (channels / messages)).
- Scroll down or up
Actual results:
Scroll is very jittery compared to what it used to be in 120.
Expected results:
Page scrolls as normal and no jitter is observed or it's caused by the implementation of virtual scrolling itself, not the browser.
| Reporter | ||
Comment 1•1 year ago
|
||
Notes:
- Firefox <=120.0.1 had a different bug that affected scrolling on Bluesky, Firefox 121 it made so severe that the site is no longer usable in Firefox. I personally had to upgrade to 120.0.1 and hope this can be fixed soon. This bug affects all sites (to some degree) that use virtual scrolling.
- The demonstration video was recorded in clear Firefox 121 profile, no settings were changed.
Comment 2•1 year ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Panning and Zooming' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•1 year ago
|
||
Thanks for the video it made reproducing much easier.
Comment 4•1 year ago
|
||
As expected, setting layout.css.scroll-anchoring.absolute-update=false fixes the bug.
Comment 5•1 year ago
|
||
(In reply to Sasha Sorokin [:brawaru] from comment #1)
- Firefox <=120.0.1 had a different bug that affected scrolling on Bluesky,
There are a lot of comments there so it wasn't immediately clear to me, does this older bug still exist (say in 120)? If so, are the steps to reproduce? Thanks.
Comment 6•1 year ago
|
||
Set release status flags based on info from the regressing bug 1856088
| Reporter | ||
Comment 7•1 year ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #5)
does this older bug still exist (say in 120)? If so, are the steps to reproduce? Thanks.
I think it's a different bug, because it affected only Bluesky, and all other sites seemed to be alright. The reproduction steps are similar — you just scroll the feed down, eventually your scroll seems to jump randomly (mostly scroll you back a bit, as if you had a broken mouse wheel), it also may stick to the very bottom of the page, continuing loading of the feed (see video in my comment on GitHub).
And yes, I just reproduced this in 120.0.1. Important observation is that turning off smooth scrolling seems to make it occur less. It also happens slightly less often now after Bluesky removed the sliding feeds bar at the top (which was replaced with the fixed tabs bar for the current feed).
I personally believe this one may actually be Bluesky's own bug, but they rejected it on the merits of non-reproduction in Chromium, which only had the feed blinking (you can also see me trying to reproduce this in Chromium via above link). This is why I filled it in Webcompat rather than immediately heading to this tracker.
| Reporter | ||
Comment 8•1 year ago
|
||
Updating and disabling layout.css.scroll-anchoring.absolute-update fixed the jitter, but not the the bug reported on Webcompat: there are still weird scrollbacks as described above.
Comment 9•1 year ago
|
||
:hiro, since you are the author of the regressor, bug 1856088, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 10•1 year ago
|
||
I can't since I don't have accounts on bluesky. Can someone provide a local version of the site I can reproduce locally even if it's gigantic?
| Reporter | ||
Comment 11•1 year ago
|
||
I can give you the invite code. Do I use your Bugzilla email to send it?
| Assignee | ||
Comment 12•1 year ago
|
||
No worries. Timothy has already sent an invitation to me. Though as of now I am not going to use it since I don't want to have accounts on the site. I'd rather wait to local test cases. :)
That's said, it looks like https://github.com/bluesky-social/social-app is the source code of the bluesky web site? If so, it's worth trying?
Comment 14•1 year ago
|
||
If we can, I think we should try to fix this during this cycle.
Comment 15•1 year ago
|
||
Discussed during today's APZ meeting, marking as S2 based on user impact.
Comment 16•1 year ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
If we can, I think we should try to fix this during this cycle.
We are in the final week of beta for Fx122, do thing you'll have a fix in time or will this go with Fx123?
Comment 17•1 year ago
|
||
We should at least flip the pref to nightly-only (or disabled for now...)
Comment 18•1 year ago
|
||
oh, we shipped this in 121...
Comment 19•1 year ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #16)
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
If we can, I think we should try to fix this during this cycle.
We are in the final week of beta for Fx122, do thing you'll have a fix in time or will this go with Fx123?
Sorry, I'm not really in a position to answer this, I don't know the code affected by the regressing patch well enough.
Comment 20•1 year ago
|
||
Sorry, redirecting my NI to :botond - if there were any discussions on the next steps, etc. re: comment 15
We are near the end of the beta cycle. 122.0b9 builds on 2024-01-12. At this point, it might be too late?
Comment 21•1 year ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #20)
Sorry, redirecting my NI to :botond - if there were any discussions on the next steps, etc. re: comment 15
We are near the end of the beta cycle. 122.0b9 builds on 2024-01-12. At this point, it might be too late?
(We'll discuss again during today's meeting and I'll post any updates here.)
Updated•1 year ago
|
Comment 22•1 year ago
|
||
Discussed this today:
- Both Hiro and I have tried to reproduce this, with no success so far. We'll keep trying but for this reason an immediate fix is not likely to be forthcoming.
- With regards to the suggestion in comment 17 to flip the pref added in bug 1856088 (the regressing bug): it's a tough call, since flipping the pref will bring back the regression fixed in bug 1856088. We think that may be even more impactful as it potentially affects a more widely used site (Facebook) and a more common scrolling method (touchpad, vs. autoscroll in this bug). For this reason, we are leaning towards leaving the pref as is, even if that means shipping this regression in 122.
| Reporter | ||
Comment 23•1 year ago
|
||
(In reply to Botond Ballo [:botond] from comment #22)
vs. autoscroll in this bug
Just to clarify, this bug is perfectly reproducible with regular scrolling, and is even more noticeable by disabling smooth scrolling (pretty sure it's not worsened, just easier to see). I used auto-scrolling in demonstration to speed up the reproduction. Sorry for causing the confusion with that.
Comment 24•1 year ago
|
||
I was able to reproduce on the test bluesky account using a touchpad on macos. The str I used were:
scroll down a large number of pages
scroll up using a fling that travels about 2 pages.
observe the page during the momentum phase. after doing this several times I can observe a jump in the page contents.
| Assignee | ||
Comment 25•1 year ago
|
||
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Comment 26•1 year ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #25)
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Why do you think there are two different bugs? Where did you get the information that the pref flip fixes one of the bugs but not the other one?
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 27•1 year ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #26)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #25)
Looks like there are two different bugs? Jittery scrolling and jumpy scrolling? Given that flipping layout.css.scroll-anchoring.absolute-update pref fixes jittery one, but not jumpy one, it would be nice to open a new bug for the jumpy one.
Why do you think there are two different bugs? Where did you get the information that the pref flip fixes one of the bugs but not the other one?
Oop, I forgot mentioning the information source. It's comment 8.
Comment 28•1 year ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #24)
I was able to reproduce on the test bluesky account using a touchpad on macos. The str I used were:
scroll down a large number of pages
scroll up using a fling that travels about 2 pages.
observe the page during the momentum phase. after doing this several times I can observe a jump in the page contents.
I can reproduce this on Linux with these STR. The issue is fixed by setting layout.css.scroll-anchoring.absolute-update=false.
Comment 29•1 year ago
|
||
Comment 30•1 year ago
|
||
Comment 31•1 year ago
|
||
I added the following code to log composited scroll offsets and their timestamps:
diff --git a/gfx/layers/apz/src/AsyncPanZoomController.cpp b/gfx/layers/apz/src/AsyncPanZoomController.cpp
--- a/gfx/layers/apz/src/AsyncPanZoomController.cpp
+++ b/gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ -4819,16 +4819,26 @@ bool AsyncPanZoomController::AdvanceAnim
// smooth. If an animation frame is requested, it is the compositor's
// responsibility to schedule a composite.
bool requestAnimationFrame = false;
nsTArray<RefPtr<Runnable>> deferredTasks;
{
RecursiveMutexAutoLock lock(mRecursiveMutex);
{ // scope lock
+ float yOffset = GetEffectiveScrollOffset(
+ AsyncTransformConsumer::eForCompositing, lock)
+ .y.value;
+ if (yOffset != 0) { // exclude things that aren't scrolling
+ printf_stderr("[SAMPLE] %f, %f\n",
+ (aSampleTime.Time() - TimeStamp::ProcessCreation())
+ .ToMilliseconds(),
+ yOffset);
+ }
+
CSSRect visibleRect = GetVisibleRect(lock);
MutexAutoLock lock2(mCheckerboardEventLock);
// Update RendertraceProperty before UpdateAnimation() call, since
// the UpdateAnimation() updates effective ScrollOffset for next frame
// if APZFrameDelay is enabled.
if (mCheckerboardEvent) {
mCheckerboardEvent->UpdateRendertraceProperty(
CheckerboardEvent::UserVisible, visibleRect);
and graphed the results while reproducing Timothy's STR (from comment 24) with both values of layout.css.scroll-anchoring.absolute-updates. The jumps are fairly clear in the first graph.
| Assignee | ||
Comment 32•1 year ago
|
||
This is just a guess though, in bug 1856088 we introduced ScrollUpdateType::MergeableAbsolute and we merged the delta into both sampled offsets, but we should merge the delta only in the latest offset?
Comment 33•1 year ago
|
||
Note sure if Comment 32 was intended for :botond? Adding needinfo
Comment 34•1 year ago
|
||
For regression tracking purposes, I'm going to suggest that this be marked firefox122: wontfix:
- Not backing out the regressing change (including by flipping the default value of the
layout.css.scroll-anchoring.absolute-updatepref`) is probably still the right call. While we now understand this issue is not specific to autoscroll but also affects e.g. touchpad scrolling, we're still talking about potentially regressing a similar issue on another popular site (Facebook) vs. keeping this regression on Bluesky. - Given the tricky nature of the code involved, uplifting a fix (that's not just a backout) to 122 would be risky at this stage. We will continue pursuing a fix and hopefully have one to uplift early in the 123 cycle.
(Regarding the change described in comment 32, I did try it locally and it doesn't seem to fix the issue. A deeper investigation is needed to come up with a working fix.)
Updated•1 year ago
|
| Assignee | ||
Comment 35•1 year ago
|
||
Sasha, would you mind providing a performance profile result while the jump happens?
My hypothesis is that when a scroll anchor adjustment caused by some DOM node deletions/additions happened if reflowing the contents and building display list for the change takes some amount of time, say 50ms or a couple of VSync frames, in the meantime APZ can scroll further, then when APZ received the updated contents, APZ updates the scroll position where the adjustment tries to set since bug 1856088, it will have some gap between them, so it causes a small jump.
| Reporter | ||
Comment 36•1 year ago
|
||
Like this? https://share.firefox.dev/3tYKLiO (scrolling back and forth causing jumps) / https://share.firefox.dev/3u07uLu (scrolling only down causing jumps)
| Assignee | ||
Comment 37•1 year ago
|
||
Sasha, thanks for the profile results.
Both results have indeed some janky frames on the content's main-thread. Though it's hard to tell whether the frames happened just before the scroll jumps.
Anyways, I've implemented a mitigating way based on my hypothesis. Here is a try run with the way. You can download a Windows version of the binaries via the link of "target.zip" in the "Artifacts and Debugging Tools" pane when you clicked the "Windows 2012 opt B" symbol. (this is the direct link to the "target.zip", FWIW).
Sasha, would you mind trying the binary to see whether this bug is mitigated or not? Thanks!
| Assignee | ||
Comment 38•1 year ago
|
||
| Assignee | ||
Comment 39•1 year ago
|
||
Depends on D199313
| Assignee | ||
Comment 40•1 year ago
|
||
I pushed the changes of the way for references.
I should note two interesting observations I experienced while I was trying it locally;
- I saw not-smooth scrolling while I was trying to see whether the change doesn't have any side-effects on bsky.app, it may be able to called "jump", but I saw it with enabling/disabling
layout.css.scroll-anchoring.absolute-update, so I suppose it's something else - I suspected the mitigation regresses the wpt added in bug 1856088, but it doesn't
| Reporter | ||
Comment 41•1 year ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #37)
Here is a try run with the way.
I've tried this build and unfortunately the scrolling issues are still there. Here's the screen recording (not sure why it got recorded with lower FPS, but you can still see it being jerky and getting in the way a couple of times, as well as rubber banding during auto-scrolling).
Comment 42•1 year ago
|
||
I've debugged this a bit locally and I believe I understand what's causing the jumps that I'm seeing (which are shown in the graph in comment 29). The explanation is simpler than I thought:
- Due to the asynchronicity between the web content main thread and APZ, the current scroll position seen by the main thread can be one or more frames behind the scroll position seen by APZ.
- (The APZ frame delay mitigates this such that in the case where it's no more than 1 frame behind, scroll linked effects should remain in sync. However, besides scroll linked effects, other aspects of correctness (such as not having scroll position jumps like the ones observed in this bug) should not depend on the main thread's ability to keep up with APZ.)
- Scroll anchoring happens on the main thread, and triggers a scroll position update calculated based on the main thread scroll position.
- By the time this update reaches APZ, if there's a scroll animation in progress that updates the scroll offset every frame (such as a fling), the scroll position in APZ will have advanced by one or more frames relative to the one the main thread based its computation on.
- Prior to bug 1856088, the scroll position update triggered by scroll anchoring was a relative one, so it got combined gracefully with the additional frames of scrolling that happen in APZ, i.e. the composited scroll offset reflected both the additional frames of the fling, and the scroll anchor adjustment.
- Since bug 1856088, the scroll position update triggered by scroll anchoring is an absolute one, so it effectively "overwrites" the additional frames of scrolling that have happened in APZ, creating the appearance of "jump" backwards relative to the direction of scrolling.
Based on this diagnosis, I'm fairly sure the site isn't doing anything wrong and this is just an APZ bug; I'm accordingly removing the [apz-needsdiagnosis] flag. I think a fix will need to be along the lines of finding a way to fix the scenario in bug 1856088 without using absolute updates for scroll anchoring.
| Assignee | ||
Comment 43•1 year ago
|
||
Thanks Botond. I've started fixing bug 1856088 in a different way which is an alternative approach in bug 1856088 comment 2. That is "We might want to defer the clamp later?" in the comment. It means we mainly revert bug 1779404 and bug 1852818. There was a concern about the approach, but I can't recall it at all. So I'd take the approach here and "let's see what happens".
FWIW, here's a try run;https://treeherder.mozilla.org/jobs?repo=try&revision=14a6ab0edf0f3a14fd2af4491e683c8e80966db9
| Assignee | ||
Comment 44•1 year ago
|
||
This change mainly reverts bug 1779404 and bug 1852818.
Updated•1 year ago
|
| Assignee | ||
Comment 45•1 year ago
|
||
Two test cases attached in bug 1856088 now work with the previous commit and
work without MergeableAbsolute.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 46•1 year ago
|
||
Comment 47•1 year ago
|
||
Backed out for causing Wr failures on window-scrollBy-display-change.html
| Assignee | ||
Comment 48•1 year ago
|
||
The reason of the window-scrollBy-display-change.html failure is that it's the first paint case, and in D199840 we didn't clamp the scroll offset in the first paint case. I've updated the fixed version.
That's said, I realized that with D199840 and D199841 after-scrollable-range-shrinkage-004.html is flaky. I need to find out the flaky reason.
| Assignee | ||
Comment 49•1 year ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #48)
The reason of the window-scrollBy-display-change.html failure is that it's the first paint case, and in D199840 we didn't clamp the scroll offset in the first paint case. I've updated the fixed version.
That's said, I realized that with D199840 and D199841 after-scrollable-range-shrinkage-004.html is flaky. I need to find out the flaky reason.
Okay the reason is bug 1845646. With replacing these InvalidateFrame stuff with RecvDidUnsuppressPainting, I can no longer see failure locally. But still I have no idea why the original change for bug 1856088 could avoid the race.
| Assignee | ||
Comment 50•1 year ago
|
||
That's unfortunate.
Because of bug 1845646, this scrollBy call (which is to scroll the anchor node initially) is omitted in APZ. This case also happens with the original change for bug 1856088, but it doesn't matter since the original change handles scroll position change by scroll anchoring as absolute position change.
I am going to add a workaround in the test just like we did in bug 1841305. And I will land it after the next soft freeze finished just in case.
| Assignee | ||
Comment 51•1 year ago
|
||
I've started implementing bug 1845646 to fix the test failure since even with the workaround the test in question is quite flaky.
A good news is that fixing bug 1845646 wasn't so hard.
| Assignee | ||
Comment 52•1 year ago
|
||
Oops, I forgot mentioning an important thing. Without fixing the race, the test, after-scrollable-range-shrinkage-004.html is stable only with the fix for bug 1856088.
In other words, we can't discard any relative scroll position updates to make the test stable. That means, even with the alternative approach Botond suggested in a review comment, the test is also flaky.
Comment 53•1 year ago
|
||
Comment 54•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/1a5755ffb35b
https://hg.mozilla.org/mozilla-central/rev/8be226e74ca5
| Assignee | ||
Comment 55•1 year ago
|
||
Finally! Thanks Botond for your dedicated reviews and advice, etc.!
Comment 56•1 year ago
|
||
The patch landed in nightly and beta is affected.
:hiro, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox124towontfix.
For more information, please visit BugBot documentation.
| Assignee | ||
Comment 57•1 year ago
|
||
This bug is important, but I'd bake it on nightly for a while.
Updated•1 year ago
|
Description
•