(un)smooth scroll bug
Categories
(Core :: Panning and Zooming, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox98 | --- | verified |
People
(Reporter: 1.1.1998, Assigned: hiro)
References
(Blocks 3 open bugs, Regressed 1 open bug)
Details
Attachments
(11 files, 5 obsolete files)
608 bytes,
text/html
|
Details | |
2.05 KB,
patch
|
Details | Diff | Splinter Review | |
11.86 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.143 YaBrowser/19.7.2.455 Yowser/2.5 Safari/537.36
Steps to reproduce:
i scroll content on page like(attached bug.html)
Actual results:
element with position absolute jumping on new position.
Expected results:
element with position absolute moving synchronously with scroll without jumping.
Comment 1•6 years ago
|
||
I also see flickering (though less) on other browsers. In general I don't think you're guaranteed getting async scroll events before paint, though Botond can confirm.
Why not using position: sticky for this?
Comment 2•6 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
In general I don't think you're guaranteed getting async scroll events before paint, though Botond can confirm.
That's correct.
The page is using what we call a scroll-linked effect - please see this MDN page for more info about those.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
I also see flickering (though less) on other browsers. In general I don't think you're guaranteed getting async scroll events before paint, though Botond can confirm.
Why not using position: sticky for this?
1)In Chromium browsers i don't see it(maybe my eyes liying me))
2)this example enough for demonstration bug and for my task need compatibility with old browsers(for now) sticky don't work everywhere
Comment 4•6 years ago
|
||
(In reply to Long from comment #3)
1)In Chromium browsers i don't see it(maybe my eyes liying me))
I definitely see some flickering in Chrome when I move back and forth close to the top.
2)this example enough for demonstration bug and for my task need compatibility with old browsers(for now) sticky don't work everywhere
There's various ways to support older browsers while using (faster, more correct, and power-efficent) scrolling on newer browsers.
An example, if you change .abs
to be position: sticky
, you can use the following as a fallback:
<script>
window.addEventListener("load", function() {
if (window.CSS && window.CSS.supports && window.CSS.supports("position: sticky"))
return; //Nothing to do, position: sticky does it for us.
let abs = document.querySelector(".abs");
abs.style.position = "absolute";
abs.parentNode.addEventListener("scroll", function() {
abs.style.top = this.scrollTop + "px";
}, false);
}, false);
</script>
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
(In reply to Long from comment #3)
1)In Chromium browsers i don't see it(maybe my eyes liying me))
I definitely see some flickering in Chrome when I move back and forth close to the top.
I tested in Chrome version 76. Maybe my GPU or CPU helps paint right.
2)this example enough for demonstration bug and for my task need compatibility with old browsers(for now) sticky don't work everywhere
There's various ways to support older browsers while using (faster, more correct, and power-efficent) scrolling on newer browsers.
An example, if you change
.abs
to beposition: sticky
, you can use the following as a fallback:<script> window.addEventListener("load", function() { if (window.CSS && window.CSS.supports && window.CSS.supports("position: sticky")) return; //Nothing to do, position: sticky does it for us. let abs = document.querySelector(".abs"); abs.style.position = "absolute"; abs.parentNode.addEventListener("scroll", function() { abs.style.top = this.scrollTop + "px"; }, false); }, false); </script>
Task consists in fixing position <thead> in <table>.
(In reply to Botond Ballo [:botond] from comment #2)
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
In general I don't think you're guaranteed getting async scroll events before paint, though Botond can confirm.
That's correct.
The page is using what we call a scroll-linked effect - please see this MDN page for more info about those.
Thanks for article.
Comment 7•6 years ago
|
||
You can test for sticky support of a <thead>
in a <table>
, though I agree it's more elaborate (you need to create a scrollable table with a sticky header, then read out the position of the header).
I'm not sure whether we can do better at this... I suspect the fact that with devtools open this flickers way less means that paint-skipping may be involved somehow?
Botond (sorry for the continuous ni? storm), in this case paint-skipping seems to be doing wrong, isn't it? Or maybe I'm completely off and there's another reason for this.
Comment 8•6 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #7)
I suspect the fact that with devtools open this flickers way less means that paint-skipping may be involved somehow?
Paint skipping shouldn't make a difference here, because while we may not schedule a paint in response to the scroll itself, we still should in response to the style.top
change.
In principle, with the APZ frame delay in place, scroll-linked effects should remain in sync if we can paint the page within the frame budget.
The next step here would be to investigate whether, on this page, we (a) can't paint the page within the frame budget, or (b) there's something else going on that's causing the effect to be out of sync in spite of the frame delay and being able to paint within budget.
Bug 1367770 tracks investigating cases like this, and the page here makes a good test case.
Comment 9•6 years ago
|
||
The priority flag is not set for this bug.
:TYLin, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•6 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
I did track down this issue (since it's somewhat related to bug 1692708), as far as I can tell the one-frame delay isn't working as expected, in fact it's two-frames delay. That's because we do invoke SampleCompositedAsyncTransform before sampling an animation, whereas a RepaintRequest will be based on the metric after the animation sampling, thus scrollTop value on the main-thread will be actually an one-frame ahead value.
For example, given that there's a queued sampled offset (0, 100) after this AdvanceToNextSample call,
- SampleCompositedAsyncTransform adds a new offset, say (0, 110)
- sampling an animation produces the latest scroll offset (i.e. the APZC's metric), say (0, 120)
- APZCTreeManager::SampleForWebRender uses the (0, 100) offset for the composition
in the meantime the (0, 120) offset is delivered to the main-thread and used there and "updated position:absolute position" is reflected to the compositor - in the next APZCTreeManager::SampleForWebRender the (0, 110) is used but the position:absolute element's position is based on the (0, 120)
That's what I've observed. Moving the SampleCompositedAsyncTransform call after sampling an animation fixes this issue at least on my local Linux box. Though as of now I am not sure it's a right thing to do.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 11•3 years ago
|
||
This is what I commented in comment 10. With this change, all test cases blocking bug 1367770 other than bug 1665900 case look better. I don't know why bug 1665900 looks worse. My wild guess is that painting consumes over 16ms in the case, but not sure yet. I'd need some debugging metrics to tell what's going on there easier.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 12•3 years ago
|
||
I managed to add a (ugly) hack to dump scroll-linked-effects element positions and WebRender's scroll frame metrics. This hack is not perfect since the target scroll frame is not the linked one, it's (probably I guess) ASR.
Here is a typical two frames dump dumped by using a modified test case in comment 0.
--- start --- SystemTime { tv_sec: 1637134429, tv_nsec: 522116809 }
scroll linked effects: Some(SpatialNodeIndex(9))
element pos: Box2D((0.0, 4176.0), (1114.0, 4226.0))
viewport: Box2D((0.0, 0.0), (1114.4, 884.1))
scrollable_size: 0.0x7596.9995
scroll offset: (0.0, -4156.317)
external_scroll_offset: (0.0, 4176.0)
--- end --- SystemTime { tv_sec: 1637134429, tv_nsec: 522116809 }
--- start --- SystemTime { tv_sec: 1637134429, tv_nsec: 538789320 }
scroll linked effects: Some(SpatialNodeIndex(9))
element pos: Box2D((0.0, 4176.0), (1114.0, 4226.0))
viewport: Box2D((0.0, 0.0), (1114.4, 884.1))
scrollable_size: 0.0x7596.9995
scroll offset: (0.0, -4176.0)
external_scroll_offset: (0.0, 4176.0)
--- end --- SystemTime { tv_sec: 1637134429, tv_nsec: 538789320 }
element pos
is representing the abs pos position in the frame, scroll offset
is a value changed by APZ. And external_scroll_offset
is a value set by display list, it's kinda scroll offset at the time when NotifyLayersUpdated gets called.
At the first frame, the abs-pos element position is slightly off to down from the scroll top position at that moment. At the second frame its position is same as the scroll top.
With the patch I posted in comment 11, the issue just like at the first frame will be mostly eliminated, but I think it is not a fundamental fix.
A problem I can see by the dump is the element pos
and external_scroll_offset
are basically based on the last metrics when NotifyLayersUpdated gets called (from the perspective of APZ) whereas the scroll offset
is a value sampled before.
Though I haven't figured out what a proper solution here is, we should fix this inconsistency (along with the patch in comment 11.
Assignee | ||
Comment 13•3 years ago
|
||
What I am currently thinking is we will have to have queued SampledAPZState in WebRender (ScrollFrame I assume) as well as AsyncPanZoomController does. And we will have a value representing generation in the SampledAPZState, and we need to pick a proper SampledAPZState in the webrender renderer thread to find the same generation sampled result. Is it worth a shot?
Comment 14•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #13)
What I am currently thinking is we will have to have queued SampledAPZState in WebRender (ScrollFrame I assume) as well as AsyncPanZoomController does. And we will have a value representing generation in the SampledAPZState, and we need to pick a proper SampledAPZState in the webrender renderer thread to find the same generation sampled result. Is it worth a shot?
That approach sounds like it should fix the problem. We would want to have some kind of limit in place, such that if the generation from the main thread is "too old", we give up trying to stay in sync and just use a newer APZ offset (otherwise, if the main thread takes a long time to paint we are back to seeming unresponsive / getting "sync" scrolling).
That said, I'm a bit curious what makes this approach necessary. I think the original idea behind the frame delay was that if the main thread consistently paints things within the frame budget, then the display lists and corresponding samples (that would have the same generation with this approach) would just naturally align. I'm curious why this doesn't happen -- does a mis-alignment get introduced because sometimes one of the steps involved takes longer than one frame? Or sometimes a step happens twice in one frame?
Comment 15•3 years ago
|
||
Basically, I'm wondering if the described fix might be papering over a bug introduced in bug 1630781 (or not properly fixed by bug 1630781), which may cause other symptoms, and which might have a less invasive fix.
Assignee | ||
Comment 16•3 years ago
•
|
||
(In reply to Botond Ballo [:botond] from comment #14)
That said, I'm a bit curious what makes this approach necessary. I think the original idea behind the frame delay was that if the main thread consistently paints things within the frame budget, then the display lists and corresponding samples (that would have the same generation with this approach) would just naturally align. I'm curious why this doesn't happen -- does a mis-alignment get introduced because sometimes one of the steps involved takes longer than one frame? Or sometimes a step happens twice in one frame?
No, what happens is that an opposed case. A round trip scroll offset (initiated by an APZC and reported to the main-thread, and the offset has been received in the WebRender's scroll offset), happens within a Vsync tick. (Vsync tick isn't precisely describing the situation because there are a couple of different threads running).
For example, an APZC has two sampled offsets, say 10px and 20px, and the APZC notify the 20px in a RepaintRequest to the main-thread, then the main-thread gets the 20px and update the scroll position in question and an abs-pos element uses 20px, then an display item having 20px is notified to the WebRender, then WebRender renders that the scroller's position 10px, but the abs-pos position is 20px.
Comment 17•3 years ago
|
||
Ah, interesting, so it's kind of "two steps in one frame (vsync)". (I wonder if maybe that wasn't the case yet when bug 1630781 was fixed, and then was introduced by a subsequent performance optimization or someting like that.)
Anyways, if I'm understanding correctly that it's actually the main thread position that's ahead of the (composited) async scroll position, then this sounds like pretty good news, in that fixing this will not only put scroll-linked effects into sync, but also increase the responsiveness of async scrolling in general!
Assignee | ||
Comment 18•3 years ago
|
||
FWIW here is a try run with the recent changes I did locally;
https://treeherder.mozilla.org/jobs?repo=try&revision=8ca74557303118556ad153562722aaeaf0a8e81b
There remain a couple of relatively big issues;
- test_wheel_scroll.html fails due to some hit testing issues caused by the changes
- no automated tests has been made
- scroll thumb positions (and fixed/sticky positioned elements on mobile) might be slightly mis-positioned because I haven't changed those transforms, e.g. this transform for scroll thumbs.
I was naively thinking we don't need to change eForHitTesting cases, but given that there's at least one hit testing issue 1), we might need to. I'd think 3) can be deferred in a follow up.
Assignee | ||
Comment 19•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #18)
- test_wheel_scroll.html fails due to some hit testing issues caused by the changes
I did address this test failure locally with a very ugly hack. To be honest, I don't understand how the failure was caused exactly, with the original changes WebRender hit tester sometimes fails to return eApzAwareListeners for some reasons. Anyway I found what makes the test failure. In the current m-c SpatialNode::set_scroll_origin has a calculation like this;
let new_offset = normalized_offset - scrolling.external_scroll_offset;
SpatialNode::set_scroll_origin
is triggered by an APZ sampling and it gets called in a frame transaction (in WebRender's term I think). normalized_offset
is a scroll offset caused by an APZC, scrolling.external_scroll_offset
is a scroll offset came from a display item on the main-thread. My original change deferred this calculation as much as possible, I mean, the calculation is done every time we want the result, e.g. here. I am still believing deferring the calculation as mush as possible is a right thing to do ideally, but given that it introduce an inconsistency between hit tester, I am not going to defer the calculation as much as possible for now, I am going to do the calculation in the frame transaction as what the current code does. The test case in this bug still works with the way.
Also note that I realized the my changes doesn't fix cases where any APZ animation is not involved, e.g. scroll thumb dragging, I hope those cases can be relatively easily fixed by bumping up the scroll generation I will introduce in this bug.
Assignee | ||
Comment 20•3 years ago
|
||
We are going to use it in various places, APZC, SampledAPZState etc.
Assignee | ||
Comment 21•3 years ago
|
||
Each APZC will have a ScrollGeneratioin and will use this 0th generation as a
special case where there's no active animation in APZC.
Depends on D133437
Assignee | ||
Comment 22•3 years ago
|
||
Depends on D133438
Assignee | ||
Comment 23•3 years ago
|
||
Depends on D133439
Assignee | ||
Comment 24•3 years ago
|
||
In a subsequent change, we'd like to use this getting async transform
calculation for each SampledAPZCState.
Depends on D133440
Assignee | ||
Comment 25•3 years ago
|
||
Depends on D133441
Assignee | ||
Comment 26•3 years ago
|
||
It's not used for Gecko at all. And in the next commit we are going to have
multiple scroll offsets in a ScrollFrameInfo and encapsulate
ScrollFrameInfo.scroll_offset method so with this unused scroll function,
the next change will be quite messy.
Depends on D133442
Assignee | ||
Comment 27•3 years ago
|
||
This change mitigates the gap between the external_scroll_offset informed from
the main-thread and scroll_offset informed from APZ.
The change inside GetAsyncScrollDeltaForSampling (which is renamed to
GetSampledScrollOffsets in this change) is basically doing what
GetCurrentAsyncTransformWithOverscroll does but for each SampledAPZCState,
i.e. getting the async transform and applying overscroll transform if it's
necessary. Unfortunately I don't have any great idea to generalize them.
GetCurrentAsyncTransformWithOverscroll will be limited to eHitTesting in
a later commit.
Some wrench reftests for this change are in the next commit.
Depends on D133443
Assignee | ||
Comment 28•3 years ago
|
||
Depends on D133444
Assignee | ||
Comment 29•3 years ago
|
||
Now the function gets caled only for eHitTesting.
Depends on D133445
Assignee | ||
Comment 30•3 years ago
|
||
Now I realized that the way will introduce jitter scrolling in some cases.
So for example, given that there were two sampled scroll offsets;
- 10px with generation: 1
- 20px with generation: 2
and if on the main-thread we had used the 20px offset with generation 2, then rendered it. And in the next sampling we have
- 20px with generation 2
- 30px with generation 3
in the meantime if the main-thread is busy for some reasons, then in the next rendering we do re-use 20px offset again. Though in the next rendering we will use 30px with generation 3 or the next sampled offset depending on the main-thread work.
A way to mitigate is checking scroll-linked effects (we will have to check scroll-linked animations too) and if there's any scroll-linked effect, we use this new machinery, and if there's no such thing, we don't use this machinery.
Botond, what do you think? I am quite unsure how often this situation happens in the wild.
Assignee | ||
Comment 31•3 years ago
|
||
I am going to add a pref to be able to flip the behavior at least.
Comment 32•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #14)
We would want to have some kind of limit in place, such that if the generation from the main thread is "too old", we give up trying to stay in sync and just use a newer APZ offset (otherwise, if the main thread takes a long time to paint we are back to seeming unresponsive / getting "sync" scrolling).
Now I understand this part: the limit is naturally in place due to this code which ensures we don't retain too many samples (never more than 2, I think).
(In reply to Hiroyuki Ikezoe (:hiro) from comment #30)
Now I realized that the way will introduce jitter scrolling in some cases.
So for example, given that there were two sampled scroll offsets;
- 10px with generation: 1
- 20px with generation: 2
and if on the main-thread we had used the 20px offset with generation 2, then rendered it. And in the next sampling we have
- 20px with generation 2
- 30px with generation 3
in the meantime if the main-thread is busy for some reasons, then in the next rendering we do re-use 20px offset again. Though in the next rendering we will use 30px with generation 3 or the next sampled offset depending on the main-thread work.
Yeah, good catch. I guess, if some paints complete before the end of the current vsync interval and others not until the next vsync interval, we will in fact get some jitter.
It wouldn't surprise me if this happens somewhat often.
A way to mitigate is checking scroll-linked effects (we will have to check scroll-linked animations too) and if there's any scroll-linked effect, we use this new machinery, and if there's no such thing, we don't use this machinery.
I think this is a promising idea. In the case we have a scroll-linked effect, the jitter is probably preferable to having the effect be out of sync. But in the case we don't have a scroll-linked effect, it would be a regression compared to the current consistent one-frame delay.
In fact, if we do this, we could consider doing something else: using the most recent sample (offsets.last()
instead of offsets.first()
) in the case where we have no scroll-linked effect. That would effectively get rid of the one-frame delay altogether (which has been requested by users, for example in this bug) if there are no scroll-linked effects.
One caveat here is that our detection logic for "are there any scroll-linked effects on this page?" is incomplete (see bug 1276361). If we are going with this route, we may want to take the opportunity to improve that detection logic as well.
An alternative solution to the jitter that we could consider, is to keep a queue of main-thread samples as well, and if the latest main-thread sample is very fresh (latest vsync), use a sample corresponding to the previous vsync (and the APZ sample with the corresponding generation) instead. This would mean the one-frame delay will always be there, but the jitter would be avoided (even with scroll-linked effects).
Assignee | ||
Comment 33•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #32)
An alternative solution to the jitter that we could consider, is to keep a queue of main-thread samples as well, and if the latest main-thread sample is very fresh (latest vsync), use a sample corresponding to the previous vsync (and the APZ sample with the corresponding generation) instead. This would mean the one-frame delay will always be there, but the jitter would be avoided (even with scroll-linked effects).
Thanks for the alternative solution. It took time to understand what it means. As I understand it, it can be done by without having the queue of the main-thread since we can tell the freshness by just comparing the last APZ sampled generation with the main-thread generation. And indeed, it fixes this bug pretty well as I've confirmed it locally. Hooray! I am going to drop the pref I added to flip this machinery.
Thank you!
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 34•3 years ago
|
||
Assignee | ||
Comment 35•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #32)
An alternative solution to the jitter that we could consider, is to keep a queue of main-thread samples as well, and if the latest main-thread sample is very fresh (latest vsync), use a sample corresponding to the previous vsync (and the APZ sample with the corresponding generation) instead. This would mean the one-frame delay will always be there, but the jitter would be avoided (even with scroll-linked effects).
Thanks for the alternative solution. It took time to understand what it means. As I understand it, it can be done by without having the queue of the main-thread since we can tell the freshness by just comparing the last APZ sampled generation with the main-thread generation. And indeed, it fixes this bug pretty well as I've confirmed it locally. Hooray! I am going to drop the pref I added to flip this machinery.
As Botond noticed me in a review comment, I did misunderstand it.
In fact we can tell "if the latest main-thread sample is very fresh (latest vsync)" situation to compare the last APZ scroll generation with the main-thread one (that's what I did in D133444), but we can't tell the difference between;
- the main-thread generation is constantly one-frame delay
- a non-fresh main-thread generation is a one-time or a couple of time delays
To tell the difference, what I can think of is;
- add the last rendered generation to ScrollFrameInfo
- if the main-thread generation is same as the last rendered generation, we use the next generation's scroll offset
With this way, a question did arise in my head, that is, given that a situation;
- We did keep using the last fresh APZ sampled offset for a while
- The main-thead got busy for a while
- So we couldn't get any information from the main-thead
- Thus we did still keep using the last fresh APZ sampled offset during the busy
- The main-thread notified a very stale offset and a generation
In such case, we should keep using the last fresh APZ sampled offset, otherwise the last scroll position will jump back.
Another issue I have in my mind is our scroll generation implementation is not generating continuous numbers, it may be tricky to get the next generation.
Assignee | ||
Comment 36•3 years ago
|
||
As discussed with Botond in #apz channel, I introduced a new flag representing there's any scroll linked effect or not and used it to use this new machinery. The new flag is stored in each ScrollFrameInfo, fwiw (I was initially thinking I should ask Glenn where we should store it since in Gecko it's a flag per DOM document, but as far as I can tell, in WebRender there's no struct corresponding to DOM document).
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 37•3 years ago
|
||
Depends on D133445
Assignee | ||
Comment 38•3 years ago
|
||
Hey :aosmond, I am running into a floating point rounding issue here in this bug.
In short with APZ's scroll offset this snap function gets called with negative value, thus, the snapped result causes unexpected 1px difference. For negative value, I'd guess it would be (value - 0.5).ceil()
instead of (value + 0.5).floor()
? Note that during scrolling downwards scroll offset in the WebRender is basically negative so that the snap function gets called with the negative values.
Assignee | ||
Comment 39•3 years ago
|
||
FWIW, with the (value - 0.5).ceil()
change, the new mochitest in comment 37 worked properly in a try run; https://treeherder.mozilla.org/jobs?repo=try&revision=b2b2569ac4b7ca5b472f5d7c0e708e670885eb79 (It's the first time the test doesn't cause any oranges, hooray!)
Comment 40•3 years ago
|
||
I'd be very wary about changing the snapping. That will have implications far beyond your test case, even if the tree is green, I would expect to see complaints in the field. Can you link to the failing test cases?
Generally we want to snap with + 0.5
but I suppose it is possible there is something special about scroll offsets that require a difference touch. Often this sort of problem is caused by us snapping too early. Any thoughts Jeff on snapping away from zero for negative scroll offsets?
Comment 41•3 years ago
|
||
(value - 0.5).ceil()
also seems wrong to me. Hiro, do you have an explanation for why the direction of snapping should change when we cross 0?
Assignee | ||
Comment 42•3 years ago
|
||
I am afraid I don't have any explanation on that. The way is a speculative attempt to make the rounding way symmetric to zero because the place is the only one suspicious place I had found at that moment, given that async scroll can be both negative/positive values I thought it kinda makes sense. That said, I think now I found another suspicious place, will dig into there.
Here is a failed try run, fwiw; https://treeherder.mozilla.org/jobs?repo=try&revision=ee34988c082def104d7bb51123d8e02f40835e0d
Thanks!
Assignee | ||
Comment 43•3 years ago
|
||
Now I have no idea how to solve the rounding issue. The another suspicious place I meant in comment 42 was this ClampAndAlignWithPixels, the function is invoked by APZ operations, but I have no idea the function is correct or not.
So for example, if APZ scrolls at (0, 170.5), then the ClampAndAlignWithPixels function aligns it to (0, 171), then if a JS script tried to use the aligned scroll position for another element, say element.style.top = "171px", the element will be rendered at (0, 171). In the mean time, APZ also notifies the scroll offset in a different route, as a result of it, this scrolling.offset value will be (0, -170.5). Then unfortunately with the snap function in question clamped to (0, -170). That's what I've been seeing.
Note that the reason why offset is negative is describing in this comment, though TBH I don't quite understand it (I guess we can make it positive, even so there are cases the value would be definitely negative in cases of overscrolling).
Andrew and Jeff, what's the best solution here?
Comment 44•3 years ago
|
||
My instinct here would be to either try consistently using the snapped result from ClampAndAlignWithPixels, so that when it calculates the scrolling.offset passed into WebRender, we would get -171 instead of -170.5, or to change how we snap scrolling.offset inside of WR to mirror APZ's expectations. This should not change how we snap anything else however, just APZ.
Comment 45•3 years ago
|
||
The latter likely makes more sense (unless we remove it entirely in favour of ClampAndAlignWithPixels on the content side.)
Updated•3 years ago
|
Updated•3 years ago
|
Comment 46•3 years ago
|
||
Comment on attachment 9254678 [details]
Bug 1571758 - Wrench reftests for scroll offset generation. r?botond!
Revision D133445 was moved to bug 1571759. Setting attachment 9254678 [details] to obsolete.
Assignee | ||
Comment 47•3 years ago
|
||
(In reply to Andrew Osmond [:aosmond] (he/him) from comment #44)
My instinct here would be to either try consistently using the snapped result from ClampAndAlignWithPixels, so that when it calculates the scrolling.offset passed into WebRender, we would get -171 instead of -170.5, or to change how we snap scrolling.offset inside of WR to mirror APZ's expectations. This should not change how we snap anything else however, just APZ.
Thank you Andrew for the great suggestion. With your suggestion and as Botond realized me on Matrix, I am going to apply the same logic as what ClampAndAlignWithPixels does here in AsyncPanZoomController::GetCurrentAsyncTransform, it should probably fix the rounding issue.
Updated•3 years ago
|
Assignee | ||
Comment 48•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #47)
I am going to apply the same logic as what ClampAndAlignWithPixels does here in AsyncPanZoomController::GetCurrentAsyncTransform, it should probably fix the rounding issue.
It's quite hard to fix this rounding issue.
Firstly, it turned out GetCurrentAsyncTransform is not the right place to do the stuff. The reason is the ClampAndAlignWithPixels logic uses NS_round() against the delta between the new scroll offset and the previous scroll offset, in the APZ side, the previous scroll offset could be the aligned value. So for example, once we sampled a new async scroll offset 29.5 and the scroll position on the main-thread at that moment was 160, then it will be aligned to 190, in the mean time, the FrameMetrics in question still has 189.5 ( = 160 + 295) offset, thus if we received the new aligned 190 value, the delta in the APZ side will be -0.5 (= 189.5 - 190), thus the NS_round() clamps the value to 189 unfortunately.
Secondly I tried to apply the ClampAndAlignWithPixels logic with mLastContentPaintMetrics in AsyncPanZoomController::SampleCompositedAsyncTransform, but there's still a race conditions that the mlastContentPaintMetrics is updated with the new scroll position from the main-thread (i.e. NotifyLayersUpdated gets called).
So, a last resort now I can think of is to store the aligned offset in FrameMetrics (i.e. in mLayoutViewport) in the first place. Though I am not 100% sure whether it works as expected.
Assignee | ||
Comment 49•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #48)
So, a last resort now I can think of is to store the aligned offset in FrameMetrics (i.e. in mLayoutViewport) in the first place. Though I am not 100% sure whether it works as expected.
This seems to be working, to be correct, I did apply ClampAndAlignWithLayerPixels against FrameMetrics::mScrollOffset in FrameMetrics::SetVisualScrollOffset. (Applying ClampAndAlignWithLayerPixels to mLayoutViewport didn't help). That said, this will more or less change our scroll behavior. So for example, here is an early state of scroll positions with/without the change
the original position | the rendered position with the change | the rendered position without the change (snapped in WR) |
---|---|---|
0.116667 | 0 | 0 |
0.516667 | 1 | 0 |
1.366670 | 1 | 1 |
2.783330 | 3 | 2 |
With the change the second value 0.516667 is aligned by floor(0.516667 + 0.5), it will be 1.0.
Whereas without the change the second value is aligned by (-0.516667 + 0.5).floor(), it will be 0.0.
Due to this behavior change, the tests added in bug 1692708 will frequently fail since we will observe 1
twice.
Also note that I've noticed that this floating rounding error has been causing a race condition that we mis-detect this userScrolled flag at the end of async scroll, but that's not a big deal.
Anyway, I am going to upload the change here in this bug.
Assignee | ||
Comment 50•3 years ago
|
||
Assignee | ||
Comment 51•3 years ago
|
||
After some more attempts I found out;
- the reason why clamp-and-align FrameMetrics::mLayoutViewport didn't work as expected is AsyncPanZoomController::ScrollBy scrolls based on FrameMetrics::mScrollOffset instead of the previous mLayoutViewport.TopLeft()
- Doing clamp-and-align in FrameMetrics::SetVisualScrollOffset is overkill since we don't do clamp-and-align the visual viewport offset on the main-thread, it causes some test failures (e.g. helper_zoom_restore_position_tabswitch.html)
- there's still a race condition that an APZC samples multiple times but on the main-thread we update the scroll position just once due to heavy load on the main-thread, thus
delta
value in ClampAndAlignWithPixels will defer, so that we can't have the same clamp-and-align-ed result both on the APZC and on the main-thread
Botond, I am leaning to defer this floating rounding difference into a later bug. Even without fixing the difference rest of patches would work in most cases of scroll linked effect issue. What do you think?
Comment 52•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #51)
Botond, I am leaning to defer this floating rounding difference into a later bug. Even without fixing the difference rest of patches would work in most cases of scroll linked effect issue. What do you think?
Thanks for your investigation on this, Hiro. It sounds like a tricky issue, and I think it's completely reasonable to land what we have so far and defer this issue to a later bug.
Assignee | ||
Comment 53•3 years ago
|
||
Thank you Botond. FIled bug 1752789.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 54•3 years ago
|
||
Comment 55•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/70bbedfc5b91
https://hg.mozilla.org/mozilla-central/rev/f60c125e2015
https://hg.mozilla.org/mozilla-central/rev/58003e7dc5a0
https://hg.mozilla.org/mozilla-central/rev/75a9175ff83a
https://hg.mozilla.org/mozilla-central/rev/3f9fdc20eac2
https://hg.mozilla.org/mozilla-central/rev/b13b738bc31b
https://hg.mozilla.org/mozilla-central/rev/9e0f35846353
https://hg.mozilla.org/mozilla-central/rev/c55e3a40a4de
Updated•3 years ago
|
Updated•3 years ago
|
Comment 56•3 years ago
|
||
Comment on attachment 9259474 [details]
Bug 1571758 - Add a mochitest for scroll linked effects. r?botond
Revision D136207 was moved to bug 1754129. Setting attachment 9259474 [details] to obsolete.
Updated•3 years ago
|
Reproducible on Linux x86_64(Ubuntu 20.04) on Firefox 97(20220202182137). This is fixed on Firefox 98.0b5(20220215194438) and the latest Nightly 99.0a1(20220215210051) build on Linux x86_64(Ubuntu 20.04), macOS Big Sur 11 and Win10.
Comment 58•3 years ago
|
||
Hi, this bug was not present to me until recently, so I bisected it with mozregression GUI on Linux.
It can be always reproduced on the phoronix forum and golem.de with enabled auto scrolling via middle mouse.
This is the result of mozregression:
Bug 1571758 - Wrench reftests for scroll offset generation. r=botond
Differential Revision: https://phabricator.services.mozilla.com/D133445
Please feel free to move this comment to a new bug in case it does not belong here, regards.
Comment 59•3 years ago
|
||
(In reply to puxplaying from comment #58)
Hi, this bug was not present to me until recently, so I bisected it with mozregression GUI on Linux.
It can be always reproduced on the phoronix forum and golem.de with enabled auto scrolling via middle mouse.
This is the result of mozregression:
Bug 1571758 - Wrench reftests for scroll offset generation. r=botond
Differential Revision: https://phabricator.services.mozilla.com/D133445
Please feel free to move this comment to a new bug in case it does not belong here, regards.
That sounds like bug 1759017. The patches in bug 1760222 should fix that. You can test a build with those patches here https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bt1XBV_HQNuF0cfOkMQydg/runs/0/artifacts/public/build/target.tar.bz2 if you'd like.
Comment 60•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #59)
That sounds like bug 1759017. The patches in bug 1760222 should fix that. You can test a build with those patches here https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bt1XBV_HQNuF0cfOkMQydg/runs/0/artifacts/public/build/target.tar.bz2 if you'd like.
I can confirm that this build does resolve all scrolling issues in my case, thanks!
Description
•