Overscroll sometimes grabs overflow:auto elements that don't have scrollbars (GitHub commit page)
Categories
(Core :: Panning and Zooming, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox109 | --- | fixed |
People
(Reporter: val, Assigned: hiro)
References
(Blocks 2 open bugs, Regressed 1 open bug, )
Details
(Keywords: perf-alert)
Attachments
(4 files)
Found this with code blocks on GitHub commit pages (e.g.) — sometimes scrolling with the touchpad with apz overscroll results in overscrolling these containers instead of scrolling the page. They do have overflow-x:auto set, but don't even actually overflow (no scrollbars are drawn) in my particular case — and I'm scrolling in the Y direction anyway. Seems like it does not happen when overflow-x:auto is removed from these elements.
Screen recording: https://dl.unrelenting.technology/overscroll-wtf.webm
Comment 1•3 years ago
|
||
Ah, I'm able to reproduce if I start with a horizontal swipe and then turn it vertical.
Comment 2•3 years ago
|
||
Adding bug 1720240 to "See Also", I'm not sure if it's actually related but the symptoms are broadly similar.
At the moment I'm without a working device with a precision touchpad, but as that situation improves in the near future I'll try to repro and investigate.
Comment 3•3 years ago
|
||
The severity field is not set for this bug.
:botond, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
I am starting on this.
Assignee | ||
Comment 6•3 years ago
•
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #5)
I am starting on this.
I meant bug 1720240. :)
Assignee | ||
Comment 7•3 years ago
|
||
Now I think bug 1720240 is unrelated to this. Given that setting overflow-x: auto
causes this bug;
overflow-x:auto
makes the target frame scrollable- thus an APZC is created
- the APZC messes up it's overscrollable vertically because it doesn't check overflow-y property
that's my guess.
Assignee | ||
Comment 8•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #7)
Now I think bug 1720240 is unrelated to this. Given that setting
overflow-x: auto
causes this bug;
overflow-x:auto
makes the target frame scrollable- thus an APZC is created
- the APZC messes up it's overscrollable vertically because it doesn't check overflow-y property
that's my guess.
My guess was wrong. It seems to be the correct behavior that setting overflow-x:auto
makes overflow-y:auto too. I can see it on Chrome as well.
So a question here is "why the overflow-x:auto
causes overflowing?". It's probably an issue in nsGfxScrollFrame.cpp.
Assignee | ||
Comment 9•3 years ago
|
||
Looking at the recording in comment 0, I can see overflow
annotation on the element in the devtools panel, so the element is definitely overflowing. An interesting observation on my Linux box, on a local debug build, by default the element is not overflowing at all, i.e. no overflow
annotation appears, with layout.css.devPixelsPerPx:1.1 it's overflowing but I see scroll
annotation instead. I have no idea what's the difference is.
Assignee | ||
Comment 10•3 years ago
|
||
Though I haven't yet been able to reproduce this issue (bug 1751042 either), what I am suspecting is in a certain condition, this mY.CanScroll() call returns true for some reasons. The CanScoll implementation is;
bool Axis::CanScroll() const {
return GetPageLength() - GetCompositionLength() > COORDINATE_EPSILON;
}
Can someone give me the metrics, GetPageLength() and GetCompositionLength() in the function when the unexpected overscroll happens?
Comment 11•3 years ago
|
||
Can you still reproduce this? I'm wondering if something on github changed because I can no longer reproduce.
Reporter | ||
Comment 12•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #11)
Can you still reproduce this? I'm wondering if something on github changed because I can no longer reproduce.
Yes, though the behavior seems slightly different in ./mach run
of a build just built (I only get caught in the very last code file on the github page, but maaaybe more often/consistently?) vs. the 10 days old build I've been running (same as in the video, i.e. it can happen on any file).
(In reply to Hiroyuki Ikezoe (:hiro) from comment #10)
Can someone give me the metrics, GetPageLength() and GetCompositionLength() in the function when the unexpected overscroll happens?
Both when the overscroll happens and when it doesn't, it CanScroll()
due to a difference of exactly 1 pixel like so:
[Parent 31389: Main Thread]: D/apz.axis 3976ec12ae00|Y PageLen1592.500000 - CompLen1591.500000 -> 1
Assignee | ||
Comment 13•3 years ago
|
||
(In reply to greg v [:unrelentingtech/myfreeweb] from comment #12)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #10)
Can someone give me the metrics, GetPageLength() and GetCompositionLength() in the function when the unexpected overscroll happens?
Both when the overscroll happens and when it doesn't, it
CanScroll()
due to a difference of exactly 1 pixel like so:[Parent 31389: Main Thread]: D/apz.axis 3976ec12ae00|Y PageLen1592.500000 - CompLen1591.500000 -> 1
Thank you, greg! This 0.5 fraction makes me suspect our pixel alignment on the main-thread causes this.
Assignee | ||
Comment 14•3 years ago
|
||
Now I am mostly sure the 1px difference comes from these SnapCoord calls.
I am unsure whether the bottom position snapping is still a thing we need to do in this WebRender era or not. It was introduced in bug 1012752 for the old layers backend. Markus, do you think it's still necessary for WebRender? Or can you please bounce to someone who is familiar with the area?
Assignee | ||
Comment 15•3 years ago
|
||
greg, would you mind trying this patch to make sure the place I am suspecting is the root cause of this issue?
Reporter | ||
Comment 16•3 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #15)
greg, would you mind trying this patch to make sure the place I am suspecting is the root cause of this issue?
Yep, cannot reproduce with the patch, very nice.
Assignee | ||
Comment 17•3 years ago
|
||
I managed to create a simplified test case. I confirmed that I can see the undesirable overscroll effect on Mac.
STR;
- Open the attachment
- Click somewhere on the green area
- Try to scroll down
Actually result; you can see red area at the bottom of the green area, it means the green box is overscrolled
I am totally unsure this case works as expected on other environments, since the green box's height is 200px and there's another box whose height is 200.45px with red background color. In my theory with 200.5px inner box the issue should be appeared, but with 200.5px there appears the vertical scrollbar.
Assignee | ||
Comment 18•3 years ago
|
||
Now I am mostly 100% sure that this SnapCoord is unnecessary now. The biggest reason I believe it's unnecessary is this aosmond's comment in bug 1574493. From the comment;
- Round in layout space in Gecko. This effectively snaps any primitive rects and clip rects if there are no transforms applied. Most of the time, this is correct.
(snip)
- is fundamentally wrong. This is faking snapping, and works most of the time due to the relative rarity of fractional transforms and offsets.
Proposed solution: Solve 2), 3) and 4), and then we can finally stop doing 1).
Given that bug 1574493 has solved the 2,3 and 4. I guess we can now safely drop the SnapCoord which is one of the parts of where we round metrics in Gecko.
:aosmond, do you think it's reasonable to drop this SnapCoord function which is used to snap the bottom edge of the scrolled rect?
A couple of caveats I've noticed.
- The SnapCoord function was introduced in bug 1012752, and in the bug two different type of reftests were added, one is checking rendering results, the other is checking frame layer invalidation stuff, the latter has been broken on WebRender unfortunately. The broken reftest uses
reftest-no-paint
but on WebRender the element in question will never be painted on the main-thread instead it does create a a WR command for border thus, the reftest-no-paint's element hasn't been worked as expected - I guessed that dropping SnapCoord will fix bug 1736757 (now it's duped as bug 1735008) since the SnapCoord is called with the visualViewport size. But actually it doesn't fix
Assignee | ||
Comment 19•3 years ago
|
||
I am stuck at figuring out the reason why blob-fallback-clip.html reftest fails on Windows without the SnapCoord function. Without the SnapCoord, the testing image is rendered 1px up position. Here is a link to Reftest analyzer to see the difference.
Facts what I've found so far;
- the reference file, blob-fallback-clip-ref.html is not affected by the SnapCoord change, even though the return value of ScrollFrameHelper::GetScrolledRange (which is the only function using the SnapCoord) is affected by the change.
- the test file has a couple of rAF callbacks before rendering the result, whereas the reference doesn't have them. And in fact, with two rAF callbacks the reference starts being affected by the SnapCoord change for some reason.
2-1) scrolled positions with/without two rAF callbacks are pretty much same
2-2) I did double check 1) by adding linear gradient background color to <html> to tell the scroll position differences
2-3) display list items with/without two rAF callbacks are also same
With these facts, I think this failure is more related to text rendering rather than scrolling. I am assuming that text rendering at the very beginning, in this test case, it's the first paint, the rendering result is slightly different from results after the second paint. That said, it's just a guess, it's hard to tell for me whether the test failure is indicating that dropping the SnapCoord function will regress something, or it's just caused by pre-existing issues or not.
Clearning NIs to both Markus and Andrew, I am going to ask Andrew in the next comment to get his attention.
Assignee | ||
Comment 20•3 years ago
|
||
Andrew, can you please shed some light on proceeding this bug?
There are some test failures caused by dropping the SnapCoord function, but other than the blob-fallback-clip reftest failure, I understand why they fail. They just need some tweaks in the test itself.
Assignee | ||
Updated•2 years ago
|
Comment 21•2 years ago
|
||
I meant to respond to this today (actually I wrote a response ages ago that was lost). Will do so tomorrow.
Comment 22•2 years ago
|
||
Text snapping is very complicated. I'm sure there will be interactions with changing the SnapCoord
behaviour, although it isn't immediately obvious to me how the math will work out.
If we are dealing with anything other than a static simple 2d transform, we will snap excluding those effects to minimize the text movement during an animation:
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/prim_store/text_run.rs#319
From the test case, I can only assume it will be snapping in device space, so it will take this path in the shader:
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/res/ps_text_run.glsl#147
The reference_frame_relative_offset
is just the current offset for the spatial node during scene building:
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/scene_building.rs#699
We do a bunch of snapping inside of the spatial node, which I am not sure is relevant, but they do include the scroll offset (directly or indirectly):
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/spatial_node.rs#450
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/spatial_node.rs#533
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/spatial_node.rs#539
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/wr/webrender/src/spatial_node.rs#471
The external scroll offset contribution to that calculation comes from the ClipManager, which does its own rounding:
https://searchfox.org/mozilla-central/rev/59f0bf3c13dd455d9f5415b89178de701ea6b850/gfx/layers/wr/ClipManager.cpp#360
My first thought would be if we are removing snapping from ScrollOffsetHelper, we may need to reconsider how we snap in the ClipManager and/or the spatial node. I'm less familiar generally with how APZ feeds changing scroll offsets into WebRender but there may be more to consider along those paths.
Assignee | ||
Comment 23•2 years ago
|
||
Thank you :aosmond for the detailed information. I will dive into places where you linked in the next week. Thank you!
(In reply to Andrew Osmond [:aosmond] (he/him) from comment #22)
I'm less familiar generally with how APZ feeds changing scroll offsets into WebRender but there may be more to consider along those paths.
For references/records, I know this bit. The async scroll offset comes from this offset() function and it will be snapped by this snap_offset IIRC.
Assignee | ||
Comment 24•2 years ago
•
|
||
Interestingly on the latest m-c the reftest in question doesn't fail on Windows 11 (at least on my Window 11 laptop), so I wondered it was fixed at some point recently, but unfortunately the reftest still fails on Windows 10 on the latest m-c. So I wondered the failure is caused by something timing specific, then I tried use "MozReftestInvalidate" event instead of onload
, the result was pretty green. So I am going to tweak the reftest rather than diving into details of text snapping on Windows 10.
Assignee | ||
Comment 25•2 years ago
|
||
With onload event handler this reftest isn't in a good state, in other words
it's not stable on all platforms. Specifically on Windows 10 this testing html
is rendered at 1px upper than the reference html without pixel snapping on the
main-thread. Interestingly this reftest doesn't fail at least on my Windows 11
laptop without the pixel snapping. So there's something that metrics are timing
specific (e.g. font loading) to make this test flaky.
Assignee | ||
Comment 26•2 years ago
|
||
With doing SnapCoord we mis-consider non-scrollable frames as scrollable in APZ,
thus it causes unexpected overscroll gutters.
Depends on D161417
Comment 27•2 years ago
|
||
Comment 28•2 years ago
|
||
Backed out for causing wpt/mochitest failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/5650f42da0c8c631066e1a60bd0a036475b10784
Comment 29•2 years ago
|
||
Assignee | ||
Comment 30•2 years ago
|
||
There were more "UNEXPECTED-PASS" on android.
Comment 31•2 years ago
|
||
Assignee | ||
Comment 32•2 years ago
|
||
There were 3 test failures, test_bug842853-2.html, test_bug842853-2.html and test_bug851485.html. All tests use file_bug842853.html. The file_bug842853.html has an anchor node at the bottom of the root scroller, and all the tests try to go to the anchor then scroll back to 500px.
I was optimistic that the scroll position alignment (bug 1774315) will not be affected by dropping SnapCoord
since the SnapCoord is just snapping the scroll container's bottom edge. But in fact in these specific test cases, it affects. The scroll position alignment (ClampAndAlignWithPixels) does;
double currentLayerVal = (aRes * aCurrent) / aAppUnitsPerPixel;
double desiredLayerVal = (aRes * desired) / aAppUnitsPerPixel;
double delta = desiredLayerVal - currentLayerVal;
double nearestLayerVal = NS_round(delta) + currentLayerVal;
aCurrent
is the current scroll position which is in these cases it's the maximum scroll position (affected by dropping SnapCoord), so the result of ClampAndAlignWithPixels is also affected by dropping SnapCoord unfortunately.
Dropping ClampAndAlignWithPixel (bug 1774315) altogether is as hard as this bug since it requires changing our internal representation of style values (see bug 1774315 comment 1). Thus I am inclined to relax these test conditions to allow 0.5 difference.
Though this isn't directly related to this bug, the code in ClampAndAlignWithPixels was introduced in bug 772679, I did confirm that the STR in bug 772679 comment 0 doesn't reproduce the issue with/without ClampAndAlignWithPixels (without SnapCoord) both on Linux and Mac. So I am mostly sure that it was necessary for our old layer backend, but it's no longer necessary. Also note that I am quite unsure the NS_round(delta)
is correct, my naive guess is that it should be delta > 0.0 ? ceil(delta + 0.5) : floor(delta - 0.5)
and clamp it in the scrollable range, otherwise there may be unreachable 1px at the edges.
Comment 33•2 years ago
|
||
Comment 34•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c381bdd1fea3
https://hg.mozilla.org/mozilla-central/rev/248c025cb922
Comment 35•2 years ago
|
||
(In reply to Pulsebot from comment #33)
Pushed by hikezoe.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c381bdd1fea3
Use "MozReftestInvalidate" event instead of onload to wait for ready to be
tested. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/248c025cb922
Stop doing SnapCoord. r=tnikkel
== Change summary for alert #36103 (as of Wed, 16 Nov 2022 21:52:02 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
7% | bing-search-restaurants ContentfulSpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 987.25 -> 913.50 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=36103
Updated•2 years ago
|
Assignee | ||
Comment 36•2 years ago
|
||
Hello Val, can you please double-check that this bug doesn't happen on recent nightlies (March 26th and later I suppose)?
I did revert the change for fixing this bug in bug 1817126. And even without the change, I believe this original bug has been fixed by bug 1782339 in a different way. At least I can no longer reproduce this bug in the test case in comment 17, and as far as I tried, I can no longer see this bug on the github page either. Though I am unsure whether I could see the github issue locally before. :/
Reporter | ||
Comment 37•2 years ago
|
||
Can't reproduce either, everything seems good. :)
Assignee | ||
Comment 38•2 years ago
|
||
Glad to hear! Thank you!
Description
•