Closed Bug 1259781 Opened 4 years ago Closed 4 years ago

[APZ] Autoscrolling sometimes doesn't respect (changing) mouse position

Categories

(Core :: Panning and Zooming, defect)

48 Branch
defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 1280667
Tracking Status
firefox47 --- unaffected
firefox48 --- fix-optional
firefox49 --- fix-optional
firefox50 --- wontfix

People

(Reporter: arni2033, Assigned: botond)

References

()

Details

(Keywords: regression, Whiteboard: [gfx-noted])

Attachments

(1 file)

>>>   My Info:   Win7_64, Nightly 48, 32bit, ID 20160324030447
Read Note (3) and watch "screencast 1" first, if STR seems too big

STR:
1. Open http://mediametrics.ru/rating/ru/online.html
   Disable auto-update (play-stop button in the header)
   Only for convenience: stop music and video at the right side (if it's playing), open Style Editor,
   add the following stylesheet:   .banners-left-side, .banners-right-side {display:none!important;}
2. Set zoom level to 300% (yes, this is the most reliable STR I found)
3. Scroll the page to the middle-right
  (drag vertical scrollbar thumb to the middle, drag horizontal scrollbar thumb to the end)
4. Hover mouse over the number in any news item
5. Hold mouse wheel ('mousedown' _without_ 'mouseup')
6. Quickly (not super-fast OFC) move mouse pointer to the left by ~150px, then to the right by ~75px
7. While the page is still scrolling, move mouse pointer to the left/right by ~1px

AR:
 After Step 6 Sometimes the page just doesn't scroll,
              sometimes it scrolls at lower speed than it should (according to the mouse position),
              sometimes it even scrolls at wrong direction.
 After Step 7 it always continue autoscrolling in a normal way.

ER:  
 The page should always autoscroll in a normal way

Notes:
1) This may be tricky to reproduce, but once you reproduced it once, it gets easy.
   Out of 10 times in a row I always reproduce some autoscrolling glitch
2) That reliably doesn't happen with APZ disabled.
3) I originally detected another (very similar) bug on that page:
   When I hovered mouse over header, hold mouse wheel, then moved mouse to the bottom by ~100px, 
   I noticed that sometimes the page sometimes just stops or even starts to scroll up (in the
   opposite direction). Every time I encountered it, I never moved in such a strange way as in
   Step 6. I only moved it in one direction. That is way harder to reproduce but has worst effect.

My suggestion is to fix STR above which looks exactly like Note (3), but easier to reproduce.
I played around with this on both Linux and Windows 8 to try and reproduce it, but wasn't able to. Perhaps someone with a slightly different setup might have more luck?
Can you reproduce this on Firefox 47 as well? I'm wondering if it's a regression from bug 1242690.
Flags: needinfo?(arni2033)
This is regression from bug 1242690.
I first found 1-day range, then easily found possible last good and 1st bad builds - and tested them.
On the last good build it's reproducible 0/30 (0%). On the 1st bad build it's reproducible 7/10 (70%)
Regression range:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0c1e74f83b4187dc5563b2fc4a64f19065132700&tochange=1d931aacb06118b0c182fea0d6d3d529b380cdb6
Blocks: 1242690
Has Regression Range: --- → yes
Flags: needinfo?(arni2033)
Keywords: regression
I occasionally see autoscrolling scroll in the wrong direction while autoscrolling vertically on a long bugzilla.mozilla.org page. Not sure if it's related.
(In reply to Botond Ballo [:botond] from comment #5)
> I occasionally see autoscrolling scroll in the wrong direction while autoscrolling vertically on a
> long bugzilla.mozilla.org page. Not sure if it's related.
If what you do is "autoscroll in one ditection, then move mouse pointer to autoscroll in another direction without releasing mouse wheel, and as result autoscrolling continues in old direction for several seconds", then no, this is not related. I see that bug for a very long time, and I still can't reproduce it reliably. It's quite rare, yet very annoying
Comment 0 only describes strangeness when you only move mouse to one side from autoscroll circle, and
that scenario was broken in bug 1242690. It wasn't fixed in bug 1257862 for sure (I tested again).

So have you managed to reproduce reliably either your bug or this one?
I was able to reproduce this once back around when I posted comment 3 but I can't seem to reproduce it in my local build now. It's probably still an issue, just flaky. I have a theory about why this is happening, see bug 1258851 comment 6. Taking because I'll likely fix the two bugs together somehow.
Assignee: nobody → bugmail.mozilla
Version: Trunk → 48 Branch
I added a bunch of logging and it looks like the mouse events are getting untransformed as I expected, but with autoscroll it's a converging oscillation rather than a diverging oscillation, so it's much less obvious than bug 1258851. Most of the time there's no user-perceptible impact, it's only when the transform moves the event coordinate to the other side of the autoscroll popup that it ends up manifesting to the user. Triggering that manually is pretty hard.

A couple of options I was thinking of for fixing this:

- Set a flag in the APZ code when we enter/exit autoscroll mode so that we don't untransform the events. This seems bad because the mousemove events are also being dispatched to content and those events wouldn't get untransformed either.

- Make sure that the GetScreenX()/GetScreenY() coordinate on the MouseEvent returns a stable value instead of being affected by the untransformation. This seems like a more correct approach but I'm not sure how to implement it yet. The current implementation always uses the refpoint and transforms it based on the page scroll position, and we'd want to skip that somehow.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> The current implementation always uses the refpoint and transforms it
> based on the page scroll position, and we'd want to skip that somehow.

That's not true; the refPoint isn't transformed by the page scroll position. It just removes the APZ resolution and adds the widget-to-screen offset [1]. If we want to omit the callback transform being applied here (without adding new fields to WidgetEvent), then we would want move the code that applies the callback transform out of APZCCallbackHelper and do it in other places that walk the tree, most notably in nsLayoutUtils::GetEventCoordinatesRelativeTo but possibly other places. There's also a risk that other pieces of code are relying on refPoint directly and those might need updating. On the flip side, this approach has the advantage of allowing us to fix [2] and bug 1224748 along the way, if we can get it working.

Another approach is to keep a copy of refPoint, without the callback transform applied, in the WidgetEvent, and use that in GetScreenX()/GetScreenY(). If we do that we might want to consider moving refPoint out of WidgetEvent and into the specific subclasses that use it (e.g. keyboard events don't need a refPoint). Or introducing a new WidgetPositionedEvent which has a refPoint and sits between WidgetGUIEvent and Widget(Mouse|Touch|...)event in the inheritance hierarchy.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/events/Event.cpp?rev=b6cb07fc550f&mark=915-920#915
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/util/APZCCallbackHelper.cpp?rev=e5232739c23c#461
Alas, I can still hit this a bit too easily on several sites. For example:

1. Go to https://github.com/servo/servo/pull/11058/files?diff=split while signed into GitHub (so that the add comment plus signs show up on the diff)
2. Hold down the middle mouse button starting right next to the watch button at the top 
3. Move the mouse quickly down to ~3/4th of the way down the screen and continue holding the middle mouse button

Results:
The page scroll position will bounce around randomly for a couple seconds until it comes to a rest (sometimes at the top!)

A fun variation:

3. Instead of holding the mouse still continuously move the mouse in a small circle ~3/4th down the screen (with the middle button held)

Results:
The page scroll position will continuously bounce between the top and bottom of the scrollbar until the mouse stops moving.

If I can help with testing or collecting logs, please let me know.
Has STR: --- → yes
(In reply to samuel.l.harrington from comment #10)
> Alas, I can still hit this a bit too easily on several sites. For example:

Thanks! The issue is more reproducible on this site, which will make testing easier.

(In reply to samuel.l.harrington from comment #10)
> If I can help with testing or collecting logs, please let me know.

Thanks for the offer! We have a pretty good understanding of the cause of the issue (described in comment 8 and comment 9), so I don't think we'll need any logs, but once a fix is in place, testing it is always helpful; I'll let you know.
(In reply to (Back on May31) Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> That's not true; the refPoint isn't transformed by the page scroll position.
> It just removes the APZ resolution and adds the widget-to-screen offset [1].
> If we want to omit the callback transform being applied here (without adding
> new fields to WidgetEvent), then we would want move the code that applies
> the callback transform out of APZCCallbackHelper and do it in other places
> that walk the tree, most notably in
> nsLayoutUtils::GetEventCoordinatesRelativeTo but possibly other places.
> There's also a risk that other pieces of code are relying on refPoint
> directly and those might need updating. On the flip side, this approach has
> the advantage of allowing us to fix [2] and bug 1224748 along the way, if we
> can get it working.

I agree that this is conceptually the cleanest solution, and the one we should aim for in the longer term, especially as we look to support pinch-zooming on desktop. 

As it may also be rather tricky to implement (at least, our work in this area for Fennec has run into a number of complications), I wanted to throw another idea out there:

Would implementing APZ handling of autoscrolling, in a similar fashion to scrollbar dragging, side-step this issue? If so, I wonder if it makes sense to pursue that as a shorter-term fix (with the bonus side effect of making autoscrolling smoother).
I've never read any of APZ code, but on front-end it looks like mouse coordinate (reported by event?) always oscillates/twitches when scroll position changes. If that is possible, then this bug and
bug 1276122 I recently reported are the same. (See also bug 1234988)
(In reply to arni2033 from comment #13)
> I've never read any of APZ code, but on front-end it looks like mouse
> coordinate (reported by event?) always oscillates/twitches when scroll
> position changes. If that is possible, then this bug and
> bug 1276122 I recently reported are the same.

Yes, that sounds like the same issue. Good catch!

> (See also bug 1234988)

That is a different issue; there, the problem is that the page is (via a CSS animation) changing the page size during scolling.
Blocks: 1276122
Actually, there's something I don't understand about the diagnosis of this bug:

(In reply to (Back on May31) Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> I added a bunch of logging and it looks like the mouse events are getting
> untransformed as I expected, but with autoscroll it's a converging
> oscillation rather than a diverging oscillation, so it's much less obvious
> than bug 1258851.

Seeing as autoscrolling isn't handled in APZ, and the STR does not involve any other APZ scrolling (such as wheel scrolling) - why is there an async transform (that's not the identity) that needs to be unapplied to begin with?
(In reply to Botond Ballo [:botond] from comment #15)
> Seeing as autoscrolling isn't handled in APZ, and the STR does not involve
> any other APZ scrolling (such as wheel scrolling) - why is there an async
> transform (that's not the identity) that needs to be unapplied to begin with?

IIRC the untransform that's relevant here is the callback transform. That's getting set because as the main thread scrolls, the APZ repaint requests get rejected as stale, and so there's a (transient) delta between the APZ and main-thread scroll positions. Note that for dragging the scrollthumb or in the scrollbar track, I added some (hacky) code [1] to explicitly skip the callback transform to avoid this problem. Autoscroll still has the problem though.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp?rev=db473770c2eb#734
Reassigning to botond, he said he'd look into this a bit next week and then we can hash out any conceptual/architectural issues in London.
Assignee: bugmail.mozilla → botond
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> Reassigning to botond, he said he'd look into this a bit next week and then
> we can hash out any conceptual/architectural issues in London.

Specifically, into the "proper" solution described in the first paragraph of comment 9. We believe that, if it's tractable, this solution would be the ideal one.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> (In reply to Botond Ballo [:botond] from comment #15)
> > Seeing as autoscrolling isn't handled in APZ, and the STR does not involve
> > any other APZ scrolling (such as wheel scrolling) - why is there an async
> > transform (that's not the identity) that needs to be unapplied to begin with?
> 
> IIRC the untransform that's relevant here is the callback transform. That's
> getting set because as the main thread scrolls, the APZ repaint requests get
> rejected as stale, and so there's a (transient) delta between the APZ and
> main-thread scroll positions. Note that for dragging the scrollthumb or in
> the scrollbar track, I added some (hacky) code [1] to explicitly skip the
> callback transform to avoid this problem. Autoscroll still has the problem
> though.

Thanks! This explanation makes sense :)
This simple proof-of-concept patch makes the testcase in comment 10 behave well. It probably breaks other stuff, including quite possibly Fennec.

Let's see what all it breaks: 
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=81a141afb90d
(In reply to Botond Ballo [:botond] from comment #21)
> Let's see what all it breaks: 
>   https://treeherder.mozilla.org/#/jobs?repo=try&revision=81a141afb90d

And here's a Try push without the patch to use as a baseline, to see what's actually broken by this patch:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=60de10496304
(In reply to Botond Ballo [:botond] from comment #21)
> Let's see what all it breaks: 
>   https://treeherder.mozilla.org/#/jobs?repo=try&revision=81a141afb90d

It looks like this is causing the following tests to fail:

  dom/events/test/test_wheel_default_action.html
  devtools/client/inspector/test/browser_inspector_highlighter-03.js
  devtools/client/webconsole/test/browser_console_keyboard_accessibility.js
  toolkit/mozapps/extensions/test/browser/browser_uninstalling.js

I'm doing so retriggers to better understand the nature of these failures, but at least the first one appears to be a permafail. I will investigate.
The problem in test_wheel_default_action.html (and likely in the other tests) is that we're now applying the APZ callback transform in places where we shouldn't, that is, in places where we aren't operating on coordinates originated by APZ.
Attempt #2, mutilating Layout code a bit more:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=17ee4fc332ea
(In reply to Botond Ballo [:botond] from comment #25)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=17ee4fc332ea

This one's looking green! There are a few things I'd like to test manually, though, particularly on Android where there has been a suspicious lack of test failures.
(In reply to Botond Ballo [:botond] from comment #26)
> This one's looking green! There are a few things I'd like to test manually,
> though, particularly on Android where there has been a suspicious lack of
> test failures.

As I suspected, the patch breaks Android; specifically, it breaks the test page in bug 1229752, which added some code to APZCCallbackHelper::ApplyCallbackTransform() which this patch removes.
Even if we figure out the Android issues, I don't see this common and nasty enough to warrant a beta uplift.
I'd be ok with duping this to bug 1280667, since the fix on that bug seems to have taken care of this problem.
I'd like to ask arni and samuel to confirm once tomorrow's nightly comes out, and then I'll dupe.
This should be fixed by the patch in bug 1280667, which should be in today's nightly build.

arni, samuel, could you verify that with the latest nightly build you no longer see this problem? Thanks!
Flags: needinfo?(samuel.l.harrington)
Flags: needinfo?(arni2033)
I can confirm that this is fixed for me - thanks Kartikaya and Botond!

Tested on https://github.com/servo/webrender/compare/master...glennw:wr-next-next and my other morning browsing.
Flags: needinfo?(samuel.l.harrington)
Wonderful, thanks! Closing as dupe of bug 1280667.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1280667
(In reply to Ritu Kothari (:ritu) from comment #35)
> Too late to fix in 50.1.0 release

This bug is marked as a duplicate of bug 1280667, which was fixed for Firefox 50 and uplifted to 49 and 48.

Are we tracking some other issue besides the one fixed in bug 1280667 here?
You need to log in before you can comment on or make changes to this bug.