Mouse wheel smooth-scroll "jumps forward" every ~250ms with APZ enabled - regression from 2015-08-24

RESOLVED FIXED in Firefox 44

Status

()

RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: avih, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

unspecified
mozilla44
x86
Windows 8.1
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44+ fixed)

Details

(Reporter)

Description

3 years ago
Before this date, mouse wheel scroll was very smooth after either a single mouse wheel roll "notch" or in consecutive rolls which keep the page moving. After this date, it seems that the page "jumps" forward in the direction of scrolling every ~250ms.

This doesn't happen when scrolling with the keyboard arrows.

This is unrelated to the rate at which the mouse wheel is rolled and is noticeable either in normal configuration (where I noticed it) when rolling the mouse wheel steadily such that the page keeps moving, or in a configuration where the scroll duration is much longer such that after several quick mouse wheel roll notches the page keeps scrolling for few seconds (the video demonstrates the latter*).

Video: first scroll is after few quick mouse wheel rolls, second scroll is after few quick down-arrow presses, on both cases the input lasts ~500ms and then the page keeps scrolling on its own:
https://people.mozilla.org/~ahalachmi/share/bugs/wheel-jumps_arrows-smooth.mp4

I single-stepped the recording and found out that every 15 or 16 frames the scroll step appears bigger than what it should be.

I bisected it and came up with the following range:

> 9:36.51 LOG: MainThread Bisector INFO Last good revision: f8b3d9a3d5771e23cd99f899315831b97c774a92
> 9:36.51 LOG: MainThread Bisector INFO First bad revision: e13c23f424f664651417fffe914efa7348106249
> 9:36.51 LOG: MainThread Bisector INFO Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f8b3d9a3d5771e23cd99f899315831b97c774a92&tochange=e13c23f424f664651417fffe914efa7348106249

Brian, the entire range appears to be bug 1026803.


* Firefox settings in In order to let it scroll for few seconds after few quick mouse wheel roll notches or arrow-down presses:
> general.smoothScroll.lines.durationMaxMS      = 5000
> general.smoothScroll.lines.durationMinMS      = 5000
> general.smoothScroll.mouseWheel.durationMaxMS = 5000
> general.smoothScroll.mouseWheel.durationMinMS = 5000
(Reporter)

Comment 1

3 years ago
(In reply to Avi Halachmi (:avih) from comment #0)
> Brian, the entire range appears to be bug 1026803.

Which is a bit weird, since I noticed this on Windows (8.1 64, Firefox 32 bits), but this bug seems to address Linux...
OS: Unspecified → Windows 8.1
Hardware: Unspecified → x86
(Reporter)

Comment 2

3 years ago
(In reply to Avi Halachmi (:avih) from comment #1)
> (In reply to Avi Halachmi (:avih) from comment #0)
> > Brian, the entire range appears to be bug 1026803.
> 
> Which is a bit weird, since I noticed this on Windows (8.1 64, Firefox 32
> bits), but this bug seems to address Linux...

I redid the bisection to verify and got the same range.
The testcase in Bug 1202119 may be helpful.
See Also: → bug 1160540
(Reporter)

Updated

3 years ago
Blocks: 1199468
Per discussion with Avi on IRC
Blocks: 1013364
Summary: Mouse wheel smooth-scroll "jumps forward" every ~250ms - regression from 2015-08-24 → Mouse wheel smooth-scroll "jumps forward" every ~250ms with APZ enabled - regression from 2015-08-24
(Reporter)

Comment 5

3 years ago
Brian, any idea what might be going on here? Any reason to suspect it might not be bug 1026803 afteral?
Flags: needinfo?(bbirtles)
(In reply to Avi Halachmi (:avih) from comment #1)
> (In reply to Avi Halachmi (:avih) from comment #0)
> > Brian, the entire range appears to be bug 1026803.
> 
> Which is a bit weird, since I noticed this on Windows (8.1 64, Firefox 32
> bits), but this bug seems to address Linux...

That bug refactored the code used on Windows so it could be shared with Linux. It made some changes to the behavior on Windows too.
(In reply to Avi Halachmi (:avih) from comment #5)
> Brian, any idea what might be going on here? Any reason to suspect it might
> not be bug 1026803 afteral?

It's highly likely to be bug 1026803. Are you able to bisect that range to the actual changeset?
Flags: needinfo?(bbirtles) → needinfo?(avihpit)
I guess it's either:
* part 3,
* part 6 (most likely), or
* part 8b (unlikely)

I really hope it's not part 6.
(Reporter)

Comment 9

3 years ago
I used mozregression and this is the highest resolution it got with pre-built binaries of nightlies and inbound.

My build machine is slow so it would be a hassle to build all those revisions (even incrementally), but I can do try pushes of all of them and use the binaries.

However, I'm not able to push to try right now, so if you'd do that for me and create win32 builds, I'll test them tomorrow, otherwise I'll push to try tomorrow.
Flags: needinfo?(avihpit)
(Reporter)

Comment 11

3 years ago
(In reply to Brian Birtles (:birtles) from comment #8)
> I guess it's either:
> * part 3,
> * part 6 (most likely), or
> * part 8b (unlikely)

(In reply to Brian Birtles (:birtles) from comment #10)
> part 1:
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/
> bbirtles@mozilla.com-00a0bf4154e1
> ...

Thanks.

Parts 1 and 3 are OK, parts 6 and 8b have the bug. So it seems 6 indeed.
It may be worth testing with NowLoRes() replaced with Now() to remove some possible effects of bug 1026765.
(Reporter)

Comment 13

3 years ago
FWIW, I reproduced the issue (i.e. it always happens) on different Windows systems, so it's probably fair to say that many users are affected by this. Tested on:

1. win 8.1 64 - i7-4500U
2. win 10  64 - i3-5010U
3. win 10  32 - Atom z3740

All systems using the integrated GPU.

Comment 14

3 years ago
I too experince this behavior. And just as well, parts 1&3 are unaffected while part 6 has the bug.

Specs for reference:

Win7 64 - i7-2630qm
nvidia gtx560m - no integrated gpu
(Reporter)

Comment 15

3 years ago
[Tracking Requested - why for this release]:

APZ is planned to ride the trains with Firefox 44 (according to jrmuizel), and so this bug will be there as well.

The bug also affected Firefox 43 nightly, but AFAIK 43 will not ship with APZ and so will not have release users affected by it. I didn't check if it affects 43 Aurora and I don't know if it will affect 43 beta.

It's visible to users and considerably degrades the behavior of smooth scroll using the mouse wheel - which is probably one of the most common methods to scroll the page.

Comment 13 and comment 14 suggest that many users might be affected by it. I don't know if it affects OSs other than Windows.
tracking-firefox44: --- → ?
(In reply to Avi Halachmi (:avih) from comment #15)
> APZ is planned to ride the trains with Firefox 44 (according to jrmuizel),
> and so this bug will be there as well.

That's what we're aiming for, but I don't intend to let APZ ride the trains unless all the bugs blocking 1178298 are dealt with one way or another. Adding this one to that pile.
Blocks: 1178298
No longer blocks: 1013364
Comment hidden (obsolete)
(Reporter)

Comment 18

3 years ago
(In reply to Karl Tomlinson (ni?:karlt) from comment #17)
> Given APZ behaviour seems to depend on event timestamps, it should be
> retested with dom.event.highrestimestamp.enabled false before jumping to the
> beta train.

I tested with dom.event.highrestimestamp.enabled = false (had it as true by default), and it does _not_ fix the bug.

FWIW, Not sure if interesting or not, but I _think_ I don't see the bug at http://dbaron.org/log/20150916-compositor-animations-smoothness-demo i.e. as far as I can tell the animation is 100% smooth, and regardless if dom.event.highrestimestamp.enabled is true/false.
(In reply to Karl Tomlinson (ni?:karlt) from comment #17)
> Bug 1026803 should not have any effect on beta or release builds unless bug
> 1026804 is fixed.

Unless APZ is using the internal TimeStamp member, which seems possible. We currently store the timestamp in two formats but the pref determines which one we actually report back to the DOM (as well as the format we use for reporting it).

(In reply to Avi Halachmi (:avih) from comment #18)
> FWIW, Not sure if interesting or not, but I _think_ I don't see the bug at
> http://dbaron.org/log/20150916-compositor-animations-smoothness-demo i.e. as
> far as I can tell the animation is 100% smooth, and regardless if
> dom.event.highrestimestamp.enabled is true/false.

Compositor animations should have no relationship to event timestamps.
(Reporter)

Comment 20

3 years ago
(In reply to Brian Birtles (:birtles) from comment #19)
> Compositor animations should have no relationship to event timestamps.

As far as I can tell, it happens with the animation itself, and while I can't tell about events in general, I do think it's not directly related to the wheel scroll events.

As comment 0 states (the video shows this too), when setting the animation duration to last few seconds, and rolling the mouse wheel few quick notches, it keeps animating for several seconds without any further inputs, and still exhibits this "jump" every ~250ms.
(In reply to Brian Birtles (:birtles) from comment #19)
> (In reply to Karl Tomlinson (ni?:karlt) from comment #17)
> > Bug 1026803 should not have any effect on beta or release builds unless bug
> > 1026804 is fixed.
> 
> Unless APZ is using the internal TimeStamp member

It does: https://dxr.mozilla.org/mozilla-central/rev/6256ec9113c115141aab089c45ee69438884b680/gfx/layers/apz/src/AsyncPanZoomController.cpp#1615

Comment 22

3 years ago
As per comment 15, marking status affected for FF 44 and setting tracking flags for 44 because it has potential user impact especially for Windows.
status-firefox44: --- → affected
tracking-firefox44: ? → +
I started looking into it and here's what I've found so far.

Narrowing down what part of https://hg.mozilla.org/mozilla-central/rev/1bf77ab3474c caused this it basically comes down to changing the following calculation:

    Time timeSinceReference = aTime - mReferenceTime;
    return mReferenceTimeStamp + TimeDuration::FromMilliseconds(timeSinceReference);
    
to:
    
    Time timeToNowByTimeStamp =
      static_cast<int64_t>((roughlyNow - mReferenceTimeStamp).ToMilliseconds());
    Time deltaFromNow = timeToNowByTimeStamp - timeSinceReference;
    return roughlyNow - TimeDuration::FromMilliseconds(deltaFromNow);

Now, ignoring imprecisions introduced in conversion routines, these would seem to be equivalent:

i.e. call the old behaviour A, and the new B

    A = mReferenceTimeStamp + timeSinceReference

    B = roughlyNow - deltaFromNow
      = roughlyNow - (timeToNowByTimeStamp - timeSinceReference)
      = roughlyNow - ((roughlyNow - mReferenceTimeStamp) - timeSinceReference)
      = roughlyNow - (roughlyNow - mReferenceTimeStamp - timeSinceReference)
      = mReferenceTimeStamp + timeSinceReference
      = A

One inaccuracy is that when we calculate deltaFromNow, Time is a uint32_t so if timeToNowByTimeStamp > timeSinceReference it wraps and we end up with a large number. This is easily fixed (and possibly already is) but doesn't appear to affect the result.

However, if I use TimeStamp::Now() instead of TimeStamp::NowLowRes() to calculate roughlyNow, the issue goes away. Based on the calculation above I wouldn't expect that to matter but it does appear to.
(In reply to Brian Birtles (:birtles) from comment #23)
> However, if I use TimeStamp::Now() instead of TimeStamp::NowLowRes() to
> calculate roughlyNow, the issue goes away. Based on the calculation above I
> wouldn't expect that to matter but it does appear to.

Bug 1026765 means it does.
Depends on: 1026765
(In reply to Karl Tomlinson (ni?:karlt) from comment #24)
> (In reply to Brian Birtles (:birtles) from comment #23)
> > However, if I use TimeStamp::Now() instead of TimeStamp::NowLowRes() to
> > calculate roughlyNow, the issue goes away. Based on the calculation above I
> > wouldn't expect that to matter but it does appear to.
> 
> Bug 1026765 means it does.

I'm not sure really. We don't call NowLoRes twice. In the very-much reduced code I'm debugging we call it once, store the result, and then perform a calculation that should eliminate its value, whatever it is.

What I can see is that sometimes methods A and B from comment 23 differ slightly, probably due to further int-float-int conversions, but it's always within 1ms. What's interesting is that we only appear to be hitting the timestamp conversion code at the beginning of the scroll. After that the scroll runs by itself and yet it still seems to show the jumping behavior.

It might be possible that we're getting a slightly inaccurate time for one of the events that sets up the scroll and that sets up a pattern for the animation that causes it to jump. Basically a bad input to the velocity calculation for the scroll animation.
If I return the timestamp from A everything works fine.
If I return the timestamp from B, it jumps.
If I do:

  if (a != b) {
    b = a;
  }
  return b;

then it jumps.

I guess it's something in the Windows implementation of TimeStampValue which might come back to bug 1026765 after all.
(In reply to Brian Birtles (:birtles) from comment #26)
>   if (a != b) {
>     b = a;
>   }
>   return b;

This is failing because a != b returns false because the equality operator is actually "roughly equal" operator. It is based on using operator- which given two operands with identical GTC values but one with a QPC value and one without, will return zero. As a result operator== will return true even when the two objects are not actually equal.

In this case, the timestamp calculated using the 'B' code path will have no QPC value because it is based on a value calculated from TimeStamp::NowLoRes().
I think the basic problem here is that we're mixing timestamps that use GTC values only, and timestamps that also have a QPC value. mReferenceTimeStamp is based on TimeStamp::Now() and hence stores a QPC value. However, we subtract it from roughlyNow which is based on TimeStamp::NoLowRes(). Although I still haven't pin-pointed the exact nature of the inaccuracy it's clear that doing arithmetic with these two types of values is introducing error which is triggering the jumpy behavior.

I originally chose to use TimeStamp::NoLowRes() because I anticipated a flood of events where the cost of TimeStamp::Now() might be significant. That's not based on any data however, just a hunch, and I'm now inclined to think we should just use TimeStamp::Now() here. I'll keep trying to dig into this but for now, Karl, what do you think about switching to calling TimeStamp::Now()?
Flags: needinfo?(karlt)
(In reply to Brian Birtles (:birtles) from comment #28)
> Karl, what do you think about switching to calling
> TimeStamp::Now()?

I think that is the fastest way to a solution here.
durations from Now only can't be mixed with those involving NowLoRes, so if the TimeStamp returned here may be compared to one from Now, then it must not involve arithmetic with a TimeStamp from NowLoRes().

That's a r+ on a patch to change the NowLoRes() call to Now().
Some comments will need updating I expect.
Flags: needinfo?(karlt)
Thanks Karl. I've thrown a patch at try for now. If all look ok, I'll fix up comments and land next week:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=38b04db643dd
https://hg.mozilla.org/mozilla-central/rev/c9af4f7acba9
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox44: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.