Closed Bug 1160540 Opened 6 years ago Closed 6 years ago

APZ scrolling with mouse wheel becomes choppy after some time with general.smoothScroll enabled

Categories

(Core :: Panning and Zooming, defect)

40 Branch
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: jastekken, Unassigned)

References

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150501030205

Steps to reproduce:

Set layers.async-pan-zoom.enabled to true and restart the browser.
Set mousewheel.system_scroll_override_on_root_content.vertical.factor to 600 (this makes the issue easier to notice, but isn't really needed)
Browse the web normally, reddit, deviantart, imgur, news sites, blogs...


Actual results:

After ~20 minutes it begins to be noticeable that if I input a scroll step with mouse wheel (the wheel has steps) before previous scroll animation has finished then there appears a discontinuity in the animation.

The discontinuity starts barely noticeable but it gets bigger over time. It kind of looks like as if the new scroll animation starts before the old has finished and so, the page appears to jump to the location where the new one is supposed to start if the first would've finished. But this might not be accurate as it is very hard to follow what actually is happening.

This might or might not be related but after even more time (over an hour) the first scroll event starts to be visibly delayed. This effect also starts barely noticeable but gets eventually to the point where it takes over half a second before the first scroll animation begins.


Expected results:

Scrolling should stay smooth like it is when the browser has just started.
If you discover any other ways to make this more easily reproducible please let us know. If it takes over an hour to reproduce it's going to be very time-consuming to fix and so will be lower-priority for us.
Video in case people cannot reproduce this (clean profile behaves just similarly). 

Should be noted that this is easily visible after just 15 minutes. Interestingly the Firefox window doesn't even need to be active while waiting. So, I could start Firefox and then work on some other window for 15 minutes, then activating Firefox window and the choppiness is there.

Any ideas what settings could affect this? I've tried going through apz preferences but none seems to have any effect. Do some preferences need a browser restart to take effect, like layers.async-pan-zoom.enabled does?
Thanks! It's too bad that the symbol server is broken at the moment, so the profile only contains raw addresses and no function names. However, I think we can still get some information from it.
The timeline around
> http://people.mozilla.org/~bgirard/cleopatra/#report=c7eecb2224fc03b565d1f8da6a34da9031502722&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A2248609,%22end%22%3A2252959},{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A2248938,%22end%22%3A2249873}]
is a little confusing: First, we keep compositing for a while during which the content thread just spins and doesn't process any repaint requests. Then, we pause compositing while the content thread processes a few refresh driver ticks, but none of those do any painting (otherwise we'd see a layer transaction in the compositor thread). And during the pauses, each thread seems to just be spinning its event loop.

My only explanation for this is that we just stop receiving events for a while, during the pauses. It's not a very satisfying explanation. I think we need to reproduce this in a debugging environment if we want to track down the cause.
Severity: major → normal
The behavior is exactly the same if I use touchpad (Synaptics) to scroll, so this is probably not a mouse driver issue.

In the first comment I wrote about the first animation being delayed. Nightly has been open for around 4 hours. And first scroll animation appears with ~1 second delay now. But if I input two scroll steps in very quick succession then the scroll animation appears instantly after the second one has registered, although the animation appears jumpy like in the video.

This leads me to believe that the second animation starts as it should, at the correct spot in the page and at the correct time in relation to the actual input. But the first animation gets more and more delayed after time. At first the delay is so short that it can't be perceived itself. But if I input two scroll steps, then the delay of the first animation can be seen as a discontinuity between the first and second scroll animation, because at the time the second one is supposed to start the actual first animation hasn't finished yet.
Checking in, this issue is still reproducible for me in latest Nightly. I have not found any setting in Firefox or Windows that has any noticeable effect on this problem. However I was able to uncover that there is a direct relation between this issue and "scroll" and "wheel" events.

As Firefox up-time increases, the time between "wheel" and "scroll" events triggering increases too, at a reasonably stable rate of ~100ms per hour. "Wheel" event triggers instantly after physical wheel input, but "scroll" takes increasing amount of time to happen. Measured with performance.now() the time between "wheel" and "scroll" is initially something around 20ms but that is probably subject to all kinds of inaccuracies.
This bug does also happen without hardware acceleration. I had thought APZ requires hwa and thus didn't try this earlier. I had figured this being a gpu driver issue but maybe that's not the case. Is it safe to assume this is not, since it happens without hwa?

Further information of what's happening in the jumpy animation:

1st wheel input - wheel event fires but scroll animation doesn't play yet.

Input 2nd wheel step before the animation of the first input has even begun - the view jumps to the point where it would have been if the first animation would have started correctly. It stops there for a moment before continuing smoothly to the endpoint of the second scroll event. Anyway, the page is scrolled the right amount even if the animation is jumpy.
Although I haven't been able to repro this I'm moving it to block all-aboard-apz because we should figure out what's going on here (at least understand the circumstances it occurs in) before we let APZ ride the trains.
Blocks: all-aboard-apz
No longer blocks: apz-desktop
I was kind of hoping that someone could repro this now that apz got enabled in nightly. A bit weird, because this behavior is really consistent for me. I've been running Nightly exclusively with apz enabled since may, and this happens in each session. 

Only time different was in build of July 27th (or 28th, I was in a hurry and couldn't debug at the time), where instead of the "delayed" first animation the page jumped right to the endpoint of the scroll - no delay but no animation either. I've been trying to figure out what I did that might've caused this, but with no luck.

Additionally, I want to add that this same scrolling behavior happens in chrome content too. Such as browser console, inspector, addons manager etc. But not on TabsToolbar if it is overflowing.
(In reply to jastekken from comment #14)
> Only time different was in build of July 27th (or 28th, I was in a hurry and
> couldn't debug at the time), where instead of the "delayed" first animation
> the page jumped right to the endpoint of the scroll - no delay but no
> animation either. I've been trying to figure out what I did that might've
> caused this, but with no luck.

Apparently this happens if Nightly uptime is long enough. The browser has been running for 3 hours. The last 2-ish hours the computer was on sleep though. When I woke my computer up the delay from wheel event to first scroll event is about two seconds. This is much longer than what it was (<0.5s) than what it was before I put my computer to sleep. 

However, should I input new wheel event before this delayed scroll happens the page will jump the amount of lines that one scrolling should. This basically looks exactly like non-smooth scrolling does. With the exception that the last delayed scroll event happens after those multiple non-smooth scrollings.

Isn't it a bit weird though, that this delay increases even if the computer is on sleep?

 This behavior is with latest Windows 7 64bit Nightly and I don't have a reason to believe any Windows Nightly since May would behave differently. Other symptoms that I have tried to describe have been the same in all Nightly versions (32bit and 64bit).
(In reply to jastekken from comment #15)
> However, should I input new wheel event before this delayed scroll happens
> the page will jump the amount of lines that one scrolling should. This
> basically looks exactly like non-smooth scrolling does. With the exception
> that the last delayed scroll event happens after those multiple non-smooth
> scrollings.

I should have added here that the delayed scroll animation still reserves the lines it want's to scroll. So if I scroll down those direct jumps cannot get me to the end of the page. Instead the browser waits for the delayed animation that will then get me to the end. But, if I change the direction the view will jump to the end and then wait those two seconds before it scrolls up those few lines.
Thanks for the info you've provided so far. I've been running Nightly and local builds on Windows with APZ enabled and haven't yet seen this behaviour. At some point I'll probably make a try build with extra logging that you can run which will hopefully give us more information.
I'm able to reproduce this fairly easily on my desktop with Windows 7. It seems to happen whenever the main thread is busy. Also disabling "smooth scrolling" helps a lot. I don't have to wait very long to reproduce this.
I should also note, disabling APZ also shows lots of jank and pauses. I wonder if the pause feels worse since with APZ, things are smooth then it has an abrupt pause, versus w/o APZ, it's just a constant terrible jank.

I can't reproduce this on Windows 10.
Hm, this might be the same issue that smaug reported in bug 1190064, although that was on win 8.1.
(In reply to Mason Chang [:mchang] from comment #18)
> I'm able to reproduce this fairly easily on my desktop with Windows 7. It
> seems to happen whenever the main thread is busy. Also disabling "smooth
> scrolling" helps a lot. I don't have to wait very long to reproduce this.

Oh boy... It seems I have forgotten to say that this whole issue does not happen without smooth scrolling. Without smooth scroll there is no delay after actual input, nor there is any unwanted jumpiness as far as I can see. For example, my current Nightly session has been up for about an hour without smooth scrolling - no issues. But when I toggle the setting in preferences->advanced the delay between actual input and the smooth scroll animation can be perceived. So this delay builds up all the time even if smooth scrolling isn't enabled.

But I'm not sure if my issue is related to busy main thread based on the fact that on my system this issue happens on each and every single one scroll input. And frankly, I don't believe the main thread is busy all the time.
Thanks, that's good to have confirmed. It should help narrow down the possibilities.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: APZ scrolling with mouse wheel becomes choppy after some time → APZ scrolling with mouse wheel becomes choppy after some time with general.smoothScroll enabled
I have an idea: The timestamp that we set on the wheel event might be wrong, or drifting out of sync with TimeStamp::Now(). The wheel scroll animation computes its end time by doing wheelEvent.timeStamp + animationDuration, and on each animation sample compares the sample time to that.
I also wouldn't be surprised if there is some difference from the Timestamp given to the compositor that APZ uses to calculate the scroll position from vsync versus the wheel timestamp. I've seen a couple of millisecond delay's between when the Compositor actually gets around to compositing and the vsync timestamp, which may throw off the calculations during smooth scrolling.
The discrepancy reported here sounds like more than just a couple of milliseconds, though.

Can somebody test this build, please?
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-16c110ea9dce/try-win32/firefox-42.0a1.en-US.win32.zip
(In reply to Markus Stange [:mstange] from comment #27)
> The discrepancy reported here sounds like more than just a couple of
> milliseconds, though.
> 
> Can somebody test this build, please?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.
> com-16c110ea9dce/try-win32/firefox-42.0a1.en-US.win32.zip

I've been running that build for half an hour now and there is no sign of issues. By that time my issues have been very much visible previously. I'll continue for couple of more hours, but for now it looks really promising.
I'm happy to say this try build completely fixes the issue I've been having. I left the build running overnight (some hours of which the computer was on sleep) and there is no sign of choppiness. Smooth scroll is smooth as butter.

I just wonder what would cause the timestamp to drift such a large amount on this particular system? Feels weird that only very few systems are affected.
I don't have my Windows 7 desktop with me this weekend, so I'll try tomorrow when I'll have access to the desktop again.
(In reply to Markus Stange [:mstange] from comment #27)
> The discrepancy reported here sounds like more than just a couple of
> milliseconds, though.
> 
> Can somebody test this build, please?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.
> com-16c110ea9dce/try-win32/firefox-42.0a1.en-US.win32.zip

I tried this build and it seemed to be a little better. However, there are still sites where I can reproduce the glitch in the middle. However, on today's nightly (08/10/15 - US date format), the issue seems to have gotten a lot better in general.

Can anyone reproduce the pause in scrolling while browsing this site:

http://arstechnica.com/gadgets/2015/08/review-new-180-moto-g-is-a-stylish-upgrade-worthy-of-the-original/

I'm getting a pretty distinctive jank when scrolling between the "Look and Feel" and "Software" part of the page.
This is the patch that I sent to try.

The only reason I can imagine why this helps is that on the affected machines, GetTickCount() doesn't advance at a rate of 1ms per "tick".
Attachment #8645772 - Flags: feedback?(bbirtles)
Here's an alternative patch. What do you think about this, Brian?

The comment mentions "consistency" as the reason for not calling GetTickCount() for every event. I think this patch should preserve consistency without all the complexity. Do you agree?
Attachment #8645776 - Flags: feedback?(bbirtles)
Some more comments on the simplification patch: The current code calls TimeStamp::Now() instead of NowLoRes for the first event. I don't really see the benefit of doing that since that value is compared to a LoRes timestamp anyway, so the added precision is thrown out of the window.
Also, calling GetTickCount is apparently extremely cheap.
Can you still reproduce your problem on the latest nightly? Thanks!
Flags: needinfo?(jastekken)
(In reply to Mason Chang [:mchang] from comment #36)
> Can you still reproduce your problem on the latest nightly? Thanks!

I can. Also, I don't experience any additonal pauses (On latest Nightly or that try-build) while scrolling the arstechnica article you linked, so I wonder if it's a different issue altogether.

(In reply to Markus Stange [:mstange] from comment #37)
> Here's a build with the simplification patch. Can you test that one, too?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.
> com-ea8103e59f99/try-win32/firefox-42.0a1.en-US.win32.zip

I don't have time now to see how it behaves on long run. But the scroll behavior on that new build is different. Worse, in my opinion. There is a small discontinuity in velocity if I input two or more scroll steps in succession. In the previous try build the velocity stayed smooth between inputs, only the endpoint was changed.

I'll get back to some more testing tomorrow.
Flags: needinfo?(jastekken)
(In reply to Markus Stange [:mstange] from comment #37)
> Here's a build with the simplification patch. Can you test that one, too?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.
> com-ea8103e59f99/try-win32/firefox-42.0a1.en-US.win32.zip

I tried this patch. So far, I haven't been able to reproduce the issue with this build after about 2 hours. I was actually able to reproduce the issue once on Nightly after a couple of hours :/. I agree with comment 38 though, that the scrolling feels different and it's kind of worse. It feels like it caps out and hits a weird plateau.
I'm going to try and finish off the patches for bug 1026803 as a way of getting my head around what that code is supposed to be doing and to see if they help.
Markus, it's going to be a few days before I can answer your questions. All I can say is that much of the code in those patches is already included in part 1 from bug 1026803 (attachment 8646152 [details] [diff] [review]). You might like to try that and see if it helps. I'm going to get those patches up to date and try to get them landed.
Oh, wait, actually part 6 from that bug too which is what I'm currently trying to fix.
Comment on attachment 8645772 [details] [diff] [review]
periodically re-sync GetTickCount() and TimeStamp::Now()

Hi Markus, can you try with the updated patches from bug 1026803? We should be detecting this sort of skew automatically.
Attachment #8645772 - Flags: feedback?(bbirtles)
Attachment #8645776 - Flags: feedback?(bbirtles)
Thanks Brian!

jastekken, can you try again with this build?
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bbirtles@mozilla.com-ec77f5c19262/try-win32/firefox-43.0a1.en-US.win32.zip

This is a build from one of the try pushes for bug 1026803 that includes part 6 (the clock skew fix).
(In reply to Markus Stange [:mstange] from comment #44)
> Thanks Brian!
> 
> jastekken, can you try again with this build?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bbirtles@mozilla.
> com-ec77f5c19262/try-win32/firefox-43.0a1.en-US.win32.zip
> 
> This is a build from one of the try pushes for bug 1026803 that includes
> part 6 (the clock skew fix).

The behavior on this build looks the same as with previous try (ea8103e59f99). I cannot see any delay, it certainly doesn't build up like with the original issue. Still, if the scroll animation is running and new wheel event is input, then there is noticeable change in velocity (Or small, but constant jump in the actual offset on the page, but I cannot distinguish which one it is) . This in turn makes the scrolling different kind of choppy. Note, that the first try you provided newer suffered from this issue.
It seems this has some connection to e10s. I cannot reproduce this issue with Nightly if e10s is turned off. But if e10s is enabled then the new non-e10s windows will still suffer from this. Same with Nightly from July 16th. 

Interestingly, Aurora build from mid-May which I had lying around did not suffer from this issue in any non-e10s windows (even if e10s was normally enabled). I would probably be really time consuming to find when exactly this changed so I'm not going to do that unless it's important information.
So bug 1026803 landed for latest Nightly. This effectively fixes the cumulative delay problem which is nice. But the behavior described in comment 45 happens instead. Although, it might not be related to additional scroll inputs.

 To test this, I set general.smoothScroll.mouseWheel.durationMaxMS to 5000 and mousewheel.system_scroll_override_on_root_content.vertical.factor to 7500. Outrageous numbers, but this way I could easily follow the behavior.

This "jump" happens maybe 3-4 times each second but the interval is consistent. Actual framerate measured by layers.acceleration.draw-fps shows no hiccups though, just 60-61 fps as usual. 

Additionally, using such outrageous number for scrolling another issue that could be related (this happens only with apz enabled): Scrolling on a not-overflowing page such as about:home (and rather interestingly, window titlebar) leads to increased cpu and gpu usage. layers.acceleration.draw-fps also shows increased fps which lasts for the time specified with general.smoothScroll.mouseWheel.durationMaxMS.
On the latest Nightly I am seeing some strange APZ scrolling - it seems like when you smooth scroll with the mouse wheel it's skipping frames and it's very obvious. 
All APZ settings in about:config are default.

Maybe the latest fixed that god landed in Nightly could cause this?
Brian, can you look into this some more? It sounds like the drift detection from your patch isn't good enough, or some other noise is introduced into the timestamps. At least that's what I'm concluding from the fact that jastekken reports that the current state is worse than with my first try build (with the patch in comment 32).
Flags: needinfo?(bbirtles)
Bas is working on a patch for mousewheel scrolling choppiness in bug 1199468. Note that bug 1199468 is unrelated to APZ
(In reply to Markus Stange [:mstange] from comment #49)
> Brian, can you look into this some more? It sounds like the drift detection
> from your patch isn't good enough, or some other noise is introduced into
> the timestamps. At least that's what I'm concluding from the fact that
> jastekken reports that the current state is worse than with my first try
> build (with the patch in comment 32).

I'm afraid it's going to be difficult for me to have a proper look at this, at least during Q3. I'm not really sure how APZ is using timestamps so it's hard to guess how the native UI event time to TimeStamp code might be affecting this. Karl may have a better idea.

The next step might be to add some debugging output to widget/SystemTimeConverter.h to indicate when we adjust for forwards skew and backwards skew and see if they correspond with the jumps. Also, we might see if there is an async method for getting the current time on windows like we use on Linux.
Flags: needinfo?(bbirtles)
The lack of a winnt GetTimeAsyncForPossibleBackwardsSkew() implementation
means that skew correction is implemented in one direction only.
However, the report of correction of cumulative delay sounds like it happens
to be the direction that helps.

If the skew in this direction is large then the result is essentially that
event timestamps represent times of processing the events instead of
hardware (or OS) event times.  However, the skew rates reported here don't sound so large that I would guess that effect.  I suspect something else might be involved.
I don't know whether this is involved here but
https://bugzilla.mozilla.org/show_bug.cgi?id=1026765#c4 means that code needs
to be careful not to mix the two versions of Timestamp if it wants sane
results on winnt.
Duplicate of this bug: 1200135
I don't really know if it's related to APZ, but seems like the best way to see the drop of frames I have mentioned in comment #48 is to update some add on that has a change log which can be viewed by the add-on manager.

When you click on the "Show Release Notes" button, you can see that the animation of opening the release notes drops frames, and on the hide animation also.
Bug 1202119 has a nice testcase for checking mouse event timestamps.
I'd guess that wheel event timestamps are similar.
I am curious does anyone who has experienced this problem have OMTC disabled? I have OMTC disabled and I was just told this in a different bug:

> The APZ feature
> (which is enabled by default on nightly only) requires OMTC in order to work
> properly. If you are disabling OMTC then you should disable APZ as well, by
> setting the layers.async-pan-zoom.enabled pref to false. We do not intend
> (and cannot) support APZ without OMTC.
(In reply to Ray Satiro from comment #57)
> I am curious does anyone who has experienced this problem have OMTC
> disabled? I have OMTC disabled and I was just told this in a different bug:
> 
> [bug]

No. I have been running OMTC enabled the whole time. Do new Nightlies even allow turning it off? about:support reports OMTC being enabled even if layers.offmainthreadcomposition.enabled is set to false (and restarted). No difference in behavior though.
See Also: → 1207412
I've not experienced the original issue after landing of bug 1026803. And it seems the jumpiness that remains is indeed the same issue as reported on bug 1207412.

So I guess this bug could be marked as resolved?
Sounds good to me.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.