Closed Bug 1209970 Opened 9 years ago Closed 8 years ago

Main thread scroll event latency should be uniform

Categories

(Core :: Panning and Zooming, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: mstange, Assigned: kats)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files, 3 obsolete files)

Attached file testcase
The attached testcase visualizes APZ latency. The black part is moved by APZ, and the red part is moved by JavaScript responding to the "scroll" event. In an ideal world, the two parts would move completely in sync, but we don't get to have that with async scrolling.
So the next best thing we can strive for is to have the main thread consistently be one frame behind, or even two frames if one frame isn't possible - but the latency should be consistent. In the testcase, this would lead to the red circle lagging behind but still moving smoothly.

However, what I'm seeing in this testcase is that the red part is really jumpy. In a screen recording it looks like it misses every third frame.

It shouldn't be jumpy.
Fixing this will probably help with some of the parallax bugs - the ones that rely on scroll event handlers.
Blocks: apz-parallax
As a random piece of info - the test case completely fails to move the red dot in Safari.
In Edge the behavior is similar to Firefox, but the objects stay much closer together. Chrome moves them perfectly in sync.
Is chrome using async scrolling?
My guess is no, in past testing it looked like they strictly defer to scroll handlers. On this page they are repainting on every scroll.
(In reply to Milan Sreckovic [:milan] from comment #2)
> As a random piece of info - the test case completely fails to move the red
> dot in Safari.

It works in Safari 9. I think Safari < 9 doesn't support document.scrollingElement, and I didn't account for that possibility when reading the current scroll position.
Depends on: 1221069
I grabbed a devtools profile of scrolling this page and the scroll events are definitely not as regular as we would like them to be. See attached screenshot for example, where there appear to be two scroll events dispatched in quick succession. We should look into why this sort of things happens and hammer them out. Based on a brief discussion with mstange yesterday there should be code in nsGfxScrollFrame which holds on to scroll events until a will-paint notification or something, and since painting should be lined up with vsync the scroll events should be too. If this is not the case we should make it so.
So actually it looks like when the scroll event is queued up in ScrollFrameHelper::PostScrollEvent, the call at [1] goes into [2] which dispatches it right away. So pretty much any time we get a repaint request from APZ we're going to fire a scroll event. I'm not sure what the intended behaviour here is, but I'll poke around in the history.

[1] https://dxr.mozilla.org/mozilla-central/rev/f07e71078bc8991f74c2101944c8f869c77f442a/layout/generic/nsGfxScrollFrame.cpp#4254
[2] https://dxr.mozilla.org/mozilla-central/rev/f07e71078bc8991f74c2101944c8f869c77f442a/layout/base/nsPresContext.cpp#3279
Attached patch WIP (obsolete) — Splinter Review
The immediate dispatch was added in bug 696248 but I think it might not be needed any more now that we have vsync-aligned painting. The attached WIP seems to remove the double-scroll and also makes the scroll event fire earlier in the vsync cycle, which is what we want.
Unfortunately it's still really easy to make the red part of the test case float out of the bullseye, at least on OS X with trackpad scrolling.
Attached patch Patch (obsolete) — Splinter Review
Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=9755fc81e546
Assignee: nobody → bugmail.mozilla
Attachment #8698051 - Attachment is obsolete: true
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> Unfortunately it's still really easy to make the red part of the test case
> float out of the bullseye, at least on OS X with trackpad scrolling.

With dvander's patches from bug 1192919 this is greatly improved; the lag is a lot less.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> Try push at
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=9755fc81e546

Seems like there are some test failures I'll need to investigate, though.
The test failures appear to be because the AddWillPaintObserver function is also called from somewhere else, and there's a test that relies on the thing getting dispatched sooner than the next paint. It's probably a legitimate requirement in that case so I'm trying to rework the patch a bit.
So I tried a bunch of things. First I tried only changing the scroll event (and not the overflow event, which also uses AddWillPaintObserver) so that it doesn't do a NS_DispatchToMainThread. That still caused the browser/components/sessionstore/test/browser_scrollPositions.js test to fail. The test is loading a page in a background tab and calling the scrollTo function on it. The test (rightfully, I guess) expects a scroll event to be fired, but because the tab is in the background it doesn't paint and so it doesn't fire the scroll event.

After a discussion with Markus I agree with him that we should hook the scroll event to the refresh driver instead. He already had a patch to do that so I pushed it to try at https://treeherder.mozilla.org/#/jobs?repo=try&revision=243e96ea83ee.
The try push has a consistent timeout in browser/components/sessionstore/test/browser_scrollPositions.js but based on the log it seems to be running to completion just fine. It's also only failing on 64-bit debug non-e10s for Linux, which is odd. I can't reproduce the failure locally. I'm wondering if bumping up the timeout will fix it. There's also a failure in widget/tests/test_wheeltransaction.xul which looks possibly-related. I can repro that locally at least.
I think I figured out the test_wheeltransaction.xul failure as well, there's a couple of places where the code wasn't giving enough time to flush the scroll event. I fixed them locally and did another try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=33b9fc0252cd
No longer depends on: 1221069
Attached patch PatchSplinter Review
Attachment #8698064 - Attachment is obsolete: true
Attachment #8699071 - Flags: review?(roc)
Summary: Main thread scroll latency should be uniform → Main thread scroll event latency should be uniform
Comment on attachment 8699071 [details] [diff] [review]
Patch

roc is on PTO. mats, do you mind reviewing, since you were the reviewer for bug 696248?
Attachment #8699071 - Flags: review?(roc) → review?(mats)
The documentation for nsARefreshObserver says "When nothing needs to be painted, callers
may not be notified".  So why doesn't this solution have the same problem as you noted
in comment 15?
I think that documentation is out of date. My understanding is that now that we have DLBI and vsync we tick the refresh driver on vsyncs and always fire the refresh observers assuming we get through the other conditions at the top of nsRefreshDriver::Tick (and if those conditions cause a bailout then we don't need to dispatch a scroll event anyway).

I can update that documentation as part of this patch if you like.
Comment on attachment 8699071 [details] [diff] [review]
Patch

OK, then this looks fine to me.

> I can update that documentation as part of this patch if you like.

Thanks, but I'd prefer a separate patch for that and then have someone
who knows the current nsRefreshDriver internals to review it. (:bz?)
Attachment #8699071 - Flags: review?(mats) → review+
Comment on attachment 8699071 [details] [diff] [review]
Patch

roc should probably have a look at this too when he gets back,
in case he sees some problem with calling scroll handlers this
often.
Attachment #8699071 - Flags: feedback?(roc)
Thanks. I filed bug 1233558 for the documentation being out-of-date.
Filed bug 1233715 with a patch to fix the Gu failure, will reland this once that goes through.
as a note, this caused some tps and tscrollx regressions on talos- something to keep in mind when this relands.
I am sure everyone on the bug appreciates a very poorly made comment 29.  Lets get details!  :kats asked how to figure out the regressions and I have retriggered the inbound run to collect more data along with the previous revision.  Right now jobs are being scheduled and running- so data is not all here, but with some certainty we can pick out the points of concern here:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=396059a2cebe&newProject=mozilla-inbound&newRevision=698cda645acb&framework=1&showOnlyConfident=1

Keep in mind anything that shows up here is a suggestion- it could be related to a noisy test that happens to have data in two different areas of the acceptable noise range.  I will try to comment in an hour with some analysis.
results are in and despite the noise, there are 6 regressions we can get from this:
* linux64 tscrollx: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,f58e3f07b738cd3393275d04d37b3333622e0fb7,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb
* win7 tps: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,280ad0c05312fb1334630d84a29da07b0ce8fd4b,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb
* win8 tps: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,868503082ab830f176a3a234d695f556eaea4c67,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb
* linux64 tp5o_scroll: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,4c4dfc86eefd577ec8f1a280c4b08d3f6f0f108a,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb
* win7 e10s tp5o_scroll: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,0a6714ee3af67c7918a730968318fe7d09d9f8fe,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb
* winxp e10s tp5o_scroll: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,e2aacca065f581cb00ffe7be5d6b4cee427b35da,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb

if you look for those tests on the compare view link in comment 30, then you can dive into the subtests.

Please let me know if this doesn't help!
^ All of these tests look like improvements rather than regressions (i.e. they show up in green) using the link in comment 30. There are a few large-valued reds though:

* tpaint opt windows xp - the 4.81% regression seems to be largely because of a single outlier data point which brings up the average significantly.
* tresize opt linux64 - looking at the graph at https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,61f6b38023425efa219d03f69e8fb39fbec6577f,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb the data is fairly noisy and the data points are well within the usual range of values.
* tscrollx opt e10s windows xp - looking at the graph at https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,c3eb69f0719aacca596bca0626205e4b30953034,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb the test is clearly bimodal and the "regression" is really just sampling error
* tsvgr_opacity opt linux64 - https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,6981e256ea8173cdb53dec0741ec05e8ace13f30,1]&highlightedRevisions=396059a2cebe&highlightedRevisions=698cda645acb again the data is noisy and the new values are within the usual range
ok, looking at the data again, I see that this patch does actually expect improvements and the backout triggered the alerts and focus on this.  As for the data in the compare view which suggested some regressions, those really do look like noise.  Apologies for the confusion, and glad to see this is a win in many tests!
No worries - and yeah it's nice to see improvements for once! :) I'll reland this on b-i since that's where the fix for bug 1233715 is (it hasn't gotten merged around yet).
https://hg.mozilla.org/mozilla-central/rev/ef961a0fe82f
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Depends on: 1234242
Depends on: 1234189
Depends on: 1238504
Depends on: 1269255
Depends on: 1326991
Attached file mobile-friendly testcase (obsolete) —
Attachment #9186288 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: