Closed Bug 1235899 Opened 8 years ago Closed 8 years ago

Facebook guest list doesn't support flings with APZ turned on

Categories

(Core :: Panning and Zooming, defect)

Unspecified
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox45 --- unaffected
firefox46 --- wontfix
firefox47 --- fixed

People

(Reporter: jrmuizel, Assigned: kats)

References

Details

Attachments

(3 files, 1 obsolete file)

Flings work with APZ off but not with it on.
Touch input? Trackpad? Platform?
Flags: needinfo?(jmuizelaar)
MacBook Pro with the track pad on OS X.
Flags: needinfo?(jmuizelaar)
I haven't tried reproducing this yet, but the "fling" stuff is momentum scrolling events sent to us from OS X. Most likely the page is doing something to cancel this but we'll need to investigate to find out what. Whatever it's doing should probably affect non-APZ codepaths equally, so it's likely a bug in our code.
Blocks: apz-desktop
OS: Unspecified → Mac OS X
Version: unspecified → Trunk
I can reproduce the fling getting aborted intermittently. There are two scenarios:
1) Keep your fingers on the trackpad, start the scroll, and then do a fling. In this case the fling works
2) Do a quick fling right away. In this case the fling doesn't work, and it may even scroll and then jump back to where you started.

What's happening is that the page has a scroll event listener that is obfuscated, but seems to be resetting scrollTop to itself. i.e. something like this:

element.onscroll = function() {
  element.scrollTop = element.scrollTop;
}

Whenever this happens, any APZ animations or momentum scrolling are (intentionally) aborted. In case (1), the scroll event fires and this code runs while the fingers are still down, so you might see a bit of a jank when it happens. However the user-driven scroll continues after that and can trigger the fling without issues. In case (2) the scroll event happens after the fingers have left the trackpad and the momentum scrolling has taken over. In this case the scroll listener aborts the momentum scrolling and can end up restoring a stale scroll offset (whatever the first scroll event had).

I expect the same behaviour to occur on other browsers that do async scrolling here and don't do any special magic for the scroll event listener. We should try to get facebook to stop clobbering the scroll offset.
Whiteboard: [apz-evangelism]
=had).
> 
> I expect the same behaviour to occur on other browsers that do async
> scrolling here and don't do any special magic for the scroll event listener.
> We should try to get facebook to stop clobbering the scroll offset.

The Safari I have doesn't seem to asynchronous scrolling of subframes (http://people.mozilla.org/~jmuizelaar/implementation-tests/apz/async-subframe.html). I expect Chrome is disabling async scrolling in this case.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5)
> =had).
> > 
> > I expect the same behaviour to occur on other browsers that do async
> > scrolling here and don't do any special magic for the scroll event listener.
> > We should try to get facebook to stop clobbering the scroll offset.
> 
> The Safari I have doesn't seem to asynchronous scrolling of subframes
> (http://people.mozilla.org/~jmuizelaar/implementation-tests/apz/async-
> subframe.html). I expect Chrome is disabling async scrolling in this case.

Flinging works in both browsers. I tried to determine Edge's behaviour but it was especially difficult to work and so I've thus far failed.
is this happening everywhere on Facebook or in a specific usage scenario? 
Is there a URL?
There is a (recent) mailing-list for discussing issues with Facebook. I can report there, but I need a bit more step by step to reproduce the bug and describe it to engineers at Facebook.
Flags: needinfo?(bugmail.mozilla)
It's a specific usage scenario - I was seeing it in the guest list of an event that I had created. However I just tried to repro it now and I'm not seeing it any more. Jeff - can you check to see if you can still repro?
Flags: needinfo?(bugmail.mozilla) → needinfo?(jmuizelaar)
Whiteboard: [apz-evangelism] → [apz-evangelism] [needsdiagnosis]
It seems better, but it still gets stuck sometimes.
Flags: needinfo?(jmuizelaar)
Hallvord, can you take a look please?
Flags: needinfo?(hsteen)
Excuse me, where and how do you use fling gestures on Facebook to achieve what? I tried to create an event and drag friends into the guest list rather than just checking the box (why would one use a harder method than the easiest one?) but DnD doesn't seem to be enabled here.
So you don't need to create a new event for this - log in to facebook, click on the "events" link in the left sidebar, and just go to any event shown there (even if it's just a "suggested event"). On the event page, on the right side, there should be links to "X interested", "Y going", "Z invited" (where X, Y, and Z are numbers). Click on any of those, and it will pop up a dialog with a list of people. Flinging on that list (OS X trackpad) is where the buggy behaviour manifests.
I've tested this on Windows and Mac (multi-touch trackpads on both computers). I've tried to compare builds with and without APZ enabled - Windows Nightly w/APZ (defaulting to on), Windows current release which seems to have the layers.async-pan-zoom.enabled preference available but defaulting to off, Firefox dev edition on Mac with async-pan-zoom defaulting to on.

To be absolutely sure we're on the same page: a "flick" here means a short and quick two finger gesture that starts a scrolling action, and the scrolling continues for a while with decreasing momentum after the gesture itself is finished. Right?

Main observation: flicking seems to work much better on Mac than Windows overall - it's really clunky on Windows for me, no matter if APZ is on or off :-p. 

On Mac, flicking works pretty well with one consistent exception: if APZ is on and you start flicking near a "header" above a guest category ("Friends", "Others") scroll position tends to jump back to that header. I think this is the "jumps back to top"-like effect described above.

I'm trying to figure out where scrollTop is set - trying to add logging doesn't always work for some reason :-/
Flags: needinfo?(hsteen)
So, I'm not seeing scrollTop being set when this happens but it seems like returning before the second line of this snippet stops it from happening:

        k.conditionShow(this._gripper, this._needsGripper());
        k.conditionClass(this._elem, 'contentBefore', q.getTop(this._wrap) > 0);
        k.conditionClass(this._elem, 'contentAfter', !this.isScrolledToBottom());

So that line sets a class name that will make some generated content appear at the top of the list. I think that combined with the general styling and complexity causes the interrupted scrolling. I've tried to write a minimal demo, but it's not quite there - it's not failing as reliably as on Facebook. I see flicking fail sometimes but I'm never quite sure if the trackpad just didn't correctly register the two-finger gesture or if I actually triggered the bug. I'll attach it anyway, perhaps kats or someone can fill in the missing piece and make it reproduce reliably..
I found and fixed a TC bug, might work better now :)
Kats, is this of interest?
Flags: needinfo?(bugmail.mozilla)
Thanks, this test case is really helpful!

It looks like there's some bad interaction between the scroll restoration in layout and APZ. The CSS change results in the frame tree getting rebuilt, which calls SaveState()/RestoreState()/.../ReflowFinished() on the nsGfxScrollFrame. This has the effect of saving the current main-thread scroll position, destroying the frame and re-creating it at 0,0, then restoring the saved scroll position into mRestorePos, and then scrolling to it. That's effectively the same as scrollTop = scrollTop, but entirely in gecko code.

I'll try and come up with a fix and then see if that fixes the original facebook scenario as well.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
Attached patch WIP to fix testcase (obsolete) — Splinter Review
I spent a lot of time thinking about this. I think the core of the problem is that in some cases we want the restored scroll position to update APZ (such as if you're going back in history, or restoring a session) and in other cases we don't (such as this case, where the frame tree is rebuilt while in the middle of a fling). I think the main distinction between the two is that in the former case, the APZ itself is getting discarded and rebuilt and in the latter case it is not.

The good news is that when the APZ is rebuilt it automatically picks up the scroll offset coming in from content at [1], so we just need to make sure that the APZ *doesn't* pick up the scroll offset in the other case. This patch does that, but it will need some heavy testing to make sure nothing else breaks because I'm not convinced I know all the cases in which this code gets invoked.

[1] https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/gfx/layers/apz/src/AsyncPanZoomController.cpp#3283
I cleaned up the patch, hammered on it a bit locally, and wrote an APZ mochitest for it. It seems good as far as I can tell. I've pushed it to try to hammer the new mochitest a bit, will put it up for review if everything works out. But this is not an evangelism issue, so I'm removing the apz-evangelism whiteboard. Thanks again to Hallvord for boiling down the scenario into a small test case - that was super helpful!

Also while investigating this I discovered that smooth-scrolling with APZ disabled has the same problem; if a frame reconstruction happens during the scroll, the scroll gets interrupted. I'll file another bug for that because it's in different code and it's not a regression in 46, it's probably been that way since the original smooth-scroll implementation.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8efdaedc418d
Whiteboard: [apz-evangelism] [needsdiagnosis]
Try was green, except for on Android. I reproduced that failure locally and fixed it. New try push to confirm that at https://treeherder.mozilla.org/#/jobs?repo=try&revision=207e83451728, and I'll put the patches up for review.
Attachment #8717057 - Attachment is obsolete: true
Comment on attachment 8717950 [details]
MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber the APZ scroll offset. r?botond

https://reviewboard.mozilla.org/r/34365/#review31075

Should the commit message more generally say "Don't allow frame reconstruction to clobber the APZ scroll offset"?

::: dom/base/nsGkAtomList.h:1049
(Diff revision 1)
> +GK_ATOM(restore, "restore")

Is there a list somewhere of all the atoms used to represent scroll origins, grouped together? If not, can we make one? (My preference would be to do that by grouping the atoms in this file, but if we want/need to stick to the current everything-in-one-big-alphabetical-list, a comment somewhere will do instead.)
Attachment #8717950 - Flags: review?(botond) → review+
Attachment #8717951 - Flags: review?(botond) → review+
Comment on attachment 8717951 [details]
MozReview Request: Bug 1235899 - Add a test for frame reconstruction during an APZ scroll animation. r=botond

https://reviewboard.mozilla.org/r/34367/#review31079

::: gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:16
(Diff revision 1)
> +        direction: ltr;

Why does this need to be specified?
(In reply to Botond Ballo [:botond] from comment #24)
> Should the commit message more generally say "Don't allow frame
> reconstruction to clobber the APZ scroll offset"?

Sure, I can update that.

> ::: dom/base/nsGkAtomList.h:1049
> (Diff revision 1)
> > +GK_ATOM(restore, "restore")
> 
> Is there a list somewhere of all the atoms used to represent scroll origins,
> grouped together? If not, can we make one? (My preference would be to do
> that by grouping the atoms in this file, but if we want/need to stick to the
> current everything-in-one-big-alphabetical-list, a comment somewhere will do
> instead.)

There's one for smooth-scroll origins. I can move it into that list and update the comment so it's not specific to "smooth" scrolls. In practice I would prefer if there was just one big alphabetical list in the file because it's not obvious when you open the file what the different groups are and how to find a specific group that you're interested in. I'm sure some atoms are used in multiple contexts as well. I also don't see any value provided by the grouping in the first place.

(In reply to Botond Ballo [:botond] from comment #25)
> > +        direction: ltr;
> 
> Why does this need to be specified?

Oh, I meant to take that out - it doesn't need to be there. It's left over from Hallvord's reduced testcase.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #26)
> In practice I
> would prefer if there was just one big alphabetical list in the file because
> it's not obvious when you open the file what the different groups are and
> how to find a specific group that you're interested in. I'm sure some atoms
> are used in multiple contexts as well. 

That's a reasonable argument for leaving nsGkAtoms.h alone.

How about listing the scroll origins in a comment above nsIScrollableFrame::LastScrollOrigin(), then?

> I also don't see any value provided by the grouping in the first place.

The value of listing all the scroll origins in one place *somewhere* is that if someone's trying to understand a function such as CanScrollOriginClobberApz(), one can see which origins can as well as which origins cannot, and reason about the correctness of each choice.
(In reply to Botond Ballo [:botond] from comment #27)
> > I also don't see any value provided by the grouping in the first place.
> 
> The value of listing all the scroll origins in one place *somewhere* is that
> if someone's trying to understand a function such as
> CanScrollOriginClobberApz(), one can see which origins can as well as which
> origins cannot, and reason about the correctness of each choice.

(By way of analogy, imagine that for some reason we needed to store PanZoomState in the form of "atoms" rather than an enum. You'd still want all the pan/zoom states listed together somewhere, wouldn't you?)
That analogy is apt. I think it would be better to convert the list of scroll origins to an enum rather than keeping them as atoms. AIUI the main purpose of atoms is to have a string constant that is unique so that you can compare it with ==, but in the case of scroll origins we don't really need the "string" aspect of it at all, so we might as well use an enum. I'll file a follow-up bug for that.
I filed bug 1247374 for it, and then discovered that we do actually use the string aspect of the atom. So converting to an enum doesn't work either.

Since there's a number of functions that deal with these origins I don't think it makes sense to put the comment on only LastScrollOrigin(). I'll do the grouping in the atoms list and update the comment to indicate which ones are smooth scroll vs generic scroll origins, because that also impacts which ones have prefs associated with them.
Attachment #8717950 - Attachment description: MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber APZ flings. r?botond → MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber the APZ scroll offset. r?botond
Comment on attachment 8717950 [details]
MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber the APZ scroll offset. r?botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34365/diff/1-2/
Attachment #8717951 - Attachment description: MozReview Request: Bug 1235899 - Add a test for frame reconstruction during an APZ scroll animation. r?botond → MozReview Request: Bug 1235899 - Add a test for frame reconstruction during an APZ scroll animation. r=botond
Comment on attachment 8717951 [details]
MozReview Request: Bug 1235899 - Add a test for frame reconstruction during an APZ scroll animation. r=botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34367/diff/1-2/
Comment on attachment 8717950 [details]
MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber the APZ scroll offset. r?botond

Would like a quick re-review on part 1 to make sure the changes are ok. After discovering that there are prefs associated with some origins I also added a small change to AsyncScroll::InitPreferences to be on the safe side.
Attachment #8717950 - Flags: review+ → review?(botond)
Comment on attachment 8717950 [details]
MozReview Request: Bug 1235899 - Don't allow frame reconstruction to clobber the APZ scroll offset. r?botond

https://reviewboard.mozilla.org/r/34365/#review31099

Thanks!

::: dom/base/nsGkAtomList.h:2273
(Diff revisions 1 - 2)
> +// nsIScrollableFrame and nsGfxScrollFrame). These are divided into two lists

s/nsGfxScrollFrame/ScrollFrameHelper (or nsGfxScrollFrame.cpp)

::: layout/generic/nsGfxScrollFrame.cpp:1731
(Diff revisions 1 - 2)
> +    // changes we should never have aOrigin == nsGkAtoms::restore here.

MOZ_ASSERT(aOrigin != nsGkAtoms::apz) ?
Attachment #8717950 - Flags: review?(botond) → review+
https://hg.mozilla.org/mozilla-central/rev/427fce68f67a
https://hg.mozilla.org/mozilla-central/rev/04710089fa64
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Not yet sure if I want to uplift this to 46 since it has some risk. I'll let it bake for a few days.
So far no regressions (except intermittent test failures) on nightly, but I still don't feel very comfortable uplifting this. I'm going to wontfix it for 46 and let it ride on 47.
Depends on: 1286179
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: