1280667 - Dragging fake scrollbar on Facebook jumps around wildly

Reporter

Description

•

8 years ago

Facebook has a custom (fake) scrollbar in the mini newsfeed in the top right corner. Dragging that scrollbar shows behavior similar to the one reported in bug 1258851.

Markus Stange [:mstange]

Reporter

Updated

•

8 years ago

Keywords: regression

Milan Sreckovic [:milan] (needinfo for best results)

Comment 1

•

8 years ago

Regression from non-APZ, or did it work at some point with APZ and stopped?

Version: Trunk → 48 Branch

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 2

•

8 years ago

I can repro this on 48 as well if I click on the notifications icon on the FB top-toolbar and then drag the scrollbar inside to scroll through the notifications.

status-firefox48: --- → affected

status-firefox49: --- → affected

Priority: -- → P1

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

8 years ago

Turning off paint-skipping doesn't help here either.

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

8 years ago

Whiteboard: [gfx-noted]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 4

•

8 years ago

I'm starting investigation on this one.

Assignee: nobody → bugmail.mozilla

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

8 years ago

Reduced test case.

URL: http://people.mozilla.org/~kgupta/bug...

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 6

•

8 years ago

Order of events in one example: Main thread scrolls to y=3319 Main thread scrolls to y=3622 APZ gets layers update with y=3319 APZ sends repaint request with y=3319 Main thread rejects repaint request, sets callback transform y=-303 Input event goes through APZ (no untransform) and main thread (applies -303 callback transform) The callback transform produces a discontinuity in the screenY/clientY coordinates of the event which throws off the "fake scrollbar" scrolling, and introduces further discontinuities and diverges. This is basically the same problem we had with scrollbars in bug 1244549 and bug 1258851, the same thing botond is trying to fix in bug 1259781. I have a simple patch that extends the scrollbar hack to cover fixed-pos items as well which solves the problem on my reduced test case and Facebook. I think it should be safe enough to uplift to 48, whereas any other "more correct" solutions will carry more risk. That being said I'm starting to wonder if we'll want to do the clone-refpoint thing I suggested in bug 1259781 comment 9 regardless. That would allow us to drop the scrollbar hack and move incrementally towards a more correct solution.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 7

•

8 years ago

I wrote the hacky fix and wrote a test to go with it, try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=933878c938c1 I'll see if I can write a better fix with the clone-refpoint method.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

8 years ago

I have a try push with a WIP for the other method here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=584b59e57580

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 9

•

8 years ago

New try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=55feac48ef58&group_state=expanded&selectedJob=22881553 looking better, there's still one test failure which I think I can fix easily. I spent a while trying to come up with some mental model to decide which of these approaches is more correct. One thing I realized is that if the APZ scroll position is different from the main-thread scroll position, then there's going to be some inconsistencies exposed to web content, and there's no way to avoid that. Consider the case where we have a page is scrolled to y=0 on the MT and y=100 in APZ. The user does a mousedown at y=100 on the screen, which is visually y=200 on the page because of the APZ scroll offset. Now when we dispatch that event to web content, assuming the event has a single refPoint (which is the case in trunk gecko), the pageY and clientY values on the event will be the same, because they are both computed from the same refpoint and the MT scroll offset is 0. In this case we will send a refpoint with y=100 from widget to APZ, APZ will untransform that to a refpoint of y=200 (because of the transient async transform), and so web content will report pageY of 200 and clientY of 200. However in reality the clientY should be 100 (distance from mouse position to top of client area, as per spec). In the general case, if the APZ scroll offset is different from the MT scroll offset, the pageY and clientY values on the event cannot both be correct as long as we are computing them from a single refpoint (I hereafter call this "the APZ dilemma"). A different aspect of the dilemma is that in the non-APZ world web content could generally assume that e.clientY + window.scrollY == e.pageY, with all the values being correct per spec, but this assumption no longer holds with APZ. In current gecko trunk, the e.clientY value comes out incorrect even though we maintain the equality. My patch tries to resolve the APZ dilemma by splitting the single refpoint into two, so that we can have correct (per spec) pageY and clientY values. However, it breaks the equality. There are also inconsistencies with other properties. For example element.getBoundingClientRect() is computed on the MT using the MT scroll position. If we change e.clientY to be visually correct, it will be using the APZ scroll position. That means web content implementing hit-testing by comparing e.clientY and element.getBoundingClientRect() will fail, and I suspect that's a reasonably common thing that happens on the web. So I'm not sure going down this road is worthwhile, as it just exchanges one can of worms for another. Need to think about this more still.

Botond Ballo [:botond]

Comment 10

•

8 years ago

Up until recently, my understanding was that an APZ callback transform arose in two situations: 1) When panning around within the initial scroll port of a zoomed-in overflow:hidden root scroll frame. 2) When Gecko introduces a fractional difference between the APZ scroll position and the Gecko scroll position by rounding to an integer number of Layer pixels. In case (1), we can probably live with the "APZ dilemma", because the whole scenario is a bit of an edge case (it only happens on mobile, and even then only on sites that are not mobile-friendly and need to be zoomed in). In case (2), the issue shouldn't be noticeable. However, recently - in bug 1259781 and this bug - we seem to have encountered a third category of situations, where we sometimes have a large callback transform in spite of not being in the edge-case situation (1). I confess I don't fully understand yet the mechanism by which the callback transform arises in these cases, but I wanted to suggest one avenue of exploration: instead of solving the "APZ dilemma" for this case, could we avoid it by modifying our scroll offset synchronization mechanism in some way that prevents the callback transform from arising in the first place?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

8 years ago

The two situations you describe are the only ones in which we have a "steady state" callback transform. We can get transient callback transforms winking in and out of existence in other scenarios, such as when the APZ sends a repaint request while the main thread has just changed the scroll position. That is the case here. The callback transform is transient but high-frequency during scrollbar dragging or the scenario in this bug, and if an event goes through while the callback transform is in place the clientX/Y on the event comes out wrong.

Botond Ballo [:botond]

Comment 12

•

8 years ago

Can we detect the difference between a steady-state callback transform and a transient one? If so, I wonder what would happen if we just ignored the transient one, i.e. never applied it.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 13

•

8 years ago

That sounds equivalent to just never setting a transient callback transform in the first place. I could probably construct a case where that would regress something, but I don't know how often that happens in practice.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 14

•

8 years ago

I gave this some more thought, and I think your suggestion might be the best way forward. For one thing, we already have scenarios where there's a race condition between user input events and content changing the scroll position. i.e. consider if the user is clicking on something at the same time as a setTimeout is changing the scroll position. The mouse events could arrive just at the content process before or just after the scroll position change, and so it might click at the old or the new scroll position. The existence of the transient callback transform just changes the click from arriving at the "new" position to arriving at the "old" position, but fundamentally doesn't eliminate the race. So if we got rid of it, the overall behaviour should be no worse. In terms of the APZ dilemma formulation, we would be changing the behaviour from having incorrect e.clientY and correct e.pageY to the reverse - a correct e.clientY and an incorrect e.pageY. While this sounds bad, the "incorrect e.pageY" scenario can already happen in the racy condition described in the previous paragraph, so not really a big deal.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

8 years ago

Attached file Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. — Details

Review commit: https://reviewboard.mozilla.org/r/60848/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/60848/

Attachment #8765497 - Flags: review?(botond)

Attachment #8765498 - Flags: review?(botond)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 16

•

8 years ago

Attached file Bug 1280667 - Add a test. — Details

Review commit: https://reviewboard.mozilla.org/r/60850/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/60850/

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 17

•

8 years ago

I'm not super happy with the test because it's racy. Without the patch it's failed every single time I've run it, but in theory the test might still pass if the stars align just so. With the patch the test should never fail. So in theory if we regress this behaviour, the test might start failing intermittently and it might be hard to pin down. But in my local runs I've gotten a 100% failure rate so hopefully even on try the intermittent rate will be high enough. I wasn't able to come up with a way to isolate the behaviour in question with our existing APIs exposed to testing infrastructure, and considering this is a P1 that needs uplifting to beta I thought it was better to get it landed in this form now.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 18

•

8 years ago

Once this lands let's verify the fix - then please request uplift.

tracking-firefox48: --- → +

tracking-firefox49: --- → +

tracking-firefox50: --- → +

Flags: needinfo?(andrei.vaida)

Botond Ballo [:botond]

Comment 19

•

8 years ago

Comment on attachment 8765497 [details] Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. https://reviewboard.mozilla.org/r/60848/#review57938 Nice! I assume this fixes the STR of bug 1259781, too?

Attachment #8765497 - Flags: review?(botond) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 20

•

8 years ago

(In reply to Botond Ballo [:botond] from comment #19) > Nice! I assume this fixes the STR of bug 1259781, too? It should in theory, but I haven't actually tested it. Bug 1259781 was pretty hard for me to consistently reproduce so it would be hard for me to verify with any degree of certainty.

Botond Ballo [:botond]

Comment 21

•

8 years ago

Comment on attachment 8765498 [details] Bug 1280667 - Add a test. https://reviewboard.mozilla.org/r/60850/#review57940

Attachment #8765498 - Flags: review?(botond) → review+

Botond Ballo [:botond]

Comment 22

•

8 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20) > (In reply to Botond Ballo [:botond] from comment #19) > > Nice! I assume this fixes the STR of bug 1259781, too? > > It should in theory, but I haven't actually tested it. Bug 1259781 was > pretty hard for me to consistently reproduce so it would be hard for me to > verify with any degree of certainty. I found the STR from bug 1259781 comment 10 to be fairly reliable; I'll test it when I get into the office tomorrow.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 23

•

8 years ago

I autolanded this, apparently Pulsebot isn't watching the autoland integration tree. https://hg.mozilla.org/integration/autoland/rev/06685c0eba058f37bd251a378c8f32bdccba325c https://hg.mozilla.org/integration/autoland/rev/a5f7ad06f03b9ae9ea2ab8e06f89ef67e413ede9

Carsten Book [:Tomcat]

Comment 24

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/06685c0eba05 https://hg.mozilla.org/mozilla-central/rev/a5f7ad06f03b

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox50: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla50

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 25

•

8 years ago

Comment on attachment 8765497 [details] Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. Approval Request Comment [Feature/regressing bug #]: APZ, probably regressed by bug 1242690 but didn't verify that [User impact if declined]: The fake scrollbars that Facebook has in some places (they're not platform scrollbars, they are reimplemented in HTML) don't behave properly when dragged with the mouse. [Describe test coverage new/current, TreeHerder]: tested locally on Facebook, as well as added automated test for it [Risks and why]: some risk, the code is a little tricky and may introduce regressions. QA cycles on this and/or a little bake time would probably be good. [String/UUID change made/needed]: none

Attachment #8765497 - Flags: approval-mozilla-beta?

Attachment #8765497 - Flags: approval-mozilla-aurora?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Attachment #8765498 - Flags: approval-mozilla-beta?

Attachment #8765498 - Flags: approval-mozilla-aurora?

Petruta Horea [:phorea], Desktop Test Engineering (back on Sep 5th)

Comment 26

•

8 years ago

I will verify this tomorrow when the fix reaches Nightly. Currently latest Nightly 2016-06-29 doesn't have the fix.

Flags: needinfo?(andrei.vaida) → needinfo?(petruta.rasa)

QA Contact: petruta.rasa

Botond Ballo [:botond]

Comment 27

•

8 years ago

(In reply to Botond Ballo [:botond] from comment #22) > (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20) > > (In reply to Botond Ballo [:botond] from comment #19) > > > Nice! I assume this fixes the STR of bug 1259781, too? > > > > It should in theory, but I haven't actually tested it. Bug 1259781 was > > pretty hard for me to consistently reproduce so it would be hard for me to > > verify with any degree of certainty. > > I found the STR from bug 1259781 comment 10 to be fairly reliable; I'll test > it when I get into the office tomorrow. Yup, can't repro those STR with this fix any more! \o/

Sylvestre Ledru [:Sylvestre]

Comment 29

•

8 years ago

Comment on attachment 8765497 [details] Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. Taking it (impact a major website) in aurora to improve the coverage Beta later.

Attachment #8765497 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

Attachment #8765498 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 30

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/90fd7548ed20 https://hg.mozilla.org/releases/mozilla-aurora/rev/7dca72fff37b

status-firefox49: affected → fixed

Petruta Horea [:phorea], Desktop Test Engineering (back on Sep 5th)

Comment 32

•

8 years ago

Verified as fixed using latest Nightly 50.0a1 across platforms.

Status: RESOLVED → VERIFIED

status-firefox50: fixed → verified

Flags: needinfo?(petruta.rasa)

Gerry Chang [:gchang]

Comment 34

•

8 years ago

Comment on attachment 8765497 [details] Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. This patch fixes a regression. Take it in 48 beta 6.

Attachment #8765497 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Gerry Chang [:gchang]

Comment 35

•

8 years ago

Comment on attachment 8765498 [details] Bug 1280667 - Add a test. Take it in 48 beta 6.

Attachment #8765498 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Carsten Book [:Tomcat]

Comment 36

•

8 years ago

has problems to apply to beta: grafting 352141:7dca72fff37b "Bug 1280667 - Add a test. r=botond a=sylvestre" remote changed gfx/layers/apz/test/mochitest/test_group_mouseevents.html which local deleted use (c)hanged version, leave (d)eleted, or leave (u)nresolved? c merging gfx/layers/apz/test/mochitest/mochitest.ini warning: conflicts while merging gfx/layers/apz/test/mochitest/mochitest.ini! (edit, then use 'hg resolve --mark') abort: unresolved conflicts, can't continue

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 37

•

8 years ago

Uplifting the test would require uplifting bug 1264017 and a bunch of other things, so I just skipped that patch. Uplifted the fix: https://hg.mozilla.org/releases/mozilla-beta/rev/6ad50d4d6f7b

status-firefox48: affected → fixed

Flags: needinfo?(bugmail)

Cornel Ionce [:noni] [Hubs QA]

Updated

•

8 years ago

Flags: qe-verify+

Cristian Comorasu, QA [:ccomorasu], Releae Desktop QA - Please contact mboldan@mozilla.com

Comment 38

•

8 years ago

I reproduced this bug using Fx 48.0b1 build ID:20160507030302 , on OS X 10.9.5 . Confirmed the fix on Fx 48.0b6, build ID 20160706215822 and Fx 49.0a2, build ID 20160708004052 on OS X 10.9.5 .

status-firefox48: fixed → verified

status-firefox49: fixed → verified

Flags: qe-verify+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

URL: http://people.mozilla.org/~kgupta/bug... → https://mozilla.staktrace.com/bug/128...

Pulsebot

Comment 39

•

7 years ago

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/081472be4c0e Followup to use the ok function from the testgroup, which logs the subtest name as well. r=me

Dorel Luca [:dluca]

Comment 40

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/081472be4c0e

Bug 1280667 - Don't update the callback transform if the APZ repaint request was rejected by a concurrent main-thread scroll position update. 8 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 58 bytes, text/x-review-board-request	botond : review+ Sylvestre : approval-mozilla-aurora+ gchang : approval-mozilla-beta+	Details
Bug 1280667 - Add a test. 8 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 58 bytes, text/x-review-board-request	botond : review+ Sylvestre : approval-mozilla-aurora+ gchang : approval-mozilla-beta+	Details