Closed Bug 1503029 Opened 6 years ago Closed 6 years ago

Intermittent gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_scrollbar_snap_bug1501062.html | Scrolled to 0

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=208468181&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/C0PH801ATHu7Bwn1ijRlFg/runs/0/artifacts/public/logs/live_backing.log

[task 2018-10-29T20:30:51.813Z] 20:30:51     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_bug1462961.html | Scrollbar drag resulted in a scroll position of 52 
[task 2018-10-29T20:30:51.814Z] 20:30:51     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | Starting subtest helper_scrollbar_snap_bug1501062.html 
[task 2018-10-29T20:30:51.815Z] 20:30:51     INFO - must wait for load
[task 2018-10-29T20:30:51.817Z] 20:30:51     INFO - Buffered messages finished
[task 2018-10-29T20:30:51.818Z] 20:30:51     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_scrollbar_snap_bug1501062.html | Scrolled to 0 
[task 2018-10-29T20:30:51.819Z] 20:30:51     INFO -     spawnTest/w.ok@gfx/layers/apz/test/mochitest/apz_test_utils.js:266:52
[task 2018-10-29T20:30:51.820Z] 20:30:51     INFO -     test@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:56:3
[task 2018-10-29T20:30:51.821Z] 20:30:51     INFO -     driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:430:19
[task 2018-10-29T20:30:51.821Z] 20:30:51     INFO -     setTimeout handler*test/<@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:38:50
[task 2018-10-29T20:30:51.822Z] 20:30:51     INFO -     EventListener.handleEvent*test@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:38:3
[task 2018-10-29T20:30:51.827Z] 20:30:51     INFO -     driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:430:19
[task 2018-10-29T20:30:51.828Z] 20:30:51     INFO -     runContinuation/</<@gfx/layers/apz/test/mochitest/apz_test_utils.js:436:7
[task 2018-10-29T20:30:51.829Z] 20:30:51     INFO -     runContinuation/<@gfx/layers/apz/test/mochitest/apz_test_utils.js:423:12
[task 2018-10-29T20:30:51.830Z] 20:30:51     INFO -     promise callback*@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:95:2
[task 2018-10-29T20:30:51.831Z] 20:30:51     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_scrollbar_snap_bug1501062.html | Scroll position increased to 649 
[task 2018-10-29T20:30:51.832Z] 20:30:51     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-10-29T20:30:51.833Z] 20:30:51     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_scrollbar_snap_bug1501062.html | Scroll position was restored to 649 
[task 2018-10-29T20:30:51.834Z] 20:30:51     INFO -     spawnTest/w.ok@gfx/layers/apz/test/mochitest/apz_test_utils.js:266:52
[task 2018-10-29T20:30:51.835Z] 20:30:51     INFO -     test@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:84:3
[task 2018-10-29T20:30:51.836Z] 20:30:51     INFO -     driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:430:19
[task 2018-10-29T20:30:51.837Z] 20:30:51     INFO -     setTimeout handler*test/<@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:79:50
[task 2018-10-29T20:30:51.838Z] 20:30:51     INFO -     EventListener.handleEvent*test@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:79:3
[task 2018-10-29T20:30:51.839Z] 20:30:51     INFO -     driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:430:19
[task 2018-10-29T20:30:51.840Z] 20:30:51     INFO - GECKO(2491) | Flushed APZ repaints, waiting for callback...
[task 2018-10-29T20:30:51.842Z] 20:30:51     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-10-29T20:30:51.843Z] 20:30:51     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_scrollbar_snap_bug1501062.html | Final scroll position was 649 
[task 2018-10-29T20:30:51.844Z] 20:30:51     INFO -     spawnTest/w.ok@gfx/layers/apz/test/mochitest/apz_test_utils.js:266:52
[task 2018-10-29T20:30:51.845Z] 20:30:51     INFO -     test@gfx/layers/apz/test/mochitest/helper_scrollbar_snap_bug1501062.html:91:3
[task 2018-10-29T20:30:51.846Z] 20:30:51     INFO -     driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:430:19
[task 2018-10-29T20:30:51.846Z] 20:30:51     INFO -     setTimeout handler*repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:131:7
[task 2018-10-29T20:30:51.848Z] 20:30:51     INFO - GECKO(2491) | MEMORY STAT | vsize 1484MB | residentFast 146MB | heapAllocated 19MB
[task 2018-10-29T20:30:51.849Z] 20:30:51     INFO - TEST-OK | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | took 9713ms
Assignee: nobody → kats
Blocks: 1501062
Reproduced this on try with additional logging:

https://treeherder.mozilla.org/logviewer.html#?job_id=209422209&repo=try&lineNumber=1931

This seems to indicate the gfxScrollFrame scrolls down but then scrolls back to the 0 scroll offset before the scroll event is actually fired. Need more logging to determine why.
https://treeherder.mozilla.org/logviewer.html#?job_id=209818334&repo=try&lineNumber=2010

The scroll back to 0 is coming from the main-thread handling of the thumb, it seems? Need to figure out why and eliminate this.
Hm. So what's happening is that the APZ scroll races the main-thread handling of the mousedown event. If the APZ scroll wins, the main thread handling of the mousedown treats it as a "scroll-to-click" at [1] which resets the scroll position back to zero. Seems like a real bug, I'm surprised we haven't hit this before...

[1] https://searchfox.org/mozilla-central/rev/efc0d9172cb6a5849c6c4fc0f19d7fd5a2da9643/layout/xul/nsSliderFrame.cpp#606
After poking through the code there's a combination of factors here. What I think is happening is:

1) mousedown goes through APZ code. since we have APZ drag-initiation enabled [1], we immediately go into scrollbar dragging. That's the first requirement
2) mouse move goes through APZ code and triggers a scroll. the repaint request from the scroll gets dispatched to the content process main thread via the GeckoContentController, which is a separate codepath than the mousedown from step 1. So the second requirement here is that the repaint request is processed by the content process before the mousedown event.
3) main thread processes the repaint request, moves the scroll position
4) main thread processes the mousedown event. if this mousedown event landed on the scrollbar thumb itself, then the check at [2] would make it *not* trigger the scroll-to-click codepath. But because the scroll position was advanced in step (3), the mousedown event lands on where the thumb used to be, and triggers scroll-to-click. This resets the scroll position to 0
5) the next mousemove event will restore APZ's scroll position

The time between steps (4) and (5) is a glitch. How long the glitch lasts for depends on timing factors. In many cases the glitch has no effects, but I added an NS_ASSERTION when it happens, and it hits relatively frequently (maybe 25% of the time). For the test to fail, there's an additional condition that the nsGfxScrollFrame has to fire a scroll event during the glitch, and the test records the glitched scroll position.

This is technically a real bug that can be hit by users. The visual effect would be of the scrollthumb glitching at the start of a scroll drag. Beyond that I don't think it produces any persistently-bad state. Still something we should fix.

I'm thinking that we need to somehow flag the mousedown event as having started an async drag, so that we ignore it on the main thread in the scroll-to-click codepath. The mHandledByAPZ flag on MouseEvent is already set to true for all mouse events, and used for different checks, so I'm not sure if we can still use that. We might be able to, if we more narrowly set the mHandledByAPZ flag to true only when the mouse event is actually part of a drag block. Another option is the nsEventStatus return code from APZCTreeManager::ReceiveInputEvent. Currently we don't propagate that from TabParent [3] to TabChild but we could, and then plumb it through all the mouse event coalescing goop in TabChild, and propagate it via the InputAPZContext to the slider frame mousedown handling code. Or we could add another flag on the MouseEvent for this purpose.

[1] https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/gfx/layers/apz/src/APZCTreeManager.cpp#1838
[2] https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/layout/xul/nsSliderFrame.cpp#1273-1275
[3] https://searchfox.org/mozilla-central/rev/efc0d9172cb6a5849c6c4fc0f19d7fd5a2da9643/dom/ipc/TabParent.cpp#1103 (third arg)
As an alternative, could we ensure that the mousedown and the repaint reach the main thread in the correct order?
I feel like that approach is more risky because there are many codepaths that can trigger a repaint in APZ, and we'd have to make sure all of them are delayed appropriately, most likely by changing how they are sent to the content process.
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=2b59e8abe3c5ddf02f54a67ef7da5246aebc9cab shows the patch not introducing any new failures. However it doesn't fix the other intermittents on this test.
Actually this patch breaks scroll-to-click. I guess we have zero tests exercising that code...
After much fiddling I think I have a fix that works, and works in the way I want it to (I had a few fixes that worked but only by accident):

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=184d31d18a486a59592c8f97adf3a84e727625cd

Basically this sends a message from APZ to the nsSliderFrame along the same codepath that the repaint request flows (and therefore is guaranteed to arrive ahead of the repaint request). The message tells the nsSliderFrame that an APZ drag was started, and so if the slider frame gets the touchdown event corresponding to that drag /after/ the message, it knows to "do the right thing" (i.e. enter an APZ-is-dragging state). In the case where the touchdown event wins the race and arrives first, the message is effectively a no-op. This works by comparing the input block id.

Will put the patch up for review once the try push is done.
This adds a notification from APZ to the scrollbar's slider frame to
inform it of APZ starting an async scrollbar drag. This is useful
because APZ can start a scrollbar drag and even change the scroll
position before the scrollbar frame even handles the mousedown event. In
such a case, the mousedown can land on where the scrollthumb *used to
be* before it was dragged away. This can result in scroll-to-click
behavior getting triggered and the scrollthumb glitching.

With this patch, the new notification follows the same path as the
request-repaint message, and so is guaranteed to arrive at the scrollbar
before any request-repaint messages. It sets some state that can be used
to correct the behaviour described above.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b446016b76f0
Ensure the slider frame is notified of APZ drag initiation. r=botond
https://hg.mozilla.org/mozilla-central/rev/b446016b76f0
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Depends on: 1525450
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: