Closed Bug 1527182 Opened 6 years ago Closed 6 years ago

Intermittent gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_bug1331693.html | Scrollbar drag resulted in a scroll position of 0

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- disabled
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

#[markdown(off)]
Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=227773485&repo=autoland

https://queue.taskcluster.net/v1/task/JOEoY5rURImpa41Hih635A/runs/0/artifacts/public/logs/live_backing.log

23:37:13 INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_bug1326290.html | Scrollbar drag resulted in a scroll position of 275
23:37:13 INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | Starting subtest helper_bug1331693.html
23:37:13 INFO - Buffered messages finished
23:37:13 INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_bug1331693.html | Scrollbar drag resulted in a scroll position of 0
23:37:13 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:7
23:37:13 INFO - spawnTest/w.ok@gfx/layers/apz/test/mochitest/apz_test_utils.js:286:18
23:37:13 INFO - test@gfx/layers/apz/test/mochitest/helper_bug1331693.html:35:3
23:37:13 INFO - driveTest@gfx/layers/apz/test/mochitest/apz_test_utils.js:456:19
23:37:13 INFO - setTimeout handler*repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:147:7
23:37:13 INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | Starting subtest helper_bug1462961.html

Looks like this started right around when we migrated the windows version from 1703 to 1803. I did retriggers on older pushes and the problem happens there too now, even though it wasn't happening before ~Feb 11, which is consistent with the theory (since the windows version bump was out-of-band from m-c).

That being said we'll probably just have to debug this in situ to see why it's failing. The test is pretty simple, just dragging a scrollbar inside a circular SVG clip.

Blocks: 1522896
Assignee: nobody → kats

Passes for me locally on windows 10 version 1809. I'll have to debug via try.

Quick update: I've been doing try pushes with logging to try and narrow this down. It seems like the scrollbar drag part of things happens fine, the repaint request is sent over to the main thread, and does get applied (i.e. geckoScrollPosition is successfully modified) but then moments later when the scroll position is read from JS it is zero. So it must be getting overwritten somehow, trying to track that down.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=eda865548943f7c69643831f93ad07878bd39641 is the most recent try push that has logs.

After much going around in circles chasing the paint requests, it looks APZC sometimes determines that the mouse is too far from the scrollthumb, and snaps it back to the start position: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=234532073&repo=try&lineNumber=29517

Not yet sure why it determines that.

Further logging seems to indicate we randomly get a mousemove event at (0,8): https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=234758261&repo=try&lineNumber=29460

This coordinate is prior to the APZ transforms being applied, so it doesn't appear to be an APZ bug. Will try and see if it's the OS sending us this or if we're synthesizing it from somewhere in Gecko. It's odd that this failures only shows up on Windows-QR if it's a OS thing. Might be because we're running on different VMs when QR is enabled.

Yup, looks like we just get a (0,0) WM_MOUSEMOVE event in the widget code: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=234804554&repo=try&lineNumber=29493 - and then that gets translated to (0,8) by the time it gets to APZ and causes the scrollbar snapback code to trigger.

At this point I think it might just be worth tying off the investigation and put in a workaround to drop these spurious events somehow.

Quick update here: I wrote a patch to drop the 0,0 mousemove events, did a try push, and the intermittent failure was still happening: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=a88365aae073270325e115728244532f7c823334

I think there's actually multiple causes for this failure, and the one caused by the 0,0 mousemove events is actually relatively rare. I suspect the other one(s) don't happen if the test is run in isolation, which is what I was doing while chasing it down. So now I'm running the full test_group_mouseevents to see if I can chase down the other root cause(s).

Here's one failure where it's the main thread sending over a scroll of (0,0) in the middle of the scrollbar drag: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235258553&repo=try&lineNumber=8136

Debugging sources of main-thread scroll requests in general is kind of annoying. I'm going to see if I can write a patch to make this easier.

I chased this down. Seems like the mousedown event goes to the scrollbar mediator which calls StartDrag which tries to determine if the scrollframe is APZ-scrollable. That check apparently fails here, so nsSliderFrame leaves the mScrollingWithAPZ flag as false. APZ does however manage to async scroll the scrollframe (presumably it gets layerized right after the previous step) and so both the main thread and APZ are concurrently scrolling the scrollframe, which is bad news. We do have a codepath that's supposed to prevent this from happening - the mAPZDragInitiated variable gets set to the APZ input block id, but then the corresponding bit that's supposed to stop the main-thread from handling it doesn't trigger because the mousedown event already happened and this code is conditional on the event being a mousedown.

I think there's no particular reason that the code there should be conditional on the event being a mousedown; we should allow it for mousemoves as well. That will allow a more seamless handoff of thumb scrolling from the main thread to APZ even in weird cases like this one.

Also note that the test itself acknowledges that the scrollframe is not layerized to begin with, so I don't think that a fix along the lines of "force the scrollframe to be layerized in the test" is the right answer here.

After some more code archaelogy, the bug that this test is testing (bug 1331693) actually added that layerization check exit path, knowing that it would be taken, and expected the scrollframe to be scrolled by the main thread. What's happening here is that with WebRender we are still able to scroll the scrollframe with APZ even though it's inside the filter. So the "keep these two bits of code in sync" kind of fell apart. So maybe the right solution here is to just bring them back into sync.

This is an actual bug that affects production code - when loading the test page in FF with WR enabled, dragging the scrollbar causes APZ and the main thread to fight over scrolling. So yeah, bringing the bits of code back in sync seems to be the right solution.

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c3931d5145ef With WR enabled, we can async scroll even inside SVG filters. r=botond

This is probably worth uplifting to 67 for the WR MVP release after a little baking. It's a small fix for glitchy behaviour when dragging the scrollbar if the scrollframe is inside an SVG filter.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Comment on attachment 9053555 [details]
Bug 1527182 - With WR enabled, we can async scroll even inside SVG filters. r?botond

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: webrender
  • User impact if declined: Dragging scrollbars for scrollframes inside SVG filters can result in glitchy scrolling
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Pretty small fix; two pieces of related code got out of sync. This brings them back in sync
  • String changes made/needed:
Attachment #9053555 - Flags: approval-mozilla-beta?

Comment on attachment 9053555 [details]
Bug 1527182 - With WR enabled, we can async scroll even inside SVG filters. r?botond

Fix for an intermittent failure and a regression when WR is enabled, uplift approved for 67 beta 8 since we will have our first WR release in 67. Thanks.

Attachment #9053555 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Regressions: 1550510
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: