Closed Bug 1651027 Opened 4 years ago Closed 4 years ago

Intermittent gfx/layers/apz/test/mochitest/test_group_touchevents-3.html | helper_touch_action_regions.html | Touchmove processed in chrome process

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

This is an intermittent failure that I'm seeing on Android mochitests with my patches for bug 1621740. I can reproduce it locally, but very rarely. Usually takes 2-3 or runs with --verify before it shows up. I'm not convinced it's actually related to my patches, it's probably pre-existing but somehow the frequency might be increased by those patches.

The test dispatches a series of touchmove events here that drive APZ scrolling. The touchmove events are 10 pixels apart when dispatched. However in some cases as these touch events are processed, the apzc-to-gecko transform seems to glitch from a 10-pixel translation to the identity transform and back. And when that happens, the untransformed event sent to Gecko might end up "duplicated". i.e. the coordinates of the touchmove events sent from the android widget to gecko looks like this:

18,88
18,78
18,68
18,48 <-- glitch
18,48
18,38

When this happens, Gecko throws away the duplicated touchmove without dispatching it to content, which causes the counter checked for to lag by 1 from what the test expects. So that causes all the remaining "Touchmove processed in chrome process" checks to fail.

I'm guessing the glitch happens because we the input event handling happens on the java UI thread and is non-atomic (i.e. we don't hold a lock across the APZC handling of the event and the APZCTM untransformation) so if we get other updates from gecko happening in between (on the updater thread) it can change the transforms.

I don't yet know how concerning this is but I'm filing this anyway to track the problem.

Seems to be happening pretty consistently on try, with my patch for bug 1651050 applied.

Blocks: 1651050

So I think this is happening because on Android the controller thread is not the same as the repaint thread. After the touchmove event triggers a scroll, the APZC calls RequestContentRepaint from the controller thread, but it gets bounced to the repaint thread here. This sets up a race condition between mExpectedGeckoMetrics getting updated on the repaint thread here and being read on the controller thread here. If the repaint thread does the update first then things come out correctly, and if the controller thread computes the apzc-to-gecko transform first, then it produces the glitch.

This seems like a legitimate bug in the code that we should fix. I still need to confirm this theory though.

Theory seems confirmed. I have patches to fix, by updating mExpectedGeckoMetrics synchronously before bouncing the repaint request to the repaint thread. This ensures the apzc-to-gecko transform is not subject to a race condition.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=21848c687cded813440664a1fd0d1498ab6ba77b

We only use three fields from this so keeping a whole FrameMetrics around seems
heavyweight. We could even reduce it two fields by combining the zoom with the
devPixelsPerCSSPixel ratio but it seems a bit cleaner and more consistent to
do the division at the use site.

Assignee: nobody → kats
Status: NEW → ASSIGNED

When we request a content repaint from the controller thread, we might then
subsequently compute an apzc-to-gecko transform that expects the
mExpectedGeckoMetrics quantity to have been updated to reflect the content
repaint. However, if the content repaint got bounced to the repaint thread, that
may not have happened, if the repaint thread hasn't run yet. This is a data
race that can cause glitching in the apzc-to-gecko transform computation. This
patch addresses it by also synchronously updating the mExpectedGeckoMetrics
synchronously before bouncing to the repaint thread. The repaint thread will
re-update the mExpectedGeckoMetrics which should generally be a no-op, unless
other updates snuck in between, in which case we do want to do that re-update.

Depends on D82761

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9237a6c459eb
Create a new trimmed-down class to store the expected gecko metrics. r=botond
https://hg.mozilla.org/integration/autoland/rev/52b2fa0697c8
Mitigate race condition caused by updating mExpectedGeckoMetrics on the repaint thread. r=botond
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: