Intermittent gfx/layers/apz/test/mochitest/test_group_touchevents-3.html | helper_touch_action_regions.html | Touchmove processed in chrome process
Categories
(Core :: Panning and Zooming, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox80 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: kats)
References
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
Filed by: kgupta [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=308734390&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Qc6KCQCNRH2jeCxnErSgBQ/runs/0/artifacts/public/logs/live_backing.log
Assignee | ||
Comment 1•4 years ago
|
||
This is an intermittent failure that I'm seeing on Android mochitests with my patches for bug 1621740. I can reproduce it locally, but very rarely. Usually takes 2-3 or runs with --verify
before it shows up. I'm not convinced it's actually related to my patches, it's probably pre-existing but somehow the frequency might be increased by those patches.
The test dispatches a series of touchmove events here that drive APZ scrolling. The touchmove events are 10 pixels apart when dispatched. However in some cases as these touch events are processed, the apzc-to-gecko transform seems to glitch from a 10-pixel translation to the identity transform and back. And when that happens, the untransformed event sent to Gecko might end up "duplicated". i.e. the coordinates of the touchmove events sent from the android widget to gecko looks like this:
18,88
18,78
18,68
18,48 <-- glitch
18,48
18,38
When this happens, Gecko throws away the duplicated touchmove without dispatching it to content, which causes the counter checked for to lag by 1 from what the test expects. So that causes all the remaining "Touchmove processed in chrome process" checks to fail.
I'm guessing the glitch happens because we the input event handling happens on the java UI thread and is non-atomic (i.e. we don't hold a lock across the APZC handling of the event and the APZCTM untransformation) so if we get other updates from gecko happening in between (on the updater thread) it can change the transforms.
I don't yet know how concerning this is but I'm filing this anyway to track the problem.
Assignee | ||
Comment 2•4 years ago
|
||
Seems to be happening pretty consistently on try, with my patch for bug 1651050 applied.
Assignee | ||
Comment 3•4 years ago
|
||
So I think this is happening because on Android the controller thread is not the same as the repaint thread. After the touchmove event triggers a scroll, the APZC calls RequestContentRepaint
from the controller thread, but it gets bounced to the repaint thread here. This sets up a race condition between mExpectedGeckoMetrics
getting updated on the repaint thread here and being read on the controller thread here. If the repaint thread does the update first then things come out correctly, and if the controller thread computes the apzc-to-gecko transform first, then it produces the glitch.
This seems like a legitimate bug in the code that we should fix. I still need to confirm this theory though.
Assignee | ||
Comment 4•4 years ago
|
||
Theory seems confirmed. I have patches to fix, by updating mExpectedGeckoMetrics
synchronously before bouncing the repaint request to the repaint thread. This ensures the apzc-to-gecko transform is not subject to a race condition.
Assignee | ||
Comment 5•4 years ago
|
||
We only use three fields from this so keeping a whole FrameMetrics around seems
heavyweight. We could even reduce it two fields by combining the zoom with the
devPixelsPerCSSPixel ratio but it seems a bit cleaner and more consistent to
do the division at the use site.
Updated•4 years ago
|
Assignee | ||
Comment 6•4 years ago
|
||
When we request a content repaint from the controller thread, we might then
subsequently compute an apzc-to-gecko transform that expects the
mExpectedGeckoMetrics quantity to have been updated to reflect the content
repaint. However, if the content repaint got bounced to the repaint thread, that
may not have happened, if the repaint thread hasn't run yet. This is a data
race that can cause glitching in the apzc-to-gecko transform computation. This
patch addresses it by also synchronously updating the mExpectedGeckoMetrics
synchronously before bouncing to the repaint thread. The repaint thread will
re-update the mExpectedGeckoMetrics which should generally be a no-op, unless
other updates snuck in between, in which case we do want to do that re-update.
Depends on D82761
Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9237a6c459eb Create a new trimmed-down class to store the expected gecko metrics. r=botond https://hg.mozilla.org/integration/autoland/rev/52b2fa0697c8 Mitigate race condition caused by updating mExpectedGeckoMetrics on the repaint thread. r=botond
Comment 8•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9237a6c459eb
https://hg.mozilla.org/mozilla-central/rev/52b2fa0697c8
Comment hidden (Intermittent Failures Robot) |
Description
•