Intermittent TEST-UNEXPECTED-TIMEOUT | gfx/layers/apz/test/mochitest/test_group_zoom-2.html | application timed out after 370 seconds with no output
Categories
(Core :: Panning and Zooming, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox73 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: botond)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=279740409&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tg6jkxBGRPmQgzI1aZ9fPA/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tg6jkxBGRPmQgzI1aZ9fPA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2019-12-05T08:21:45.355Z] 08:21:45 INFO - TEST-START | gfx/layers/apz/test/mochitest/test_group_zoom-2.html
[task 2019-12-05T08:21:46.572Z] 08:21:46 INFO - GECKO(1123) | WaitUntilApzStable: flushed APZ repaints in parent proc, waiting for callback...
[task 2019-12-05T08:21:46.581Z] 08:21:46 INFO - GECKO(1123) | WaitUntilApzStable: APZ flush done in parent proc
[task 2019-12-05T08:21:46.582Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:46.582Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:46.582Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: got apz-flush-done in child proc
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: done promiseFocus
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | done...
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: done promiseAllPaintsDone
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:46.590Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: all done
[task 2019-12-05T08:21:46.654Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:46.654Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:46.872Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:46.872Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:46.953Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:46.954Z] 08:21:46 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:47.110Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:47.110Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:47.126Z] 08:21:47 INFO - GECKO(1123) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2019-12-05T08:21:47.126Z] 08:21:47 INFO - GECKO(1123) | MEMORY STAT | vsize 2558MB | residentFast 167MB | heapAllocated 23MB
[task 2019-12-05T08:21:47.162Z] 08:21:47 INFO - TEST-OK | gfx/layers/apz/test/mochitest/test_group_zoom-2.html | took 1807ms
[task 2019-12-05T08:21:47.198Z] 08:21:47 INFO - TEST-START | Shutdown
[task 2019-12-05T08:21:47.199Z] 08:21:47 INFO - Passed: 9
[task 2019-12-05T08:21:47.200Z] 08:21:47 INFO - Failed: 0
[task 2019-12-05T08:21:47.201Z] 08:21:47 INFO - Todo: 0
[task 2019-12-05T08:21:47.201Z] 08:21:47 INFO - Mode: e10s
[task 2019-12-05T08:21:47.202Z] 08:21:47 INFO - Slowest: 1807ms - /tests/gfx/layers/apz/test/mochitest/test_group_zoom-2.html
[task 2019-12-05T08:21:47.202Z] 08:21:47 INFO - SimpleTest START Loop 2
[task 2019-12-05T08:21:47.203Z] 08:21:47 INFO - TEST-START | gfx/layers/apz/test/mochitest/test_group_zoom-2.html
[task 2019-12-05T08:21:47.360Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:47.361Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:47.397Z] 08:21:47 INFO - GECKO(1123) | WaitUntilApzStable: flushed APZ repaints in parent proc, waiting for callback...
[task 2019-12-05T08:21:47.398Z] 08:21:47 INFO - GECKO(1123) | WaitUntilApzStable: APZ flush done in parent proc
[task 2019-12-05T08:21:47.398Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: got apz-flush-done in child proc
[task 2019-12-05T08:21:47.398Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: done promiseFocus
[task 2019-12-05T08:21:47.398Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | done...
[task 2019-12-05T08:21:47.399Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: done promiseAllPaintsDone
[task 2019-12-05T08:21:47.399Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:47.399Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:47.399Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | WaitUntilApzStable: all done
[task 2019-12-05T08:21:47.453Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:47.454Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:47.711Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:47.711Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:47.772Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | got MozAfterPaint: 0,0,0,0
[task 2019-12-05T08:21:47.772Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | Dispatching 0 onpaint listeners
[task 2019-12-05T08:21:47.834Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: Flushed APZ repaints, waiting for callback...
[task 2019-12-05T08:21:47.834Z] 08:21:47 INFO - GECKO(1123) | helper_bug1280013.html | PromiseApzRepaintsFlushed: APZ flush done
[task 2019-12-05T08:21:47.854Z] 08:21:47 INFO - GECKO(1123) | MEMORY STAT | vsize 2554MB | residentFast 161MB | heapAllocated 23MB
[task 2019-12-05T08:21:47.890Z] 08:21:47 INFO - TEST-OK | gfx/layers/apz/test/mochitest/test_group_zoom-2.html | took 694ms
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Some logging to investigate:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eae7ad025c21633d84bb915d1b42032d5e1eb903
Assignee | ||
Comment 2•3 years ago
|
||
The logging indicates that for this touch-drag, while the touch-end event makes it to APZCTM, it is not triggering a TransformEnd notification for some reason.
Trying to narrow it down further: https://treeherder.mozilla.org/#/jobs?repo=try&revision=90b5bb729bcf97b8bd8394a9fcb9410090d37782
Assignee | ||
Comment 3•3 years ago
|
||
The events do not make it to an APZC (not even the touch-start). Possibly they are stuck in the input queue.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e10136c25898eb873a2672a2e49acc62bc82a9be
Assignee | ||
Comment 4•3 years ago
|
||
The events are assigned a null target APZC via SetTargetAPZC
.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 7•3 years ago
•
|
||
After a few more rounds on Try, I have a partial diagnosis:
The test performs three touch drags. The first two are intended to bring an iframe into view, and the third one is intended to drag content inside the iframe. The iframe is not initially layerized, so the third touch-drag needs to go through the dispatch-to-content / SetTargetAPZC codepath.
On the failing test runs, for some reason (I don't understand why yet) the second touch drag targets the iframe rather than the root content document. As a result, the second touch drag takes the dispatch-to-content codepath.
When the touch-start event for the second drag arrives at content, content determines the target and checks if it's layerized. Since it's not, it sets a displayport, and takes the "layers-dependent" SetTargetAPZC codepath, where it registers a post-refresh observer and sends the SetTargetAPZC notification when the observer fires. This ensures that the SetTargetAPZC notification arrives at APZ after the layer transaction which tells APZ about the scroll id of the newly layerized iframe.
Now here's the key part: the main thread is busy (or the emulator is slow etc.), and takes a while to get around to doing a paint with the new displayport and fire the observer. As a result, the third touch-drag, which also targets the iframe, not only arrives at APZ before the SetTargetAPZC message arrives (and therefore also takes the DTC codepath -- which, in and of itself, is fine), but also arrives at content before the refresh observer from the second drag fires.
At this point, the check that the iframe is layerized passes (because it's checking for the displayport property, which has by now been set), so content figures it's fine to send the SetTargetAPZC notification for the third touch-drag right away, rather than taking the layers-dependent codepath. However, the layer transaction containing the iframe's scroll id hasn't been sent yet, and therefore the SetTargetAPZC notification, mentioning the iframe's scroll id, arrives at APZ before the layer transaction that establishes that scroll id. As a result, the APZC lookup results in a null APZC, we set that as the target, and end up dropping the touch events of the third drag.
Now, this mode of failure wouldn't happen in this test if the second touch drag were targeted correctly. However, the fact that the SetTargetAPZC notification can race the layer transaction in this way in the case where two subsequent input blocks target an element that isn't initially layerized, is a latent bug that should be fixed either way.
Comment 8•3 years ago
|
||
That's one of the greatest analysis for intermittent failures I've ever seen!
Assignee | ||
Comment 9•3 years ago
|
||
Thanks :) I have a candidate fix, let's see if it resolves the intermittent:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=93c87994a7bdf998aed6c2987c66dae30bd14bc2
Assignee | ||
Comment 10•3 years ago
|
||
Better Try push here:
It does appear to resolve the intermittent, but it regressed a couple of other tests; will need to investigate that.
Assignee | ||
Comment 11•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #10)
It does appear to resolve the intermittent, but it regressed a couple of other tests; will need to investigate that.
I am confused by this failure:
GECKO(1701) | pointerevent_touch-action-inherit_child-pan-x-child-pan-x_touch-manual.html | TEST-UNEXPECTED-FAIL | /tests/dom/events/test/pointerevents/pointerevent_touch-action-inherit_child-pan-x-child-pan-x_touch-manual.html | touch-action attribute test: assert_not_equals: scroll x offset should not be 0 in the end of the test got disallowed value 0
This is a manual test, and it's running in automation?
Assignee | ||
Comment 12•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #11)
This is a manual test, and it's running in automation?
It looks like, even though the pointer-events tests were written as manual tests, we have some logic to drive them in an automated fashion.
Assignee | ||
Comment 13•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #11)
GECKO(1701) | pointerevent_touch-action-inherit_child-pan-x-child-pan-x_touch-manual.html | TEST-UNEXPECTED-FAIL | /tests/dom/events/test/pointerevents/pointerevent_touch-action-inherit_child-pan-x-child-pan-x_touch-manual.html | touch-action attribute test: assert_not_equals: scroll x offset should not be 0 in the end of the test got disallowed value 0
So here, the test is synthesizing native events that are expected to cause scrolling, and then checks that scrolling has occurred.
I've verified that the events reach APZ and cause scrolling there. I suspect the issue is that the waitForApzFlushedRepaints()
mechanism is somehow failing to wait until the scrolling is propagated to content.
Assignee | ||
Comment 14•3 years ago
|
||
In fact it's just a bug in my patch. When the displayport changes, the old property object is deleted and a new one is created; the new mPainted
flag needs to be propagated from the old to the new.
Assignee | ||
Comment 15•3 years ago
|
||
I do feel like I've gotten a better understanding of waitForApzFlushedRepaints
as a result of debugging this, though: it is not sufficient to simply synthesize a bunch of native events, call waitForApzFlushedRepaints
, and then assert that scrolling has taken place. For waitForApzFlushedRepaints
to have the desired effect, at the time it's called the paint that will carry the layers-dependent target confirmation needs to have been scheduled already (which generally means, the events that trigger said confirmation need to have arrived at content).
The pointer-events test in question does this correctly (it waits for the drag gesture's touchend
before calling waitForApzFlushedRepaints
), but this is useful to keep in mind for other tests.
Assignee | ||
Comment 16•3 years ago
|
||
Assignee | ||
Comment 17•3 years ago
|
||
Merely checking for having a displayport isn't enough, because the paint to
tell APZ about the new scroll frame could still be pending.
Comment 18•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #15)
I do feel like I've gotten a better understanding of
waitForApzFlushedRepaints
as a result of debugging this, though: it is not sufficient to simply synthesize a bunch of native events, callwaitForApzFlushedRepaints
, and then assert that scrolling has taken place. ForwaitForApzFlushedRepaints
to have the desired effect, at the time it's called the paint that will carry the layers-dependent target confirmation needs to have been scheduled already (which generally means, the events that trigger said confirmation need to have arrived at content).The pointer-events test in question does this correctly (it waits for the drag gesture's
touchend
before callingwaitForApzFlushedRepaints
), but this is useful to keep in mind for other tests.
Should we file a bug to improve this situation in the future? Either audit test usage to make sure it's okay and add a comment about how to use waitForApzFlushedRepaints or extend/modify waitForApzFlushedRepaints to avoid this problem?
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 20•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #18)
Should we file a bug to improve this situation in the future? Either audit test usage to make sure it's okay and add a comment about how to use waitForApzFlushedRepaints or extend/modify waitForApzFlushedRepaints to avoid this problem?
Yes, good idea! Filed bug 1604270.
Comment 21•3 years ago
|
||
Pushed by bballo@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e0d923d3736a Only send a SetTargetAPZC notification right away if the target has a displayport and it has been painted. r=tnikkel
Comment 22•3 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Description
•