Intermittent Windows AArch64 layout/generic/crashtests/<random_test>.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending) DON'T USE FOR CLASSIFICATION BUT FILE INDIVIDUAL BUGS
Categories
(Core :: Web Painting, defect, P5)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: jmaher)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=334277531&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Cg5eDHIJTZOJr9PI-7YN5A/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Cg5eDHIJTZOJr9PI-7YN5A/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2021-03-24T12:27:52.585Z] 12:27:52 INFO - REFTEST TEST-START | layout/generic/crashtests/370699-1.html
[task 2021-03-24T12:27:52.586Z] 12:27:52 INFO - REFTEST TEST-LOAD | file:///Z:/task_1616587823/build/tests/reftest/tests/layout/generic/crashtests/370699-1.html | 2180 / 3905 (55%)
[task 2021-03-24T12:29:08.367Z] 12:29:08 INFO - [Parent 7724, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, nullptr) failed with result 0x804B000A (NS_ERROR_MALFORMED_URI): file /builds/worker/checkouts/gecko/caps/BasePrincipal.cpp:1149
[task 2021-03-24T12:31:47.043Z] 12:31:47 INFO - JavaScript error: resource://gre/modules/PurgeTrackerService.jsm, line 387: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIScriptSecurityManager.createContentPrincipalFromOrigin]
[task 2021-03-24T12:41:21.118Z] 12:41:21 INFO - REFTEST TEST-UNEXPECTED-FAIL | layout/generic/crashtests/370699-1.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
[task 2021-03-24T12:41:21.118Z] 12:41:21 INFO - REFTEST INFO | Saved log: START file:///Z:/task_1616587823/build/tests/reftest/tests/layout/generic/crashtests/370699-1.html
[task 2021-03-24T12:41:21.120Z] 12:41:21 INFO - REFTEST INFO | Saved log: [CONTENT] FromChildAfterPaintListener from about:blank
Comment 7•4 years ago
Daniel, there is an instance of this issue where the failure log is full of lines referencing AfterPaintListener. Could you please take a look at what might be causing this?
Comment 9•4 years ago
(In reply to Alexandru Michis [:malexandru] from comment #7)
Daniel, there is an instance of this issue where the failure log is full of lines referencing AfterPaintListener. Could you please take a look at what might be causing this?
That's a case where the test reloads itself forever, and the AfterPaintListener spam is probably just one line per reload. (And we probably never realize that the test is done because we wait for pending paints to hit 0, and our pending-paint check is slow enough that it always happens after the reload has already queued another pending paint.)
That test is just kinda bogus as a crashtest; crashtests shouldn't reload themselves, lest they trigger issues like this.
I'll post a patch to clean up that particular test over in bug 1691034 (which I think is about this same issue for that test).
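As an illustration of the point about self-reloading tests, here is a minimal sketch of one way to cap the number of reloads so the page eventually goes idle. This is not the patch from bug 1691034; the function name `nextReloadUrl` and the `reloads` query parameter are hypothetical names invented for this example.

```javascript
// Hypothetical helper, NOT the actual fix from bug 1691034.
// Decides whether a self-reloading page should reload again, using a
// counter stored in a (made-up) "reloads" query parameter.
function nextReloadUrl(currentUrl, maxReloads) {
  const url = new URL(currentUrl);
  const done = Number(url.searchParams.get("reloads") || "0");
  if (done >= maxReloads) {
    // Stop reloading so the page can settle and pending paints can drain.
    return null;
  }
  url.searchParams.set("reloads", String(done + 1));
  return url.href;
}

// In a test page this might be wired up as:
//   const next = nextReloadUrl(location.href, 1);
//   if (next) { location.replace(next); }
```

Whether a bounded reload still exercises the original crash would have to be checked per test; dropping the reload entirely may be the simpler option.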
Comment 11•4 years ago
Daniel, any further plans for this issue affecting random crashtests on Windows AArch64? It's failing almost permanently on central and beta.
Comment 12•4 years ago
It's not a single issue that's affecting random crashtests -- it's distinct issues with individual crashtests, where the crashtest never stops painting (due to e.g. setTimeout loops that never end, which result in a paint queue that's never empty when the harness happens to check). This is presumably coming up on aarch64 because of specific paint and/or harness-event-timing behavior on our aarch64 test machines, which makes us more likely to hit this issue there.
The tests need individual fixups to avoid looping forever. I did one such fixup in bug 1691034, for the test that was implicated in comment 7 here. Other instances of this "waiting for pending paint count to reach zero" issue are likely indications of other tests that are similarly problematic and need fixups.
e.g. one recent instance is https://treeherder.mozilla.org/logviewer?job_id=338238518&repo=mozilla-central&lineNumber=25300 for https://searchfox.org/mozilla-central/source/layout/generic/crashtests/471360.html which is a test that also happened to come up in bug 1691192 recently. We can fix that one over there (and we should probably uplift these test fixes, if this is a problem for beta as well.)
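A fixup for a never-ending setTimeout loop could look roughly like the sketch below. It assumes the test only needs a handful of iterations to exercise the original crash; `runBounded` and its parameters are hypothetical names, not code from any of the patches mentioned in this bug.

```javascript
// Hypothetical sketch: run `step` repeatedly via a timer, but stop after
// `maxIterations` so the page eventually goes idle.
// `schedule` defaults to setTimeout; it is injectable so the loop can be
// driven without real timers.
function runBounded(step, maxIterations, schedule = (fn) => setTimeout(fn, 0)) {
  let iterations = 0;
  function tick() {
    if (iterations >= maxIterations) {
      // Nothing is rescheduled here, so no further DOM changes
      // (and no further paints) are triggered by this loop.
      return;
    }
    iterations += 1;
    step(iterations);
    schedule(tick);
  }
  tick();
}

// In a test page this might replace an unbounded loop, e.g.:
//   runBounded(() => { /* DOM manipulation from the test */ }, 10);
```

The iteration limit is arbitrary; it only has to be large enough for the test to still cover the code path it was written for.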
Comment 13•4 years ago
(In reply to Daniel Holbert [:dholbert] from comment #12)
The tests need individual fixups to avoid looping forever. I did one such fixup in bug 1691034, [...]
we should probably uplift these test fixes, if this is a problem for beta as well.)
(Note: We don't need to uplift that one to beta, because it landed before the most recent merge so it's already there. Any remaining instances of this on beta are probably in different tests.)
Comment 14•4 years ago
Here's a log of the last ~10 days of failures:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1700634&startday=2021-04-20&endday=2021-04-30&tree=all
471360.html looks like the most common culprit (and I've got a patch to fix it in bug 1691192, as noted in comment 12).
Here's a list of other tests where we also seem to have hit this in that time range (note, not all of these are highlighted in the logviewer, possibly because the logs are quite long):
https://treeherder.mozilla.org/logviewer?job_id=338192696&repo=mozilla-beta&lineNumber=5735
REFTEST TEST-UNEXPECTED-FAIL | layout/generic/crashtests/286491.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
REFTEST TEST-UNEXPECTED-FAIL | layout/generic/crashtests/478504.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
https://treeherder.mozilla.org/logviewer?job_id=338139160&repo=mozilla-beta&lineNumber=23755
REFTEST TEST-UNEXPECTED-FAIL | layout/forms/crashtests/1279354.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
https://treeherder.mozilla.org/logviewer?job_id=338140600&repo=mozilla-central&lineNumber=58114
REFTEST TEST-UNEXPECTED-FAIL | layout/generic/crashtests/1460158-1.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
https://treeherder.mozilla.org/logviewer?job_id=337591720&repo=mozilla-central&lineNumber=16399
REFTEST TEST-UNEXPECTED-FAIL | layout/generic/crashtests/1460158-2.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
https://treeherder.mozilla.org/logviewer?job_id=337581302&repo=mozilla-central&lineNumber=15071
REFTEST TEST-UNEXPECTED-FAIL | gfx/tests/crashtests/783041-3.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
https://treeherder.mozilla.org/logviewer?job_id=337355726&repo=mozilla-beta&lineNumber=61673
REFTEST TEST-UNEXPECTED-FAIL | layout/style/crashtests/1400035.html | load failed: timed out waiting for pending paint count to reach zero (waiting for updateCanvasPending)
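A list like the one above can be pulled out of a downloaded log with a small script. The sketch below assumes the log is available as plain text and that failure lines have the same shape as the ones quoted in this comment; `tallyPendingPaintTimeouts` is a hypothetical helper, not part of Treeherder or the reftest harness.

```javascript
// Matches the failure lines quoted above and captures the test path.
const PENDING_PAINT_FAILURE =
  /REFTEST TEST-UNEXPECTED-FAIL \| (\S+) \| load failed: timed out waiting for pending paint count to reach zero/;

// Hypothetical helper: returns a Map from test path to the number of
// "pending paint count" timeouts seen for that test in `logText`.
function tallyPendingPaintTimeouts(logText) {
  const counts = new Map();
  for (const line of logText.split(/\r?\n/)) {
    const match = PENDING_PAINT_FAILURE.exec(line);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return counts;
}
```

Because the pattern is not anchored, it also matches lines that carry the `[task ...] INFO -` prefix seen in the full logs.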
Comment 15•4 years ago
It looks like these are tests that continue painting forever.
Some of these are just pathological fuzzer testcases (like 370174-3.html did and 471360.html does, which I've previously posted patches to address). e.g. these two have a setInterval or a setTimeout ping-pong that causes some repeated DOM manipulation, indefinitely:
https://searchfox.org/mozilla-central/source/layout/generic/crashtests/286491.html
https://searchfox.org/mozilla-central/source/layout/generic/crashtests/478504.html
Those are pretty straightforward to fix, if we want to.
But there are other cases where it's less clear what we should do. This one just has a CSS animation (which cycles forever):
https://searchfox.org/mozilla-central/source/layout/style/crashtests/1400035.html
...and in several of the cases, the dynamic painting-forever thing is just a <progress> element, which by default plays a "bounce back and forth" animation:
https://searchfox.org/mozilla-central/source/layout/forms/crashtests/1279354.html
https://searchfox.org/mozilla-central/source/layout/generic/crashtests/1460158-1.html
https://searchfox.org/mozilla-central/source/layout/generic/crashtests/1460158-2.html
https://searchfox.org/mozilla-central/source/gfx/tests/crashtests/783041-3.html
I'm not sure we want CSS animations and <progress> elements to have the power to cause a crashtest to fail (due to spamming the harness with never-ending paints). This doesn't seem to be a problem on other platforms, so I assume we have some sort of solution that's not working on this new platform. mattwoodrow, do you know what might be going wrong here? I recall you working on the piece of the reftest harness that makes us wait until the pending paint count reaches zero.
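To find other tests with the same kinds of never-ending paint sources, a rough scan of the test sources might help. The sketch below is a heuristic under stated assumptions: the patterns are guesses based on the categories named in this comment (timer loops, infinite CSS animations, `<progress>` elements), and `findPerpetualPaintHints` is a hypothetical name. It will produce false positives, e.g. a `<progress>` with a `value` attribute or a one-shot `setTimeout`.

```javascript
// Rough, hypothetical heuristics; expect false positives and negatives.
const PERPETUAL_PAINT_HINTS = [
  { reason: "setInterval call", pattern: /\bsetInterval\s*\(/ },
  { reason: "setTimeout call (possible ping-pong)", pattern: /\bsetTimeout\s*\(/ },
  { reason: "infinite CSS animation", pattern: /animation[^;}]*\binfinite\b/i },
  { reason: "<progress> element", pattern: /<progress\b/i },
];

// Returns the reasons (in the order listed above) whose pattern
// appears somewhere in the given test source text.
function findPerpetualPaintHints(source) {
  return PERPETUAL_PAINT_HINTS
    .filter(({ pattern }) => pattern.test(source))
    .map(({ reason }) => reason);
}
```

A scan like this only narrows down candidates; each flagged test still needs a manual look to decide whether it really paints forever.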
Comment 16•4 years ago
I dealt with a lot of those types of issues when converting the reftest harness to handle fission, and based on what you describe, this bug might in fact be a regression from that work, because we needed to make operations async that were previously sync. That makes them take slightly more wall-clock time, which means that things that invalidate every frame have less chance of finishing that work before the next invalidation asks for another paint.
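The timing argument can be illustrated with a toy model. This is not how the reftest harness is implemented; it only assumes that a page invalidates every `frameMs` milliseconds, that each resulting paint stays pending for `paintLatencyMs`, and that the harness polls every `pollMs`. All names are hypothetical.

```javascript
// Toy model: number of paints still pending at time t, given that the
// page invalidates at 0, frameMs, 2*frameMs, ... and each paint stays
// pending for paintLatencyMs after its invalidation.
function pendingPaintsAt(t, frameMs, paintLatencyMs) {
  let pending = 0;
  for (let start = 0; start <= t; start += frameMs) {
    if (t < start + paintLatencyMs) {
      pending += 1;
    }
  }
  return pending;
}

// Returns the first polled time at which no paint is pending,
// or null if that never happens before timeoutMs.
function firstIdleTime({ frameMs, paintLatencyMs, pollMs, timeoutMs }) {
  for (let t = 0; t < timeoutMs; t += pollMs) {
    if (pendingPaintsAt(t, frameMs, paintLatencyMs) === 0) {
      return t;
    }
  }
  return null;
}
```

In this model, a paint latency shorter than the frame interval leaves idle gaps that a poll can land in, while a latency at or above the frame interval leaves none. A poll that happens to line up with the invalidations also never sees an idle page, which matches the "unlucky timing" described earlier in the bug.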
Comment 18•4 years ago
Thanks, tnikkel -- yeah, this sounds like that sort of issue.
Do you have any suggestions for how to address this, based on your approaches in that effort? Particularly for tests that have some continuously-painting thing like <progress> elements and continuous CSS animations.
I'm hoping we don't have to hand-fix all such tests, which I think would mean e.g. adding JS to an otherwise-static test to remove the animated element after some arbitrary period of time.
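For reference, the kind of hand-fix being described might look like the sketch below. It is only an illustration of that workaround, with a hypothetical helper name and an arbitrary delay; it is not a recommended or landed change.

```javascript
// Hypothetical sketch of the hand-fix: remove a continuously painting
// element (e.g. an indeterminate <progress>) after a delay, so the page
// stops requesting paints. `schedule` is injectable for testing.
function removeAfterDelay(element, delayMs, schedule = (fn, ms) => setTimeout(fn, ms)) {
  schedule(() => element.remove(), delayMs);
}

// In a test page this might be used as:
//   removeAfterDelay(document.querySelector("progress"), 100);
```

The drawback noted in the comment applies: the delay is arbitrary, and the test no longer stays purely static.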
Comment 19•4 years ago
I usually just added dump statements in the reftest harness and key parts of the C++ code to get an idea of how the loop was happening and why it wasn't getting broken.
Comment 21•4 years ago
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit auto_nag documentation.
Comment 22•4 years ago
Windows AArch64 tests on mozilla-central
Windows AArch64 crashtests had their last successful run on May 9th; reftests have been failing permanently since the switch to WebRender on June 24th (bug 1717912) and had a success rate >50% before that. The current state of test results only guarantees that the application can be launched in a test environment.
Joel, what shall be done here?
- Fixing: Doesn't look like an option based on previous discussion in the bug without spending a big chunk of developer time on it.
- Disable all intermittently failing tests: Let me know if I shall build a query of all tests for this platform which failed recently. From inspection, this seems to affect random tests and might be an issue with the test environment.
- Demote tasks from tier 2 to tier 3: Sheriffs wouldn't monitor these tasks anymore (they would still be listed in my reports about permanently failing tasks), but it doesn't sound like the tasks would provide any value in this state (similar to the current one).
- Turn off these tests for Windows AArch64. The mochitest-media suite would still be running and indicate the basic health of the build.
Comment 23•4 years ago
Thanks for bringing this up, :aryx. I agree that fixing this is difficult and not realistic; disabling the intermittents would also be painful and probably not very beneficial. I suspect that after the 100 common causes are disabled on win/aarch64 we would have a more stable crashtest suite, but it could be more, since the tasks are timing out.
I think the best course of action is to disable the crashtests on this platform. The media tests run the large majority of the possible media tests and are green almost all the time.