Perma linux asan canvas [tier 2] [taskcluster:error] Task timeout after 1800 seconds. Force killing container. | single tracking bug
Categories
(Core :: DOM: Content Processes, defect, P5)
Tracking
Version | Tracking | Status
---|---|---
firefox-esr115 | --- | unaffected
firefox-esr128 | --- | unaffected
firefox127 | --- | unaffected
firefox128 | --- | unaffected
firefox129 | --- | wontfix
firefox130 | --- | wontfix
firefox131 | --- | fixed
People
(Reporter: intermittent-bug-filer, Assigned: whimboo)
References
(Depends on 1 open bug, Blocks 1 open bug, Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
Bug 1904963 - canvas wpt: set dom.ipc.keepProcessesAlive.web = 1 when fission is disabled, r=jstutte (48 bytes, text/x-phabricator-request)
Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=464147572&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bMi-owgmS5a7mCdjaeQWUQ/runs/0/artifacts/public/logs/live_backing.log
[task 2024-06-26T19:03:18.467Z] 19:03:18 INFO - TEST-START | /html/canvas/offscreen/path-objects/2d.path.clip.basic.2.html
[task 2024-06-26T19:03:18.545Z] 19:03:18 INFO - Closing window 3ab6258d-39ca-4ed1-8f41-80cd56bf4685
[task 2024-06-26T19:03:18.619Z] 19:03:18 INFO - PID 1012 | -----------------------------------------------------
[task 2024-06-26T19:03:18.620Z] 19:03:18 INFO - PID 1012 | Suppressions used:
[task 2024-06-26T19:03:18.621Z] 19:03:18 INFO - PID 1012 | count bytes template
[task 2024-06-26T19:03:18.621Z] 19:03:18 INFO - PID 1012 | 31 16288 nsComponentManagerImpl
[task 2024-06-26T19:03:18.622Z] 19:03:18 INFO - PID 1012 | 2 288 libfontconfig.so
[task 2024-06-26T19:03:18.623Z] 19:03:18 INFO - PID 1012 | 1 9240 style::sharing::SHARING_CACHE_KEY
[task 2024-06-26T19:03:18.624Z] 19:03:18 INFO - PID 1012 | 1 4104 style::bloom::BLOOM_KEY
[task 2024-06-26T19:03:18.624Z] 19:03:18 INFO - PID 1012 | -----------------------------------------------------
[taskcluster:error] Task timeout after 1800 seconds. Force killing container.
[taskcluster 2024-06-26 19:03:21.518Z] === Task Finished ===
[taskcluster 2024-06-26 19:03:21.518Z] Unsuccessful task run with exit code: -1 completed in 1804.01 seconds
Comment 1•3 months ago
Hi Andrew! Can you please take a look at this? Could this have been caused by the recent changes from Bug 1901076?
Thank you!
Comment 2•3 months ago
The regressor cannot be identified on autoland, as the job fails there with "No checks run".
It first started with this pushlog merged to central: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=653f0dc8442dd9cae70845896eb3c0a0252677b3
Comment 3•3 months ago
The test case that is failing doesn't even use any images, so it seems unlikely:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/html/canvas/element/path-objects/2d.path.clip.basic.2.html
Comment 4•3 months ago
I bisected this on try to bug 1728331.
Comment 5•3 months ago
Set release status flags based on info from the regressing bug 1728331
:nika, since you are the author of the regressor, bug 1728331, could you take a look?
For more information, please visit BugBot documentation.
Comment 6•3 months ago
It appears that this timeout is happening because the test is taking too long to run, rather than because of a hang or similar while running the test.
Given that this is a nofis (Fission disabled) test, my guess is that the change to remove E10S-only process recycling is negatively interacting with this particular test suite in some way, causing it to run much longer than before. In the log linked from comment 0, "Suppressions used:" outputs appear relatively frequently, which I am guessing correspond to a content process shutting down. Looking at a non-failing run from the backlog, I see much less frequent "Suppressions used:" logs (e.g. https://treeherder.mozilla.org/logviewer?job_id=464147617&repo=mozilla-central&lineNumber=6899).
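The comparison above boils down to how often the LeakSanitizer "Suppressions used:" summary shows up relative to the number of tests run. A minimal Python sketch for computing that ratio from a raw log follows; it assumes, as the comment does, that each summary corresponds to one process exiting, and the function name `shutdowns_per_test` is hypothetical, not part of any Mozilla tool.

```python
import re
from typing import Iterable

# LeakSanitizer prints this header at process exit when suppressions matched.
# Treating one header as one content process shutdown is an assumption.
SUPPRESSIONS_MARKER = "Suppressions used:"
TEST_START = re.compile(r"TEST-START \| \S+")


def shutdowns_per_test(lines: Iterable[str]) -> float:
    """Return the number of suppression summaries per started test."""
    tests = 0
    shutdowns = 0
    for line in lines:
        if TEST_START.search(line):
            tests += 1
        elif SUPPRESSIONS_MARKER in line:
            shutdowns += 1
    return shutdowns / tests if tests else 0.0


if __name__ == "__main__":
    excerpt = [
        "19:03:18 INFO - TEST-START | /html/canvas/offscreen/path-objects/2d.path.clip.basic.2.html",
        "19:03:18 INFO - PID 1012 | Suppressions used:",
        "19:03:18 INFO - PID 1012 | count bytes template",
    ]
    print(shutdowns_per_test(excerpt))  # 1.0
```

Running it over a failing and a passing log gives two ratios to compare; a clearly higher ratio for the failing run would be consistent with the process churn hypothesis, though it would not prove it.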
We probably don't want to try to restore the E10S recycling logic, as we don't support e10s on desktop anymore, and it's a significant amount of complexity which would only be used for test code. It might be possible to mitigate the process recycling by setting dom.ipc.keepProcessesAlive.web to a non-zero number in this test suite when running with Fission disabled, which may reduce process shutdown/startup cycles.
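To illustrate why a non-zero dom.ipc.keepProcessesAlive.web could reduce shutdown/startup cycles, here is a deliberately simplified Python model. It is hypothetical, not Gecko's ContentParent logic: it assumes each test opens a tab needing a web content process and closes it afterwards, and that a freed process is kept only while fewer than `keep_alive` processes are already idle.

```python
from dataclasses import dataclass


@dataclass
class ToyProcessPool:
    """Hypothetical content process pool, not Gecko's implementation.

    keep_alive stands in for dom.ipc.keepProcessesAlive.web: how many idle web
    content processes may be kept instead of being shut down.
    """

    keep_alive: int
    idle: int = 0
    launches: int = 0

    def open_tab(self) -> None:
        # Reuse a kept-alive idle process if there is one, else launch a new one.
        if self.idle > 0:
            self.idle -= 1
        else:
            self.launches += 1

    def close_tab(self) -> None:
        # Keep the freed process only while below the keep-alive limit;
        # otherwise it is shut down.
        if self.idle < self.keep_alive:
            self.idle += 1


def launches_for_run(num_tests: int, keep_alive: int) -> int:
    """Count process launches when each test opens and then closes one tab."""
    pool = ToyProcessPool(keep_alive)
    for _ in range(num_tests):
        pool.open_tab()
        pool.close_tab()
    return pool.launches


if __name__ == "__main__":
    print(launches_for_run(100, keep_alive=0))  # 100
    print(launches_for_run(100, keep_alive=1))  # 1
```

In this toy model a keep-alive of 1 turns one launch per test into a single launch for the whole run; real test runs share and recycle processes in more complex ways, so the actual saving would be smaller.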
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 9•2 months ago
> It might be possible to mitigate the process recycling by setting dom.ipc.keepProcessesAlive.web to a non-zero number in this test suite when running with Fission disabled, which may reduce process shutdown/startup cycles.
Artur, would you mind trying this?
Comment 10•2 months ago
Set release status flags based on info from the regressing bug 1728331
Comment 11•2 months ago
I just changed the .ini file, and then asked myself whether this pref is actually used anywhere:
https://searchfox.org/mozilla-central/search?q=keepProcessesAlive.web&path=&case=false&regexp=false
(still looking)
Comment 12•2 months ago
It's probably used here: https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2173
Comment 13•2 months ago
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 16•2 months ago
Comment 17•2 months ago
bugherder
Comment 18•2 months ago
Hi Artur! Can you please take another look at this? The issue is still happening (check here).
Thank you!
Comment 19•2 months ago
(looking)
Comment 20•2 months ago
See also https://bugzilla.mozilla.org/show_bug.cgi?id=1891526
(Going to try them locally in an ASan environment.)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 25•28 days ago
We recently noticed a bug in Marionette that I'm going to fix in bug 1761634. Once it lands, each test will need a few more milliseconds to complete, because we will now correctly wait for the initial about:blank page to be loaded. Based on that, I'm going ahead and splitting the canvas jobs into 3 chunks and increasing the task timeout from 1800 to 2700 seconds, which matches the settings used for other jobs. This should also help here, so that we no longer see these task timeouts.
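The effect of the chunking and timeout change above can be checked with simple arithmetic. The Python sketch below is hypothetical: it splits tests into contiguous chunks of nearly equal count (the harness's actual chunking may differ), adds a fixed per-test overhead to stand in for the extra about:blank wait, and checks every chunk against the task timeout. The durations in the usage example are made up, not measured.

```python
from typing import List, Sequence


def split_into_chunks(durations: Sequence[float], n_chunks: int) -> List[List[float]]:
    """Split tests into n contiguous chunks whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(durations), n_chunks)
    chunks: List[List[float]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional test each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(durations[start:start + size]))
        start += size
    return chunks


def chunks_within_timeout(
    durations: Sequence[float],
    n_chunks: int,
    timeout_s: float,
    per_test_extra_s: float = 0.0,
) -> bool:
    """True if every chunk's runtime, including per-test overhead, fits the timeout."""
    return all(
        sum(chunk) + len(chunk) * per_test_extra_s <= timeout_s
        for chunk in split_into_chunks(durations, n_chunks)
    )


if __name__ == "__main__":
    # Made-up numbers: 4000 tests at 1 s each, plus 50 ms extra per test.
    tests = [1.0] * 4000
    print(chunks_within_timeout(tests, 2, 1800, per_test_extra_s=0.05))  # False
    print(chunks_within_timeout(tests, 3, 2700, per_test_extra_s=0.05))  # True
```

More chunks means each job runs fewer tests, and the larger timeout adds headroom on top, so a slightly slower per-test path is less likely to push any single chunk over the limit.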
Comment hidden (Intermittent Failures Robot)
Comment 27•24 days ago
The failures seem to have stopped after my patch landed. I'll re-check later this week and will mark the bug as fixed if there are still no failures reported.
Comment 28•21 days ago
I can verify that ASan builds no longer cause the task to time out since bug 1761634 landed. Marking this bug as fixed.
Comment 29•21 days ago
The patch landed in nightly and beta is affected.
:whimboo, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox130 to wontfix.
For more information, please visit BugBot documentation.