Intermittent Mn MacOS, Linux Automation Error: mozprocess timed out after 1000 seconds running ['/Users/cltbld/tasks/task_1589484627/build/venv/bin/python', ...] for TestCrash.test_crash_content_process
Categories
(Testing :: Marionette Client and Harness, defect, P5)
Tracking
(firefox-esr68 unaffected, firefox76 unaffected, firefox77 unaffected, firefox78 fixed)
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox76 | --- | unaffected |
firefox77 | --- | unaffected |
firefox78 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: enndeakin)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:backout])
Filed by: dluca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=302328967&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/biV-YbLFQ72h3FCn1xzDWA/runs/0/artifacts/public/logs/live_backing.log
INFO - TEST-START | testing/marionette/harness/marionette_harness/tests/unit/test_crash.py TestCrash.test_crash_content_process
[task 2020-05-14T19:39:37.276Z] 19:39:37 INFO - Suppressing GECKO output. To view, add `print(self.new_out.getvalue())` to the end of this test.
[task 2020-05-14T19:56:17.377Z] 19:56:17 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/Users/cltbld/tasks/task_1589484627/build/venv/bin/python', '-u', '/Users/cltbld/tasks/task_1589484627/build/tests/marionette/harness/marionette_harness/runtests.py', '--setpref=toolkit.asyncshutdown.log=true', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--gecko-log=-', '--log-raw=-', '-vv', '--log-raw=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/marionette_errorsummary.log', '--log-html=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/report.html', '--binary=/Users/cltbld/tasks/task_1589484627/build/application/Firefox Nightly.app/Contents/MacOS/firefox', '--address=localhost:2828', u'--symbols-path=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Q8dlOzVfSsiI1meyrIDf4Q/artifacts/public/build/target.crashreporter-symbols.zip', '/Users/cltbld/tasks/task_1589484627/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit-tests.ini']
[task 2020-05-14T19:56:17.577Z] 19:56:17 ERROR - timed out after 1000 seconds of no output
[task 2020-05-14T19:56:17.577Z] 19:56:17 ERROR - Return code: -15
[task 2020-05-14T19:56:17.578Z] 19:56:17 ERROR - No suite end message was emitted by this harness.
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - TinderboxPrint: marionette<br/>824/0/1
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - gecko.log not found
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - TinderboxPrint: marionette<br/>824/0/1
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - Marionette exited with return code -15: FAILURE
[task 2020-05-14T19:56:17.578Z] 19:56:17 ERROR - # TBPL FAILURE #
[task 2020-05-14T19:56:17.578Z] 19:56:17 WARNING - setting return code to 2
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - Running post-action listener: _package_coverage_data
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - Running post-action listener: _resource_record_post_action
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - Running post-action listener: process_java_coverage_data
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - [mozharness: 2020-05-14 19:56:17.578821Z] Finished run-tests step (success)
[task 2020-05-14T19:56:17.578Z] 19:56:17 INFO - Running post-run listener: _resource_record_post_run
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - Total resource usage - Wall time: 1490s; CPU: 11.0%; Read bytes: 220069888; Write bytes: 1627289088; Read time: 2655; Write time: 7199
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: CPU usage<br/>11.0%
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: I/O read bytes / time<br/>220,069,888 / 2,655
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: I/O write bytes / time<br/>1,627,289,088 / 7,199
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: CPU idle<br/>5,319.1 (89.3%)
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: CPU system<br/>150.8 (2.5%)
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: CPU user<br/>489.0 (8.2%)
[task 2020-05-14T19:56:17.701Z] 19:56:17 INFO - TinderboxPrint: Swap in / out<br/>478,515,200 / 0
[task 2020-05-14T19:56:17.702Z] 19:56:17 INFO - install - Wall time: 19s; CPU: 38.0%; Read bytes: 3100672; Write bytes: 290271232; Read time: 51; Write time: 582
[task 2020-05-14T19:56:17.711Z] 19:56:17 INFO - run-tests - Wall time: 1471s; CPU: 11.0%; Read bytes: 215494656; Write bytes: 1332401664; Read time: 2575; Write time: 6597
[task 2020-05-14T19:56:17.982Z] 19:56:17 WARNING - returning nonzero exit status 2
[fetches 2020-05-14T19:56:18.001Z] removing /Users/cltbld/tasks/task_1589484627/fetches
[fetches 2020-05-14T19:56:18.002Z] finished
[taskcluster 2020-05-14T19:56:18.012Z] Exit Code: 2
[taskcluster 2020-05-14T19:56:18.012Z] User Time: 6m8.621984s
[taskcluster 2020-05-14T19:56:18.012Z] Kernel Time: 1m28.166325s
[taskcluster 2020-05-14T19:56:18.012Z] Wall Time: 25m33.544811831s
[taskcluster 2020-05-14T19:56:18.012Z] Result: FAILED
[taskcluster 2020-05-14T19:56:18.012Z] === Task Finished ===
[taskcluster 2020-05-14T19:56:18.012Z] Task Duration: 25m33.548510968s
[taskcluster 2020-05-14T19:56:18.121Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2021-05-14T19:28:15.127Z
[taskcluster 2020-05-14T19:56:18.370Z] Uploading artifact public/test_info/marionette_errorsummary.log from file build/blobber_upload_dir/marionette_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2021-05-14T19:28:15.127Z
[taskcluster 2020-05-14T19:56:18.802Z] Uploading artifact public/test_info/marionette_raw.log from file build/blobber_upload_dir/marionette_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2021-05-14T19:28:15.127Z
[taskcluster 2020-05-14T19:56:19.067Z] Uploading artifact public/test_info/report.html from file build/blobber_upload_dir/report.html with content encoding "gzip", mime type "text/html; charset=utf-8" and expiry 2021-05-14T19:28:15.127Z
[taskcluster 2020-05-14T19:56:19.186Z] Uploading artifact public/test_info/resource-usage.json from file build/blobber_upload_dir/resource-usage.json with content encoding "gzip", mime type "application/json" and expiry 2021-05-14T19:28:15.127Z
[taskcluster:error] exit status 2
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
The failing test here is test_crash_content_process
:
[task 2020-05-14T19:39:37.275Z] 19:39:37 INFO - TEST-START | testing/marionette/harness/marionette_harness/tests/unit/test_crash.py TestCrash.test_crash_content_process
[task 2020-05-14T19:39:37.276Z] 19:39:37 INFO - Suppressing GECKO output. To view, add `print(self.new_out.getvalue())` to the end of this test.
[task 2020-05-14T19:56:17.377Z] 19:56:17 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/Users/cltbld/tasks/task_1589484627/build/venv/bin/python', '-u', '/Users/cltbld/tasks/task_1589484627/build/tests/marionette/harness/marionette_harness/runtests.py', '--setpref=toolkit.asyncshutdown.log=true', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--gecko-log=-', '--log-raw=-', '-vv', '--log-raw=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/marionette_errorsummary.log', '--log-html=/Users/cltbld/tasks/task_1589484627/build/blobber_upload_dir/report.html', '--binary=/Users/cltbld/tasks/task_1589484627/build/application/Firefox Nightly.app/Contents/MacOS/firefox', '--address=localhost:2828', u'--symbols-path=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Q8dlOzVfSsiI1meyrIDf4Q/artifacts/public/build/target.crashreporter-symbols.zip', '/Users/cltbld/tasks/task_1589484627/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit-tests.ini']
[task 2020-05-14T19:56:17.577Z] 19:56:17 ERROR - timed out after 1000 seconds of no output
[task 2020-05-14T19:56:17.577Z] 19:56:17 ERROR - Return code: -15
As it looks like there was a extremely long time in shutting down the browser until it got soft-killed. Sadly we do not have any gecko log output for it. If it turns out that the test fails more often we could do a try build after adding print(self.new_out.getvalue())
at the end of the test.
Comment 2•5 years ago
|
||
Looks like it actually fails a lot. And that not only for Mac but also Linux. So far I cannot see a hang during shutdown kill on Windows. All are happening for shippable opt.
It's a fairly new regression so a backfill would be good. I just triggered some jobs on mozilla-central to know at least the affecting merge.
Comment 3•5 years ago
|
||
Comment 4•5 years ago
|
||
Looks like https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=mn&revision=045d696faa87fc06c8040b75dec7cd6358c34067 is the causing merge on mozilla-central.
I asked in #sheriffs for a further backfill in autoland.
Comment hidden (Intermittent Failures Robot) |
Comment 6•5 years ago
|
||
Retriggers point to bug 1594752, https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=OS%2CX%2C10.14%2CShippable%2Copt%2Ctest-macosx1014-64-shippable%2Fopt-marionette-e10s%2C%28Mn%29&selectedTaskRun=dLGvnHO1Sq-C-NfOzflmzA-0&tochange=3e68d50950bf122006b30009c8891196c21b8395&fromchange=1813cb675a68f10ca82536fad034766bdd585f83
However, the assertion happens before that bug, but is not followed by the automation error, but by bug 1582736.
Assertion followed by automation error: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302402842&repo=autoland&lineNumber=7057
vs
Assertion followed by bug 1582736: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302407011&repo=autoland&lineNumber=14015
Comment 7•5 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #6)
Retriggers point to bug 1594752, https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=OS%2CX%2C10.14%2CShippable%2Copt%2Ctest-macosx1014-64-shippable%2Fopt-marionette-e10s%2C%28Mn%29&selectedTaskRun=dLGvnHO1Sq-C-NfOzflmzA-0&tochange=3e68d50950bf122006b30009c8891196c21b8395&fromchange=1813cb675a68f10ca82536fad034766bdd585f83
Thanks a lot. So the failing test can be found here:
I assume that due to the expected crash there is a bad interaction between the actors, which causes a hang?
Given that Tier1 jobs are affected here, I assume the patch on bug 1594752 should be backed out?
However, the assertion happens before that bug, but is not followed by the automation error, but by bug 1582736.
These assertions also happen in passing tests. So not sure how important they are at this time.
Comment 8•5 years ago
|
||
Actually it looks like the backfill was not complete and misses some changesets, especially the one from above which you claimed as regressor:
Note that this is a failure for shippable builds on Mac and Linux.
Comment 10•5 years ago
|
||
As an update to the above comments, the regressor is still bug 1594752.
The TH link above is from the 11th of May, where the bug landed and got backed out.
Retriggers in comment 6 are made on the push from May 14th, where the bug got relanded.
Henrik, the failure is intermittent, we could probably talk to Neil and see how long it will take to be fixed.
Aryx, what do you think? backout or continue conversation ?
Comment 11•5 years ago
|
||
Thanks for clarifying and yes I was looking at a wrong autoland link. Note that this failure happened 33 times yesterday.
Updated•5 years ago
|
![]() |
||
Comment 12•5 years ago
|
||
Yes, please back it out.
Comment 13•5 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•2 years ago
|
Comment 15•2 years ago
|
||
Description
•