Open Bug 1431125 Opened 2 years ago Updated 2 months ago

Test verification of long-running tests may exceed task timeout

Categories: Testing :: General, defect, P3
Tracking: Not tracked
People: Reporter: gbrown, Unassigned
References: Blocks 2 open bugs
Attachments: 1 file
Consider

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=156829580&lineNumber=6259

TEST-OK | browser/tools/mozscreenshots/primaryUI/browser_primaryUI.js | took 682544ms
...
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
Blocks: 1411358
Priority: -- → P2
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=162381172&lineNumber=4696

[task 2018-02-15T12:05:45.624Z] 12:05:45     INFO -  :::
[task 2018-02-15T12:05:45.625Z] 12:05:45     INFO -  ::: Running test verification step "1. Run each test 10 times, sequentially."...
[task 2018-02-15T12:05:45.625Z] 12:05:45     INFO -  :::
[task 2018-02-15T12:05:45.625Z] 12:05:45     INFO -  Running tests sequentially.
[task 2018-02-15T12:05:45.625Z] 12:05:45     INFO -  SUITE-START | Running 1 tests
[task 2018-02-15T12:05:45.830Z] 12:05:45     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:09:27.473Z] 12:09:27     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 221644ms
[task 2018-02-15T12:09:28.116Z] 12:09:28     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:13:06.912Z] 12:13:06     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 218796ms
[task 2018-02-15T12:13:07.555Z] 12:13:07     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:16:44.402Z] 12:16:44     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216846ms
[task 2018-02-15T12:16:45.097Z] 12:16:45     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:20:20.093Z] 12:20:20     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 214996ms
[task 2018-02-15T12:20:20.738Z] 12:20:20     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:23:57.083Z] 12:23:57     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216344ms
[task 2018-02-15T12:23:57.729Z] 12:23:57     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:27:49.732Z] 12:27:49     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 232002ms
[task 2018-02-15T12:27:50.431Z] 12:27:50     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:31:24.381Z] 12:31:24     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 213950ms
[task 2018-02-15T12:31:25.080Z] 12:31:25     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:35:01.834Z] 12:35:01     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 216754ms
[task 2018-02-15T12:35:02.535Z] 12:35:02     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js
[task 2018-02-15T12:38:53.939Z] 12:38:53     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | took 231405ms
[task 2018-02-15T12:38:54.587Z] 12:38:54     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js

[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
"Teach" every harness's repeat loop about the max run time?

It might be better to run the test once, then skip (or fail?) verification if a single test run takes longer than ~1 minute.
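
As a rough illustration of the "run once, then decide" idea, here is a minimal sketch. It is not the actual test-verify code: the function names and constants are hypothetical, the 60 second limit is taken from the "~1 minute" suggestion above, and the durations are the `took ...ms` values from the two logs quoted in this bug.

```python
# Hypothetical sketch; names and thresholds are illustrative, not mozharness code.
SINGLE_RUN_LIMIT_S = 60.0   # "~1 minute" from the comment above (assumption)
TASK_TIMEOUT_S = 5400.0     # task timeout reported in the quoted logs
REPEAT_COUNT = 10           # "Run each test 10 times, sequentially."


def projected_step_seconds(single_run_ms, repeats=REPEAT_COUNT):
    """Estimate the duration of a sequential repeat step from one timed run."""
    return single_run_ms / 1000.0 * repeats


def should_verify(single_run_ms, limit_s=SINGLE_RUN_LIMIT_S):
    """Return True only if one iteration is short enough to be worth repeating."""
    return single_run_ms / 1000.0 <= limit_s


# Durations reported in the two logs quoted in this bug.
for name, took_ms in [("browser_primaryUI.js", 682544),
                      ("test_TelemetrySend.js", 221644)]:
    decision = "verify" if should_verify(took_ms) else "skip"
    print(name, decision, round(projected_step_seconds(took_ms)), "s projected")
```

Under these assumptions both tests would be skipped. For browser_primaryUI.js, the ten sequential runs alone would be projected to take longer than the 5400 second task timeout.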
The test-verify logic tries to predict when verification will take longer than an hour and, in those cases, stops verification early. But long-running tests, where a single test iteration takes more than a minute or so, can still be problematic. I don't see an easy way of fixing that. For now, I'd like to avoid the task timeouts by increasing the max-run-time significantly, to 3 hours.
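
For readers unfamiliar with the approach, a repeat loop that is aware of a time budget could look roughly like the sketch below. This is an assumption-laden illustration, not the real test-verify implementation: `run_repeatedly`, its parameters, and the "slowest run so far" estimate are all hypothetical.

```python
import time


def run_repeatedly(run_test, repeats, budget_s, clock=time.monotonic):
    """Run run_test up to `repeats` times, giving up before the budget is exceeded.

    Returns (completed_runs, finished); finished is False if the loop stopped early.
    """
    start = clock()
    longest = 0.0
    for completed in range(repeats):
        elapsed = clock() - start
        # Use the slowest run so far as the estimate for the next one.
        if elapsed + longest > budget_s:
            return completed, False
        run_start = clock()
        run_test()
        longest = max(longest, clock() - run_start)
    return repeats, True
```

Note the limitation described above: the first iteration always runs, because nothing is known about the test's duration yet. A single very long iteration can therefore still exceed the budget on its own.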
Attachment #8954829 - Flags: review?(jmaher)
Keywords: leave-open
Comment on attachment 8954829 [details] [diff] [review]
increase tc max-run-time for test-verify

Review of attachment 8954829 [details] [diff] [review]:
-----------------------------------------------------------------

++ for 3 hour jobs!  That is actually scary, but until we have a better chunking strategy, this makes a lot of sense.
Attachment #8954829 - Flags: review?(jmaher) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c72e09c45e93
Increase max-run-time of test-verify and test-verify-wpt; r=jmaher
Is there a bug on file for chunking TV?
No. I don't see chunking as a reasonable strategy for TV. The strategy is, if verification is taking too long, give up.

https://developer.mozilla.org/en-US/docs/Mozilla/QA/Test_Verification
Assignee: gbrown → nobody
Priority: P2 → P3
The leave-open keyword is set and there has been no activity for 6 months.
:gbrown, maybe it's time to close this bug?
Flags: needinfo?(gbrown)
I hope to finish this off in 2019.
Flags: needinfo?(gbrown)

The leave-open keyword is set and there has been no activity for 6 months.
:gbrown, maybe it's time to close this bug?

Flags: needinfo?(gbrown)
Keywords: leave-open
Flags: needinfo?(gbrown)