linux64/tsan xpcshell tasks sometimes exceed their max-run-time (tasks very unbalanced)
Categories: Testing :: General, defect
Tracking: firefox75 fixed
People: Reporter: gbrown, Assigned: gbrown
References: Blocks 1 open bug
Attachments: 1 file
Noted in bug 1411358.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289748831&repo=autoland&lineNumber=3066
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289658042&repo=mozilla-central&lineNumber=2433
When I look at typical run times for these tasks, I see the chunks are very unbalanced: X1 over 70 minutes, X2..X7 about 35 minutes, X8 less than 10 minutes. Surely we can do better...
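As a rough illustration of the imbalance, the approximate figures above can be compared against a perfectly balanced split. The sketch below uses 70, 35, and 10 minutes as stand-ins for "over 70", "about 35", and "less than 10"; these values are assumptions for illustration, not measured data.

```python
# Approximate chunk durations in minutes, loosely taken from the description
# (X1 ~70, X2..X7 ~35 each, X8 ~10). Illustrative stand-ins, not measurements.
chunk_minutes = [70] + [35] * 6 + [10]

total = sum(chunk_minutes)             # total work across all eight chunks
ideal = total / len(chunk_minutes)     # per-chunk duration if perfectly balanced
slowest = max(chunk_minutes)

print(f"total={total} min, balanced={ideal:.2f} min/chunk, slowest={slowest} min")
print(f"slowest chunk is {slowest / ideal:.2f}x the balanced duration")
```

Under these assumed figures, the slowest chunk takes roughly twice as long as it would with an even split.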
Comment 1•4 years ago
We could try updating the runtimes files:
$ cd testing/runtimes
$ ./writeruntimes
I notice tsan was added to the guess_mozinfo function:
https://searchfox.org/mozilla-central/rev/5a10be606f2d76ef22f1f44565749490de991d35/taskcluster/taskgraph/util/chunking.py#41
This means the algorithm should ignore any manifests where the entire thing is skipped with tsan. However, it will weight partially skipped manifests with the full runtime of all tests in that manifest (skipped or not).
For example, if a manifest has 100 tests and only one of them is enabled on tsan (the other 99 are skipped), the runtime weighting of that manifest will still factor in all 100 tests. I suspect this is the main issue here.
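To show how an overweighted manifest can unbalance chunks, here is a minimal sketch. It assumes a simple greedy chunker (heaviest manifest first, into the currently lightest chunk); that is an illustrative stand-in, not necessarily the algorithm in taskgraph's chunking.py. The manifest names and runtimes are hypothetical.

```python
import heapq


def chunk_by_weight(weights, num_chunks):
    """Greedy sketch: place the heaviest manifest into the lightest chunk."""
    heap = [(0.0, i) for i in range(num_chunks)]  # (current load, chunk index)
    heapq.heapify(heap)
    chunks = [[] for _ in range(num_chunks)]
    for name in sorted(weights, key=weights.get, reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (load + weights[name], idx))
    return chunks


# Hypothetical data. a.ini is weighted with its full recorded runtime (100),
# but almost all of its tests are skipped on tsan, so it really costs about 1.
estimated = {"a.ini": 100.0, "b.ini": 100.0, "c.ini": 50.0, "d.ini": 50.0}
actual = {"a.ini": 1.0, "b.ini": 100.0, "c.ini": 50.0, "d.ini": 50.0}

chunks = chunk_by_weight(estimated, 2)
estimated_loads = [sum(estimated[m] for m in chunk) for chunk in chunks]
actual_loads = [sum(actual[m] for m in chunk) for chunk in chunks]

print("estimated:", estimated_loads)
print("actual:   ", actual_loads)
```

The chunker believes both chunks are equal, while the real load differs by about a factor of three in this made-up case.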
Solving it will be a bit tricky because the runtimes.json files only contain manifest-level data. Maybe we could assign the average runtime to each test, so runtime = len(non_skipped_tests) * manifest_runtime / len(tests_in_manifests)
This will add some overhead to the decision task though.
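The proportional weighting suggested above could look roughly like the following. This is a sketch only: the function name weighted_runtime, the test file names, and the 600-second runtime are hypothetical and do not come from the tree.

```python
def weighted_runtime(manifest_runtime, tests_in_manifest, non_skipped_tests):
    """Scale a manifest's recorded runtime by the fraction of tests that run.

    Assumes every test in the manifest takes the average time, since the
    runtimes data is only available per manifest.
    """
    if not tests_in_manifest:
        return 0.0
    return len(non_skipped_tests) * manifest_runtime / len(tests_in_manifest)


# Hypothetical manifest: 100 tests, only one of which runs on tsan.
tests = [f"test_{i}.js" for i in range(100)]
active = tests[:1]

estimate = weighted_runtime(600.0, tests, active)
print(estimate)
```

The averaging assumption is the weak point: if the one enabled test happens to be the slow one, the estimate will be far too low.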
Comment 2•4 years ago (Assignee)
I tried updating the runtimes data in a try push -- it didn't help.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=256521ff2e2ceaff8781112c62a9e22e1ced4118
(X1=67 minutes vs X8=4 minutes)
Comment 3•4 years ago (Assignee)
(In reply to Andrew Halberstadt [:ahal] from comment #1)
> This means the algorithm should ignore any manifests where the entire thing is skipped with tsan. However, it will weight partially skipped manifests with the full runtime of all tests in that manifest (skipped or not).
I know the searchconfigs tests tend to dominate normal xpcshell run times. These are all skipped on tsan -- but each one is skipped individually, rather than at the manifest level:
https://searchfox.org/mozilla-central/rev/c1e3d3edd4a9b784971555dc74a5de23d768b2e1/toolkit/components/search/tests/xpcshell/searchconfigs/xpcshell-common.ini#4
I'll see if I can change that!
Comment 4•4 years ago
As long as they are all skipped, it should be the same as having the skip condition in the DEFAULT section. So I don't think that's the issue.
Comment 5•4 years ago (Assignee)
Confirmed: that makes no difference / that's not the issue.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=74a370a196c6b3b1b94115f41eb757fbadf68e21
Comment 6•4 years ago (Assignee)
Avoid intermittent task timeouts by increasing the max-run-time for tsan xpcshell.
These tasks are unbalanced (wide variance in run time from one chunk to the next)
apparently because the tsan times vary significantly on a test-by-test basis when
compared with normal xpcshell tasks on the same base platform. Rather than go to
extraordinary measures to balance this particular set of tests, increase the max
run time so that timeouts are avoided when one chunk runs particularly long.
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc61eb14ceee
Increase max-run-time for tsan xpcshell; r=bc
Comment 8•4 years ago (bugherder)