Closed Bug 1617031 Opened 4 years ago Closed 4 years ago

linux64/tsan xpcshell tasks sometimes exceed their max-run-time (tasks very unbalanced)

Categories

(Testing :: General, defect)


Tracking

(firefox75 fixed)

RESOLVED FIXED
mozilla75

People

(Reporter: gbrown, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Noted in bug 1411358.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289748831&repo=autoland&lineNumber=3066
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289658042&repo=mozilla-central&lineNumber=2433

When I look at typical run times for these tasks, I see the chunks are very unbalanced: X1 over 70 minutes, X2..X7 about 35 minutes, X8 less than 10 minutes. Surely we can do better...

We could try updating the runtimes files:

$ cd testing/runtimes
$ ./writeruntimes

I notice tsan was added to the guess_mozinfo function:
https://searchfox.org/mozilla-central/rev/5a10be606f2d76ef22f1f44565749490de991d35/taskcluster/taskgraph/util/chunking.py#41

This means the algorithm should ignore any manifests where the entire thing is skipped with tsan. However, it will weight partially skipped manifests with the full runtime of all tests in that manifest (skipped or not).

So, e.g., if a manifest has 100 tests and only one of them is tsan enabled (the other 99 are skipped), the runtime weighting of that manifest will still factor in all 100 tests. I suspect this is the main issue here.

Solving it will be a bit tricky because the runtimes.json files only contain manifest-level data. Maybe we could assign the average runtime to each test, so that runtime = len(non_skipped_tests) * manifest_runtime / len(tests_in_manifest)

This will add some overhead to the decision task though.
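
A rough sketch of that averaging idea, for illustration only. The function name, its arguments, and the is_skipped callback are hypothetical and are not taken from chunking.py; it assumes the only data available is a single recorded runtime per manifest plus the list of tests in it.

```python
def estimate_manifest_runtime(manifest_runtime, tests, is_skipped):
    """Scale a manifest's recorded runtime by the fraction of tests that run.

    manifest_runtime: recorded runtime for the whole manifest, in seconds.
    tests: every test listed in the manifest.
    is_skipped: hypothetical callback returning True if a test is skipped
        on the target platform (e.g. tsan).
    """
    if not tests:
        return 0.0
    # Assume every test costs the manifest's average runtime, since only
    # manifest-level data is available.
    non_skipped_tests = [t for t in tests if not is_skipped(t)]
    return len(non_skipped_tests) * manifest_runtime / len(tests)


# The 100-test example: only one test is enabled on tsan.
tests = ["test_%d.js" % i for i in range(100)]
only_first_runs = lambda t: t != "test_0.js"
weight = estimate_manifest_runtime(500.0, tests, only_first_runs)
```

With a recorded runtime of 500 seconds, the manifest would be weighted at 5 seconds instead of 500. This is only as accurate as the assumption that tests within a manifest take similar time, and evaluating the skip conditions per test is where the extra decision task overhead would come from.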

I tried updating the runtimes data in a try push -- it didn't help.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=256521ff2e2ceaff8781112c62a9e22e1ced4118

(X1=67 minutes vs X8=4 minutes)

(In reply to Andrew Halberstadt [:ahal] from comment #1)
> This means the algorithm should ignore any manifests where the entire thing is skipped with tsan. However, it will weight partially skipped manifests with the full runtime of all tests in that manifest (skipped or not).

I know the searchconfigs tests tend to dominate normal xpcshell run times. These are all skipped on tsan -- but each one is skipped individually, rather than at the manifest level:
https://searchfox.org/mozilla-central/rev/c1e3d3edd4a9b784971555dc74a5de23d768b2e1/toolkit/components/search/tests/xpcshell/searchconfigs/xpcshell-common.ini#4

I'll see if I can change that!

As long as they are all skipped, it should be the same as having the skip in the manifest's DEFAULT section. So I don't think that's the issue.

Summary: linux64/tsan xpcshell tasks sometimes exceed their max-run-time → linux64/tsan xpcshell tasks sometimes exceed their max-run-time (tasks very unbalanced)
Assignee: nobody → gbrown

Avoid intermittent task timeouts by increasing the max-run-time for tsan xpcshell.
These tasks are unbalanced (wide variance in run time from one chunk to the next)
apparently because the tsan times vary significantly on a test-by-test basis when
compared with normal xpcshell tasks on the same base platform. Rather than go to
extraordinary measures to balance this particular set of tests, increase the max
run time so that timeouts are avoided when one chunk runs particularly long.

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc61eb14ceee
Increase max-run-time for tsan xpcshell; r=bc
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75