Reject Tier 1 jobs at the taskgraph level if they don't run across all branches
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(Not tracked)
People
(Reporter: RyanVM, Unassigned)
References
Details
Attachments
(2 files)
Comment 1•7 years ago
|
||
Comment 2•7 years ago
|
||
Comment 3•7 years ago
|
||
| Reporter | ||
Comment 4•7 years ago
|
||
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
Updated•7 years ago
|
Comment 7•7 years ago
•
|
||
So the different values of run-on-projects that we have for tier 1 jobs at the moment are:
- all
- release
- release, integration, try
- mozilla-beta
- mozilla-beta, trunk, try
- mozilla-beta, autoland, try
- mozilla-central
- mozilla-central, trunk, try
- mozilla-central, try
- mozilla-central, integration
- trunk, try
- try
First, I'll note that "mozilla-central, integration" is the mostly same as "trunk" (trunk includes comm-central, but I don't think this matters, comm-central uses different yaml definitions auiu)
Similarly, "mozilla-central, trunk, try" is the same as "trunk, try".
The "mozilla-beta, autoland, try" one is weird. Why autoland specifically, and not integration?
"release, integration, try" is kind of confusing. "release, trunk, try" would be the same thing, but less confusing.
Now, looking at all the tier 1 jobs that aren't balrog, beetmover, build-signing, cron-bouncer-check, checksums-signing, docker-image, fetch, mar-signing, nightly, packages, pipfile-update, release-*, repackage, repo-update, toolchain, upload-generated-sources or devedition, we're left with:
- A bunch of tests set run_on_projects for an explicit disable on some platforms. I'd say ideally we shouldn't even create tasks for those.
- */opt-browser-screenshots-e10s run on trunk but not try.
- The following tasks run on mozilla-central only (not integration, not try):
- addon-tps-xpi
- build-android-api-16-ccov/debug
- build-android-api-16-without-google-play-services/opt
- build-android-test-ccov/opt
- all serviceworker-e10s tests
- The following tasks run on mozilla-central or try, but not integration:
- build-linux64-asan-fuzzing-ccov/opt
- build-linux64-fuzzing-ccov/opt
- build-macosx64-ccov/debug
- test-android-hw-p2-8-0-android-aarch64/opt-raptor-speedometer-geckoview-power-e10s
- test-android-hw-p2-8-0-arm7-api-16/opt-raptor-speedometer-geckoview-power-e10s
- test-linux64-pgo/opt-talos-sessionrestore-many-windows-e10s
- test-linux64-qr/opt-talos-sessionrestore-many-windows-e10s
- test-linux64/opt-talos-sessionrestore-many-windows-e10s
- test-macosx64/opt-talos-sessionrestore-many-windows-e10s
- test-windows10-64-pgo/opt-talos-sessionrestore-many-windows-e10s
- test-windows10-64-qr/opt-talos-sessionrestore-many-windows-e10s
- test-windows10-64/opt-talos-sessionrestore-many-windows-e10s
- test-windows7-32-pgo/opt-talos-sessionrestore-many-windows-e10s
- test-windows7-32/opt-talos-sessionrestore-many-windows-e10s
- The following tasks don't run on mozilla-central (in fact, they only run on try):
- */opt-awsy-base-dmd-e10s
- */opt-awsy-dmd-e10s
- */*-upload-symbols
Comment 8•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #7)
First, I'll note that "mozilla-central, integration" is the mostly same as "trunk" (trunk includes comm-central, but I don't think this matters, comm-central uses different yaml definitions auiu)
I was going to suggest that there might be places where run-on-projects is set in transforms (which might impact c-c), but that doesn't appear to be the case.
- A bunch of tests set run_on_projects for an explicit disable on some platforms. I'd say ideally we shouldn't even create tasks for those.
I haven't looked at any details, but one thing that not generating tasks at all means, is that they can't selected via try, which might be relevant, depending on the reason the tests are disabled on a given platform. (also for add-new-jobs, which might be relevant for some talos jobs)
(In reply to Mike Hommey [:glandium] from comment #7)
The "mozilla-beta, autoland, try" one is weird. Why autoland specifically, and not integration?
That's from https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml#264-267
Maybe it was for osx capacity issues? No clue why it isn't running on m-c...
Is this basically what we would want? For each task in the full task graph, validation is run to check the following:
- The task is tier-1
- The task name doesn't make an exact match to any of the task names in WHITELISTED_TASKS
- The task name doesn't make a substring match to any of the task names in WHITELISTED_SUBSTRINGS
If a task meets those conditions, we look at the run-on-projects value for the test, and check the following:
- run-on-projects does not include 'trunk'
- run-on-projects does not include 'all'
- run-on-projects does not include both 'mozilla-central' and 'integration'
If one of those conditions occurs, the task name is added to a list of invalid results. If the list is not empty, we print out the list of invalid task names, along with their current run-on-projects values.
Here's what happens for some jobs that don't meet the new validation:
Task configuration changed, generating target task graph
Error running mach:
['try', 'fuzzy']
The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You should consider filing a bug for this issue.
If filing a bug, please include the full output of mach, including this error
message.
The details of the failure are as follows:
Exception: Invalid Tier-1 tasks identified.
Tier-1 tasks must be run on mozilla-central and all projects that merge into it.
The following task(s) do not meet that requirement:
beetmover-source-fennec-source/opt: []
beetmover-source-firefox-source/opt: []
Please set run-on-projects to include at least one of the following:
[all]
[trunk]
[mozilla-central, integration]
File "/Users/wkocher/mozilla/mozilla-central/tools/tryselect/mach_commands.py", line 150, in try_fuzzy
return run_fuzzy_try(**kwargs)
File "/Users/wkocher/mozilla/mozilla-central/tools/tryselect/selectors/fuzzy.py", line 220, in run_fuzzy_try
tg = generate_tasks(parameters, full, root=vcs.path)
File "/Users/wkocher/mozilla/mozilla-central/tools/tryselect/tasks.py", line 96, in generate_tasks
tg = getattr(TaskGraphGenerator(root_dir=root, parameters=params), attr)
File "/Users/wkocher/mozilla/mozilla-central/taskcluster/taskgraph/generator.py", line 168, in target_task_graph
return self._run_until('target_task_graph')
File "/Users/wkocher/mozilla/mozilla-central/taskcluster/taskgraph/generator.py", line 341, in _run_until
k, v = self._run.next()
File "/Users/wkocher/mozilla/mozilla-central/taskcluster/taskgraph/generator.py", line 286, in _run
yield verifications('full_task_graph', full_task_graph, graph_config)
File "/Users/wkocher/mozilla/mozilla-central/taskcluster/taskgraph/util/verify.py", line 36, in __call__
verification(None, graph, scratch_pad=scratch_pad, graph_config=graph_config)
File "/Users/wkocher/mozilla/mozilla-central/taskcluster/taskgraph/util/verify.py", line 334, in verify_tier_one_requirements
.format(testString)
(The "[]" next to the task names printed out will include the current run-on-project value for that task, these particular tasks just don't have anything for that value.)
Updated the patch to fix lint errors and address glandium's comments.
Locally, a verification failure will print out an error during taskgraph generation like comment 11 shows.
That should cause ./mach try to fail in most cases (assuming the chosen selector generates the taskgraph; empty can still proceed), preventing the try push from ever going out.
If you do manage to push with this verification failure, the Decision task will fail like this: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=retry%2Ctestfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&group_state=expanded&revision=9588652966c1ef3bcda86e03a22d59a11add1d2a
| Reporter | ||
Comment 13•4 years ago
|
||
IMO, this is still something we should do.
Updated•3 years ago
|
Description
•