Proposal for new Taskcluster YML Format
Categories
(Taskcluster :: General, enhancement)
Tracking
(Not tracked)
People
(Reporter: mhentges, Unassigned)
References
Details
Proposal for new Taskcluster YML format
I have two main pain points that I'm encountering with .taskcluster.yml v1:
- At the beginning of all my
.taskcluster.ymlfiles, I need to parse out the same git values (user, revision, branch, repository).
Since all the Github events have this information in different spots, there's a lot of mapping. Here's an example in a simple project - JSON-e adds additional complexity that might not be necessary
- Devs working with
.taskcluster.ymlneed to lean this additional tool - Due to increased complexity, debugging issues within
.taskcluster.ymlis harder - The features of JSON-e solve many of the same problems that the "decision task" handles
- JSON-e increases Taskcluster's attack surface (e.g.: billion laughs)
- Devs working with
My proposal for solves these problems by:
- Provide variables with commonly-available values so they don't need to be specifically mapped for each type of Github event
- Remove support for JSON-e - let projects with complex task graphs represent them in the decision task.
Examples:
Mobile project example:
version: 2
policy:
pullRequests: public
onPullRequest:
# <snip>
payload:
command:
- >-
git fetch ${repository} ${ref}
&& git checkout FETCH_HEAD
&& python decision_task.py pull-request ${pull_request_title}
onRelease:
# <snip>
payload:
command:
- >-
git fetch https://github.com/mozilla-mobile/fenix refs/tags/${tag}
&& git checkout FETCH_HEAD
&& python decision_task.py release ${release_name} {tag}
onMasterPush:
# <snip>
payload:
command:
- >-
git fetch https://github.com/mozilla-mobile/fenix refs/heads/master
&& git checkout FETCH_HEAD
&& python decision_task.py master-push
# Custom triggers, taskcluster-github won't ever populate these
onCustomTrigger:
nightly:
# <snip>
raptor:
# <snip>
Project with same task for different triggers example:
version: 2
policy:
pullRequests: public
baseTask: &baseTask
workerType: github-worker
payload:
image: python:3.7
command:
- /bin/bash
- -c
- git clone ${repository} redo && cd redo && git checkout ${revision} && python decision_task.py
metadata:
owner: ${user_email}
source: ${repository}/raw/${revision}/.taskcluster.yml
onPullRequest:
<<: *baseTask
metadata:
name: Redo Pull Request
description: Redo Pull Request
onMasterPush:
<<: *baseTask
metadata:
name: Redo Master Push
description: Redo Master Push
Specifics:
Specific triggers with contextual variables
Rather than evaluating a provided tasks_for variable, this proposal recommends building the different trigger types directly into the YAML.
Then, when interpolating within these properties, provide contextual variables, rather than passing through the raw Github event object.
The three taskcluster-github trigger types should be:
onPullRequest(triggered when a PR is opened or updated). This has the additional variables of:branch: the name of the git branch associated with the PRpr_title: the title of the Github PRpr_number: the identifier of the Github PRpr_url: URL to Github to view the PR
onRelease(triggered by Github releases)release_name: The name of the release on Githubtag: The name of the tag the release is associated with
onMasterPush(triggered when themasterbranch in the primary repository is pushed to)- (no extra variables)
Notes:
- If you use context variables outside of their associated context (e.g.: an
onPullRequest-specific variable outside of theonPullRequestblock) you should get an error since the value doesn't exist. - We'll need to lazily evaluate values after the task is chosen to avoid "value doesn't exist" errors for unused-for-current-event triggers.
There's a few benefits to providing variables rather than the github events:
- We can simplify access to values (e.g.:
taginstead ofevent.release.tag_name) .taskcluster.ymlusers don't need to learn about Github events (though, they need to learntaskcluster-specific skills instead, so this is a :shrug:)- When github changes the structure of events, we only need to update
taskcluster-github, rather than every project that has a.taskcluster.yml
Remove JSON-e
JSON-e performs two purposes for us with v1: it allows us to disambiguate between trigger types (PR, release, master-push), and it allows us to have reuse between task definitions.
However, the specific triggers listed above handle the first case.
The second purpose of JSON-e (enabling re-use between task definitions) overlaps significantly with the decision task and taskgraph.
Even within .taskcluster.yml, if simple reuse is needed between the different trigger tasks, you can use YAML's anchor-and-extend functionality (see baseTask above in the "Simple project example").
Since JSON-e has significant costs (listed at the top), removing it could significantly benefit our .taskcluster.yml files.
Custom triggers
Sometimes, we need to trigger a task for non-Github-related reasons. Yet, for the sake of the structure of the projects, we still want the specially-triggered task to be defined beside the other tasks.
A good example of this is Fenix's Raptor trigger.
To handle this, allow projects to define custom tasks in the onCustomTrigger object. taskcluster-github shouldn't use this value, but it can be used by projects if they want to
manually interpret .taskcluster.yml and schedule custom tasks with custom values.
Notes:
- If I missed use cases that this is incompatible with, then I'd love to learn more context :)
- There may be use cases in which spawning a decision task is less ideal than representing tasks entirely in JSON-e. However, if we agree that "there's overlap between JSON-e and decision tasks", then I believe we should solve that problem (decision task is significantly harder than
.taskcluster.yml), rather than supporting decision-task-y functionality twice. - My "billion laughs" example for JSON-e might not be the most apt, since standalone YAML is also susceptible to "Billion laughs" :/
- Some projects may use a branch other than
masteras their primary branch: we could call the triggeronPrimaryPushinstead, with a top-level primary branch config value?
Comment 1•6 years ago
|
||
CoT verification rebuilds the decision task to make sure that the task is not similar-to, but different-than, a valid decision task. Removing json-e and moving it all to the code removes this capability.
| Reporter | ||
Comment 2•6 years ago
•
|
||
CoT verification rebuilds the decision task to make sure that the task is not similar-to, but different-than, a valid decision task. Removing json-e and moving it all to the code removes this capability.
FWIW, this would simplify CoT verification, since JSON-e is no longer needed. CoT would need the MOBILE_HEAD_... values (or equivalent) passed forward (as normal), then would download .taskcluster.yml, grab the correct task (e.g.: from onRelease), then compare.
Though, this doesn't matter because:
Dustin and I talked directly via Zoom.
This proposal would not be effective for two reasons:
This proposal would greatly increase the friction for any non-trivial tasks:
Removing JSON-e would force the declation of tasks that could be in the .taskcluster.yml to be moved to a decision task, which has three downsides:
- Performance: we now have the delay of firing up a docker worker before the decision task runs
- It requires project devs to have knowledge of taskcluster + docker, since they have to configure their image and run their script in it
- Any variables that are used by the decision task now need to be piped to it, so there's boilerplate variable-passing in the
payload.command[]
There's plans to add more triggers to .taskcluster.yml
This is more theoretical, but it could multiply the issues described above^
Other notes
- We agree that reading and writing JSON-e (usually) isn't as ergonomic as representing logic in a real programming language
- Running anything dynamic in
.taskcluster.ymlneeds to be safe, which is why JSON-e was chosen (running something like python adds additional security challenges to guard against fs/network/etc access) - We'll talk more at all-hands to see how
.taskcluster.ymlergonomics can be addressed
Description
•