Closed Bug 1328727 Opened 7 years ago Closed 7 years ago

move mozilla-taskcluster info in-tree

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla56

People

(Reporter: mozilla, Assigned: dustin)

References

Details

Attachments

(3 files)

This is information I can't determine from .taskcluster.yml, and I believe is in mozilla-taskcluster.  I think we should either pull this information out of mozilla-taskcluster into a standalone data file, or land it in-tree.  Alternately, if we determine that some of these don't matter on a security level, we can mark these items as ignorable in the decision task definition at chain of trust verification time.

We also likely need to allow for .taskcluster.yml (on-push, triggered by mozilla-taskcluster) and some other set of information for nightlies or other graph types.  This may just be a different command per graph type, though they may also differ in routes or scopes or ...?

Some of this information is tree-specific, meaning we'd either need to watch this info at merge time or have a dictionary in-tree that refers to all known trees, but only acts on this one.  e.g.

tree-specific:
  schedulerId:
    mozilla-central: gecko-level-3
    mozilla-aurora: gecko-level-3
    ...
  ...

I think this currently lives in mozilla-taskcluster/src/config/default.yml, though I may be wrong.

## Currently missing from .taskcluster.yml

### These seem important to verify
- schedulerId, which is tree-specific.
- scopes, which are tree-specific

### I don't think these are as important to verify, but they're also missing
- dependencies
- requires
- priority
- retries

## Template
Do we know what to put into the template?  e.g., {{project}}.  In the case of {{level}}, I'm guessing that belongs in the tree-specific dictionary
- source, url, revision, revision_hash are likely verifiable as-is
- #as-slugid, now, from-now, owner, pushdate, shellquote, comment, are likely ignorable
- pushlog_id - unknown, but possibly verifiable or ignorable
- project
- level

## Questions
- should we assume empty dependencies and requires: all-completed?
 - in the case of a hooks-triggered decision task, the dependencies might include the pre-decision taskId
I think that everything of interest here is based on the project, so as you suggest, a shared configuration of project settings would be useful.  I think it would be something more like:

  project:
    mozilla-central:
      level: 3
      repository: https://hg.mozilla.org/mozilla-central

I'm intrigued by the idea of putting that in-tree.  Maybe mozilla-taskcluster can pull that file from mozilla-central on every push and parse it from there?

- schedulerId is derived from the level using a simple rule
- scopes can be derived from the repository using a simple rule (admittedly it's in the config right now)
- there should be no dependencies, at which point requires doesn't matter
- priority and retries should be omitted or match what's in .taskcluster.yml

pushlog_id could probably be determined dynamically by querying hg with the repo and revision, if that's useful.  A faked pushlog_id could cause optimization to skip or run tasks.

Anyway, this would line up nicely with some hoped-for simplifications to mozilla-taskcluster.
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> I think that everything of interest here is based on the project, so as you
> suggest, a shared configuration of project settings would be useful.  I
> think it would be something more like:
> 
>   project:
>     mozilla-central:
>       level: 3
>       repository: https://hg.mozilla.org/mozilla-central

I think that's a better layout too.

> I'm intrigued by the idea of putting that in-tree.  Maybe
> mozilla-taskcluster can pull that file from mozilla-central on every push
> and parse it from there?

Something like that.  Or pull the tree-specific one, so autoland's project file for an autoland push.
Without digging too deep, I could see the benefits of the latter on legitimate project branches, though I'm not sure if a locally-modifiable projects file adds security concerns on Try or other unmonitored branches.  Would bumping Try's level to 3 work for a level 1 committer, resulting in elevated privileges?  That might be the overriding reason to use m-c's copy.

> - schedulerId is derived from the level using a simple rule
> - scopes can be derived from the repository using a simple rule (admittedly
> it's in the config right now)
> - there should be no dependencies, at which point requires doesn't matter
> - priority and retries should be omitted or match what's in .taskcluster.yml
> 
> pushlog_id could probably be determined dynamically by querying hg with the
> repo and revision, if that's useful.  A faked pushlog_id could cause
> optimization to skip or run tasks.
> 
> Anyway, this would line up nicely with some hoped-for simplifications to
> mozilla-taskcluster.

Cool.  Between this and cron tasks, we may be able to significantly simplify chain of trust verification.
Depending on how it's implemented, the tie between repo and level would be the more dangerous one to modify.  For example, if you could convince mozilla-taskcluster that the mozilla-central project's repo is https://hg.mozilla.org/try, then bob's your uncle.

This feels to me like a source-of-truth that should not ride the trains.  In fact, it might be better implemented as repo metadata on hgmo.  I don't know if the hg web service has any capacity to store such things, but if there was an HTTP call that would return `{"gecko-project": "try", "level": 1} from https://hg.mozilla.org/try/automationrelevant for example, that would be ideal (as long as it couldn't be edited by LDAP scm_level_1!).  Then we could easily synchronize the repo roles with the appropriate level.

Greg, is that a crazy idea?  If so, we can just put it in-tree :)
Flags: needinfo?(gps)
We can have hg.mozilla.org expose repo metadata via JSON. I've long wanted something like this to expose the list of hooks that are enabled on a repo. And, Dustin even asked for exposing the level in bug 1234556! But my answer in that bug is basically the same as my answer today: it is low priority until proven otherwise.

I also echo Dustin's concerns about security and potential for (accidental) abuse if this is in-tree.
Flags: needinfo?(gps)
Coming back to this after a few months, I think the ideas here are:

 * render .taskcluster.yml in Gecko using JSON-e
 * define a context for that rendering that can be reproduced by scriptworker
   - some data is based on the push
   - other data is based on the project

The data about the projects can be kept in https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-branches.json.  Mozilla-taskcluster could pull that, say on startup or on every push.  And scriptworker could do the same.
Assignee: nobody → dustin
Templated variabes, based on https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/try/instantiate.js#L114, are:

* now 
* owner
* source
* revision
* comment
* level
* project
* revision_hash
* pushlog_id
* url 
* pushdate
* from_now *
* as_slugid *
* shellquote *

with the last three being functions.  This matches the descriptoin in .taskluster.yml.  After applying the template, mozilla-taskcluster sets:

 - task.scopes (https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/jobs/taskcluster_graph.js#L202)
 - task.schedulerId (https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/jobs/taskcluster_graph.js#L205)
 - task.taskGroupId (https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/jobs/taskcluster_graph.js#L206)

The scopes are formulaic and could be embedded in .taskcluster.yml, based on the level and project parameters:
 https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/config/default.yml#L132
    mozilla-central:
      level: 3
      scopes:
        - "assume:repo:hg.mozilla.org/mozilla-central:*"
        # temporary until this scope is removed from .taskcluster.yml
        - "assume:repo:hg.mozilla.org/try:*"
(and we can remove that scope now..)

The schedulerId, too, is based on level:
 https://github.com/taskcluster/mozilla-taskcluster/blob/master/src/jobs/taskcluster_graph.js#L163
 let schedulerId = `gecko-level-${templateVariables.level}`;

Finally, taskGroupId is just the taskId of the decision task, so that could easily be templated.

The revision_hash is unused.

JSON-e implements from_now natively (as $fromNow), although it needs an
optional base date (https://github.com/taskcluster/json-e/issues/141).
`as_slugid` is a good built-in to add in this context. And we can avoid shell
quoting by using env vars.

So, I think we could get away with a template context of:
 - from the context
   - now
 - from the push
   - owner
   - revision
   - comment
   - project
   - pushlog_id
   - pushdate
 - and look up in https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-branches.json based on project:
   - level     

So I think the next steps here are:

- merge json-e into mozilla-taskcluster
- update the .taskcluster.yml parsing to distinguish versions (the current version is 0) and keep doing the old thing on version 0, while doing a new thing with version 1
- for version 1, define a context like the above, including the level look-up, and then apply that to the template in .taskcluster.yml
- upgrade trees to use the new format, jumping trains only if necessary (depending on what scriptworker/CoT needs)

This should help CoT predict decision-task contents (given the context and extracting `now` from tasks.created). It will prepare us to move mozilla-taskcluster into a hook.

Aki, what have I missed?
Flags: needinfo?(aki)
I think this is great for mozilla-taskcluster, which deals with on-push decision tasks. If we only had to deal with on-push decision tasks, we're golden.

We also have other ways of triggering decision tasks, mainly for periodic / nightly / release graphs:

- cron tasks. will they still have the full push info?
- hooks currently have static info in their decision tasks.
- we have a decision task hardcoded into the releasetasks repo: https://github.com/mozilla-releng/releasetasks/blob/master/releasetasks/templates/mobile/candidates_fennec.yml.tmpl
- we're going to need ways to trigger the full promotion graphs via `mach taskgraph`.

We need to have enough info in-tree to make sure these are seen as valid. We also potentially need something like parameterizable hooks to be able to pass in things like revision and buildnumber into a release hook.

mozilla-beta and mozilla-release have special on-push target task methods; is this covered?
Flags: needinfo?(aki)
(In reply to Aki Sasaki [:aki] from comment #8)
> I think this is great for mozilla-taskcluster, which deals with on-push
> decision tasks. If we only had to deal with on-push decision tasks, we're
> golden.

This specific bug deals with mozilla-taskcluster, so likely the other concerns I have belong in another bug(s).
I think it all fits on this bug - thanks for the context.

Cron tasks currently parse and parameterize .taskcluster.yml, so that will need to be modified too.

Hooks -- I'm not clear on which hooks that are not cron-tasks are still relevant, or how CoT will handle those.  Parameterized hooks is still something we want to do, but isn't currently on tap for this quarter.

Similarly with releasetasks and full promotion graphs, I expect CoT doesn't cover those at all yet, so the approach will probably be to use JSON-e if necessary when that is added to CoT?  If parameterized hooks are just relevant to relpromo, maybe the same answer applies?

Special on-push target task methods are covered.  All this bug is doing is giving CoT a much more predictable output (decision task definition) for trusted inputs (revision, branch, etc.) so that it can verify that link in the chain.  What happens in the decision task itself is another matter entirely!

So I'll get hacking on support for cron and on-push decision tasks.
(In reply to Dustin J. Mitchell [:dustin] from comment #10)
> I think it all fits on this bug - thanks for the context.
> 
> Cron tasks currently parse and parameterize .taskcluster.yml, so that will
> need to be modified too.
> 
> Hooks -- I'm not clear on which hooks that are not cron-tasks are still
> relevant, or how CoT will handle those.  Parameterized hooks is still
> something we want to do, but isn't currently on tap for this quarter.

I think hooks are our backup method for triggering release graphs... we may not have a production hook we rely on as the primary right now (I may be wrong). We do see parameterized hooks as one strong possibility for replacing the releasetasks templates: this would allow us to trigger a standard decision task, but modify the target task method, revision, buildnumber, and any other release config we need to pass in.

> Similarly with releasetasks and full promotion graphs, I expect CoT doesn't
> cover those at all yet, so the approach will probably be to use JSON-e if
> necessary when that is added to CoT?  If parameterized hooks are just
> relevant to relpromo, maybe the same answer applies?

CoT v1 does cover them, but only because the decision task verification is so complex and fuzzy. I imagine we'll get JSON-e in for a set of decision tasks (on-push, most likely), then add support for JSON-e in CoT, but leave the fuzzy decision task verification until we support all releng decision task types. CoT v2 will then be simplified greatly once we can tear out the fuzzy decision task verification; we can then be more confident the rest of the graph hasn't been meddled with via well-formed but malicious decision task definitions.

> Special on-push target task methods are covered.  All this bug is doing is
> giving CoT a much more predictable output (decision task definition) for
> trusted inputs (revision, branch, etc.) so that it can verify that link in
> the chain.  What happens in the decision task itself is another matter
> entirely!

Awesome. Sounds like we can then adapt the other decision task types to use the same in-tree config, as long as we have some way to pass in additional info as mentioned above.

> So I'll get hacking on support for cron and on-push decision tasks.

Thanks!
OK, landed that in staging a few minutes ago (for real this time) and it double-triggered a few try jobs, among them
  https://treeherder.allizom.org/#/jobs?repo=try&revision=581c8bf76c20af07e5a8b70255614045f43b64bf
the two decision tasks there (one created by mozilla-taskcluster-staging) look identical, so I think we're good to ship.
Landed in production, and a try push of mine showed up and ran fine.  I think we're good.  Next step is to modify the in-tree template to take advantage (noting that we need to modify cron and action tasks, too)
Status: NEW → ASSIGNED
You can see from the try pushes above that this works as an on-push decision task.  It also works from the command line to force a cron run (--no-create).  I verified the resulting tasks to match existing decision tasks from
  https://queue.taskcluster.net/v1/task/LZx2Fq-LRiGwDwmiRaeC8A (hg-push)
  https://queue.taskcluster.net/v1/task/XNIijO2VQACn7Op5n-rWTw (cron)
and the differences are noted in the commit message. Aki, please let me know if you expect these changes will break nightlies via CoT, and if so how we could avoid that.
Comment on attachment 8888886 [details]
Bug 1328727: vendor json-e 2.1.1;

https://reviewboard.mozilla.org/r/159920/#review165282

Looks correct to me.. not sure how vendoring works, but I'm sure you do :)
Attachment #8888886 - Flags: review?(jopsen) → review+
Comment on attachment 8888887 [details]
Bug 1328727: use a version-1 .taskcluster.yml

https://reviewboard.mozilla.org/r/159922/#review165288
Attachment #8888887 - Flags: review?(jopsen) → review+
Comment on attachment 8888888 [details]
Bug 1328727: use json-e for cron decision tasks too

https://reviewboard.mozilla.org/r/159924/#review165290
Attachment #8888888 - Flags: review?(jopsen) → review+
Comment on attachment 8888887 [details]
Bug 1328727: use a version-1 .taskcluster.yml

https://reviewboard.mozilla.org/r/159922/#review165286

As far as I can tell, current v1 CoT verification should still work with this landed... this just enables us to do more accurate v2 CoT verification.
Attachment #8888887 - Flags: review?(aki) → review+
Comment on attachment 8888888 [details]
Bug 1328727: use json-e for cron decision tasks too

https://reviewboard.mozilla.org/r/159924/#review165276

Awesome. I think the main outstanding piece is the parameterized decision tasks for release graphs, which we currently kick off using releasetasks jinja templates.
Maybe I should sneak in an additional `task.extra` something into those, so I know when to fall back to v1 CoT verification. And once we find a solution to that, I think everything can use v2 CoT verification.
Thanks!

::: .taskcluster.yml:10
(Diff revision 1)
>  tasks:
>    $let:
>      # sometimes the push user is just `ffxbld` or the like, but we want an email-like field..
>      ownerEmail: {$if: '"@" in push.owner', then: '${push.owner}', else: '${push.owner}@noreply.mozilla.org'}
> +    # ensure there's no trailing `/` on the repo URL
> +    repoUrl: {$if: 'repository.url[-1] == "/"', then: {$eval: 'repository.url[:-1]'}, else: {$eval: 'repository.url'}}

This works unless there are multiple trailing /s ? :) I don't suppose there's an `.rstrip('/')` equivalent?

Probably not a huge issue either way.
Attachment #8888888 - Flags: review?(aki) → review+
Regarding the trailing `/`, yeah -- https://github.com/taskcluster/json-e/issues/165 adds stripping capabilities.
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5525fb2152c0
vendor json-e 2.1.1; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/88784cbc71cc
use a version-1 .taskcluster.yml; r=aki,jonasfj
https://hg.mozilla.org/integration/autoland/rev/2badbccc0c6e
use json-e for cron decision tasks too; r=aki,jonasfj
Component: Integration → Services
You need to log in before you can comment on or make changes to this bug.