Closed Bug 1374940 Opened 7 years ago Closed 7 years ago

Add taskgraph support for toolchain definitions

Categories

(Taskcluster :: General, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla56

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(4 files)

      No description provided.
Comment on attachment 8879841 [details]
Bug 1374940 - Cleanup how toolchain names are derived from task labels.

https://reviewboard.mozilla.org/r/151246/#review156072

::: taskcluster/taskgraph/transforms/job/toolchain.py:25
(Diff revision 1)
>  from taskgraph import GECKO
>  
>  
>  TOOLCHAIN_INDEX = 'gecko.cache.level-{level}.toolchains.v1.{name}.{digest}'
>  
> +TOOLCHAIN_ARTIFACTS = {}

I'm not a big fan of this, and how it's used in next patch, and by other things down the road, but I don't have a better idea. I'm open to suggestions.
This is the kind of change that makes everything happen after this bug:
https://hg.mozilla.org/try/diff/462eee9cdf68/taskcluster/ci/build/linux.yml

I wonder if I should enforce the kind-dependency on toolchains... (and if so, how... that's not the kind of thing available in the transform)
Comment on attachment 8879841 [details]
Bug 1374940 - Cleanup how toolchain names are derived from task labels.

https://reviewboard.mozilla.org/r/151246/#review156260

There is, indeed, a race condition here: if TOOLCHAIN_ARTIFACTS is not populated by the time the downstream (build) tasks are generated, you'll get an error.  That kind ordering is probably fairly deterministic, but could change when a new kind is added.  Adding an explicit `kind_dependency` would fix that particular problem, but the use of mutable shared global state like this is still not great design.

I'd prefer to see this implemented as follows:

 * toolchain jobs define a `toolchain_artifact` attribute
 * build tasks have a `kind_dependency` on toolchains
 * the build task toolchain transform looks up the existing toolchain task by its label (prepending the kind name, etc.) and examines its `toolchain_artifact` attribute to calculate the env var contents.

This keeps all the state in the taskgraph. It should be relatively straightforward to calculate the label given the values in the `toolchains` property.

::: taskcluster/taskgraph/transforms/job/toolchain.py:66
(Diff revision 1)
>      # The script
>      files.append('taskcluster/scripts/misc/{}'.format(run['script']))
>  
>      label = taskdesc['label']
>      subs = {
> -        'name': label.replace('toolchain-', '').split('/')[0],
> +        'name': label.replace('%s-' % config.kind, ''),

Did you intend to drop the `split('/')[0]` bit?
Attachment #8879841 - Flags: review?(dustin) → review-
Comment on attachment 8879842 [details]
Bug 1374940 - Allow transforms to access info about tasks from kind dependencies of the current kind.

https://reviewboard.mozilla.org/r/151248/#review156264

::: taskcluster/taskgraph/transforms/toolchain.py:39
(Diff revision 2)
> +            f = os.path.basename(TOOLCHAIN_ARTIFACTS[t])
> +            if f in filenames:
> +                raise Exception('%s-%s cannot use both %s and %s toolchains: '
> +                                'they both have the same artifact name %s'
> +                                % (config.kind, job['name'], filenames[f],
> +                                   t, f))

If I understand, this is a problem only because the downstream task would download both artifacts to the same filename?
Attachment #8879842 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> If I understand, this is a problem only because the downstream task would
> download both artifacts to the same filename?

Indeed.

(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Comment on attachment 8879841 [details]
> Bug 1374940 - Add artifact paths to toolchain jobs definitions.
> 
> https://reviewboard.mozilla.org/r/151246/#review156260
> 
> There is, indeed, a race condition here: if TOOLCHAIN_ARTIFACTS is not
> populated by the time the downstream (build) tasks are generated, you'll get
> an error.  That kind ordering is probably fairly deterministic, but could
> change when a new kind is added.  Adding an explicit `kind_dependency` would
> fix that particular problem, but the use of mutable shared global state like
> this is still not great design.
> 
> I'd prefer to see this implemented as follows:
> 
>  * toolchain jobs define a `toolchain_artifact` attribute

Sure.

>  * build tasks have a `kind_dependency` on toolchains

See the link in comment 5, that's already how it's done. But I was wondering if it should be enforced (and if it can)

>  * the build task toolchain transform looks up the existing toolchain task
> by its label (prepending the kind name, etc.) and examines its
> `toolchain_artifact` attribute to calculate the env var contents.

But how? By accumulating all jobs in a dict first? That would lose the generator nature of the whole chain. Plus, the transforms are called once per kind, with a different TransformConfig for each, so all I see I can do is to accumulate when the kind is toolchains, and reuse when the kind is different, which then means I still have a global state tracking that...
 
> ::: taskcluster/taskgraph/transforms/job/toolchain.py:66
> (Diff revision 1)
> >      # The script
> >      files.append('taskcluster/scripts/misc/{}'.format(run['script']))
> >  
> >      label = taskdesc['label']
> >      subs = {
> > -        'name': label.replace('toolchain-', '').split('/')[0],
> > +        'name': label.replace('%s-' % config.kind, ''),
> 
> Did you intend to drop the `split('/')[0]` bit?

Yes and no, I guess this should go in a different patch. That part comes from when the toolchain jobs had /opt in their name.
Ah, sorry, the missing link is that loader functions get a copy of the taskgraph as generated so far.  This is how tests find the builds they depend on, for example.  So you'd need to write a custom loader based on the existing loader, but doing a little more work around dependencies.  The test loader might offer some good examples.

What I would do in the loader is this: pre-calculate a dictionary mapping toolchain to the necessary metadata (toolchain task label, artifact name, at least). Then attach that dictionary to each job object before yielding it, as something like build['toolchain-info'].  Then in the build transform, combine build['toolchains'] with build['toolchain-info'] to generate the proper dependencies and environment variables.
This appears to not be a small change: I can have a loader for toolchain jobs that collects their artifacts, and fills a dict, but then, to have that dict attached to build jobs, build jobs need to use the same loader... but there are multiple loaders, most of which are independent from the others (only push_apk reuses the transform loader). So the dict would need to be shared somehow, and that would still be global state.

Even if they were all using the same loader, the loader invocations are still independent, so we'd still have global state.

Each invocation could be building its own dict, but then we'd be scanning the entire loaded_tasks for each kind. For the kinds that happen last, that can mean scanning thousands of tasks (btw, I wouldn't be surprised if most loaders scanning loaded_tasks is part of the performance problem running taskgraph)

All things considered, global state doesn't seem too bad to me, and it seems simpler not done at the loader level.

Any better ideas?
Flags: needinfo?(dustin)
(and the sharing between loaders is not hypothetical, l10n jobs, which use a different loader from build jobs, will eventually need toolchain dependencies)
Ah, I didn't realize that more than builds needed toolchains.

The objections to the global-state implementation are a few: keeping any global state; keeping mutable global state in an UPPERCASE_VARIABLE_NAME; and modifying that global state as a side-effect of something that is otherwise purely functional.

How about a memoized function that all relevant loaders can call, passing loaded_tasks along. The loaders can then stick that value into `config['toolchains'] where it's available for use by transforms. Adding it to config will be a one-liner for each loader, and simpler than attaching it to each of the yielded dictionaries.  It's still global state (a memoized function), but at least it looks functional!

We could make that a little more general (maybe a way for a kind to specify extra metadata it needs, or a way for a kind to supply extra metadata to kinds that depend on it), but YAGNI probably applies.

There are only a dozen or so kinds, most of which do not scan -- and iterating over a few thousand tasks is not an especially expensive operation.  Last anyone profiled it, it was the schema validation and the transforms that were the big time consumers.
Flags: needinfo?(dustin)
Attachment #8879842 - Flags: review+ → review?(dustin)
Attachment #8879841 - Flags: review?(dustin)
Attachment #8879842 - Flags: review?(dustin)
Attachment #8879841 - Flags: review?(dustin)
Comment on attachment 8879842 [details]
Bug 1374940 - Allow transforms to access info about tasks from kind dependencies of the current kind.

Please diregard anything mozreview says about r+, this is really r?
Attachment #8879842 - Flags: review?(dustin)
Had to change the format of MOZ_TOOLCHAINS because msys doesn't like :: on the command line, and that busted windows toolchain jobs. Never noticed that before because I didn't have toolchain dependencies on windows jobs yet...
Comment on attachment 8879841 [details]
Bug 1374940 - Cleanup how toolchain names are derived from task labels.

https://reviewboard.mozilla.org/r/151246/#review164184
Attachment #8879841 - Flags: review?(dustin) → review+
Comment on attachment 8879842 [details]
Bug 1374940 - Allow transforms to access info about tasks from kind dependencies of the current kind.

https://reviewboard.mozilla.org/r/151248/#review164186

I like this :)
Attachment #8879842 - Flags: review?(dustin) → review+
Comment on attachment 8887787 [details]
Bug 1374940 - Add artifact paths to toolchain jobs definitions.

https://reviewboard.mozilla.org/r/158708/#review164190

::: taskcluster/taskgraph/transforms/job/toolchain.py:189
(Diff revision 2)
>          ' '.join(hg_command),
>          # do something intelligent.
>          r'{} -c ./build/src/taskcluster/scripts/misc/{}'.format(bash, run['script'])
>      ]
>  
> +    taskdesc['toolchain-artifact'] = run['toolchain-artifact']

It would be better to just add this to `taskdesc.setdefault('attributes', {})` here.  Then `task.py` doesn't need to know anything about it. (same for `docker_worker_toolchain` above)
Attachment #8887787 - Flags: review?(dustin) → review+
Comment on attachment 8887788 [details]
Bug 1374940 - Allow to define a list of toolchains to use for a given TC job.

https://reviewboard.mozilla.org/r/158710/#review164194

This feels unfinished -- I don't see the use_toolchains transform actually used anywhere (that is, referenced from `kind.yml`), and I don't see `task.py` using the `toolchains` property defined in its schema.

I think the key question here is, where do you want to handle this property and convert it to env vars?  It does make sense at the task-description level -- then any task including tests could require a toolchain.  In this case, please add a check to payload builders for worker types that can't support toolchains: `if task['toolchains']: raise Exception('Worker type xx does not support toolchains')`

Another option is to define it at the job level, for one or a few values of `run.using` (probably just `mozharness` for various worker types?).

::: taskcluster/taskgraph/transforms/task.py:157
(Diff revision 2)
>      Required('needs-sccache', default=False): bool,
>  
>      # Path to the artifact produced by the toolchain job
>      Optional('toolchain-artifact'): basestring,
>  
> +    # The list of toolchains to use for the job

Wherever this ends up, please add some of the commentary from the commit message here (especially about the env var)
Attachment #8887788 - Flags: review?(dustin) → review-
(In reply to Dustin J. Mitchell [:dustin] from comment #24)
> Comment on attachment 8887788 [details]
> Bug 1374940 - Allow to define a list of toolchains to use for a given TC job.
> 
> https://reviewboard.mozilla.org/r/158710/#review164194
> 
> This feels unfinished -- I don't see the use_toolchains transform actually
> used anywhere (that is, referenced from `kind.yml`), and I don't see
> `task.py` using the `toolchains` property defined in its schema.

That's because it's going to be setup in another bug. You can see the taskcluster/ci changes in 
https://hg.mozilla.org/try/rev/847b5e0230422be85d500391a1caa1cff7c77bc1 for an example.

> I think the key question here is, where do you want to handle this property
> and convert it to env vars?  It does make sense at the task-description
> level -- then any task including tests could require a toolchain.  In this
> case, please add a check to payload builders for worker types that can't
> support toolchains: `if task['toolchains']: raise Exception('Worker type xx
> does not support toolchains')`
> 
> Another option is to define it at the job level, for one or a few values of
> `run.using` (probably just `mozharness` for various worker types?).

I'm not sure what you're asking here.
Also, the previous version of largely the same patch had r+...
(In reply to Mike Hommey [:glandium] from comment #26)
> Also, the previous version of largely the same patch had r+...

(comment 7)
Blocks: 1382564
Blocks: 1382569
Comment on attachment 8887788 [details]
Bug 1374940 - Allow to define a list of toolchains to use for a given TC job.

https://reviewboard.mozilla.org/r/158710/#review164650

Awesome, I like it!
Attachment #8887788 - Flags: review?(dustin) → review+
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 8bc75f03b8a0 -d 3e61e9a0fd86: rebasing 408227:8bc75f03b8a0 "Bug 1374940 - Cleanup how toolchain names are derived from task labels. r=dustin"
rebasing 408228:9cac1a34f3cb "Bug 1374940 - Allow transforms to access info about tasks from kind dependencies of the current kind. r=dustin"
rebasing 408229:52c80bfe3557 "Bug 1374940 - Add artifact paths to toolchain jobs definitions. r=dustin"
merging taskcluster/ci/toolchain/linux.yml
merging taskcluster/ci/toolchain/windows.yml
merging taskcluster/taskgraph/transforms/task.py
warning: conflicts while merging taskcluster/ci/toolchain/linux.yml! (edit, then use 'hg resolve --mark')
warning: conflicts while merging taskcluster/ci/toolchain/windows.yml! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/b5390e7949af
Cleanup how toolchain names are derived from task labels. r=dustin
https://hg.mozilla.org/integration/autoland/rev/eac3bc21f370
Allow transforms to access info about tasks from kind dependencies of the current kind. r=dustin
https://hg.mozilla.org/integration/autoland/rev/2023ee951ea6
Add artifact paths to toolchain jobs definitions. r=dustin
https://hg.mozilla.org/integration/autoland/rev/ba64040f39ad
Allow to define a list of toolchains to use for a given TC job. r=dustin
You need to log in before you can comment on or make changes to this bug.