Setup a TC index path for toolchain builds

RESOLVED FIXED in mozilla54

Status

RESOLVED FIXED
2 years ago
7 months ago

People

(Reporter: glandium, Assigned: glandium)

Tracking

unspecified
mozilla54
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

Comment hidden (empty)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 4

2 years ago
mozreview-review
Comment on attachment 8832336 [details]
Bug 1335651 - Automatically add the script to files-changed for toolchain jobs.

https://reviewboard.mozilla.org/r/108664/#review110044
Attachment #8832336 - Flags: review?(dustin) → review+

Comment 5

2 years ago
mozreview-review
Comment on attachment 8832337 [details]
Bug 1335651 - Setup an index path in the gecko.cache namespace for toolchain builds.

https://reviewboard.mozilla.org/r/108666/#review110048

This is a nice design, and will be a great improvement after a bit of refinement.

::: taskcluster/taskgraph/merkle.py:14
(Diff revision 1)
> +
> +
> +def hash_paths(base_path, patterns):
> +    """
> +    Give a list of path patterns, return a digest of the contents of all
> +    the corresponding files, in a flat Merkle tree fashion, not unlike

I'm not sure what you mean by "flat Merkle tree fashion" here.  It looks like it's a simple ordered list of (pathname, hash) tuples for each referenced file.  I suppose that's equivalent to a one-node Merkle tree with paths as keys (rather than dentry names)?

Or am I missing something about how this is implemented?  Is there some optimization to avoid, for example, hashing `js/src/**` over and over?

::: taskcluster/taskgraph/task/transform.py:84
(Diff revision 1)
>      def __init__(self, kind, task):
>          self.dependencies = task['dependencies']
>          self.when = task['when']
>          super(TransformTask, self).__init__(kind, task['label'],
> -                                            task['attributes'], task['task'])
> +                                            task['attributes'], task['task'],
> +                                            task.get('index-paths'))

Probably wise to call this with a keyword argument (`index_paths=task.get('index-paths')`) to allow future expansion.

::: taskcluster/taskgraph/task/transform.py:90
(Diff revision 1)
> +        optimized, taskId = super(TransformTask, self).optimize(params)
> +        if optimized:
> +            return optimized, taskId

* Assume that the toolchain task description contains a few supporting files in `when.files-changed`.
 * I push a commit modifying `toolchain.py`
 * add_index_paths implementation below means that the new contents of `toolchain.py` are included in the hash, but the file is not included in `when.files-changed`.
 * `super().optimize()` does not find an existing indexed task for this hash, so it returns `False, None`
 * The few supporting files in `when.files-changed` haven't been modified, so the task is optimized away
 * The decision task fails because other (build) tasks depend on this toolchain build, which has just been optimized away.

Am I missing something?

In general, for each task I think we want to do one of two types of optimization:

 * "have relevant things changed?" -- for leaf tasks that other tasks do not depend on
   * examples: tests, lint
 * "have I already done this?" -- for tasks that provide artifacts to other tasks
   * examples: docker images, toolchains

Doing both for the same task is, I think, problematic due to issues like those I outlined above.  

Maybe the easiest fix is to say that if a Task can have *either* when.index-paths-exist *or* when.files-changed, but not both (moving `Task.index-paths` to `Task.when['index-paths-exist']` in the process).

::: taskcluster/taskgraph/transforms/job/toolchain.py:48
(Diff revision 1)
>      files.append('taskcluster/scripts/misc/{}'.format(run['script']))
>  
> +    label = taskdesc['label']
> +    subs = {
> +        'name': label.replace('toolchain-', '').split('/')[0],
> +        'digest': hash_paths('.', files),

We've tried to avoid assuming the working directory is the root of the repository, instead calculating GECKO based on the file path.  For example, see `taskcluster/taskgraph/docker.py`.
Attachment #8832337 - Flags: review?(dustin) → review-

Comment 6

2 years ago
mozreview-review
Comment on attachment 8832335 [details]
Bug 1335651 - Move index_paths from DockerImageTask to the base Task class.

https://reviewboard.mozilla.org/r/108662/#review110060
Attachment #8832335 - Flags: review?(dustin) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 13

2 years ago
mozreview-review
Comment on attachment 8832335 [details]
Bug 1335651 - Move index_paths from DockerImageTask to the base Task class.

https://reviewboard.mozilla.org/r/108662/#review113834

Comment 14

2 years ago
mozreview-review
Comment on attachment 8832336 [details]
Bug 1335651 - Automatically add the script to files-changed for toolchain jobs.

https://reviewboard.mozilla.org/r/108664/#review113836

Comment 15

2 years ago
mozreview-review
Comment on attachment 8832337 [details]
Bug 1335651 - Setup an index path in the gecko.cache namespace for toolchain builds.

https://reviewboard.mozilla.org/r/108666/#review113842

One important change below, but with that change I'm happy to see this land.

::: taskcluster/ci/toolchain/linux.yml:21
(Diff revision 3)
> -    when:
> -        files-changed:
> +    extra:
> +        resources:

This will end up appearing in the final task definition, but it's not useful there.  Consider moving it to the `run` section, instead, and represesnting it in `toolchain_run_schema` where some comments can explain what it does?
Attachment #8832337 - Flags: review?(dustin) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 19

2 years ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/a0da8c39431d
Move index_paths from DockerImageTask to the base Task class. r=dustin
https://hg.mozilla.org/integration/autoland/rev/671410de0b24
Automatically add the script to files-changed for toolchain jobs. r=dustin
https://hg.mozilla.org/integration/autoland/rev/e7e02e3c2e56
Setup an index path in the gecko.cache namespace for toolchain builds. r=dustin

Comment 20

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a0da8c39431d
https://hg.mozilla.org/mozilla-central/rev/671410de0b24
https://hg.mozilla.org/mozilla-central/rev/e7e02e3c2e56
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
(Assignee)

Updated

2 years ago
Depends on: 1341213

Updated

7 months ago
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.