Open Bug 1356529 Opened 5 years ago Updated 4 years ago

Add a `mach artifact toolchain` option to get toolchains for use for a specific build job

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(1 file)

No description provided.
Blocks: 1313111
Depends on: 1356937
Depends on: 1356934
Blocks: 1356974
Comment on attachment 8858712 [details]
Bug 1356529 - Add a `mach artifact toolchain` option to get toolchains for use for a specific build job.

https://reviewboard.mozilla.org/r/130734/#review133352

Sorry for the drive by, but I peeked because I'm aware of how convoluted mozharness+tooltool has been....

::: commit-message-80600:28
(Diff revision 2)
> +So, for example, the following command:
> +  mach artifact toolchain --for-job linux64
> +would be equivalent to:
> +  mach artifact toolchain --tooltool-manifest \
> +    browser/config/tooltool-manifests/linux64/releng.manifest \
> +    --from-build linux64-gcc

There are (and are going to be) jobs which need toolchains which are NOT `build-{foo}`. E.g. on `date` branch right now we have `./mach repackage` which needs toolchain entries from OSX-(cross-compile).

Windows will be in a similar situation.

Also affects l10n (and nightly-l10n)

::: python/mozbuild/mozbuild/mach_commands.py:1719
(Diff revision 2)
> +                env = task.task.get('payload', {}).get('env', {})
> +                mozharness_configs = env.get('MOZHARNESS_CONFIG', '')
> +                for c in mozharness_configs.split():
> +                    config = parse_config_file(os.path.join(
> +                        self.topsrcdir, 'testing', 'mozharness', 'configs', c))
> +                    m = config.get('tooltool_manifest_src')

And in particular this gets confusing with branch_config magic in mozharness build jobs, and the l10n jobs using other flags too.

See, environment-config and such in:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/nightly-l10n/kind.yml#90

Lastly, the `tooltool_manifest_src` config entry is not enough to test for this, since we have multiple (argh) ways to specify it.

https://dxr.mozilla.org/mozilla-central/search?q=path%3Amozharness+path%3Aconfig+tooltool&redirect=true

Things I see in a 30 second glance:
* hostutils_manifest_path
* tooltool_manifest_path
* minidump_tooltool_manifest_path
Comment on attachment 8858712 [details]
Bug 1356529 - Add a `mach artifact toolchain` option to get toolchains for use for a specific build job.

https://reviewboard.mozilla.org/r/130734/#review133408

I have to strenuously object to the idea that emulating a part of the decision task is the appropriate way to accomplish your goal. I came here by way of bug 1341934, but I wish I had seen bug 1356524, which is really where this started.  So I apologize for the drive-by.

* It smashes all manner of abstractions, reaching right into the belly of the taskgraph code and grabbing what it wants.
* It also pulls in code from mozharness and tooltool, further knotting those tools together, which will only make it harder to reverse that later.  For tooltool in particular, this adds yet another way of invoking tooltool in tree, on top of the six or so already present, without removing any of those.
* The result is nondeterministic: the optimization process is not deterministic over time, as it may find a task tomorrow that does not yet exist today. So the toolchain used for a particular build may depend on when that build runs.
* Due to that nondeterminism, this is incompatible with the release chain of trust -- and certainly the toolchain should be included in the chain of trust.
* Taskgraph generation may be slow, although I see that you are cherry-picking only the necessary kinds and their dependents.  Still, in so doing you are generating more data than you need, and the optimization phase will make unnecessary index lookups to optimize tasks that will later be ignored. That time and index load can start to add up when spread across multiple builds, and would be a serious issue if this approach were applied to the test -> build dependencies.
* Related, I worry that this approach, which represents a repudiation of the exsting decision-task model, would be replicated and then we would have two active, competing models for deciding what tasks to run in a task graph.
* As you've seen, taskgraph expects a taskcluster proxy.

I think the motivation is

> This also has the advantage that automation uses the same code as what
> local developers will end up using, ensuring it doesn't break.

and I think that this could be accomplished within the existing model of calculating and arranging dependencies once, in the decision task.

Here's what I suggest:
* In the decision task, provide each build task with enough information to point `mach artifact toolchain` to the appropriate toolchain task.
* Use the TC index to store toolchain tasks by whatever identifier is most useful (probably a hash of the inputs) and build an interface for local developers that queries the index.  This is how we do artifact builds, for example.
* Add a short new task which invokes that local interface and compares its results to the decision task's task-graph.json (expecting to either find the same taskId, or to find nothing if the taskId in task-graph.json isn't completed yet).  This will ensure that the local developer interface stays functional and current.  Artifact builds have a similar check.
Attachment #8858712 - Flags: review-
(In reply to Justin Wood (:Callek) from comment #3)
> Comment on attachment 8858712 [details]
> Bug 1356529 - Add a `mach artifact toolchain` option to get toolchains for
> use for a specific build job.
> 
> https://reviewboard.mozilla.org/r/130734/#review133352
> 
> Sorry for the drive by, but I peeked because I'm aware of how convoluted
> mozharness+tooltool has been....
> 
> ::: commit-message-80600:28
> (Diff revision 2)
> > +So, for example, the following command:
> > +  mach artifact toolchain --for-job linux64
> > +would be equivalent to:
> > +  mach artifact toolchain --tooltool-manifest \
> > +    browser/config/tooltool-manifests/linux64/releng.manifest \
> > +    --from-build linux64-gcc
> 
> There are (and are going to be) jobs which need toolchains which are NOT
> `build-{foo}`. E.g. on `date` branch right now we have `./mach repackage`
> which needs toolchain entries from OSX-(cross-compile).

Maybe it wasn't clear, but full names work too for non 'build' jobs. As a matter of fact, it works for spidermonkey jobs, which are not under build either (although those don't run mozharness).

> ::: python/mozbuild/mozbuild/mach_commands.py:1719
> (Diff revision 2)
> > +                env = task.task.get('payload', {}).get('env', {})
> > +                mozharness_configs = env.get('MOZHARNESS_CONFIG', '')
> > +                for c in mozharness_configs.split():
> > +                    config = parse_config_file(os.path.join(
> > +                        self.topsrcdir, 'testing', 'mozharness', 'configs', c))
> > +                    m = config.get('tooltool_manifest_src')
> 
> And in particular this gets confusing with branch_config magic in mozharness
> build jobs, and the l10n jobs using other flags too.
> 
> See, environment-config and such in:
> https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/nightly-l10n/
> kind.yml#90
> 
> Lastly, the `tooltool_manifest_src` config entry is not enough to test for
> this, since we have multiple (argh) ways to specify it.
> 
> https://dxr.mozilla.org/mozilla-central/
> search?q=path%3Amozharness+path%3Aconfig+tooltool&redirect=true
> 
> Things I see in a 30 second glance:
> * hostutils_manifest_path
> * tooltool_manifest_path
> * minidump_tooltool_manifest_path

Oh my. Maybe we should leave non build jobs alone for the first round.

(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Comment on attachment 8858712 [details]
> Bug 1356529 - Add a `mach artifact toolchain` option to get toolchains for
> use for a specific build job.
> 
> https://reviewboard.mozilla.org/r/130734/#review133408
> 
> I have to strenuously object to the idea that emulating a part of the
> decision task is the appropriate way to accomplish your goal. I came here by
> way of bug 1341934, but I wish I had seen bug 1356524, which is really where
> this started.  So I apologize for the drive-by.
> 
> * It smashes all manner of abstractions, reaching right into the belly of
> the taskgraph code and grabbing what it wants.

With a TODO item to move the small function that picks things from the taskgraph code there,

> * It also pulls in code from mozharness and tooltool, further knotting those
> tools together, which will only make it harder to reverse that later.  For
> tooltool in particular, this adds yet another way of invoking tooltool in
> tree, on top of the six or so already present, without removing any of those.

It pulls mozharness until bug 1356952, as mentioned in the commit message, and bug 1356683 replaces a lot of direct calls to tooltool.

> * The result is nondeterministic: the optimization process is not
> deterministic over time, as it may find a task tomorrow that does not yet
> exist today. So the toolchain used for a particular build may depend on when
> that build runs.

The optimization process is deterministic for the purpose of this tool. As a matter of fact, it will pick the same as the decision graph would if it was rerun. How is that not desirable? It's not more or less nondeterministic than what the decision graph is going to do. If there is nondeterminism to address here, then the same nondeterminism needs to be addressed for the decision graph, since they are doing the same thing! (same code, same data)

> * Due to that nondeterminism, this is incompatible with the release chain of
> trust -- and certainly the toolchain should be included in the chain of
> trust.
> * Taskgraph generation may be slow, although I see that you are
> cherry-picking only the necessary kinds and their dependents.  Still, in so
> doing you are generating more data than you need

Sorry if I didn't want to block my work on refactoring the taskgraph code to be able to only generate the necessary data.

>, and the optimization phase
> will make unnecessary index lookups to optimize tasks that will later be
> ignored.

The "optimization phase" is only one call to the optimize_task function for the few tasks we're interested in, in order to find the task_id that holds the artifacts we're interested in.

> That time and index load can start to add up when spread across
> multiple builds, and would be a serious issue if this approach were applied
> to the test -> build dependencies.

I'm not proposing that it is.

> * Related, I worry that this approach, which represents a repudiation of the
> exsting decision-task model, would be replicated and then we would have two
> active, competing models for deciding what tasks to run in a task graph.

This doesn't make any decision about tasks. This actually uses the same decision code on the same data, from a task that exists, to find the right artifacts from its relevant dependencies. It doesn't submit anything to taskcluster.

> * As you've seen, taskgraph expects a taskcluster proxy.

> 
> I think the motivation is
> 
> > This also has the advantage that automation uses the same code as what
> > local developers will end up using, ensuring it doesn't break.
> 
> and I think that this could be accomplished within the existing model of
> calculating and arranging dependencies once, in the decision task.

Which is what is happening.

> Here's what I suggest:
> * In the decision task, provide each build task with enough information to
> point `mach artifact toolchain` to the appropriate toolchain task.
> * Use the TC index to store toolchain tasks by whatever identifier is most
> useful (probably a hash of the inputs) and build an interface for local
> developers that queries the index.

This is *exactly* what's going on with this tool.

> This is how we do artifact builds, for
> example.

The artifact builds code builds its own taskcluster index path. If the taskgraph code changes those index paths, now you have two places to change. Here, the same code is shared. The calculation for the index path for toolchain jobs is more involved. Replicating it would be the mistake.

> * Add a short new task which invokes that local interface and compares its
> results to the decision task's task-graph.json (expecting to either find the
> same taskId, or to find nothing if the taskId in task-graph.json isn't
> completed yet).  This will ensure that the local developer interface stays
> functional and current.  Artifact builds have a similar check.

Artifact builds also have their own non-determinism. Because they will pick whichever last artifact for whichever parent changeset has them. Which means artifact builds can break two or more changesets *after* the real breakage landed. Worse, on try, the AB job can't even pick the build from the try push, and you might as well have broken stuff in your try push, and you won't know until you land on an integration branch and several other things land on top of that.
Comment on attachment 8858712 [details]
Bug 1356529 - Add a `mach artifact toolchain` option to get toolchains for use for a specific build job.

https://reviewboard.mozilla.org/r/130734/#review133534

I'd like you and Dustin to get on the same page with regards to an acceptable strategy here because Dustin has a vision for the task graph and I don't want this work to undermine it too much. Please re-request review from me (and/or Dustin) once you two are in agreement over an approach.

Since this command already queries TC for artifact info, perhaps a compromise would be for the decision task to emit an artifact defining all toolchains, relevant platforms, etc. Then, `mach artifact toolchain` could query that for the URLs to artifacts it cares about without having to care too much task graph implementation details.
Attachment #8858712 - Flags: review?(gps) → review-
Depends on: 1356958
In the time since April, :ahal has added support for running `mach taskgraph` locally and caching the results.  Slightly generalizing that support would give easy access to a completed taskgraph which should make it pretty easy to determine this sort of information.
Hi Mike,

Should we close this as WONTFIX now, or do you still plan to work on this?

Thanks,
Pete
Flags: needinfo?(mh+mozilla)
That's still desired.
Flags: needinfo?(mh+mozilla)
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.