Closed Bug 1720715 Opened 4 years ago Closed 4 years ago

Productionize `taskgraph-diff` by integrating it into taskgraph

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(8 files)

Bug 1720715 - Improve '--parameters' help text, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Create and log parameter IDs / specifications, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Allow passing multiple parameters on the same command line, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Run generations with multiple parameters in parallel, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Fix '--verbose' in show taskgraph commands, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Support directories with the --parameters flag, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Refactor taskgraph.util.vcs to use an interface, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1720715 - Support a '--diff' flag in taskgraph's cli, r?#taskgraph-reviewers! 4 years ago Andrew Halberstadt [:ahal] 48 bytes, text/x-phabricator-request		Details \| Review

Andrew Halberstadt [:ahal]

Assignee

Description

•

4 years ago

The taskgraph-diff module is just a couple scripts and a bunch of parameters files. We want to make this tool easier to use and discover.

After putting some thought into this, I’d like to incorporate it into taskgraph itself. My rough plan here is:

Build taskgraph-gen directly into taskgraph
- Add ability to specify multiple --parameters flags
- Add ability to specifiy a dir of params files (like in braindump)
- Resolve a list of parameters + identifiers for each
- If more than one, generate them all in parallel (like taskgraph-gen does currently)
- If more than one, write taskgraph logs to log files (one per params file)
- If more than one, taskgraph output files will be prefixed by params identifier (if specified), by default will still print to stdout (with log lines denoting which params identifier each one is)
Turn taskgraph-diff into a submodule of the taskgraph package
- We can still provide a taskgraph-diff entrypoint which gets invoked the same way it is now
- We can also add a taskgraph --diff convenience flag that can automatically generate two sets of parameters files on different VCS revisions and then call taskgraph-diff on them
  - Updating vcs feels icky, but since implementing it for m-c it’s honestly been one of the biggest time savers I’ve ever made. Plus we have mozversioncontrol which handles abstractions across hg + git.
  - This could be follow-up work, but imo it is the missing link to making taskgraph-diff useful for laypeople
Move actual dummy parameters files into the repos that they are for. E.g, the parameters in the params-fenix dir would go in the actual fenix repo under taskcluster/test/params or somewhere similar.

Andrew Halberstadt [:ahal]

Assignee

Comment 1

•

4 years ago

•

Edited

So presuming this exists in mozilla-central, the common usage of this tool would be to run something like:

$ ./mach taskgraph full --diff -p project=autoland -p project=mozilla-central --fast --target-kind=build

This would generate the taskgraphs on two separate revisions (current and base like it does currently). For each revision it would also generate two graphs, one for autoland and one for central. After the four total graphs are generated, they'd get fed into taskgraph-diff and normal diff output would be displayed. In all four taskgraphs, only the build kind (and dependencies) are generated.

If working on this I'd likely incorporate it into standalone taskgraph first and then try to sync it to m-c after.

Aki Sasaki (not active)

Comment 2

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #0)

The taskgraph-diff module is just a couple scripts and a bunch of parameters files. We want to make this tool easier to use and discover.

After putting some thought into this, I’d like to incorporate it into taskgraph itself. My rough plan here is:

Build taskgraph-gen directly into taskgraph

As long as this means both standalone taskgraph and gecko taskgraph, this sounds good.

* Add ability to specify multiple `--parameters` flags
* Add ability to specify a dir of params files (like in braindump)
* Resolve a list of parameters + identifiers for each
* If more than one, generate them all in parallel (like `taskgraph-gen` does currently)

We did a multiprocessing hack here because we were still in py2; this means that a ctrl-C doesn't actually kill things. I'd lean towards asyncio here instead.

* If more than one, write taskgraph logs to log files (one per params file)
* If more than one, taskgraph output files will be prefixed by params identifier (if specified), by default will still print to stdout (with log lines denoting which params identifier each one is)

Sure, this sounds like it will help debug without having to deal with interleaved logs.

Turn taskgraph-diff into a submodule of the taskgraph package

We can still provide a taskgraph-diff entrypoint which gets invoked the same way it is now

We can also add a taskgraph --diff convenience flag that can automatically generate two sets of parameters files on different VCS revisions and then call taskgraph-diff on them

Updating vcs feels icky, but since implementing it for m-c it’s honestly been one of the biggest time savers I’ve ever made. Plus we have mozversioncontrol which handles abstractions across hg + git.

Hm, l10n-cross-channel does something with hglib where it's able to inspect files in a given revision. I wonder if we can do something with that, or if that's way uglier given how many files are in taskcluster/ and how many files we reference outside of it.

We could do something like requiring a clean tree before updating vcs, or maybe hg does things cleanly.

    * This could be follow-up work, but imo it is the missing link to making `taskgraph-diff` useful for laypeople
Move actual dummy parameters files into the repos that they are for. E.g, the parameters in the params-fenix dir would go in the actual fenix repo under taskcluster/test/params or somewhere similar.

+1, and docs on how to curate these.
Something that makes these harder to maintain is the lack of default parameter values; we may want to raise that in priority.

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

4 years ago

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #2)

Build taskgraph-gen directly into taskgraph

As long as this means both standalone taskgraph and gecko taskgraph, this sounds good.

Yes, I'd sync it to both. Though the cli is pretty divergent already, so it might end up being "implement it for both"

We did a multiprocessing hack here because we were still in py2; this means that a ctrl-C doesn't actually kill things. I'd lean towards asyncio here instead.

Hm, l10n-cross-channel does something with hglib where it's able to inspect files in a given revision. I wonder if we can do something with that, or if that's way uglier given how many files are in taskcluster/ and how many files we reference outside of it.

We could do something like requiring a clean tree before updating vcs, or maybe hg does things cleanly.

Yeah, this came up in review when I implemented it. Mitch suggested the same thing but ultimately we don't know exactly which files we need (other than globs in the sparse-profile definition). At the time we concluded it was more effort than any potential benefit warranted.

Though I also know the moz-phab team was looking into a way of solving this as well. If they get it working, we could try and copy their approach, though again I'm not 100% sure it will translate well to taskgraph.

bhearsum@mozilla.com (:bhearsum)

Comment 4

•

4 years ago

Move actual dummy parameters files into the repos that they are for. E.g, the parameters in the params-fenix dir would go in the actual fenix repo under taskcluster/test/params or somewhere similar.

How confident are we that these will stay up to date? The ones in braindump tend to get out of date and cause invalid testing. Personally, I try to use -p task-id or -p [project] these days, to pull a recent one that I'm certain will give me comparable results to m-c, etc.

Other than that, which is not even a real objection, I'm a big +1 on all of this.

Aki Sasaki (not active)

Comment 5

•

4 years ago

Hm, l10n-cross-channel does something with hglib where it's able to inspect files in a given revision. I wonder if we can do something with that, or if that's way uglier given how many files are in taskcluster/ and how many files we reference outside of it.

Ah, we have to support git if we're in standalone taskgraph.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #4)

Move actual dummy parameters files into the repos that they are for. E.g, the parameters in the params-fenix dir would go in the actual fenix repo under taskcluster/test/params or somewhere similar.

How confident are we that these will stay up to date? The ones in braindump tend to get out of date and cause invalid testing. Personally, I try to use -p task-id or -p [project] these days, to pull a recent one that I'm certain will give me comparable results to m-c, etc.

Other than that, which is not even a real objection, I'm a big +1 on all of this.

I'm confident they'll go out of date, but as long as we have something for every required parameter, it works. Changing parameter files means we can't diff across taskgraph-gen runs, and using -p [project] doesn't include cron or relpro.

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

4 years ago

Yeah, -p project=autoland has real chance of picking up different parameters, but maybe we can make taskgraph-diff smart enough to re-use whatever parameters it used the first time around.

Andrew Halberstadt [:ahal]

Assignee

Comment 7

•

4 years ago

•

Edited

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #5)

I'm confident they'll go out of date, but as long as we have something for every required parameter, it works. Changing parameter files means we can't diff across taskgraph-gen runs, and using -p [project] doesn't include cron or relpro.

Maybe we can add -p task-index=<index> as well so we can include things like cron and relpro. We could make a little manifest that lists all the indices we care about and figure out a way to pass them all into taskgraph. That way we wouldn't need the static files as much anymore.

Edit: though not all contexts have a latest index, so I guess it's still not perfect.

Aki Sasaki (not active)

Comment 8

•

4 years ago

One way might be to publish the parameters used along with the taskgraph-gen output, as artifacts. (I'm not 100% sure if that will be useful or not, but that would allow us to reference and potentially debug.)

bhearsum@mozilla.com (:bhearsum)

Comment 9

•

4 years ago

We did a multiprocessing hack here because we were still in py2; this means that a ctrl-C doesn't actually kill things. I'd lean towards asyncio here instead.

I was thinking about this part, and I'm not sure it's the right choice here. These are CPU intensive tasks AFAIK, which means asyncio probably doesn't speed things up much (we're still stuck in one process). I think we want a `concurrent.futures' instead? https://stackoverflow.com/questions/51271451/how-to-parallelize-computation-with-asyncio

Andrew Halberstadt [:ahal]

Assignee

Comment 10

•

4 years ago

•

Edited

Good call. There's a fair amount of IO involved in taskgraph generation (reading config files, reading moz.build files, etc), but yeah.. I think we should continue using a separate subprocess for each generation here. Haven't looked at what the "hack" is, but switching to a concurrent.futures.ProcessPoolExecutor might still be an improvement.

I'd love to investigate using asyncio within a single invocation of taskgraph such that we can process kinds concurrently (as long as their kind dependencies are already processed), but that's a whole separate project.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Assignee: nobody → ahal

Status: NEW → ASSIGNED

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Depends on: 1725404

Andrew Halberstadt [:ahal]

Assignee

Comment 11

•

4 years ago

•

Edited

Rather than implement the --diff feature into two distinct cli's, I decided to block on bug 1725404 and try to unify the two first.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Depends on: 1726017

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

No longer depends on: 1726017

Andrew Halberstadt [:ahal]

Assignee

Comment 12

•

4 years ago

Attached file Bug 1720715 - Improve '--parameters' help text, r?#taskgraph-reviewers! — Details

Andrew Halberstadt [:ahal]

Assignee

Comment 13

•

4 years ago

Attached file Bug 1720715 - Create and log parameter IDs / specifications, r?#taskgraph-reviewers! — Details

We will soon allow multiple parameter files on the same cli. In anticipation of
this we need to add identifiers (the hash of the parameters) as well as
specifications (how the parameters were designated by the user). This will make
it easier to trach which logs / output belong to which taskgraph generation
from the user's perspective.

Depends on D122991

Andrew Halberstadt [:ahal]

Assignee

Comment 14

•

4 years ago

Attached file Bug 1720715 - Allow passing multiple parameters on the same command line, r?#taskgraph-reviewers! — Details

Depends on D122992

Andrew Halberstadt [:ahal]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1720715 - Run generations with multiple parameters in parallel, r?#taskgraph-reviewers! — Details

Depends on D122993

Andrew Halberstadt [:ahal]

Assignee

Comment 16

•

4 years ago

This is a WIP stack that mostly solves step 1 from comment 0. Still todo here are:

Ability to specify a dir of parameters
Solve the log issue (interleaved logs from multiple generations). Will likely just redirect these to files.

After that is solving the diff portion of the bug. I'd like to try and have all of this work in a single invocation (i.e taskgraph --diff vs taskgraph && hg up other && taskgraph && taskgraph-diff). Though to do this will mean messing around with version control which isn't everyone's cup of tea. For context we already do this in mozilla-central and some safe guards we use include bailing out if the working dir is dirty and trying to update as close as possible back to the original state (e.g ensure bookmarks are activated again with hg).

If there aren't any major objections to this, I'd like to publish Gecko's mozversioncontrol library to Pypi and make taskgraph depend on that rather than re-inventing the wheel there.

bhearsum@mozilla.com (:bhearsum)

Comment 17

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #16)

This is a WIP stack that mostly solves step 1 from comment 0. Still todo here are:

Ability to specify a dir of parameters

Solve the log issue (interleaved logs from multiple generations). Will likely just redirect these to files.

After that is solving the diff portion of the bug. I'd like to try and have all of this work in a single invocation (i.e taskgraph --diff vs taskgraph && hg up other && taskgraph && taskgraph-diff). Though to do this will mean messing around with version control which isn't everyone's cup of tea. For context we already do this in mozilla-central and some safe guards we use include bailing out if the working dir is dirty and trying to update as close as possible back to the original state (e.g ensure bookmarks are activated again with hg).

If there aren't any major objections to this, I'd like to publish Gecko's mozversioncontrol library to Pypi and make taskgraph depend on that rather than re-inventing the wheel there.

I'm on board with all of this. I am one of the folks that's not a fan of messing with the VCS, but it's worth it for the enhanced UX IMO. (Unless you want to consider copying the repo to a tempdir or something, and never touching the original - but that's awfully expensive.)

Andrew Halberstadt [:ahal]

Assignee

Comment 18

•

4 years ago

Heh, that might work for some of the smaller repos.. but it's gonna be a bad time with m-c :p.

I'm definitely open to exploring alternatives down the road (bearing in mind we need to support hg and git), but I don't want to block any of the work currently scoped out for H2 on it. If messing with VCS was a deal breaker, I'd advocate the 4 step method as the next best thing (and promptly write myself a script to avoid it).

bhearsum@mozilla.com (:bhearsum)

Comment 19

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #18)

Heh, that might work for some of the smaller repos.. but it's gonna be a bad time with m-c :p.

I'm definitely open to exploring alternatives down the road (bearing in mind we need to support hg and git), but I don't want to block any of the work currently scoped out for H2 on it. If messing with VCS was a deal breaker, I'd advocate the 4 step method as the next best thing (and promptly write myself a script to avoid it).

Just to be 100% clear...I am not against checking in the VCS stuff. It's not an ideal solution, but it's the best available one. If we can come up with something better later, great, but if not I'm happy for it to live in perpetuity.

Andrew Halberstadt [:ahal]

Assignee

Comment 20

•

4 years ago

Attached file Bug 1720715 - Fix '--verbose' in show taskgraph commands, r?#taskgraph-reviewers! — Details

This flag was previously being ignored.

Depends on D122991

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Depends on: 1714597

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Blocks: 1726573

Pulsebot

Comment 21

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/ci/taskgraph/rev/70cf69f7927a Improve '--parameters' help text, r=taskgraph-reviewers,bhearsum https://hg.mozilla.org/ci/taskgraph/rev/644b0b15ef3a Fix '--verbose' in show taskgraph commands, r=taskgraph-reviewers,bhearsum

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Depends on: 1726831

Andrew Halberstadt [:ahal]

Assignee

Comment 22

•

4 years ago

Attached file Bug 1720715 - Support directories with the --parameters flag, r?#taskgraph-reviewers! — Details

Passing --parameters=<dir> will search <dir> for any .json or .yml therein and
run multiple taskgraph generations in parallel (as if passing multiple -p
flags).

This is primarily intended to be used with the sets of parameters defined in
taskgraph-diff:
https://hg.mozilla.org/build/braindump/file/61ad54b07b06ee98cdd5e1177f423dda62b3ac3b/taskcluster/taskgraph-diff

Depends on D122994

Andrew Halberstadt [:ahal]

Assignee

Comment 23

•

4 years ago

Attached file Bug 1720715 - Refactor taskgraph.util.vcs to use an interface, r?#taskgraph-reviewers! — Details

This also refactors the tests to use parametrization.

Depends on D123563

Andrew Halberstadt [:ahal]

Assignee

Comment 24

•

4 years ago

Attached file Bug 1720715 - Support a '--diff' flag in taskgraph's cli, r?#taskgraph-reviewers! — Details

Depends on D123564

Phabricator Automation

Updated

•

4 years ago

Attachment #9236883 - Attachment description: WIP: Bug 1720715 - Create and log parameter IDs / specifications → Bug 1720715 - Create and log parameter IDs / specifications, r?#taskgraph-reviewers!

Phabricator Automation

Updated

•

4 years ago

Attachment #9236884 - Attachment description: WIP: Bug 1720715 - Allow passing multiple parameters on the same command line → Bug 1720715 - Allow passing multiple parameters on the same command line, r?#taskgraph-reviewers!

Phabricator Automation

Updated

•

4 years ago

Attachment #9236885 - Attachment description: WIP: Bug 1720715 - Run generations with multiple parameters in parallel → Bug 1720715 - Run generations with multiple parameters in parallel, r?#taskgraph-reviewers!

Phabricator Automation

Updated

•

4 years ago

Attachment #9237871 - Attachment description: WIP: Bug 1720715 - Support directories with the --parameters flag → Bug 1720715 - Support directories with the --parameters flag, r?#taskgraph-reviewers!

Phabricator Automation

Updated

•

4 years ago

Attachment #9237872 - Attachment description: WIP: Bug 1720715 - Refactor taskgraph.util.vcs to use an interface → Bug 1720715 - Refactor taskgraph.util.vcs to use an interface, r?#taskgraph-reviewers!

Phabricator Automation

Updated

•

4 years ago

Attachment #9237873 - Attachment description: WIP: Bug 1720715 - Support a '--diff' flag in taskgraph's cli → Bug 1720715 - Support a '--diff' flag in taskgraph's cli, r?#taskgraph-reviewers!

Pulsebot

Comment 25

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/ci/taskgraph/rev/da7774fc0416 Create and log parameter IDs / specifications, r=taskgraph-reviewers,aki https://hg.mozilla.org/ci/taskgraph/rev/4eb4d8bec730 Allow passing multiple parameters on the same command line, r=taskgraph-reviewers,aki

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Blocks: 1729060

Pulsebot

Comment 26

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/ci/taskgraph/rev/5b985cdabe1f Run generations with multiple parameters in parallel, r=taskgraph-reviewers,aki https://hg.mozilla.org/ci/taskgraph/rev/dc5010d02eb0 Support directories with the --parameters flag, r=taskgraph-reviewers,jmaher,aki https://hg.mozilla.org/ci/taskgraph/rev/f7e395a422e2 Refactor taskgraph.util.vcs to use an interface, r=taskgraph-reviewers,aki https://hg.mozilla.org/ci/taskgraph/rev/b1debb683854 Support a '--diff' flag in taskgraph's cli, r=taskgraph-reviewers,aki

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.