Closed Bug 1359942 Opened 3 years ago Closed 2 years ago

Be consistent about the sense of "optimize", and allow optimizing out whole platforms when changes are limited to a single directory

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

Attachments

(1 file)

In YAML files, we have

  when:
    files-changed:
      - js/src/**

which reads nicely in a positive sense -- "run this task when these files have changed".  However, there are two problems.

First, this is the opposite of how optimization works under the hood: optimization means *skipping* a task, where the default is to run it.  Currently the above ends up in the final task as

  optimizations: [
    ['files-changed', 'js/src/**'],
  ]
meaning
 optimize = not any(f matches 'js/src/**' for f in changed_files)

It would make a lot more sense if it was ['files-not-changed', 'js/src/**'].  That's easy enough.

Second, in bug 1359838 there's a case where we want to *skip* a task if all file changes match a given pattern like `mobile/**`.
 optimize = all(f matches 'mobile/**' for f in changed_files).
or
 optimize = not any(not f matches 'mobile/**' for f in changed_files)
which might be expressed as
  optimizations: [
    ['files-not-changed', '!', 'mobile/**'],
  ]
so everything following '!' is expected to *not* match.

Expanding that a little (I'm going to make the part about iOS and ios/** up, but I expect there is or will be an example like this at some point):

 android tasks: skip when changes are limited to 'ios/**' and 'browser/**'
 desktop tasks: skip when changes are limited to 'ios/**' and 'mobile/**'
 ios tasks: skip when changes are limited to 'browser/**' and 'mobile/**'

we would annotate android tasks with
  optimizations: [
    ['files-not-changed', '!', 'ios/**', 'browser/**'],
  ]
meaning
  optimize = not any(not f matches 'ios/**' and not f matches 'browser/**'
                     for f in changed_files)

I expect we could implement these in the YAML somewhere with a mapping from a changes-affect key to specific directory patterns:

  desktop: [browser/**]
  android: [mobile/**]
  ios: [ios/**]
  focus: [focus/**]

and then make a transform which adds all patterns *except* those for the task's changes-affect value, if the task has the `when.changes-affect` property set.

  when:
    changes-affect: android

We can add that value automatically based on build platform in a transform.  "Platform" is a vague concept, so I'm sure that will generate a few dozen follow-on bugs, but until we clean up platforms that's about the best we can do.
I specifically implemented "files-changed" as an opt-in mechanism because as the number of projects in mozilla-central approaches infinity, maintaining opt-out lists (like "files-not-changed") becomes effectively impossible since you'll have ~N exclusion lists each with ~N entries. It is not scalable for each project to reference other projects in this way. It also violates loose coupling: you want each project as standalone as possible - it shouldn't need to know about unrelated projects in the repo. This creates problems when adding a new project because all of a sudden changes to that new project may trigger automation in every other unrelated project. That creates a maintenance burden to adding new projects and undermines the value of the monorepo.

Put in concrete terms, android and desktop builds should be as independent as possible and we shouldn't have e.g. references to browser/ in mobile/android/.

So I'm moderately to strongly opposed to the idea of path-based exclusions referencing other projects because the approach isn't appropriate for a monorepo with discrete projects within.

A problem with the current file-based opt-in approach is you need to annotate pretty much everything before you can start constructing a minimal graph based on what's changed. Currently, task graph starts with a full graph then prunes away nodes that require certain files be touched. We effectively have an implicit "when files-changed: **". For us to change this model and only bring in tasks based on file changes, we need path-based annotations for almost every file or some file changes won't schedule anything! This is the ideal I'd like to get to because it should minimize the work that automation does. But for now, it's safer to assume everything and prune until we have enough annotations to invert the model.

I do like your idea of centralized lists of projects/groupings of file patterns. This concept has been discussed before. Definitely in the context of in-tree module definitions (to which we could attach reviewers). Possibly in the context of the task graph. It would help prevent a lot of redundancy and reduce the potential for pattern lists getting out of sync and automation not running when it is supposed to.
Just to be clear, the first minor fix ('files-not-changed') is just making the name align with the behavior: right now the english read of the data structure {'optimize': {'files-changed': ['js/src/**']}} is "optimize if files under js/src have changed", while the behavior is "optimize when the files under js/src have *not* changed".  So I think that's both quite possible and not controversial.  The specification in the yaml ({'when': {'files-changed': ['js/src/**']}}) reads correctly so that wouldn't change.

Another clarification of my proposal: the yml files would just say "when.changes-affect: android" or "..ios".  It's up to the transforms to translate that into one of the ~N lists of length ~N.  So there's no tight coupling from a user perspective, and no great performance impact for reasonable N's on the order of 10^1.  Especially if we do some caching of the file matching.

As for the exclusions, I think the loose coupling is violated by the way the request is phrased: "don't build Android when only desktop-related files have changed".  It's tempting to think that this is equivalent to "only build Android when Android-related files have changed", but Android-related files includes a nebulous "everything else" category, meaning that it is defined as something like "all files except ios/**, focus/**, .." -- and from there it's just a short walk to my proposal.

I like the eventual direction of tagging every file as affecting specific classes of tasks, then constructing optimization based on that classification -- but we would still be labeling something like tools/ as affecting android, ios, focus, and desktop, so I'm not sure that helps the loose coupling.  And I feel like that's some distance off, and I've proposed a reasonably simple alternative that will get us a substantial reduction in load without waiting for the per-file tagging.
Greg, were my arguments here persuasive? Should I go ahead and implement roughly what I outlined in comment 0?
Flags: needinfo?(gps)
I'm with you nearly 100% on the 1st 2 paragraphs of comment #2.

We're still not seeing eye-to-eye on paragraph #3 and #4. It is a complicated problem and likely requires synchronous comms to resolve. But it sounds like resolving that larger issue isn't super critical right now and we don't need to take action?
Flags: needinfo?(gps)
Perhaps -- I think it's going to continue to be a source of frustration and expense, but we can meter that as it comes in :)
Yeah. People have thought about how to crack this nut for a few years without any obvious solution "winning."

I'm usually not a fan of putting things off to the next all hands meeting, but if there was ever a cross-functional problem that would benefit from the attention and attendance that an all hands meeting can guarantee, this would be it. I'd encourage you to organize a session where the outcome is us deciding on a strategy for annotating files and tasks in order that we drastically improve CI efficiency.

Alternatively, I think dev-builds is the most appropriate mailing list to discuss the topic since whatever we do is likely intricately linked with the build system.
> right now the english read of the data structure {'optimize': {'files-changed': ['js/src/**']}} is "optimize if files under js/src have changed", while the behavior is "optimize when the files under js/src have *not* changed".

I've always thought the problem was with "optimize", not "files-changed". Replace with "skip", and it's clear. Also, using "skip" makes it clear that what happens is *different* from actual "optimize", which avoids running tasks by reusing old artifacts.
Optimize is *both* task-replacement ("this {toolchain,image,build} was already built with taskId xyz, so use that instead") and  skipping ("this task just doesn't need to happen").  If you think of "optimize" as meaning "don't run this task", but sometimes providing a substitute, it makes more sense. We could use

  optimizations:
    - skip-when-files-not-changed: ['js/src/**', ..]

if that would be clearer?
(In reply to Dustin J. Mitchell [:dustin] from comment #8)
> Optimize is *both* task-replacement ("this {toolchain,image,build} was
> already built with taskId xyz, so use that instead") and  skipping ("this
> task just doesn't need to happen").  If you think of "optimize" as meaning
> "don't run this task", but sometimes providing a substitute, it makes more
> sense. We could use
> 
>   optimizations:
>     - skip-when-files-not-changed: ['js/src/**', ..]
> 
> if that would be clearer?

I would find this clearer, and I prefer 'unless' to 'when not'.  (This is straight from the Lisp cannon.)  So:
```yaml
optimizations:
  - skip-unless-files-changed: ['js/src/**', ...]
```

In a different direction, might we address this problem with a file system monitoring solution in the same way that Tup addresses the build dependency problem with file system monitoring?  The more the machines derive, the happier we will be.
(In reply to Nick Alexander :nalexander from comment #9)
> In a different direction, might we address this problem with a file system
> monitoring solution in the same way that Tup addresses the build dependency
> problem with file system monitoring?  The more the machines derive, the
> happier we will be.

You mean using filesystem access monitoring to see what files are used by e.g. certain tests? If so, yes, there are limited uses for that. However, to do this adequately requires transitive file associations. For example, anything running `firefox` loads libxul.so and libxul.so is generated by running a build backend (like `make`) and that pretty much reads the world. Therefore thousands of files are relevant to any process running `firefox`. So there still needs to be some logic to sanitize the dependencies.
(In reply to Gregory Szorc [:gps] from comment #10)
> (In reply to Nick Alexander :nalexander from comment #9)
> > In a different direction, might we address this problem with a file system
> > monitoring solution in the same way that Tup addresses the build dependency
> > problem with file system monitoring?  The more the machines derive, the
> > happier we will be.
> 
> You mean using filesystem access monitoring to see what files are used by
> e.g. certain tests? If so, yes, there are limited uses for that. However, to
> do this adequately requires transitive file associations. For example,
> anything running `firefox` loads libxul.so and libxul.so is generated by
> running a build backend (like `make`) and that pretty much reads the world.
> Therefore thousands of files are relevant to any process running `firefox`.
> So there still needs to be some logic to sanitize the dependencies.

Mmm, you're right, I was thinking of the various YAML inputs.  Difficult stuff.
OK, I like the naming suggestion -- "unless" makes it sound a lot less clunky.

The second part of my proposal, though (starting with "Second, .." in comment 0) is the part I'd like to drill down on a little.  Yeah, it's difficult stuff, and very difficult to do automagically, but I think we can at least make a big dent in the per-platform optimizations using something like what I've suggested.  I don't want to let the perfect be the enemy of the good here, especially since ew don't have a good idea of what the perfect soluion is.
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> OK, I like the naming suggestion -- "unless" makes it sound a lot less
> clunky.
> 
> The second part of my proposal, though (starting with "Second, .." in
> comment 0) is the part I'd like to drill down on a little.  Yeah, it's
> difficult stuff, and very difficult to do automagically, but I think we can
> at least make a big dent in the per-platform optimizations using something
> like what I've suggested.  I don't want to let the perfect be the enemy of
> the good here, especially since ew don't have a good idea of what the
> perfect soluion is.

I don't have a very strong opinion on this, but I lean slightly more towards opt-in for tasks than opt-out.  I do think that there's another expression of gps's opt-in/combinatorial opt-out argument worth noting.  When we _remove_ a platform, if we keep things opt-in we have localized changes; if we move to opt-out, we will have lots of places to update.
Well, some beleaguered for loop in the taskgraph generation will have lots of places to update :)
To add to the bikeshed, I kind of like run-if/skip-if for consistency with manifests. As for opt-in/opt-out, could we not do both?

Task A:
optimizations:
    run-if:
        files-changed: ['ios/**']
        phase-of-the-moon: 'waxing gibbous'

Task B:
optimizations:
    skip-if:
        only-files-changed: ['browser/**', 'mobile/**']
        phase-of-the-moon: 'waxing gibbous'

This way if a task maintainer wants to deal with the pain of opting out of files, they are welcome to do so. Besides I kind of always wanted an easy way to define new predicates to optimize things. Of course this is a lot more work than you likely wanted to deal with :)
There are two negatives here -- do NOT run if these files have NOT changed. We can replace the first with "skip", but I think the second is clearer as "unless" instead of "if-not".  So I've used skip-unless-files-changed.

Regarding doing both, the `optimizations` list supports building up multiple optimizations from different sources.  For example, a test might include both SETA and a files-changed clause.  Allowing both opt-in and opt-out means they can't be composed so easily anymore.  It's also ambiguous: if a task matches neither of its "run-if" or "skip-if", what happens?  Likewise if it matches both.  Or is it impossible to have run-if and skip-if in the same task, in which case do we just always use skip-if, and append `seta` and `foo-files-changed` to that branch?  ..which is equivalent to what we have now :)

I don't think we want a fully expressive logic language here -- and anyway, we have code that can transform complex requirements into simpler equivalents.  The `optimizations` key is currently in disjunctive normal form, with a "true" result causing the task to be skipped. We could change either of those: use conjunctive normal form, or treat "true" as meaning to run the task.
Keywords: leave-open
Comment on attachment 8870841 [details]
Bug 1359942: rename optimization to skip-unless-changed;

https://reviewboard.mozilla.org/r/142410/#review146330

skip-unless-files-changed is a bit long to my taste, though... skip-unless-changed, maybe? On itself, it's ambiguous, but since it would always be followed by paths in the yaml files, it might be clear enough?

::: taskcluster/ci/android-stuff/kind.yml:60
(Diff revision 1)
>              max-run-time: 36000
>          scopes:
>            - docker-worker:relengapi-proxy:tooltool.download.internal
>            - docker-worker:relengapi-proxy:tooltool.download.public
>          optimizations:
> -          - - files-changed
> +          - - skip-unless-files-changed

What's with the double dashes here?
Attachment #8870841 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8870841 [details]
Bug 1359942: rename optimization to skip-unless-changed;

https://reviewboard.mozilla.org/r/142410/#review146330

I like that -- will redraft.

> What's with the double dashes here?

It's a list of lists. android-stuff is sort of a dusty corner of tasks I don't really understand, so it doesn't use the "job" transform that changes "files-changed: .." into an "optimizations: [..]" entry.
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b52ea63719a6
rename optimization to skip-unless-changed; r=glandium
I suspect this broke the 'files-changed' key for source-test tasks. I haven't dug into the issue, but it appears broken and this looks like a likely candidate. For example, this push that modifies a mozbase file should have scheduled the mozbase python task:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62c48862970cddb20d66883f7357260217250564
Flags: needinfo?(dustin)
Nvm, I tried backing this out and jobs still weren't being scheduled:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a45cd1243b617dc36aee99de3eea412af489ed0

Philor had noticed that eslint wasn't working on try, while it is still working on central/autoland. So it must be some try-specific regression. I'll follow-up with you on irc.
Flags: needinfo?(dustin)
See Also: → 1383880
"..and allow optimizing out whole platforms when changes are limited to a single directory" moved to bug 1383880
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.