Closed Bug 1383880 Opened 7 years ago Closed 7 years ago

allow optimizing out whole platforms when changes are limited to a single directory

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla58

People

(Reporter: dustin, Assigned: dustin)

References

(Blocks 1 open bug, )

Details

Attachments

(13 files, 5 obsolete files)

13 text/x-review-board-request attachments (59 bytes each), all review+: ahal ×8, gps ×3, glandium ×3
+++ This bug was initially created as a clone of Bug #1359942 +++

In bug 1359838 there's a case where we want to *skip* a task if all file changes match a given pattern like `mobile/**`.
 optimize = all(f matches 'mobile/**' for f in changed_files).
or
 optimize = not any(not f matches 'mobile/**' for f in changed_files)
which might be expressed as
  optimizations: [
    ['files-not-changed', '!', 'mobile/**'],
  ]
so everything following '!' is expected to *not* match.

Expanding that a little (I'm going to make the part about iOS and ios/** up, but I expect there is or will be an example like this at some point):

 android tasks: skip when changes are limited to 'ios/**' and 'browser/**'
 desktop tasks: skip when changes are limited to 'ios/**' and 'mobile/**'
 ios tasks: skip when changes are limited to 'browser/**' and 'mobile/**'

we would annotate android tasks with
  optimizations: [
    ['files-not-changed', '!', 'ios/**', 'browser/**'],
  ]
meaning
  optimize = not any(not f matches 'ios/**' and not f matches 'browser/**'
                     for f in changed_files)
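
Expressed as code, the `files-not-changed` check above boils down to something like this minimal sketch (`should_optimize_away` is a hypothetical helper, and `fnmatch` only approximates the tree's real path matching):

    from fnmatch import fnmatch

    def should_optimize_away(changed_files, patterns):
        # Skip the task when every changed file matches at least one of
        # the given patterns, i.e. the push cannot affect this task.
        return all(any(fnmatch(f, p) for p in patterns) for f in changed_files)

    # an android task annotated with ['files-not-changed', '!', 'ios/**', 'browser/**']:
    should_optimize_away(['browser/base/content/browser.js'],
                         ['ios/**', 'browser/**'])   # True -> skip
    should_optimize_away(['dom/base/Element.cpp'],
                         ['ios/**', 'browser/**'])   # False -> run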

I expect we could implement these in the YAML somewhere with a mapping from a changes-affect key to specific directory patterns:

  desktop: [browser/**]
  android: [mobile/**]
  ios: [ios/**]
  focus: [focus/**]

and then make a transform which adds all patterns *except* those for the task's changes-affect value, if the task has the `when.changes-affect` property set.

  when:
    changes-affect: android

We can add that value automatically based on build platform in a transform.  "Platform" is a vague concept, so I'm sure that will generate a few dozen follow-on bugs, but until we clean up platforms that's about the best we can do.
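
As a rough sketch of that transform (the names here -- `CHANGES_AFFECT_PATTERNS`, `add_files_not_changed` -- are made up for illustration):

    # hypothetical mapping from changes-affect keys to directory patterns
    CHANGES_AFFECT_PATTERNS = {
        'desktop': ['browser/**'],
        'android': ['mobile/**'],
        'ios': ['ios/**'],
        'focus': ['focus/**'],
    }

    def add_files_not_changed(task):
        """Expand when.changes-affect into a files-not-changed optimization
        listing every *other* area's patterns."""
        affect = task.get('when', {}).get('changes-affect')
        if not affect:
            return task
        other_patterns = sorted(
            pattern
            for key, patterns in CHANGES_AFFECT_PATTERNS.items()
            if key != affect
            for pattern in patterns)
        task.setdefault('optimizations', []).append(
            ['files-not-changed', '!'] + other_patterns)
        return task

    # add_files_not_changed({'when': {'changes-affect': 'android'}})
    # -> optimizations: [['files-not-changed', '!', 'browser/**', 'focus/**', 'ios/**']]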
I think this sounds good, but I confess I'm a bit confused by the double negatives and have some questions.

1. How is changes-affect different from files-changed? Can they be merged?
2. Does `when.changes-affect="android"` automatically skip the task if there are only browser+ios changes?
3. Is the plan still to encode this information in moz.build files metadata?
4. What if we want to do:

    when:
        changes-affect:
            - android
            - ios
This is the same proposal we've been talking about for a while - I copy/pasted it from the other bug.  The details are vague enough that I think I'm just going to hack something together as a proposal.

1. `changes-affect` refers to "affectables", named things like "android", rather than files.  It would be transformed into a `files-not-changed` section.
2. yes, that's the idea
3. yes
4. I don't see why that couldn't be supported..
OK, I have a draft whipped up.  Salient bits:

## moz.build:

  with Files("gradle.mumble"):
    # this file can only affect android-related tasks
    AFFECTS_ONLY.platform_family = ['android']

  with Files("testing/something/mochi/something.html"):
    # this file can only affect mochitests
    AFFECTS_ONLY.test_flavor = ['mochitest']

So each file is tagged with what it affects, in zero or more classes of affectables ("platform_family" and "test_flavor" being the classes here).  The default for AFFECTS_ONLY.foo is "everything foo", so modifications to un-labeled files will affect all affectables in the class.

Note that the declarations need to be airtight.  For example, a browser-chrome source file might impact the browser-chrome tests (IMPACTED_TESTS) but could also cause talos to fail.  But a mochitest source file is really *only* relevant to mochitests.

## Push affects

This calculates what a push affects by looking at the AFFECTS_ONLY for the files in the push.  

## Optimization

If `skip-unless-affects` is specified, it gives a set of "affectables".  If the push affects all of those affectables, then the task is run.  So for example an Android mochitest would have ['skip-unless-affects', 'platform_family:android', 'test_flavor:mochitest'].  Then if the push only affects browser-chrome tests, or only affects OS X, the task will be skipped.
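
A sketch of both pieces, using illustrative names and values (not the eventual moz.build or taskgraph API):

    # the full set of known "affectables" per class (example values)
    AFFECTS_ONLY_CLASSES = {
        'platform_family': {'android', 'linux', 'macosx', 'windows'},
        'test_flavor': {'mochitest', 'xpcshell', 'talos'},
    }

    def push_affects(files_affects_only):
        """Union the AFFECTS_ONLY annotations of the changed files; a file
        with no annotation for a class affects everything in that class."""
        affects = {cls: set() for cls in AFFECTS_ONLY_CLASSES}
        for annotations in files_affects_only:   # one dict per changed file
            for cls, full_set in AFFECTS_ONLY_CLASSES.items():
                affects[cls] |= set(annotations.get(cls) or full_set)
        return affects

    def should_skip(task_affectables, affects):
        """skip-unless-affects: run only if the push affects every listed
        affectable; otherwise the task is skipped."""
        for item in task_affectables:
            cls, instance = item.split(':', 1)
            if instance not in affects.get(cls, set()):
                return True
        return False

    # a push touching only a mochitest file skips an Android xpcshell task:
    affects = push_affects([{'test_flavor': ['mochitest']}])
    should_skip(['platform_family:android', 'test_flavor:xpcshell'], affects)  # True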

# Questions

I'm looking for feedback and ideas on a few things:

* Naming.  I hate these names.  "Affectable"?? "Class"? Ugh.  I think this needs some kind of intermediate concept to which both moz.build files and task descriptions can refer.  But what is that called?

* Stylo and similar.  I think we have (or had a few months ago) tests which are only relevant when stylo/** changes, and tests which need not be run when only stylo/** has changed.  Would those affectables be "stylo:yes" and "stylo:no"?  Does it even fit this model?
Flags: needinfo?(ahalberstadt)
What if instead of classes, we could glob (and maybe regex) the task labels?

    with Files("gradle.mumble"):
        AFFECTS_ONLY = ['android/*']

    with Files("testing/mochitest/**"):
        AFFECTS_ONLY = ['*/mochitest*', '*/browser*']

That way we don't need to maintain complicated lists of all these different classes. Also keep in mind inherited values. E.g, in topsrcdir/moz.build we might want:

    with Files("**/*.js"):
        AFFECTS_ONLY = ['eslint']

That should get merged with any AFFECTS_ONLY defined in child moz.build files. With that in mind, AFFECTS_ONLY makes a bit less sense. Maybe something like INFLUENCES might be better? Or even just AFFECTS without the ONLY. Other naming ideas I had were CONFINED_TO and LIMITED_TO, but those are similar to AFFECTS_ONLY.
Flags: needinfo?(ahalberstadt)
I guess my eslint example is bad as that isn't what AFFECTS_ONLY is meant for.. but there might still be cases of inheritance. Though to support that eslint example, maybe there's room for both AFFECTS and AFFECTS_ONLY.
I think eslint would stick with the files-changed optimization.  The optimization this bug is aiming for is about dropping broad swathes of tasks -- all Windows, or all mochitests, or all talos, etc.

The pattern-matching suggestion misses the key feature here, which is that unlabeled files are assumed to affect *all* affectables in a class.  So for example

  with Files("testing/something/mochi/something.html"):
    # this file can only affect mochitests
    AFFECTS_ONLY.test_flavor = ['mochitest']

then that file affects all platform families, but only the mochitest test flavor.  So if it's the only file touched, we'd run mochitests on all platforms, but not xpcshell tests or talos on any platform.

Naming tasks couples tasks to moz.build files, something gps (rightly IMHO) objected to.  It also assumes structure in a hitherto unstructured value. With this arrangement, when I add a new job it will always get optimized until I find all files that might affect it, unless it happens to match some patterns.  And the pattern for, say, all tasks that are related to linux is more complicated than just 'linux.*'.

Naming tasks relies on a human to imagine every possible file that might affect a task. We've seen a lot of trouble with that for files-changed.  In SF, we discussed approaching this from a fail-safe direction: writing lines of config *removes* tasks, so tasks run by default and we can think hard about each bit of added config to make sure it doesn't overreach.

I think inheritance can be useful here, but probably mapping by extension (such as "**/*.rs") will be uncommon.
Motion to refer to 'class' as 'disparate group' from here on out.

(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> I think eslint would stick with the files-changed optimization.  The
> optimization this bug is aiming for is aimed at optimizing away broad
> swathes of tasks -- all Windows, or all mochitests, or all talos, etc.

Right bad example, though the point I was trying to make was that sooner or later someone will try to do inheritance and expect it to work.

> The pattern-matching suggestion misses the key feature here, which is that
> unlabeled files are assumed to affect *all* affectables in a class.  So for
> example
> 
>   with Files("testing/something/mochi/something.html"):
>     # this file can only affect mochitests
>     AFFECTS_ONLY.test_flavor = ['mochitest']
> 
> then that file affects all platform families, but only the mochitest test
> flavor.  So if it's the only file touched, we'd run mochitests on all
> platforms, but not xpcshell tests or talos on any platform.

True, I did miss the fact that we want to support many disparate groups. Though that's sort of tangential to pattern-matching vs maintained lists, e.g:
      
    AFFECTS_ONLY.test_flavor = ['*mochitest*']
    AFFECTS_ONLY.multiprocess = ['*e10s*']


> Naming tasks couples tasks to moz.build files, something gps (rightly IMHO)
> objected to.  It also assumes structure in a hitherto unstructured value.
> With this arrangement, when I add a new job it will always get optimized
> until I find all files that might affect it, unless it happens to match some
> patterns.  And the pattern for, say, all tasks that are related to linux is
> more complicated than just 'linux.*'.

What do you mean by naming tasks?

In your example, the label 'mochitest' will at some point need to be converted to a list of tasks. To me that ties tasks to moz.build an equal amount as pattern-matching does, just more implicitly.

I promise I'm not trying to give you a hard time, I don't actually feel that strongly about pattern-matching :). I was worried that mapping 'disparate group' names to tasks would be messy and was looking for an alternative solution. But I think this is going to be messy no matter what.
(In reply to Andrew Halberstadt [:ahal] from comment #8)
> Motion to refer to 'class' as 'disparate group' from here on out.

Well, I'll second a motion to find a better name than "class" but I don't think "disparate group" is the answer.

> Right bad example, though the point I was trying to make was that sooner or
> later someone will try to do inheritance and expect it to work.

Ah, so I think you're just saying I should use += instead of =.  That's probably a good idea, although having two different configurations saying "affects only X and nothing else" and "affects only Y and nothing else" combine into "affects only X and Y and nothing else" is a little counter-intuitive.  Maybe I should try to forbid inheritance?

> True, I did miss the fact that we want to support many disparate groups.
> Though that's sort of tangential to pattern-matching vs maintained lists,
> e.g:
>       
>     AFFECTS_ONLY.test_flavor = ['*mochitest*']
>     AFFECTS_ONLY.multiprocess = ['*e10s*']
> 
> 
> > Naming tasks couples tasks to moz.build files, something gps (rightly IMHO)
> > objected to.  It also assumes structure in a hitherto unstructured value.
> > With this arrangement, when I add a new job it will always get optimized
> > until I find all files that might affect it, unless it happens to match some
> > patterns.  And the pattern for, say, all tasks that are related to linux is
> > more complicated than just 'linux.*'.
> 
> What do you mean by naming tasks?

I mean referring to specific tasks by label.

> In your example, the label 'mochitest' will at some point need to be
> converted to a list of tasks. To me that ties tasks to moz.build an equal
> amount as pattern-matching does, just more implicitly.

It doesn't actually need to be converted to a list of tasks -- rather, task optimization looks at the set of "affectables" (sorry, we still haven't invented a better word) to decide whether each task should be skipped.

> I promise I'm not trying to give you a hard time, I don't actually feel that
> strongly about pattern-matching :). I was worried that mapping 'disparate
> group' names to tasks would be messy and was looking for an alternative
> solution. But I think this is going to be messy no matter what.

It's OK, I appreciate the conversation.  I think this is demonstrating that I am not explaining the problem well enough.  I'll keep fiddling until I find something that "clicks".  When I do, I'll probably write a blog entry too (as well as some in-tree docs).
I've spent some time scribbling on scratch paper, and I have convinced myself that simple tags (without classes/disparate groups) is the right approach - it ends up running fewer jobs than what I was suggesting, but in every case for the jobs where the two approaches differ, the job really doesn't need to be run.

I'm going to write this up in a blog article.
Comment on attachment 8890085 [details]
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations;

https://reviewboard.mozilla.org/r/161160/#review168046

I just did a drive-by review because I saw the bug activity and was intrigued.

I like the approach and the build system changes look pretty good! I don't anticipate any major review issues there.

I'm anxious to see how this plays out!

One question I have (which may be answered in previous comments or in a promised blog post) is how "optimizations" will scale to multiple elements. Do you combine entries with AND or OR? Can you have complex conditions? That's always a fun problem :)

::: python/mozbuild/mozbuild/affects.py:39
(Diff revision 1)
> +    for p, m in reader.files_info(paths).items():
> +        affects_only = m['AFFECTS_ONLY']
> +        for class_name, full_set in AFFECTS_ONLY_CLASSES.items():
> +            # not specifying AFFECTS_ONLY means it affects all instances in the class
> +            instances = set(affects_only[class_name]) or full_set
> +            class_instances = affects.setdefault(class_name, set())

Nit: you can use a collections.defaultdict to avoid this pattern.
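
For illustration, the pattern the nit refers to could look like this with a defaultdict (a sketch, not the actual patch):

    from collections import defaultdict

    # class name -> set of affected instances; the per-class set is
    # created on first access, so no setdefault() is needed
    affects = defaultdict(set)
    for class_name, instances in [('platform_family', {'android'}),
                                  ('test_flavor', {'mochitest'})]:
        affects[class_name] |= instances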
(In reply to Gregory Szorc [:gps] from comment #11)
> One question I have (which may be answered in previous comments or in a
> promised blog post) is how "optimizations" will scale to multiple elements.
> Do you combine entries with AND or OR? Can you have complex conditions?
> That's always a fun problem :)

The tags refer to sets of tasks, so no -- no complex expressions.  A task can have multiple tags (OR) and a file can affect multiple tags (AND).  That's about it.  KISS :)
Just to clarify, as soon as a file is tagged with AFFECTS_TASKS, modifying said file will no longer schedule every task, right?

I think the name AFFECTS_TASKS doesn't make this very clear, i.e some developer might come along and use it with a wide-reaching glob expression not realizing they're disabling test coverage for those files. AFFECTS_ONLY was a bit better, but it's still sort of ambiguous.. what does AFFECTS even mean? I think the name should be more imperative, maybe something like RUN_ONLY_TASKS.

Another edge case to consider is making sure files-changed has precedence over the tagging. For example, if we have both:

    someTask['when']['files-changed'] = ['mobile/**']

and:

    with Files('mobile/**'):
        AFFECTS_TASKS = ['android']

Then `someTask` should run, even if it doesn't have the 'android' tag.

Anyway, I really like this idea, I'm excited to see it land. I'm already thinking of tons of different places this could be applied :)
> right?

Exactly.

A lot of head-scratching on my part has failed to come up with a good name. I'm happy to use "RUN_ONLY_TASKS".  I like "affects" because it makes grammatical sense in the task definitions too (when.changes-affect and optimizations.skip-unless-changes-affect).  I'm not sure what those would be for RUN_ONLY_TASKS.  How about AFFECTS_ONLY_TASKS?  Naming is hard.

> Another edge case to consider is making sure files-changed has precedence over the tagging.

I'm tempted to just disallow this combination at the job level; that is, you can have `when.files-changed` or `when.changes-affect`, but not both.  The background to that is the optimization process, which effectively OR's the reasons for *skipping* a task. That's equivalent to AND'ing the reasons to run a task.

Do you see a use-case for both?  If so, we could change the optimization process.. it's all code after all.  And I might see such a case, between SETA and changes-affect: if SETA decides it's time to run test task X, then we want to run that task even if this particular push doesn't affect it, since a previous push might have affected it.
Never mind my request for a use-case for both.  The SETA example is enough :)
I think linting is another one. They'll have `files-changed: ["**/*.js"]` and will presumably still want to run on Android JS files without being tagged as android. Otherwise lint files would need to have every tag.
s/lint files/lint tasks
So the options are:

      CHANGE   OPTION OPTION OPTION
SETA  AFFECTS    1      2      3
----  ----     ----   ----   ----
skip  skip     skip   skip   skip
skip  run      skip   run    skip
run   skip     skip   run    run
run   run      run    run    run

OPTION 1 is what we have now: if any optimization says skip, we skip.  OPTION 2 would be the suggested change.  I think what we want is OPTION 3: if SETA says skip it, we always skip it because it's low-value.  But when SETA decides not to skip, then we should always run it regardless of the contents of the push.

Obviously OPTION 3 is equivalent to omitting the change-affects optimization. But SETA is only enabled on some trees, and where it's disabled the behavior should be that of change-affects.

So I think I've talked myself into this:
 - disallow when.files-changed use on the same task as when.changes-affect
 - for tests, either use seta or files-changed, not both
 - don't otherwise change the logic of optimizations (keep the behavior I described in comment 15)
For linting, I think we would just switch to 

with Files('**/*.js'):
    AFFECTS_TASKS += ['lint']

and the .js files under mobile/android would also have AFFECTS_TASKS += ['android'], so mobile/android/foo/bar.js would have ['android', 'lint'].  If you modified that file, then we'd run all tasks tagged 'android' *or* 'lint'.  Which is what we want?
> A lot of head-scratching on my part has failed to come up with a good name.
> I'm happy to use "RUN_ONLY_TASKS".  I like "affects" because it makes
> grammatical sense in the task definitions too (when.changes-affect and
> optimizations.skip-unless-changes-affect).  I'm not sure what those would be
> for RUN_ONLY_TASKS.  How about AFFECTS_ONLY_TASKS?  Naming is hard.

I'd be happy with AFFECTS_ONLY_TASKS too. Fwiw, I don't think internal consistency with the taskgraph module is all that important. I see the moz.build variables as developer facing APIs, and making sure the names are intuitive to the people using them is >> making sure the name matches up with the naming taskgraph uses. But AFFECTS_ONLY_TASKS seems just as good as RUN_ONLY_TASKS.


(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> For linting, I think we would just switch to 
> 
> with Files('**/*.js'):
>     AFFECTS_TASKS += ['lint']
> 
> and the .js files under mobile/android would also have AFFECTS_TASK +=
> ['android'], so mobile/android/foo/bar.js would have ['android', 'lint']. 
> If you modified that file, then we'd run all tasks tagged 'android' *or*
> 'lint'.  Which is what we want?

We can't add AFFECTS_TASKS to **/*.js globally, so I'll assume you mean we'd add it to /mobile/moz.build.

But I don't think that's going to work either. It means every time we add a tag anywhere, we also need to remember to apply the lint tag to all appropriate files or else we'll accidentally disable the linter on those files. And keep in mind, it's not just **/*.js.. it's every 'files-changed' item from every task in taskcluster/ci/source-test/mozlint.yml. Just the extensions we'll need to define include: py, js, jsm, jsx, xul, html, xhtml, .flake8, .eslintrc, .eslintignore and soon: cpp, c, h, rs. Then there are all the directories on top of that.

Right now if we want to modify when a lint task runs, we can do so in a single location. Under this new system, we'd need to hunt down N different locations scattered all over the tree. And if we miss a location, it'll fail silently.
Regarding naming, I think this is all developer-facing.  There's no point saying AFFECTS_ONLY_TASKS = ['foo'] if foo isn't defined in the taskgraph config, and if foo is an "optimization tag" in the taskgraph but an "affected task" in moz.build, then that's going to be a speed-bump in someone figuring out how this works.

Regarding lint -- this bug is about "optimizing out whole platforms", or more generally large classes of tasks we can *drop* when the files changed couldn't possibly affect them.  Lint doesn't fit that shape. A change to a .js file does not *only* affect lint.  So I think it makes sense to continue using when.files-changed for lint and **/*.js.

I don't think when.changes-affect and when.files-changed can be meaningfully mixed in the same task.  The one lets source files control what runs; the other lets task definitions control what runs.  Even if we were able to make a design where behavior of a task serving both masters was well-defined in every possible case, that well-defined behavior would be confusing as heck.

Which brings me back to

> So I think I've talked myself into this:
>  - disallow when.files-changed use on the same task as when.changes-affect
>  - for tests, use at most one of seta, when.files-changed or when.changes-affect

In fact, I've convinced myself that any task should only have one optimization - that eliminates all ambiguity of how multiple optimizations combine.
(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> Regarding lint -- this bug is about "optimizing out whole platforms", or
> more generally large classes of tasks we can *drop* when the files changed
> couldn't possibly affect them.  Lint doesn't fit that shape. A change to a
> .js file does not *only* affect lint.  So I think it makes sense to continue
> using when.files-changed for lint and **/*.js.

I agree, but on irc you said something like this makes sense:
https://pastebin.mozilla.org/9028560

Have you changed your mind?


(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> Which brings me back to
> 
> > So I think I've talked myself into this:
> >  - disallow when.files-changed use on the same task as when.changes-affect

I think that's fair, especially if you want to keep it simple for the first iteration. The original goal was to optimize android tasks, and I understand if you don't want to get distracted by minor implementation details.

That said, I'd be sad if the option to support both files-changed and changes-affect at the same time wasn't at least considered for follow-up work. I think there is a lot of potential to optimize away small non-platform related swathes of the code base. Individually these optimizations might not be nearly as impactful as something like a whole platform, but there are a ton of places they could be made and would add up to a significant savings.
> Have you changed your mind?

Yes, you were gone by the time I figured that out though :)

> That said, I'd be sad if the option to support both files-changed and changes-affect at the same time wasn't at 
> least considered for follow-up work.

We can work on it later, but I think we'll find another way to suit these needs - probably by allowing more expressiveness in one of the two existing optimizations.
So my plan for now:
 - land the in-tree infrastructure for this
 - land per-platform-family "affects" support, although I only know of mobile/android
   - per-test-suite can come later, when the concept is proven, and when analysis shows it's useful
 - modify optimization to support optimizing away whole dependency trees
   - if A depends on B, it is only an error if A is not optimized but B is optimized away (see the sketch below)
   - support for tasks that get optimized if any of their dependencies do, for things like signing or symbol uploads
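
A minimal sketch of the dependency rule in the first sub-bullet (hypothetical helper; `dependencies` maps each task label to the labels it depends on, `removed` is the set of optimized-away labels):

    def check_optimized_dependencies(dependencies, removed):
        """It is only an error when a task that is *not* optimized away
        depends on one that is; a removed task may depend on anything."""
        errors = []
        for label, deps in dependencies.items():
            if label in removed:
                continue
            errors.extend(
                '{} depends on optimized-away {}'.format(label, dep)
                for dep in deps if dep in removed)
        return errors

    # check_optimized_dependencies({'test-A': ['build-B']}, removed={'build-B'})
    # -> ['test-A depends on optimized-away build-B']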
Blocks: 1386625
I'd like to make the observation that `files-changed` in taskgraph YAML files is only syntactically different from `with Files` context managers in moz.build files, i.e. you can express `files-changed` with some `with Files(pattern): AFFECTS*` primitive.

At the point we are defining patterns in both places, I'd encourage you to nuke the patterns from the YAML and do everything in moz.build so file-based metadata isn't fragmented.

Also, I had the random thought of using "schedules" as the primary verb (or noun) for the moz.build variables to declare impact to what gets executed. e.g.

   with Files('**/*.js'):
       SCHEDULES += ['js-lint']

Again, you are all thinking about this a lot more than me. I'm mostly lobbing random ideas in case any of them turn out to be good :)
(In reply to Gregory Szorc [:gps] from comment #26)
> I'd like to make the observation that `files-changed` in taskgraph YAML
> files are only semantically different from `with Files` context managers in
> moz.build files. i.e. you can express `files-changed` with some `with
> Files(pattern): AFFECTS*` primitive.

This is true, but we can't use AFFECTS_ONLY_TASKS. We'd need both AFFECTS_ONLY_TASKS and AFFECTS_TASKS for this to work.

> 
> At the point we are defining patterns in both places, I'd encourage you to
> nuke the patterns from the YAML and do everything in moz.build so file-based
> metadata isn't fragmented.

I thought dustin said you had previously objected to something like this (in comment 7). 

(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Naming tasks couples tasks to moz.build files, something gps (rightly IMHO)
> objected to.

Personally I think I agree. The 'files-changed' config is from the task's perspective, whereas AFFECTS_ONLY is from a file's perspective. So although the fragmentation is unfortunate, I think keeping them separate makes it easier to reason about.
Greg, you're absolutely right on both counts.

I do think that we need both inclusive ("run X if files Y or Z changed", maybe "AFFECTS_TASKS"?) and exclusive ("run only X if only files Y or Z change", "AFFECTS_ONLY_TASKS").  The former is good for jobs with a bounded set of inputs, like docs or lint, while the latter is good for cases where a directory is specific to a thing and we can happily drop all other things. This gets into gps's conception of gecko as a monorepo with lots of projects in it -- exclusive is going to be critical in that environment.

Configuring all of the patterns in the source makes sense to me, and I think it avoids the objection if the configuration points to some intermediary tags, rather than specific tasks.

So how about this proposal: define a set of allowable SCHEDULES tags.  Some of them are inclusive and some are exclusive.  In the root moz.build, we have

    with Files('**'):
        SCHEDULES = [..all exclusive tags..]

and

    with Files('**/*.js'):
        SCHEDULES += ['js-lint']

Then for mobile/android/moz.build

    with Files('**'):
        SCHEDULES = ['android']

There are two downsides here:

 1. the difference between += for inclusive and = for exclusive is pretty subtle
 2. the order these are applied would matter; even in the example above mobile/android/foo.js will get tagged 'android' and not 'js-lint'.

Thoughts on how to fix that?

We're making a little bit of progress, and also going in some circles here.  I don't mind as long as the progress eventually wins!
Yes, relying on += vs = is *really* subtle. Note that mozbuild.frontend.context:Files has some magic variables that control how instances are "merged" as it combines instances from parents. The FINAL variable is relevant. See Files.__iadd__. Also read up at https://gecko.readthedocs.io/en/latest/build/buildsystem/files-metadata.html. The logic for individual variables can be changed. See also files_info() in mozbuild.frontend.reader.

Also, the sandbox prevents assignment of an already defined UPPERCASE variable. I'm not 100% sure this works within contexts like Files (it should IMO). If it does, you could hack Files.__enter__ to always define a SCHEDULES. That would force people to do something like `SCHEDULES[:] = ['replacement']` to overwrite an existing list. Not sure why you would ever do that. You probably want __iadd__ to union the values.

Also, don't be afraid to invent additional variables or a tagging/grouping mechanism as necessary. For example, the top-level moz.build could define a global list of COMPONENTS and individual Files() could register themselves as belonging to components. The scheduling info could be associated with a component instead of a file. I dunno. Again, throwing ideas out.

FWIW, you may be interested in how binaries are declared. Essentially, any moz.build can declare that it has sources part of a binary. After reading all moz.build files, we verify that we have a primary entry for that binary declared *somewhere*. But it doesn't need to be known at the time of individual moz.build evaluation. https://gecko.readthedocs.io/en/latest/build/buildsystem/defining-binaries.html.
> You probably want __iadd__ to union the values.

Not for exclusive mode where we want to say things like "these files affect only pocket, not firefox or focus or ..".

> moz.build could define a global list of COMPONENTS

This is much nicer than declaring them in mozbuild.affects, but does mean we'd need to read moz.build files to build the task description schema.  Maybe that's reasonable..

Here's another sketch:

---
SCHEDULING_COMPONENTS.inclusive = ['js-lint', 'py-lint', 'docs']
SCHEDULING_COMPONENTS.exclusive = ['android', 'macosx', 'windows', 'linux']

with Files('**/*.js'):
    SCHEDULES.inclusive += ['js-lint']

with Files('**/*.py'):
    SCHEDULES.inclusive += ['py-lint']

with Files('mobile/android/**'):
    SCHEDULES.exclusive = ['android'] 
---

SCHEDULES would disallow assignment for inclusive and disallow += for exclusive.  Both values would be limited to the declared components.  The default values are SCHEDULES.inclusive = [] and SCHEDULES.exclusive = SCHEDULING_COMPONENTS.exclusive.  Reading from SCHEDULES would return the union of inclusive and exclusive, and files_info would take the union of SCHEDULES for each path.

This makes things a little more explicit with "inclusive' and "exclusive", which can be defined somewhere in the docs, and enforces that they are used with the proper = or +=.

Then:
 configure.in: ['android', 'macosx', 'windows', 'linux']
 taskcluster/ci/build/kind.yml: ['android', 'macosx', 'windows', 'linux']
 mobile/android/foo.js: ['android', 'js-lint']
 dom/bar.py: ['android', 'macosx', 'windows', 'linux', 'py-lint']

This leaves 

with Files('python/mozlint/**'):
    SCHEDULES.inclusive += ['py-lint']

when really that's exclusive (no need to build android, etc. if only lint config has changed).  So as an alternative to this sketch, we could allow inclusive components to be used exclusively, but not the other way around.
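
To make the inclusive/exclusive semantics concrete, here is a small standalone model of the sketch above (purely illustrative; real enforcement of the = / += rules would live in the moz.build sandbox, as gps describes above, and validation against the declared components is omitted):

    INCLUSIVE_COMPONENTS = ['js-lint', 'py-lint', 'docs']
    EXCLUSIVE_COMPONENTS = ['android', 'macosx', 'windows', 'linux']

    class Schedules(object):
        """inclusive starts empty and only grows (+=); exclusive defaults
        to all exclusive components and is narrowed by assignment."""
        def __init__(self):
            self.inclusive = []
            self.exclusive = list(EXCLUSIVE_COMPONENTS)

        @property
        def components(self):
            # reading SCHEDULES yields the union of inclusive and exclusive
            return set(self.inclusive) | set(self.exclusive)

    # mobile/android/foo.js under the sketch above:
    s = Schedules()
    s.inclusive += ['js-lint']   # from the root Files('**/*.js') block
    s.exclusive = ['android']    # from Files('mobile/android/**')
    sorted(s.components)         # ['android', 'js-lint']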
Attachment #8890085 - Attachment is obsolete: true
I'm pushing a draft of this idea for review.  It has some weaknesses:
 * I'm not totally happy with the SCHEDULES syntax - feedback welcome
 * it'd be good to have a single source of the whole set of components, for schema validation - how can that be in moz.build?
 * I think the optimization logic needs more work to support optimizing both tests and the builds they depend on

When I have the thing working, I'd like to take a sample of pushes to m-c and then look at the components selected for those pushes.  I'd also like to optimize some full task graphs with different combinations of components.  I haven't done any of that yet.
Sorry, the reason I started that comment was to say, this is a work in progress, so please don't consider it a full review - I'm not looking for r+.
Comment on attachment 8896094 [details]
Bug 1383880: optimize in three phases;

https://reviewboard.mozilla.org/r/167368/#review172734
Attachment #8896094 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896092 [details]
Bug 1383880: allow only one optimization per task

https://reviewboard.mozilla.org/r/167364/#review172736

This is an improvement, so I'll change this to an r+ if you want.. but I think the issue below will be really useful, especially for sheriffs.

::: taskcluster/taskgraph/optimize.py:122
(Diff revision 1)
> +    logger.info("optimized {} tasks by replacing with existing tasks and optimized {} tasks away"
> +                .format(optimized_replaced, optimized_away))

Can we get `optimize_task` to return a tuple of `(opt_result, opt_label)`? Then we can store the counts in a `{opt_label: count}` dict and log how many tasks were optimized from each, e.g:

optimized 15 tasks by replacement
optimized 24 tasks by SETA
optimized 70 tasks by exclusive('android')

I think this would really help the sheriffs figure out why there are so many jobs missing on a given push (especially before they get used to this).

It might even be worth getting treeherder to highlight these lines.
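
A sketch of the suggested bookkeeping (names are illustrative):

    from collections import defaultdict

    opt_counts = defaultdict(int)

    def record(opt_result, opt_label):
        # tally each optimization that fired, keyed by the label returned
        # alongside the result from optimize_task
        if opt_result:
            opt_counts[opt_label] += 1

    # after the optimization loop:
    # for label, count in sorted(opt_counts.items()):
    #     logger.info('optimized {} tasks by {}'.format(count, label))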
Attachment #8896092 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8896093 [details]
Bug 1383880: allow only one optimization per task;

https://reviewboard.mozilla.org/r/167366/#review172732

::: commit-message-1a763:6
(Diff revision 1)
> +but one of the optimizations. For example, given both `seta` and
> +`skip-unless-files-changed` optimizations, if SETA says to skip a test, it is
> +low value and should be skipped regardless of what files have changed. But if

Not that it matters given what this patch is doing, but I'd argue the opposite. If when.files-changed says to run a test, we would run it regardless of SETA.
Attachment #8896093 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896095 [details]
Bug 1383880: add support for SCHEDULES in moz.build;

https://reviewboard.mozilla.org/r/167370/#review172768

::: python/mozbuild/mozbuild/frontend/context.py:575
(Diff revision 1)
> +     * VAR.exclusive can only be assigned to (no +=), and can only contain
> +       values from SCHEDULING_COMPONENTS.inclusive or .exclusive
> +    """
> +    __slots__ = ('_exclusive', '_inclusive')
> +
> +    INCLUSIVE_COMPONENTS = ['py-lint', 'js-lint', 'yaml-lint']

Let's leave linting stuff out of it for now, and add it back in when/if we move it out of when.files-changed in a follow-up.
Comment on attachment 8896095 [details]
Bug 1383880: add support for SCHEDULES in moz.build;

https://reviewboard.mozilla.org/r/167370/#review172770

::: python/mozbuild/mozbuild/frontend/mach_commands.py:228
(Diff revision 1)
>          reader = self._get_reader(finder=reader_finder)
>          return reader.files_info(allpaths)
> +
> +
> +    @SubCommand('file-info', 'schedules',
> +                'Show the combined AFFECTS_ONLY_TASKS for the files listed.')

s/AFFECTS_ONLY_TASKS/SCHEDULES
Comment on attachment 8890085 [details]
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations;

https://reviewboard.mozilla.org/r/161160/#review172778

::: commit-message-462d7:3
(Diff revision 2)
> +Bug 1383880: Annotate builds and tests with SCHEDULES optimization; r?ahal
> +
> +This means that a push to try affecting only Android only run android builds

nit: Android will only run

::: taskcluster/taskgraph/transforms/tests.py:918
(Diff revision 2)
> -        # run SETA unless this is a try push
> -        if config.params['project'] == 'try':
> -            jobdesc['when'] = test.get('when', {})
> -        else:
> +        # decide how to optimize this task; for non-try branches, that means SETA
> +        if config.params['project'] != 'try':
> +            # when SETA is enabled, the 'when' does not apply (optimizations don't mix),
> +            # so test['when'] is ignored
> -            # when SETA is enabled, the 'when' does not apply (optimizations don't mix)
>              jobdesc['optimization'] = {'seta': None}
> +        elif 'when' in test:
> +            # test specified a `when` clause, so use that
> +            jobdesc['when'] = test['when']
> +        else:
> +            build_platform = attributes['build_platform']
> +            jobdesc['optimization'] = {'skip-unless-schedules': [platform_family(build_platform)]}

Wait, so does this mean this is only being enabled on try? That's a bit disappointing, I thought this was being enabled across all trees.

Bit of an aside, but I don't like that SETA is the optimization of most importance here because it's very naive. SETA only looks at tasks based on how frequently they fail, but it can't distinguish between:

1) Didn't fail because it's not useful
2) Didn't fail because nothing changed that can affect it

So when SETA says "don't run this task", that doesn't necessarily mean it isn't useful for the given push.

The new SCHEDULES mechanism (and even when.files-changed) has real insight into whether the task is affected by the push or not (insight that SETA doesn't have the benefit of), so for that reason I think SCHEDULES should take precedence over SETA.

But I'm digressing, this is probably all debate best left to a follow-up. Even getting this landed on try as a first step is better than nothing.
Attachment #8890085 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896096 [details]
Bug 1383880: add support for optimizing tasks based on SCHEDULES;

https://reviewboard.mozilla.org/r/167372/#review172766

::: taskcluster/taskgraph/transforms/job/__init__.py:62
(Diff revision 1)
> -    Optional('optimization'): task_description_schema['optimization'],
> +    Exclusive('optimization', 'optimization'): task_description_schema['optimization'],
>      Optional('needs-sccache'): task_description_schema['needs-sccache'],
>  
> -    # The "when" section contains descriptions of the circumstances
> -    # under which this task should be included in the task graph.  This
> -    # will be converted into an optimization.
> -    Optional('when'): Any({
> +    # The "when" section contains descriptions of the circumstances under which
> +    # this task should be included in the task graph.  This will be converted
> +    # into an optimization, so it cannot be specified in a job description that
> +    # also gives 'optimization'.
> +    Exclusive('when', 'optimization'): Any({

I think this means you can remove that error message you added in the second commit of this series. This change likely belongs in that commit as well.
Attachment #8896096 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896093 [details]
Bug 1383880: allow only one optimization per task;

https://reviewboard.mozilla.org/r/167366/#review172810

Drive-by r+ because I wholeheartedly approve of limiting complex and undefined behavior :)
Attachment #8896093 - Flags: review+
> Let's leave linting stuff out of it for now, and add it back in when/if we move it out of when.files-changed in a follow-up.

Without this, the "inclusive" side is empty, so we have no example of its utility.  I think this is going to need a lot more ancillary work before it's ready to land, so moving from files-changed to SCHEDULES for linting is probably a pretty minimal change to roll in.
The more I think about this, the more problems I find :(

SETA
----

You're right that SETA is naive, in that it skips tests that might be useful; but SCHEDULES is equally naive about tests, just from a conservative direction: it will usually schedule all tests.

  OPTION: only use SCHEDULES.exclusive for platform families (android, macosx, etc.), not for tests, which just use SETA

Unfortunately, SETA will always schedule *some* test on each platform, which means we'll always run all builds, so this would lead to no savings in jobs run.

  OPTION: drop SETA support entirely

This would rely entirely on the conservative SCHEDULES logic, which would not eliminate nearly as many jobs as SETA, and our load and pending times would spike.

✓ OPTION: if SCHEDULES indicates a test should run, consult SETA and run iff SETA says to

This is still kind of a hack, but this is most effective at reducing load, while still running things that could be affected by the push and which SETA thinks are "important".

TRY OPTION SYNTAX
-----------------

Try option syntax runs with optimize_target_tasks = True.  It must do so since it targets a lot of tasks (`-j`obs especially) by default on the assumption that they will be optimized out.  But for builds and tests, developers rely on it to actually run what they have requested -- and it does so because those do not use the when.files-changed optimization.  But if we have optimize_target_tasks = True with SCHEDULES, then try syntax will frequently not run the requested jobs, which is not what users expect (for example, it's common to run tests in an empty try push..)

I think we will need to distinguish how try was invoked, and set optimize_target_tasks differently: True for try option syntax, False for try_task_config.json, and True for the "just try it" mode with no job specification at all.
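
A sketch of that rule, using the try_mode values discussed later in this bug (treat the helper as illustrative):

    def should_optimize_target_tasks(try_mode):
        # True for try option syntax and for pushes with no job
        # specification; False when an explicit try_task_config.json was
        # pushed, so the requested tasks are not optimized away.
        return try_mode in ('try_option_syntax', None)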

RECURSIVE OPTIMIZATION
----------------------

To date we've only optimized either tasks with no dependencies (docker images) or tasks with no dependents (tests).  With this change we'll be doing deeper optimization -- skipping builds, toolchain tasks, etc.

Most tasks should only run if their dependents run (for example, a toolchain build need not run unless there is a build to consume it).  We have a few tasks now which are the reverse: they should *only* run if their dependencies run -- things like symbol uploads and source uploads.  So neither a pre-order nor a post-order traversal of the dependency graph is sufficient, but the optimization dependencies do form a DAG, so it's possible to traverse it correctly.
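
For illustration, both directions can be derived from one depth-first traversal of the dependency DAG (standalone sketch, not the taskgraph Graph API):

    def postorder(dependencies):
        """Yield labels so that a task's dependencies come before it
        (toolchains before builds before tests)."""
        seen, order = set(), []

        def visit(label):
            if label in seen:
                return
            seen.add(label)
            for dep in dependencies.get(label, []):
                visit(dep)
            order.append(label)

        for label in dependencies:
            visit(label)
        return order

    deps = {'test': ['build'], 'symbols': ['build'],
            'build': ['toolchain'], 'toolchain': []}
    replace_order = postorder(deps)               # dependencies before dependents
    remove_order = list(reversed(replace_order))  # dependents first: tests, symbols, ...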

We also have a few cases now where we want multiple optimization behaviors on the same task.  Builds are the worst:

 - if there are no dependent tests, and SCHEDULES doesn't match, optimize away
 - if the change does not include inputs to the build, find an existing build task and replace with that (artifact builds)

so.. maybe limiting to one optimization isn't the right approach?  Or maybe we just need some specialized optimization functions for critical kinds like test and build that do lots of stuff.  And maybe we should split up the "optimize away" and "optimize by replacing" phases?
I think that's a really good summary, it is *really complicated*.

(In reply to Dustin J. Mitchell [:dustin] from comment #48)
> The more I think about this, the more problems I find :(
> 
> SETA
> ----
> 
> You're right that SETA is naive, in that it skips tests that might be
> useful; but SCHEDULES is equally naive about tests, just from a conservative
> direction: it will usually schedule all tests.
> 
>   OPTION: only use SCHEDULES.exclusive for platform families (android,
> macosx, etc.), not for tests, which just use SETA
> 
> Unfortunately, SETA will always schedule *some* test on each platform, which
> means we'll always run all builds, so this would lead to no savings in jobs
> run.
> 
>   OPTION: drop SETA support entirely
> 
> This would rely entirely on the conservative SCHEDULES logic, which would
> not eliminate nearly as many jobs as SETA, and our load and pending times
> would spike.
> 
> ✓ OPTION: if SCHEDULES indicates a test should run, consult SETA and run iff
> SETA says to
> 
> This is still kind of a hack, but this is most effective at reducing load,
> while still running things that could be affected by the push and which SETA
> thinks are "important".

OPTION: if SCHEDULES indicates a task should run, don't consult SETA

Though upon second thought, I think your current patch does the right thing for now (despite my disappointment :p). Let's just enable it on try (where SETA isn't running) and once we have a feel for how this is being used I think it will be a bit more clear how to handle SETA. Jmaher has also been talking about doing a full SETA rewrite sometime around 2018Q1 (though we could probably prioritize it higher if it helped), so maybe we can coordinate these two efforts at a later time.


> TRY OPTION SYNTAX
> -----------------

I think instead of worrying about this case overly much we should focus on moving try_option_syntax.py out of taskcluster and into tools/tryselect (e.g also make it generate try_task_config.json). This depends on everything being scheduled in taskcluster though, so blocked until at least Q4.


> RECURSIVE OPTIMIZATION
> ----------------------
>
> so.. maybe limiting to one optimization isn't the right approach?  Or maybe
> we just need some specialized optimization functions for critical kinds like
> test and build that do lots of stuff.  And maybe we should split up the
> "optimize away" and "optimize by replacing" phases?

I think longer term we might want more than one optimization enabled, but this is starting to hurt my head thinking about it. I whole-heartedly retract my disappointment about this being try-only and think we should push forward on that for now. Maybe things like a SETA re-write and nuking try_option_syntax will help make it clear what we need to do to get this enabled on non-try branches.
Comment on attachment 8896095 [details]
Bug 1383880: add support for SCHEDULES in moz.build;

https://reviewboard.mozilla.org/r/167370/#review172814

This looks mostly good!

There are some significant things we'll need to figure out eventually. But it's good enough to land with minor fix-ups.

::: python/mozbuild/mozbuild/frontend/context.py:568
(Diff revision 1)
> +       from SCHEDULING_COMPONENTS.inclusive
> +
> +     * VAR.exclusive can only be assigned to (no +=), and can only contain
> +       values from SCHEDULING_COMPONENTS.inclusive or .exclusive

What is this `SCHEDULING_COMPONENTS` you refer to?

::: python/mozbuild/mozbuild/frontend/context.py:575
(Diff revision 1)
> +    INCLUSIVE_COMPONENTS = ['py-lint', 'js-lint', 'yaml-lint']
> +    EXCLUSIVE_COMPONENTS = ['android', 'linux', 'macosx', 'windows']
> +    ALL_COMPONENTS = INCLUSIVE_COMPONENTS + EXCLUSIVE_COMPONENTS

I feel like the sets of schedulable things need to be defined elsewhere. Where, exactly, I'm not sure.
A while back, I had this idea for a moz.build like file that defined static data. Basically it would make a `VARS` dict or something available to all moz.build files.

Anyway, I'm fine doing it like this for now. We can figure out a more maintainable/scalable solution later.
Attachment #8896095 - Flags: review?(gps) → review-
Some progress here.  I revamped when and how try inputs are handled (handling them in the decision task and encoding in parameters.yml) to solve TRY_OPTION_SYNTAX.  And I rewrote optimization to do two passes: one to remove tasks (starting with tests and ending with toolchains) and one to replace tasks (starting with toolchains and ending with tests), to solve RECURSIVE OPTIMIZATION.  Optimization strategies are now objects, so we can implement some general "combiners" such as the various options for SETA⊕SCHEDULES discussed above.

That did sort of pave over some of the work in bug 1387135, but hopefully you'll like the result..
TODO:
 * testing (run against a bunch of old parameters.yml's)
 * fix up the SETA-vs-SCHEDULES
 * find a few more platform-specific files and annotate them
Attached file test-mozbuild-error.txt (obsolete) —
Greg, I'm getting a few of these (this one is for mozilla-central pushid 32358) where the moz.build reading traverses into the moz.build test suites.  I feel like I must be doing something silly to trip over this, or am I just the first one to try to do a filesystem read on that test directory?  Suggestions for fixing this?
Attachment #8900879 - Flags: feedback?(gps)
Attachment #8896093 - Attachment is obsolete: true
Attachment #8896094 - Flags: review+ → review?(ahalberstadt)
Attachment #8896096 - Flags: review+ → review?(ahalberstadt)
Attachment #8896095 - Flags: review?(ahalberstadt)
Oh, I'll need to add another commit or two to account for tasks like uploading symbols that should be optimized away if their dependencies are.  What I've put up should still be ready for review.
Attachment #8890085 - Flags: review+ → review?(ahalberstadt)
(In reply to Dustin J. Mitchell [:dustin] from comment #53)
> Created attachment 8900879 [details]
> test-mozbuild-error.txt
> 
> Greg, I'm getting a few of these (this one is for mozilla-central pushid
> 32358) where the moz.build reading traverses into the moz.build test suites.
> I feel like I must be doing something silly to trip over this, or am I just
> the first one to try to do a filesystem read on that test directory? 
> Suggestions for fixing this?

You may be the first person to hit this!

In mozbuild.frontend.reader, we do have a list of ignored directories in all_mozbuild_paths(). I suspect we'll want something similar for _find_relevant_mozbuilds(). But I suspect the simple solution will break tests! (Run `mach python-test python/mozbuild`.) This will likely require some kind of testing "backdoor." It might be easier to copy all files to a temporary directory instead of running tests out of the source directory.
Comment on attachment 8900879 [details]
test-mozbuild-error.txt

See last comment.
Attachment #8900879 - Flags: feedback?(gps)
Depends on: 1394071
Comment on attachment 8890085 [details]
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations;

https://reviewboard.mozilla.org/r/161160/#review178592

::: taskcluster/taskgraph/transforms/tests.py:926
(Diff revision 4)
> -        # run SETA unless this is a try push
> -        if config.params['project'] == 'try':
> -            jobdesc['when'] = test.get('when', {})
> +        if test.get('when'):
> +            jobdesc['when'] = test['when']
> +        else:
> +            schedules = [platform_family(test['build-platform'])]
> +            if config.params['project'] != 'try':
> +                # for non-try branches, include SETA
> +                jobdesc['optimization'] = {'skip-unles-schedules-or-seta': schedules}

Thanks for fixing this!

::: taskcluster/taskgraph/transforms/tests.py:934
(Diff revision 4)
> -            # when SETA is enabled, the 'when' does not apply (optimizations don't mix)
> -            jobdesc['optimization'] = {'seta': None}
> +                # otherwise just use skip-unless-schedules
> +                jobdesc['optimization'] = {'skip-unless-schedules': schedules}

An idea for the future, but on try maybe which optimizations get applied could be defined in try_task_config.json. This way we could control the behaviour based on what tool is being used to push to try. E.g running `./mach try -p all -u all` would leave optimizations enabled, whereas running `./mach try fuzzy -q "android"` would disable optimizations (since it was explicitly requested).

Definitely not something to worry about now.
Attachment #8890085 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896092 [details]
Bug 1383880: allow only one optimization per task

https://reviewboard.mozilla.org/r/167364/#review178600
Attachment #8896092 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8901323 [details]
Bug 1383880: add Graph.visit_preorder;

https://reviewboard.mozilla.org/r/172780/#review178604

::: taskcluster/taskgraph/graph.py:81
(Diff revision 1)
>              add_nodes = set((left if reverse else right) for (left, right, _) in add_edges)
>              new_nodes = nodes | add_nodes
>              new_edges = edges | add_edges
>          return Graph(new_nodes, new_edges)
>  
> -    def visit_postorder(self):
> +    def visit_postorder(self, _reverse=False):

I find the 'private' argument a little weird, could the whole implementation be defined under a `_visit(reverse=False)` method instead?
Attachment #8901323 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896094 [details]
Bug 1383880: optimize in three phases;

https://reviewboard.mozilla.org/r/167368/#review178616

::: taskcluster/docs/optimization.rst:66
(Diff revision 2)
> +Removing Tasks
> +::::::::::::::
> +
> +This phase begins with tasks on which nothing depends and follows the
> +dependency graph backward from there -- right to left in the diagram above. If
> +a task is not removed, then nothing it depends on will be removed , either.

nit: delete second comma

::: taskcluster/taskgraph/optimize.py:5
(Diff revision 2)
>  # This Source Code Form is subject to the terms of the Mozilla Public
>  # License, v. 2.0. If a copy of the MPL was not distributed with this
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +"""
> +The objective of optimization to remove as many tasks from the graph as

nit: is to remove

::: taskcluster/taskgraph/optimize.py:77
(Diff revision 2)
> +def _get_task_optimizations(target_task_graph, strategies):
> +    def optimizations():
> +        for label, task in target_task_graph.tasks.iteritems():
> +            if task.optimization:
> +                opt_by, arg = task.optimization.items()[0]
> +                yield label, (opt_by, strategies[opt_by], arg)
> +            else:
> +                yield label, ('never', strategies['never'], None)
> +    return {label: optinfo for label, optinfo in optimizations()}

Why not change the signature to `get_optimizations(task)` and pass in the function instead of a dict? The `strategies` could be available via closure or using functools.partial. This way we aren't calculating optimizations for tasks that aren't eligible.
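
For illustration, the closure/partial shape suggested here might look like this (assuming a `task.optimization` attribute as in the quoted code):

    import functools

    def get_optimization(strategies, task):
        # look up the (name, strategy, arg) triple for one task on demand,
        # instead of precomputing a dict entry for every task up front
        if task.optimization:
            name, arg = list(task.optimization.items())[0]
            return name, strategies[name], arg
        return 'never', strategies['never'], None

    # bind the strategy registry once, then pass the one-argument callable around:
    #   get_opt = functools.partial(get_optimization, strategies)
    #   opt_by, strategy, arg = get_opt(task)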

::: taskcluster/taskgraph/optimize.py:104
(Diff revision 2)
>  
> -def optimize_task(task, params):
> +def remove_tasks(target_task_graph, params, optimizations, do_not_optimize):
>      """
> -    Run the optimization for a given task
> +    Implement the "Removing Tasks" phase, returning a set of task labels of all removed tasks.
>      """
> -    if not task.optimization:
> +    opt_counts = defaultdict(lambda: 0)

nit: defaultdict(int)

::: taskcluster/taskgraph/optimize.py:136
(Diff revision 2)
> +    """
> +    Implement the "Replacing Tasks" phase, returning a set of task labels of
> +    all replaced tasks. The replacement taskIds are added to label_to_taskid as
> +    a side-effect.
>      """
> +    opt_counts = defaultdict(lambda: 0)

nit: defaultdict(int)

::: taskcluster/taskgraph/optimize.py:248
(Diff revision 2)
> +    def should_remove_task(self, tsak, params, arg):
> +        return False

Just use `pass`. Alternatively consider turning OptimizationStrategy into an abc
Attachment #8896094 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8901324 [details]
Bug 1383880: parse try config during the decision task;

https://reviewboard.mozilla.org/r/172782/#review178642

If you really want to keep the `try_mode` parameter, I guess that's fine. But we should really avoid using `target_tasks_default` in the event neither mode is available.

::: taskcluster/docs/parameters.rst:85
(Diff revision 1)
> +``try_mode``
> +    The mode in which a try push is operating.  This can be one of
> +    ``"try_task_config"``, ``"try_option_syntax"``, or ``None`` meaning no try
> +    input was provided.

Instead of using this parameter, we can just detect if the other parameters exist, e.g:

    try_config = parameters['try_task_config'] or {}
    if try_config.get('templates'):
        morphs.append(apply_jsone_templates...)

I don't feel we need to formalize the concept of a `mode`, especially when the `try_option_syntax` mode is hopefully going away soonish.

I was actually sort of thinking of using the `templates` alongside the try_option_syntax (so we'd have try_task_config.json with templates only and try syntax). You can easily talk me out of this though :).

::: taskcluster/taskgraph/target_tasks.py:112
(Diff revision 1)
> -    return labels
> +    else:
> +        # no try mode -- run everything and let optimization sort it out
> +        return target_tasks_default(full_task_graph, parameters)

I think this means running `hg push try` will schedule all tasks. I don't think we should do this, we want to discourage running all tasks on try as much as possible.

For now, let's just keep the same behaviour, e.g default to what `_try_option_syntax` without a `try:` message.
Attachment #8901324 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8896096 [details]
Bug 1383880: add support for optimizing tasks based on SCHEDULES;

https://reviewboard.mozilla.org/r/167372/#review178658

Latest revision looks good.
Attachment #8896096 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8901325 [details]
Bug 1383880: annotate source files with what they SCHEDULE;

https://reviewboard.mozilla.org/r/172784/#review178660
Attachment #8901325 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8901345 [details]
Bug 1383880: add only-if-dependencies-run optimization for follow-ons;

https://reviewboard.mozilla.org/r/172798/#review178666

::: taskcluster/taskgraph/optimize.py:298
(Diff revision 1)
> +    # This takes advantage of the behavior of the second phase of optimization:
> +    # a task can only be replaced if it has no un-optimized dependencies. So if
> +    # should_replace_task is called, then a task has no un-optimized
> +    # dependencies and can be removed (indicated by returning True)
> +
> +    def should_replace_task(self, tsak, params, arg):

nit: task
Attachment #8901345 - Flags: review?(ahalberstadt) → review+
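A minimal sketch of the strategy this patch adds, for readers following along (the class name is a guess from the commit message; the method body is assumed to be just the `return True` that the diff's comment describes):

    class OnlyIfDependenciesRun(OptimizationStrategy):
        # The base class's should_remove_task returns False, so phase 1
        # never removes this task.  Phase 2 only calls should_replace_task
        # once all of a task's dependencies have been optimized away, so at
        # that point the follow-on task isn't needed either; returning True
        # marks it as removed.
        def should_replace_task(self, task, params, arg):
            return True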
Comment on attachment 8896095 [details]
Bug 1383880: add support for SCHEDULES in moz.build;

This looks good to me, but deferring to gps for main review.
Attachment #8896095 - Flags: review?(ahalberstadt)
Attachment #8900879 - Attachment is obsolete: true
Thanks!  I appreciate the comments and will redraft, but with a few rejoinders below, boiling down to
 * I think `try_mode` is a feature; and
 * I think pushing to try with no task config or option syntax is (going to be) a feature.

(In reply to Andrew Halberstadt [:ahal] from comment #69)
> An idea for the future, but on try maybe which optimizations get applied
> could be defined in try_task_config.json. This way we could control the
> behaviour based on what tool is being used to push to try. E.g running
> `./mach try -p all -u all` would leave optimizations enabled, whereas
> running `./mach try fuzzy -q "android"` would disable optimizations (since
> it was explicitly requested).

We still want optimizations for try fuzzy -- just optimize_target_tasks = False, where it's true for normal pushes and for try option syntax.

The choice of which optimization to apply to a particular task is more a matter of what the task is.  That SETA is conditioned on project == try is largely because SETA is an awkward fit for an optimization.

(In reply to Andrew Halberstadt [:ahal] from comment #72)
> ::: taskcluster/taskgraph/optimize.py:248
> (Diff revision 2)
> > +    def should_remove_task(self, tsak, params, arg):
> > +        return False
> 
> Just use `pass`. Alternatively consider turning OptimizationStrategy into an
> abc

This is the default behavior.  `pass` would return None rather than False.

(In reply to Andrew Halberstadt [:ahal] from comment #73)
> I don't feel we need to formalize the concept of a `mode`, especially when
> the `try_option_syntax` mode is hopefully going away soonish.

I don't think it will ever go away -- at least not for a year or two.  The reason I like the mode is that the behavior differs on more than just target task selection -- not least, the optimization behavior is different.  I suspect we'll find other corner cases where we want to do the smart thing but find it incompatible with try option syntax.  We can guard that with `try_mode != 'try_option_syntax'`.

> I was actually sort of thinking of using the `templates` alongside the
> try_option_syntax (so we'd have try_task_config.json with templates only and
> try syntax). You can easily talk me out of this though :).

I really don't want to make try option syntax any more appealing, and I don't want to "blur the lines" by allowing use of both try option syntax and the new try task config.  Old and gross or new and shiny -- pick one.

> ::: taskcluster/taskgraph/target_tasks.py:112
> (Diff revision 1)
> > -    return labels
> > +    else:
> > +        # no try mode -- run everything and let optimization sort it out
> > +        return target_tasks_default(full_task_graph, parameters)
> 
> I think this means running `hg push try` will schedule all tasks. I don't
> think we should do this, we want to discourage running all tasks on try as
> much as possible.
> 
> For now, let's just keep the same behaviour, e.g default to what
> `_try_option_syntax` without a `try:` message.

It schedules all tasks with run-on-project: try, which is similar to what the default options are for try option syntax.  I'll do a test to see what the difference is, if any.  But in general this is the behavior I want: a bare push to try runs everything "relevant" to that push.  It typically hasn't been a great idea, but if we can get optimization to a point of being really effective, it could become a good idea.  For example, if I'm hacking on a mochitest and I push a change that just adds some console.log's to that test, ideally optimization will figure out (a) to only run that test suite and (b) to use already-existing builds, so I'll get pretty quick, minimal results.
(In reply to Dustin J. Mitchell [:dustin] from comment #78)
> This is the default behavior.  `pass` would return None rather than False.

Sorry, I meant:

class Never(...):
    pass

I'll address your other comments later.
Comment on attachment 8896094 [details]
Bug 1383880: optimize in three phases;

https://reviewboard.mozilla.org/r/167368/#review178766

::: taskcluster/taskgraph/optimize.py:248
(Diff revision 2)
> +    def should_remove_task(self, tsak, params, arg):
> +        return False

Ah, I see, I was thinking this was in the OptimizationStrategy class.  I will leave this class out entirely.
(In reply to Dustin J. Mitchell [:dustin] from comment #78)
>  * I think pushing to try with no task config or option syntax is (going to
> be) a feature.

I definitely agree with and support this vision. I just don't think it's ready to make the switch yet (more details below).


> I don't think it will ever go away -- at least not for a year or two.  The
> reason I like the mode is that the behavior differs on more than just target
> task selection -- not least, the optimization behavior is different.  I
> suspect we'll find other corner cases where we want to do the smart thing
> but find it incompatible with try option syntax.  We can guard that with
> `try_mode != 'try_option_syntax'`.

Ok, feel free to drop this issue. Fwiw, I don't see try syntax going away anytime soon, but I do see it being re-written on top of try_task_config.json, possibly as early as Q1 next year.


> > I was actually sort of thinking of using the `templates` alongside the
> > try_option_syntax (so we'd have try_task_config.json with templates only and
> > try syntax). You can easily talk me out of this though :).
> 
> I really don't want to make try option syntax any more appealing, and I
> don't want to "blur the lines" by allowing use of both try option syntax and
> the new try task config.  Old and gross or new and shiny -- pick one.

Sold!


> > ::: taskcluster/taskgraph/target_tasks.py:112
> > (Diff revision 1)
> > > -    return labels
> > > +    else:
> > > +        # no try mode -- run everything and let optimization sort it out
> > > +        return target_tasks_default(full_task_graph, parameters)
> > 
> > I think this means running `hg push try` will schedule all tasks. I don't
> > think we should do this, we want to discourage running all tasks on try as
> > much as possible.
> > 
> > For now, let's just keep the same behaviour, e.g default to what
> > `_try_option_syntax` without a `try:` message.

> It schedules all tasks with run-on-project: try, which is similar to what
> the default options are for try option syntax.  I'll do a test to see what
> the difference is, if any.  But in general this is the behavior I want: a
> bare push to try runs everything "relevant" to that push.  It typically
> hasn't been a great idea, but if we can get optimization to a point of being
> really effective, it could become a good idea.  For example, if I'm hacking
> on a mochitest and I push a change that just adds some console.log's to that
> test, ideally optimization will figure out (a) to only run that test suite
> and (b) to use already-existing builds, so I'll get pretty quick, minimal
> results.

Yes, I completely agree with this vision! But there are tons of 'test' kind tasks that have run-on-project: ['try']. So `hg push try` would now schedule all of those (whereas currently it only schedules stuff with a matching skip-unless-changed optimization). I think this change would almost be turning `hg push try` into an alias for `try: -b do -p all -u all`, which is definitely something we don't want to encourage. (If I'm wrong, where would the test tasks be optimized away?)

I know the current try_option_syntax.py default is super hacky and would also love to move away from it. But I don't think our optimizations are smart enough to do that yet.
(In reply to Andrew Halberstadt [:ahal] from comment #81)
> Ok, feel free to drop this issue. Fwiw, I don't see try syntax going away
> anytime soon, but I do see it being re-written on top of
> try_task_config.json, possibly as early as Q1 next year.

I think that will be very difficult to get right.  Let's cross (or burn) that bridge when we get to it.

> Yes, I completely agree with this vision! But there are tons of 'test' kind
> tasks that have run-on-project: ['try']. So `hg push try` would now schedule
> all of those (whereas currently it only schedules stuff with a matching
> skip-unless-changed optimization). I think this change would almost be
> turning `hg push try` into an alias for `try: -b do -p all -u all`, which is
> definitely something we don't want to encourage. (If I'm wrong, where would
> the test tasks be optimized away?)

There are only a few suites with skip-unless-changed defined (jittest, jsreftest, and test-verify).

That said, you're right: a push to try without syntax right now schedules 33 tasks, while with my change it schedules about 2000.  So, bad idea in either case, but I don't want to make it that much worse!

> I know the current try_option_syntax.py default is super hacky and would
> also love to move away from it. But I don't think our optimizations are
> smart enough to do that yet.

Looking in more detail, I see that the default for `-b` is None, meaning no builds, and indeed those 33 tasks are all `-j`obs.  So I'll fix the default to just error out.  Someday we can make it fall back to the default target task method, when the optimizations are good enough :)
(In reply to Dustin J. Mitchell [:dustin] from comment #82) 
> Looking in more detail, I see that the default for `-b` is None, meaning no
> builds, and indeed those 33 tasks are all `-j`obs.  So I'll fix the default
> to just error out.  Someday we can make it fall back to the default target
> task method, when the optimizations are good enough :)

Is there a downside to keeping the default behaviour as is for the time being? I'm definitely in the minority, but I find the current version of `hg push try` to be very useful. E.g, when I modify mozlint, I can simply `hg push try` to get all lint tasks and the mozlint python tests scheduled.
That's what I ended up doing.  I didn't realize anyone actually used that behavior -- interesting datapoint :)
Comment on attachment 8901324 [details]
Bug 1383880: parse try config during the decision task;

https://reviewboard.mozilla.org/r/172782/#review179204

Thanks, I like this.
Attachment #8901324 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896094 [details]
Bug 1383880: optimize in three phases;

https://reviewboard.mozilla.org/r/167368/#review179208

Thanks!
Attachment #8896094 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8896095 [details]
Bug 1383880: add support for SCHEDULES in moz.build;

https://reviewboard.mozilla.org/r/167370/#review180196

\o/

::: python/mozbuild/mozbuild/test/frontend/test_reader.py:494
(Diff revision 3)
> +        self.assertEqual(info['foo.win']['SCHEDULES'].exclusive, ['windows'])
> +        self.assertEqual(info['foo.osx']['SCHEDULES'].inclusive, [])
> +        self.assertEqual(info['foo.osx']['SCHEDULES'].exclusive, ['macosx'])
> +        self.assertEqual(info['subd/aa']['SCHEDULES'].inclusive, ['py-lint'])
> +        self.assertEqual(info['subd/aa']['SCHEDULES'].exclusive, ['android', 'linux', 'macosx', 'windows'])
> +        # Files('bb') in subd/moz.build overrides Files('subdir/**')

There is no such ``Files('bb')`` in ``subd/moz.build`` and the ``yaml-lint`` defined in that moz.build is not used. I think your test coverage is lacking.

I reckon you can fix this before landing, or ask for another review if the code needs to change.
Attachment #8896095 - Flags: review?(gps) → review+
Comment on attachment 8896096 [details]
Bug 1383880: add support for optimizing tasks based on SCHEDULES;

https://reviewboard.mozilla.org/r/167372/#review180198

::: build/sparse-profiles/taskgraph:23
(Diff revision 4)
>  path:testing/config/tooltool-manifests/
>  path:testing/mozharness/
>  path:tools/lint/
>  
> +# Moz.build files are read in filesystem mode
> +glob:**/moz.build

I think you'll also want ``glob:**/*.mozbuild`` for things like ``toolkit/toolkit.mozbuild``. We may not read these files in filesystem traversal mode, but there are enough of them that we probably don't want to take chances.
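
Concretely, with that addition the sparse profile stanza would read something like (the second glob being the suggested addition):

    # Moz.build files are read in filesystem mode
    glob:**/moz.build
    glob:**/*.mozbuild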
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/045498bc36c4
allow only one optimization per task; r=ahal
https://hg.mozilla.org/integration/autoland/rev/91f116ce6c2a
add Graph.visit_preorder; r=ahal
https://hg.mozilla.org/integration/autoland/rev/e62c90e3c1e8
optimize in three phases; r=ahal
https://hg.mozilla.org/integration/autoland/rev/8056488d8aed
parse try config during the decision task; r=ahal
https://hg.mozilla.org/integration/autoland/rev/5b10d321cfee
add support for SCHEDULES in moz.build; r=gps
https://hg.mozilla.org/integration/autoland/rev/a33f5a14a471
add support for optimizing tasks based on SCHEDULES; r=ahal
https://hg.mozilla.org/integration/autoland/rev/729a7e2091e8
Annotate builds and tests with SCHEDULES-related optimizations; r=ahal
https://hg.mozilla.org/integration/autoland/rev/a0abda41172a
annotate source files with what they SCHEDULE; r=ahal
https://hg.mozilla.org/integration/autoland/rev/53f5d47a7cb0
add only-if-dependencies-run optimization for follow-ons; r=ahal
Attachment #8890085 - Attachment is obsolete: true
Comment on attachment 8903754 [details]
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations;

https://reviewboard.mozilla.org/r/175504/#review180616

::: taskcluster/taskgraph/transforms/tests.py:932
(Diff revision 2)
> -        if config.params['project'] == 'try':
> -            jobdesc['when'] = test.get('when', {})
> +            jobdesc['when'] = test['when']
> +        else:
> +            schedules = [platform_family(test['build-platform'])]
> +            if config.params['project'] != 'try':
> +                # for non-try branches, include SETA
> +                jobdesc['optimization'] = {'skip-unless-schedules-or-seta': schedules}

the backout was because of a typo here, now fixed.. but apparently mozreview decided that needs another r?
Flags: needinfo?(dustin)
Comment on attachment 8903754 [details]
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations;

https://reviewboard.mozilla.org/r/175504/#review180634
Attachment #8903754 - Flags: review?(ahalberstadt) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3b7bdfdb116300daa3f49e0dfc96177e1369440
Bug 1383880: allow only one optimization per task; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/804e6c7f441cfcf3761e0721a9a58ef29ef91a3e
Bug 1383880: add Graph.visit_preorder; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/c25af2c111a5be4e3381d0b002641691d15fe4e8
Bug 1383880: optimize in three phases; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/b53ff084c2d7968a1d9864d1343f2d9381fb652b
Bug 1383880: parse try config during the decision task; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/63ded86f8e0e02c50088d96f9ea5d74fbed55a2b
Bug 1383880: add support for SCHEDULES in moz.build; r=gps

https://hg.mozilla.org/integration/mozilla-inbound/rev/046d705929f7a41e977eec19c8503afccdec7592
Bug 1383880: add support for optimizing tasks based on SCHEDULES; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/b41cd667697e13c989659b16bf649090a3908ecd
Bug 1383880: Annotate builds and tests with SCHEDULES-related optimizations; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/8586bf76d73db40ef52142fff8f21cdb0b31cf31
Bug 1383880: annotate source files with what they SCHEDULE; r=ahal

https://hg.mozilla.org/integration/mozilla-inbound/rev/4b3fa9587e88431cc28331d129dd7ef4d1a793c2
Bug 1383880: add only-if-dependencies-run optimization for follow-ons; r=ahal
Ugh, so it turns out a bunch of tasks import from optimization for some unholy reason.  It looks like it's to support artifact builds.
Backed out for failure processing config/moz.build and other bustage:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d75488a99bf60fe624a406e3e23e99a86cee78f1
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe6db2c2e33a6881ec6504f5d70a9b9fd6c1a45a
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a49b07428066bc85647249fe9ca9b3fe2a97f2a
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9f68cf3bee6aa7d1b403380ad163087d4caf0a2
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c581da21a7d0c67564aafca5123701094cb2fd2
https://hg.mozilla.org/integration/mozilla-inbound/rev/d65d89adb9acc213ead0b778721d3d5c8a89e062
https://hg.mozilla.org/integration/mozilla-inbound/rev/6f5dc86effe08027d7b76ba67040512f336253cc
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2aaab6e382063b5bd9041daa61805f6459a1baf
https://hg.mozilla.org/integration/mozilla-inbound/rev/956cfc2744f6841661930d5428aad9fbdc3bf81f

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4b3fa9587e88431cc28331d129dd7ef4d1a793c2&filter-resultStatus=exception&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted

Failure log gecko decision task: https://treeherder.mozilla.org/logviewer.html#?job_id=128945145&repo=mozilla-inbound
[task 2017-09-06T15:39:50.102325Z] BuildReaderError: 
[task 2017-09-06T15:39:50.102371Z] ==============================
[task 2017-09-06T15:39:50.102416Z] ERROR PROCESSING MOZBUILD FILE
[task 2017-09-06T15:39:50.102461Z] ==============================
[task 2017-09-06T15:39:50.102514Z] 
[task 2017-09-06T15:39:50.102558Z] The error occurred while processing the following file:
[task 2017-09-06T15:39:50.102579Z] 
[task 2017-09-06T15:39:50.102614Z]     /builds/worker/checkouts/gecko/config/moz.build
[task 2017-09-06T15:39:50.102633Z] 
[task 2017-09-06T15:39:50.102668Z] The error was triggered on line 35 of this file:
[task 2017-09-06T15:39:50.102710Z] 
[task 2017-09-06T15:39:50.102853Z]     'tests/python.ini',
[task 2017-09-06T15:39:50.102876Z] 
[task 2017-09-06T15:39:50.102930Z] An error was encountered as part of executing the file itself. The error appears to be the fault of the script.
[task 2017-09-06T15:39:50.102952Z] 
[task 2017-09-06T15:39:50.102981Z] The error as reported by Python is:
[task 2017-09-06T15:39:50.103089Z] 
[task 2017-09-06T15:39:50.103139Z]     ['IOError: Missing files: /builds/worker/checkouts/gecko/config/tests/python.ini\n']


Another failure example (android checkstyle): https://treeherder.mozilla.org/logviewer.html#?job_id=128942104&repo=mozilla-inbound
[task 2017-09-06T15:35:06.086270Z] 15:35:06     INFO -  Error running mach:
[task 2017-09-06T15:35:06.086680Z] 15:35:06     INFO -      ['artifact', 'toolchain', '-v', '--retry', '4', '--tooltool-manifest', '/builds/worker/workspace/build/src/mobile/android/config/tooltool-manifests/android-frontend/releng.manifest', '--artifact-manifest', '/builds/worker/workspace/build/src/toolchains.json', '--tooltool-url', 'http://relengapi/tooltool/', '--cache-dir', '/builds/worker/tooltool-cache']
[task 2017-09-06T15:35:06.086826Z] 15:35:06     INFO -  The error occurred in the implementation of the invoked mach command.
[task 2017-09-06T15:35:06.087050Z] 15:35:06     INFO -  This should never occur and is likely a bug in the implementation of that
[task 2017-09-06T15:35:06.087229Z] 15:35:06     INFO -  command. Consider filing a bug for this issue.
[task 2017-09-06T15:35:06.087434Z] 15:35:06     INFO -  If filing a bug, please include the full output of mach, including this error
[task 2017-09-06T15:35:06.087600Z] 15:35:06     INFO -  message.
[task 2017-09-06T15:35:06.087787Z] 15:35:06     INFO -  The details of the failure are as follows:
[task 2017-09-06T15:35:06.087967Z] 15:35:06     INFO -  ImportError: cannot import name optimize_task
[task 2017-09-06T15:35:06.088193Z] 15:35:06     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozbuild/mach_commands.py", line 1785, in artifact_toolchain
[task 2017-09-06T15:35:06.088364Z] 15:35:06     INFO -      from taskgraph.optimize import optimize_task
[task 2017-09-06T15:35:06.101912Z] 15:35:06    ERROR - Return code: 1
[task 2017-09-06T15:35:06.102173Z] 15:35:06    ERROR - 1 not in success codes: [0]
[task 2017-09-06T15:35:06.102385Z] 15:35:06  WARNING - setting return code to 2
Flags: needinfo?(dustin)
Greg, thoughts about the first error -- I suspect that's some new issue based on sparse checkouts..

For the second, I'll just copy-paste whatever functions artifact_toolchain was calling in optimize into that mach_commands.py file and leave it be.
Flags: needinfo?(dustin) → needinfo?(gps)
Yes, this is a sparse checkouts failure.

For context, a more complete stack:

[task 2017-09-06T15:39:50.101398Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/optimize.py", line 364, in scheduled_by_push
[task 2017-09-06T15:39:50.101438Z]     for p, m in rdr.files_info(changed_files).items():
[task 2017-09-06T15:39:50.101494Z]   File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/frontend/reader.py", line 1342, in files_info
[task 2017-09-06T15:39:50.101564Z]     paths, _ = self.read_relevant_mozbuilds(paths)
[task 2017-09-06T15:39:50.101639Z]   File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/frontend/reader.py", line 1306, in read_relevant_mozbuilds
[task 2017-09-06T15:39:50.101676Z]     self.config, metadata=metadata):
[task 2017-09-06T15:39:50.101816Z]   File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/frontend/reader.py", line 1061, in read_mozbuild
[task 2017-09-06T15:39:50.101853Z]     raise bre
[task 2017-09-06T15:39:50.102325Z] BuildReaderError: 
[task 2017-09-06T15:39:50.102371Z] ==============================
[task 2017-09-06T15:39:50.102416Z] ERROR PROCESSING MOZBUILD FILE
[task 2017-09-06T15:39:50.102461Z] ==============================
[task 2017-09-06T15:39:50.102514Z] 
[task 2017-09-06T15:39:50.102558Z] The error occurred while processing the following file:
[task 2017-09-06T15:39:50.102579Z] 
[task 2017-09-06T15:39:50.102614Z]     /builds/worker/checkouts/gecko/config/moz.build
[task 2017-09-06T15:39:50.102633Z] 
[task 2017-09-06T15:39:50.102668Z] The error was triggered on line 35 of this file:
[task 2017-09-06T15:39:50.102710Z] 
[task 2017-09-06T15:39:50.102853Z]     'tests/python.ini',
[task 2017-09-06T15:39:50.102876Z] 
[task 2017-09-06T15:39:50.102930Z] An error was encountered as part of executing the file itself. The error appears to be the fault of the script.
[task 2017-09-06T15:39:50.102952Z] 
[task 2017-09-06T15:39:50.102981Z] The error as reported by Python is:
[task 2017-09-06T15:39:50.103089Z] 
[task 2017-09-06T15:39:50.103139Z]     ['IOError: Missing files: /builds/worker/checkouts/gecko/config/tests/python.ini\n']
[task 2017-09-06T15:39:50.103162Z] 

moz.build evaluation performs some file existence checking for things like referenced test manifest files. So adding just the moz.build files to the sparse checkout isn't enough.

There are a few ways we can proceed:

1) hack at sparse checkout until all referenced files are present
2) change moz.build processing logic to not enforce non-moz.build file presence in files info reading mode
3) use a custom Finder for BuildReader

#1 will get ugly fast and will continuously break over time. It won't be fun to maintain.
#2 is certainly doable. But it is a bit of scope bloat and likely requires a myriad of random changes. It makes the moz.build reading logic more complicated. I'm not a huge fan.

I like #3. Essentially, we have a VFS layer for moz.build reading. By default, it uses a VFS backed by the filesystem. But if you pass it a mozpack.files.MercurialRevisionFinder or mozpack.hg.MercurialNativeRevisionFinder bound to a specific Mercurial revision, it can read file contents directly from Mercurial without using a checkout! This is, uh, exactly what we want :)

MercurialRevisionFinder uses hglib to obtain data. We now have hglib vendored in repo, so this should "just work."

MercurialNativeRevisionFinder uses the Mercurial APIs directly. This means it is much faster. But Mercurial's APIs are GPL. There are some weird licensing hoops we jump through to allow mozbuild to not be contaminated. See the license header in python/mozbuild/mozpack/hg.py. I suspect we'd also need an exclusion for "taskgraph" if you wanted to "import mozpack.hg" from taskgraph.

A workaround is to run a standalone process running the GPL code. You would likely create a minimal Mercurial extension providing an `hg mozbuildinfo` command (like https://hg.mozilla.org/hgcustom/version-control-tools/file/f6c98ca8fb36/hgext/hgmo/__init__.py#l651) and have it invoke MercurialNativeRevisionFinder (like in https://hg.mozilla.org/hgcustom/version-control-tools/file/f6c98ca8fb36/pylib/mozhg/mozhg/mozbuildinfo.py).

I would use MercurialRevisionFinder now. Unless it is performance prohibitive, we can defer using MercurialNativeRevisionFinder to a follow-up.

Keep in mind that we need to support Git checkouts as well. So the code using a Mercurial*Finder needs to only be active if a Mercurial repo is used. ``self.repository`` on a MozbuildObject or MachCommandBase instance (which the @command for `mach taskgraph` is) will give you answers.

Actually, now that I've written that all up, I think I can piece together an API that you can use pretty turnkey. Expect a blocking bug and review in your queue in the next ~1 hour.
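
Roughly, option #3 might look like the following from the decision task's side. This is only a sketch: topsrcdir, config, params and changed_files stand in for values the decision task already has, and the exact constructor arguments (including the ``finder`` keyword on BuildReader) are assumptions about the API rather than the final shape from the follow-up bug:

    from mozbuild.frontend.reader import BuildReader
    from mozpack.files import MercurialRevisionFinder

    # Read moz.build contents for the pushed revision straight from the
    # Mercurial store, so no working-copy files need to exist on disk.
    finder = MercurialRevisionFinder(topsrcdir, rev=params['head_rev'])
    reader = BuildReader(config, finder=finder)

    # files_info() then works the same as in the filesystem-backed case
    # (cf. the traceback above).
    for path, info in reader.files_info(changed_files).items():
        schedules = info['SCHEDULES']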
Flags: needinfo?(gps)
Depends on: 1397406
Thanks Greg, that's great!  I was expecting "well, let's just include "**/python.ini" but this is way better :)

Mike, the other error is from `mach artifact toolchain`. It imports various bits of taskgraph and hacks up an environment that worked once, but doesn't work anymore.  And it doesn't seem to have any tests that ensure it continues working.

https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/mach_commands.py?q=python%2Fmozbuild%2Fmozbuild%2Fmach_commands.py&redirect_type=direct#1883-1938

How can we make that more robust, and not reliant on implementation details of the taskgraph machinery?
Flags: needinfo?(mh+mozilla)
Sorry, I can't quite figure this one out myself.  mozbuild_reader is a method on MozbuildObject.  But I need the reader deep inside the optimization process.  How should I construct such an object?  Or should I stick `self` from the mach commands in a global variable somewhere?
Flags: needinfo?(gps)
13:57:22 <gps> dustin: you can also call MozbuildObject.from_environment() and it attempts to do the right thing
Flags: needinfo?(gps)
Comment on attachment 8905618 [details]
Bug 1383880: use the latest decision task to find artifacts;

https://reviewboard.mozilla.org/r/177412/#review182542

FWIW I'm soon going to be writing some code that implements "find artifacts from ancestor commits" efficiently. We have this implemented in a few places and none of them are doing it optimally (using thread pools for concurrent requests, actually using version control, etc). Expect some patches in your queue in the next few days. Possibly in bug 1382507.
Comment on attachment 8905619 [details]
Bug 1383880: use a vcs-compatible reader in decision tasks;

https://reviewboard.mozilla.org/r/177414/#review182544

I'm pretty sure this will break Git checkouts since ``vcs_revision is not None`` requires Mercurial since we don't have a VCS-based reader for Git yet. You'll probably want to make ``vcs_revision`` conditional on ``'MOZ_AUTOMATION' in os.environ`` or something. If the logic gets too hairy, add an argument to ``mozbuild_reader()`` to hide the complexity from callers.
Attachment #8905619 - Flags: review?(gps) → review-
(In reply to Gregory Szorc [:gps] from comment #134)
> FWIW I'm soon going to be writing some code that implements "find artifacts
> from ancestor commits" efficiently. We have this implemented in a few places
> and none of them are doing it optimally (using thread pools for concurrent
> requests, actually using version control, etc). Expect some patches in your
> queue in the next few days. Possibly in bug 1382507.

As long as they're not used in automation, that's fine :)

(In reply to Gregory Szorc [:gps] from comment #135)
> I'm pretty sure this will break Git checkouts since ``vcs_revision is not
> None`` requires Mercurial since we don't have a VCS-based reader for Git
> yet. You'll probably want to make ``vcs_revision`` conditional on
> ``'MOZ_AUTOMATION' in os.environ`` or something. If the logic gets too
> hairy, add an argument to ``mozbuild_reader()`` to hide the complexity from
> callers.

Ah, so even though parameters['revision'] is an hg revision, if there's no `.hg/` then this will fail.  It will probably also fail if parameters['revision'] is not present locally.  So, yeah, this should only use the vcs access in an actual decision task (note that MOZ_AUTOMATION isn't set there).  Hmm.
Flags: needinfo?(mh+mozilla)
(In reply to Dustin J. Mitchell [:dustin] from comment #136)
> Ah, so even though parameters['revision'] is an hg revision, if there's no
> `.hg/` then this will fail.  It will probably also fail if
> parameters['revision'] is not present locally.  So, yeah, this should only
> use the vcs access in an actual decision task (note that MOZ_AUTOMATION
> isn't set there).  Hmm.

Since you have a MozbuildObject, you can look at `.repository` to identify the repo type. That could raise if not running from a Mercurial or Git checkout or if the `git` or `hg` executables can't be found. That's probably acceptable for the decision task given that it is so tightly coupled to CI and source control.
Comment on attachment 8905619 [details]
Bug 1383880: use a vcs-compatible reader in decision tasks;

https://reviewboard.mozilla.org/r/177414/#review182578
Attachment #8905619 - Flags: review?(gps) → review+
Comment on attachment 8905618 [details]
Bug 1383880: use the latest decision task to find artifacts;

https://reviewboard.mozilla.org/r/177412/#review183548

Anything that relies on the "latest" build from a specific repo can't land. The previous code - while a bit hacky - was more correct than the new code proposed in this change.

If we want to query remote state in this code, IMO the proper way to do that is to take the current index route "base" into consideration (e.g. gecko.v2.{{repo}}) and then traverse the commit ancestry until we find an index route for a revision. Coincidentally, I'm hacking on a generic version of code that does just this. So perhaps this commit can be dropped and we can deal with it in bug 1397847?

::: python/mozbuild/mozbuild/mach_commands.py:1883
(Diff revision 1)
>                      setup=record.setup)
>  
>          if from_build:
> -            params = {
> -                'message': '',
> -                'project': '',
> +            from taskgraph.util.taskcluster import find_task_id, get_artifact
> +            # find the latest m-c decision task and use its notion of what artifacts are what
> +            decision_task_id = find_task_id('gecko.v2.mozilla-central.latest.firefox.decision')

This assumes we're building against mozilla-central. So this index path will be invalid if we're running against, say, a beta or release build.

Also, if we're on the inbound or autoland repo, then the "latest" task on mozilla-central may be substantially different than this revision.

The previous code essentially runs taskgraph for the current state of the code as it exists in this revision. This new code relies on "latest" state from some remote code/execution. This is a step backwards.
Attachment #8905618 - Flags: review-
(In reply to Gregory Szorc [:gps] from comment #140)
> Anything that relies on the "latest" build from a specific repo can't land.
> The previous code - while a bit hacky - was more correct than the new code
> proposed in this change.

It's worth noting this only applies on development systems, not in automation.  And my read of the existing code was that it generated a taskgraph for "latest" -- 'head_rev': ''.  But I don't really understand what it was trying to do, so maybe you're right.

> If we want to query remote state in this code, IMO the proper way to do that
> is to take the current index route "base" into consideration (e.g.
> gecko.v2.{{repo}}) and then traverse the the commit ancestry until we find
> an index route for a revision. Coincidentally, I'm hacking on a generic
> version of code that does just this. So perhaps this commit can be dropped
> and we can real with it in bug 1397847?

This commit can't be dropped because the innards of task-graph generation that it's accessing have changed.  I can move the "import" such that it doesn't fail in automation, but it will still fail on developers' systems.

> This assumes we're building against mozilla-central. So this index path will
> be invalid if we're running against say a beta or release build.
> 
> Also, if we're on the inbound or autoland repo, then the "latest" task on
> mozilla-central may be substantially different than this revision.
> 
> The previous code essentially runs taskgraph for the current state of the
> code as it exists in this revision. This new code relies on "latest" state
> from some remote code/execution. This is a step backwards.

Hm, I think the old code looked at mozilla-central too, by default.  Again, it's sort of abusing the machinery so the actual result is not clear.

So I think we have a few options: 

 * land this (small step backward)
 * just move the import (breakage for desktop uses of this command)
 * bug 1397847 (needs someone who understands better what this is trying to do, as the above notes show I don't)

I'd like to get this landed sooner rather than later, so I lean toward the first two options.
The old code essentially does a dummy taskgraph run to determine the toolchain index paths. It can do this all locally because toolchain index paths are hashes of all the inputs. e.g. https://tools.taskcluster.net/groups/GMYEmd5wQeOEs5YVA5XjQA/tasks/WHtFgPbkQ_eD_fArhtpUrQ/details writes to the index index.gecko.cache.level-3.toolchains.v1.linux64-clang-3.9.d6959ff5b99e2e6de272893de7ad6075015dcbc58f63f8b0ecf6250efd89e409. Note the omission of a repo name or even a changeset.

This allows toolchains to be used across repository and revision boundaries.

In other words, it uses content-based caching and is completely independent of repository structure. This is a stronger and better caching mechanism than relying on repository names and revisions as proxies (since there could be N configurations per revision).
Comment on attachment 8905618 [details]
Bug 1383880: use the latest decision task to find artifacts;

https://reviewboard.mozilla.org/r/177412/#review183626

I don't like that this is essentially relying on the output of a decision task on automation, limited to m-c at that. In practice, I've used this command to get toolchain artifacts from try, and that would kill it. The right way to fix this imho would be to actually make some of the `mach taskgraph` subcommands callable in a more useful way than what they currently do (most of those commands output a list of task, that's rather useless). The not-quite-nice side effect is that this would make this command slow, but additional flags to reduce the set of kinds being loaded and possibly to skip validation would make that fast.
That might be more work than you're ready to put in, though... what are you changing that breaks this code?
Attachment #8905618 - Flags: review?(mh+mozilla) → review-
> (most of those commands output a list of task, that's rather useless)

*task names
The code being changed here was calling into the internals of task-graph generation using mocked-out data structures.  I changed some of those functions and data structures.  I think this code was broken when it was written in April, and I would have said so quite loudly if asked (ISTR I *did* say so, but it landed without r?dustin so maybe that was another patch).  Now I'm trying to find the least-bad way to work around the brokenness and land my work.

I think the right way to do this is in bug 1397847, and the takeaway I'm getting is that I'm going to need to implement that myself.  So I'll do that as recompense for the grumpiness in my previous paragraph.
Depends on: 1397847
Comment on attachment 8896092 [details]
Bug 1383880: allow only one optimization per task

https://reviewboard.mozilla.org/r/167364/#review186242

::: taskcluster/taskgraph/transforms/job/toolchain.py:88
(Diff revision 4)
> -
>      # We'll try to find a cached version of the toolchain at levels above
>      # and including the current level, starting at the highest level.
>      for level in reversed(range(int(config.params['level']), 4)):
>          subs['level'] = level
> -        optimizations.append(['index-search', TOOLCHAIN_INDEX.format(**subs)])
> +        taskdesc['optimization'] = {'index-search': [TOOLCHAIN_INDEX.format(**subs)]}

This change is wrong.
Before, for e.g. try, optimizations would contain:
[ ['index-search', 'gecko.cache.level-3.toolchains.v1.(...)'],
  ['index-search', 'gecko.cache.level-2.toolchains.v1.(...)'],
  ['index-search', 'gecko.cache.level-1.toolchains.v1.(...)'] ]

And with this change:
{'index-search': ['gecko.cache.level-1.toolchains.v1.(...)']}

which in practice means try wouldn't ever get toolchain artifacts from central, like they're supposed to.
Attachment #8896092 - Flags: review-
Here is a two-line patch that makes `mach artifact toolchain` work with the patch queue from this bug. It doesn't address the longstanding issue of that command abusing the taskcluster API, but it avoids having to agree in a short timeframe on how it should stop abusing it:

diff --git a/python/mozbuild/mozbuild/mach_commands.py b/python/mozbuild/mozbuild/mach_commands.py
index f13199117df9..5a6ed6968d40 100644
--- a/python/mozbuild/mozbuild/mach_commands.py
+++ b/python/mozbuild/mozbuild/mach_commands.py
@@ -1782,7 +1782,7 @@ class PackageFrontend(MachCommandBase):
         import shutil
 
         from taskgraph.generator import Kind
-        from taskgraph.optimize import optimize_task
+        from taskgraph.optimize import IndexSearch
         from taskgraph.util.taskcluster import (
             get_artifact_url,
             list_artifacts,
@@ -1926,7 +1926,8 @@ class PackageFrontend(MachCommandBase):
                              'Could not find a toolchain build named `{build}`')
                     return 1
 
-                task_id = optimize_task(task, {})
+                task_id = IndexSearch().should_replace_task(
+                    task, {}, task.optimization.get('index-search', []))
                 artifact_name = task.attributes.get('toolchain-artifact')
                 if task_id in (True, False) or not artifact_name:
                     self.log(logging.ERROR, 'artifact', {'build': user_value},
(In reply to Mike Hommey [:glandium] from comment #146)
> This change is wrong.

Good catch!  The `for` loop worked with `optimizations.append` but not with `optimization =`.  Easy to fix.
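In other words, the intended shape (modulo the nested-list slip glandium points out in the next review) is to collect the routes first and set a single optimization at the end -- a minimal sketch using the names from the diff:

    # Gather all candidate index routes across levels, then set one
    # 'index-search' optimization with a flat list of routes.
    index_routes = []
    for level in reversed(range(int(config.params['level']), 4)):
        subs['level'] = level
        index_routes.append(TOOLCHAIN_INDEX.format(**subs))
    taskdesc['optimization'] = {'index-search': index_routes}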
Attachment #8905618 - Attachment is obsolete: true
Comment on attachment 8896092 [details]
Bug 1383880: allow only one optimization per task

https://reviewboard.mozilla.org/r/167364/#review186984

::: taskcluster/taskgraph/transforms/job/toolchain.py:90
(Diff revisions 4 - 5)
>      # and including the current level, starting at the highest level.
> +    index_routes = []
>      for level in reversed(range(int(config.params['level']), 4)):
>          subs['level'] = level
> -        taskdesc['optimization'] = {'index-search': [TOOLCHAIN_INDEX.format(**subs)]}
> +        index_routes.append(TOOLCHAIN_INDEX.format(**subs))
> +    taskdesc['optimization'] = {'index-search': [index_routes]}

You're going to have a nested list here. You probably mean 'index-search': index_routes.
Attachment #8896092 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8909757 [details]
Bug 1383880: adjust mach artifact toolchain --from-build;

https://reviewboard.mozilla.org/r/181254/#review186990
Attachment #8909757 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8909758 [details]
Bug 1383880: --from-build is a dev tool, not for automation;

https://reviewboard.mozilla.org/r/181256/#review186996

::: python/mozbuild/mozbuild/mach_commands.py:1884
(Diff revision 2)
> +                self.log(logging.ERROR, 'artifact', {},
> +                         'Do not use --from-build in automation; all dependencies '
> +                         'should be determined in the decision task.')

This is probably not very useful if you don't return right away with an error code.
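
That is, something along these lines (a sketch; the log call is from the diff above, the early return is the point being made):

    self.log(logging.ERROR, 'artifact', {},
             'Do not use --from-build in automation; all dependencies '
             'should be determined in the decision task.')
    # Bail out with an error code so automation actually fails here.
    return 1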
Attachment #8909758 - Flags: review?(mh+mozilla) → review+
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 1d08fe5f5ade -d 072a236f67fe: rebasing 421615:1d08fe5f5ade "Bug 1383880: allow only one optimization per task; r=ahal,glandium"
merging taskcluster/ci/build/android-stuff.yml
warning: conflicts while merging taskcluster/ci/build/android-stuff.yml! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7c07cb798530
allow only one optimization per task; r=ahal,glandium
https://hg.mozilla.org/integration/autoland/rev/b3eb0c939720
add Graph.visit_preorder; r=ahal
https://hg.mozilla.org/integration/autoland/rev/a3116da52b4e
optimize in three phases; r=ahal
https://hg.mozilla.org/integration/autoland/rev/97eedc84d6e8
parse try config during the decision task; r=ahal
https://hg.mozilla.org/integration/autoland/rev/ebcc9d20981a
add support for SCHEDULES in moz.build; r=gps
https://hg.mozilla.org/integration/autoland/rev/ebdd6ccbcfca
add support for optimizing tasks based on SCHEDULES; r=ahal
https://hg.mozilla.org/integration/autoland/rev/b354fdf6e233
Annotate builds and tests with SCHEDULES-related optimizations; r=ahal
https://hg.mozilla.org/integration/autoland/rev/8e5847d9acda
annotate source files with what they SCHEDULE; r=ahal
https://hg.mozilla.org/integration/autoland/rev/ec2b8ba5a949
add only-if-dependencies-run optimization for follow-ons; r=ahal
https://hg.mozilla.org/integration/autoland/rev/571a6c9054a5
use a vcs-compatible reader in decision tasks; r=gps
https://hg.mozilla.org/integration/autoland/rev/8cceb6a82bfb
adjust mach artifact toolchain --from-build; r=glandium
https://hg.mozilla.org/integration/autoland/rev/fd3615e7e0a3
from-build is a dev tool, not for automation; r=glandium
Looks like an issue from bug 1401183.
It seems it's better to keep the current find_task_id behavior than to change all callers to handle the exception.
There aren't that many.  And Bug 1401183 is still stuck in autoland limbo (so I can't back it out or land on top of it).  I think I'll just fix it up here.
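A sketch of the fixup in question, treating the exception as "no task found" (the call site and variable names here are illustrative; the real patch may structure this differently):

    from taskgraph.util.taskcluster import find_task_id

    try:
        task_id = find_task_id(index_route)
    except KeyError:
        # Assumption: after bug 1401183, a missing index entry raises
        # KeyError rather than returning a sentinel value.
        task_id = None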
Comment on attachment 8910719 [details]
Bug 1383880: handle keyError from find_task_id;

https://reviewboard.mozilla.org/r/182180/#review187566
Attachment #8910719 - Flags: review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e4f1ed133c68
allow only one optimization per task; r=ahal,glandium
https://hg.mozilla.org/integration/autoland/rev/b327dc19aeab
add Graph.visit_preorder; r=ahal
https://hg.mozilla.org/integration/autoland/rev/79bb52091750
optimize in three phases; r=ahal
https://hg.mozilla.org/integration/autoland/rev/b2c2b8ada7dd
parse try config during the decision task; r=ahal
https://hg.mozilla.org/integration/autoland/rev/33e1e1b4acd5
add support for SCHEDULES in moz.build; r=gps
https://hg.mozilla.org/integration/autoland/rev/a418182c67b9
add support for optimizing tasks based on SCHEDULES; r=ahal
https://hg.mozilla.org/integration/autoland/rev/ab8f1fe07163
Annotate builds and tests with SCHEDULES-related optimizations; r=ahal
https://hg.mozilla.org/integration/autoland/rev/aa31b00d8eb1
annotate source files with what they SCHEDULE; r=ahal
https://hg.mozilla.org/integration/autoland/rev/e62999cee1cd
add only-if-dependencies-run optimization for follow-ons; r=ahal
https://hg.mozilla.org/integration/autoland/rev/817f14e03e81
use a vcs-compatible reader in decision tasks; r=gps
https://hg.mozilla.org/integration/autoland/rev/e3190ac566b7
adjust mach artifact toolchain --from-build; r=glandium
https://hg.mozilla.org/integration/autoland/rev/0aa93d9b53a9
from-build is a dev tool, not for automation; r=glandium
https://hg.mozilla.org/integration/autoland/rev/b0d1cd898a0b
handle keyError from find_task_id; r=gps
Blocks: 1402010
Comment on attachment 8910965 [details]
Bug 1383880 - Expand taskgraph sparse checkout to support moz.build evaluation;

https://reviewboard.mozilla.org/r/182422/#review187738
Attachment #8910965 - Flags: review?(mh+mozilla) → review+
Attachment #8910719 - Flags: review?(mh+mozilla)
Blocks: 1402142
Attachment #8910965 - Attachment is obsolete: true
This busted mach try fuzzy:

Error running mach:

    ['try', 'fuzzy']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.

You should consider filing a bug for this issue.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

Exception: missing parameters: try_mode, try_task_config, try_options; extra parameters: morph_templates, target_task_labels

  File "/home/glandium/gecko-push/tools/tryselect/mach_commands.py", line 143, in try_fuzzy
    return run_fuzzy_try(**kwargs)
  File "/home/glandium/gecko-push/tools/tryselect/selectors/fuzzy.py", line 215, in run_fuzzy_try
    all_tasks = generate_tasks(parameters, full)
  File "/home/glandium/gecko-push/tools/tryselect/tasks.py", line 49, in generate_tasks
    params.check()
  File "/home/glandium/gecko-push/taskcluster/taskgraph/parameters.py", line 53, in check
    raise Exception("; ".join(msg))
Apparently `mach try fuzzy` has some kind of dependency on an automation artifact from mozilla-central or something like that. I experienced a similar error within the last week and it went away after time passed.

ahal knows of the issue but I can't remember if he said there's a bug or fix planned. This bug represents a window in time where `mach try fuzzy` isn't usable. So IMO it needs to get fixed.
Yes, this is bug 1401199 and I agree it needs to be fixed. I forgot this also modified the parameters, probably should have made it block, but it's too late now (and I want to see this stick anyway).

Fyi, to workaround you can either rebase on central or pass in --parameters to |mach try fuzzy|.
Actually, this is broken because a decision task hasn't passed on central since this landed. As soon as there's a green decision task, this will be fixed. This means bug 1401199 wouldn't even have fixed this.

The workaround is to update to the latest central revision that has a passing decision task.
Depends on: 1402194
Thanks glandium, ahal, gps!
Depends on: 1405570
See Also: → 1407279
Product: TaskCluster → Firefox Build System