Factor tests in legacy kind into real kinds

RESOLVED FIXED

Status

task
RESOLVED FIXED
3 years ago
Last year

People

(Reporter: dustin, Assigned: dustin)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(12 attachments, 6 obsolete attachments)

58 bytes, text/x-review-board-request
gerard-majax
: review+
Details
58 bytes, text/x-review-board-request
armenzg
: review+
Details
58 bytes, text/x-review-board-request
jmaher
: review+
Details
58 bytes, text/x-review-board-request
RyanVM
: review+
Details
58 bytes, text/x-review-board-request
jmaher
: review+
Details
58 bytes, text/x-review-board-request
RyanVM
: review+
Details
58 bytes, text/x-review-board-request
gbrown
: review+
Details
58 bytes, text/x-review-board-request
gbrown
: review+
Details
58 bytes, text/x-review-board-request
gps
: review+
Details
58 bytes, text/x-review-board-request
gbrown
: review+
gps
: review+
Details
58 bytes, text/x-review-board-request
Callek
: review+
Details
58 bytes, text/x-review-board-request
Details
We already have a docker-image kind.  We now need to remove the legacy kind, replacing it with a collection of purpose-specific kinds with better characteristics:

 * not generated from a twisty maze of yml or repetitive config files
 * stable task labels
 * easily extended to support other OS's, etc.
This may also include post-processing of the task graph (generic, but only a few methods defined; selectable by tree)
  * `--interactive`
  * filtering out caches
  * filtering out superseding
  * treeherder routes
  * index routes (replace `testing/taskclsuter/routes.json`)
etc.
What I've done so far, in that try push, is to try to unify things a little:

 * set extra.suite.{name,flavor} for all tests
 * use the allowPtrace feature everywhere (the scope was already present everywhere)

unifications that turned out to be unwise:

 * use the same caches for mulet as for non-mulet jobs (currently mulet caches ~/.tc-vcs and ~/.cache, while non-mulet caches ~/workspace; but mulet actually uses tc-vcs and does pip installs, so that makes sense)

 * use the --download-symbols option in all commands; it turns out that omitting this option has a different meaning than either =true or =ondemand

 * always setting --no-read-buildbot-config; this option is present for all mozharness scripts which also run under buildbot, but would cause an error for scripts not designed to be run in buildbot (e.g., firefox-ui, some mulet)

I've done some comparisons of tasks and generally broken down what part of the task definition varies between which tasks.  I'm working on a way to represent that variance in a way that will be clear but sufficiently flexible.  Once that's done, I'll do some diffs to make sure that I haven't altered the task definitions in any meaningful way.
I have now successfully replicated a single task.  Code is here:
  https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/pushloghtml?changeset=c5d0d4b9c68f

The rough idea so far is that the kind.yml specifies a bunch of config files:
  https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/file/c5d0d4b9c68f/taskcluster/ci/desktop-test/kind.yml
these roughly map from build platform to test platform to test set to test descriptions.  Then
  https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/file/c5d0d4b9c68f/taskcluster/taskgraph/kind/test.py
generates tasks from those test descriptions, for each test platform.

I need some design help, though -- this is already more complicated than I want it to be, and I've only just gotten started.  A few issues I'm running into already:

 - This allows customization on a per-test basis, but in cases where we want the same test to differ between test platforms (for example, linux opt is tier 2 while linux debug is tier 1), there's no way to make that change aside from creating new test descriptions.. which will turn out very similar to the existing maze of yml

 - The task definition generation code is already a mess, and a lot of the constants it contains will differ for mulet.  Things will be even more different for generic-worker (windows) tasks and for taskcluster-worker (MacOS X) tasks.  And then there's BBB tasks, which will look totally different.  I feel like this should be template-driven in some way, but it's a little too complicated to implement with a template.

 - The --rebuild try option should cause multiple copies of the same test task to be created.  I really have no idea how to implement this, since try message interpretation takes place later in the target task set generation phase.

 - The notion of "platform" is sort of a mess: we have linux64, linux64 opt, linux64 debug, linux64-pgo, linux64-asan, all of which are sort of platforms, sort of build types or collections, etc.  I'd like to align this with how treeherder works.

I have a few ideas, but I'm open to more:

 * Implement a lot of the task creation as application of arbitrary Python filters that can be specified either at the test-platform level or the test level.  Then the tier can be set by a filter at the test-platform level, for example.

 * Implement --rebuild at the create_tasks level (so if you pass --rebuild 200, all 200 have the same task label), based on a task.create_count attribute

 * Define a "full platform" to be the combination of a "platform" (e.g., linux64) and a label (e.g., opt or asan, also known as a "build_type" or "option" or "collection").  The result is something like "linux64/pgo" or "winxp/debug".

I'm really kind of getting stuck in the mud here, so any assistance would be greatly appreciated!
(In reply to Dustin J. Mitchell [:dustin] from comment #4)

>  * Define a "full platform" to be the combination of a "platform" (e.g.,
> linux64) and a label (e.g., opt or asan, also known as a "build_type" or
> "option" or "collection").  The result is something like "linux64/pgo" or
> "winxp/debug".

mozharness goes with something like this where you have a 'base' platform (linux64) and a 'sub' platform (debug)

now that you are in the thick mud, I wonder if it is worth reviewing the idea of not being templated at all. going the whole opposite way where every task has a complete single yml file and each one of those tasks (files) is associated with a 'kind' by dir.

I know this is a nightmare for updating common attributes across multiple tasks and every design pattern urges developers to not repeat themselves however, what you gain is pretty amazing:

1) from the developer point of view, this is dead simple to understand as tasks are declared in full so they can edit/copy exactly what they want. sure it's not great if tasks diverge where they shouldn't but I would argue it is no longer our responsibility and most changes should eventually be caught in CI results.

2) removes work for you to come up with something elegant. after watching buildbot-configs->misc.py, mozharness configs, and taskcluster/legacy/* each try their own approach, I am partial to believing that the most elegant solution is no solution.
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> I have now successfully replicated a single task.  Code is here:
>  
> https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/
> pushloghtml?changeset=c5d0d4b9c68f

Skimming this patch set, one thing that I've found limiting mentally with the current "tests" is that they all:
* Assume linux (which is ok for now, I suspect)
* Assume mozharness-based
* Assume a known set of artifacts you need/want.

This patch goes further in ingraining those truths. I'd love a way to explicitly declare these as Mozharness-Tests somehow.

One thing I can imagine is linting-like tests that need to be align to the different test platforms or a job that does a linting like result on all the JS in omni.ja per build (basically parsing the preprocessed UI stuff).

On the other hand, I can imagine "build" style kinds to be easier to implement ahead of "test" style kinds. E.g. Designing the Build stuff first, so that you can figure out how we want to truely define "linux64/opt" there, then extrapolate using similar terminology and expression.

Overall I like your approach though (I'd also love some way to diff generated full-taskgraph's in-tree since I was fighting with that myself in the legacy kind stuff)

More specific thoughts to follow...

>  - This allows customization on a per-test basis, but in cases where we want
> the same test to differ between test platforms (for example, linux opt is
> tier 2 while linux debug is tier 1), there's no way to make that change
> aside from creating new test descriptions.. which will turn out very similar
> to the existing maze of yml

I can forsee some potential options, alterable as needed:
 * Something like https://gist.github.com/Callek/d39175565766bf767d8bc13c91d67417
   * Call this tier: demote/promote
 * Something like https://gist.github.com/Callek/2a884b83671f3d2636da277fda3e6862
   * Call this tests inheritence

>  - The task definition generation code is already a mess, and a lot of the
> constants it contains will differ for mulet.  Things will be even more
> different for generic-worker (windows) tasks and for taskcluster-worker
> (MacOS X) tasks.  And then there's BBB tasks, which will look totally
> different.  I feel like this should be template-driven in some way, but it's
> a little too complicated to implement with a template.

I wonder if these variances should either be a subclass of the "Test" kind you have here (for each kind) or if these variances should be their own kind.

I think its sub-optimal if we have BBB, OSX, Windows, and Linux try to share the exact same kind for tests. Since their implementations/setup are markedly different.

>  - The --rebuild try option should cause multiple copies of the same test
> task to be created.  I really have no idea how to implement this, since try
> message interpretation takes place later in the target task set generation
> phase.

I'm less sure we want to support this, as written for try. (Admittedly I've never been a fan) -- perhaps let the target generation duplicate the given tasks (with new taskID's) if its present, somehow?

>  - The notion of "platform" is sort of a mess: we have linux64, linux64 opt,
> linux64 debug, linux64-pgo, linux64-asan, all of which are sort of
> platforms, sort of build types or collections, etc.  I'd like to align this
> with how treeherder works.

I'm partial to your "linux64/pgo" type suggestion, at least given current knowledge.

> I have a few ideas, but I'm open to more:
> 
>  * Implement a lot of the task creation as application of arbitrary Python
> filters that can be specified either at the test-platform level or the test
> level.  Then the tier can be set by a filter at the test-platform level, for
> example.

I'm not too keen on arbitrary python filters, as that degrades into "should we use regex" and "we filtered too much/too little for our thing and stuff is doing what we don't actually expect"

Even though I advocated for more flexibility above...


Really good work so far, Thanks for this!
I like a lot of this, either way I don't think there is aright solution, although I think this is noticeably better than our current solution.

A few things:
* there is mention of tier1 vs tier2 for jobs, in many cases we have different chunks and sometimes different jobs run opt vs debug vs asan...
* when adding a new set of jobs for a platform, what would be the experience?  For example when turning on e10s tests we might add mochitest-devtools-e10s for linux64 debug, not opt and not other platforms.  What type of edits would need to take place in the test definitions for this to happen.  (this is a practical example of the previous point)
* suite/flavor seem repetitive in many cases, possibly we can just merge them together
* I am fine toggling platform/buildtype stuff around as we see fit, we seem to change stuff often enough there.
(In reply to Jordan Lund (:jlund) from comment #5)
> now that you are in the thick mud, I wonder if it is worth reviewing the
> idea of not being templated at all. going the whole opposite way where every
> task has a complete single yml file and each one of those tasks (files) is
> associated with a 'kind' by dir.

I would actually be all for this, I came to a similar conclusion awhile back as well:
https://ahal.ca/blog/2014/part-1-sharing-code-not-always-good-thing/

My ideal scenario is no sharing of configs between tasks, but having a tool that can batch edit them if necessary. And if there is something that we can guarantee will always be the same for all tasks of a particular kind, it would just get baked into the actual kind itself. Maybe there could be a separate "kind config" or something. That way we could still have a place to share configs if we *really* want to, without opening the door to multiple inheritance.
Thanks for the responses!  Noted, and taken into account in my thinking..

I think having an individual file for each task would give us a file per buildbot builder, which is already many thousands, and that is likely to grow quickly as we make smaller chunks and add new jobs.  Adding any sort of automation to that quickly gets us right back to where we are: a bunch of yml files that *almost* correspond to tasks, modulo some magic that 20 people each understand 5% of, and 6 of those people no longer remember.  That said, it does suggest that my move to defining each test suite as a separate stanza (I think separate files will probably lead to madness) is good and I should do more of that :)

Also, thanks for the pointer to the IFFY blog post -- I read that way back when, but it is pretty salient now!
One thing to keep in mind is that most of our jobs are really just chunking a single job- so in that regard we have <40 jobs per *platform*.  I know that will grow in time, but I doubt it will be 200+ per platform (maybe that many jobs, just ignore chunks and it gets smaller)
Right, but in a file-per-task scenario, you can't control the tier, for example, of individual chunks if those chunks aren't in their own files.  So, 40 suites, times 10 chunks each, times 26 platforms, times 16 repos = 166,400 task definitions.  That's going to put a big dent in the gecko clone times on Windows!

I do think we can win by dividing the IFFY up into two pieces: the set of tests we want to run is IFFY; and how to run each test is IFFY.  The former is basically a set of lists where we add and remove items all the time, so a WET solution is probably good.  The latter varies constantly along innumerable, fluctuating dimensions and thus requires some code to avoid exponential blow-up.

I'm working on an iterative transform design that starts with some "test descriptions" in tests.yml, identified as described earlier (the WET part).  It then runs those through a series of transforms to conditionally modify the descriptions, split into chunks, drop some tasks, duplicate some tasks, etc., and then generate a task definition appropriate to the target worker implementation.  With this approach I suspect we can implement just about all of the use-cases we've described with only local complexity, as the transforms can add new semantics to the test descriptions.  If we're careful, we can probably even keep the whole thing from becoming another dumpster fire for a few years.

For example, let's say we start on the g27p project once we finish e10s, and need to run most test suites both with and without g27p.  We document a new 'g27p' attribute on the test descriptions that can be either true, false, or 'both', defaulting to 'both'.  Then we implement a transform early in the process which takes any task with g27p=='both' (or omitted) and produces two similar tasks, one with g27p=false and one with g27p=true.  And we implement another transform further down the process that modifies the test['mozharness-extra-options'] to include '--g27p' and set some environment variables when test['g27p'], and also sets the 'g27p' attribute appropriately for use in try syntax.

So, this gives a pretty simple implementation that results in the desired state (run all tests with and without g27p) immediately and makes it easy to tweak that state on a per-test basis by setting the 'g27p' attribute in tests.yml; for example if the mochitest-dt-chrome suite is permafailing with g27p and we want to disable it, we just set g27p: false for that suite.  Configuring g27p per platform, or per tree, would be a matter of adding another two- or three-line transform that sets the test description attribute appropriately based on the platform or tree.

Because the transforms can add or remove test descriptions, we could also write a transform to do something like remove bc6 on linux64/pgo only, but only on mozilla-inbound and mozilla-beta.  And you know we do crazy stuff like that all the time!  The transform would be small and self-contained, hg blame would tell you who did it and on which bug, and hopefully a comment would indicate why.

I don't think this is especially simple.  But I think it's about as simple, elegant, and compartmentalized as it can get.  I'll try to include some enforced rules to help everyone play nicely together (e.g., test description keys must be documented).

I'm drafting up an implementation now to see how far I can get into the swamp before I get stuck in the mud again.  https://i.ytimg.com/vi/W_vtWQnBRak/hqdefault.jpg
(In reply to Dustin J. Mitchell [:dustin] from comment #11)
> Right, but in a file-per-task scenario, you can't control the tier, for
> example, of individual chunks if those chunks aren't in their own files.

ah, I didn't realize there was a use case to have different tiers per chunk! I thought chunks tried to evenly divide the amount of tests within a suite? How can we be sure that some individual test ends up on a chunk that we want to be on a different tier?

if that restraint wasn't there, I think the per-file plan could work. Especially if you allow templating on one thing and one thing only: branch/repo

then it would be a subset of 40suites * 26platforms because I would assume not every suite would run on every platform.

again, all moot if we have to support different tiers on each chunk in which case, ignore me :)
I am not sure if we have had a case in the past where we had different tiers for the chunks, but mochitest vs mochitest-e10s have been different tiers.  Typically we try to bring up an entire harness at a time (i.e. mochitest-browser-chrome-*).  Still, I think if we make an assumption it will come back to cause us frustration at some point in time.
The transforms thing is working out OK, actually.  I've defined a bunch of transformations over streams of "tests":

https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/file/c312413aab32/taskcluster/taskgraph/kind/test.py#l211

each of these run in order.  Many have conditions matching our business needs.  We can then define the inputs to these transforms, much as before:

https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/file/c312413aab32/taskcluster/ci/desktop-test/tests.yml#l63

these are very flexible "task descriptions" that can grow or lose properties as our IFFY needs change.  You can see that I've added e10s here, and mozharness.no-read-buildbot-config to include a flag only needed for some mozharness scripts, for example.

---

The challenges with this approach have to do with overall complexity -- but I think this is unavoidable, and at least this segments the complexity into a bunch of relatively simple functions.

One particular kind of complexity is, what exactly is this `test` object that this particular transform function receives?  What fields does it have?  What do they mean?  The set of fields, etc. change throughout the transformation process as it turns from a test description into a task!

---

Next steps:

 * the transformations will be in several Python files under `taskcluster/ci/desktop-test` rather than in test.py

 * each file will deal with a particular "shape" of input, and will have a transform at the top that checks for and documents that shape (I'm tempted to vendor https://github.com/alecthomas/voluptuous or something like it)

 * a base class implementing the basic filtering behavior, so that it can be used for build tasks too (with different transformations)

 * expand to the rest of the test sets for desktop, then mulet, firefox-ui, etc. with luck, I'll land this before windows or OS X land!
If we had the concept of "test platform" when implementing the e10s split, we might have implemented it as a set of new test platforms (linux64/debug/e10s, linux64/opt/e10s, etc.) with similar or identical test sets as the base test platforms.  Then a simple filter would add --e10s to the command for anything in an /e10s platform, and if necessary include or exclude test suites.

I like this: it's presenting some natural ways of breaking down problems that have been pretty challenging until now.
I've moved a bunch of code around now, preparing this approach to start generating more than 12 test tasks (although it's consistently doing so accurately!)

The latest bit is dividing the transforms into four files:

https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/3d41961c9e92#l12.1 - test_description.py
https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/3d41961c9e92#l13.1 - test_duplication.py
https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/3d41961c9e92#l10.1 - command_construction.py
https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/3d41961c9e92#l11.1 - task_construction.py

many of which start with a strict schema for the incoming test descriptions to be transformed.  These schemas should help both document the fields (in such a way that nobody can add a new field without documenting it!) and detect bugs where the test descriptions don't look as they should.

At this point I'm pretty confident that I can implement the remainder of the test suites for linux64/* without too much additional code (just a bunch more stanzas in tests.yml).  Implementing mulet shouldn't be too bad.

Things are a little more vague at the next remove, though:
* Is the command/task-construction distinction durable enough to work with taskcluster-worker and generic-worker?
* What would BBB (Buildbot Bridge) tasks look like here?  They don't have commands, after all..
* I'd like to implement build jobs using the same kind of structure.  How much of the code here can be reused?  How much 
  *should* be reused?
* What else can I do to stave off the inevitable descent into chaos?
I'm going to add some generic support for any of the test_description attributes to be determined based on test platform, etc.:

    chunks:
        by-test-platform:
            linux64/debug: 10
            android-api-15/opt: 3
            android-api-15/debug: 3
            default: 8

I suspect I'll find a time when this needs to nest.  Which is going to be tricky if, for example, we want to have

    e10s: both
    chunks:
        by-e10s:
            true: 10
            false: 8

and, in another test:

    chunks: 4
    e10s:
        by-this-chunk:
            1: both
            2: false
            3: both
            4: false

the first requires duplicating tasks for e10s first, then for chunks; the second requires the opposite.

This, friends.  This is why I drink.
Latest is in https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/6458feb57742 and its parents.  This is now handling all non-mulet, non-android tests defined in the legacy yaml.  Just android and mulet to go.
is mulet a platform that will be around for much longer?  possibly we can save you some time- it is tier-2 and 100% failing- seems like potential for ignoring.

regarding chunks/e10s and other parameters...

would it make sense to treat linux64-debug-opt and linux64-debug-e10s-opt as different platforms.  That would take our ~25 existing *platforms* and double that, but it would simplify our test definitions greatly, possibly worth thinking about, but maybe not realistic to do.

What would happen if we had another variable to key tests off?  Say with-addons, or proc-per-tab?
(In reply to Joel Maher (:jmaher) from comment #19)
> is mulet a platform that will be around for much longer?  possibly we can
> save you some time- it is tier-2 and 100% failing- seems like potential for
> ignoring.

I would be OK with this.  It's actually tier 3.  I didn't realize it was permafailing -- while I was keen to not leave my CD colleagues out in the cold, that data point may tip the balance.  Who should I talk to about just removing the jobs?  I'll shift to supporting Android in the short-term until I hear back.

> would it make sense to treat linux64-debug-opt and linux64-debug-e10s-opt as
> different platforms.  That would take our ~25 existing *platforms* and
> double that, but it would simplify our test definitions greatly, possibly
> worth thinking about, but maybe not realistic to do.

It really depends on how the results are best presented.  In this case, they have been presented in-line.  But for example ASAN is presented as a different subplatform, so there's some precedent.

Either one is pretty easy to implement with this new task-definition system.

> What would happen if we had another variable to key tests off?  Say
> with-addons, or proc-per-tab?

  with-addons: {true,false,both}
  proc-per-tab: {true,false,both}

and then write filters in task_duplication.py that will, for "both", create duplicate tasks with "true" and "false", and adjust TH symbols, names, etc. appropriately.  If this becomes common, then we might need to adjust something in TH to display them more sensibly.  Maybe they become some kind of attribute on which TH can filter into columns?  At any rate, it's all pretty easy to implement in the task-definition system.
Blocks: 1271461
Review commit: https://reviewboard.mozilla.org/r/61836/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/61836/
Attachment #8767307 - Flags: review?(lissyx+mozillians)
Attachment #8767308 - Flags: review?(armenzg)
Attachment #8767309 - Flags: review?(jmaher)
Attachment #8767310 - Flags: review?(ryanvm)
Attachment #8767311 - Flags: review?(jmaher)
Attachment #8767312 - Flags: review?(ryanvm)
Attachment #8767313 - Flags: review?(jmaher)
Attachment #8767314 - Flags: review?(jmaher)
Attachment #8767315 - Flags: review?(gbrown)
For all of ther suites, the -e10s is attached to the group symbol, not the job
symbol.

Review commit: https://reviewboard.mozilla.org/r/61842/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/61842/
I've got a decent set of modifications to the *existing* configuration that make it more consistent and sensible and thus easier to model.  I'm going to put those up as review requests now.

Included in that set is the removal of the gaia tests.  We will probably re-add those later, but working to make sure that the new implementation creates the exact same task definitions is not worth the effort right now.

I'm working on representing Android tasks now, and may have a few more yaml modifications to come, but those will just show up as additional review csets.
All but 80-some-odd tests are now modeled, and I don't expect those to be hard -- I'm just out of time for the week.  That includes a bunch of Android tests.
This introduces a completely new way of specifying test task in-tree,
completely replacing the old spider-web of YAML files.

The high-level view is this:

 - some configuration files are used to determine which test suites to run
   for each test platform, and against which build platforms

 - each test suite is then represented by a dictionary, and modified by a
   sequence of transforms, duplicating as necessary (e.g., chunks), until
   it becomes a task definition

The transforms allow sufficient generality to support just about any desired
configuration, with the advantage that common configurations are "easy" while
unusual configurations are supported but notable for their oddness (they
require a custom transform).

As of this commit, this system produces the same set of test graphs as the
existing YAML, modulo:

  - extra.treeherder.groupName -- this was not consistent in the YAML
  - extra.treeherder.build -- this is ignored by taskcluster-treeherder anyway
  - mozharness command argument order
  - boolean True values for environment variables are now the string "true"
  - metadata -- this is now much more consistent, with task name being the label

Review commit: https://reviewboard.mozilla.org/r/61884/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/61884/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

I found an hour and finished the last 80 android tasks.  This is still littered with TODO's, and I'm going to do a bunch of shuffling around to make the task generation a bit more natural.  

But structurally this is what I'm proposing.  I'm pretty pleased -- this covered a wide variety of complexities without *too* much awful hackery.

I'd like to address *most* of these TODO's, then land this and delete the old files, after which time I'll work on doing the same thing for builds (which will require some refactoring of shared code).  Once that's done, I think there's scope for a lot of cleanup (e.g., the chunk-related Android mozharness command line). 

I'd love some additional feedback!
Attachment #8767353 - Flags: feedback?(jopsen)
Attachment #8767353 - Flags: feedback?(jlund)
Attachment #8767353 - Flags: feedback?(bugspam.Callek)
Attachment #8767353 - Flags: feedback?(ahalberstadt)
I'm a little late to the game, but here are some of my knee jerk reactions after *skimming* the comments.

Whatever you do, please don't use multi-level inheritance. Both the existing YAML and mozharness are difficult to comprehend because of the complicated inheritance diagram (even if - in the case of YAML - single inheritance is used).

I think duplicating definitions for one-offs will be cumbersome to update. Even with all our magic inheritance today we have sufficient duplication to make updates difficult. I am moderately concerned about making the cost of changing automation configs too high. I think we need to keep costs low so random developers can experiment with automation easily.

I agree that some kind of templating for one-offs would be nice. It would be rad if you could just feed a bunch of parameters to a template renderer. Although at the point you are e.g. using something like Jinja2 to output YAML, you may as well just write Python code to populate a data structure that conveys the same thing as the YAML. This /might/ be similar to the "transforms" concept Dustin proposed.

Regarding chunking, we need to be moving away from the idea that separate chunks are separate jobs. Instead, we need to think of groups of chunks as single "tasks" with N execution units. I say this because continuing to model separate chunks as separate tasks leaves us in a very bad place with having to manage thousands of discrete options, whether they be task definitions or task results (could you imagine Treeherder displaying 1000 letters if Mochitest is chunked into 1000 tasks!). Chunks are simply subsets or slices of a single logical task. The fact that chunking occurs and how it occurs are implementation details and should be abstracted away from the user.
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

https://reviewboard.mozilla.org/r/61840/#review58650

::: taskcluster/ci/legacy/tasks/tests/fx_linux64_web_platform_tests_reftests_e10s.yml:25
(Diff revision 1)
>      name: '[TC] Linux64 web-platform-tests-reftests-e10s'
>      description: Web platform reftest e10s test run
>    extra:
>      suite:
>        name: web-platform-tests-reftests
> -      flavor: web-platform-tests-reftests-e10s
> +      flavor: web-platform-tests-reftests

I was going to comment this seemed wrong, but it does seem correct to remove the -e10s from here- please confirm.
Attachment #8767309 - Flags: review?(jmaher) → review+
Attachment #8767311 - Flags: review?(jmaher) → review+
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

https://reviewboard.mozilla.org/r/61844/#review58652

good find!
Attachment #8767314 - Flags: review?(jmaher) → review-
Comment on attachment 8767314 [details]
Bug 1281004: give all M(bc) tests 5400s maxRunTime;

https://reviewboard.mozilla.org/r/61850/#review58654

we don't want max runtime >3600 unless there is a very specific need, we would rather chunk more.  Sometimes there is a need though, please justify this a bit more.
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

https://reviewboard.mozilla.org/r/61848/#review58656

do we know these will work?  it takes a lot of work to get new jobs up and green, and these were not running in debug when we shut off buildbot months ago.  To verify this, run the new proposed jobs 50 times each and see if we have <3 failures per job.
Attachment #8767313 - Flags: review?(jmaher) → review-
(In reply to Joel Maher (:jmaher: pto- back july 7th) from comment #43)
> Comment on attachment 8767314 [details]
> Bug 1281004: give all M(bc) tests 5400s maxRunTime;
> 
> https://reviewboard.mozilla.org/r/61850/#review58654
> 
> we don't want max runtime >3600 unless there is a very specific need, we
> would rather chunk more.  Sometimes there is a need though, please justify
> this a bit more.

The justification was to avoid the need to support different times for debug vs. opt.  Is it expected that one takes a lot longer than the other?  Would it be better to chunk the two differently and keep to the 3600s runtime?

(In reply to Joel Maher (:jmaher: pto- back july 7th) from comment #44)
> Comment on attachment 8767313 [details]
> Bug 1281004: run mochitest/devtools-chrome e10s for debug, too;
> 
> https://reviewboard.mozilla.org/r/61848/#review58656
> 
> do we know these will work?  it takes a lot of work to get new jobs up and
> green, and these were not running in debug when we shut off buildbot months
> ago.  To verify this, run the new proposed jobs 50 times each and see if we
> have <3 failures per job.

I don't know that the tests will pass.  Are you sure they were not running in debug when you shut off buildbot?  If that's the case, why is there a .yml file for them?  I assumed this was just a typo.  I can disable them, but if so I'd like to point to someone saying that's the intent.  Is this comment enough?
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher: pto- back july 7th) from comment #41)
> Comment on attachment 8767309 [details]
> Bug 1281004: run all web platform tests on desktop-test-xlarge;
> 
> https://reviewboard.mozilla.org/r/61840/#review58650
> 
> :::
> taskcluster/ci/legacy/tasks/tests/
> fx_linux64_web_platform_tests_reftests_e10s.yml:25
> (Diff revision 1)
> >      name: '[TC] Linux64 web-platform-tests-reftests-e10s'
> >      description: Web platform reftest e10s test run
> >    extra:
> >      suite:
> >        name: web-platform-tests-reftests
> > -      flavor: web-platform-tests-reftests-e10s
> > +      flavor: web-platform-tests-reftests
> 
> I was going to comment this seemed wrong, but it does seem correct to remove
> the -e10s from here- please confirm.

It's correct
(In reply to Gregory Szorc [:gps] from comment #40)

Your first few paras don't really apply to the work I'm doing here, or at least only apply to the old, existing implementation.  They're good points, just things I think are already handled in my patches.  I'm very interested in your feedback as to how well I did!  Please do have a look at attachment 8767353 [details] and let me know what you think.

> Regarding chunking, we need to be moving away from the idea that separate
> chunks are separate jobs. Instead, we need to think of groups of chunks as
> single "tasks" with N execution units. I say this because continuing to
> model separate chunks as separate tasks leaves us in a very bad place with
> having to manage thousands of discrete options, whether they be task
> definitions or task results (could you imagine Treeherder displaying 1000
> letters if Mochitest is chunked into 1000 tasks!). Chunks are simply subsets
> or slices of a single logical task. The fact that chunking occurs and how it
> occurs are implementation details and should be abstracted away from the
> user.

From a TaskCluster perspective, a task only runs on one system at a time, and there's no notion of an "execution unit".  If we renamed TC tasks to "execution units", then we'd have to call it ExecutionUnitCluster, and that's just not as catchy.  A *job*, on the other hand, might someday be redefined to encompass multiple tasks, at least as far as display in treeherder is concerned.  Even with that redefinition, we'll still need to generate the constituent tasks individually.  At 1000 chunks, we might consider some kind of parameterized queue.createMultipleTasks(..) that can be given a task and the range [1,1000] and insert 1000 tasks into the queue with one API call.  But we can deal with that when the time comes.

At any rate, for the moment my refactor is designed to maintain practically identical task output, and that means the current process of generating chunks with each having a different TH symbol.
Flags: needinfo?(gps)
(In reply to Dustin J. Mitchell [:dustin] from comment #45)
> (In reply to Joel Maher (:jmaher: pto- back july 7th) from comment #43)
> > Comment on attachment 8767314 [details]
> > Bug 1281004: give all M(bc) tests 5400s maxRunTime;
> > 
> > https://reviewboard.mozilla.org/r/61850/#review58654
> > 
> > we don't want max runtime >3600 unless there is a very specific need, we
> > would rather chunk more.  Sometimes there is a need though, please justify
> > this a bit more.
> 
> The justification was to avoid the need to support different times for debug
> vs. opt.  Is it expected that one takes a lot longer than the other?  Would
> it be better to chunk the two differently and keep to the 3600s runtime?
> 
it is ok to have different number of chunks for debug vs opt as debug takes noticeably longer than opt. Keep in mind that just changing chunks can cause odd failures though- less so thank 2 years ago, but it still happens.  This means we need to do a good amount of testing prior to rolling this out.

> (In reply to Joel Maher (:jmaher: pto- back july 7th) from comment #44)
> > Comment on attachment 8767313 [details]
> > Bug 1281004: run mochitest/devtools-chrome e10s for debug, too;
> > 
> > https://reviewboard.mozilla.org/r/61848/#review58656
> > 
> > do we know these will work?  it takes a lot of work to get new jobs up and
> > green, and these were not running in debug when we shut off buildbot months
> > ago.  To verify this, run the new proposed jobs 50 times each and see if we
> > have <3 failures per job.
> 
> I don't know that the tests will pass.  Are you sure they were not running
> in debug when you shut off buildbot?  If that's the case, why is there a
> .yml file for them?  I assumed this was just a typo.  I can disable them,
> but if so I'd like to point to someone saying that's the intent.  Is this
> comment enough?

I looked back in history and we never ran linux64 debug devtools e10s, the only debug devtools e10s we run is on windows 7.  I suspect we added the file, then realized while greening stuff up that it wasn't running on buildbot.  I would recommend deleting the file to avoid confusion in the future.
Flags: needinfo?(jmaher)
They run on Ash and I want them available on Try. Don't delete them please. Also, debug m-dt runs on OSX on all branches and on Win8 on the release branches, so yeah...
(In reply to Ryan VanderMeulen [:RyanVM] from comment #49)
> They run on Ash and I want them available on Try. Don't delete them please.

Sorry, with so much in discussion here I'm not sure which tasks you're referring to.  But whatever they are, I'll be happy not to delete them :)  Please clarify?

> Also, debug m-dt runs on OSX on all branches and on Win8 on the release
> branches, so yeah...

m-dt debug non-e10s *does* seem to be running for linux64.  This is specifically about the e10s variant of that job.

Anyway, since my intent here is to not make substantive changes to the task graph, and something I need to test 50 times counts as substantive, I will remove the cset from this list and implement a transform to omit those jobs.  If I've stumbled on an actual error, it can be fixed in another bug, either with a copy of attachment 8767313 [details] or by deleting the transform I will add.

Regarding the different timings for the debug vs opt m-bc, I'll add support for specifying maxRunTime by test platform, and drop that cset as well.
Flags: needinfo?(ryanvm)
(In reply to Dustin J. Mitchell [:dustin] from comment #50)
> Sorry, with so much in discussion here I'm not sure which tasks you're
> referring to.  But whatever they are, I'll be happy not to delete them :) 
> Please clarify?

This was in response to the end of comment 48, sorry. I'm just saying that we do in fact care about linux debug M-e10s(dt) even though it's not running on m-c, so please include it in whatever you're doing. I was also disputing the assertion that Win7 is the only platform we currently run debug M-e10s(dt) on in production. That's simply not true.
Flags: needinfo?(ryanvm)
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

https://reviewboard.mozilla.org/r/61842/#review58674

Yes! Thank you for killing off that buildbot-ism once and for all :)
Attachment #8767310 - Flags: review?(ryanvm) → review+
Attachment #8767312 - Flags: review?(ryanvm) → review-
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

https://reviewboard.mozilla.org/r/61846/#review58676

r- pending an answer on the web-platform-test change.

::: taskcluster/ci/legacy/tasks/tests/fx_linux64_marionette_e10s.yml:27
(Diff revision 1)
>        name: marionette
>        flavor: marionette
>      treeherder:
>        groupName: Desktop marionette e10s
> -      symbol: Mn-e10s
> +      groupSymbol: tc-e10s
> +      symbol: Mn

I guess I see why you're making this change, but it's probably going to end up being temporary in nature.

I think the long-term plan for Marionette is that it'll be broken down into sub-suites by functional area, i.e. Mn(A B C) instead of lumping things into one job. In that case, Mn-e10s will probably end up being the group symbol for the e10s variants when the time comes. Given that I have no clue what the status/ETA on that work is, though, I'm cool with this change for now.

::: taskcluster/ci/legacy/tasks/tests/fx_linux64_web_platform_tests_e10s.yml:27
(Diff revision 1)
>      name: '[TC] Linux64 web-platform-tests-e10s-{{chunk}}'
>      description: Web platform e10s tests run {{chunk}}
>    extra:
>      suite:
>        name: web-platform-tests
> -      flavor: web-platform-tests-e10s
> +      flavor: web-platform-tests

Why this change?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #52)
> Yes! Thank you for killing off that buildbot-ism once and for all :)

One other problem that I noticed with these jobs that I was intending to fix in bug 1271461 is that we currently specify different groupNames for different jobs within the same group, which is very confusing and nonsensical. Might make sense to clean that up while you're here.
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

https://reviewboard.mozilla.org/r/61848/#review58678

You should drop this patch for now. Bug 1242986 tracks getting these enabled by default on Linux debug. Right now, they're too failure-prone to do so in production. But please don't remove the task definitions! As I mentioned earlier, they *are* running on Ash and I also want them available for easy testing on Try so the devtools team can continue easily chipping away at the issues preventing bug 1242986 from moving forward.
Attachment #8767313 - Flags: review-
https://reviewboard.mozilla.org/r/61846/#review58676

> I guess I see why you're making this change, but it's probably going to end up being temporary in nature.
> 
> I think the long-term plan for Marionette is that it'll be broken down into sub-suites by functional area, i.e. Mn(A B C) instead of lumping things into one job. In that case, Mn-e10s will probably end up being the group symbol for the e10s variants when the time comes. Given that I have no clue what the status/ETA on that work is, though, I'm cool with this change for now.

Temporary is OK by me - it will be easy to change.  Note that the current groupName (inherited) is `tc`.

When this changes, the new functional-area-specific tests can just list `treeherder-symbol: tc-Mn(A)` and `(B)` etc. and the e10s-splitting code will automatically generate a `tc-Mn-e10s` group too.  And maybe by then we'll have gotten rid of the `tc-` prefix :)

> Why this change?

None of the other flavors include -e10s (instead, the flavors are equal between e10s and non-e10s tasks), so it's for consistency.  I don't know that the flavors currently feed into much of anything, but they might eventually be used for try-task selection, instead of the existing "try names" (which differ slightly).
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

https://reviewboard.mozilla.org/r/61846/#review58688
Attachment #8767312 - Flags: review- → review+
Comment on attachment 8767348 [details]
Bug 1281004: run all mochitest-gl-android on xlarge;

https://reviewboard.mozilla.org/r/61874/#review58692

Whoops, this does the complete opposite of what it says
Attachment #8767348 - Flags: review-
Attachment #8767348 - Flags: review?(gbrown)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #54)
> One other problem that I noticed with these jobs that I was intending to fix
> in bug 1271461 is that we currently specify different groupNames for
> different jobs within the same group, which is very confusing and
> nonsensical. Might make sense to clean that up while you're here.

Group names are now generated consistently using a dictionary mapping groupSymbol to name.  Score one for consistency :)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #55)
> You should drop this patch for now. Bug 1242986 tracks getting these enabled

Thanks for the bug link!  I've dropped it and, for the moment implemented it in special_cases.py in the new task generation system.  As this patch set evolves, I will implement that in a target task set method.
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

https://reviewboard.mozilla.org/r/61852/#review58862

::: taskcluster/ci/legacy/tasks/tests/fx_android-api-15_mochitest_gl.yml:17
(Diff revision 1)
>    extra:
>      chunks:
>        total: 10
>      suite:
> -      name: mochitest-gl-{{chunk}}
> +      name: mochitest
> +      flavor: mochitest-gl

The mochitest flavor names seem a little inconsistent: Shouldn't it be "mochitest-plain-clipboard" and "mochitest-gl", *or* "plain-clipboard" and "gl"?

::: taskcluster/ci/legacy/tasks/tests/fx_android-api-15_robocop.yml:16
(Diff revision 1)
>      description: Robocop run {{chunk}}
>    extra:
>      chunks:
>        total: 4
>      suite:
> -      name: robocop-{{chunk}}
> +      name: robocop

Be aware that robocop is a mochitest, depending on how you view it. The robocop harness shares a lot with mochitests and uses the mochitest logging format. On the other hand, it is specialized for Android, doesn't use the javascript parts of the mochitest harness, and is widely known as just "robocop".

On treeherder we've always grouped robocop with mochitests and used M(rc) for the symbol.

So I'm not sure if suite.name should be "robocop" or "mochitest"...probably robocop, but I thought I'd mention it anyway.
Attachment #8767315 - Flags: review?(gbrown) → review+
Attachment #8767347 - Flags: review?(gbrown) → review+
Comment on attachment 8767347 [details]
Bug 1281004: set same maxRunTime between opt and debug for android mochitests;

https://reviewboard.mozilla.org/r/61872/#review58868
(In reply to Dustin J. Mitchell [:dustin] from comment #58)
> Comment on attachment 8767348 [details]
> Bug 1281004: run all mochitest-gl-android on xlarge;
> 
> https://reviewboard.mozilla.org/r/61874/#review58692
> 
> Whoops, this does the complete opposite of what it says

If you still want to go ahead with this, be aware that buildbot mochitest-gl runs on slower aws instances for opt than debug. As I recall, we started everything on standard aws instances, then bumped up the debug jobs because they were too slow.
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

https://reviewboard.mozilla.org/r/61884/#review58902

wow, so organized! how did you do it? a couple thoughts below. please understand that I'm impressed by the way you structured this. it's very neat. just trying to think critically. :)

iiuc, if I was a someone outside platops and I wanted to stand up a new platform, platform variant, or test suite, the process would be editing one or a many of the following depending on my needs, correct:

* taskcluster/ci/desktop-test/command_construction.py
* taskcluster/ci/desktop-test/kind.yml
* taskcluster/taskgraph/transforms/tests/test_duplication.py
* taskcluster/ci/desktop-test/task_construction.py compat_rewrite_build_platform()
* taskcluster/ci/desktop-test/test-sets.yml
* taskcluster/ci/desktop-test/tests.yml

I think the layout makes sense given my familiarity but when these things are programatically done with edge cases, I imagine we will need pretty good docs on how to append to it. I forsee similar resistance like there has been to mozharness/buildbot.

side tiny nit thought: 'desktop-test/*' to me suggests desktop ff not desktop machine. might confuse android folks who are used to living in a 'mobile/*' dir in gecko.

::: taskcluster/ci/desktop-test/tests.yml:6
(Diff revision 1)
> +# Each stanza here describes a particular test suite or sub-suite.  These are
> +# processed through the transformations described in kind.yml to produce a
> +# bunch of tasks.  See the schema in `test-descriptions.py` for a description
> +# of the fields used here.
> +
> +cppunit:

when we support linux32, windows and osx, will you be changing this and all other test suites in this file to: `cppunit-linux64` and adding an equivalent for the other platforms?

::: taskcluster/taskgraph/transforms/tests/command_construction.py:228
(Diff revision 1)
> +
> +        yield test
> +
> +
> +@transforms.append
> +def fix_android_chunking(config, tests):

this might be a source of magic and potentially expand if future platforms have different chunking params

::: taskcluster/taskgraph/transforms/tests/task_construction.py:97
(Diff revision 1)
> +        test['dependencies'] = [(lambda taskgraph: test['build-label'], 'build')]
> +        yield test
> +
> +
> +@transforms.append
> +def compat_rewrite_build_platform(config, tests):

another place of magic as we mutate configuration after definition.

::: taskcluster/taskgraph/transforms/tests/test_duplication.py:23
(Diff revision 1)
> +
> +# TODO: merge this to test_description as a sub-section
> +
> +
> +@transforms.append
> +def split_e10s(config, tests):

do you forsee this as the way future variants similar to e10s will add their configuration? How will devs know to do so?
Attachment #8767353 - Flags: review+
As far as the task.extra.suite information, I'm aiming to keep it consistent with what the YAML files were generating.  To be completely honest, I have no idea what actually looks at that information (I don't think mozilla-taskcluster does anything with it, meaning it doesn't make it to treeherder), so it may ultimately be irrelevant.

I'll implement something to support different workerTypes based on platform, and keep the workerType split for mochitest-gl.
Attachment #8767353 - Flags: review+
Attachment #8767353 - Flags: feedback?(jlund)
Attachment #8767353 - Flags: feedback+
(In reply to Jordan Lund (:jlund) from comment #63)
> iiuc, if I was a someone outside platops and I wanted to stand up a new
> platform, platform variant, or test suite, the process would be editing one
> or a many of the following depending on my needs, correct:

ish, although some of those directories are incorrect.  I'm working on factoring the .py files in that list out a little so they are more clearly delineated.

> I think the layout makes sense given my familiarity but when these things
> are programatically done with edge cases, I imagine we will need pretty good
> docs on how to append to it. I forsee similar resistance like there has been
> to mozharness/buildbot.

I agree about needing good documentation.  Docs for this are coming (and tests.. it was written in an exploratory fashion so those did not precede the code).

What sort of resistance?

> side tiny nit thought: 'desktop-test/*' to me suggests desktop ff not
> desktop machine. might confuse android folks who are used to living in a
> 'mobile/*' dir in gecko.

I have since introduced a parallel 'android-test' kind, since the android tests are invoked quite differently.

> > +cppunit:
> 
> when we support linux32, windows and osx, will you be changing this and all
> other test suites in this file to: `cppunit-linux64` and adding an
> equivalent for the other platforms?

No, in fact it was a mistake to do this for android (and fixing that mistake led to me adding a new kind).  For these other desktop platforms, I'm hoping that the mozharness invocations are similar enough, modulo some platform details like path formatting, that we can use the same test descriptions.  That has the advantage of running "the same" tests across those platforms.  But depending on how things work out, we have other options -- maybe mochitest-gl is just so different on Windows that it's a completely different test.  The downside to that is, then we need to do funny things with try names to make them all match "mochitest-gl".

> > +def fix_android_chunking(config, tests):
> 
> this might be a source of magic and potentially expand if future platforms
> have different chunking params

Indeed.  I wrote it this way because I'd like to get rid of it (it appears there's a layer in mozharness that reverses this transform, so deleting both the layer and the transform should be a no-op).  However, if some platform's chunking args are irrecoverably different, a transform like this is an option.  Also, since Android is a different kind in my latest revisions of this patchset, I don't need to "fix" the chunking so much as just write it out Android-style to begin with.  At any rate, note again:
 * there are several ways to deal with this
 * the "magic" is isolated to one function

> > +def compat_rewrite_build_platform(config, tests):
> 
> another place of magic as we mutate configuration after definition.

Yeah, this is another one I want to get rid of.  I think the platform names are kind of a mess, but I don't want to change them in this bug, since it's much easier to be sure this patchset is correct if it generates the same task-graph as the YAML files did.

> > +def split_e10s(config, tests):
> 
> do you forsee this as the way future variants similar to e10s will add their
> configuration? How will devs know to do so?

Yes, I think this kind of boolean "old, new, or both" distinction will be a common means of migrating the codebase, and this is a pretty nice technique to do it.
A lot of bug activity here over the weekend! Getting hard to follow conversation threads

(In reply to Dustin J. Mitchell [:dustin] from comment #47)
> (In reply to Gregory Szorc [:gps] from comment #40)
> > Regarding chunking, we need to be moving away from the idea that separate
> > chunks are separate jobs. Instead, we need to think of groups of chunks as
> > single "tasks" with N execution units. I say this because continuing to
> > model separate chunks as separate tasks leaves us in a very bad place with
> > having to manage thousands of discrete options, whether they be task
> > definitions or task results (could you imagine Treeherder displaying 1000
> > letters if Mochitest is chunked into 1000 tasks!). Chunks are simply subsets
> > or slices of a single logical task. The fact that chunking occurs and how it
> > occurs are implementation details and should be abstracted away from the
> > user.

I completely agree! But I think that from a taskcluster perspective, not much needs to change to support this (maybe aside from a convenience API or two like Dustin suggests). I.e the onus of work here is on us, not the taskcluster team. I filed bug 1262834 awhile ago and I think the proposal in there pretty much exactly matches with what you're suggesting here. Tackling this will be a large task and would be scope bloat for the purposes of this refactor. Let's move discussion on this over there, I'll make this a blocker.
https://reviewboard.mozilla.org/r/61884/#review58914

I like this approach. I've used a transform-esque system before for dealing with manifestparser filtering. It's worked well and been really flexible. But it can also get confusing to figure out what is and isn't being applied. I still think we'll run into a bit of hairiness in the number/size of transforms down the road, but likely a lot less than we had with the .yaml files.

::: taskcluster/ci/desktop-test/test-sets.yml:19
(Diff revision 1)
> +# TODO: consider 'include-from', 'exclude-from' and 'exclude' to allow editing
> +# lists, since most special cases here will be all-tests without x and y
> +#
> +# TODO: make include etc. recursive
> +
> +all-tests:

I think 'all-tests' is misleading as it excludes android. I can imagine the number of test jobs that aren't included here will also grow in the future. As long as we're sticking with the android/desktop nomenclature, maybe this should be 'desktop-tests'? Though I sort of agree with Jordan's comment that 'desktop' can also be misleading.

::: taskcluster/ci/desktop-test/tests.yml:6
(Diff revision 1)
> +# Each stanza here describes a particular test suite or sub-suite.  These are
> +# processed through the transformations described in kind.yml to produce a
> +# bunch of tasks.  See the schema in `test-descriptions.py` for a description
> +# of the fields used here.
> +
> +cppunit:

Please alphabetize these, at least within the desktop/android sections

::: taskcluster/taskgraph/kind/test.py:86
(Diff revision 1)
> +
> +        # apply all of the transforms that will take the test descriptions
> +        # and pop out a bunch of task definitions
> +        # TODO: find some way to add context (test-name) to exceptions
> +        for xform in transforms:
> +            tests = xform(config, tests)

Might be nice to have a debug mode that can log the state of the config between each transform.

::: taskcluster/taskgraph/transforms/base.py:11
(Diff revision 1)
> +
> +import voluptuous
> +
> +# TODO: TransformSequence representing a sequence of transforms
> +
> +def validate_schema(schema, object, msg_prefix):

nit: object is a keyword collision
Attachment #8767353 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

https://reviewboard.mozilla.org/r/61836/#review59184
Attachment #8767307 - Flags: review?(lissyx+mozillians) → review+
My apologies giving input any sooner. The last two weeks were brutal.

(In reply to Dustin J. Mitchell [:dustin] from comment #4)
>  - This allows customization on a per-test basis, but in cases where we want
> the same test to differ between test platforms (for example, linux opt is
> tier 2 while linux debug is tier 1), there's no way to make that change
> aside from creating new test descriptions.. which will turn out very similar
> to the existing maze of yml
> 
I generally prefer to let the data structures be complicated and lots of it as long as there is a UI tool that allows you to edit all of them at once. Saving the new setting generates all the data structures in disk and the developer can then push.

Same thing as to what ahal mentions on comment 8 [1]

dustin I like what you describe in comment 11.

I like the idea of guaranteeing data structure between phases by using json schemas [2]. It's like having data quality checks at each gate of the process!

I want to bring up a requirement that I hope we can have either as part of this or in a follow up.
I would like us to be able to have some test jobs to need to be declared explicitely on the try syntax.
This is for jobs that don't need to run by default.
This bug might not be the place to bring it up as the main focus is redefining the current data rather than scheduling manipulation but just in case! (as I see as another type of transformation).


[1]
(In reply to Andrew Halberstadt [:ahal] from comment #8)
> (In reply to Jordan Lund (:jlund) from comment #5)
> > now that you are in the thick mud, I wonder if it is worth reviewing the
> > idea of not being templated at all. going the whole opposite way where every
> > task has a complete single yml file and each one of those tasks (files) is
> > associated with a 'kind' by dir.
> 
> I would actually be all for this, I came to a similar conclusion awhile back
> as well:
> https://ahal.ca/blog/2014/part-1-sharing-code-not-always-good-thing/
> 
> My ideal scenario is no sharing of configs between tasks, but having a tool
> that can batch edit them if necessary. And if there is something that we can
> guarantee will always be the same for all tasks of a particular kind, it
> would just get baked into the actual kind itself. Maybe there could be a
> separate "kind config" or something. That way we could still have a place to
> share configs if we *really* want to, without opening the door to multiple
> inheritance.

[2]
(In reply to Dustin J. Mitchell [:dustin] from comment #16)
> 
> The latest bit is dividing the transforms into four files:
> 
> https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/
> 3d41961c9e92#l12.1 - test_description.py
> https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/
> 3d41961c9e92#l13.1 - test_duplication.py
> https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/
> 3d41961c9e92#l10.1 - command_construction.py
> https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/
> 3d41961c9e92#l11.1 - task_construction.py
> 
> many of which start with a strict schema for the incoming test descriptions
> to be transformed.  These schemas should help both document the fields (in
> such a way that nobody can add a new field without documenting it!) and
> detect bugs where the test descriptions don't look as they should.
>
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

https://reviewboard.mozilla.org/r/61838/#review59220
Attachment #8767308 - Flags: review?(armenzg) → review+
Attachment #8767353 - Flags: review?(gps)
Attachment #8767353 - Flags: feedback?(jopsen)
Attachment #8767353 - Flags: feedback?(bugspam.Callek)
Attachment #8767353 - Flags: feedback+
Cancelling needinfo because I added myself as a reviewer.

Also, I'll file a MozReview bug to see if wiping out the feedback flag when I added myself as a reviewer using the MozReview web interface was really intended.
Flags: needinfo?(gps)
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Restoring accidentally cleared flags.
Attachment #8767353 - Flags: feedback?(jopsen)
Attachment #8767353 - Flags: feedback?(bugspam.Callek)
https://reviewboard.mozilla.org/r/61874/#review59252

The patch doesn't seem to match the commit message?

Also, I doubt we'll get much (any?) performance wins from moving mochitest to larger instances. mochitest only runs once test at once, so throwing more cores at it won't improve matters. The main reasons to use a bigger instance would be to get faster cores, more memory, etc. Although I thought frequency was pretty similar for C3 and C4 instances.
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

https://reviewboard.mozilla.org/r/61880/#review59254

I'll rubber stamp r+ this. Although I'd like to see a consumer of it before this lands and forever becomes part of history.

Speaking of landing, it looks like most of this series is good to land. I vote for landing what you have ready and splitting the refactoring into another bug.
Attachment #8767351 - Flags: review?(gps) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #47)
> (In reply to Gregory Szorc [:gps] from comment #40)
> > Regarding chunking, we need to be moving away from the idea that separate
> > chunks are separate jobs. Instead, we need to think of groups of chunks as
> > single "tasks" with N execution units. I say this because continuing to
> > model separate chunks as separate tasks leaves us in a very bad place with
> > having to manage thousands of discrete options, whether they be task
> > definitions or task results (could you imagine Treeherder displaying 1000
> > letters if Mochitest is chunked into 1000 tasks!). Chunks are simply subsets
> > or slices of a single logical task. The fact that chunking occurs and how it
> > occurs are implementation details and should be abstracted away from the
> > user.
> 
> From a TaskCluster perspective, a task only runs on one system at a time,
> and there's no notion of an "execution unit".  If we renamed TC tasks to
> "execution units", then we'd have to call it ExecutionUnitCluster, and
> that's just not as catchy.  A *job*, on the other hand, might someday be
> redefined to encompass multiple tasks, at least as far as display in
> treeherder is concerned.  Even with that redefinition, we'll still need to
> generate the constituent tasks individually.  At 1000 chunks, we might
> consider some kind of parameterized queue.createMultipleTasks(..) that can
> be given a task and the range [1,1000] and insert 1000 tasks into the queue
> with one API call.  But we can deal with that when the time comes.
> 
> At any rate, for the moment my refactor is designed to maintain practically
> identical task output, and that means the current process of generating
> chunks with each having a different TH symbol.

I agree with this. I think Treeherder is the thing that should coalesce multiple execution units into a single logical unit.

I'm just severely opposed to the idea of having multiple task/job files for each chunk. Chunks should be parameters.

I think we're on the same page, so nothing to worry about :)
https://reviewboard.mozilla.org/r/61874/#review59252

I'm not moving anything anywhere in this patch.  And you're right, this one does the opposite of what it says (and has now been removed), which is why I cleared the r? :)
https://reviewboard.mozilla.org/r/61880/#review59254

Thanks!  It's used in the last commit -- or at least, used in the next review push I make.

Regarding landing -- yes, I will start to land the r+'d patches.  We'll see how mozreview deals with that!!
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #69)
> I want to bring up a requirement that I hope we can have either as part of
> this or in a follow up.
> I would like us to be able to have some test jobs to need to be declared
> explicitely on the try syntax.
> This is for jobs that don't need to run by default.
> This bug might not be the place to bring it up as the main focus is
> redefining the current data rather than scheduling manipulation but just in
> case! (as I see as another type of transformation).

Yes, that's bug 1272159 and it will be quite a bit easier to implement after this one is finished.  Not that I'm promising to implement it :)
Blocks: 1244176
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/1-2/
Attachment #8767313 - Attachment description: Bug 1281004: run mochitest/devtools-chrome e10s for debug, too; → Bug 1281004: set MOZ_NODE_PATH the same on android and non-;
Attachment #8767352 - Attachment description: Bug 1281004: factor out searching for python objects by path (NEEDS TESTS) → Bug 1281004: factor out searching for python objects by path;
Attachment #8767313 - Flags: review?(gbrown)
Attachment #8767313 - Flags: review-
Attachment #8767352 - Flags: review?(gps)
Attachment #8767353 - Flags: feedback?(jopsen)
Attachment #8767353 - Flags: feedback?(bugspam.Callek)
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/1-2/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/1-2/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/1-2/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/1-2/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/1-2/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/1-2/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/1-2/
Comment on attachment 8767349 [details]
Bug 1281004: Temporary scripts to analyze and compare task graphs

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61876/diff/1-2/
Comment on attachment 8767350 [details]
Bug 1281004: disable legacy test generation

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61878/diff/1-2/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/1-2/
Comment on attachment 8767352 [details]
Bug 1281004: factor out searching for python objects by path;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61882/diff/1-2/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/1-2/
Attachment #8767314 - Attachment is obsolete: true
Attachment #8767347 - Attachment is obsolete: true
Attachment #8767348 - Attachment is obsolete: true
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Not looking for review on this one yet -- it still needs tests and docs, and has a number of TODO's left.

Overall, the big changes here, aside from responding to the reviews above, are:
 * break android tests out to their own kind 
 * segment out the transformations nicely
   * generic to any task (make_task.py)
   * specific to test kinds tests/*.py
   * specific kinds (tests/{desktop,android}_test.py
 * use voluptuous schemas to document and enforce data formats

A lot of this is forward-looking: to supporting buildbot-bridge and taskcluster worker, to supporting builds using a similar approach.

Per discussion with :gerard-majax, I've left mulet out.  Given recent changes to re-introduce it to mozilla-central, I think we can work *after* landing this patch to get those working using this new system.

The first few patches here are already r+'d, so as :gps recommended I'll start landing those in the morning after running them through a try push.
Attachment #8767353 - Flags: review?(gps)
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

https://reviewboard.mozilla.org/r/61848/#review59360

We don't define MOZ_NODE_PATH for Android on buildbot.  Defining MOZ_NODE_PATH starts additional servers during xpcshell tests and runs additional tests -- but those tests fail on Android.  You could define MOZ_NODE_PATH and skip those tests on Android, but it seems better to leave it the way it is (less overhead, not running servers that are not used).
Attachment #8767313 - Flags: review?(gbrown) → review-
https://reviewboard.mozilla.org/r/61876/#review59374

As I said before, a means to analyze/compare graph changes would be very useful. I've hit that a few times already, so if this can live on in some way after-this-bug I'd appreciate it. (Just look at how many times we've used the "builder diff" stuff and "dump master" stuff with buildbot, and those are even harder to get things setup)
Comment on attachment 8767352 [details]
Bug 1281004: factor out searching for python objects by path;

https://reviewboard.mozilla.org/r/61882/#review59376

This looks like a good refactor to me, (though I wouldn't land this in-between the patch for disabling legacy tests and the test kinds)
Attachment #8767352 - Flags: review+
https://reviewboard.mozilla.org/r/61878/#review59378

I'd either make sure this is attached to (same patch) the actual desktop-test and android-test kind addition or at least lands with the two of them being next to each other in DAG (so that bisect attempts have less to barf on).

I can even see landing the entire new kinds with a short-circut early "return None" kind of thing, which gets reverted/changed with this patch (so if the main thing lands, and has an issue needing backout, we can leave the larger piece of code inplace.
https://reviewboard.mozilla.org/r/61884/#review59382

Whew! -- This was hefty.

I mostly skimmed, especially the passed values to tests, test names, etc. (that is all presumably validated with your diffing)

Overall I like this approach, but I still do have the worry that it will quickly grow out of control unless we implement some debugging at some point. (e.g. with --debug a WARN that a transform didn't change a task, or an arg to print before/after transform full-task-graph output and diff it, etc.)

Additionally I don't see anywhere in this any comprehension on how we'll limit things to "only run this on branch X or try" etc (but I admit I may have missed it) -- That is something we'll need as we migrate and bring up more things, and though I know its part of the target phase, knowing how to identify/specify/etc those details will be important.

Keep up the good work!

::: taskcluster/ci/android-test/kind.yml:16
(Diff revision 2)
> +
> +# descriptions of the tests to run
> +tests-config: tests.yml
> +
> +# Python paths to transforms to apply to the tests in test-config,
> +# in order.  Lists of transsforms are flattened.

Nit: typo in comment: 'transsforms'

::: taskcluster/ci/android-test/test-platforms.yml:12
(Diff revision 2)
> +# Each test platform further specifies the set of tests that will be scheduled
> +# for the platform, referring to tests defined in test-sets.yml.
> +#
> +# Note that set does not depend on the tree; tree-dependent job selection
> +# should be performed in the target task selection phase of task-grpah
> +# generation.

Comment could be improved with a link to docs on 'how' and some note on if you want 'all' tests defined here and filtered out in target or if you do the opposite (add them in 'target')

I *think* I know the answer here, but a year from now I may forget, and new people to this code may not even know.

::: taskcluster/ci/android-test/tests.yml:13
(Diff revision 2)
> +# substantially from desktop.
> +
> +# Note that these are in lexical order
> +
> +# use YAML anchors and aliases to keep the test definitions a little shorter
> +_mozharness-config-anchor: &mozharness-config

I'm personally not familiar with "YAML anchors and aliases" and I'm not convinced that its a good idea here (more complexity amid different ways to do stuff).

I agree that "a little shorter" is probably useful though, so I admittedly don't have an alternative suggestion, other than maybe a doc link to what they are ;-)

::: taskcluster/ci/android-test/tests.yml:21
(Diff revision 2)
> +        - mozharness/configs/android/androidarm_4_3-tc.py
> +
> +cppunit:
> +    description: "CPP Unit Tests"
> +    suite: cppunittest
> +    aliases: [TODO]

I'd say that having a stanza at the top for what "aliases" is/will-be would be useful, even though its TODO throughout the file.

::: taskcluster/ci/android-test/tests.yml:26
(Diff revision 2)
> +    aliases: [TODO]
> +    treeherder-symbol: tc(Cpp)
> +    e10s: false
> +    loopback-video: true
> +    run:
> +        implementation: mozharness

fwiw, I'd love docs on implementation-options, and maybe we do it differently:

run:
    mozharness:
        script: ...
        ...

And:
run:
    bash:
        - arg0
        - arg1
        - ...

etc.

(I do see the code only uses mozharness, and you have a TODO below about that so it can be followup fodder.)

::: taskcluster/docs/transforms.rst:23
(Diff revision 2)
> +to look like TaskCluster task definitions, with a payload appropriate to the
> +worker type.
> +
> +Other Stuff To Document
> +
> +* YAML anchors/aliases

Ahhh I see you mention YAML anchors here, though admittedly I still stand by my "the code should have a docs pointer"

::: taskcluster/taskgraph/transforms/make_task.py:77
(Diff revision 2)
> +        # treeherder.machine.platform and treeherder.collection or
> +        # treeherder.labels
> +        'platform': basestring,
> +
> +        # treeherder environments (defaults to both staging and production)
> +        Optional('environments'): [basestring],

Optional('environments'): ['staging', 'production']
?

(I'm not aware of any other valid values)

::: taskcluster/taskgraph/transforms/tests/android_test.py:40
(Diff revision 2)
> +    #
> +    # Within the mozharness scripts, there is a translation *back* to
> +    # --this-chunk/--total-chunk.
> +    #
> +    # TODO: remove the need for this with some changes to the mozharness script
> +    # to take --total-chunk/this-chunk

nit: (mentioned here only, but applicable everywhere)

I would prefer pythonic docstring documentation of transforms, rather than # style comments.

::: taskcluster/taskgraph/transforms/tests/desktop_test.py:25
(Diff revision 2)
> +
> +@transforms.add
> +def set_defaults(config, tests):
> +    for test in tests:
> +        # all Android test tasks download internal objects from tooltool
> +        test['run']['build-artifact-name'] = 'public/build/target.tar.bz2'

Obsolete comment? (Android test tasks...)

::: taskcluster/taskgraph/transforms/tests/make_task_description.py:260
(Diff revision 2)
> +            command.append('--this-chunk={}'.format(test['this-chunk']))
> +        elif mozharness['chunking-args'] == 'test-suite-suffix':
> +            suffix = mozharness['chunk-suffix'].replace('<CHUNK>', str(test['this-chunk']))
> +            for i, c in enumerate(command):
> +                if isinstance(c, basestring) and c.startswith('--test-suite'):
> +                    command[i] += suffix

A lot of this function (especially the chunking stuff) feels like it should really be part of its own transform somehow, especially in its own section (like the android transforms).

I don't know a great way to split this up in that way though, offhand. Without doing some sort of overly complex way to chain transforms together...

(I hate having to rely on magic ordering of execution here)

::: taskcluster/taskgraph/transforms/tests/make_task_description.py:274
(Diff revision 2)
> +def buildbot_bridge_setup(config, test, task):
> +    # N.B. This is un-tested!
> +    task['worker'] = copy.deepcopy(test['worker'])
> +    ss = task['worker'].setdefault('sourcestamp', {})
> +    ss['branch'] = config.params['project']
> +    ss['revision'] = config.params['head_rev']

Is there anyway we can improve testing of this before it lands? (noteworthy, nothing in legacy, per dxr, is using buildbot-bridge)

::: taskcluster/taskgraph/transforms/tests/make_task_description.py:280
(Diff revision 2)
> +
> +
> +@transforms.add
> +def make_task_description(config, tests):
> +    for test in tests:
> +        mozharness = test['mozharness']

Seems to assume that everything will be run through mozharness, thats not true at least in releasetasks, nor will it be true in the future (like signingworker).

Probably not worth magic complexity now, as long as we're aware of and thinking of that future.

::: taskcluster/taskgraph/transforms/tests/test.py:58
(Diff revision 2)
> +    # used to select this test in try syntax
> +    'aliases': [basestring],
> +
> +    # the symbol, or group(symbol), under which this task should appear in
> +    # treeherder.
> +    'treeherder-symbol': basestring,

For this I'd suggest using a validation function for foo(bar) or just 'bar' syntax.

https://pypi.python.org/pypi/voluptuous/#validation-functions

Theres lots of other things you may want to do with voluptuous here, like:
    'suite': All(basestring, Length(min=1))

::: taskcluster/taskgraph/transforms/tests/test.py:68
(Diff revision 2)
> +
> +    # Whether to run this task with e10s.  If false, run without e10s; if true, run
> +    # with e10s; if 'both', run one task with and one task without e10s.  E10s
> +    # tasks have "-e10s" appended to the test name and treeherder group.
> +    Required('e10s', default='both'): Any(
> +        bool, 'both',

At your complexity discretion, we can allow a case-insensitive 'both' here with:

Any(
    bool,
    All(str, Coerce(lambda x: x.lower()), 'both'),
    {....},
)

::: taskcluster/taskgraph/transforms/tests/test_description.py:36
(Diff revision 2)
> +
> +
> +@transforms.add
> +def validate(config, tests):
> +    for test in tests:
> +        yield validate_schema(test_schema, test, "In test {!r}:".format(test['test-name']))

I'm assuming the intent is to validate more than once (initially, then {android,desktop} transforms, then validate again... which makes sense to me.

::: taskcluster/taskgraph/transforms/tests/test_description.py:49
(Diff revision 2)
> +def set_tier(config, tests):
> +    for test in tests:
> +        # only override if not set for the test
> +        if 'tier' not in test:
> +            if test['test-platform'] == 'linux64/debug':
> +                test['tier'] = 1

Things like this could use, (imo) some sort of debug flag so that local runs can know "what is going on"

something like log.debug("Tier not set, platform is linux64/debug marking as tier 1")  and log.debug("Tier not set, platform is not linux64/debug marking as tier 2")

(Applicable elsewhere) [followup fodder]

::: taskcluster/taskgraph/transforms/tests/test_description.py:63
(Diff revision 2)
> +        if 'expires-after' not in test:
> +            # try jobs should expire after 1 weeks; everything else is one day
> +            if config.params['project'] == 'try':
> +                test['expires-after'] = "14 days"
> +            else:
> +                test['expires-after'] = "1 year"

comment does not match code:

"1 year" !== "one day", and "14 days" !== "1 weeks" -- I'm not sure which is intended (probably the code)

::: taskcluster/taskgraph/transforms/tests/test_description.py:132
(Diff revision 2)
> +        testdesc['expires-after'] = test['expires-after']
> +        testdesc['chunks'] = test['chunks']
> +        testdesc['this-chunk'] = test['this-chunk']
> +
> +        # TODO: platforms are a mess; for now, be compatible with what we have
> +        # been putting out, despite that differing from the inputs

I feel like this "th_machine_platform" munging should be its own transform

::: taskcluster/taskgraph/transforms/tests/test_description.py:176
(Diff revision 2)
> +        mozharness['build-label'] = test['build-label']
> +        try:
> +            mozharness['build-artifact-name'] = test['run']['build-artifact-name']
> +        except KeyError:
> +            import logging
> +            logging.getLogger().error(str(test['run']))

Need better error message here

::: taskcluster/taskgraph/transforms/tests/test_description.py:201
(Diff revision 2)
> +
> +        worker['implementation'] = 'docker-worker'
> +        worker['allow-ptrace'] = True  # crash-reporter needs this everywhere
> +        worker['loopback-audio'] = test['loopback-audio']
> +        worker['loopback-video'] = test['loopback-video']
> +        worker['max-run-time'] = get_keyed_by(test, 'max-run-time', test['test-name'])

Looking at get_keyed_by in use, I'd prefer its args passed by keyword rather than argc order (would make the code easier to read for me) -- absolutely non-blocking though

::: taskcluster/taskgraph/util/treeherder.py:21
(Diff revision 2)
> +    return groupSymbol, symbol
> +
> +
> +def join_symbol(group, symbol):
> +    """Perform the reverse of split_symbol, combining the given group and
> +    symbol.  If the grou is '?', then it is omitted."""

Typo: "If the grou is"
Thanks -- good points, which I'll address.  What to run on which branch *does* need to be addressed here (since legacy did so), and hasn't been yet.  It'll probably be pretty easy to support `try-by-default: false` in that case, too :)

The debugging thing: I agree it's important, but I'm not sure how to support it.  We're running 700+ tasks through this transform already, and probably many thousands by the time everything is in here.  So even logging just one line for a single transform will give you thousands of lines of logging output.  Filtering logging down to a single test/task is a little hard since they change names and multiply throughout the transformation process.  I'm open to ideas..
(In reply to Dustin J. Mitchell [:dustin] from comment #99)
> Thanks -- good points, which I'll address.  What to run on which branch
> *does* need to be addressed here (since legacy did so), and hasn't been yet.
> It'll probably be pretty easy to support `try-by-default: false` in that
> case, too :)

I look forward to it! try-by-default is not my only use-case here though, tbh, (things like project-branches which may want to iterate on greening up tests and still regularly land/merge with central)

> The debugging thing: I agree it's important, but I'm not sure how to support
> it.  We're running 700+ tasks through this transform already, and probably
> many thousands by the time everything is in here.  So even logging just one
> line for a single transform will give you thousands of lines of logging
> output.  Filtering logging down to a single test/task is a little hard since
> they change names and multiply throughout the transformation process.  I'm
> open to ideas..

Roughly speaking, my ideas:

`--debug --debug` Which would let transform specific things log as they do their work (or just --debug) and that logging won't be by-default for sure.  And yes, it *will* be a lot of output.

Alternatively `--debug-transform='<name>'` which if name matches (exactly, not substring) we'll get debug logging when it changes something somehow. Its mostly a mental thing here, but certainly a hard balance between too-much-output-to-have-anything-useful, and I-need-more-output.
Hm, how about a decorator:

@transform.add
@debug_transform({'test-name': 'mochitest-plain'})
def my_transform(config, tests):
   ..

where the argument to the decorator is a filter for the input objects that it should log.  Just about anyone wanting to debug a transform is probably hacking on it, so adding a decorator to the definition like this wouldn't be any trouble, is nice and clean, and would beat command-line args.
(In reply to Dustin J. Mitchell [:dustin] from comment #101)
> Hm, how about a decorator:
> 
> @transform.add
> @debug_transform({'test-name': 'mochitest-plain'})
> def my_transform(config, tests):
>    ..
> 
> where the argument to the decorator is a filter for the input objects that
> it should log.  Just about anyone wanting to debug a transform is probably
> hacking on it, so adding a decorator to the definition like this wouldn't be
> any trouble, is nice and clean, and would beat command-line args.

That would probably work,

My own personal use-case is in terms of cleanup I have wanted to do with buildbot configs:

* Is this transform (loop) still useful, does it still even do anything!?
* If I modify this transform in any way, does it break/adjust unexpected things, even if the output is fully valid.
* I am adding a new task/test and want/need to see what transforms are being applied so I can make sure the end result is actually how I intend.
* I think I need to write a new transform, and want to make sure that specific transform is doing what I want

The last option is the only one that I envision being easily solved by this proposal though. I'd also think that making specific debugging easier a followup (as long as the main use-case of a diff of sorts happens) is completely ok.
https://reviewboard.mozilla.org/r/61884/#review59462

I like this approach! I feel like most of the important knobs are in YAML and approachable by newcomers while the low-level/power functionality is in Python and available to advanced hackers. Feels like a good compromise.

I identified some nits along the way. I like what I see!

::: taskcluster/taskgraph/kind/test.py:43
(Diff revision 1)
> +        self.group_names = self.load_yaml(path, config, 'group-names-config')
> +
> +    def load_yaml(self, path, config, name):
> +        """Load a named YAML config file from the kind directory"""
> +        filename = os.path.join(path, config[name])
> +        with open(filename) as f:

Nit: always pass the 2nd argument to open(). It defaults to 'r' which opens files in text mode and can normalize line endings. You almost always want 'b' in the open mode to preserve the original bits. So use 'rb' here.

::: taskcluster/taskgraph/kind/test.py:47
(Diff revision 1)
> +        filename = os.path.join(path, config[name])
> +        with open(filename) as f:
> +            return yaml.load(f)
> +
> +
> +class TestTask(base.Task):

docstring please!

::: taskcluster/taskgraph/kind/test.py:55
(Diff revision 1)
> +    @classmethod
> +    def load_tasks(cls, kind, path, config, params, loaded_tasks):

This isn't Java: not everything needs to be in a class. I feel like some of these @classmethod could be module-level functions. Or am I missing inheritance that makes it necessary for these to be attached to a class?
Comment on attachment 8767352 [details]
Bug 1281004: factor out searching for python objects by path;

https://reviewboard.mozilla.org/r/61882/#review59468

::: taskcluster/taskgraph/util/python_path.py:18
(Diff revision 2)
> +        def find_object(modulepath, objectpath):
> +            import <modulepath> as mod
> +            return mod.<objectpath>
> +    """
> +    if path.count(':') != 1:
> +        raise TypeError(

This should be ValueError.

TypeError is when the Python type doesn't match. ValueError is when the value of the expected type isn't good.

But this is an existing issue. Fix it in flight or drop it.
Attachment #8767352 - Flags: review?(gps) → review+
https://reviewboard.mozilla.org/r/61884/#review59382

> I'd say that having a stanza at the top for what "aliases" is/will-be would be useful, even though its TODO throughout the file.

There's a stanza in the schema:

    # other names, in addition to the name of the stanza, that can be
    # used to select this test in try syntax
    'aliases': [basestring],

In general, I think that making all of this self-documenting, without writing all comments everywhere (there's already a lot of duplicated comments!) is going to be difficult.

> A lot of this function (especially the chunking stuff) feels like it should really be part of its own transform somehow, especially in its own section (like the android transforms).
> 
> I don't know a great way to split this up in that way though, offhand. Without doing some sort of overly complex way to chain transforms together...
> 
> (I hate having to rely on magic ordering of execution here)

I had the chunking stuff split off for a while, but given the number of parameters I needed to pass, it didn't turn out to be more readable.

I assume by "magic ordering", you're meaning the ordering of the different transform files?

> Is there anyway we can improve testing of this before it lands? (noteworthy, nothing in legacy, per dxr, is using buildbot-bridge)

It's here more as an example and an enticement :)

> Seems to assume that everything will be run through mozharness, thats not true at least in releasetasks, nor will it be true in the future (like signingworker).
> 
> Probably not worth magic complexity now, as long as we're aware of and thinking of that future.

Right, but releasetasks and signingworker aren't tests, so that's OK :)

> For this I'd suggest using a validation function for foo(bar) or just 'bar' syntax.
> 
> https://pypi.python.org/pypi/voluptuous/#validation-functions
> 
> Theres lots of other things you may want to do with voluptuous here, like:
>     'suite': All(basestring, Length(min=1))

I don't want to get too complicated with voluptuous -- it's mainly here to document the keys present in dictionaries, and double-check that they are provided.  Doing much more would reduce the readability of the schema.

> I feel like this "th_machine_platform" munging should be its own transform

Hopefully it will just go away when we straighten out platforms.
https://reviewboard.mozilla.org/r/61884/#review59462

> This isn't Java: not everything needs to be in a class. I feel like some of these @classmethod could be module-level functions. Or am I missing inheritance that makes it necessary for these to be attached to a class?

Ouch, that Java comment smarts ;)

No, there's no technical or inheritance-related reason that these need to be class methods.  I like that it emphasizes "load objects of this class", but that's about the only justification I have.  That said, I just did a pretty substantial refactor of this (from having kind classes, which were basically task metaclasses, so an improvement) in bug 1280231, so I think I'll leave this as-is for now.  It should be a fairly straightforward change to make later, if it turns out to be bothersome.
I'm sorry, but I had to back out the landing from comment 105 because mozilla-central didn't cleanly merge into mozilla-inbound. These are the new tree rules (autoland repo takes merge precedence over inbound). I think the rebase will be simple since one side of the merge involved a file delete. I didn't follow this bug closely enough to know whether changes to a deleted file need to be made elsewhere, however.

https://hg.mozilla.org/integration/mozilla-inbound/rev/388227afc2e7
Blocks: 1285269
The latest work is getting close to ready for review and landing.  It's at

  https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/bug1281004@8

I have encountered a design quandry, though.  I need to support a few test tasks not being "green" (bug 1242986), and I want to do so in a general way that will support lots of use-cases.  My understanding is:

 1. in cases like this we have non-green tests in-tree that should not run on integration branches, m-c, release, or try
 2. we have cases for things not running on try by default (bug 1244176)
 3. in most bug conversations I've read, I see people saying "this should run on trees X and Y but not A and B"

The information I lack is:

 * Are 1 and 2 the same case?  That is, if I implement a "greening-up" property for tests, and set that to false for things that aren't greened up yet, would that completely cover the desire for things to not run on try, or would there be exceptions either way (things that aren't greened up but should run on try, and things that are greened up and running on m-c but shouldn't run on try)?  This might look like

  mochitest-cookies:
      e10s: both
      greening-up: true
      mozharness:
          script: scripts/edibles/cookies.py

 * Is #3 a hint that I should try to implement this as a per-branch blacklist instead?  Something like {
    'mozilla-central': [
        exclude_linux64_debug_e10s_m_dt,
    ],
    'try': [
        exclude_this_thing_that_shouldnt_be_on_try,
        exclude_linux64_debug_e10s_m_dt,
    ],
}

Or, to ask at a different level, in what form do 80% of the requests regarding which-job-on-which-trees come?
I originally added buildbot's "not on try by default" mechanism, though I don't know everything it was used for. Thinking back on its various uses, I think the majority were for #1. Though they weren't always non-green *tests*; with the orchestration between 3 different repos required to add things to buildbot, my workflow tended to be to crash-land a broken job but have it off by default, then hack on it to get it actually working without needing further buildbot changes. It was just a normal part of the development workflow. That is no longer necessary, thankfully.

But I also had some jobs that fit under #2 but not #1.

Sometimes it was a resource thing -- an expensive task that I want to run periodically, but not on every push, so I put it up as a nondefault try job and run it manually as needed. Honestly, more flexible scheduling would probably be more appropriate for these than the big hammer of having them off by default. (Also, these are tasks that I wanted to be available to other people, though I confess I'm not sure if anyone else ever pushed them.)

Sometimes it was really an on-demand custom build. It enabled some non-default build option, and was there for people to try out against the current code base at any time. Given that it's straightforward to create new in-tree tasks now, I don't think there's much need for an explicit mechanism anymore.

That's all I'm remembering, though it feels like there were others. Still, apart from better control over scheduling for expensive jobs, I don't think I personally have any remaining need for the #2-but-not-#1 category. So place one anecdote in that bucket.

A specific example: the hazard builds generate some huge intermediate files that are useful in their own right. (eg, I have an IRC bot that uses one of them.) It seemed like it would require too much disk space to archive every push's output, when I only really needed one per day or so. When I moved the jobs to TC, it was straightforward to parse the top commit message for a flag requesting the full output to be uploaded. I now do try pushes with that flag whenever I need them. And as for that one specific output file, I just stopped worrying and now it's uploaded for every hazard build. Which is kind of gross, but I compress it and it's small compared to eg a full debug build with symbols.
the main reason things are only run on certain branches (i.e. ash/try) is that they are in development and need work/time to get green.  Sometimes this is a temporary thing, sometimes it is longer term.  In a few cases we only have stuff on try as these are useful for more information (lets say code coverage).

echo'ing sfink, why we run stuff not by default on try is primarily due to lack of resources (physical machines and backlog), this will always be the case as long as we use physical machines.

I do not think #1 and #2 are the same things.

as for #3, per branch exceptions seem right.  For example, a new job on nightly will ride the trains to aurora/beta/release, or jobs on ash are in the greening up state.  I believe requests here are people working on projects and they add the jobs (like RyanVM  and Ash)
Thanks!  So given the above, the way I'm breaking this down is as follows:

 * For greening up, configuration of the test suite should get merged around at the same time as the green suite.  So if you want to green up in try, you can just keep a cset configuring the test suite in your patch queue underneath the test-greening work, until things are ready to merge.  If you are working on a twig like ash, you can land that same test-suite configuration patch on the twig, and avoid merging it to an integration branch until it's partnered with green tests.  As Steve hinted, the distinction between *enabling* a test and the test going green was a result of the way Buildbot operated, and is no longer necessary.

 * For disabling things by default in try (bug 1244176), we can introduce a new test property (try-by-default, with default value true) that can be reflected into a task attribute and used to implement "-t all" and "-u all" in try_option_syntax.py.  I'm going to leave that for bug 1244176.

 * Try-only jobs can be implemented by adding a bit of filtering to the all_builds_and_tests target task method -- either filtering out specific jobs by pattern there, or introducing a new attribute (maybe "try-only"?) and filtering just by that attribute.  According to my testing, nothing in the task-graph works this way right now, so I won't implement it in this bug.

 * Per-branch requests, like greening-up, are best accomplished by controlling propagation of the patch that enables a test.  For example, there's no need for code to check project=="ash" to enable tc-M-e10s(dt) on linux64/debug -- just enable it unconditionally in that branch, and don't merge the enabling cset until it's ready.  If there are policy-based exceptions to this (for example, maybe there are tests which should never be run on aurora/beta/release, regardless of trains), we can of course represent them in clear, well-described code.  I just want to avoid a proliferation of a-la-carte branch selection functions.

All of this is easier than I feared, so I think this patch set is about done.  Just some minor tweaks and testing in try remain.
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/2-3/
Attachment #8767313 - Attachment description: Bug 1281004: set MOZ_NODE_PATH the same on android and non-; → Bug 1281004: only set MOZ_NODE_PATH for desktop;
Attachment #8767353 - Attachment description: Bug 1281004: Specify test tasks more flexibly → Bug 1281004: Specify test tasks more flexibly;
Attachment #8767313 - Flags: review- → review?(gbrown)
Attachment #8767353 - Flags: review?(gps)
Attachment #8767353 - Flags: review?(gbrown)
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/2-3/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/2-3/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/2-3/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/2-3/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/2-3/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/2-3/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/2-3/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/2-3/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/2-3/
Attachment #8767349 - Attachment is obsolete: true
Attachment #8767350 - Attachment is obsolete: true
Attachment #8767352 - Attachment is obsolete: true
I'm happy to have anyone's feedback or review, of course.  I flagged gps because this is part of the Build Config module, and gbrown as one of the folks likely to be up to his knees in this stuff soon and thus a "concerned party".

This is a good time for line-by-line review comments that I can clean up before landing.  There are a number of other projects that have agreed to wait until this one lands, though, so I may opt to defer any more substantial changes or additions to other bugs.
https://hg.mozilla.org/mozilla-central/rev/2393f903d0a7
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Keywords: leave-open
Resolution: FIXED → ---
(In reply to Dustin J. Mitchell [:dustin] from comment #114)
> Thanks!  So given the above, the way I'm breaking this down is as follows:

To elaborate on a specific use-case I have seen and been operating with as well (with the note that I don't know of any currently identifiable as 'tests' cases of this):

* Tasks that should run on try, in order to validate/test certain types of changes that get run in a periodic fashion of some sort as tier1, rather than on-change (e.g. l10n repacks of nightlies).

I can imagine there being some 'tests' that need that ability as well. 

This is explicitly a "the developer may not know/realize their code can/would break that test" and is guarded by "files:" in legacy.

Since there are no tests of that nature at present in legacy (and there could be an argument made for just making those types of 'tests'/tasks be all-trees anyway) I don't think that should block, just calling it out.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

https://reviewboard.mozilla.org/r/61848/#review60334
Attachment #8767313 - Flags: review?(gbrown) → review+
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

https://reviewboard.mozilla.org/r/61884/#review60380

Looking good! This seems to have all the flexibility we need to handle the various test eccentricities I can think of, and also feels reasonably intuitive/comprehensible.

::: taskcluster/ci/android-test/test-platforms.yml:3
(Diff revision 3)
> +# This file maps build platforms to test platforms.  In some cases, a
> +# single build may be tested on multiple test platforms, but a single test
> +# platform can only link to one build platform.  Both bulid and test platforms

s/bulid/build/

::: taskcluster/ci/android-test/test-platforms.yml:11
(Diff revision 3)
> +#
> +# Each test platform further specifies the set of tests that will be scheduled
> +# for the platform, referring to tests defined in test-sets.yml.
> +#
> +# Note that set does not depend on the tree; tree-dependent job selection
> +# should be performed in the target task selection phase of task-grpah

s/grpah/graph/

::: taskcluster/ci/android-test/test-sets.yml:5
(Diff revision 3)
> +# Each key in this file specifies a set of tests to run.  Different test sets
> +# may, for example, be bound to different test platforms.
> +#
> +# Note that set does not depend on the tree; tree-dependent job selection
> +# should be performed in the target task selection phase of task-grpah

s/grpah/graph/

::: taskcluster/ci/desktop-test/test-platforms.yml:3
(Diff revision 3)
> +# This file maps build platforms to test platforms.  In some cases, a
> +# single build may be tested on multiple test platforms, but a single test
> +# platform can only link to one build platform.  Both bulid and test platforms

s/bulid/build/

::: taskcluster/ci/desktop-test/test-platforms.yml:11
(Diff revision 3)
> +#
> +# Each test platform further specifies the set of tests that will be scheduled
> +# for the platform, referring to tests defined in test-sets.yml.
> +#
> +# Note that set does not depend on the tree; tree-dependent job selection
> +# should be performed in the target task selection phase of task-grpah

s/grpah/graph/

::: taskcluster/ci/desktop-test/test-sets.yml:5
(Diff revision 3)
> +# Each key in this file specifies a set of tests to run.  Different test sets
> +# may, for example, be bound to different test platforms.
> +#
> +# Note that set does not depend on the tree; tree-dependent job selection
> +# should be performed in the target task selection phase of task-grpah

s/grpah/graph/

::: taskcluster/taskgraph/target_tasks.py:75
(Diff revision 3)
> +            'linux64-asan',
> +            'linux64-pgo',
> +        ])):
> +            return False
> +        if not attrmatch(attrs, unittest_try_name=set([
> +            'crashtest-e10s',

Could a *-e10s regex be used instead? That might avoid some confusion/iteration the next time a new mochitest subsuite is added.
Attachment #8767353 - Flags: review?(gbrown) → review+
So basically, the code is OK but I suck at typing :)

If ash is just e10s jobs, then yes -- I thought there were a few missing but maybe I missed them.  I'll make that change.
Summary: Factor legacy kind into real kinds → Factor tests in legacy kind into real kinds
Blocks: 1286075
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/3-4/
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/3-4/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/3-4/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/3-4/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/3-4/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/3-4/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/3-4/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/3-4/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/3-4/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/3-4/
https://reviewboard.mozilla.org/r/61884/#review60650

::: taskcluster/ci/android-test/kind.yml:12
(Diff revisions 2 - 4)
> -   - taskgraph.transforms.tests.test:transforms
> +   - taskgraph.transforms.tests.test_description:validate
>     - taskgraph.transforms.tests.android_test:transforms
> -   - taskgraph.transforms.tests.test_description:transforms
> +   - taskgraph.transforms.tests.all_kinds:transforms
> +   - taskgraph.transforms.tests.test_description:validate
>     - taskgraph.transforms.tests.make_task_description:transforms
>     - taskgraph.transforms.make_task:transforms

I have a late-breaking mental concern about naming a directory tests/ in a python project. (could confuse some things, and such) but I don't think this should block.

I also can't think of a sane way to rename this atm.

::: taskcluster/ci/android-test/tests.yml
(Diff revisions 2 - 4)
>  
> -# use YAML anchors and aliases to keep the test definitions a little shorter
> -_mozharness-config-anchor: &mozharness-config
> -        - mozharness/configs/android/androidarm_4_3.py
> -        - mozharness/configs/remove_executables.py
> -        - mozharness/configs/android/androidarm_4_3-tc.py

I do like that you've decided not to go with YAML anchors here, probably much more understandable to the average person

::: taskcluster/ci/desktop-test/tests.yml:227
(Diff revisions 2 - 4)
>      max-run-time: 5400
>      chunks: 10
>      e10s:
> -        # Bug 1242986: debug dt e10s isn't greened up yet
>          by-test-platform:
> +            # Bug 1242986: linux64/debug mochitest-devtools-chrome e10s is not greened up yet

Thanks for the bug link :-)

::: taskcluster/docs/transforms.rst:126
(Diff revisions 2 - 4)
> + * The ``taskgraph.transforms.tests.make_task_description:transforms`` then
> +   take the test description and create a *task* description.  This transform
> +   embodies the specifics of how test runs work: invoking mozharness, various
> +   worker options, and so on.
> +
> + * Finally, the ``taskgraph.transforms.make_task:transforms``, described above,

where is "make_task:transforms" described above? (I'm not saying it needs describing, more that its referenced and I don't see where/how)

::: taskcluster/docs/transforms.rst:128
(Diff revisions 2 - 4)
> +   embodies the specifics of how test runs work: invoking mozharness, various
> +   worker options, and so on.
> +
> + * Finally, the ``taskgraph.transforms.make_task:transforms``, described above,
> +   are applied.
>  

Should we have a *task* schema validation at the end?

::: taskcluster/taskgraph/transforms/make_task.py:165
(Diff revisions 2 - 4)
> +    'tc-W': 'Web platform tests executed by TaskCluster',
> +    'tc-W-e10s': 'Web platform tests executed by TaskCluster with e10s',
> +    'tc-X': 'Xpcshell tests executed by TaskCluster',
> +    'tc-X-e10s': 'Xpcshell tests executed by TaskCluster with e10s',
> +}
> +UNKNOWN_GROUP_NAME = "Treeherder group {} has no named; add it to " + __file__

"has no name;"

::: taskcluster/taskgraph/transforms/tests/all_kinds.py:73
(Diff revision 4)
> +        yield test
> +
> +
> +@transforms.add
> +def resolve_keyed_by(config, tests):
> +    """Resolve fields that can be keyed by platform, etc."""

GREAT idea here (rather than all the lower transforms needing to `get_keyed_by`

::: taskcluster/taskgraph/util/attributes.py:12
(Diff revision 4)
> +    """Determine whether the given set of task attributes matches.  The
> +    conditions are given as keyword arguments, where each keyword names an
> +    attribute.  The keyword value can be a literal, a set, or a callable.  A
> +    literal must match the attribute exactly.  Given a set, the attribute value
> +    must be in the set.  A callable is called with the attribute value.  If an
> +    attribute is specified as a keyword argument bug not present in the

argument *but* not present
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

https://reviewboard.mozilla.org/r/63584/#review60662

bitrot all the things!
Attachment #8769925 - Flags: review?(bugspam.Callek) → review+
Attachment #8767353 - Flags: review?(gps) → review+
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

https://reviewboard.mozilla.org/r/61884/#review61074

This is amazing.

I loved the commit message and the docs. It almost made me feel like I didn't need to look at the code.

Like any large change, I'm sure there are subtle bugs lingering. But given the scope and impact of this change, I think you should land (to the autoland repo) as soon as possible. Fallout can be addressed as follow-ups.

Great work on this refactor.
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/4-5/
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/4-5/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/4-5/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/4-5/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/4-5/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/4-5/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/4-5/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/4-5/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/4-5/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/4-5/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/1-2/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/5-6/
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64126/diff/1-2/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/2-3/
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

https://reviewboard.mozilla.org/r/64126/#review61292

Stealing from jordan
Attachment #8770777 - Flags: review+
Attachment #8770777 - Flags: review?(jlund)
(In reply to Justin Wood (:Callek) from comment #163)
> Comment on attachment 8770777 [details]
> Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);
> 
> https://reviewboard.mozilla.org/r/64126/#review61292
> 
> Stealing from jordan

thief!

and of course I caused the first post-impl transform just to make things more complicated..
Blocks: 1287018
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/5-6/
Attachment #8770777 - Flags: review+ → review?(jlund)
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/5-6/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/5-6/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/5-6/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/5-6/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/5-6/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/5-6/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/5-6/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/5-6/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/6-7/
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64126/diff/2-3/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/3-4/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/7-8/
Attachment #8770777 - Flags: review?(jlund)
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64126/diff/3-4/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/4-5/
Sorry for the mozreview spam :)

This still tests out as producing the same jobs before and after.  I'm just running a try push to verify, then I will land it.
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/6-7/
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/6-7/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/6-7/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/6-7/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/6-7/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/6-7/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/6-7/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/6-7/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/6-7/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/8-9/
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64126/diff/4-5/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/5-6/
Comment on attachment 8767307 [details]
Bug 1281004: delete mulet test jobs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61836/diff/7-8/
Comment on attachment 8767308 [details]
Bug 1281004: add suite and flavor for web platform tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61838/diff/7-8/
Comment on attachment 8767309 [details]
Bug 1281004: run all web platform tests on desktop-test-xlarge;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61840/diff/7-8/
Comment on attachment 8767310 [details]
Bug 1281004: remove -e10s from reftest TH symbols;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61842/diff/7-8/
Comment on attachment 8767311 [details]
Bug 1281004: run e10s jsreftests in 2 chunks like non-e10s;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61844/diff/7-8/
Comment on attachment 8767312 [details]
Bug 1281004: put the -e10s in the groupSymbol for marionette;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61846/diff/7-8/
Comment on attachment 8767315 [details]
Bug 1281004: fix suite/flavor for android tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61852/diff/7-8/
Comment on attachment 8767351 [details]
Bug 1281004: vendor voluptuous;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61880/diff/7-8/
Comment on attachment 8767313 [details]
Bug 1281004: only set MOZ_NODE_PATH for desktop;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61848/diff/7-8/
Comment on attachment 8767353 [details]
Bug 1281004: Specify test tasks more flexibly;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/61884/diff/9-10/
Comment on attachment 8770777 [details]
Bug 1281004: make fennec debug tier-1 (forward-port of Bug 1282850);

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64126/diff/5-6/
Comment on attachment 8769925 [details]
Bug 1281004: remove now-unused yaml task definitions;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/63584/diff/6-7/
Blocks: 1283879
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/130f0b0bc578
delete mulet test jobs; r=gerard-majax
https://hg.mozilla.org/integration/autoland/rev/728933f251dc
add suite and flavor for web platform tests; r=armenzg
https://hg.mozilla.org/integration/autoland/rev/a59e388f9958
run all web platform tests on desktop-test-xlarge; r=jmaher
https://hg.mozilla.org/integration/autoland/rev/fb2fd5d4b1e1
remove -e10s from reftest TH symbols; r=RyanVM
https://hg.mozilla.org/integration/autoland/rev/5b417eff0ecd
run e10s jsreftests in 2 chunks like non-e10s; r=jmaher
https://hg.mozilla.org/integration/autoland/rev/b98d1e449436
put the -e10s in the groupSymbol for marionette; r=RyanVM
https://hg.mozilla.org/integration/autoland/rev/68c0e0b2a5c9
fix suite/flavor for android tests; r=gbrown
https://hg.mozilla.org/integration/autoland/rev/70ddd60d969b
vendor voluptuous; r=gps
https://hg.mozilla.org/integration/autoland/rev/354c13ab13c9
only set MOZ_NODE_PATH for desktop; r=gbrown
https://hg.mozilla.org/integration/autoland/rev/9063b402b859
Specify test tasks more flexibly; r=gps; r=gbrown
https://hg.mozilla.org/integration/autoland/rev/510d03360f23
make fennec debug tier-1 (forward-port of Bug 1282850); r=Callek
https://hg.mozilla.org/integration/autoland/rev/3096768ade50
remove now-unused yaml task definitions; r=Callek
Blocks: 1288053
Status: NEW → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.