Closed Bug 1333255 Opened 6 years ago Closed 6 years ago

create pattern for using a dummy task to index another task with lots of routes

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla55

People

(Reporter: jonasfj, Assigned: dustin)

References

(Blocks 1 open bug)

Details

Attachments

(14 files)

59 bytes, text/x-review-board-request
dustin
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
dustin
: review+
jonasfj
: review+
Details
59 bytes, text/x-review-board-request
jonasfj
: review+
Details
This is two part:
 A) Define a docker image that takes TARGET_TASKID env var, and list of
    namespaces as command arguments.
 B) Write a transform for indexing that creates a dependent indexing task between
    a task and the dependent tasks.

I've already started on (A), and will submit a patch adding the image definition
in-tree.

I'm hoping dustin can help out with the transform.
See Also: → 1333234
Assignee: nobody → jopsen
status:
I have an image for the indexing tasks working.
(I might have to test it an extra time, but so far it seems to be working)

We probably much just need to modify the `add_l10n_index_routes` transform.
I'll take a look at it, next week and talk dustin when I get stuck.
This is part (A), the patch doesn't include the transform part of this bug, so no need to land it yet.
I'm a little unsure of how to do part (B), it involves some refactoring of the `index` transforms.
Assignee: jopsen → nobody
Assignee: nobody → jopsen
Comment on attachment 8833057 [details]
Bug 1333255 - Task for indexing other tasks.

https://reviewboard.mozilla.org/r/109284/#review110672

I like it otherwise :)

::: taskcluster/docker/index-task/README:18
(Diff revision 1)
> +  payload: {
> +    image: '...',
> +    env: {
> +      TARGET_TASKID: '<taskId-to-be-indexed>',
> +    },
> +    command: [

I think this would "read" better as:

  command:
    - index-task.js
    - my-index.namespace.1
    - my-index.namespace.2

This looks like a command with arguments ("oh, so it's running index-task.js"), and also has a more unique name than "index.js" to look for in the source tree.
Attachment #8833057 - Flags: review?(dustin) → review+
Comment on attachment 8833057 [details]
Bug 1333255 - Task for indexing other tasks.

https://reviewboard.mozilla.org/r/109284/#review119652

::: taskcluster/docker/index-task/README:18
(Diff revision 1)
> +  payload: {
> +    image: '...',
> +    env: {
> +      TARGET_TASKID: '<taskId-to-be-indexed>',
> +    },
> +    command: [

It would be something like `['node', 'index-task.js', 'my-inde.namespace.1', ...]`

And IMO we should be doing more specialized images like this.. That have a single specific purpose.

In fact this image is more like a lambda function.
Sure, I'm just arguing from the perspective of inspectability.  I don't want people to spend 10 minutes wondering where this docker image came from, then give up assuming it's TC secret magic and Jonas will need to fix it.

Do you want help with B?
Yes, please help out with B. I got this far:
https://gist.github.com/jonasfj/e0e3d2e27ec31c5a634a87409ea47023

But as you point out, we might have to do it somewhere else.
Assignee: jopsen → dustin
Just running some final verification on my patchset.  In the interim, some commentary on it:

Hey guys, there's a hole in the wall over here, just large enough for a rabbit. Bring me a flashlight! I took the opportunity to do some refactoring that is related to requests you've made months ago: getting rid of all of the different Task subclasses and making the taskgraph much more of a method-free data structure. That took some doing - it's the majority of the changesets here - but I'm quite happy with the results.

The crux of the matter is introducing "morphs", which occur after the optimization phase is complete, and can do things like add misc tasks to handle lots of dependencies or, in this case, lots of routes.  There's a little issue here, though: misc tasks might require a docker image that would otherwise not be built.  To work around that, I always include docker image tasks in the target task graph (but not in the target task set), so they are always available. They are usually optimized, of course, and even when they are not optimized the builds will not delay any other results.

I chose to create a new "gecko-misc" workerType for these misc jobs.  As these become common (more than just for nightlies), we can increase the minCapacity of the workerType to ensure it's always around.

Finally, this patchset includes a `compare.sh` and `graphdiff.py` that I've used to ensure that the resulting full and optimized graphs do not change in any unexpected ways.  You'll see diffs to `graphdiff.py` where they changed in expected ways :)
Comment on attachment 8846163 [details]
Bug 1333255: use normal old functions to load tasks;

https://reviewboard.mozilla.org/r/119238/#review121682

::: compare.sh:4
(Diff revision 1)
> +#! /bin/sh
> +
> +./mach taskgraph full -v -p parameters.yml -J > after-full.json && ./mach taskgraph optimized -v -p parameters.yml -J > after-optimized.json
> +python graphdiff.py

Isn't this file and the one below `graphdiff.py` tools you use to debug?

I don't mind landing, just maybe they belong in a folder or something?

::: taskcluster/docs/loading.rst:9
(Diff revision 1)
>  The full task graph generation involves creating tasks for each kind.  Kinds
> -are ordered to satisfy ``kind-dependencies``, and then the ``implementation``
> -specified in ``kind.yml`` is used to load the tasks for that kind.
> +are ordered to satisfy ``kind-dependencies``, and then the ``loader`` specified
> +in ``kind.yml`` is used to load the tasks for that kind. It should point to
> +a Python function like::
>  
> -Specifically, the class's ``load_tasks`` class method is called, and returns a
> +    def load_tasks(cls, kind, path, config, parameters, loaded_tasks):

**love this!!** The use of classes as functions is not something I'm a fan off.. Unless there is a reason for the object instance to hold state.

::: taskcluster/taskgraph/task/test.py:116
(Diff revision 1)
> -            rv[test_platform]['test-names'] = test_names
> +        rv[test_platform]['test-names'] = test_names
> -        return rv
> +    return rv
> +
> +
> +def load_tasks(kind, path, config, params, loaded_tasks):
> +    return transform.transform_inputs(

So if I wanted a kind without transforms... I would just change load_tasks to not include this part.
Attachment #8846163 - Flags: review?(jopsen) → review+
Comment on attachment 8846164 [details]
Bug 1333255: use transforms to make docker image tasks, too;

https://reviewboard.mozilla.org/r/119240/#review121684

::: taskcluster/ci/docker-image/kind.yml:5
(Diff revision 1)
>  # This Source Code Form is subject to the terms of the Mozilla Public
>  # License, v. 2.0. If a copy of the MPL was not distributed with this
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
>  loader: taskgraph.task.docker_image:load_tasks

Randomly, wouldn't it be almost as easy to define this file as specifying a `load_tasks` function, and then just have it be python?
Attachment #8846164 - Flags: review?(jopsen) → review+
Comment on attachment 8846165 [details]
Bug 1333255: remove t.get_dependencies();

https://reviewboard.mozilla.org/r/119242/#review121690
Attachment #8846165 - Flags: review?(jopsen) → review+
Comment on attachment 8846166 [details]
Bug 1333255: implement optimizations as named functions;

https://reviewboard.mozilla.org/r/119244/#review121694
Attachment #8846166 - Flags: review?(jopsen) → review+
Comment on attachment 8846167 [details]
Bug 1333255: replace uses of index_paths with optimizations;

https://reviewboard.mozilla.org/r/119246/#review121696
Attachment #8846167 - Flags: review?(jopsen) → review+
Comment on attachment 8846168 [details]
Bug 1333255: only apply seta optimizations for tests;

https://reviewboard.mozilla.org/r/119248/#review121698
Attachment #8846168 - Flags: review?(jopsen) → review+
Comment on attachment 8846169 [details]
Bug 1333255: handle when.files-changed at the job level;

https://reviewboard.mozilla.org/r/119250/#review121702
Attachment #8846169 - Flags: review?(jopsen) → review+
Comment on attachment 8846171 [details]
Bug 1333255: always transform tasks;

https://reviewboard.mozilla.org/r/119254/#review121706

::: taskcluster/docs/loading.rst:29
(Diff revision 1)
>  the list ``loaded_tasks``.
>  
> -The return value is a list of Task instances.
> -
> -TransformTask
> --------------
> +The return value is a list of inputs to the transforms listed in the kind's
> +``transforms`` property. The specific format for the input depends on the first
> +transform - whatever it expects. The final transform should be
> +``taskgraph.transform.task:transforms``, which produces the output format the

okay, so if you don't want transforms don't list any... And if you do they'll be automatically applied... that makes a lot more sense..

Otherwise, what is the YAML file there for..
Attachment #8846171 - Flags: review?(jopsen) → review+
Comment on attachment 8846170 [details]
Bug 1333255: and then there was only one Task class;

https://reviewboard.mozilla.org/r/119252/#review121708
Attachment #8846170 - Flags: review?(jopsen) → review+
Comment on attachment 8846172 [details]
Bug 1333255: rename taskgraph.task to taskgraph.loader;

https://reviewboard.mozilla.org/r/119256/#review121710
Attachment #8846172 - Flags: review?(jopsen) → review+
Comment on attachment 8846173 [details]
Bug 1333255: always include docker-image tasks in the target task graph;

https://reviewboard.mozilla.org/r/119258/#review121714
Attachment #8846173 - Flags: review?(jopsen) → review+
Comment on attachment 8846174 [details]
Bug 1333255: Task for indexing other tasks.

https://reviewboard.mozilla.org/r/119260/#review121720
Attachment #8846174 - Flags: review?(jopsen) → review+
Comment on attachment 8846175 [details]
Bug 1333255: introduce graph morphs, use them to make index tasks;

https://reviewboard.mozilla.org/r/119262/#review121722

::: taskcluster/taskgraph/morph.py:96
(Diff revision 1)
> +
> +
> +def make_index_task(parent_task, label_to_taskid):
> +    index_paths = [r.split('.', 1)[1] for r in parent_task.task['routes']
> +                   if r.startswith('index.')]
> +    parent_task.task['routes'] = [r for r in parent_task.task['routes']

I have mixed feelings about this.

I don't see a better solution, but it just seems bad.
routes where never intended to express this.

It's like we went to low level format, then back to high level format, and low level format again... This invites confusion and bugs. But like I said I don't see a better option.

::: taskcluster/taskgraph/morph.py:103
(Diff revision 1)
> +
> +    task = derive_misc_task(parent_task, 'index-task',
> +                            'index-task', label_to_taskid)
> +    task.task['scopes'] = [
> +        'index:insert-task:{}'.format(path) for path in index_paths]
> +    task.task['payload']['command'] = ['insert-indexes.js'] + index_paths

`insert-indexes.js` shouldn't be needed... or did you change that?

Note, if you really want that, then why not JSON encode the namespaces and inject via. an environment variable instead.
Attachment #8846175 - Flags: review?(jopsen) → review+
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #22)
> Isn't this file and the one below `graphdiff.py` tools you use to debug?
> 
> I don't mind landing, just maybe they belong in a folder or something?

See comment 8

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #23)
> Randomly, wouldn't it be almost as easy to define this file as specifying a
> `load_tasks` function, and then just have it be python?

Python as a configuration file format turns out to invite all manner of awful things.  Better to keep the dynamic behavior outside of the data -- separation of concerns and all that.

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #34)
> > +def make_index_task(parent_task, label_to_taskid):
> > +    index_paths = [r.split('.', 1)[1] for r in parent_task.task['routes']
> > +                   if r.startswith('index.')]
> > +    parent_task.task['routes'] = [r for r in parent_task.task['routes']
> 
> I have mixed feelings about this.
> 
> I don't see a better solution, but it just seems bad.
> routes where never intended to express this.
> 
> It's like we went to low level format, then back to high level format, and
> low level format again... This invites confusion and bugs. But like I said I
> don't see a better option.

Agreed.  Maybe a better original design would have been to create `task.indexPaths` and have the queue send one pulse message per index path, instead of piggybacking on the routes functionality, which again piggybacks on the AMQP CC functionality.  But, history is fixed :(

> `insert-indexes.js` shouldn't be needed... or did you change that?
> 
> Note, if you really want that, then why not JSON encode the namespaces and
> inject via. an environment variable instead.

Yes, I think that hiding the commands run by a docker image substantially decreases inspectability -- users see a task with some logging about errors, but have no idea what source was running in that task.  Remember there's no indication in the task payload of the docker image name -- just a taskId.  So I renamed the script to something more distinctive than "index.js", and included its name in the payload so it is easy to find by inspecting the task.

I don't see a benefit to JSON encoding, and it would mean a lot of ugly JSON-in-JSON encoding in the payload.
Comment on attachment 8846174 [details]
Bug 1333255: Task for indexing other tasks.

https://reviewboard.mozilla.org/r/119260/#review122028
Attachment #8846174 - Flags: review?(dustin) → review+
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4423c6f58345
use normal old functions to load tasks; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/c7eb35b0b522
use transforms to make docker image tasks, too; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/2078ad44dfeb
remove t.get_dependencies(); r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/e991e028f9b4
implement optimizations as named functions; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/bba357a94f28
replace uses of index_paths with optimizations; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/be77b29de009
only apply seta optimizations for tests; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/90bee55f9797
handle when.files-changed at the job level; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/438ada631812
and then there was only one Task class; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/4fa17de0a52f
always transform tasks; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/033351ec3eb4
rename taskgraph.task to taskgraph.loader; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/a38447eb85d9
always include docker-image tasks in the target task graph; r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/cc28492c1144
Task for indexing other tasks. r=jonasfj
https://hg.mozilla.org/integration/autoland/rev/ddd3397bf197
introduce graph morphs, use them to make index tasks; r=jonasfj
Depends on: 1347569
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.