1280231 - Refactor taskgraph "Kind" to be a base Task class

Assignee

Description

•

8 years ago

We are currently using Kind instances to represent kinds, with tasks represented by instances of a data-only class.  So we have Kind instance methods like `get_task_dependencies(..task..)`.  This would be much more sensible with a bunch of Task subclasses that have a `get_dependencies()` method.

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Updated

•

8 years ago

Assignee: nobody → dustin

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 1

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=be885f546fd9

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 2

•

8 years ago

Attached file Bug 1280231: refactor task kinds to task classes; — Details

Review commit: https://reviewboard.mozilla.org/r/59830/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/59830/

Attachment #8763684 - Flags: review?(jopsen)

Attachment #8763685 - Flags: review?(jopsen)

Attachment #8763686 - Flags: review?(bugspam.Callek)

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 3

•

8 years ago

Attached file Bug 1280231: Load kinds in order by dependencies; — Details

This enables kinds that generate tasks based on those output by another kind.
For example, the test kind might generate a set of test tasks for each build
task.

Review commit: https://reviewboard.mozilla.org/r/59832/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/59832/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 4

•

8 years ago

Attached file Bug 1280231: rename types.py to taskgraph.py; — Details

Review commit: https://reviewboard.mozilla.org/r/59834/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/59834/

Justin Wood (:Callek)

Comment 5

•

8 years ago

Comment on attachment 8763686 [details]
Bug 1280231: rename types.py to taskgraph.py;

https://reviewboard.mozilla.org/r/59834/#review56828

This solves my editors issue, so \o/ lets do it.

Attachment #8763686 - Flags: review?(bugspam.Callek) → review+

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 6

•

8 years ago

Comment on attachment 8763684 [details]
Bug 1280231: refactor task kinds to task classes;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59830/diff/1-2/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 7

•

8 years ago

Comment on attachment 8763685 [details]
Bug 1280231: Load kinds in order by dependencies;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59832/diff/1-2/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 8

•

8 years ago

Comment on attachment 8763686 [details]
Bug 1280231: rename types.py to taskgraph.py;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59834/diff/1-2/

Jonas Finnemann Jensen (:jonasfj)

Comment 9

•

8 years ago

Comment on attachment 8763684 [details]
Bug 1280231: refactor task kinds to task classes;

https://reviewboard.mozilla.org/r/59830/#review57674

I still think we can make better abstractions.
But this is a great improvement.

::: taskcluster/taskgraph/kind/base.py:14
(Diff revision 2)
>  
> -class Kind(object):
> +class Task(object):
>      """
> +    Representation of a task in a TaskGraph.  Each Task has, at creation:
> +
> +    - kind: the name of the task kind

Do we ever really need this?
And if so why not just use: `obj.__class__.__name__`?

::: taskcluster/taskgraph/kind/base.py:17
(Diff revision 2)
> +    Representation of a task in a TaskGraph.  Each Task has, at creation:
> +
> +    - kind: the name of the task kind
> +    - label; the label for this task
> +    - attributes: a dictionary of attributes for this task (used for filtering)
> +    - task: the task definition (JSON-able dictionary)

Given that you need taskIds for dependencies substited in I would argue that you can't just expose a dictionary for the task definition.

A single method task.definition() would suffice.

An alternative option that would give more control at a higher level would be to do:
task.payload(taskIds)
task.dependencies()
task.deadline()
task.expires()
task.extra()
task.name()
task.source()
task.description()
task.workerType()
task.provisionerId()
task.routes()
task.priority()
task.retries()
task.scopes()

With the invariant that task.payload(taskIds)
is called with a mapping taskIds from objects in 
task.dependencies() to taskId for said objects.

Doing something like this would give code operating on abstract tasks as much power as possible, while still hiding all the internals.
Notice how things like:

 * task.metadata.owner
 * task.taskGroupId
 * task.schedulerId
 * task.created

Don't have to be handled by the abstract task object anymore.
Those are things we easily handle at a higher level.


-----

I realize that the reasoning behind the current design might have to do with the ability to serialize all tasks to json files...
So that we can backfill later.. hmm... That arguably requires us to have a templating language to substitute in the dependencies (or so I suspect).

::: taskcluster/taskgraph/kind/base.py:52
(Diff revision 2)
>      @abc.abstractmethod
> -    def load_tasks(self, parameters):
> +    def load_tasks(cls, kind, path, config, parameters):
>          """
> -        Get the set of tasks of this kind.
> +        Load the tasks for a given kind.
> +
> +        The `kind` is the name of the kind; the configuration for that kind

If `kind` serves so that the same Task class can be used for two different kind implementations I think that's overkill.

We should aim to simplify this. If people really want two kinds using the same implementation, just inherent one from the other, or make a common base class.
(or even better: duplicate the code)

::: taskcluster/taskgraph/kind/base.py:70
(Diff revision 2)
> -    def get_task_dependencies(self, task, taskgraph):
> +    def get_dependencies(self, taskgraph):
>          """
> -        Get the set of task labels this task depends on, by querying the task graph.
> +        Get the set of task labels this task depends on, by querying the full
> +        task set, given as `taskgraph`.
>  
>          Returns a list of (task_label, dependency_name) pairs describing the

Why return task_label and not task objects?
I assume you're filtering on the list of objects that have attributes...

Also isn't dependency_name purely an internal thing to the task object.

I view this task object as an abstraction that can be:

 * filtered
 * depended on
 * optimized out

Without dealing with any internal details about this task.
Abstractions should hide internals.

::: taskcluster/taskgraph/kind/base.py:87
(Diff revision 2)
>          dependencies on this task will isntead depend on that taskId.  It is an
>          error to return no taskId for a task on which other tasks depend.
>  
>          The default never optimizes.
>          """
>          return False, None

Nice!

This correctly hides the logic of deciding whether or not a task can be optimized.

nit: None or "taskId" would work too.

Also, perhaps this method should be async?
(I'm not python expert, so not sure what that would entail)

Attachment #8763684 - Flags: review?(jopsen) → review+

Jonas Finnemann Jensen (:jonasfj)

Comment 10

•

8 years ago

Comment on attachment 8763685 [details]
Bug 1280231: Load kinds in order by dependencies;

https://reviewboard.mozilla.org/r/59832/#review57686

Looks good to me

::: taskcluster/taskgraph/kind/base.py:64
(Diff revision 2)
>          See `taskcluster/docs/parameters.rst` for details.
>  
> +        At the time this method is called, all kinds on which this kind depends
> +        (that is, specified in the `kind-dependencies` key in `self.config`
> +        have already loaded their tasks, and those tasks are available in
> +        `loaded_tasks`, a dictionary keyed by task label.

Don't they all have label as a property?
If so, why make it a dictionary... I just fear that duplicating data is a good way of getting inconsistencies.

Attachment #8763685 - Flags: review?(jopsen) → review+

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 11

•

8 years ago

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #9)
> Do we ever really need this?
> And if so why not just use: `obj.__class__.__name__`?

There can be multiple kinds using the same class.  Lots of dependencies will be looking for tasks from a specific other kind.

> Given that you need taskIds for dependencies substited in I would argue that
> you can't just expose a dictionary for the task definition.

You can -- there's a {task-reference: ..} syntax that handles it.

The suggestion to use lots of methods for different bits of the task definition has the advantage, if I understand you correctly, that a deep inheritance tree would allow task.created() to be implemented once, with task.scopes(), for example, overridden in subclasses.  In practice I think this just violates locality: now each task definition is generated by 15 different functions in different classes in different source files, even though all of the values are interrelated (e.g., scopes depends on caches, features, devices, etc., artifact expires depends on task expires, payload format depends on worker type, etc.).

As I work on more realistic kinds (bug 1281004) there's an opportunity to factor some of that stuff out into shared methods, but only in a very limited fashion.

> I realize that the reasoning behind the current design might have to do with
> the ability to serialize all tasks to json files...
> So that we can backfill later.. hmm... That arguably requires us to have a
> templating language to substitute in the dependencies (or so I suspect).

Yep :)

> If `kind` serves so that the same Task class can be used for two different
> kind implementations I think that's overkill.

I disagree.  As an example, I suspect we will have different classes for Mulet and desktop tests, but they can easily use the same class.  I guess I don't see the motivation to not allow tihs: it is simple enough, allows a useful kind of flexibility, and encourages separation of configuration from implementation.  Also, enforcing this would be weird: I'd need to check that a given class hasn't already been used to instantiate a kind, and if so, what, raise an exception suggesting creating an empty subclass just to work around the check?

> Why return task_label and not task objects?

Because the graph is based on named references, rather than object references.  Using object identity for dependencies gets messy quickly when updating graphs; for example, the target task set is an edge-free subset of the full task graph, which is then closed to make the target task graph.  If the task objects themselves embody those dependencies, then they must be duplicated at each stage, which would be quite a bit slower.

> Also isn't dependency_name purely an internal thing to the task object.

No, it's used to implement {task-reference: ..}

> I view this task object as an abstraction that can be:
> 
>  * filtered
>  * depended on
>  * optimized out
> 
> Without dealing with any internal details about this task.
> Abstractions should hide internals.

Dependencies are named graph edges of the form (label1, label2, depname).  So the dependencies are not an internal detail of the tasks, but are in fact a very important part of the task's interface, as they are what most of this system is manipulating.

> Also, perhaps this method should be async?
> (I'm not python expert, so not sure what that would entail)

The decision task doesn't run in an async environment (e.g., asyncio or twisted), so sadly that's not possible here.  I have some ideas on how to parallelize a whole bunch of index lookups, but none of them are pretty.  Greg may have some ideas.

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #10)
> Don't they all have label as a property?
> If so, why make it a dictionary... I just fear that duplicating data is a
> good way of getting inconsistencies.

Good point..

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 12

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/ba5cbf4e06a550993e5216f816dcf0ccd3938b2e
Bug 1280231: refactor task kinds to task classes; r=jonasfj

https://hg.mozilla.org/integration/mozilla-inbound/rev/7898d1ab1afc08f78445165d0c94566b0682a2f7
Bug 1280231: Load kinds in order by dependencies; r=jonasfj

https://hg.mozilla.org/integration/mozilla-inbound/rev/4b1d94901b4d09c13031982c28eec6a48f938c71
Bug 1280231: rename types.py to taskgraph.py; r=Callek

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 13

•

8 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/4f1b739ec757 for decision task failures:

https://treeherder.mozilla.org/logviewer.html#?job_id=30727955&repo=mozilla-inbound

Flags: needinfo?(dustin)

Jonas Finnemann Jensen (:jonasfj)

Comment 14

•

8 years ago

> The suggestion to use lots of methods for different bits of the task definition has the advantage...
The motivation wasn't inheritance, it was more that some parts could be handled generically.
I see the code in two parts:
 A) Abstract Task objects (implemented by kinds/task classes)
 B) Generic Task-Graph code: generating graphs, filtering, optimizing, creating tasks...
Code in (B) operates on (A) without knowing that there are different kinds, as this is abstracted away.
As all tasks have top-level properties in common, the abstract task object could expose those and the
generic task-graph code can reason about these properties.

Anyways, this might be a detail...

> So the dependencies are not an internal detail of the tasks, but are in fact a very
> important part of the task's interface...
hmm... I see how it becomes problematic with the idea that the decision task must create json artifacts. On the other hand I fear we easily end up with multiple layers of templating systems, as a task kind implementation would naturally insert parameters into the YAML files.

Oh, well, let's see where we end up.. My original plan was to try and build a kind implementation with a simple json based template and linking system that could support everything. I _just_ haven't figured out what that would look like yet :) hehe

Note: all of this is definitely an improvement over what we have.

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 15

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=84da81b9c49b

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 16

•

8 years ago

working on it..

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 17

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=12e30313eb1f

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 18

•

8 years ago

Comment on attachment 8763684 [details]
Bug 1280231: refactor task kinds to task classes;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59830/diff/2-3/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 19

•

8 years ago

Comment on attachment 8763685 [details]
Bug 1280231: Load kinds in order by dependencies;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59832/diff/2-3/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 20

•

8 years ago

Comment on attachment 8763686 [details]
Bug 1280231: rename types.py to taskgraph.py;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59834/diff/2-3/

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 21

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/88119b527dd6a3130d2d53d1c4fde4b8a738ead4
Bug 1280231: refactor task kinds to task classes; r=jonasfj

https://hg.mozilla.org/integration/mozilla-inbound/rev/fe1dcb3feb9aed3827de44011e6dadf21faf6d2a
Bug 1280231: Load kinds in order by dependencies; r=jonasfj

https://hg.mozilla.org/integration/mozilla-inbound/rev/ccce34cee35b61858216ac7dacb67ed8ebe1a2b7
Bug 1280231: rename types.py to taskgraph.py; r=Callek

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Updated

•

8 years ago

Flags: needinfo?(dustin)

Carsten Book [:Tomcat]

Comment 22

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/88119b527dd6
https://hg.mozilla.org/mozilla-central/rev/fe1dcb3feb9a
https://hg.mozilla.org/mozilla-central/rev/ccce34cee35b

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

6 years ago

Product: TaskCluster → Firefox Build System

Bug 1280231: refactor task kinds to task classes; 8 years ago Dustin J. Mitchell [:dustin] (he/him) 58 bytes, text/x-review-board-request	jonasfj : review+	Details
Bug 1280231: Load kinds in order by dependencies; 8 years ago Dustin J. Mitchell [:dustin] (he/him) 58 bytes, text/x-review-board-request	jonasfj : review+	Details
Bug 1280231: rename types.py to taskgraph.py; 8 years ago Dustin J. Mitchell [:dustin] (he/him) 58 bytes, text/x-review-board-request	Callek : review+	Details