Closed Bug 1286813 Opened 8 years ago Closed 7 years ago

Add optimizations to Action Tasks when more than one Action Task has been used

Categories

(Firefox Build System :: Task Configuration, task)

Type: task, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: martianwars, Unassigned, Mentored)

This is a continuation of https://bugzilla.mozilla.org/show_bug.cgi?id=1281062.

The current model of optimization uses the existing jobs scheduled by the decision task for that push. It does not look for any jobs that were scheduled by Action Tasks which preceded it.

For example: Action Task #1 was used to schedule build B1 and test T1, which needed B1. Action Task #2 wants to schedule test T2, which also depends on build B1. The current approach will schedule build B1 again rather than reuse the existing build created by the previous Action Task.
However, if B1 was scheduled by the decision task itself, the Action Task will successfully optimize B1.

The fix should be simple if there is an easy way to fetch all the Action Tasks in a push. We just need to extend the existing_tasks in https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/optimize.py with the contents of the Action Task artifacts.
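
A minimal sketch of that idea, assuming existing_tasks is a plain label -> taskId dict and that each earlier Action Task exposes a mapping of the same shape. The function names `merge_existing_tasks` and `plan_action_task` are hypothetical and are not part of optimize.py:

```python
def merge_existing_tasks(decision_tasks, action_task_mappings):
    """Combine label -> taskId mappings; the earliest source wins.

    decision_tasks: mapping from the decision task (assumed shape).
    action_task_mappings: one mapping per earlier Action Task, oldest first.
    """
    existing = dict(decision_tasks)
    for mapping in action_task_mappings:
        for label, task_id in mapping.items():
            # Keep the first task seen for a label so nothing is replaced.
            existing.setdefault(label, task_id)
    return existing


def plan_action_task(wanted_labels, existing_tasks):
    """Split the wanted labels into tasks to reuse and tasks to schedule."""
    reuse = {l: existing_tasks[l] for l in wanted_labels if l in existing_tasks}
    schedule = [l for l in wanted_labels if l not in existing_tasks]
    return reuse, schedule


# The B1/T1/T2 scenario from this comment (task ids are made up).
decision = {"B0": "task-b0"}
action1 = {"B1": "task-b1", "T1": "task-t1"}
existing = merge_existing_tasks(decision, [action1])
reuse, schedule = plan_action_task(["B1", "T2"], existing)
# reuse is {"B1": "task-b1"}; schedule is ["T2"]
```

With only the decision task's mapping, B1 would land in `schedule` and be built a second time, which is the behaviour this bug describes.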
Depends on: 1281062
We can retrieve jobs through the Treeherder client.

Does TC have any options to determine tasks scheduled for a push?
Aside from the index (which will only index one task), no.  Could treeherder provide this information?

Another option is to somehow grant the action task `queue:create-artifact:<decisionTaskId>/<decisionRunId>` so that it could add an artifact to the decision task containing the required metadata (something like `public/action-task-<actionTaskId>.json` containing a reference to the action task and the label -> taskId mapping for its added tasks).
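
For illustration only, the metadata artifact could look roughly like the sketch below. The artifact name follows the pattern given above; the JSON field names `actionTaskId` and `labelToTaskId` and the helper `action_task_artifact` are assumptions made for this example, and nothing here calls the queue:

```python
import json


def action_task_artifact(action_task_id, label_to_taskid):
    """Build the (name, body) pair for the proposed metadata artifact.

    Field names are hypothetical; only the name pattern comes from the comment.
    """
    name = "public/action-task-{}.json".format(action_task_id)
    body = json.dumps(
        {
            "actionTaskId": action_task_id,    # reference to the action task
            "labelToTaskId": label_to_taskid,  # label -> taskId for added tasks
        },
        sort_keys=True,
    )
    return name, body


name, body = action_task_artifact("abc123", {"B1": "task-b1", "T1": "task-t1"})
```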
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> Aside from the index (which will only index one task), no.  Could treeherder
> provide this information?
> 
Treeherder can tell us the info. I just wanted to know if there was something from TC that could tell us tasks associated to a push.

> Another option is to somehow grant the action task
> `queue:create-artifact:<decisionTaskId>/<decisionRunId>` so that it could
> add an artifact to the decision task containing the required metadata
> (something like `public/action-task-<actionTaskId>.json` containing a
> reference to the action task and the label -> taskId mapping for its added
> tasks).

martianwars: would this work for you?

From what I understand, it would simply record under the Gecko decision task which tasks were scheduled by the action task. That keeps a record of all scheduling done via the Gecko decision task and the action tasks in the same location.
Is it possible to edit the artifacts task-graph.json and label-to-taskid.json in the Decision Task while scheduling the Action Tasks? Based on the code that I've read, I don't seem to see any way of doing this. What do you think @dustin?

> Treeherder can tell us the info. I just wanted to know if there was something from TC that could tell us
> tasks associated to a push.
@armenzg, which is the best way to do this? We can do this via pulse messages (like we are sending the decision task id) or we could do this using Treeherder APIs. In any case, I think eventually the action task ids should be passed in the same way as the decision task id.
Flags: needinfo?(dustin)
Flags: needinfo?(armenzg)
You can't *edit* artifacts, but you can create new artifacts.

I'll need to think about how feasible it is to issue this scope.  Giving queue:create-artifact:* to pulse_actions would be far too broad (it would let pulse_actions add any artifact to any task).
Flags: needinfo?(dustin) → needinfo?(jopsen)
(In reply to Kalpesh Krishna [:martianwars] from comment #4)
> > Treeherder can tell us the info. I just wanted to know if there was something from TC that could tell us
> > tasks associated to a push.
> @armenzg, which is the best way to do this? We can do this via pulse
> messages (like we are sending the decision task id) or we could do this
> using Treeherder APIs. In any case, I think eventually the action task ids
> should be passed in the same way as the decision task id.

This is what I do in mozci:
https://github.com/mozilla/mozilla_ci_tools/blob/master/mozci/query_jobs.py#L233

The action task can just inspect the jobs on a push to find other action tasks that have already been scheduled.
Flags: needinfo?(armenzg)
Important invariant:
  Artifacts should be immutable, and you shouldn't add artifacts to other tasks.

This is very simple and SUPER powerful; if we violate those invariants it'll haunt us for years to come :)
(and don't even get me started on consistency issues with mutating data)
--

So let's find some other solution... Perhaps we can set:
  task.extra.isMartianWarsCompatibleActionTask = true // any unique key will do :)
on action tasks. That way you can find other action tasks by using queue.listTaskGroup(taskGroupId)
and then download the artifacts from the action tasks you find. And somehow merge that into the
optimization step...

Warning: queue.listTaskGroup(...) isn't super fast, and often involves paging. But that shouldn't be a problem in this case.

Since adding a task is an append operation, a race condition between two action tasks would merely duplicate the dependent tasks (an optimization failure), which seems acceptable.
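
A rough sketch of the paging and filtering described above. It assumes each page of the task group listing is a dict with a `tasks` list of `{"status": ..., "task": ...}` entries plus an optional `continuationToken`. To stay independent of any particular client, the listing call is passed in as `fetch_page`, a hypothetical callable that takes a continuation token (or None) and returns one page:

```python
def find_action_tasks(fetch_page, marker_key):
    """Return taskIds in a task group whose task.extra[marker_key] is truthy.

    fetch_page(token) must return one page of the listing; it is expected to
    wrap the real listTaskGroup call (assumed response shape, see above).
    """
    found = []
    token = None
    while True:
        page = fetch_page(token)
        for entry in page.get("tasks", []):
            extra = entry.get("task", {}).get("extra", {})
            if extra.get(marker_key):
                found.append(entry["status"]["taskId"])
        # Keep paging until the listing stops returning a token.
        token = page.get("continuationToken")
        if not token:
            return found
```

The returned taskIds would then be used to download each action task's artifacts and merge them into the optimization step. As noted above, two action tasks racing each other would at worst miss one another's tasks and schedule duplicates.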
Flags: needinfo?(jopsen)
Depends on: 1284911
I know they're immutable, so was only suggesting adding to other tasks.  But, fair enough.  It sounds like treeherder has the information we need in this case, anyway.
Is this fixed now?
Yes -- although for a very different flavor of artifacts.  Brian recently landed some code to enumerate artifact tasks in the index.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: TaskCluster → Firefox Build System