Closed Bug 1219436 Opened 9 years ago Closed 9 years ago

Cancelling a task requires assume:scheduler-id:<schedulerId>/<taskGroupId>

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

mozilla-taskcluster lacked this scope for a while, and as a result cancelling TC builds has been broken since July 27.  I guess it's a popular feature!

This requirement translates to me as "only schedulers should cancel jobs".  Yet there's no way for a user to ask the task-graph-scheduler to cancel a job, either.

I'd suggest changing this to require only the scope

  queue:cancel-task:<schedulerId>/<taskGroupId>/<taskId>

with no associated role.  Then we can give mozilla-taskcluster queue:cancel-task:task-graph-scheduler/*.
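To make the proposed grant concrete, here is a minimal sketch of how Taskcluster-style scope matching would evaluate it. It assumes the usual rule that a granted scope ending in `*` satisfies any required scope beginning with the text before the `*`, and that any other granted scope must match exactly; role expansion is not modelled. The function names and the task-group/task IDs are hypothetical, and this is not the queue's actual code.

```python
def satisfies(granted: list[str], required: str) -> bool:
    """Return True if any granted scope covers the required scope.

    Assumes Taskcluster-style matching: a granted scope ending in '*'
    covers every scope that starts with the text before the '*';
    any other granted scope must match exactly. Roles are not expanded.
    """
    for scope in granted:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def cancel_scope(scheduler_id: str, task_group_id: str, task_id: str) -> str:
    # Scope the queue would require under the proposal in this bug.
    return f"queue:cancel-task:{scheduler_id}/{task_group_id}/{task_id}"


# Hypothetical grant for mozilla-taskcluster, as suggested above.
mozilla_taskcluster = ["queue:cancel-task:task-graph-scheduler/*"]

print(satisfies(mozilla_taskcluster, cancel_scope("task-graph-scheduler", "groupA", "taskB")))  # True
print(satisfies(mozilla_taskcluster, cancel_scope("other-scheduler", "g", "t")))  # False
```

Because task-graph-scheduler is currently the only scheduler, the wildcard grant covers every task, which is exactly the limitation raised next.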

That still allows mozilla-taskcluster to cancel anything (since that's the only scheduler we have).  Ideas on how to limit mozilla-taskcluster to cancelling tasks it started, or something more narrow?

I'm not terribly surprised that this has gone unnoticed.  I believe that once the cancel event has fired in treeherder, the UI changes to show the task as canceled even if the task actually hasn't been canceled.  Unless someone actually went to the task inspector to see that it was canceled, they wouldn't know.

I'm +1 for not needing the scheduler scope.  It would be nice if a user could just have the queue:cancel-task scope, especially if they are using another tool such as taskcluster-cli.

Yeah, I feel like there's a deeper design question here of how we can give access to groups of tasks without giving access to *all* tasks.  Maybe that means making a namespace for task groups, once we have the big-graph scheduler in place?  So mozilla-taskcluster can put its tasks under `mozilla-taskcluster/<tree>`.  Dunno.  This is probably not the best time to try to solve that problem.

I see your point that the <schedulerId>/<taskGroupId>/<taskId> suffix really doesn't mean much in this case, since the first is a singleton and the latter two are dynamic slugids.  So, yeah, just using `queue:cancel-task` would certainly be more honest.

The intention with the assume:schedulerId:... scope was originally to identify the scheduler, so that pulse messages could be picked up only by the interested scheduler.

However, the idea of multiple schedulers has not caught on... And ever since we came up with the idea for the big graph scheduler, my plan for this scope has been that it would represent the group of people allowed to cancel tasks, modify the graph, rerun tasks and so on...

So everybody with try access could modify try tasks; the same goes for the various branches, gaia, etc...
Well, maybe we'll make multiple groups for try further down the line, when we figure out how to let try jobs use your scopes... Try+
The other option, not role-related, seems decent too...

So far I've generally preferred the other way in many places, but the suggestion in this bug is strictly more expressive.

We should make some guidelines for these cases... The downside of not using role-like things is that we need more scopes.

It is true that at the moment we have a bunch of external systems that can submit tasks (mozilla-taskcluster, taskcluster-cli; even funsize must have its own scheduler), yet our data isn't partitioned, so we can't easily tell which tasks can be attributed to which tool.

Whatever solution we come up with for scopes/roles, we should think about how best to capture/tag tasks and graphs with this information.

Here is a loose idea, a bit rough around the edges:

We should register external applications when we issue clientIds for systems (not people). The clients we provide should be issued a scope for the application they will be used by, e.g. a clientId issued for the mozilla-taskcluster app would get the scope application:mozilla-taskcluster. It would be up to the third party registering their app to decide which available name(s) they want for the scopes assigned to the clientId they get. We would ensure that no names are reused, and that only trusted people are given clients registered to an application.

Then, when a task is submitted, it can provide a) the name of the application it is being submitted by, and b) a list of applications that it authorises to perform actions against it (such as cancelling tasks). A task can only be registered as created by application <app> if the clientId has the scope application:<app>. Submitting the list of applications authorised to cancel/alter the task requires no scopes, since this is essentially how the submitting application declares which applications it trusts to do this.

This way, all tasks would be attributable back to the third-party apps that created them, and only apps mentioned in a task's application list would be authorised to cancel it. So nobody could cancel funsize jobs just because they managed to get scopes on mozilla-taskcluster, etc.
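The proposal above can be sketched as two checks, one at submission time and one at cancellation time. Everything here is hypothetical: `Task`, `created_by`, `trusted_apps`, and the helper functions are illustrative names, not part of the Taskcluster queue, and `application:<app>` scopes are matched exactly (no `*` expansion) to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical fields for the proposal; not part of the real queue API.
    created_by: str  # application that submitted the task
    trusted_apps: list[str] = field(default_factory=list)  # apps allowed to act on it


def apps_of(client_scopes: set[str]) -> set[str]:
    """Applications a client is registered to, via exact 'application:<app>' scopes."""
    prefix = "application:"
    return {s[len(prefix):] for s in client_scopes if s.startswith(prefix)}


def can_create(client_scopes: set[str], task: Task) -> bool:
    # Claiming authorship requires application:<created_by>;
    # listing trusted apps requires no extra scopes.
    return task.created_by in apps_of(client_scopes)


def can_cancel(client_scopes: set[str], task: Task) -> bool:
    # Only clients registered to an app the task trusts may cancel it.
    return bool(apps_of(client_scopes) & set(task.trusted_apps))


funsize_task = Task(created_by="funsize", trusted_apps=["funsize"])
mtc_client = {"application:mozilla-taskcluster"}
funsize_client = {"application:funsize"}

print(can_create(funsize_client, funsize_task))  # True
print(can_create(mtc_client, funsize_task))      # False
print(can_cancel(mtc_client, funsize_task))      # False
print(can_cancel(funsize_client, funsize_task))  # True
```

Here `trusted_apps` is free-form, matching the point that listing trusted applications needs no scopes; the protection comes entirely from the `application:<app>` scope held by the caller.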

Then we've essentially tagged all tasks officially and reliably with the source system that created them, and introduced some kind of very trivial trust mechanism.

Note: I'm not sure what should be controlled by roles vs scopes above.

/me pulls back the reins a little

What we need to solve in this bug is the much smaller issue of how to allow mozilla-taskcluster to cancel tasks.  We can accept, for the moment, that we have no useful way to distinguish *which* tasks it can kill.

Per Greg's suggestion, I would like to go with requiring a simple "queue:cancel-task" scope.
Assignee: nobody → dustin
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> /me pulls back the reins a little
> 
> What we need to solve in this bug is the much smaller issue of how to allow
> mozilla-taskcluster to cancel tasks.  We can accept, for the moment, that we
> have no useful way to distinguish *which* tasks it can kill.

Fair point. I've created bug 1219436 for discussing a solution to the general problem of attributing resources (e.g. tasks, graphs) to the source systems that created them.

> Fair point. I've created bug 1219436 for discussing a solution to the general problem
@pmoore, okay, I totally fell into your link cycle and opened this 3 times :)
See Also: → 1221291
Filed bug 1221291 for cleaning up the concept of schedulerId to be useful.
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #9)
> > Fair point. I've created bug 1219436 for discussing a solution to the general problem
> @pmoore, okay, I totally fell into your link cycle and opened this 3 times :)

rofl!!

So I guess I meant bug 1219778 :)

We'll be getting rid of all of the assume:* scopes and generally refactoring the scopes required to create queues.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Queue → Services