Closed
Bug 1374978
Opened 8 years ago
Closed 4 years ago
Stop granting and requiring assume:worker-id:*
Categories
(Taskcluster :: Services, enhancement, P5)
Tracking
(Not tracked)
RESOLVED
MOVED
People
(Reporter: pmoore, Unassigned)
There are no roles prefixed with "worker-id:" so API endpoints requiring "assume:worker-id:*" make no sense. We should therefore clean this up.
In general we should probably clean up all the API definitions that depend on "assume:" scopes, since we deprecated their use a loooong time ago.
This one is particularly useless since no such "worker-id:<workerId>" roles exist.
Note, the provisioner also grants this role to all worker types, so once our APIs no longer depend on it, we can also stop granting assume:worker-id:* in the temporary credentials the provisioner generates for workers.
This originally came up in bug 1374366.
Comment 1•8 years ago
We didn't actually deprecate this:
https://docs.taskcluster.net/manual/design/devel/best-practices/scopes
but recent changes to the queue have introduced `queue:worker-id:..` which can serve the same purpose. I suspect we could replace assume:worker-id with queue:worker-id. We can't get rid of it entirely, though, as it serves as a means to prevent worker B from messing with a task claimed by worker A. Currently the provisioner doesn't do a great job with these (it hands out assume:worker-id:*) due to issues with out-of-date information in the EC2 API (no way to determine the instanceId of a host, even after it has started). The host-secrets configuration *does* support more specific worker-ids:
https://tools.taskcluster.net/auth/roles/#project:releng:host-secrets:host:com.mozilla.scl3.releng.test.moonshot-*
and we should eventually do something similar in the provisioner.
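To illustrate why handing out `assume:worker-id:*` defeats the per-worker restriction, here is a minimal sketch of Taskcluster's trailing-star scope satisfaction (illustrative only, not the actual auth service code):

```typescript
// Minimal sketch of Taskcluster scope satisfaction: a held scope ending in "*"
// satisfies any required scope that shares its prefix.
function satisfies(held: string[], required: string): boolean {
  return held.some(scope =>
    scope === required ||
    (scope.endsWith("*") && required.startsWith(scope.slice(0, -1)))
  );
}

// A worker holding the broad scope can act as any workerGroup/workerId,
// so worker B could interfere with a task claimed by worker A:
console.log(satisfies(["assume:worker-id:*"],
                      "assume:worker-id:aws/i-0123456789abcdef0"));   // true

// A narrowly scoped worker can only act as itself:
console.log(satisfies(["queue:worker-id:aws/i-0123456789abcdef0"],
                      "queue:worker-id:aws/i-fedcba9876543210"));     // false
```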
So I think the action here is to finish the conversion to use the most up-to-date scope scheme for the queue (including priorities and schedulerIds, too). Jonas, is there a bug or RFC I can link to for that?
Flags: needinfo?(jopsen)
Reporter
Comment 2•8 years ago
I believe these are the affected endpoints that currently require "assume:worker-id:<workerGroup>/<workerId>" (see the sketch after this list):
* queue.claimTask
* queue.reclaimTask
* queue.reportCompleted
* queue.reportFailed
* queue.reportException
* queue.createArtifact
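For context, here is a rough sketch (using the Node taskcluster-client) of the worker-side calls that sit behind those scope checks; the option shapes and values are from memory and purely illustrative:

```typescript
// Rough sketch of the worker-side queue calls guarded by
// assume:worker-id:<workerGroup>/<workerId>; all values are hypothetical.
import * as taskcluster from "taskcluster-client";

const queue = new taskcluster.Queue({
  credentials: {
    clientId: "...",     // temporary credentials from the provisioner,
    accessToken: "...",  // which today carry the broad assume:worker-id:*
    certificate: "...",
  },
});

async function runTask(taskId: string, runId: number) {
  await queue.claimTask(taskId, runId, {
    workerGroup: "aws-us-east-1",       // example values only
    workerId: "i-0123456789abcdef0",
  });
  // ... reclaimTask periodically while the task runs, upload artifacts
  // via createArtifact, then resolve the run:
  await queue.reportCompleted(taskId, runId);
}
```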
Reporter
Comment 3•8 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> Currently the provisioner doesn't do a great job with these (it hands out
> assume:worker-id:*) due to issues with out-of-date information in the EC2
> API (no way to determine the instanceId of a host, even after it has
> started).
What if the provisioner continued to provide credentials, as it currently does, but instead of the scopes it grants today:
* assume:worker-id:*
* assume:worker-type:aws-provisioner-v1/<workerType>
it could instead grant the single scope:
* queue:register-worker:<provisionerId>/<workerType>/<slugid>
Then once the worker starts up, it could call queue.registerWorker(provisionerId, workerType, slugId, workerGroup, workerId) using the provisioner-provided credentials, and this queue call would return new temporary credentials for the worker, if and only if no credentials have previously been provided for this slugId (otherwise it would return an HTTP 403).
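A minimal sketch of what that worker-side flow might look like; queue.registerWorker does not exist today, so the method, its arguments, and the response shape below are all hypothetical, taken from the proposal above:

```typescript
// Hypothetical sketch of the proposed registration flow. registerWorker is
// NOT a real queue endpoint; its name, arguments, and response shape are
// invented here for illustration.
import * as taskcluster from "taskcluster-client";

async function bootstrapWorker(provisionerCreds: object) {
  // The provisioner-issued credentials would carry only the single scope
  // queue:register-worker:<provisionerId>/<workerType>/<slugid>.
  const bootstrapQueue: any = new taskcluster.Queue({ credentials: provisionerCreds });

  // The first (and only) call for this slugId returns worker credentials;
  // any subsequent call with the same slugId would get an HTTP 403.
  const { credentials } = await bootstrapQueue.registerWorker(
    "aws-provisioner-v1",       // provisionerId
    "example-worker-type",      // workerType (placeholder)
    "fN1scKFoRzKXBzLKn1dBMA",   // slugId baked into the granted scope (placeholder)
    "aws-us-east-1",            // workerGroup (placeholder)
    "i-0123456789abcdef0",      // workerId (placeholder)
  );

  // The worker would then use the returned temporary credentials for
  // claimTask / reclaimTask / reportCompleted / createArtifact, etc.
  return new taskcluster.Queue({ credentials });
}
```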
The queue would need to maintain a list of "used" aws-provisioner-v1 slugIds for at least 96 hours (since 96 hours is currently the maximum lifetime of an aws-provisioner-v1 worker). Optionally, a worker could "deregister" as a last step before terminating (on a best-effort basis). Deregistering might be overkill.
Another thing I like about this approach is that registerWorker is a great place to provide other pertinent metadata, such as a JSON schema for the worker's task payload and version information about the worker ({"name": "generic-worker", "version": "v10.0.5", ...}). Worker registration would therefore logically only be necessary for *provisioned* workers that require temporary credentials, so people can still write throwaway/dev workers without needing to make this call (since they can create their own credentials for the worker).
This would be a generic solution independent of the provisioner (e.g. it could be used by scl3-puppet, aws-provisioner-v1, packet-net-v1, ...), and it removes one thing that all provisioners might otherwise need to implement, placing it in the queue, which already acts as something of a worker administrator.
Lastly, it somewhat simplifies the analysis of provisioned worker pools, since all workers are registered and can be easily counted (this can currently be calculated as a side effect of task claims/reclaims and claimWork polling, but counting registration/deregistration calls is simpler).
Comment 4•8 years ago
Yeah, something like that would be neat. Sounds like an RFC! :)
Reporter
Comment 5•8 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Yeah, something like that would be neat. Sounds like an RFC! :)
https://github.com/taskcluster/taskcluster-rfcs/issues/71
Comment 6•8 years ago
I don't have a bug for deprecating the old scope patterns. I've been putting it off forever, but it's plausible that we should stop doing that.
Flags: needinfo?(jopsen)
Updated•7 years ago
Priority: -- → P5
Reporter
Comment 8•6 years ago
(In reply to Chris Cooper [:coop] pronoun: he from comment #7)
> Pete: does this cleanup still need to happen?
Yes. We made it P5, so it shouldn't get picked up in triage, but it is a genuine issue that would be nice to solve when we get the time to do so.
Note, most of the discussion of this topic went into
https://github.com/taskcluster/taskcluster-rfcs/issues/71
That RFC was closed as we had more important things to work on, but contains relevant information if anyone does get the chance to pick this up.
Flags: needinfo?(pmoore)
QA Contact: dustin
Assignee
Updated•6 years ago
Component: Authentication → Services
Comment 9•4 years ago
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → MOVED