1374978 - Stop granting and requiring assume:worker-id:*

Reporter

Description

•

8 years ago

There are no roles prefixed with "worker-id:" so API endpoints requiring "assume:worker-id:*" make no sense. We should therefore clean this up. In general we should probably clean up all the API definitions that depend on "assume:" scopes, since we deprecated their use a loooong time ago. This one is particularly useless since no such "worker-id:<workerId>" roles exist. Note, the provisioner also grants this role to all worker types, so once our APIs don't depend on it, we can also remove the granting of assume:worker-id:* to temp creds generated by provisioner for workers. This originally came up in bug 1374366.

Pete Moore [:pmoore][:pete]

Reporter

Updated

•

8 years ago

Comment 1

•

8 years ago

We didn't actually deprecate this: https://docs.taskcluster.net/manual/design/devel/best-practices/scopes but recent changes to the queue have introduced `queue:worker-id:..` which can serve the same purpose. I suspect we could replace assume:worker-id with queue:worker-id. We can't get rid of it entirely, though, as it serves as a means to prevent worker B from messing with a task claimed by worker A. Currently the provisioner doesn't do a great job with these (it hands out assume:worker-id:*) due to issues with out-of-date information in the EC2 API (no way to determine the instanceId of a host, even after it has started). The host-secrets configuration *does* support more specific worker-ids: https://tools.taskcluster.net/auth/roles/#project:releng:host-secrets:host:com.mozilla.scl3.releng.test.moonshot-* and we should eventually do something similar in the provisioner. So I think the action here is to finish the conversion to use the most up-to-date scope scheme for the queue (including priorities and schedulerIds, too). Jonas, is there a bug or RFC I can link to for that?

Flags: needinfo?(jopsen)

Pete Moore [:pmoore][:pete]

Reporter

Comment 2

•

8 years ago

I believe these are the affected endpoints, that currently require "assume:worker-id:<workerGroup>/<workerId>": * queue.claimTask * queue.reclaimTask * queue.reportCompleted * queue.reportFailed * queue.reportException * queue.createArtifact

Pete Moore [:pmoore][:pete]

Reporter

Comment 3

•

8 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #1) > Currently the provisioner doesn't do a great job with these (it hands out > assume:worker-id:*) due to issues with out-of-date information in the EC2 > API (no way to determine the instanceId of a host, even after it has > started). What about if the provisioner continued to provide credentials, as it currently does, but instead of granting the current scopes it grants: * assume:worker-id:* * assume:worker-type:aws-provisioner-v1/<workerType> it could instead grant the single scope: * queue:register-worker:<provisionerId>/<workerType>/<slugid> Then once the worker starts up, it could call queue.registerWorker(provisionerId, workerType, slugId, workerGroup, workerId) using the provisioner-provided credentials, and this queue call would return new temporary credentials for the worker, iff no credentials have previously been provided for this slugId (otherwise would return a HTTP 403). The queue would need to maintain a list of "used" aws-provisioner-v1 slugIds for at least 96 hours (since 96 hours is currently maximum lifetime of an aws-provisioner-v1 worker). Optionally, a worker could "deregister" as a last step before terminating (on a best effort basis). Deregistering might be overkill. Another thing I like about this approach, is that the register-worker is a great placeholder to provide other pertinent metadata, such as a json schema for the worker's task payload, version information about the worker ({"name": "generic-worker", "version": "v10.0.5", ....}). Worker registration would logically therefore only be necessary for *provisioned* workers that require temporary credentials, so people can still write their throwaway/dev workers without needing to make this call (since they can create their own credentials for the worker). This would be a generic solution independent of provisioner (e.g. can be used by scl3-puppet, aws-provisioner-v1, packet-net-v1, ....), and removes on thing that all provisioners might otherwise need to implement, placing it in the queue which already is somewhat of a worker administrator. Lastly, it somewhat simplifies the analysis of provisioned worker pools, since all workers are registered and can be easily counted (although this can be currently calculated as a side effect of task claim/reclaims and claimWork polling, counting registration/deregistration calls is simpler).

Dustin J. Mitchell [:dustin] (he/him)

Comment 4

•

8 years ago

Yeah, something like that would be neat. Sounds like an RFC! :)

Pete Moore [:pmoore][:pete]

Reporter

Comment 5

•

8 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #4) > Yeah, something like that would be neat. Sounds like an RFC! :) https://github.com/taskcluster/taskcluster-rfcs/issues/71

Jonas Finnemann Jensen (:jonasfj)

Comment 6

•

8 years ago

I don't have a bug deprecating old scope patterns. So far I've been putting it off forever, but it's plausible that we should stop doing that.

Flags: needinfo?(jopsen)

Jonas Finnemann Jensen (:jonasfj)

Updated

•

7 years ago

Priority: -- → P5

Chris Cooper [:coop] (he/him)

Comment 7

•

7 years ago

Pete: does this cleanup still need to happen?

Flags: needinfo?(pmoore)

Pete Moore [:pmoore][:pete]

Reporter

Comment 8

•

6 years ago

(In reply to Chris Cooper [:coop] pronoun: he from comment #7) > Pete: does this cleanup still need to happen? Yes. We made it P5, so it shouldn't get picked up in triage, but it is a genuine issue that would be nice to solve when we get the time to do so. Note, most of the discussion of this topic went into https://github.com/taskcluster/taskcluster-rfcs/issues/71 That RFC was closed as we had more important things to work on, but contains relevant information if anyone does get the chance to pick this up.

Flags: needinfo?(pmoore)

QA Contact: dustin

Nobody; OK to take it and work on it

Assignee

Updated

•

6 years ago

Component: Authentication → Services

Dustin J. Mitchell [:dustin] (he/him)

Comment 9

•

4 years ago

https://github.com/taskcluster/taskcluster/issues/4586

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → MOVED

Bugzilla

Stop granting and requiring assume:worker-id:*

Categories

(Taskcluster :: Services, enhancement, P5)

Tracking

(Not tracked)

People

(Reporter: pmoore, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Updated

Comment 9