Right now the queue requires scopes as follows:

1) queue.claim():
   - assume:worker-type:<provisionerId>/<workerType>
   - assume:worker-id:<workerGroup>/<workerId>

2) queue.reclaim(), queue.reportCompleted(), queue.createArtifact():
   - assume:worker-id:<workerGroup>/<workerId>

Basically, when a task is claimed we save <workerGroup>/<workerId>, and only require "assume:worker-id:<workerGroup>/<workerId>" from that point on. This makes a lot of sense: with truly untrusted workers, we could have an entity that claims tasks without granting the workers operating on those tasks the authority to claim new tasks.

Problem: the most practical thing to do is to grant all workers the scope "assume:worker-id:*", because we like to embed aws-region and other things in <workerGroup>. Note that in practice <workerGroup> could just be a slugid.

Anyway, there are two solutions to this problem:

A) Require "assume:worker-type:" and make <workerGroup>/<workerId> specific to a workerType. Issuing "assume:worker-id:*" is then less critical, but still not good.

B) Let the aws-provisioner issue temporary credentials that are limited to a specific <workerGroup>/<workerId> in "assume:worker-id:<workerGroup>/<workerId>". <workerId> could probably just be a slugid, and <workerGroup> could probably be something like "aws-<region>". That way the aws-provisioner only needs the scope "assume:worker-id:aws-*", and we don't need to bake credentials into the workers.

In case (B) we can inject the temporary credentials through user-data. We already have other credentials in user-data that should be encrypted, so using openpgp.js to encrypt user-data would be nice.

---

Needless to say, I think option (B) is the way to go. That way the only thing we have to bake in is a private GPG key, which we need for encrypted env vars anyway. Later we can argue whether a single private GPG key is a good idea, and/or how we can manage that. Also, exposure of temporary credentials is a lot less critical.
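For context, scope satisfaction treats a trailing "*" as a prefix wildcard, which is why "assume:worker-id:*" is so broad while "assume:worker-id:aws-*" limits the provisioner to the credentials it can mint under option (B). A minimal sketch of that rule (the worker IDs here are made up):

```python
def satisfies(held_scopes, required):
    """True if any held scope covers `required`: either an exact match,
    or a held scope ending in '*' whose prefix matches."""
    for scope in held_scopes:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False

# The broad grant we want to avoid handing every worker:
print(satisfies(["assume:worker-id:*"], "assume:worker-id:aws-us-east-1/abc"))      # True
# Option (B): only the provisioner holds the aws-* prefix...
print(satisfies(["assume:worker-id:aws-*"], "assume:worker-id:aws-us-east-1/abc"))  # True
# ...and a worker given pinned credentials cannot assume another identity:
print(satisfies(["assume:worker-id:aws-us-east-1/abc"],
                "assume:worker-id:aws-us-east-1/xyz"))                              # False
```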
The provisioner should inject temporary credentials necessary to do everything not covered by `task.scopes`. See bug 1137821. This is basically option (B) from above.
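A hedged sketch of what provisioner-side issuance could look like. This is illustrative only: it does not reproduce the actual TaskCluster temporary-credential wire format, and the function and field names are hypothetical. It just shows the shape of the idea: restricted scopes, a bounded lifetime, and a signature derived from the issuer's long-lived secret.

```python
import base64
import hashlib
import hmac
import json
import os
import time

def issue_worker_credentials(access_token, worker_group, worker_id,
                             extra_scopes=None, ttl=3600):
    """Mint short-lived credentials pinned to one workerGroup/workerId.

    Hypothetical sketch, not TaskCluster's real certificate format:
    scopes are restricted to a single worker identity, and the
    certificate is signed with the issuer's accessToken so a verifier
    holding the same secret can check it without a database lookup.
    """
    scopes = ["assume:worker-id:%s/%s" % (worker_group, worker_id)]
    scopes += list(extra_scopes or [])
    now = int(time.time())
    cert = {
        "version": 1,
        "scopes": scopes,
        "start": now,
        "expiry": now + ttl,
        "seed": base64.urlsafe_b64encode(os.urandom(16)).decode(),
    }
    payload = json.dumps(cert, sort_keys=True).encode()
    cert["signature"] = hmac.new(
        access_token.encode(), payload, hashlib.sha256).hexdigest()
    return cert

creds = issue_worker_credentials("issuer-secret", "aws-us-west-2", "i-0abc123")
print(creds["scopes"][0])  # assume:worker-id:aws-us-west-2/i-0abc123
```

With this shape, the provisioner only ever needs "assume:worker-id:aws-*" itself, and any extra scopes from the workerType definition can be passed through `extra_scopes`.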
Summary: queue, docker-worker: Stop issuing the "assume:worker-id:*" scope → provisioner, docker-worker: Stop issuing the "assume:worker-id:*" scope, inject temporary credentials via user-data
I discussed this with jhford, and a possible strategy involves:

1) The provisioner creates an azure table entity (token, secret-data).
2) The worker is started with https://provisioner.tc.net/v1/secret-data/<token> as user-data.
3) The worker starts and calls GET https://provisioner.tc.net/v1/secret-data/<token> to get the secret data. It retries until the data arrives.
4) The worker calls DELETE https://provisioner.tc.net/v1/secret-data/<token> to remove the secret data so nobody can fetch it again.

Step (4) happens before the worker starts processing tasks; hence, any task that accesses the meta-data service won't be able to get the secret-data, as it has been deleted. The secret data will then only exist in memory of the docker-worker process, from where it's reasonably hard to steal even if you break out of the docker container.

Most notably, this means we don't need to encrypt user-data and put a private key into the worker AMI. That would be bad, because we would have to manage that private key and would likely end up sharing it between workerTypes, since we don't want to build multiple AMIs.

Note: we should restrict access to user-data using iptables regardless of how we inject secrets through user-data; see bug 1134937 (just in case the delete operation fails, etc.).
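The fetch-then-delete steps above can be sketched with an in-memory stand-in for the provisioner's table and endpoints (class and token names here are made up for illustration):

```python
class SecretStore:
    """In-memory stand-in for the provisioner's secret-data table.

    put() models step 1, get() models the GET in step 3, and delete()
    models the DELETE in step 4, after which the secret is gone for good.
    """
    def __init__(self):
        self._secrets = {}

    def put(self, token, data):
        self._secrets[token] = data

    def get(self, token):
        # Returns None until the provisioner has written the entity,
        # which is why the worker retries until data arrives.
        return self._secrets.get(token)

    def delete(self, token):
        self._secrets.pop(token, None)

store = SecretStore()
store.put("tok-slugid", {"tc_credentials": "..."})

# Worker boot sequence: fetch, keep in memory only, then delete the
# entity *before* claiming any task.
secret = store.get("tok-slugid")
store.delete("tok-slugid")

# A task poking at user-data / the endpoint afterwards gets nothing:
print(store.get("tok-slugid"))  # None
```

The one-shot property is what matters: once the worker has deleted the entity, a compromised task can replay the URL from user-data but gets nothing back.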
Depends on: 1134937
Oh, another detail I forgot above: the provisioner would already like to get a callback from the worker once it has started, so we can better track the number of workers alive and gather statistics on the time from spot node request to node start.
Component: TaskCluster → General
Product: Testing → Taskcluster
Component: General → AWS-Provisioner
This bug hasn't had a lot of activity, so I'd like to make sure we still have up-to-date context. The provisioner now issues temporary credentials to all workers that claim their secrets. The scopes they receive are based on those configured in the workerType definition. Is there another set of scopes that should be issued? I know we'll have the information to give a provisionerId/workerType scope, but workerGroup and workerId might not be known right before the spot request is submitted.
I think you can close this in favor of bug 1217088. We're currently issuing an actual assume:worker-id:prov/wkr scope (not a * suffix), based on configuration in the Azure tables. Bug 1217088 just bakes that configuration in.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1217088
Component: AWS-Provisioner → Services
Product: Taskcluster → Taskcluster