Closed Bug 1272643 Opened 4 years ago Closed 7 months ago

start issuing unique worker-id credentials

Categories

(Taskcluster :: Services, defect, P4)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Unassigned)

References

Details

Per the email thread and corresponding example PR, we should start issuing unique credentials per instance.  We're going to do this in the form of a slugid to uniquely identify each instance.

The steps in this bug are:

1. provisioner to start including a workerId and workerGroup entry in data given back by getSecrets() so that docker-worker can be updated to use the new format

2. docker-worker will set its workerId and workerGroup based on this new data source instead of the EC2 metadata service

3. provisioner will stop issuing worker-id:* and instead issue fully resolved worker-id scopes
For the record we should issue: queue:worker-id:<workerGroup>/<workerId> to workers.
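A minimal sketch of steps 1 and 3, for illustration only. It assumes a slugid is a random v4 UUID encoded as 22 characters of URL-safe base64 with the padding stripped. The function names and the shape of the returned dict are hypothetical and are not the provisioner's actual getSecrets() payload.

```python
import base64
import uuid


def new_worker_id() -> str:
    """Return a slugid-style identifier: a random v4 UUID encoded as
    22 characters of URL-safe base64 with the '==' padding stripped."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def worker_scope(worker_group: str, worker_id: str) -> str:
    """Build the fully resolved scope named in this bug."""
    return f"queue:worker-id:{worker_group}/{worker_id}"


def secrets_for_instance(worker_group: str) -> dict:
    """Hypothetical extra data handed to one instance (assumed shape)."""
    worker_id = new_worker_id()
    return {
        "workerGroup": worker_group,
        "workerId": worker_id,
        # One scope that names exactly this instance, instead of worker-id:*
        "scopes": [worker_scope(worker_group, worker_id)],
    }


if __name__ == "__main__":
    print(secrets_for_instance("us-west-2"))
```

Under this sketch each instance receives a scope that names only itself, so the credentials cannot be reused under another workerId.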
I think we will need this solved before we can start doing trustworthy release builds (as otherwise every docker-worker instance out there can pretend to be a scriptworker).  I think this is just a matter of someone executing the plan John outlined in comment 0.
Flags: needinfo?(aki)
I think this would be good to do.
I have also come to the conclusion, rightly or not, that someone pretending to be a scriptworker can only perform DOS attacks, because they won't have the secrets that the scriptworkers have.  However, limiting which DOS attacks can be performed is good, and it's certainly possible that I've missed something in my reasoning.
Flags: needinfo?(aki)
However, if a compromised docker-worker instance can then pretend to be a different docker-worker instance, and then claim temp creds for a task and fiddle with the other docker-worker's task's artifacts, that's a significant issue.
That's not a problem.  We still hard-code the scopes (although this bug would involve generating them dynamically), and they are things like

  "scopes": [
    "assume:worker-type:aws-provisioner-v1/android-api-15",
    "assume:worker-id:*"
  ],

so the assume:worker-type scope is fully spelled out.
Sure. However, is the worker-type shared across try and real trees?

If so:

* Worker A and Worker B are the same AMI and worker type
* Worker A claims a tier 3 and/or release task and gets temp creds that specify tier 3 and/or release scopes
* Worker B runs a try task, but masquerades as Worker A, and gets Worker A's task's temp creds.  Boom, elevation of privs
* Worker B has malicious binaries already on disk, and uploads said binaries to Worker A's task before Worker A can
* Because Worker B is the same AMI, it has the same chain of trust GPG key as Worker A, and its chain of trust artifact is signed with a valid signature/key
* Worker B marks Worker A's task as finished
* Worker A's attempts at uploading artifacts or marking status to the task-marked-as-finished are unsuccessful?

If try has separate worker-types, we have some additional checks that could make things safer.  Still, if the above is possible, I think this bug is a significant issue beyond DOS.
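The masquerading step in the scenario above depends on every worker holding a wildcard worker-id scope. The sketch below uses a simplified model of scope matching, assuming a held scope ending in * satisfies any required scope that shares its prefix. The function names are hypothetical and this is not the queue's actual implementation.

```python
from typing import Iterable


def satisfies(held: str, required: str) -> bool:
    """Simplified scope check: a trailing '*' matches any suffix,
    otherwise the two scopes must be identical."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def may_act_as(scopes: Iterable[str], worker_group: str, worker_id: str) -> bool:
    """True if any held scope lets the caller act as this workerGroup/workerId."""
    required = f"queue:worker-id:{worker_group}/{worker_id}"
    return any(satisfies(s, required) for s in scopes)


if __name__ == "__main__":
    # With the wildcard scope, Worker B can claim to be Worker A.
    print(may_act_as(["queue:worker-id:*"], "group", "workerA"))
    # With a fully resolved scope, Worker B can only act as itself.
    print(may_act_as(["queue:worker-id:group/workerB"], "group", "workerA"))
```

In this model, replacing the wildcard with a fully resolved scope is what stops Worker B from passing the check for Worker A's identity.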
Bug 1220686, partially completed, is about distinguishing build workerTypes by level.
Still thinking this through.

(In reply to Aki Sasaki [:aki] from comment #6)
> * Worker A and Worker B are the same AMI and worker type
> * Worker A claims a tier 3 and/or release task and gets temp creds that
> specify tier 3 and/or release scopes
> * Worker B runs a try task, but masquerades as Worker A, and gets Worker A's
> task's temp creds.  Boom, elevation of privs

I imagine this is mitigated by the initial set of temp creds during claimTask.

> * Worker B has malicious binaries already on disk, and uploads said binaries
> to Worker A's task before Worker A can
> * Because Worker B is the same AMI, it has the same chain of trust GPG key
> as Worker A, and its chain of trust artifact is signed with a valid
> signature/key
> * Worker B marks Worker A's task as finished
> * Worker A's attempts at uploading artifacts or marking status to the
> task-marked-as-finished are unsuccessful?

I imagine the rest of it is mitigated by the docker container sandbox; we would have to deploy an AMI that was capable of being compromised.  Still, I'd like to know definitively that these steps aren't possible.

> If try has separate worker-types, we have some additional checks that could
> make things safer.  Still, if the above is possible, I think this bug is a
> significant issue beyond DOS.
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Bug 1220686, partially completed, is about distinguishing build workerTypes
> by level.

Awesome.  Then the attack vector becomes compromised level 3 scopes: someone can schedule a task on a particular worker type outside of the decision task, and then it can try to mess with other, valid, level 3 jobs.
John, I think the ec2-manager groundwork is finished now -- which would leave just a bit of work to hand out instance-id-specific workerId scopes?
Flags: needinfo?(jhford)
The ec2-manager groundwork is still in progress; we're just finalizing the remaining work on how we verify our instance identity documents.  Once that's done, we'll be able to start handing out credentials with worker-id-specific scopes.
Flags: needinfo?(jhford)
Duplicate of this bug: 1452118
The iid-verify library is written. The remaining work is integrating that library with ec2-manager so that it can give out instance-specific credentials.
Priority: -- → P4
Depends on: 1485984
Depends on: 1485986
Component: AWS-Provisioner → Services

worker-manager's GCP support has this. We should make sure to do it for the AWS provider when we get to it!

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED