Closed Bug 1233555 Opened 9 years ago Closed 8 years ago

aws-provisioner-v1/* has assume:repo:*

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

I don't see why this would be the case.  Per bug 1217088, worker-type:* roles are given to workers, not associated with the tasks they run.
Ah, this is because `aws-provisioner-v1/*` has `assume:repo:*`, which it needs until docker-worker starts using the temporary credentials (bug 1220738).
Depends on: 1220738
Summary: {gaia,gecko}-decision workerTypes have balrog push permission → aws-provisioner-v1/* has assume:repo:*
Greg, once bug 1220738 is fully deployed, do you think we're ready to remove this `assume:repo:*`?
Flags: needinfo?(garndt)
As long as the temp creds have those scopes, we should be able to see if the world breaks by removing it.
Flags: needinfo?(garndt)
I'm going to give this a shot.

At this point, worker-type:aws-provisioner-v1/* has

    assume:repo:*
    queue:claim-task
    queue:create-artifact:*
    queue:pending-tasks:aws-provisioner-v1/*
    queue:poll-task-urls
    queue:resolve-task
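To see why `assume:repo:*` is so broad, here is a minimal sketch of Taskcluster-style scope matching, assuming the usual rule that a trailing `*` is a prefix wildcard and every other character must match literally. It deliberately ignores role expansion (what `assume:` actually unlocks), and the repository role name below is hypothetical, chosen only for illustration.

```python
from typing import Iterable


def satisfies(granted: str, required: str) -> bool:
    """Return True if one granted scope satisfies one required scope.

    A trailing '*' acts as a prefix wildcard; otherwise the match is exact.
    Role expansion for 'assume:' scopes is not modelled here.
    """
    if granted.endswith("*"):
        return required.startswith(granted[:-1])
    return granted == required


def scopes_satisfy(granted: Iterable[str], required: str) -> bool:
    """Return True if any scope in `granted` satisfies `required`."""
    return any(satisfies(g, required) for g in granted)


WORKER_SCOPES = [
    "assume:repo:*",
    "queue:claim-task",
    "queue:create-artifact:*",
    "queue:pending-tasks:aws-provisioner-v1/*",
    "queue:poll-task-urls",
    "queue:resolve-task",
]

# Hypothetical repository role, for illustration only.
REPO_ROLE = "assume:repo:hg.mozilla.org/try"

print(scopes_satisfy(WORKER_SCOPES, REPO_ROLE))  # True

# The same check after dropping assume:repo:* from the worker's scopes.
reduced = [s for s in WORKER_SCOPES if s != "assume:repo:*"]
print(scopes_satisfy(reduced, REPO_ROLE))  # False
```

With `assume:repo:*` present, every role whose name starts with `repo:` is reachable; once it is removed, none of the remaining `queue:*` scopes match.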

The assume:repo:* scope allows it to act on behalf of any source-code repository, which the worker itself no longer needs to do.  Per bug 1220738, it now uses the task's temporary credentials (issued by the queue) for everything except its own queue operations, so it should only need

    queue:claim-task
    queue:pending-tasks:aws-provisioner-v1/*
    queue:poll-task-urls
    queue:resolve-task

so, removing:

    assume:repo:*
    queue:create-artifact:*
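The list of scopes to remove is just the set difference between what the worker type has now and what it still needs. A minimal sketch using the scope names from the lists above:

```python
# Scopes currently granted to worker-type:aws-provisioner-v1/*.
CURRENT = {
    "assume:repo:*",
    "queue:claim-task",
    "queue:create-artifact:*",
    "queue:pending-tasks:aws-provisioner-v1/*",
    "queue:poll-task-urls",
    "queue:resolve-task",
}

# Scopes the worker still needs once task operations use task credentials.
NEEDED = {
    "queue:claim-task",
    "queue:pending-tasks:aws-provisioner-v1/*",
    "queue:poll-task-urls",
    "queue:resolve-task",
}

# Sanity check: nothing needed is missing from the current grant.
assert NEEDED <= CURRENT

to_remove = sorted(CURRENT - NEEDED)
print(to_remove)  # ['assume:repo:*', 'queue:create-artifact:*']
```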

I'm going to start by just removing assume:repo:*.  We should very quickly see things explode if the calculations are wrong here.
Holding off momentarily due to bug 1264618.
Landed removal of assume:repo:*, and I see try tasks landing since then without any issues.
Removing assume:repo:* caused a decision task error: https://tools.taskcluster.net/task-inspector/#b4DsWJrQTJGn5Mi1oRF2BA/3

Is it possible to get this scope back?
Flags: needinfo?(dustin)
So we've had decision tasks failing for 2 days?!

I re-added the scope, but I don't know what happened here -- has bug 1220738 not been deployed fully (to gecko-decision in particular)?
Flags: needinfo?(dustin) → needinfo?(wcosta)
(In reply to Dustin J. Mitchell [:dustin] from comment #8)
> So we've had decision tasks failing for 2 days?!
> 
> I re-added the scope, but I don't know what happened here -- has bug 1220738
> not been deployed fully (to gecko-decision in particular)?

My guess is that it only happens when phone builds are involved (don't ask me why).
Flags: needinfo?(wcosta)
Things are still failing, however. https://public-artifacts.taskcluster.net/FrVTaMJLR-a65OZz24vvaw/2/public/logs/live_backing.log

Now I don't understand exactly which scope is missing.
Flags: needinfo?(dustin)
I reverted this last night, but Bugzilla apparently ate my comment.  But I think it's unrelated.  I'm looking now.
Flags: needinfo?(dustin)
This is down to bug 1220704, which removed access to a private docker image from try.  These try pushes are re-adding those builds, which aren't permitted.

I removed assume:repo:* from worker-type:aws-provisioner-v1/* once more.
I confirmed that docker-worker is calling createArtifact with the task's credentials, so I'm going to also remove `queue:create-artifact:*` from the worker credentials.
My try job got a D, so I'm calling this good.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Authentication → Services