Closed Bug 1218928 Opened 9 years ago Closed 8 years ago

sort out gaia-taskcluster credentials

Categories

(Taskcluster :: Services, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Attachments

(4 files)

As we found out in bug 1218523, gaia-taskcluster is using the confusingly-named `taskcluster-github` clientId.  That client has the `*` scope, but it looks like it only needs the gaia tree's scopes.  So this is similar to bug 1216306.
Assignee: nobody → dustin
So looking at the source for gaia-taskcluster, this seems to be a version of mozilla-taskcluster that listens for GH pushes and pull requests instead of hg pushes.  Note that it does not use assumeScopes to limit its scopes when creating a new task graph.  It also loads its configuration from a file in S3 (!?) at s3://github-treeherder/production-v2.0.0.json, which can probably have scopes replaced by roles.
Attached file production-v2.0.0.json
Here's the config file, stripped of secrets and keys.  Joyfully, every tree has "*".  I kid you not.
Dave, I see a bunch of bugzilla github repositories in here.  Are you using TC to build/test those, or was that just an experiment?  If you are, can you point me to the task descriptions so that I can limit the available scopes appropriately?
Flags: needinfo?(dkl)
Here are the scopes defined in the taskGraph.json in each of the repos:

github.com/bugzilla/bugzilla
github.com/dklawren/bugzilla
github.com/lightsofapollo/gaia
  docker-worker:cache:gaia-misc-caches
github.com/mozilla-b2g/gaia
  docker-worker:cache:gaia-tc-vcs
  docker-worker:cache:gaia-linux-cache
  docker-worker:cache:gaia-misc-caches
  docker-worker:image:quay.io/mozilla/raptor-tester:latest
  queue:define-task:aws-provisioner/gaia
  queue:create-task:aws-provisioner/gaia
github.com/mozilla-b2g/gaia-email-libs-and-more
github.com/mozilla/webtools-bmo-bugzilla

I think we should drop the lightsofapollo/gaia repo.  The gaia repo has a nice list of scopes, so that's cool.  But the others are literally running with task.scopes = ["*"] which doesn't give much information about what they actually need.

Selena, do you have an idea who I should talk to about gaia-email-libs-and-more?  https://github.com/mozilla-b2g/gaia-email-libs-and-more/commits/master/taskgraph.json says James :/
Flags: needinfo?(sdeckelmann)
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> Dave, I see a bunch of bugzilla github repositories in here.  Are you using
> TC to build/test those, or was that just an experiment?  If so, can you
> point me to the task descriptions so that I can limit the available scopes
> appropriately?

We are still using them. We should have mozilla/webtools-bmo-bugzilla and bugzilla/bugzilla (all branches for both).
When changes are committed to git.mozilla.org, they are mirrored to github and TaskCluster fires off new test runs.

dkl
Flags: needinfo?(dkl)
Thanks Dave.

Andrew, it looks like you've made most of the commits to gaia-email-libs-and-more.  Do you know if the builds for that repo require any special TaskCluster scopes?  Doing anything fancy in there?

If not, my plan is to give each of these repos (except gaia) an empty list of scopes.  For simple stuff, that's probably exactly right, but if you're using private docker images, direct audio/video device access, taskdroid/bitbar, or other restricted features, it will probably fail.  If fail it does, let me know how and I'll add the necessary scopes.  The idea is to limit things down from "god-like" to "just what is needed", not to actually prevent your tasks from running!
Flags: needinfo?(sdeckelmann) → needinfo?(bugmail)
Specifically, I'm going to define roles

  assume:repo:github.com/<org>/<repo>:branch:<branch>
  assume:repo:github.com/<org>/<repo>:pull-request

to allow us to later make distinct scopes available to branches and random pull requests.

I'll update the configuration file to contain those roles as appropriate for each repository, dropping the personal repos (dklawren and lightsofapollo).  I'll then configure gaia-taskcluster with

  assume:repo:github.com/*
  scheduler:create-task-graph

which should be sufficient without being excessive.
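A minimal sketch of why a single `assume:repo:github.com/*` scope on the clientId is enough to assume any of the per-repo roles. It assumes the usual TaskCluster rule that a scope ending in `*` covers every scope sharing that prefix; the helper names `repo_role` and `scope_satisfies` are hypothetical and not part of gaia-taskcluster.

```python
def repo_role(org, repo, branch=None):
    """Build the role scope for a branch push, or for pull requests
    when no branch is given (hypothetical helper)."""
    base = "assume:repo:github.com/{}/{}".format(org, repo)
    if branch is None:
        return base + ":pull-request"
    return base + ":branch:" + branch


def scope_satisfies(held, required):
    """A held scope ending in '*' covers any scope that starts with
    the text before the star; otherwise the match must be exact."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


# Scopes proposed for the gaia-taskcluster clientId in this comment.
client_scopes = ["assume:repo:github.com/*", "scheduler:create-task-graph"]

needed = repo_role("mozilla-b2g", "gaia", "master")
print(needed)
print(any(scope_satisfies(s, needed) for s in client_scopes))
```

Because the org, repo, and branch are joined with `/` and `:`, the scheme relies on those characters not appearing inside a repo name.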
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Andrew, it looks like you've made most of the commits to
> gaia-email-libs-and-more.  Do you know if the builds for that repo require
> any special TaskCluster scopes?  Doing anything fancy in there?

We don't need any special TaskCluster scopes.  The taskcluster stuff in there was :lightsofapollo helping me try and get my feet wet with taskcluster, but we never really caught taskcluster fever after the treeherder configuration bug went into limbo (bug 1089892).  Feel free to strip scopes or disable taskcluster stuff for gaia-email-libs-and-more if that's appropriate.  We're a month or two away from really ramping up our testing infra again; right now everything's busted.
Flags: needinfo?(bugmail)
Morgan, does this align pretty well with how tc-github will use roles?  I'd like to be able to use the same roles for either application (TBH, it'd be great to see gaia-taskcluster eventually disabled in favor of tc-github).
And worth noting, github refuses to create repos with `/` or `:` in the repo name, so the suffixing in comment 7 is safe.
Attached file production-v2.0.0.json
New config (again without secrets).
Well,
  queue:define-task:aws-provisioner/gaia
  queue:create-task:aws-provisioner/gaia
were wrong - it's aws-provisioner-v1/gaia-decision now.  That swallowed creation of a few decision tasks.
Also needed
  queue:route:gaia-taskcluster
  queue:route:tc-treeherder.gaia.*
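The mismatch above can be checked mechanically. This is only a sketch: the `role` list is the scope list taken from the repo's taskGraph.json earlier in this bug, the `decision_task` list is assembled from the scopes named in the last two comments, and `covers` and `missing_scopes` are hypothetical helpers that assume trailing-`*` prefix matching.

```python
def covers(held, required):
    """Trailing '*' on a held scope acts as a prefix wildcard."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def missing_scopes(role_scopes, task_scopes):
    """Return the task scopes that no scope in the role covers."""
    return [t for t in task_scopes
            if not any(covers(h, t) for h in role_scopes)]


role = [
    "docker-worker:cache:gaia-tc-vcs",
    "docker-worker:cache:gaia-linux-cache",
    "docker-worker:cache:gaia-misc-caches",
    "docker-worker:image:quay.io/mozilla/raptor-tester:latest",
    "queue:define-task:aws-provisioner/gaia",
    "queue:create-task:aws-provisioner/gaia",
]

# Scopes the thread reports as actually needed (illustrative subset).
decision_task = [
    "queue:create-task:aws-provisioner-v1/gaia-decision",
    "queue:route:gaia-taskcluster",
    "queue:route:tc-treeherder.gaia.*",
]

for scope in missing_scopes(role, decision_task):
    print("missing:", scope)
```

With the original role, all three scopes are reported as missing, which matches the decision tasks failing to be created.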
Hrm, will we need the treeherder scopes for any of the other repos? I think things like bmo-master report to treeherder.
OK, I just added index:*, queue:*, scheduler:*, and docker-worker:* to the gaia role.  There are layers of bugs and misconfigurations here :(
(In reply to Dustin J. Mitchell [:dustin] from comment #16)
> https://github.com/taskcluster/gaia-taskcluster/pull/2 will fix the weird
> treeherder scopes

This is landed and was used to run tests for attachment 8680762 [details] [review].
Depends on: 1219864
I did some analysis over in bug 1219864 to get the complete set of scopes and roles required for a gaia push.  I added those to the roles (warts and all) and removed index:*, queue:*, scheduler:*, and docker-worker:*.
Attachment #8680762 - Flags: review?(garndt)
Comment on attachment 8680762 [details] [review]
[gaia] djmitche:remove-invalid-scopes > mozilla-b2g:master

Comment left in the PR.
Attachment #8680762 - Flags: review?(garndt) → review+
I removed the * from the gaia-taskcluster clientId role, and

{"type":"create task graph error","queueName":"gaiathpv6","message":"Authorization Failed","body":{"message":"Authorization Failed","error":{"info":"None of the scope-sets was satisfied","scopesets":[["scheduler:route:gaia-taskcluster"]],"scopes":["assume:client-id:kd-b_FdrSJ-4Gr3FF4IOpA","scheduler:create-task-graph","assume:repo:github.com/*","queue:route:gaia-taskcluster","tc-treeherder.bugzilla-master.*","queue:route:tc-treeherder.bugzilla-master.*","queue:route:tc-treeherder.bugzilla-5_0.*","queue:route:tc-treeherder.bugzilla-4_4.*","queue:route:tc-treeherder.bugzilla-4_2.*","queue:route:tc-treeherder.bugzilla.*","queue:create-task:aws-provisioner-v1/b2gtest","queue:define-task:aws-provisioner-v1/b2gtest","tc-treeherder.bmo-master.*","queue:route:tc-treeherder.bmo-master.*","docker-worker:cache:gaia-tc-vcs","docker-worker:cache:gaia-linux-cache","docker-worker:cache:gaia-misc-caches","docker-worker:image:quay.io/mozilla/raptor-tester:latest","queue:create-task:aws-provisioner-v1/gaia-decision","queue:create-task:aws-provisioner/gaia","queue:define-task:aws-provisioner/gaia","queue:route:tc-treeherder.gaia.*","queue:create-task:aws-provisioner-v1/gaia","docker-worker:image:taskcluster/gaia-taskenv*","docker-worker:image:quay.io/mozilla/gaia-taskenv:*","index:insert-task:gaia.npm_cache.*","queue:create-artifact:public/node_modules.tar.gz"]}}}

which, yeah, something is looking for `scheduler:route:gaia-taskcluster`, which isn't even a thing.
Ah, it IS a thing, and the graph-level route `gaia-taskcluster` is set on every task-graph it generates, so I'll add it to the gaia-taskcluster clientId's role.
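For readers unfamiliar with the error format: `scopesets` is a list of alternatives, and the request is authorized if every scope in at least one alternative is covered by the caller's scopes. The sketch below assumes that reading plus trailing-`*` prefix matching, and uses only a small subset of the `scopes` list from the error. The function names are hypothetical.

```python
def covers(held, required):
    """Trailing '*' on a held scope acts as a prefix wildcard."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def satisfies(held_scopes, scopesets):
    """True if at least one scope-set is fully covered (OR of ANDs)."""
    return any(
        all(any(covers(h, r) for h in held_scopes) for r in scopeset)
        for scopeset in scopesets
    )


scopesets = [["scheduler:route:gaia-taskcluster"]]

# Subset of the scopes reported in the error above.
held = [
    "scheduler:create-task-graph",
    "assume:repo:github.com/*",
    "queue:route:gaia-taskcluster",
]

print(satisfies(held, scopesets))   # False: queue:route:... is a different scope

held.append("scheduler:route:gaia-taskcluster")
print(satisfies(held, scopesets))   # True once the scope is added to the role
```

Note that `queue:route:gaia-taskcluster` does not help here: the scheduler checks its own `scheduler:route:` prefix for graph-level routes.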
Commit (master): https://github.com/mozilla-b2g/gaia/commit/db409c9d8c3094c3329fed2131df3c65d67842e3

Fixed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Sorry, didn't want to close this just yet.  Once those changes are reliably deployed, I need to remove some scopes.

:aus, do you have an idea how long I'd need to wait?  After I remove the scopes, basically any commit prior to what you've landed won't build in TC anymore.  That said, the scopes aren't hurting anything (referring to non-existent resources) so it's OK to wait quite a while.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: needinfo?(aus)
I think you could remove them today. This change is now in use by most repos that run the gaia tests.
Flags: needinfo?(aus)
--- repo:github.com/mozilla/webtools-bmo-bugzilla:*.old
+++ repo:github.com/mozilla/webtools-bmo-bugzilla:*.new
@@ -1,4 +1,3 @@
 queue:define-task:aws-provisioner-v1/b2gtest
 queue:route:gaia-taskcluster
 queue:route:tc-treeherder.bmo-master.*
-tc-treeherder.bmo-master.*

--- repo:github.com/mozilla-b2g/gaia:*.old
+++ repo:github.com/mozilla-b2g/gaia:*.new
@@ -1,15 +1,10 @@
 docker-worker:cache:gaia-linux-cache
 docker-worker:cache:gaia-misc-caches
 docker-worker:cache:gaia-tc-vcs
-docker-worker:image:quay.io/mozilla/gaia-taskenv:*
-docker-worker:image:quay.io/mozilla/raptor-tester:latest
-docker-worker:image:taskcluster/gaia-taskenv*
 index:insert-task:gaia.npm_cache.*
 queue:create-artifact:public/node_modules.tar.gz
 queue:create-task:aws-provisioner-v1/gaia
 queue:create-task:aws-provisioner-v1/gaia-decision
-queue:create-task:aws-provisioner/gaia
-queue:define-task:aws-provisioner/gaia
 queue:route:gaia-taskcluster
 queue:route:tc-treeherder.gaia-master.*
 queue:route:tc-treeherder.gaia.*
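The listings above look like unified diffs of sorted scope lists. How they were actually generated is not stated in this bug; one way to produce the same shape, assuming the old and new role contents are available as plain lists, is Python's `difflib`:

```python
import difflib

role = "repo:github.com/mozilla/webtools-bmo-bugzilla:*"

# Old and new scope lists for the role, taken from the first diff above.
old = sorted([
    "queue:define-task:aws-provisioner-v1/b2gtest",
    "queue:route:gaia-taskcluster",
    "queue:route:tc-treeherder.bmo-master.*",
    "tc-treeherder.bmo-master.*",
])
new = sorted([
    "queue:define-task:aws-provisioner-v1/b2gtest",
    "queue:route:gaia-taskcluster",
    "queue:route:tc-treeherder.bmo-master.*",
])

diff_lines = list(difflib.unified_diff(
    old, new,
    fromfile=role + ".old",
    tofile=role + ".new",
    lineterm="",
))
print("\n".join(diff_lines))
```

Sorting both lists first keeps the diff limited to scopes that were really added or removed.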
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
^^ the PR is because there was one lingering docker-worker:image scope that was hard-coded into a task description.  I've re-added the scope:

--- repo:github.com/mozilla-b2g/gaia:*.old
+++ repo:github.com/mozilla-b2g/gaia:*.new
@@ -1,6 +1,7 @@
 docker-worker:cache:gaia-linux-cache
 docker-worker:cache:gaia-misc-caches
 docker-worker:cache:gaia-tc-vcs
+docker-worker:image:quay.io/mozilla/gaia-taskenv:*
 index:insert-task:gaia.npm_cache.*
 queue:create-artifact:public/node_modules.tar.gz
 queue:create-task:aws-provisioner-v1/gaia

and we can remove that in a month or so when everyone's base trees have been updated.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8682201 - Flags: review?(jopsen)
Comment on attachment 8682201 [details] [review]
[gaia] djmitche:bug1218928-eslint > mozilla-b2g:master

Land if tests pass; looks good to me.
Let's remove the scope from the repo:... role in about 30 days, then we don't have to annoy people by sending out an email saying we're removing this useless scope :)
Attachment #8682201 - Flags: review?(jopsen) → review+
Deployed to heroku just now. No decision tasks are getting scheduled, I've rolled out in case this is the reason, and will continue to investigate by tailing logs etc.
Pete says he just deployed the patch from comment 16 (from 7 days ago) but I'm reasonably certain I deployed that when it got review, since I later removed the "tc-treeherder.*" scope which it required.

gaia-taskcluster is not starting in heroku right now, though.
To confirm, pete re-deployed the same rev I did a week ago:


pmoore@mozilla.com: Deployed 8e0db1a

    about an hour ago v107 Compare diff 

pmoore@mozilla.com: Build succeeded

    about an hour ago View build log 

dustin@mozilla.com: Deployed 8e0db1a

    7 days ago v106 

----

https://gaia-taskcluster.herokuapp.com/ loads (it returns a 404, but it loads), so I think that the logging regarding heroku/web.* exiting is probably normal -- occurring when heroku shuts down the old version of the app after starting the new.

----

So, nothing to do here but wait to remove the roles jonas mentioned in comment 31.
(In reply to Dustin J. Mitchell [:dustin] from comment #35)
> To confirm, pete re-deployed the same rev I did a week ago:
> 
> 
> pmoore@mozilla.com: Deployed 8e0db1a
> 
>     about an hour ago v107 Compare diff 
> 
> pmoore@mozilla.com: Build succeeded
> 
>     about an hour ago View build log 
> 
> dustin@mozilla.com: Deployed 8e0db1a
> 
>     7 days ago v106 
> 
> ----
> 
> https://gaia-taskcluster.herokuapp.com/ loads (it returns a 404, but it
> loads), so I think that the logging regarding heroku/web.* exiting is
> probably normal -- occurring when heroku shuts down the old version of the
> app after starting the new.
> 
> ----
> 
> So, nothing to do here but wait to remove the roles jonas mentioned in
> comment 31.

Strange, at the time I checked currently deployed version with `git ls-remote git@heroku.com:gaia-taskcluster.git master` and it told me it was on:

205b4a4bbee8b82945a3d27088f8f2dbe38f3f71        refs/heads/master

so then I did a push, to bring it up to 8e0db1a302bdd44222618f1678be20a8a4c56ad5.

I think if I'd pushed and it was already on 8e0db1a then the push would have been a noop and there wouldn't have been a new deployment?

Anyway, one of those mysteries probably not worth investing time in solving... I have no explanation! I can only say what I saw in my console. :)
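The reasoning about the no-op push can be written down as a small check. This sketch only parses text in the format `git ls-remote` printed above and compares hashes; it does not call git or heroku, and the function names are hypothetical.

```python
def parse_ls_remote(output):
    """Map ref name -> full commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split(None, 1)
            refs[ref.strip()] = sha
    return refs


def push_would_be_noop(remote_sha, local_sha):
    """A push changes nothing if the remote ref already points at
    the commit being pushed."""
    return remote_sha == local_sha


# Output quoted in the comment above.
output = "205b4a4bbee8b82945a3d27088f8f2dbe38f3f71        refs/heads/master\n"
remote = parse_ls_remote(output)["refs/heads/master"]
target = "8e0db1a302bdd44222618f1678be20a8a4c56ad5"

print(push_would_be_noop(remote, target))
```

For the hashes quoted here the check is false, which is consistent with the push creating a new deployment.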
Is this bug good for closing now?
> So, nothing to do here but wait to remove the roles jonas mentioned in comment 31.
I just adjusted the dev clientId ('client-id:sldk46fxR2CdSbiw2OR-4Q') to match the production ('client-id:kd-b_FdrSJ-4Gr3FF4IOpA').
Scope deleted.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
That didn't work, so I re-added docker-worker:image:quay.io/mozilla/gaia-taskenv:*

https://github.com/mozilla-b2g/gaia/pull/32953 never landed
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
mhenretty is merging that for me, so we wait another 30 days :)
OK, let's try it again!
No pings over broken decision tasks, yay!
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Component: Authentication → Services