Closed Bug 1212039 Opened 9 years ago Closed 8 years ago

Write tests for TC roles

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dustin, Assigned: dustin)

Attachments

(1 file)

To ensure that we know exactly which clientIds have which releng-specific scopes, we want to verify the actual assignments against our expectations.

This is similar to the firewall tests using fwunit, which has been very successful.
The vague plan I've come up with is to define a set of "principals" each of which has a set of scopes defined by their roles.

These can then be refined based on invariants in TC components.  For example, the mozilla-taskcluster clientId will have very broad scopes, but if we are willing to count on the invariant that it uses assumeScopes:["assume:tree:mozilla-inbound"] etc. to create decision tasks, then 'tree:mozilla-inbound' etc. become the more interesting principals.

This provides a nice opportunity to crystallize and model the invariants that each component enforces -- essentially every time a role is mentioned in any taskcluster component, it should have a corresponding passage in the tests that models that behavior.  We've talked about tc-github issuing roles based on github repo name; tests would model that.  We may do something funny with hooks to allow people to edit hooks for tasks they themselves don't have the scopes to run; tests would model that (imputing those task scopes to the users able to edit those tasks).  We could also model the intersection of task scopes and worker scopes when a task is executed.
As I lock down scopes, I've been keeping a set of notecards up to date to simulate this approach.  I have yellow cards corresponding to clientIds and red cards corresponding to roles that are used by services.  Together these constitute the principals.  And I have green cards representing the scopes used by releng tools and taskcluster tools.

There's always room for more scopes -- without asking permission or telling anyone, someone could easily build a "foo-worker" that listens for tasks in the foo/foo-worker queue and performs them, based on scopes starting with "foo:".  As far as the tests I'm writing here go, that's fine -- I don't care about protecting the foo worker, because it's not mine.  If by some accident the whole world is granted access to perform foo tasks, that's not my problem.  But if the whole world is granted access to balrog, that *is* a problem.  So among the scopes I want to monitor is docker-worker:feature:balrogVPNProxy.
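As a rough illustration of monitoring one scope while ignoring others, here is a minimal sketch. It assumes the usual convention that a held scope ending in "*" grants every scope with that prefix. The principal names and their scope lists are hypothetical example data, not the real assignments, and the helper names are made up for this sketch.

```python
def satisfies(held, required):
    """Return True if holding `held` grants `required`.

    Assumes a trailing "*" on the held scope acts as a prefix wildcard.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def principals_with_scope(principals, required):
    """Names of all principals holding a scope that grants `required`."""
    return sorted(
        name
        for name, scopes in principals.items()
        if any(satisfies(s, required) for s in scopes)
    )


# Hypothetical data: only the balrog scope is monitored; "foo:*" is not.
PRINCIPALS = {
    "client-id:root": ["*"],
    "client-id:foo-worker": ["foo:*"],
    "client-id:example-release": ["docker-worker:feature:balrogVPNProxy"],
}

MONITORED = "docker-worker:feature:balrogVPNProxy"
holders = principals_with_scope(PRINCIPALS, MONITORED)
```

A test would then compare `holders` against an explicit list of expected principals and fail on any difference.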

So, without further ado, here is the accumulated list of services that grant access based on roles:

mozilla-taskcluster
  -> assume:repo:hg.mozilla.org/$repo:*

gaia-taskcluster
  -> assume:repo:github.com/$org/$repo:branch:$branch
  -> assume:repo:github.com/$org/$repo:pull-request

taskcluster-github
  -> assume:repo:github.com/$org/$repo:pull-request

aws-provisioner
  -> assume:worker-type:aws-provisioner-v1/$workerType
     assume:worker-id:$workerId

taskcluster-hooks
  -> assume:hook-id:$hookId

docker-worker
  * extends graphs on behalf of tasks
  * creates artifacts on behalf of tasks
  * see bug 1220738

task-graph-scheduler
  -> taskgraph.scopes
*** this one is a little complicated!!
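To make the parameterized roles above concrete, here is a small sketch that renders them with Python's string.Template, whose $name syntax happens to match the notation in the list. The table copies the "->" entries above; the roles_for helper and the parameter values in the usage line are hypothetical.

```python
from string import Template

# Role patterns copied from the list above, keyed by the granting service.
ROLE_TEMPLATES = {
    "mozilla-taskcluster": ["assume:repo:hg.mozilla.org/$repo:*"],
    "gaia-taskcluster": [
        "assume:repo:github.com/$org/$repo:branch:$branch",
        "assume:repo:github.com/$org/$repo:pull-request",
    ],
    "taskcluster-github": ["assume:repo:github.com/$org/$repo:pull-request"],
    "aws-provisioner": [
        "assume:worker-type:aws-provisioner-v1/$workerType",
        "assume:worker-id:$workerId",
    ],
    "taskcluster-hooks": ["assume:hook-id:$hookId"],
}


def roles_for(service, **params):
    """Render every role pattern for `service` with the given parameters.

    Raises KeyError if a pattern needs a parameter that was not supplied.
    """
    return [Template(t).substitute(params) for t in ROLE_TEMPLATES[service]]


# Hypothetical parameter values, for illustration only.
example = roles_for("taskcluster-github", org="example-org", repo="example-repo")
```

Each rendered string is a scope that the service is expected to use, so each one can be treated as a principal in the tests.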
Depends on: 1226240
OK, I have a pretty simple test up:
  https://github.com/djmitche/build-scope-tests/blob/master/test_scopes.py

that already identified a credential with * that shouldn't have it -- autolander-dev.  So I cleared that up.
Depends on: 1220295
Dustin, this is really a great project - I really like how you've set up these scope tests, which are really important security-wise.

This fix is just to move Jonas from RelEng to the Taskcluster team, which I spotted while browsing the code.

Keep up the good work! ++
Attachment #8691907 - Flags: review?(dustin)
Attachment #8691907 - Flags: review?(dustin) → review+
I want to do two things to expand this out a little:

1) Set this up in hooks, so that we get a regularly scheduled run of the tests.  I need to dream up some sufficiently alert-y but not annoying means of signalling failures.

2) Expand the tests out to monitor all of the namespaces, to the extent practical.  Ideally scopes will prevent those from getting messed up (for example, to add a new top-level index route that's not in the namespaces doc, you'd need a scope for that route, which the scopes tests should catch), but this will provide a nice way of ensuring that we don't miss things or mess something up.

For the moment, I'll keep this all in the same repo, but at some point we may want to separate per-team tests from taskcluster platform/core tests.
OK, successfully hooked up to hooks, with the latest run available at
  https://tools.taskcluster.net/index/#garbage.dustin/garbage.dustin.build-scope-tests
I can't send email from a docker instance (which is probably just as well).
The lists of principals were getting out of hand, so I did some interesting stuff to shorten them up.

First, I arranged to compute all of the role expansion in Python.  This means I can work around bug 1220295, and also means that I can perform partial expansions.  And it means that when bug 1220295 is finished, we'll have an independent test of the accuracy of the production scope expansion mechanics.
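For readers unfamiliar with role expansion, the following is a simplified sketch of the idea, not the code in the linked repository and not the production algorithm. It assumes that holding "assume:<roleId>" grants that role's scopes, that a trailing "*" is a prefix wildcard on both scopes and role ids, and that expansion repeats until nothing new is added. The role contents are hypothetical.

```python
def satisfies(held, required):
    """True if `held` grants `required` (trailing "*" is a prefix wildcard)."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def assumes(scope, role_id):
    """True if holding `scope` means the role `role_id` is assumed.

    Simplification: partial prefix overlaps between a wildcard scope and
    a wildcard role id are not handled.
    """
    target = "assume:" + role_id
    if satisfies(scope, target):
        return True
    # A role id ending in "*" applies to any longer role name.
    return role_id.endswith("*") and scope.startswith(target[:-1])


def expand(scopes, roles):
    """Expand `scopes` through `roles` until a fixed point is reached."""
    expanded = set(scopes)
    changed = True
    while changed:
        changed = False
        for role_id, role_scopes in roles.items():
            if any(assumes(s, role_id) for s in expanded):
                new = set(role_scopes) - expanded
                if new:
                    expanded |= new
                    changed = True
    return expanded


# Hypothetical roles, for illustration only.
ROLES = {
    "moz-tree:level:1": ["queue:create-task:aws-provisioner-v1/example-*"],
    "repo:hg.mozilla.org/try:*": ["assume:moz-tree:level:1"],
}

result = expand(["assume:repo:hg.mozilla.org/try:*"], ROLES)
```

Stopping the loop early, or supplying only a subset of the roles, is one way to get the partial expansions mentioned above.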

Next, I wrote principalsWith(role) which expands to all of the principals with the given role.  This makes it easy to break down the tests a little bit.  Broadly, I can write something like "scope X is available to all principals with moz-tree:level:1" in a few lines.  Separately, I can write "the set of principals with moz-tree:level:1 is ...".  As long as both of those are in place, we can't miss anything.

Here are some samples:

def test_relops():
    assertPrincipalsWithRole('mozilla-group:team_relops', [
        relops_permacreds,

        # taskcluster folks have *, hence matching this group
        principalsWith('mozilla-group:team_taskcluster'),
    ], omitTrusted=True)

and, separately,

def test_bbb_tasks():
    """Buildbot Bridge (BBB) allows Buildbot jobs to be run via a TaskCluster
    task.  Most BBB tasks run without the need for additional scopes, but some
    more sensitive builders are restricted by `buildbot-bridge:..` scopes.  """
    assertPrincipalsWithScope("buildbot-bridge:*", [
        # root
        'client-id:root',

        # services
        'client-id-alias:release-runner-dev',
        'client-id-alias:scheduler-taskcluster-net',  # Bug 1218541

        # user groups
        principalsWith('mozilla-group:releng'),
        principalsWith('mozilla-group:team_relops'),
        principalsWith('mozilla-group:team_taskcluster'),
    ], omitTrusted=True)

These are nice and easy to look at and reason about, without huge lists of principals.
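For context, here is one way helpers with these names could be put together. This is a guess at the shape, not the code in the repository. It assumes each principal's scopes have already been expanded, uses hypothetical principals (including the "mozilla-user:" prefix), and leaves out the omitTrusted argument because its behavior is not described here.

```python
def satisfies(held, required):
    """True if `held` grants `required` (trailing "*" is a prefix wildcard)."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


# Hypothetical principals with already-expanded scopes.
EXPANDED = {
    "client-id:root": {"*"},
    "client-id-alias:release-runner-dev": {"buildbot-bridge:*"},
    "mozilla-user:example@mozilla.com": {
        "assume:mozilla-group:team_relops",
        "buildbot-bridge:*",
    },
    "client-id:foo-worker": {"foo:*"},
}


def principalsWithScope(scope):
    """All principals holding a scope that grants `scope`."""
    return {p for p, held in EXPANDED.items()
            if any(satisfies(s, scope) for s in held)}


def principalsWith(role):
    """All principals that can assume `role`."""
    return principalsWithScope("assume:" + role)


def flatten(expected):
    """Merge plain principal names and nested principalsWith() results."""
    flat = set()
    for item in expected:
        if isinstance(item, str):
            flat.add(item)
        else:
            flat |= set(item)
    return flat


def assertPrincipalsWithScope(scope, expected):
    """Fail unless exactly the expected principals hold `scope`."""
    actual = principalsWithScope(scope)
    wanted = flatten(expected)
    assert actual == wanted, "unexpected: %s; missing: %s" % (
        sorted(actual - wanted), sorted(wanted - actual))


assertPrincipalsWithScope("buildbot-bridge:*", [
    "client-id:root",
    "client-id-alias:release-runner-dev",
    principalsWith("mozilla-group:team_relops"),
])
```

Note that client-id:root shows up in principalsWith('mozilla-group:team_relops') because "*" grants every assume: scope, which is the same effect the comment in test_relops describes.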
I'm going to put this on ice for a bit -- I think the more important work right now is to get logins working for users and wrap up bug 1220295 and bug 1226240.
As time passes, this seems less and less like a good idea.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: General Automation → General