Closed Bug 1264078 Opened 8 years ago Closed 7 years ago

Alert when a new scope is assigned to a role

Categories

(Taskcluster Graveyard :: Discussion, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1346013

People

(Reporter: selenamarie, Unassigned)

Details

We should alert when new scopes are assigned to roles so that we know when these kinds of things change, and when.
How do we want to implement this?  I see a few options:

1. Implement it directly in the auth service, right in the API endpoint implementation
2. Build an alerts service of some sort
3. Set up structured logging and pipe that somewhere that can alert on it

I suspect the easiest short-term solution is to cheat on 3 and use a search pattern in papertrail?
My guess is that roles will continually get updates, and as a group we might not feel individually responsible for checking the alerts. However, I also see we need to have guards against unexpected role changes occurring...

Roles changing are important to understand when client will gain increased privileges. However, if no client assumes the given role, it will have no effect. Maybe we should monitor when the full set of expanded scopes for any permanent client augments. This would encompass role changes. We could capture the role change that led to the expanded scopes change. It seems reasonable that we should not need to monitor when the set of scopes reduces, only when it augments. We should probably try to be smart too about scope matching, i.e. if a client had scope "abc:*" and this changed to "abc:def:*", it is nothing to worry about.

Another thought that occurs is the possibility we could "lock" roles, especially if they are used by production systems. To change a locked role would either cause an alert, or have higher security around it, like requiring two (highly-privileged) people (client ids) to approve the change.

My worry about simple alerting is that we don't have a clearly defined responsibility chain for auditing the alerts, and also risk introducing a lot of unwatched noise. Maybe this can be filtered out with limiting to just when clientids used in production are affected, or only "locked" clients so that new roles being created e.g. for dev testing.

My last idea - maybe different scopes should have different security ratings. E.g. being able to publish an artifact under a release namespace seems more security sensitive, then for example, the ability to enable taskcluster proxy in a task. Maybe this would offer some filtering out of really insignificant stuff, or at least report on it with a lower priority.

Whatever we do, we should arrange a rota for trawling the audited changes, so e.g. once per week someone goes through the last week's changes (alerts), checking for anything suspicious, so that alerts don't get missed by everybody, thinking somebody else is looking at them.

Sorry for bike shedding! :)
I think you're actually working your way back to role/client unit tests, where basically we'd keep a set of assertions and alert when they fail.  The difficulty there was in imagining the failure mode.  From one perspective, I was just writing 100's of unit tests for a single bit of code in auth.tc.n that prevents scope escalation.  Looked at differently, I was writing each role modification in both clicks in the tools UI and a Python unit test -- which won't scale when we start allowing self-serve.
Thanks Dustin. I wasn't really thinking about unit tests, instead my goals were:

a) limit alerts to only those that *increase* a client's scopes
b) introduce personal responsibility for auditing changes (no point alerting if nobody feels responsible to audit the changes, and alerts can go unnoticed)
c) introduce the ability to "lock" a role, so that changes can still be made, but doing so raises a higher level alarm or requires 2 people to approve (functional change to auth service)
d) consider introducing a mechanism to assign different security levels to different scopes, such that changes involving a high security scope raise a more significant alarm than for a low security scope change. This could be as simple as maintaining a list of innocuous scopes that shouldn't cause an alert.

I don't see any of this being implemented via unit tests.
Note d) is just about reducing noise - it may be unnecessary and overkill. The other parts should require no maintenance...
(In reply to Pete Moore [:pmoore][:pete] from comment #4)
> Thanks Dustin. I wasn't really thinking about unit tests, instead my goals
> were:
> 
> a) limit alerts to only those that *increase* a client's scopes

I suspect this will be 90% of changes.

> b) introduce personal responsibility for auditing changes (no point alerting
> if nobody feels responsible to audit the changes, and alerts can go
> unnoticed)

Would one person be responsible for all changes?  How would we divide the responsibility?  I suspect the concern is more about the scope that's being added, than the role it's being added to (so releng doesn't care so much that you add a scope to role X vs role Y, but cares very much if you add a scope matching queue:create-task:signing-provisioner-v1/* to any role).

> c) introduce the ability to "lock" a role, so that changes can still be
> made, but doing so raises a higher level alarm or requires 2 people to
> approve (functional change to auth service)

Again, it's not a particular role we're usually concerned about, but a particular scope.  So we could accomplish this by giving nobody that scope, and implementing a kind of "sudo" operation, perhaps requiring a second party to sign-off, to get that scope.

> d) consider introducing a mechanism to assign different security levels to
> different scopes, such that changes involving a high security scope raise a
> more significant alarm than for a low security scope change. This could be
> as simple as maintaining a list of innocuous scopes that shouldn't cause an
> alert.

This seems reasonable.  But it founders because we use temporary credentials for just about everything, and at least right now, all of the TC admins have assume:*, which ends up equivalent to *.  As a result, most of the important clients (project/taskcluster/tc-login/production, project/taskcluster/tc-queue/production, etc.) have assume:* and we rely on them to be careful in handing out scopes.  Many of them hand out scopes based on roles, so some (but not all) roles can be treated as security "principals".

> I don't see any of this being implemented via unit tests.

My point is, there's no simple, general way to slice this.  We have a lot of different things we want to be true, or things we want to be false, and they don't follow any simple pattern.  They differ between groups -- releng cares about a different set of scopes from TC, from rust, etc.  Alerting on every change is a huge amount of noise, as you've indicated, and there are no simple rules that will reliably filter the noise from the signal.

The unit tests model comes from fwunit, where it was a useful way of expressing the assertions we are concerned about, using arbitrary code to model the arbitrary complexity of those assertions.  It would support both positive (these principals have these scopes) assertions and negative (no principals but these have these scopes) assertions.  The idea was to run these tests periodically and alert if they failed.  I had dreams of moving them into the auth service at some point, so that changes that failed the tests would not be accepted, thereby forming a second line of defense beyond the simple scope-delegation rules in the auth API.

My work on the topic was here:

  https://github.com/djmitche/build-scope-tests
  bug 1212039

Moving to "Discussion" since this one is well off the rails and into the woods now.
Component: Authentication → Discussion
Component: Discussion → Operations
True, but I think the latter is a bit better defined -- basically adding a "creator" field to a clientId.
Component: Operations → Discussion
Brian, do you think our audit logs cover this?
Flags: needinfo?(bstack)
The first bit of work we're try to achieve doesn't cover this, but I'd like for this to eventually be included in that work. I think it's the "Scopes/Roles Revisions & Editing Flow" section of bug 1346013.
Flags: needinfo?(bstack)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Product: Taskcluster → Taskcluster Graveyard
You need to log in before you can comment on or make changes to this bug.