Open Bug 1761841 Opened 3 years ago Updated 2 years ago

Level 1 rerun hooks seem to require level 3 scopes

Categories

(Release Engineering :: Firefox-CI Administration, defect, P3)

Tracking

(Not tracked)

People

(Reporter: ahal, Unassigned)

Details

Something strange is going on that I don't understand. To reproduce, attempt to re-run this task:
https://firefox-ci-tc.services.mozilla.com/tasks/UaKKkloDQ7C5koS2BqgFhg/runs/0

You should see an error like:

The role hook-id:project-app-services/in-tree-action-1-generic/0db59f3995 does not have sufficient scopes to create the task:

Client ID static/taskcluster/hooks does not have sufficient scopes and is missing the following scopes:

{
  "AllOf": [
    "assume:repo:github.com/mozilla/application-services:action:generic",
    "queue:route:checks",
    "queue:scheduler-id:app-services-level-3",
    {
      "AnyOf": [
        "queue:create-task:highest:app-services-3/decision",
        "queue:create-task:very-high:app-services-3/decision",
        "queue:create-task:high:app-services-3/decision",
        "queue:create-task:medium:app-services-3/decision",
        "queue:create-task:low:app-services-3/decision",
        "queue:create-task:very-low:app-services-3/decision",
        "queue:create-task:lowest:app-services-3/decision"
      ]
    }
  ]
}

This request requires the client to satisfy the following scope expression:

{
  "AllOf": [
    "assume:repo:github.com/mozilla/application-services:action:generic",
    "queue:route:checks",
    "queue:create-task:project:none",
    "queue:scheduler-id:app-services-level-3",
    {
      "AnyOf": [
        "queue:create-task:highest:app-services-3/decision",
        "queue:create-task:very-high:app-services-3/decision",
        "queue:create-task:high:app-services-3/decision",
        "queue:create-task:medium:app-services-3/decision",
        "queue:create-task:low:app-services-3/decision",
        "queue:create-task:very-low:app-services-3/decision",
        "queue:create-task:lowest:app-services-3/decision"
      ]
    }
  ]
}
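To make the two expressions above easier to compare, here is a minimal sketch of how a scope expression could be checked against a set of held scopes. It assumes only the basic Taskcluster rule that a held scope satisfies a required one when they are equal or when the held scope ends in * and its prefix matches. It does not model role expansion (assume: scopes), and the HELD set is hypothetical, chosen for illustration rather than taken from the hook's real role.

```python
PRIORITIES = ["highest", "very-high", "high", "medium", "low", "very-low", "lowest"]

# Required expression, copied from the error message above.
REQUIRED = {
    "AllOf": [
        "assume:repo:github.com/mozilla/application-services:action:generic",
        "queue:route:checks",
        "queue:create-task:project:none",
        "queue:scheduler-id:app-services-level-3",
        {"AnyOf": [f"queue:create-task:{p}:app-services-3/decision" for p in PRIORITIES]},
    ]
}

# HYPOTHETICAL level 1 scopes, for illustration only (not the hook's real role).
HELD = {
    "queue:create-task:project:none",
    "queue:scheduler-id:app-services-level-1",
    "queue:create-task:low:app-services-1/*",
}


def scope_satisfies(held, required):
    """A held scope matches if equal, or if it ends in '*' and is a prefix."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def satisfied(scopes, expr):
    """Evaluate a nested AllOf/AnyOf expression against the held scopes."""
    if isinstance(expr, str):
        return any(scope_satisfies(s, expr) for s in scopes)
    if "AllOf" in expr:
        return all(satisfied(scopes, e) for e in expr["AllOf"])
    return any(satisfied(scopes, e) for e in expr["AnyOf"])


def missing(scopes, expr):
    """Return the unsatisfied part of the expression, or None if satisfied."""
    if satisfied(scopes, expr):
        return None
    if isinstance(expr, str) or "AnyOf" in expr:
        return expr
    rest = [missing(scopes, e) for e in expr["AllOf"]]
    return {"AllOf": [m for m in rest if m is not None]}
```

With these hypothetical scopes, missing(HELD, REQUIRED) drops only queue:create-task:project:none, which has the same shape as the "missing" block in the error: every level 3 entry is still unsatisfied.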

Here is the hook that we're trying to run:
https://firefox-ci-tc.services.mozilla.com/hooks/project-app-services/in-tree-action-1-generic%2F0db59f3995

As you can see it is level 1 (which is expected since the task is also level 1). But it is failing with a scope error because we need level 3 create-task and scheduler-id scopes. I'm not sure why this would happen.

For context I recently enabled hooks on this repo here:
https://hg.mozilla.org/ci/ci-configuration/rev/7e8c1a39f2b3fb40ca19b0a5da39834fd3f6f32d

It's very likely that I neglected something..

Hey Aki, I'm a bit lost here. Hoping maybe this rings a bell or you can think of something that helps point me in the right direction.

Flags: needinfo?(aki)

Hmm, the same thing seems to happen in glean with, e.g., this task:
https://firefox-ci-tc.services.mozilla.com/tasks/FMYpb4Z9TWafslWsh542GA

Same thing with Fenix and this task:
https://firefox-ci-tc.services.mozilla.com/tasks/MbPtQltVRbaY_WHDjepI6A

Starting to think L1 hooks might be broken everywhere.. I wonder if this is expected, or has gone unnoticed since it regressed.

Summary: Re-running level 1 task seems to require level 3 scopes in application-services → Level 1 rerun hooks seem to require level 3 scopes

The rerun action does work for a L1 Gecko task from try however..

I'm guessing they're using an action on a PR from a branch on a level 3 repo? We run into issues where it's difficult to maintain an allowlist/denylist of all the branches that should be level 3 vs level 1 on the main repo inside .tc.yml.
Our workaround has been to create a fork repo and create the PR from there, and the tasks will be level 1.

Flags: needinfo?(aki)

(But then the action won't be able to run on an untrusted repo due to scopes, which means we go back to running taskcluster task rerun from the CLI, or wishing that we could drop a comment in the PR to have a rerun happen.)

Ah.. Yeah, looks like that task came from here:
https://github.com/mozilla/application-services/tree/branch-builds

Ok I think that makes sense.. though I still don't quite grok how the scopes are calculated here.

I wonder if there's something else we can do. Seems like lots of projects use this branch based development process (apparently at least app-services, glean, fenix and mozilla-vpn-client). Maybe we can come up with a way for them to define their own L3 branches in .taskcluster.yml or somewhere and then treat the rest as L1 or something.. though I appreciate doing so introduces a security risk.

(In reply to Andrew Halberstadt [:ahal] from comment #6)

> Ah.. Yeah, looks like that task came from here:
> https://github.com/mozilla/application-services/tree/branch-builds
>
> Ok I think that makes sense.. though I still don't quite grok how the scopes are calculated here.

https://github.com/mozilla/application-services/blob/cc4434568f089a20b5461f5bb954bb9bd5463aa9/.taskcluster.yml#L104-L107

When we push to the branch on the main repo, we have a github-push on the main repo, which creates level 3 build tasks. This may show up in a PR since Github statuses show up per-revision, not per-revision+tasks_for. So another thing we'd like to fix is how statuses show up in Github.

github-pull-request events are excluded from the level 3 "if" branch and fall through to the level 1 "else" branch.

If we have an easy list of branches and/or branch patterns that should be level 3, with the appropriate branch protections, we can encapsulate those in that block.

> I wonder if there's something else we can do. Seems like lots of projects use this branch based development process (apparently at least app-services, glean, fenix and mozilla-vpn-client). Maybe we can come up with a way for them to define their own L3 branches in .taskcluster.yml or somewhere and then treat the rest as L1 or something.. though I appreciate doing so introduces a security risk.

Forcing more tasks to level 1 isn't a security risk, probably the opposite. It's more .tc.yml work and formally defining the branch patterns etc. This may actually end up with action-reruns working in PRs-from-main-repo-branches.

> When we push to the branch on the main repo, we have a github-push on the main repo, which creates level 3 build tasks. This may show up in a PR since Github statuses show up per-revision, not per-revision+tasks_for. So another thing we'd like to fix is how statuses show up in Github.

Ah! I think this is https://github.com/taskcluster/taskcluster/issues/5017. So sounds like that's the real issue here and making more branches L1 rather than L3 is just a workaround.

From the issue:

Steps to reproduce the behavior:

  1. Create a feature branch my-feature
  2. Push it to a branch of the same name on the main repo (i.e, not a fork)
  3. Open a pull request from that branch in the Github UI
  4. Make a new commit in the feature branch and push

This will generate both a github-push event since we are pushing to a branch on the main repo, as well as a github-pull-request event, since we are updating a pull request, therefore spawning two Decision tasks. So far I believe this is expected behaviour.
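The scenario in those steps can be sketched as a small function. This is an illustration of the behaviour described in the issue, not taskcluster-github's actual code; the function name and its two boolean parameters are hypothetical.

```python
def events_for_push(pushed_to_main_repo: bool, updates_open_pr: bool) -> list[str]:
    """List the events (one Decision task each) that a single push triggers.

    Hypothetical model of the behaviour described in the issue above.
    """
    events = []
    if pushed_to_main_repo:
        # Any push to a branch on the main repo is seen as a github-push.
        events.append("github-push")
    if updates_open_pr:
        # Updating the head of an open pull request is seen as a PR event.
        events.append("github-pull-request")
    return events
```

Under this model, step 4 corresponds to events_for_push(True, True), which yields both events, whereas a push to a fork with an open PR yields only github-pull-request.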

I wonder if we really want to generate two entirely separate graphs here.. Maybe taskcluster-github should be smart enough to detect this scenario and then only submit one graph or something.. I don't know though, I guess there would still be level issues if we did that.

So bringing this back to application-services, to work around this we can either:

  1. Hardcode which branches are supposed to be level 3 at the location Aki linked.
  2. Wait for https://github.com/taskcluster/taskcluster/issues/5017 to get fixed.
  3. Generate pull requests from a fork rather than from a branch on the main repo.

Looking at the branches in the repo, I'd guess main and any branch that matches release-* should be level 3.. Ben, does that sound accurate? Do you have a preference on the approach here?

Flags: needinfo?(bdeankawamura)

Yes main + release-* is the correct set of lvl-3 branches.

I think my preference would be to only generate 1 graph when we get a push + pull request event. Could we somehow combine 1 and 2? Maybe we could run the decision task on a github-push event if the branch matches the main/release pattern?

Flags: needinfo?(bdeankawamura)

So if we detect the push is to a non main / release branch, then don't run a decision task at all? This would mean the only way to get tasks to run is to create a PR. I think that sounds reasonable and is definitely do-able. Just be warned that tasks may not show up when you look at the branch view in Github either (though I'm not actually sure about this).

You might want to check with others on your team as well before we go ahead?
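The rule being proposed here can be sketched as follows. This is only an illustration of the logic discussed in this bug, not the contents of any repo's .taskcluster.yml; the function name is hypothetical, and the branch patterns are the main + release-* set confirmed above.

```python
from fnmatch import fnmatchcase

# Branch patterns that should run at level 3, per the discussion above.
LEVEL_3_BRANCHES = ("main", "release-*")


def decide_level(tasks_for, branch):
    """Return the level for a Decision task, or None to skip it entirely.

    Hypothetical helper; only the two events discussed here are modelled.
    """
    if tasks_for == "github-pull-request":
        # Pull requests always run at level 1, whatever branch they come from.
        return 1
    if tasks_for == "github-push":
        if any(fnmatchcase(branch, pattern) for pattern in LEVEL_3_BRANCHES):
            return 3
        # Push to a non-main, non-release branch: no Decision task at all.
        return None
    raise ValueError(f"event not covered by this sketch: {tasks_for}")
```

For example, decide_level("github-push", "branch-builds") returns None, so tasks for that branch would only run once a PR is opened, and then at level 1.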

I think that should be fine and would match our practices, but I'll double check with others and get back to you.

One thing to double check: If we create a PR, then push more commits, I think this will still generate the github-pull-request event and we will still generate the decision task. Is that correct?

Yes I believe so :). I think there will be some amount of trial and error involved here though.

Severity: -- → S3
Priority: -- → P3

Talked to the team and got a green light for this one. I'm feeling pretty confident that it will work, but I'll also make sure to monitor and see if any issues come up.

Hmm, I started looking into this and the repo already appears to be set up this way:
https://github.com/mozilla/application-services/blob/14770aeae187a65bdb10f8bee150c274de476a8a/.taskcluster.yml#L100

So I'm back to being confused by this bug.

(In reply to Aki Sasaki [:aki] (he/him) (UTC-6) from comment #7)

> When we push to the branch on the main repo, we have a github-push on the main repo, which creates level 3 build tasks. This may show up in a PR since Github statuses show up per-revision, not per-revision+tasks_for. So another thing we'd like to fix is how statuses show up in Github.

So afaict, this isn't happening. For example, here's the Decision task from Ben's most recent push:
https://firefox-ci-tc.services.mozilla.com/tasks/bUZx39k1TwCuJhmkog68Mw

In the extra section, we can clearly see tasks_for: github-pull-request. But if you click through to a dependent task and try to use the rerun action, you get the error from comment 0.

So this is still a mystery to me.

> Starting to think L1 hooks might be broken everywhere.. I wonder if this is expected, or has gone unnoticed since it regressed.

I think this might actually be the case here :/, and either no one has tried this recently or no one has bothered to complain about it.

Maybe it's this line, which says every action run on the main repo is level 3, even if it's run against a pull request?

Good eye! I double checked fenix and glean and they have the same construct (likely all copy/pasted from one another). So that would explain why it's happening across repos.

See Also: → 1784563