Closed Bug 1512631 Opened 6 years ago Closed 6 years ago

Please create `mobile-{1..3}-decision` worker types

Categories

(Taskcluster :: Operations and Service Requests, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: jlorenzo)

Attachments

(3 files)

In bug 1455290, we created the `gecko-focus` worker type. It was well suited when Firefox Focus was the only mobile product we shipped out of taskcluster-github. Today releng supports several projects and we'd like to run staging releases on them[1]. For the sake of security, this means we need several levels of workers. Here's the proposal. Can we create:

* mobile-1-decision for pull requests and taskgraphs spun off GitHub forks (like for staging releases).
* mobile-2-decision for jobs run on the main repository (like pushes to master, testing branches, or release branches).
* mobile-3-decision for real nightlies and releases. Unlike the previous mobile-X-decision types, this one should be built off the AMI created in bug 1455290. This way, we don't have to whitelist a new GPG key (for now).

Aki, does the split make sense?

Wander, I'm not too sure whether workerType administration falls under Release Pipeline Engineering or RelOps. Would you have an idea about it? If not, would you like to show me?

[1] https://github.com/mozilla-releng/scriptworker/pull/271
Flags: needinfo?(wcosta)
Flags: needinfo?(aki)
Chatted IRL. :pmoore will give me a tour in a couple of hours.
Flags: needinfo?(wcosta)
The split makes sense. I was wondering if we should do github- rather than mobile-, but I don't have a strong opinion. Will lockbox use mobile-* ?
Flags: needinfo?(aki)
Also, when we have these all in place, we can probably retire the gecko-focus workerType.
Agreed on retiring `gecko-focus`. I'm not a huge fan of calling them github- for now. I think we should keep a certain layer of separation between GitHub projects. For instance, I know servo is getting on Taskcluster too. I don't think they should run on the same worker types.

Another point, called out by :mitchhentges: do we need the numbering system on mobile projects? To Mitch, that's another variable for newcomers to understand. For instance, Mitch thought level 1 was the most critical (instead of level 3). After reading [1] again, I'm not sure the definition I gave for mobile-2-decision fits what a level 2 is supposed to be.

I asked :SimonSapin what they do on Servo, and he pointed to [2]. Basically, there is no numbering system. Instead, they define types of actions: try, operators, and reviewers. Try is the equivalent of try on Gecko. Operators lets users retry a job[3] (so it is unrelated to the commit access policy). Reviewers are users able to give an r+ and land. Simon said he thinks they would need 2 levels (try and reviewers) instead of 3. He doesn't think numbering them is needed.

So, I'm on the fence about numbers. On one hand, it's great to have parity between the most popular project (gecko) and smaller ones; it's easier for releng (who are used to these concepts) to switch to Mobile. On the other hand, we may not need this complexity (just like Servo doesn't). What's your opinion on this, Aki?

[1] https://www.mozilla.org/en-US/about/governance/policies/commit/access-policy/
[2] https://github.com/servo/saltfs/blob/4a2d61e8b5947de4135b385f0aba35925b8bf261/homu/files/cfg.toml#L124-L202
[3] https://github.com/servo/saltfs/pull/220
Flags: needinfo?(aki)
(In reply to Johan Lorenzo [:jlorenzo] from comment #4)
> Agreed on retiring `gecko-focus`. I'm not a huge fan of calling them github-
> for now. I think we should keep a certain layer of separation between Github
> projects. For instance, I know servo is getting on Taskcluster too. I don't
> think they should run on the same worker types.

Sure. This adds work when Servo or other projects get on board, but keeps things separated.

> Another point called out by :mitchhentges, do we need the numbering system
> on mobile projects? To Mitch, that's another variable to understand for
> newcomers. For instance, Mitch thought level 1 was the most critical
> (instead of level 3). After reading [1] again, I'm not sure the definition I
> gave for mobile-2-decision fits what's a level 2 supposed to be.
> I asked :SimonSapin what they do on Servo, and he pointed to [2]. Basically,
> there is no numbering system, instead they define types of actions: try,
> operators, and reviewers. Try is try on Gecko. Operators allows users to
> retry a job[3] (unrelated to commit access policy, then). Reviewers are
> users able to give an r+ and land. Simon said he thinks they would need 2
> levels (try and reviewers), instead of 3. He doesn't think numbering them is
> needed.
>
> So, I'm on the fence about numbers. On one hand, it's great to have parity
> between the most popular project (gecko) and smaller ones; it's easier for
> releng (who are used to these concepts) to switch to Mobile. On the other
> hand, we may not need this complexity (just like Servo doesn't). What's your
> opinion on this, Aki?

Agreed we don't necessarily need 3, in which case we might just use levels 1 and 3 if we keep numbering. I agree that it's confusing that we have tiers, where 1 is highest, and levels, where 3 is highest. I think that's a matter of documentation.

As for whether to use numbers or drop them, an important piece of context is that we plan on maintaining all of our scopes and potentially workerTypes through ci-{configuration,admin}, where the configs will need to be both human- and automation-readable. We should be adding a lot of tests to verify that we have no privilege escalation from level 1 to 3 for Gecko. If mobile follows that pattern, we should be able to reuse those tests easily. If mobile has special rules that differ completely from Gecko, we may have to maintain two separate sets of tests and automation rules. Someone who has to administer both sets of projects will need to switch mental contexts between which sets of rules to use. Because of this, I believe we would need a very strong argument to justify not using numerical levels for mobile.
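To make the kind of test discussed here concrete, a minimal sketch follows. It is not the ci-admin test suite: the role names, the `ROLE_SCOPES` mapping, and the helper functions are hypothetical, and the scope strings only assume the `queue:create-task:<provisionerId>/<workerType>` shape with a trailing `*` acting as a prefix wildcard.

```python
# Hypothetical role -> scopes mapping, for illustration only.
# Real grants would be defined in ci-configuration.
ROLE_SCOPES = {
    "mobile-level-1": ["queue:create-task:aws-provisioner-v1/mobile-1-*"],
    "mobile-level-3": [
        "queue:create-task:aws-provisioner-v1/mobile-1-*",
        "queue:create-task:aws-provisioner-v1/mobile-3-*",
    ],
}

# Scope that only level 3 is supposed to be able to satisfy (assumed format).
LEVEL_3_SCOPE = "queue:create-task:aws-provisioner-v1/mobile-3-decision"


def satisfies(held: str, required: str) -> bool:
    """Return True if `held` grants `required`; a trailing * matches any suffix."""
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def escalations(role_scopes: dict, role: str, protected_scope: str) -> list:
    """List the scopes held by `role` that would grant `protected_scope`."""
    return [s for s in role_scopes[role] if satisfies(s, protected_scope)]


# The check itself: level 1 must not reach the level 3 worker type.
assert escalations(ROLE_SCOPES, "mobile-level-1", LEVEL_3_SCOPE) == []
assert escalations(ROLE_SCOPES, "mobile-level-3", LEVEL_3_SCOPE) != []
```

Because the check only depends on the level number embedded in the names, the same function could in principle run against `gecko-*` and `mobile-*` grants alike, which is the reuse argument made above.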
Flags: needinfo?(aki)
Excellent point! That's a strong enough reason to me. We do want ci-admin to configure scopes (bug 1509133). I'm fine using mobile-1 and mobile-3, then. Please let me know what you think, Mitch.
Blocks: 1509133
Flags: needinfo?(mitch9654)
In Servo there are effectively two levels of access: reviewers, and people with access to Try. PRs need a review from a reviewer before landing (being merged into `master`), but we trust either set of people to not be malicious. All testing (for landing or for Try) is done on the exact same infra; we don’t have separate worker types. (At our scale, the cost of additional machines would be non-trivial.)
Fair enough, it sounds like the benefits of naming consistency when using numbering are worth the slightly added documentation burden. I'm comfortable with either approach.
Flags: needinfo?(mitch9654)
Additional note: in addition to having the valid cot key, level 3 workers should have live logging disabled.
> Additional note: in addition to having the valid cot key, level 3 workers should have live logging disabled.

I believe live logging (at least for gecko workers) is disabled primarily by network rules that prevent inbound connections to the instances.
I just created mobile-1-decision[1] out of gecko-1-decision. I copied the config and changed the following things:

* minCapacity/maxCapacity => 0, 10. Mobile has way less activity than the gecko repos. We may want to bump the min capacity to 1 so we don't have to wait when the activity jumps from nothing to a job.
* instanceTypes => I only used "m5d.large"[2]. This is the first type of instance that has an SSD drive. I understand this is useful for fast repository clones. I don't think we need anything bigger for now, as the repos are quite small compared to gecko.
* secrets => I didn't put influx and relengAPI, because influx isn't used anymore in TC and Mobile doesn't need Releng API. There is just pulse and statelessHostname. The latter is used for live logs.

I chatted with dustin about segregation and security. To him, we don't need new subnets/AMIs. As long as mobile level 1 lives within the gecko level 1 env, we're good enough from a security perspective. Same thing for level 3.

I'll try this first worker type and see how it integrates with a mozilla-mobile repo.

[1] https://tools.taskcluster.net/aws-provisioner/mobile-1-decision/view
[2] https://aws.amazon.com/ec2/pricing/on-demand/
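For reference, the changes listed above can be sketched as a small transformation of a base definition. This is only an illustration: the `GECKO_1_DECISION` values below are invented placeholders (not the real gecko-1-decision config), the `instanceTypes` shape is simplified, and `derive_mobile_config` is a hypothetical helper. Only the field names and the new values come from the comment above.

```python
import copy

# Placeholder base definition; every value here is made up for illustration.
GECKO_1_DECISION = {
    "minCapacity": 2,
    "maxCapacity": 100,
    "instanceTypes": [{"instanceType": "m5d.xlarge"}, {"instanceType": "c5d.xlarge"}],
    "secrets": {
        "influx": "<redacted>",
        "relengAPI": "<redacted>",
        "pulse": "<redacted>",
        "statelessHostname": "<redacted>",
    },
}


def derive_mobile_config(base: dict) -> dict:
    """Copy a gecko decision config and apply the mobile-specific changes."""
    config = copy.deepcopy(base)  # never mutate the gecko definition
    config["minCapacity"] = 0
    config["maxCapacity"] = 10
    config["instanceTypes"] = [{"instanceType": "m5d.large"}]
    # influx is no longer used in TC and Mobile does not need Releng API.
    for unused in ("influx", "relengAPI"):
        config["secrets"].pop(unused, None)
    return config


mobile_1_decision = derive_mobile_config(GECKO_1_DECISION)
print(sorted(mobile_1_decision["secrets"]))
```

Deriving the config from a copy keeps the diff against gecko-1-decision explicit, which may help if the min capacity is later bumped to 1.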
Assignee: nobody → jlorenzo
Attachment #9032176 - Flags: review?(mhentges) → review+
Attachment #9034669 - Attachment is patch: true
Attachment #9034669 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9034669 - Flags: review?(mtabara) → review+

Work is now done. Mihai created mobile-3-decision[1]. It's been used on android-components for several days. No issue to report so far.

[1] https://tools.taskcluster.net/aws-provisioner/mobile-3-decision/view

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Service Request → Operations and Service Requests