Closed Bug 1512631 Opened 6 years ago Closed 6 years ago

Please create `mobile-{1..3}-decision` worker types

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: jlorenzo)

References

Details

Attachments

(3 files)

[android-components] Use mobile-{1,3}-decision worker types 6 years ago Johan Lorenzo [:jlorenzo] 62 bytes, text/x-github-pull-request	mhentges : review+	Details \| Review
[scriptworker] Enable mobile-X-decision and Github forks 6 years ago Johan Lorenzo [:jlorenzo] 55 bytes, text/x-github-pull-request	jlorenzo : review+	Details \| Review
[build/puppet] Bump scriptworker to 17.2.0 6 years ago Johan Lorenzo [:jlorenzo] 55 bytes, patch	mtabara : review+	Details \| Diff \| Splinter Review

Johan Lorenzo [:jlorenzo]

Assignee

Description

•

6 years ago

In bug 1455290, we created the `gecko-focus` worker type. It was well suited when Firefox Focus was the only mobile product we shipped out of taskcluster-github. Today releng supports several projects and we'd like to run staging releases on them[1]. For the sake of security, this means we need several levels of workers. Here's the proposal. Can we create:

* mobile-1-decision for pull requests and taskgraphs spun off GitHub forks (like for staging releases).
* mobile-2-decision for jobs run on the main repository (like pushes to master, testing branches, or release branches).
* mobile-3-decision for real nightlies and releases. Unlike the previous mobile-X-decision worker types, this one should be built off the AMI created in bug 1455290. This way, we don't have to whitelist a new GPG key (for now).

Aki, does the split make sense?

Wander, I'm not too sure if the workerType administration falls under Release Pipeline Engineering or RelOps. Would you have an idea about it? If not, would you like to show me?

[1] https://github.com/mozilla-releng/scriptworker/pull/271
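To make the proposed split concrete, here is a minimal Python sketch of how a trigger could map to one of the three proposed worker types. The `Trigger` enum, the `LEVEL_BY_TRIGGER` table and the `decision_worker_type` function are hypothetical names used only for illustration; they are not part of taskcluster-github or scriptworker.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical event categories, named after the cases in the proposal."""

    PULL_REQUEST = "pull-request"
    FORK_TASKGRAPH = "fork-taskgraph"  # e.g. a staging release run from a fork
    MAIN_REPO_PUSH = "main-repo-push"  # master, testing or release branches
    NIGHTLY = "nightly"
    RELEASE = "release"


# Level assigned to each trigger, following the three cases in the proposal.
LEVEL_BY_TRIGGER = {
    Trigger.PULL_REQUEST: 1,
    Trigger.FORK_TASKGRAPH: 1,
    Trigger.MAIN_REPO_PUSH: 2,
    Trigger.NIGHTLY: 3,
    Trigger.RELEASE: 3,
}


def decision_worker_type(trigger: Trigger) -> str:
    """Return the proposed decision worker type name for a trigger."""
    return f"mobile-{LEVEL_BY_TRIGGER[trigger]}-decision"
```

For example, `decision_worker_type(Trigger.PULL_REQUEST)` yields `mobile-1-decision`, while nightlies and releases land on `mobile-3-decision`.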

Flags: needinfo?(wcosta)

Flags: needinfo?(aki)

Johan Lorenzo [:jlorenzo]

Assignee

Comment 1

•

6 years ago

Chatted IRL. :pmoore will give me a tour in a couple of hours.

Flags: needinfo?(wcosta)

Aki Sasaki (not active)

Comment 2

•

6 years ago

The split makes sense. I was wondering if we should do github- rather than mobile-, but I don't have a strong opinion. Will lockbox use mobile-* ?

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 3

•

6 years ago

Also, when we have these all in place, we can probably retire the gecko-focus workerType.

Johan Lorenzo [:jlorenzo]

Assignee

Comment 4

•

6 years ago

Agreed on retiring `gecko-focus`. I'm not a huge fan of calling them github- for now. I think we should keep a certain layer of separation between GitHub projects. For instance, I know Servo is getting on Taskcluster too. I don't think they should run on the same worker types.

Another point called out by :mitchhentges: do we need the numbering system on mobile projects? To Mitch, that's another variable for newcomers to understand. For instance, Mitch thought level 1 was the most critical (instead of level 3). After reading [1] again, I'm not sure the definition I gave for mobile-2-decision fits what a level 2 is supposed to be.

I asked :SimonSapin what they do on Servo, and he pointed to [2]. Basically, there is no numbering system. Instead, they define types of actions: try, operators, and reviewers. Try is the equivalent of try on Gecko. Operators can retry a job[3] (unrelated to the commit access policy, then). Reviewers are users able to give an r+ and land. Simon said he thinks they would need 2 levels (try and reviewers) instead of 3. He doesn't think numbering them is needed.

So, I'm on the fence about numbers. On one hand, it's great to have parity between the most popular project (gecko) and smaller ones; it's easier for releng (who are used to these concepts) to switch to Mobile. On the other hand, we may not need this complexity (just like Servo doesn't). What's your opinion on this, Aki?

[1] https://www.mozilla.org/en-US/about/governance/policies/commit/access-policy/
[2] https://github.com/servo/saltfs/blob/4a2d61e8b5947de4135b385f0aba35925b8bf261/homu/files/cfg.toml#L124-L202
[3] https://github.com/servo/saltfs/pull/220

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 5

•

6 years ago

(In reply to Johan Lorenzo [:jlorenzo] from comment #4)
> Agreed on retiring `gecko-focus`. I'm not a huge fan of calling them github-
> for now. I think we should keep a certain layer of separation between Github
> projects. For instance, I know servo is getting on Taskcluster too. I don't
> think they should run on the same worker types.

Sure. This adds work when Servo or other projects get on board, but keeps things separated.

> Another point called out by :mitchhentges, do we need the numbering system
> on mobile projects? To Mitch, that's another variable to understand for
> newcomers. For instance, Mitch thought level 1 was the most critical
> (instead of level 3). After reading [1] again, I'm not sure the definition I
> gave for mobile-2-decision fits what's a level 2 supposed to be.
> I asked :SimonSapin what they do on Servo, and he pointed to [2]. Basically,
> there is no numbering system, instead they define types of actions: try,
> operators, and reviewers. Try is try on Gecko. Operators allows users to
> retry a job[3] (unrelated to commit access policy, then). Reviewers are
> users able to give an r+ and land. Simon said he thinks they would need 2
> levels (try and reviewers), instead of 3. He doesn't think numbering them is
> needed/
>
> So, I'm on the fence about numbers. On one hand, it's great to have parity
> between the most popular project (gecko) and smaller ones; it's easier for
> releng (who are used to these concepts) to switch to Mobile. On the other
> hand, we may not need this complexity (just like Servo doesn't). What's your
> opinion on this, Aki?

Agreed we don't necessarily need 3, in which case we might just use levels 1 and 3 if we keep numbering. I agree that it's confusing that we have tiers, where 1 is highest, and levels, where 3 is highest. I think that's a matter of documentation.

As for whether to use numbers or drop them, an important piece of context is that we plan on maintaining all of our scopes and potentially workerTypes through ci-{configuration,admin}, where the configs will need to be both human- and automation-readable. We should be adding a lot of tests to verify that we have no privilege escalation from level 1 to 3 for Gecko. If mobile follows that pattern, we should be able to reuse those tests easily. If mobile has special rules that differ completely from Gecko, we may have to maintain two separate sets of tests and automation rules. Someone who has to administer both sets of projects will need to switch mental contexts between which sets of rules to use. Because of this, I believe that we would need a very strong argument against using numerical levels for mobile to not use them.
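As a rough illustration of the kind of reusable check described here, the Python sketch below flags any level 1 grant that would satisfy a scope reserved for level 3. It assumes that a granted scope ending in `*` matches every scope sharing that prefix. The function names and the scope strings are hypothetical examples, not taken from ci-configuration or ci-admin.

```python
# Hypothetical scope reserved for level 3; the exact string is an assumption.
LEVEL_3_ONLY = ["queue:create-task:aws-provisioner-v1/mobile-3-decision"]


def satisfies(granted: str, required: str) -> bool:
    """True if `granted` covers `required` (a trailing '*' is a prefix match)."""
    if granted.endswith("*"):
        return required.startswith(granted[:-1])
    return granted == required


def find_escalations(level_1_grants, level_3_only=LEVEL_3_ONLY):
    """List (grant, reserved_scope) pairs where a level 1 grant reaches level 3."""
    return [
        (grant, reserved)
        for grant in level_1_grants
        for reserved in level_3_only
        if satisfies(grant, reserved)
    ]
```

Because the check only depends on the level number in the names, the same function could in principle run against gecko-N and mobile-N grants alike, which is the reuse argument made above.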

Flags: needinfo?(aki)

Johan Lorenzo [:jlorenzo]

Assignee

Comment 6

•

6 years ago

Excellent point! That's a strong enough reason to me. We do want ci-admin to configure scopes (bug 1509133). I'm fine using mobile-1 and mobile-3, then. Please let me know what you think, Mitch.

Blocks: 1509133

Flags: needinfo?(mitch9654)

Simon Sapin (:SimonSapin)

Comment 7

•

6 years ago

In Servo there’s effectively two levels of access: reviewers, and people with access to Try. PRs need a review from a reviewer before landing (being merged into `master`), but we trust either set of people to not be malicious. All testing (for landing or for Try) is done on the exact same infra, we don’t have separate worker types. (At our scale, the cost of additional machines would be non-trivial.)

Mitchell Hentges [:mhentges] 🦀

Comment 8

•

6 years ago

Fair enough, it sounds like the benefits of naming consistency when using numbering are worth the slightly added burden on documentation. I'm comfortable with either approach.

Flags: needinfo?(mitch9654)

Aki Sasaki (not active)

Comment 9

•

6 years ago

Additional note: in addition to having the valid cot key, level 3 workers should have live logging disabled.

Tom Prince [:tomprince]

Comment 10

•

6 years ago

> Additional note: in addition to having the valid cot key, level 3 workers should have live logging disabled.

I believe live logging (at least for gecko workers) is disabled primarily by network rules that prevent inbound connections to the instances.

Johan Lorenzo [:jlorenzo]

Assignee

Comment 11

•

6 years ago

I just created mobile-1-decision[1] out of gecko-1-decision. I copied the config and changed the following things:

* minCapacity/maxCapacity => 0, 10. Mobile has way less activity than the gecko repos. We may want to bump the min capacity to 1 so we don't have to wait when the activity jumps from nothing to a job.
* instanceTypes => I only used "m5d.large"[2]. This is the first type of instance that has an SSD drive. I understand this is useful for fast repository clones. I don't think we need anything bigger for now, as the repos are quite small compared to gecko.
* secrets => I didn't include influx and relengAPI, because influx isn't used anymore in TC and Mobile doesn't need Releng API. Only pulse and statelessHostname remain. The latter is used for live logs.

I chatted with dustin about segregation and security. To him, we don't need new subnets/AMIs. As long as mobile level 1 lives within the gecko level 1 env, we're good enough from a security perspective. Same thing for level 3.

I'll try this first worker type and see how it integrates with a mozilla-mobile repo.

[1] https://tools.taskcluster.net/aws-provisioner/mobile-1-decision/view
[2] https://aws.amazon.com/ec2/pricing/on-demand/
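The changes listed in this comment can be sketched as a small Python transformation applied to a copy of the gecko-1-decision config. The dict layout (top-level `minCapacity`, `maxCapacity`, `instanceTypes` and `secrets` keys) and the function name are assumptions made for illustration; the real worker type definition may be structured differently.

```python
import copy


def derive_mobile_worker_type(base: dict) -> dict:
    """Return a mobile variant of a worker type config without mutating `base`.

    The layout of `base` is hypothetical; only the field names mentioned
    in the comment are used.
    """
    config = copy.deepcopy(base)

    # Mobile has far less activity than the gecko repos.
    config["minCapacity"] = 0
    config["maxCapacity"] = 10

    # Keep a single SSD-backed instance type.
    config["instanceTypes"] = [{"instanceType": "m5d.large"}]

    # Drop the secrets mobile doesn't need; pulse and statelessHostname stay.
    for name in ("influx", "relengAPI"):
        config["secrets"].pop(name, None)

    return config
```

Working on a deep copy keeps the gecko-1-decision definition untouched, so the same base can be reused to derive other worker types.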

Assignee: nobody → jlorenzo

Johan Lorenzo [:jlorenzo]

Assignee

Comment 12

•

6 years ago

Attached file [android-components] Use mobile-{1,3}-decision worker types — Details

Attachment #9032176 - Flags: review?(mhentges)

Johan Lorenzo [:jlorenzo]

Assignee

Comment 13

•

6 years ago

Attached file [scriptworker] Enable mobile-X-decision and Github forks — Details

Attachment #9032700 - Flags: review?(mozilla)

Mitchell Hentges [:mhentges] 🦀

Comment 14

•

6 years ago

Mitchell Hentges [:mhentges] 🦀

Updated

•

6 years ago

Attachment #9032176 - Flags: review?(mhentges) → review+

Johan Lorenzo [:jlorenzo]

Assignee

Comment 15

•

6 years ago

Attached patch [build/puppet] Bump scriptworker to 17.2.0 — Details — Splinter Review

Attachment #9034669 - Flags: review?(mtabara)

Mihai Tabara [:mtabara]⌚️GMT

Updated

•

6 years ago

Attachment #9034669 - Attachment is patch: true

Attachment #9034669 - Attachment mime type: text/x-github-pull-request → text/plain

Attachment #9034669 - Flags: review?(mtabara) → review+

Johan Lorenzo [:jlorenzo]

Assignee

Comment 16

•

6 years ago

Comment on attachment 9032700 [details] [review]
[scriptworker] Enable mobile-X-decision and Github forks

r+'d at https://github.com/mozilla-releng/scriptworker/pull/271#pullrequestreview-187220577
Landed on master at https://github.com/mozilla-releng/scriptworker/commit/35e717bf42ef6543d26a2c8e043459f4cb147778

Attachment #9032700 - Flags: review?(mozilla) → review+

Johan Lorenzo [:jlorenzo]

Assignee

Comment 17

•

6 years ago

Work is now done. Mihai created mobile-3-decision[1]. It's been used on android-components for several days. No issue to report so far.

[1] https://tools.taskcluster.net/aws-provisioner/mobile-3-decision/view

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Mihai Tabara [:mtabara]⌚️GMT

Updated

•

6 years ago

Blocks: 1526017

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Service Request → Operations and Service Requests
