Delete unused workerTypes

Status

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: dustin, Assigned: wcosta)

There are a number of workerTypes that are now completely unused (and some that are lightly used, but still needed, for ESR and whatnot).  They should be deleted to simplify updating AMIs.
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 1

2 years ago
This is my initial list of nominees for removal. It is based on a dumb script I created [1], with obvious false positives excluded:

ami-test
ami-test-pv
android-api-11
b2g-desktop-debug
b2g-desktop-opt
b2gtest-emulator
b2gtest-legacy
cratertest
desktop-test-medium
dolphin
emulator-ics
emulator-ics-debug
emulator-jb
emulator-jb-debug
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
gaia-cache
gaia-decision
garbage-amiset
gps-c4-8xl
qa-3-linux-fx-tests
tcvcs-cache-device
tutorial
worker-ci-test

:garndt, :dustin, which of these do you think shouldn't be on this list? I wonder if cratertest is Rust-related.

[1] https://github.com/walac/unused-worker-types
Flags: needinfo?(garndt)
Flags: needinfo?(dustin)
(Assignee)

Comment 2

2 years ago
tcvcs-cache-device is actually used, so it is off the list.
cratertest is definitely used.

ni? jmaher re desktop-test-medium, as I'm not sure what the status of migrating tests to larger instances is

garbage-amiset is for testing with the aws provisioner.  Its AMIs don't need to be updated, but it should exist.  Similarly, ami-test* are, I think, best left in place as a way to test new AMIs.

ni? gps re gps-*

I believe the qa-* are used for the firefox ui tests; ni? whimboo

tutorial definitely needs to stick around (it's for, you guessed it, the tutorial)

worker-ci-test is used to test the docker-worker, and needs to stick around.

For the b2gtest-*, b2g-*, emulator-*, dolphin, gaia-* stuff you're the expert so I believe you!  It looks like you've noted that `b2gtest` is used in tree.  It would be nice to fix that, using a different workerType instead, but that can wait.
Flags: needinfo?(jmaher)
Flags: needinfo?(hskupin)
Flags: needinfo?(gps)
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> I believe the qa-* are used for the firefox ui tests; ni? whimboo

Yes, qa-3-linux-fx-tests is our worker type for firefox-ui-tests as triggered by Mozmill CI.
Flags: needinfo?(hskupin)

Comment 5

2 years ago
So one of the things gps brought up should be taken into consideration... we should be careful with removing workerTypes that might be used for jobs on older revisions at a later point.

Now I don't think that means we need to keep those workertypes alive forever, but we do need some way to know what has been used, and what could be removed because it has not been used in N weeks.  

Perhaps there is a way that we can preserve what the worker type definition was (with secrets scrubbed out) at the time of removal so that if we accidentally remove something, we can recover later.  This also assumes that we do not remove the ami that has been used unless it's older than some threshold we have and it's no longer used.
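
A minimal sketch of the "preserve the definition with secrets scrubbed" idea, in Python. The key name `secrets` and the shape of the definition are assumptions made for illustration, not the provisioner's confirmed schema; `scrub` and `archive` are hypothetical helpers.

```python
import copy
import json

# Assumption: sensitive values live under keys with these names.
# Adjust to whatever the real workerType definition uses.
SENSITIVE_KEYS = frozenset({"secrets"})


def scrub(definition, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of `definition` with sensitive values replaced by {}."""
    if isinstance(definition, dict):
        return {
            key: ({} if key in sensitive_keys else scrub(value, sensitive_keys))
            for key, value in definition.items()
        }
    if isinstance(definition, list):
        return [scrub(item, sensitive_keys) for item in definition]
    return copy.deepcopy(definition)


def archive(worker_type, definition):
    """Serialize a scrubbed definition so it can be restored by hand later."""
    record = {"workerType": worker_type, "definition": scrub(definition)}
    return json.dumps(record, sort_keys=True, indent=2)
```

The archived text could be stored anywhere durable at the time of removal; restoring it would still require re-adding the secrets by hand.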

Here is a list of workerTypes that the provisioner reported as having a running capacity in the last 8 weeks.  You can compare this to the list that's currently configured.  I'm not sure if 8 weeks is long enough to go back, and you probably want to double check this with source control and any other sources to confirm.  I know some have already been removed from the provisioner.

ami-test
android-api-15
b2gbuild
b2gtest
cli
cratertest
dbg-linux32
dbg-linux64
dbg-macosx64
desktop-test
desktop-test-large
desktop-test-xlarge
emulator-jb
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
flame-kk
funsize-balrog
funsize-mar-generator
gaia
gaia-decision
gecko-1-b-linux
gecko-1-b-win2012
gecko-2-b-linux
gecko-3-b-linux
gecko-decision
github-worker
gps-c3-4xl
gps-c3-8xl
gps-c4-8xl
hg-worker
mulet-debug
mulet-opt
nss-win2012r2
opt-linux32
opt-linux64
opt-macosx64
qa-3-linux-fx-tests
rustbuild
spidermonkey
symbol-upload
taskcluster-images
tcvcs-cache
tcvcs-cache-device
ttaubert-win2012r2
tutorial
win10
win2012
win2012-level-1
win2012r2
win7
worker-ci-test
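
The comparison described above ("configured, but no running capacity in the last N weeks") can be sketched in Python roughly as follows. The inputs are hypothetical: `configured` is the set of currently configured workerType names and `last_active` maps a name to the last time the provisioner reported running capacity for it. How those are obtained is not shown here.

```python
from datetime import datetime, timedelta


def removal_candidates(configured, last_active, now, weeks=8):
    """Return configured workerTypes with no reported capacity since the cutoff.

    A name missing from `last_active` is treated as never active.
    """
    cutoff = now - timedelta(weeks=weeks)
    return sorted(
        name
        for name in configured
        if last_active.get(name) is None or last_active[name] < cutoff
    )
```

Anything this returns is only a candidate; it still needs to be checked against source control and other sources, as noted above.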
Flags: needinfo?(garndt)
(Assignee)

Comment 6

2 years ago
This is my updated list:

android-api-11
b2g-desktop-debug
b2g-desktop-opt
b2gtest-emulator
b2gtest-legacy
desktop-test-medium
dolphin
emulator-ics
emulator-ics-debug
emulator-jb
emulator-jb-debug
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
gaia-cache
gaia-decision
gecko-1-b-linux
gecko-1-t-win7-32
gecko-2-b-linux
gecko-3-b-linux
gecko-talos-c3large
gecko-talos-c4large
gps-c4-8xl

Notice that there are some that vanished from m-c, but are still present in beta/aurora branches, so after deleting the unused ones, we may want to revisit this topic later. There are some in garndt's list that don't exist anymore.

One note: I am concentrating only on docker-worker worker types. Therefore, I am not considering anything with *win* in the name.
(Assignee)

Comment 7

2 years ago
What are gecko-[1-3]-b worker types used for?
Flags: needinfo?(dustin)

Comment 8

2 years ago
The gecko-[1-3]-b worker types are relatively new and are being used for builds on mozilla-central.
Flags: needinfo?(dustin)

Comment 9

2 years ago
I've removed the gps-c4-8xl worker type.
Flags: needinfo?(gps)
(In reply to Greg Arndt [:garndt] from comment #5)
> So one of the things gps brought up should be taken into consideration... we
> should be careful with removing workerTypes that might be used for jobs on
> older revisions at a later point.
> 
> Now I don't think that means we need to keep those workertypes alive
> forever, but we do need some way to know what has been used, and what could
> be removed because it has not been used in N weeks.  

The use case here is sometimes a developer or automated tool will push a really old changeset to Try as part of bisecting, etc. If we had a fully deterministic and reproducible build environment, this should "just work."

If referenced worker types no longer exist, that would obviously prevent automation from running on old changesets, which would be bad.

I think a simple and elegant solution to this problem would be to maintain a map of "defunct" worker types to modern ones. If the system sees a request for an unknown worker type, it routes it to a modern equivalent, possibly a generic worker type. This isn't ideal and may not result in expected outcomes for all tasks. But it does strike a balance between supporting worker types forever and breaking automation on old commits.
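
A rough Python sketch of such a map, under stated assumptions: `resolve_worker_type` is a hypothetical helper, and the mapping entry shown is only an illustration, not an agreed migration.

```python
# Hypothetical mapping from removed workerTypes to current ones.
DEFUNCT_TO_MODERN = {
    "desktop-test-medium": "desktop-test-large",  # illustrative entry only
}


def resolve_worker_type(requested, known, defunct_map, fallback=None):
    """Route a requested workerType to one that still exists.

    `known` is the set of currently configured workerTypes. Unknown names
    are looked up in `defunct_map`; if there is no entry, `fallback`
    (a generic workerType, if one is designated) is used instead.
    """
    if requested in known:
        return requested
    target = defunct_map.get(requested, fallback)
    if target is None or target not in known:
        raise LookupError(f"no usable workerType for {requested!r}")
    return target
```

Raising an error when nothing matches keeps a misrouted task from silently running on an unrelated worker.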
I think that for the foreseeable future (while there are still a dozen people hacking away on TaskCluster on a daily basis), "really old" changesets are just not going to run for any of a million reasons.  Most of these are where we had some bug in production that the jobs implicitly assumed.  For example, my work to limit "*" scopes pretty much broke history on a weekly basis.  Eventually Jonas convinced me that 30 days was a reasonable timeframe to attempt to keep things working.

As the platform stabilizes, I think that the maximum try duration will tend to increase, and at some point it will be stable enough that we can provide some kind of guarantee and test it (at least partially -- try jobs only test so much).  That said, I think we will often decide to accept a reset of this time to zero or nearly zero in order to get some desirable functionality.  Otherwise ensuring the ability to run old commits could be a *strong* brake on innovation.

All of which is to say, let's not optimize for deprecated workerTypes just yet.  WorkerTypes that are conceivably still used in try pushes (like the old {dbg,opt}-{linux,macosx}*) should stick around for as long as practical, but I don't think it's time to add automatic support for migrating worker names, as once we stabilize the platform we are unlikely to change them.
Also, while workerTypes might be kept around, nothing guarantees that they are the same workerType that existed when jobs for a particular revision originally ran.

Those worker types could have different AMIs, instance types, etc.  So even using the same workerType name doesn't guarantee that the result is 100% deterministic.
I believe we are going with desktop-test-large as the target for many of our jobs.  I believe desktop-test-medium was an experiment.
Flags: needinfo?(jmaher)
(Assignee)

Comment 14

2 years ago
I removed dolphin, emulators, android-api-11, b2gtest-legacy, b2gtest-emulator and desktop-test-medium. This is the new, unfiltered list of worker types that are not used in the gecko tree:

ami-test
ami-test-pv
cratertest
funsize-balrog
funsize-balrog-dev
funsize-mar-generator
gaia-cache
gaia-decision
gecko-1-b-linux
gecko-1-t-win7-32
gecko-2-b-linux
gecko-3-b-linux
gecko-talos-c3large
gecko-talos-c4large
github-worker
hg-worker
nss-win2012r2
qa-3-linux-fx-tests
releng-task
rustbuild
tcvcs-cache
tcvcs-cache-device
tutorial
win2012r2
win7
worker-ci-test

I don't see any obvious candidates for removal; unless someone else does, I will close the bug.
I assume we can remove:
gecko-talos-c3large
gecko-talos-c4large
(Assignee)

Comment 16

2 years ago
(In reply to Joel Maher ( :jmaher) from comment #15)
> I assume we can remove:
> gecko-talos-c3large
> gecko-talos-c4large

Done!
(Assignee)

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED