Closed Bug 1295173 Opened 9 years ago Closed 9 years ago

Delete unused workerTypes

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: wcosta)

Details

There are a number of workerTypes that are now completely unused (and some that are lightly used, but still needed, for ESR and whatnot). They should be deleted to simplify updating AMIs.
Status: NEW → ASSIGNED
This is my initial list of nominees for removal. It is based on a dumb script I created [1], with obvious false positives excluded:

ami-test
ami-test-pv
android-api-11
b2g-desktop-debug
b2g-desktop-opt
b2gtest-emulator
b2gtest-legacy
cratertest
desktop-test-medium
dolphin
emulator-ics
emulator-ics-debug
emulator-jb
emulator-jb-debug
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
gaia-cache
gaia-decision
garbage-amiset
gps-c4-8xl
qa-3-linux-fx-tests
tcvcs-cache-device
tutorial
worker-ci-test

:garndt, :dustin which do you think shouldn't be in this list? I wonder if cratertest is some Rust-related stuff.

[1] https://github.com/walac/unused-worker-types
Flags: needinfo?(garndt)
Flags: needinfo?(dustin)
tcvcs-cache-device is actually used, so it is off the list.
cratertest is definitely used.

ni? jmaher re desktop-test-medium, as I'm not sure what the status of migrating tests to larger instances is.

garbage-amiset is for testing with the aws provisioner. Its AMIs don't need to be updated, but it should exist. Similarly, ami-test* are, I think, best left in place as a way to test new AMIs.

ni? gps re gps-*

I believe the qa-* are used for the firefox ui tests; ni? whimboo

tutorial definitely needs to stick around (it's for, you guessed it, the tutorial).

worker-ci-test is used to test the docker-worker, and needs to stick around.

For the b2gtest-*, b2g-*, emulator-*, dolphin, gaia-* stuff you're the expert so I believe you! It looks like you've noted that `b2gtest` is used in tree. It would be nice to fix that, using a different workerType instead, but that can wait.
Flags: needinfo?(jmaher)
Flags: needinfo?(hskupin)
Flags: needinfo?(gps)
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> I believe the qa-* are used for the firefox ui tests; ni? whimboo

Yes, qa-3-linux-fx-tests is our worker type for firefox-ui-tests as triggered by Mozmill CI.
Flags: needinfo?(hskupin)
So one of the things gps brought up should be taken into consideration... we should be careful with removing workerTypes that might be used for jobs on older revisions at a later point.

Now I don't think that means we need to keep those workerTypes alive forever, but we do need some way to know what has been used, and what could be removed because it has not been used in N weeks.

Perhaps there is a way that we can preserve what the worker type definition was (with secrets scrubbed out) at the time of removal so that if we accidentally remove something, we can recover later. This also assumes that we do not remove the AMI that has been used unless it's older than some threshold we have and it's no longer used.

Here is a list of workerTypes that the provisioner reported as having a running capacity in the last 8 weeks. You can compare this to the list that's currently configured. I'm not sure if 8 weeks is long enough to go back, and you probably want to double check this with source control and any other sources to confirm. I know some have already been removed from the provisioner.

ami-test
android-api-15
b2gbuild
b2gtest
cli
cratertest
dbg-linux32
dbg-linux64
dbg-macosx64
desktop-test
desktop-test-large
desktop-test-xlarge
emulator-jb
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
flame-kk
funsize-balrog
funsize-mar-generator
gaia
gaia-decision
gecko-1-b-linux
gecko-1-b-win2012
gecko-2-b-linux
gecko-3-b-linux
gecko-decision
github-worker
gps-c3-4xl
gps-c3-8xl
gps-c4-8xl
hg-worker
mulet-debug
mulet-opt
nss-win2012r2
opt-linux32
opt-linux64
opt-macosx64
qa-3-linux-fx-tests
rustbuild
spidermonkey
symbol-upload
taskcluster-images
tcvcs-cache
tcvcs-cache-device
ttaubert-win2012r2
tutorial
win10
win2012
win2012-level-1
win2012r2
win7
worker-ci-test
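The comparison and the "preserve the definition with secrets scrubbed" idea from this comment could be sketched roughly as follows. This is only an illustration, not the provisioner's API: the function names, the shape of the activity data, the placeholder workerType names, and the assumption that sensitive values live under keys named "secrets" are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def removal_candidates(configured, last_active, now, max_idle_weeks=8):
    """Return configured workerTypes with no recorded activity since the cutoff.

    `last_active` maps a workerType name to the last time it had running
    capacity (hypothetical data shape, not a real provisioner response).
    """
    cutoff = now - timedelta(weeks=max_idle_weeks)
    return sorted(
        name
        for name in configured
        # Never-seen workerTypes count as idle, as do those last seen before the cutoff.
        if name not in last_active or last_active[name] < cutoff
    )


def scrub_secrets(definition):
    """Return a copy of a definition with every "secrets" value replaced.

    Assumes sensitive data sits under keys literally named "secrets",
    at any nesting level; the input is not modified.
    """
    if isinstance(definition, dict):
        return {
            key: "<scrubbed>" if key == "secrets" else scrub_secrets(value)
            for key, value in definition.items()
        }
    if isinstance(definition, list):
        return [scrub_secrets(item) for item in definition]
    return definition


# Illustrative data only: placeholder names and made-up timestamps.
now = datetime(2016, 9, 1, tzinfo=timezone.utc)
configured = {"example-active", "example-idle", "example-never-seen"}
last_active = {
    "example-active": now - timedelta(weeks=1),
    "example-idle": now - timedelta(weeks=20),
}
print(removal_candidates(configured, last_active, now))
```

Archiving `scrub_secrets(definition)` before deleting each candidate would leave something to restore from if a removal turns out to be a mistake. The result is still only a list of candidates; as noted above, it should be double checked against source control before anything is deleted.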
Flags: needinfo?(garndt)
This is my updated list:

android-api-11
b2g-desktop-debug
b2g-desktop-opt
b2gtest-emulator
b2gtest-legacy
desktop-test-medium
dolphin
emulator-ics
emulator-ics-debug
emulator-jb
emulator-jb-debug
emulator-kk
emulator-kk-debug
emulator-l
emulator-l-debug
emulator-x86-kk
gaia-cache
gaia-decision
gecko-1-b-linux
gecko-1-t-win7-32
gecko-2-b-linux
gecko-3-b-linux
gecko-talos-c3large
gecko-talos-c4large
gps-c4-8xl

Note that some of these have vanished from m-c but are still present in the beta/aurora branches, so after deleting the unused ones, we may want to revisit this topic later. Some of the workerTypes in garndt's list don't exist anymore.

One note: I am concentrating only on docker-worker worker types. Therefore, I am not considering anything with *win* in the name.
What are gecko-[1-3]-b worker types used for?
Flags: needinfo?(dustin)
The gecko-[1-3]-b worker types are relatively new and are being used for builds on mozilla-central.
Flags: needinfo?(dustin)
I've removed the gps-c4-8xl worker type.
Flags: needinfo?(gps)
(In reply to Greg Arndt [:garndt] from comment #5)
> So one of the things gps brought up should be taken into consideration... we
> should be careful with removing workerTypes that might be used for jobs on
> older revisions at a later point.
>
> Now I don't think that means we need to keep those workertypes alive
> forever, but we do need some way to know what has been used, and what could
> be removed because it has not been used in N weeks.

The use case here is that sometimes a developer or automated tool will push a really old changeset to Try as part of bisecting, etc. If we had a fully deterministic and reproducible build environment, this should "just work." If referenced worker types no longer exist, that would obviously prevent automation from running on old changesets, which would be bad.

I think a simple and elegant solution to this problem would be to maintain a map of "defunct" worker types to modern ones. If the system sees a request for an unknown worker type, it routes it to a modern equivalent, possibly a generic worker type. This isn't ideal and may not result in expected outcomes for all tasks. But it does strike a balance between supporting worker types forever and breaking automation on old commits.
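A minimal sketch of the proposed map of defunct worker types, assuming the lookup happens wherever a task's workerType is resolved. Nothing like this exists in the provisioner as far as this bug shows: the function name, the mapping entries, and the fallback name are all hypothetical.

```python
def resolve_worker_type(requested, known, defunct_map, fallback=None):
    """Route a request for a defunct workerType to a modern equivalent.

    `known` is the set of workerTypes that currently exist, `defunct_map`
    maps removed names to replacements, and `fallback` is an optional
    generic workerType for removed names that have no explicit mapping.
    """
    if requested in known:
        return requested
    target = defunct_map.get(requested, fallback)
    if target is None or target not in known:
        raise LookupError(f"no usable replacement for workerType {requested!r}")
    return target


# Illustrative values only; these mappings are not real configuration.
known = {"desktop-test-large", "example-generic"}
defunct_map = {"desktop-test-medium": "desktop-test-large"}

print(resolve_worker_type("desktop-test-medium", known, defunct_map))
print(resolve_worker_type("some-removed-type", known, defunct_map, fallback="example-generic"))
```

Raising an error when no replacement exists keeps the failure visible instead of silently running a task somewhere unexpected, which matters given the caveat that a rerouted task may not behave as it originally did.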
I think that for the foreseeable future (while there are still a dozen people hacking away on TaskCluster on a daily basis), "really old" changesets are just not going to run for any of a million reasons. Most of these are where we had some bug in production that the jobs implicitly assumed. For example, my work to limit "*" scopes pretty much broke history on a weekly basis. Eventually Jonas convinced me that 30 days was a reasonable timeframe to attempt to keep things working.

As the platform stabilizes, I think that the maximum try duration will tend to increase, and at some point it will be stable enough that we can provide some kind of guarantee and test it (at least partially -- try jobs only test so much). That said, I think we will often decide to accept a reset of this time to zero or nearly zero in order to get some desirable functionality. Otherwise ensuring the ability to run old commits could be a *strong* brake on innovation.

All of which is to say, let's not optimize for deprecated workerTypes just yet. WorkerTypes that are conceivably still used in try pushes (like the old {dbg,opt}-{linux,macosx}*) should stick around for as long as practical, but I don't think it's time to add automatic support for migrating worker names, as once we stabilize the platform we are unlikely to change them.
Also, while workerTypes might be kept around, nothing guarantees that they are the same workerType that existed when jobs for a particular revision originally ran. Those worker types could have different AMIs, instance types, etc. So even using the same workerType name doesn't guarantee that a job is 100% deterministic.
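One way to make that kind of drift visible would be to record a fingerprint of the workerType definition alongside each run and compare it later. The sketch below assumes the definition is available as JSON-serializable data; the field names and AMI IDs are placeholders, not real configuration.

```python
import hashlib
import json


def definition_fingerprint(definition):
    """Hash a workerType definition in a key-order-independent way."""
    # Sorting keys gives a canonical form, so equal content hashes equally.
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder definitions: same name, but the AMI changed between runs.
then = {"instanceType": "example.large", "ami": "ami-00000001"}
now_ = {"ami": "ami-00000002", "instanceType": "example.large"}

print(definition_fingerprint(then) == definition_fingerprint(now_))
```

A mismatch only shows that the definition changed, not whether the change affects a given job, so this is a detection aid rather than a guarantee of reproducibility.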
I believe we are going with desktop-test-large as the target for many of our jobs. I believe desktop-test-medium was an experiment.
Flags: needinfo?(jmaher)
I removed dolphin, emulators, android-api-11, b2gtest-legacy, b2gtest-emulator and desktop-test-medium. This is the new, unfiltered list of worker types that are not used in the gecko tree:

ami-test
ami-test-pv
cratertest
funsize-balrog
funsize-balrog-dev
funsize-mar-generator
gaia-cache
gaia-decision
gecko-1-b-linux
gecko-1-t-win7-32
gecko-2-b-linux
gecko-3-b-linux
gecko-talos-c3large
gecko-talos-c4large
github-worker
hg-worker
nss-win2012r2
qa-3-linux-fx-tests
releng-task
rustbuild
tcvcs-cache
tcvcs-cache-device
tutorial
win2012r2
win7
worker-ci-test

I don't see any obvious candidates for removal; unless anyone else does, I will close the bug.
I assume we can remove: gecko-talos-c3large gecko-talos-c4large
(In reply to Joel Maher ( :jmaher) from comment #15)
> I assume we can remove:
> gecko-talos-c3large
> gecko-talos-c4large

Done!
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED