Closed Bug 1220684 Opened 9 years ago Closed 9 years ago

Namespace the docker caches

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Attachments

(3 files)

Per Bug 1216306, we need to ensure some isolation between caches using scopes, which requires some namespacing.
Blocks: 1219943
Blocks: 1226240
The docker-worker:cache scopes extant are

  docker-worker:cache:gecko-decision
  docker-worker:cache:build-*
  docker-worker:cache:linux-cache
  docker-worker:cache:tc-vcs-public-sources
  docker-worker:cache:workspace-*
  docker-worker:cache:gaia-misc-caches
  docker-worker:cache:tooltool-cache
  docker-worker:cache:gaia-tc-vcs
  docker-worker:cache:*
  docker-worker:cache:gaia-linux-cache
  docker-worker:cache:tc-vcs

docker-worker:cache:* is held by

  client-id:v9h-Fo_fQ3yq_-MeH6dP6w (worker-ci-tests)
  client-id:dustin-docker-dev (my docker-worker unit-testing credentials)
  client-id:T9J-xA9JSUKQzfR99NRtMg (mozilla-pulse-actions)

gecko has the following in its task definitions:

  docker-worker:cache:build-aries-debug
  docker-worker:cache:build-aries-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-aries-eng
  docker-worker:cache:build-aries-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-aries-opt
  docker-worker:cache:build-aries-opt-objdir-gecko-{{project}}
  docker-worker:cache:build-aries-spark-dogfood
  docker-worker:cache:build-aries-spark-dogfood-objdir-gecko-{{project}}
  docker-worker:cache:build-aries-spark-ota-debug
  docker-worker:cache:build-aries-spark-ota-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-aries-spark-ota-user
  docker-worker:cache:build-aries-spark-ota-user-objdir-gecko-{{project}}
  docker-worker:cache:build-dolphin-512-eng
  docker-worker:cache:build-dolphin-512-opt
  docker-worker:cache:build-dolphin-eng
  docker-worker:cache:build-dolphin-opt
  docker-worker:cache:build-flame-kk-debug
  docker-worker:cache:build-flame-kk-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-flame-kk-eng
  docker-worker:cache:build-flame-kk-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-flame-kk-opt
  docker-worker:cache:build-flame-kk-opt-objdir-gecko-{{project}}
  docker-worker:cache:build-flame-kk-ota-debug
  docker-worker:cache:build-flame-kk-ota-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-flame-kk-ota-user
  docker-worker:cache:build-flame-kk-ota-user-objdir-gecko-{{project}}
  docker-worker:cache:build-flame-kk-spark-eng
  docker-worker:cache:build-flame-kk-spark-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-hamachi-eng
  docker-worker:cache:build-hamachi-user
  docker-worker:cache:build-helix-user
  docker-worker:cache:build-macosx64-st-an-workspace
  docker-worker:cache:build-macosx64-workspace
  docker-worker:cache:build-mulet-linux-{{project}}-workspace
  docker-worker:cache:build-nexus-4-eng
  docker-worker:cache:build-nexus-4-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-4-kk-eng
  docker-worker:cache:build-nexus-4-kk-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-4-kk-ota-debug
  docker-worker:cache:build-nexus-4-kk-ota-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-4-kk-user
  docker-worker:cache:build-nexus-4-kk-user-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-4-user
  docker-worker:cache:build-nexus-4-user-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-5-l-eng
  docker-worker:cache:build-nexus-5-l-eng-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-5l-ota-debug
  docker-worker:cache:build-nexus-5l-ota-debug-objdir-gecko-{{project}}
  docker-worker:cache:build-nexus-5-l-user
  docker-worker:cache:build-nexus-5-l-user-objdir-gecko-{{project}}
  docker-worker:cache:build-{{project}}-android-api-11-c6-workspace
  docker-worker:cache:build-{{project}}-linux32-c6-workspace
  docker-worker:cache:build-{{project}}-linux64-c6-workspace
  docker-worker:cache:build-{{project}}-linux64-st-an-workspace
  docker-worker:cache:build-spidermonkey-workspace
  docker-worker:cache:gecko-decision
  docker-worker:cache:linux-cache
  docker-worker:cache:tc-vcs
  docker-worker:cache:tc-vcs-public-sources
  docker-worker:cache:tooltool-cache
  docker-worker:cache:workspace-emulator-ics-debug
  docker-worker:cache:workspace-emulator-ics-debug-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-ics-opt
  docker-worker:cache:workspace-emulator-ics-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-jb-debug
  docker-worker:cache:workspace-emulator-jb-debug-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-jb-opt
  docker-worker:cache:workspace-emulator-jb-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-kk-debug
  docker-worker:cache:workspace-emulator-kk-debug-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-kk-opt
  docker-worker:cache:workspace-emulator-kk-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-kk-x86-debug
  docker-worker:cache:workspace-emulator-kk-x86-debug-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-kk-x86-opt
  docker-worker:cache:workspace-emulator-kk-x86-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-l-debug
  docker-worker:cache:workspace-emulator-l-debug-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-l-opt
  docker-worker:cache:workspace-emulator-l-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-emulator-l-x86-opt
  docker-worker:cache:workspace-emulator-l-x86-opt-objdir-gecko-{{project}}
  docker-worker:cache:workspace-{{project}}-b2g-desktop-objects-debug
  docker-worker:cache:workspace-{{project}}-b2g-desktop-objects-opt

and gaia is far simpler:

  docker-worker:cache:gaia-linux-cache
  docker-worker:cache:gaia-misc-caches
  docker-worker:cache:gaia-tc-vcs
  docker-worker:cache:resources
The idea here is to prevent cache-name collisions, avoid cache poisoning, and reduce the number of distinct scopes that must be mentioned in roles.

The collision and poisoning issues are mitigated by caches being specific to a workerType. For example, the `tc-vcs` cache for github-worker can never be confused with the `tc-vcs` cache on the gecko-decision workerType. However, we do have an issue with potential poisoning of production builds by try builds. Currently, roles moz-tree:level:{1,2,3} all have

  docker-worker:cache:build-*
  docker-worker:cache:gecko-decision
  docker-worker:cache:linux-cache
  docker-worker:cache:tc-vcs
  docker-worker:cache:tc-vcs-public-sources
  docker-worker:cache:tooltool-cache
  docker-worker:cache:workspace-*

In general, I think it's best to hold close to the tree-level model where possible - it's simple and well-understood. Gaia doesn't seem to use it, and fixing that is certainly out of scope here.

What I'm proposing, then, is:

* gaia-* -> all Gaia-related caches
* level-N-* for each SCM level N
* level-N-{{project}}-* for a particular project (generally further separated by task type)
* level-N-{{project}}-decision for the decision task's workspace
* level-N-{{project}}-tc-vcs
* level-N-{{project}}-tc-vcs-public-sources
* level-N-{{project}}-python to replace linux-cache
* tooltool-cache can remain (tooltool is resistant to poisoning as it checks sha512 hashes)

so,

* rename 'resources' to 'gaia-resources' in the gaia tree (noting that there are no scopes with this cache..)
* add scope `docker-worker:cache:gaia-*` to `repo:github.com/mozilla-b2g/gaia:*` and `client-id:tc-vcs`
* add scope `docker-worker:cache:level-1-*` to `moz-tree:level:1`, similarly levels 2 and 3
* rename all of the in-tree caches appropriately as described
* the decision task will need to know its level for this to work
* remove all but `docker-worker:cache:level-N-*` and `docker-worker:cache:tooltool-cache` from the roles, after waiting a bit
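
As a rough illustration of why the level prefix isolates try from production, here is a minimal Python sketch. The helper names (`cache_name`, `cache_scope`, `satisfies`) are hypothetical and not part of any Taskcluster library; the only behaviour assumed is that a held scope ending in `*` covers every scope sharing that prefix. The `try` (level 1) and `mozilla-central` (level 3) projects are used purely as examples.

```python
def cache_name(level, project=None, suffix=None):
    """Build a namespaced cache name such as 'level-1-try-tc-vcs'."""
    parts = ["level", str(level)]
    if project:
        parts.append(project)
    if suffix:
        parts.append(suffix)
    return "-".join(parts)


def cache_scope(name):
    """Scope required to use the docker-worker cache with this name."""
    return "docker-worker:cache:" + name


def satisfies(held, required):
    """True if a held scope covers the required one.

    Assumption: a trailing '*' on the held scope is a prefix wildcard.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


# Scopes a level-1 role would hold under the proposal.
level_1_role = [cache_scope("level-1-*"), cache_scope("tooltool-cache")]

try_scope = cache_scope(cache_name(1, "try", "tc-vcs"))
prod_scope = cache_scope(cache_name(3, "mozilla-central", "tc-vcs"))

# A level-1 task can use its own caches but not a level-3 cache.
print(any(satisfies(s, try_scope) for s in level_1_role))
print(any(satisfies(s, prod_scope) for s in level_1_role))
```

With a single wildcard scope per level, a role no longer needs to enumerate each `build-*` and `workspace-*` cache individually.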
> * rename 'resources' to 'gaia-resources' in the gaia tree (noting that
> there are no scopes with this cache..)

It looks like this is only used when `graph` is invoked with `--full`, which it is not in the decision task, e.g.,
https://tools.taskcluster.net/task-inspector/#eeowDLcmRw2rj54TlLDfiw/

Note that that task also does not have the scope docker-worker:cache:resources anyway. So I think we can just remove the scope, on the assumption it's something old and unused.
> * add scope `docker-worker:cache:gaia-*` to `repo:github.com/mozilla-b2g/gaia:*` and `client-id:tc-vcs`

Done, partially - client-id:tc-vcs already has the only more-specific scope that it needs, so I didn't change that. I also removed some more-specific scopes from `repo:github.com/mozilla-b2g/gaia:*` after adding this one.

> * add scope `docker-worker:cache:level-1-*` to `moz-tree:level:1`, similarly levels 2 and 3

Done
Comment on attachment 8706451 [details] [review]
https://github.com/mozilla-b2g/gaia/pull/33832

Looks like this should be ok. Saw some of the tasks on treeherder coming back green, so nothing is obviously broken yet.
Attachment #8706451 - Flags: review?(garndt) → review+
Incidentally, how do I get that gaia commit landed?
Attachment #8707576 - Flags: review?(garndt)
This adds a `--level` option to taskcluster-graph, and passes the level supplied from mozilla-taskcluster. It then substitutes that into cache names for just about every cache (tooltool being the exception, as it verifies hashes and is thus immune to poisoning). The scopes for these new cache names are already included in the relevant `moz-tree:level:*` roles.

This also strips `-c6` from cache names; I added that suffix when we were transitioning from the Ubuntu-based build images, to ensure I got clean caches. It's no longer necessary.

Review commit: https://reviewboard.mozilla.org/r/30795/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/30795/
Attachment #8707625 - Flags: review?(garndt)
Attachment #8707625 - Flags: review?(garndt) → review+
Comment on attachment 8707625 [details]
MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt

https://reviewboard.mozilla.org/r/30795/#review27683

Looks good, just think this should be added to the additional taskcluster-build target as well.

::: testing/taskcluster/mach_commands.py:352
(Diff revision 1)
> + 'level': params['level'],

I think this (along with the command argument for it) needs to be duplicated for the taskcluster-build mach target that some people use while testing.

https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/mach_commands.py#628
Comment on attachment 8707625 [details] MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30795/diff/1-2/
Comment on attachment 8707625 [details] MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30795/diff/2-3/
Now that I understand the oddity I was misunderstanding in the diffs, this looks good to me.
That failed:

Jan 15 10:14:21 mozilla-taskcluster app/worker.1: "docker-worker:cache:level--mozilla-inbound-tc-vcs-public-sources"

backed out in:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e2c9a354c87

That's the scope for the decision task itself, into which the mustache variables are substituted by mozilla-taskcluster. I had passed this variable to instantiate, but incorrectly thought it just substituted all of those variables into the template -- it doesn't.

https://github.com/taskcluster/mozilla-taskcluster/pull/44

Once this has landed, I can make a try push. It doesn't prove a whole lot, but will at least demonstrate whether or not a decision task is created :)
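
The `level--mozilla-inbound` in that log line is what mustache-style rendering produces when a variable is missing: the placeholder collapses to an empty string. The Python sketch below reproduces the symptom and shows a stricter variant that fails loudly instead. It is only an illustration: the template string is an assumption about what the decision task's scope template looks like, and `render_lenient` / `render_strict` are hypothetical helpers, not the mozilla-taskcluster implementation.

```python
import re

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_lenient(template, params):
    """Mustache-style rendering: a missing variable becomes an empty string."""
    return VARIABLE.sub(lambda m: str(params.get(m.group(1), "")), template)


def render_strict(template, params):
    """Same substitution, but raise KeyError when a variable is missing."""
    def replace(match):
        name = match.group(1)
        if name not in params:
            raise KeyError("missing template variable: " + name)
        return str(params[name])
    return VARIABLE.sub(replace, template)


# Assumed shape of the decision task's cache scope template.
template = "docker-worker:cache:level-{{level}}-{{project}}-tc-vcs-public-sources"

# 'level' is not supplied, so the lenient renderer silently drops it.
print(render_lenient(template, {"project": "mozilla-inbound"}))

# With both variables supplied, the scope is well-formed.
print(render_strict(template, {"level": 1, "project": "mozilla-inbound"}))
```

A strict renderer would have turned the malformed scope into an immediate error rather than a scope that no role grants.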
Merged the changes in that pull request, and it looks like the decision task ran with the right scopes, from what I could see looking at the graph.json it created.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=ea5f655fa812
Assuming it "sticks" this time,

* remove all but `docker-worker:cache:level-N-*` and `docker-worker:cache:tooltool-cache` from the roles, after waiting a bit (I'll wait 30 days)
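
For illustration, the cleanup step could be expressed as a small filter over a role's scope list. This is a sketch only: `prune_cache_scopes` is a hypothetical helper, and the `example:unrelated-scope` entry is a made-up placeholder included to show that non-cache scopes are left alone. The cache scopes are the ones listed for `moz-tree:level:1` earlier in this bug, plus the new level wildcard.

```python
CACHE_PREFIX = "docker-worker:cache:"


def prune_cache_scopes(scopes, level):
    """Drop legacy cache scopes, keeping the level wildcard and tooltool-cache.

    Scopes that are not docker-worker cache scopes pass through untouched.
    """
    keep = {
        CACHE_PREFIX + "level-{}-*".format(level),
        CACHE_PREFIX + "tooltool-cache",
    }
    return [s for s in scopes if not s.startswith(CACHE_PREFIX) or s in keep]


level_1_scopes = [
    "docker-worker:cache:build-*",
    "docker-worker:cache:gecko-decision",
    "docker-worker:cache:linux-cache",
    "docker-worker:cache:tc-vcs",
    "docker-worker:cache:tc-vcs-public-sources",
    "docker-worker:cache:tooltool-cache",
    "docker-worker:cache:workspace-*",
    "docker-worker:cache:level-1-*",
    "example:unrelated-scope",  # hypothetical non-cache scope
]

for scope in prune_cache_scopes(level_1_scopes, 1):
    print(scope)
```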
Depends on: 1240166
Keywords: leave-open
Hm, I wonder if I jumped the gun somehow on comment 19? Joel linked to a regression search a while back that was having trouble due to the missing --level option and lack of cache permissions.
Helpful for fixing this:

curl https://auth.taskcluster.net/v1/roles/ | jq '.[] | select(.scopes[] | contains("docker-worker:cache:")) | [.roleId, .scopes]'
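
The same filter can be sketched in Python for anyone who wants to post-process the result. This assumes, as the jq expression implies, that the endpoint returns a JSON array of objects with `roleId` and `scopes` fields; the function takes the already-parsed list so it makes no network calls, and the sample data is invented for illustration. One difference worth noting: the jq `select` above emits a role once per matching scope, whereas this version lists each matching role once.

```python
def roles_with_cache_scopes(roles, needle="docker-worker:cache:"):
    """Return (roleId, scopes) for each role holding a scope containing needle."""
    return [
        (role["roleId"], role["scopes"])
        for role in roles
        if any(needle in scope for scope in role["scopes"])
    ]


# Invented sample data in the assumed response shape.
sample_roles = [
    {"roleId": "moz-tree:level:1",
     "scopes": ["docker-worker:cache:level-1-*",
                "docker-worker:cache:tooltool-cache"]},
    {"roleId": "example-role-without-caches",
     "scopes": ["example:unrelated-scope"]},
]

for role_id, scopes in roles_with_cache_scopes(sample_roles):
    print(role_id, scopes)
```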
OK, old scopes removed from the mozilla-pulse-actions client (client-id:T9J-xA9JSUKQzfR99NRtMg) and from moz-tree:level:1.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: Integration → Services