Closed Bug 1220684 Opened 4 years ago Closed 4 years ago

Namespace the docker caches

Categories

(Taskcluster :: Services, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Attachments

(3 files)

Per Bug 1216306, we need to ensure some isolation between caches using scopes, which requires some namespacing.
Blocks: 1219943
Blocks: 1226240
The existing docker-worker:cache scopes are:

docker-worker:cache:gecko-decision
docker-worker:cache:build-*
docker-worker:cache:linux-cache
docker-worker:cache:tc-vcs-public-sources
docker-worker:cache:workspace-*
docker-worker:cache:gaia-misc-caches
docker-worker:cache:tooltool-cache
docker-worker:cache:gaia-tc-vcs
docker-worker:cache:*
docker-worker:cache:gaia-linux-cache
docker-worker:cache:tc-vcs

docker-worker:cache:* is held by:
client-id:v9h-Fo_fQ3yq_-MeH6dP6w (worker-ci-tests)
client-id:dustin-docker-dev (my docker-worker unit-testing credentials)
client-id:T9J-xA9JSUKQzfR99NRtMg (mozilla-pulse-actions)

gecko has the following in its task definitions:
docker-worker:cache:build-aries-debug
docker-worker:cache:build-aries-debug-objdir-gecko-{{project}}
docker-worker:cache:build-aries-eng
docker-worker:cache:build-aries-eng-objdir-gecko-{{project}}
docker-worker:cache:build-aries-opt
docker-worker:cache:build-aries-opt-objdir-gecko-{{project}}
docker-worker:cache:build-aries-spark-dogfood
docker-worker:cache:build-aries-spark-dogfood-objdir-gecko-{{project}}
docker-worker:cache:build-aries-spark-ota-debug
docker-worker:cache:build-aries-spark-ota-debug-objdir-gecko-{{project}}
docker-worker:cache:build-aries-spark-ota-user
docker-worker:cache:build-aries-spark-ota-user-objdir-gecko-{{project}}
docker-worker:cache:build-dolphin-512-eng
docker-worker:cache:build-dolphin-512-opt
docker-worker:cache:build-dolphin-eng
docker-worker:cache:build-dolphin-opt
docker-worker:cache:build-flame-kk-debug
docker-worker:cache:build-flame-kk-debug-objdir-gecko-{{project}}
docker-worker:cache:build-flame-kk-eng
docker-worker:cache:build-flame-kk-eng-objdir-gecko-{{project}}
docker-worker:cache:build-flame-kk-opt
docker-worker:cache:build-flame-kk-opt-objdir-gecko-{{project}}
docker-worker:cache:build-flame-kk-ota-debug
docker-worker:cache:build-flame-kk-ota-debug-objdir-gecko-{{project}}
docker-worker:cache:build-flame-kk-ota-user
docker-worker:cache:build-flame-kk-ota-user-objdir-gecko-{{project}}
docker-worker:cache:build-flame-kk-spark-eng
docker-worker:cache:build-flame-kk-spark-eng-objdir-gecko-{{project}}
docker-worker:cache:build-hamachi-eng
docker-worker:cache:build-hamachi-user
docker-worker:cache:build-helix-user
docker-worker:cache:build-macosx64-st-an-workspace
docker-worker:cache:build-macosx64-workspace
docker-worker:cache:build-mulet-linux-{{project}}-workspace
docker-worker:cache:build-nexus-4-eng
docker-worker:cache:build-nexus-4-eng-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-4-kk-eng
docker-worker:cache:build-nexus-4-kk-eng-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-4-kk-ota-debug
docker-worker:cache:build-nexus-4-kk-ota-debug-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-4-kk-user
docker-worker:cache:build-nexus-4-kk-user-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-4-user
docker-worker:cache:build-nexus-4-user-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-5-l-eng
docker-worker:cache:build-nexus-5-l-eng-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-5l-ota-debug
docker-worker:cache:build-nexus-5l-ota-debug-objdir-gecko-{{project}}
docker-worker:cache:build-nexus-5-l-user
docker-worker:cache:build-nexus-5-l-user-objdir-gecko-{{project}}
docker-worker:cache:build-{{project}}-android-api-11-c6-workspace
docker-worker:cache:build-{{project}}-linux32-c6-workspace
docker-worker:cache:build-{{project}}-linux64-c6-workspace
docker-worker:cache:build-{{project}}-linux64-st-an-workspace
docker-worker:cache:build-spidermonkey-workspace
docker-worker:cache:gecko-decision
docker-worker:cache:linux-cache
docker-worker:cache:tc-vcs
docker-worker:cache:tc-vcs-public-sources
docker-worker:cache:tooltool-cache
docker-worker:cache:workspace-emulator-ics-debug
docker-worker:cache:workspace-emulator-ics-debug-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-ics-opt
docker-worker:cache:workspace-emulator-ics-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-jb-debug
docker-worker:cache:workspace-emulator-jb-debug-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-jb-opt
docker-worker:cache:workspace-emulator-jb-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-kk-debug
docker-worker:cache:workspace-emulator-kk-debug-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-kk-opt
docker-worker:cache:workspace-emulator-kk-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-kk-x86-debug
docker-worker:cache:workspace-emulator-kk-x86-debug-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-kk-x86-opt
docker-worker:cache:workspace-emulator-kk-x86-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-l-debug
docker-worker:cache:workspace-emulator-l-debug-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-l-opt
docker-worker:cache:workspace-emulator-l-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-emulator-l-x86-opt
docker-worker:cache:workspace-emulator-l-x86-opt-objdir-gecko-{{project}}
docker-worker:cache:workspace-{{project}}-b2g-desktop-objects-debug
docker-worker:cache:workspace-{{project}}-b2g-desktop-objects-opt

and gaia is far simpler:
docker-worker:cache:gaia-linux-cache
docker-worker:cache:gaia-misc-caches
docker-worker:cache:gaia-tc-vcs
docker-worker:cache:resources
The idea here is to prevent cache-name collisions, avoid cache poisoning, and reduce the number of distinct scopes that must be mentioned in roles.

The collision and poisoning issues are mitigated by caches being specific to a workerType.  For example, the `tc-vcs` cache for github-worker can never be confused with the `tc-vcs` cache on the gecko-decision workerType.

However, we do have an issue with potential poisoning of production builds by try builds.  Currently, roles moz-tree:level:{1,2,3} all have

docker-worker:cache:build-*
docker-worker:cache:gecko-decision
docker-worker:cache:linux-cache
docker-worker:cache:tc-vcs
docker-worker:cache:tc-vcs-public-sources
docker-worker:cache:tooltool-cache
docker-worker:cache:workspace-*
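
To make the poisoning concern concrete, here is a minimal sketch of how these scopes are satisfied. It assumes the usual Taskcluster rule that a trailing `*` matches any suffix and anything else must match exactly. The helper names (`satisfies`, `role_can_use`) are hypothetical, not part of any Taskcluster library.

```python
def satisfies(held: str, required: str) -> bool:
    """Return True if a held scope covers the required scope.

    Assumption: a trailing '*' matches any suffix; otherwise the two
    scopes must be identical.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def role_can_use(role_scopes, cache_name: str) -> bool:
    """Check whether any scope in the role allows mounting the named cache."""
    required = "docker-worker:cache:" + cache_name
    return any(satisfies(scope, required) for scope in role_scopes)


# A subset of the scopes shared by moz-tree:level:1, 2 and 3 today.
SHARED = [
    "docker-worker:cache:build-*",
    "docker-worker:cache:workspace-*",
    "docker-worker:cache:tc-vcs",
]

# Every level holds the same scopes, so a try (level 1) task can mount
# the very cache a production (level 3) build uses.
print(role_can_use(SHARED, "build-spidermonkey-workspace"))

# With a per-level prefix, a level-1 role no longer reaches level-3 caches.
print(role_can_use(["docker-worker:cache:level-1-*"],
                   "level-3-mozilla-central-tc-vcs"))
```

The first check succeeds and the second fails, which is the isolation the namespacing is meant to provide.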

In general, I think it's best to hold close to the tree-level model where possible - it's simple and well-understood.  Gaia doesn't seem to use it, and fixing that is certainly out of scope here.

What I'm proposing, then, is:

 * gaia-* -> all Gaia-related caches
 * level-N-* for each SCM level N
   * level-N-{{project}}-* for a particular project (generally further separated by task type)
     * level-N-{{project}}-decision for the decision task's workspace
     * level-N-{{project}}-tc-vcs
     * level-N-{{project}}-tc-vcs-public-sources
     * level-N-{{project}}-python to replace linux-cache
 * tooltool-cache can remain (tooltool is resistant to poisoning as it checks sha512 hashes)

so,

 * rename 'resources' to 'gaia-resources' in the gaia tree (noting that there are no scopes with this cache..)
 * add scope `docker-worker:cache:gaia-*` to `repo:github.com/mozilla-b2g/gaia:*` and `client-id:tc-vcs`
 * add scope `docker-worker:cache:level-1-*` to `moz-tree:level:1`, similarly levels 2 and 3
 * rename all of the in-tree caches appropriately as described
   * the decision task will need to know its level for this to work
 * remove all but `docker-worker:cache:level-N-*` and `docker-worker:cache:tooltool-cache` from the roles, after waiting a bit
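
As a rough sketch of the renaming described above: the mapping below is my reading of the proposal (for example, that `gecko-decision` becomes the `decision` suffix), and `new_cache_name` is a hypothetical helper, not code from the tree.

```python
# Old cache name -> new suffix, as read from the proposal above.
# The gecko-decision -> decision entry is an assumption.
RENAMES = {
    "gecko-decision": "decision",
    "linux-cache": "python",
    "tc-vcs": "tc-vcs",
    "tc-vcs-public-sources": "tc-vcs-public-sources",
}


def new_cache_name(old: str, level: int, project: str) -> str:
    """Map an old cache name to its namespaced equivalent."""
    if old == "tooltool-cache":
        # tooltool verifies sha512 hashes, so its cache stays shared.
        return old
    if old.startswith("gaia-"):
        # Gaia caches already live under the gaia-* namespace.
        return old
    if level not in (1, 2, 3):
        raise ValueError(f"unknown SCM level: {level}")
    return f"level-{level}-{project}-{RENAMES[old]}"


print(new_cache_name("linux-cache", 1, "try"))
print(new_cache_name("tc-vcs", 3, "mozilla-central"))
print(new_cache_name("tooltool-cache", 3, "mozilla-central"))
```

A role holding `docker-worker:cache:level-1-*` would then cover every name this returns for level 1, whatever the project.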
>  * rename 'resources' to 'gaia-resources' in the gaia tree (noting that
> there are no scopes with this cache..)

It looks like this is only used when `graph` is invoked with `--full`, which is not the case in the decision task; see, e.g.,
  https://tools.taskcluster.net/task-inspector/#eeowDLcmRw2rj54TlLDfiw/
Note that this task does not have the scope docker-worker:cache:resources anyway.

So I think we can just remove the scope, on the assumption it's something old and unused.
>  * add scope `docker-worker:cache:gaia-*` to `repo:github.com/mozilla-b2g/gaia:*` and `client-id:tc-vcs`

Done, partially - client-id:tc-vcs already has the only more-specific scope that it needs, so I didn't change that.  I also removed some more-specific scopes from `repo:github.com/mozilla-b2g/gaia:*` after adding this one.

>  * add scope `docker-worker:cache:level-1-*` to `moz-tree:level:1`, similarly levels 2 and 3

Done
Comment on attachment 8706451 [details] [review]
https://github.com/mozilla-b2g/gaia/pull/33832

Looks like this should be OK.  I saw some of the tasks on Treeherder coming back green, so nothing is obviously broken yet.
Attachment #8706451 - Flags: review?(garndt) → review+
Incidentally, how do I get that gaia commit landed?
Attachment #8707576 - Flags: review?(garndt)
This adds a `--level` option to taskcluster-graph, and passes the level
supplied from mozilla-taskcluster.  It then substitutes that into cache names
for just about every cache (tooltool being the exception, as it verifies hashes
and is thus immune to poisoning).  The scopes for these new cache names are
already included in the relevant `moz-tree:level:*` roles.

This also strips `-c6` from cache names; I added this when we were
transitioning from the Ubuntu-based build images, to ensure I got clean caches.
It's no longer necessary.
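
For illustration only, the flow might look roughly like the sketch below. It uses `argparse` as a stand-in for the real mach command plumbing, and the template strings and function names are hypothetical, not the ones in the patch.

```python
import argparse


def parse_args(argv):
    """Hypothetical stand-in for the taskcluster-graph command line."""
    parser = argparse.ArgumentParser(prog="taskcluster-graph")
    parser.add_argument("--project", required=True)
    # The level is supplied by the caller (mozilla-taskcluster in production).
    parser.add_argument("--level", required=True, choices=["1", "2", "3"])
    return parser.parse_args(argv)


# Illustrative cache-name templates; tooltool-cache carries no level
# because it verifies hashes.
CACHE_TEMPLATES = [
    "level-{{level}}-{{project}}-tc-vcs",
    "level-{{level}}-{{project}}-decision",
    "tooltool-cache",
]


def render(template: str, params: dict) -> str:
    """Substitute {{name}} placeholders with the given parameters."""
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def cache_names(argv):
    """Parse the command line and return the rendered cache names."""
    args = parse_args(argv)
    params = {"project": args.project, "level": args.level}
    return [render(t, params) for t in CACHE_TEMPLATES]


print(cache_names(["--project", "mozilla-inbound", "--level", "3"]))
```

Making `--level` required in this sketch means a caller that forgets it fails immediately rather than producing a cache name without a level.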

Review commit: https://reviewboard.mozilla.org/r/30795/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/30795/
Attachment #8707625 - Flags: review?(garndt)
Attachment #8707625 - Flags: review?(garndt) → review+
Comment on attachment 8707625 [details]
MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt

https://reviewboard.mozilla.org/r/30795/#review27683

Looks good; I just think this should be added to the additional taskcluster-build target as well.

::: testing/taskcluster/mach_commands.py:352
(Diff revision 1)
> +            'level': params['level'],

I think this (along with the command argument for it) needs to be duplicated for the taskcluster-build mach target that some people use while testing.

https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/mach_commands.py#628
Comment on attachment 8707625 [details]
MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30795/diff/1-2/
Comment on attachment 8707625 [details]
MozReview Request: Bug 1220684: use namespaced docker-worker caches; r?garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30795/diff/2-3/
Now that I understand the oddity in the diffs that had confused me, this looks good to me.
That failed:

 Jan 15 10:14:21 mozilla-taskcluster app/worker.1:                 "docker-worker:cache:level--mozilla-inbound-tc-vcs-public-sources" 

backed out in:
 https://hg.mozilla.org/integration/mozilla-inbound/rev/3e2c9a354c87

That's the scope for the decision task itself, into which the mustache variables are substituted by mozilla-taskcluster.  I had passed this variable to instantiate, but incorrectly thought it just substituted all of those variables into the template -- it doesn't.
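
A small sketch of how the `level--` in that scope comes about: mustache-style rendering replaces an unknown variable with an empty string instead of failing. The template below is inferred from the log line, and both render functions are hypothetical illustrations, not mozilla-taskcluster code.

```python
import re

VAR = re.compile(r"\{\{(\w+)\}\}")


def render_lenient(template: str, params: dict) -> str:
    """Mustache-like behaviour: a missing variable renders as ''."""
    return VAR.sub(lambda m: str(params.get(m.group(1), "")), template)


def render_strict(template: str, params: dict) -> str:
    """Raise instead of silently dropping a missing or empty variable."""
    def replace(match):
        name = match.group(1)
        if not params.get(name):
            raise KeyError(f"missing template variable: {name}")
        return str(params[name])
    return VAR.sub(replace, template)


# Template inferred from the failing scope in the log above.
TEMPLATE = "docker-worker:cache:level-{{level}}-{{project}}-tc-vcs-public-sources"

# 'level' never reached the template, so the scope comes out malformed.
print(render_lenient(TEMPLATE, {"project": "mozilla-inbound"}))

# A strict renderer would have surfaced the mistake before any task ran.
try:
    render_strict(TEMPLATE, {"project": "mozilla-inbound"})
except KeyError as exc:
    print("error:", exc)
```

The lenient call reproduces the `docker-worker:cache:level--mozilla-inbound-tc-vcs-public-sources` string from the log.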

https://github.com/taskcluster/mozilla-taskcluster/pull/44

Once this has landed, I can make a try push.  It doesn't prove a whole lot, but will at least demonstrate whether or not a decision task is created :)
Merged the changes in that pull request, and it looks like the decision task ran with the right scopes, from what I could see looking at the generated graph.json.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=ea5f655fa812
Assuming it "sticks" this time,

 * remove all but `docker-worker:cache:level-N-*` and `docker-worker:cache:tooltool-cache` from the roles, after waiting a bit

(I'll wait 30 days)
Depends on: 1240166
Keywords: leave-open
Hm, I wonder if I jumped the gun somehow on comment 19?  Joel linked to a regression search a while back that was having trouble due to the missing --level option and lack of cache permissions.
Helpful for fixing this:

curl https://auth.taskcluster.net/v1/roles/ | jq '.[] | select(.scopes[] | contains("docker-worker:cache:")) | [.roleId, .scopes]'
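
The same filter can be sketched in Python, here run against made-up sample data rather than the live endpoint. The response shape (a list of objects with `roleId` and `scopes`) is assumed from the jq expression, and `example-role` and its scope are invented for the example.

```python
import json


def roles_with_cache_scopes(roles, needle="docker-worker:cache:"):
    """Return [roleId, scopes] for each role holding a matching scope.

    Each role is listed once here, whereas the jq filter above may print
    a role once per matching scope.
    """
    return [
        [role["roleId"], role["scopes"]]
        for role in roles
        if any(needle in scope for scope in role.get("scopes", []))
    ]


# Made-up sample in the assumed response shape.
SAMPLE = json.loads("""
[
  {"roleId": "moz-tree:level:1",
   "scopes": ["docker-worker:cache:level-1-*",
              "docker-worker:cache:tooltool-cache"]},
  {"roleId": "example-role",
   "scopes": ["queue:create-task:example"]}
]
""")

for role_id, scopes in roles_with_cache_scopes(SAMPLE):
    print(role_id, scopes)
```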
OK, old scopes removed from the mozilla-pulse-actions client (client-id:T9J-xA9JSUKQzfR99NRtMg) and from moz-tree:level:1.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: Integration → Services