Closed
Bug 1171809
Opened 9 years ago
Closed 9 years ago
docker-worker: Listen to pulse exchange to blow away (clobber) named caches (proposal)
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jonasfj, Assigned: wcosta)
Attachments
(1 file)
Sheriffs want a clobber service... Not sure exactly what form it should take yet. But maybe it's just that docker-worker listens to a pulse exchange and then gets a message routed with <provisionerId>.<workerType>, which tells it which named cache to clear. I suspect the idea of some form of named cache might persist between different worker implementations, so a generic way to signal a cache purge might not be bad. I partially think that us having cache poisoning issues means we cache too much. @garndt, what do you think? (I'm not sure this isn't an anti-pattern.)
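A minimal sketch of the routing idea above, assuming a purge message carries a provisioner id, a worker type, and a cache name. The `PurgeRequest` class, its field names, and the example ids are hypothetical and not taken from any Taskcluster API; only the <provisionerId>.<workerType> routing form comes from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurgeRequest:
    """Hypothetical purge message; the field names are assumptions."""
    provisioner_id: str
    worker_type: str
    cache_name: str


def routing_key(provisioner_id: str, worker_type: str) -> str:
    # Routing key in the <provisionerId>.<workerType> form proposed above.
    return f"{provisioner_id}.{worker_type}"


def should_handle(request: PurgeRequest, my_provisioner_id: str, my_worker_type: str) -> bool:
    # A worker only acts on messages addressed to its own provisioner and
    # worker type. The fields are compared as a pair rather than as joined
    # strings, so an id containing a dot cannot cause a false match.
    return (request.provisioner_id, request.worker_type) == (
        my_provisioner_id,
        my_worker_type,
    )


# Example ids below are made up for illustration.
request = PurgeRequest("example-provisioner", "example-worker", "example-cache")
binding = routing_key(request.provisioner_id, request.worker_type)
```

A worker would bind its queue with its own routing key and, as a second check, call `should_handle` before touching any cache.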
Flags: needinfo?(garndt)
Comment 1•9 years ago
See http://gittup.org/blog/2014/03/6-clobber-builds-part-1---missing-dependencies/ for inspiration
Comment 2•9 years ago
I think a combination of using per-branch caches and clobbering caches is a good step toward reducing issues that we've seen in things like bug 1154669. I also want to implement some disk stats soon on these workers to understand the impact of having more cache directories, but that's independent of this bug. The worker could listen to that exchange, and then either remove the cache completely from the volumes that can be used for tasks, or, if it's currently in use, mark it dirty so that it is removed once it's freed.
Flags: needinfo?(garndt)
Comment 3•9 years ago
Does this bug cover adding Taskcluster jobs to the Clobberer as well so this infra is actually usable? Right now, those jobs aren't listed, so we can't clobber them even if it's in theory supported. https://api.pub.build.mozilla.org/clobberer
Reporter
Comment 4•9 years ago
I don't know the clobberer service, but judging from the docs: https://api.pub.build.mozilla.org/docs/usage/clobberer/ I would say that it wouldn't integrate well. But it could probably be something similar.
Comment 5•9 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #3)
> Does this bug cover adding Taskcluster jobs to the Clobberer as well so this
> infra is actually usable? Right now, those jobs aren't listed, so we can't
> clobber them even if it's in theory supported.
> https://api.pub.build.mozilla.org/clobberer
That depends on the approach they decide to take. I'm not sure if they'd want to use clobberer (the utility) here or not. If we'd like to keep clobbering tied to a single interface, that would be the way to do it; but that approach may seem limiting within the context of TC.
Comment 6•9 years ago
I can say with certainty that sheriffs want to be able to clobber TC build slaves the same as buildbot slaves (as in, forced objdir deletion). If we need a new bug to track it, so be it.
Reporter
Comment 7•9 years ago
@RyanVM, can you describe the current workflow for purging caches? What button do you click, and where? What link? What details do you enter?
Comment 8•9 years ago
We discussed at length on IRC and I think we're all on the same page now. This bug tracks TC being made capable of performing a clobber task and I've filed bug 1174263 for updating the clobberer tool to properly communicate to TC that a clobber has been requested for a given tree/platform.
Reporter
Comment 9•9 years ago
I've deployed a taskcluster-purge-cache service that will publish a pulse message. Once DNS is up, I'll publish docs, add it to docs.tc.net and build a quick tool for purging caches per workerType.
Reporter
Comment 10•9 years ago
Docs deployed, will be updated when DNS is configured: http://docs.taskcluster.net/services/purge-cache/
Comment 12•9 years ago
Looking over the docs, it appears that the following needs to be implemented within docker-worker:
1. Listen for events on exchange/taskcluster-purge-cache/v1/purge-cache
2. On event, if workertype and provisioner id match, then:
   a. If cachename exists and is not mounted, remove it
   b. If the cache is currently mounted, mark the cache as purged and remove it when the volume is released
   c. Never allow tasks to volume mount a cache marked for purging
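The purge rules in step 2 can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not docker-worker's actual code: the `CacheManager` class and its method names are hypothetical, listening on the exchange (step 1) is left out, and a cache is assumed to be mounted by at most one task at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CacheManager:
    """Hypothetical model of the named caches on a single worker."""
    provisioner_id: str
    worker_type: str
    caches: Dict[str, bool] = field(default_factory=dict)  # name -> mounted?
    purged: Set[str] = field(default_factory=set)          # marked for purging

    def on_purge(self, provisioner_id: str, worker_type: str, cache_name: str) -> None:
        # Step 2: ignore events addressed to another provisioner or worker type.
        if (provisioner_id, worker_type) != (self.provisioner_id, self.worker_type):
            return
        if cache_name not in self.caches:
            return
        if self.caches[cache_name]:
            # 2b: the cache is mounted, so only mark it; removal happens on release.
            self.purged.add(cache_name)
        else:
            # 2a: the cache exists and is not mounted, so remove it right away.
            del self.caches[cache_name]

    def mount(self, cache_name: str) -> bool:
        # 2c: never hand out a cache that is marked for purging.
        if cache_name in self.purged:
            return False
        # Assumption: a cache that is already mounted cannot be mounted again.
        if self.caches.get(cache_name, False):
            return False
        self.caches[cache_name] = True  # creates the cache if it did not exist
        return True

    def release(self, cache_name: str) -> None:
        if cache_name in self.purged:
            # Deferred removal for a cache that was purged while mounted (2b).
            self.purged.discard(cache_name)
            self.caches.pop(cache_name, None)
        elif cache_name in self.caches:
            self.caches[cache_name] = False
```

In this model a purge that arrives while a task holds the cache never interrupts the task; the cache just becomes unavailable to new tasks and disappears once it is released.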
Updated•9 years ago
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Reporter
Comment 13•9 years ago
This might also be enough that we don't need to do bug 1151605, which is probably a more in-tree driven approach to fixing this.
See Also: → 1151605
Updated•9 years ago
Assignee: nobody → wcosta
Updated•9 years ago
Summary: docker-worker: Listen to pulse exchange to blow away named caches (proposal) → docker-worker: Listen to pulse exchange to blow away (clobber) named caches (proposal)
Assignee
Updated•9 years ago
Status: NEW → ASSIGNED
Comment 15•9 years ago
Comment on attachment 8667447
PR 72
This is definitely coming together. Awesome work picking this up so quickly. Docker worker definitely isn't the easiest to get started with. I left some comments on the PR and the CI tests have a couple of failures that need addressing. Feel free to reflag me once those are addressed. Thanks!
Attachment #8667447 - Flags: review?(garndt) → review-
Assignee
Comment 16•9 years ago
Comment on attachment 8667447
PR 72
All comments addressed.
Attachment #8667447 - Flags: review- → review?(garndt)
Assignee
Comment 17•9 years ago
Even before the patch, not all tests pass [1]. Running the tests under worker-ci locally, the purge cache tests pass if I supply the right pulse credentials. I have no idea why they are failing when pushed to the PR. [1] https://pastebin.mozilla.org/8848314
Flags: needinfo?(garndt)
Comment 18•9 years ago
It's possible that a lot of those failures are fallout from me rotating creds this morning. I have restarted the app along with auth...could you try again?
Flags: needinfo?(garndt)
Comment 19•9 years ago
Comment on attachment 8667447
PR 72
Nice work on this. Comments have been addressed in the PR.
Attachment #8667447 - Flags: review?(garndt) → review+
Comment 20•9 years ago
https://github.com/taskcluster/docker-worker/commit/3099b91c3d8dd6fc42c82c8bf3b312959ad89bef
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: Docker-Worker → Workers