Closed
Bug 1353508
Opened 7 years ago
Closed 7 years ago
Create a mechanism for loaning Mac OS X TC workers
Categories
(Release Engineering :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
Details
Users should be able to "borrow" TC workers for debugging test failures, etc. Since these run exclusively on releng hardware, I'm hoping we can just modify the existing releng loaner process to suit TaskCluster.
Comment 1•7 years ago (Assignee)
The loaner process is at https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave
I think we'll only need to modify the cleaning process. Maybe https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Loan_a_Slave&diff=prev&oldid=1167558 is enough?
Kim, are you the right person to ask about buildduty stuff? Or do you at least know who the right person is? :D
Flags: needinfo?(kmoir)
Comment 2•7 years ago
So these TC workers are just ones that are going to have jobs scheduled on them via bbb? Is that correct? If so, they can be disabled via slavealloc, and the documentation changes look good. If they are not scheduled through bbb but are pure TC workers, how are they kept from being used by production jobs?
Flags: needinfo?(kmoir)
Comment 3•7 years ago (Assignee)
Good point. They would be disabled, I think, simply by not having the worker installed. We don't have a good way to prevent a host from running tasks if the borrower is keen on doing so. That said, these are only testers, so stealing a few test jobs is not the end of the world. I should include some updates on how to select a worker in the first place and wait until it is idle. On second thought, rather than waiting until it's idle, I'll just suggest rebooting it; that may terminate a job with claim-expired, but the job will be automatically re-run. Thanks!
Comment 4•7 years ago (Assignee)
https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Loan_a_Slave&diff=1167656&oldid=1167558
** For TaskCluster, interrupted jobs are automatically retried, so pick an arbitrary host, kill the generic-worker or taskcluster-worker process, and proceed.
Hopefully that's sufficient :) As releng develops more tooling around TaskCluster on hardware (slave-health, for example), maybe we can do a better job of disabling workers. Kim, does that sound reasonable?
Flags: needinfo?(kmoir)
Comment 5•7 years ago
I think you will need some more text around:
"pick an arbitrary host" <- pick a machine from the existing Mac test pool
"kill the generic-worker or taskcluster-worker process" <- i.e. ssh as root to the machine and kill the generic-worker or taskcluster-worker process
Are there new steps needed to return the loaner to service after the developer is done with it? Just to make it clear for the buildduty folks.
Flags: needinfo?(kmoir)
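As a rough illustration only, the loan steps discussed in comments 4 and 5 (pick a host from the Mac test pool, ssh in as root, kill the worker process) could be scripted along these lines. This is a minimal Python sketch under stated assumptions: the hostnames are hypothetical placeholders, the helpers pick_loaner and kill_worker_command are invented for illustration, and it assumes the worker runs under the exact process name generic-worker or taskcluster-worker so that pkill -x matches it. It only builds the command and does not execute it.

```python
import random
import shlex

# Hypothetical placeholder hostnames: the bug does not list the real Mac test pool.
MAC_TEST_POOL = [
    "mac-tester-01.example",
    "mac-tester-02.example",
    "mac-tester-03.example",
]

# Worker process names mentioned in the loaner instructions.
# Assumption: these are the exact process names seen by pkill.
WORKER_PROCESSES = ("generic-worker", "taskcluster-worker")


def pick_loaner(pool, rng=random):
    """Pick an arbitrary host; interrupted tasks are retried, so any host will do."""
    return rng.choice(pool)


def kill_worker_command(host):
    """Build (but do not run) an ssh-as-root command that kills both worker processes."""
    # "; " keeps going even if one of the two workers is not running on this host.
    remote = "; ".join(f"pkill -x {shlex.quote(name)}" for name in WORKER_PROCESSES)
    return ["ssh", f"root@{host}", remote]


if __name__ == "__main__":
    host = pick_loaner(MAC_TEST_POOL)
    # Print the command for review; pass the list to subprocess.run() to execute it.
    print(shlex.join(kill_worker_command(host)))
```

Building the command as an argument list rather than a single shell string keeps the hostname from being reinterpreted by a local shell. Returning the host to service afterwards (Kim's open question) is not covered by this sketch.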
Comment 6•7 years ago (Assignee)
Thanks! I think I addressed all of that.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: General Automation → General