
Create a mechanism for loaning Mac OS X TC workers

RESOLVED FIXED

Status

Release Engineering
General Automation
6 months ago
6 months ago

People

(Reporter: dustin, Assigned: dustin)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

6 months ago
Users should be able to "borrow" TC workers to debug test failures, etc.

Since these are exclusively releng hardware, I'm hoping we can just modify the existing releng loaner process to suit taskcluster.
(Assignee)

Comment 1

6 months ago
The loaner process is at
  https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave

I think we'll only need to modify the cleaning process.  Maybe
  https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Loan_a_Slave&diff=prev&oldid=1167558
is enough?

Kim, are you the right person to ask about buildduty stuff?  Or at least, know who is the right person? :D
Flags: needinfo?(kmoir)

Comment 2

6 months ago
So these tc workers are not just ones that will have jobs scheduled on them via bbb, is that correct?  If they are scheduled via bbb, they can be disabled via slavealloc, and in that case the documentation changes look good.

If they are not scheduled through bbb, but are pure tc workers, how are they disabled from being used by production jobs?
Flags: needinfo?(kmoir)
(Assignee)

Comment 3

6 months ago
Good point.  They would be disabled, I think, simply by not having the worker installed.  We don't have a good way to prevent a host from running tasks if the borrower is keen on doing so.  That said, these are only testers, so stealing a few test jobs is not the end of the world.

I should include some updates on how to select a worker in the first place and wait until it is idle.  I think that, rather than waiting until it's idle, I'll just suggest rebooting it -- that may terminate a job with claim-expired, but it will be automatically re-run.

Thanks!
(Assignee)

Comment 4

6 months ago
https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Loan_a_Slave&diff=1167656&oldid=1167558
** For TaskCluster, interrupted jobs are automatically retried, so pick an arbitrary host, kill the generic-worker or taskcluster-worker process, and proceed.

Hopefully that's sufficient :)

As releng develops more tooling around taskcluster on hardware (slave-health, for example), maybe we can do a somewhat better job of disabling workers.

Kim, does that sound reasonable?
Flags: needinfo?(kmoir)

Comment 5

6 months ago
I think you will need some more text around
"pick an arbitrary host" <- pick a machine from the existing Mac test pool
"kill the generic-worker or taskcluster-worker process" <- so ssh as root to the machine, and kill the generic-worker or taskcluster-worker process

Are there new steps that are needed after the developer is done with the loaner to return it to service?

Just to make it clear for the buildduty folks.
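
For illustration only, the "ssh as root and kill the worker process" step might be sketched as below. This is not taken from the wiki page: the host name is a placeholder, the build_kill_commands helper is hypothetical, and it assumes pkill is available on the loaned machine and that the processes are named exactly generic-worker and taskcluster-worker. The sketch only builds the ssh command lines; it does not run them.

```python
import shlex

# Process names as quoted from the wiki text in this bug.
WORKER_PROCESSES = ("generic-worker", "taskcluster-worker")


def build_kill_commands(host, user="root"):
    """Return one ssh argv list per worker process name.

    Hypothetical helper: assumes pkill exists on the remote host and
    that the worker processes carry exactly these names.
    """
    commands = []
    for name in WORKER_PROCESSES:
        # -x asks pkill to match the process name exactly.
        remote = "pkill -x " + shlex.quote(name)
        commands.append(["ssh", f"{user}@{host}", remote])
    return commands


if __name__ == "__main__":
    # Placeholder host name; substitute a machine from the Mac test pool.
    for argv in build_kill_commands("mac-test-host.example.com"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than executing them lets whoever is on buildduty review them first. Only one of the two processes is expected to be present on a given host, so one of the pkill calls would normally find nothing to kill.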
Flags: needinfo?(kmoir)
(Assignee)

Comment 6

6 months ago
Thanks!  I think I addressed all of that.
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED