[tracker] use worker-runner everywhere
Categories
(Taskcluster :: Operations and Service Requests, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Unassigned)
References
(Depends on 1 open bug)
Details
We want to ensure that all workers everywhere are using worker-runner, whether in static mode or via a worker-manager provider. This will unlock a bunch of improvements we want to make to worker management, including making worker-manager aware of all of the running workers and allowing more sophisticated tracking of worker status.
This will include:
- on-prem hardware
- bitbar
- packet
- community workers
It does not necessarily include scriptworkers, since those have a different sort of lifecycle. It wouldn't hurt, though!
Comment 1 • 5 years ago (Reporter)
Andrew, can you help me flesh out the list of "kinds" of workers in firefox-ci that might need attention? What flavors of hardware exist these days?
Also, do you see any complications in setting up the bitbar workers to operate with worker-runner and a "static" worker-manager provider?
Comment 2 • 5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #1)
> Andrew, can you help me flesh out the list of "kinds" of workers in firefox-ci that might need attention? What flavors of hardware exist these days?
Here are the types of hardware workers that Relops runs that I know of:
- Android HW @ Bitbar (Pixel 2 and Moto G5)
- Windows/ARM laptops @ Bitbar (cc mcornmesser)
- HP Moonshot devices (Linux and Windows, cc dhouse and mcornmesser)
- Mac Minis (cc dhouse)
> Also, do you see any complications in setting up the bitbar workers to operate with worker-runner and a "static" worker-manager provider?
I don't know enough about worker-runner or static worker-manager providers yet, but from looking at the features worker-runner provides:
- The android-hw workers at Bitbar currently receive credentials and configuration from devicepool. I'll have to figure out what devicepool will continue to provide and what worker-runner will take over.
- The android-hw workers at Bitbar aren't dynamically allocated (we have fixed-size pools for each type of job) and never get pre-empted, so interaction with worker-manager would be minimal.
- The android-hw workers are short-lived. They exit after one run and are returned to the Bitbar pool after cleaning. This could confuse the worker metrics.
Comment 3 • 5 years ago (Reporter)
Got it, thanks. Please do have a look at worker-runner and at the static worker-manager provisioner. I think it could work for all of those cases, and simplify things a bit. I'm happy to chat about it if you have questions.
Comment 4
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #3)
> Got it, thanks. Please do have a look at worker-runner and at the static worker-manager provisioner. I think it could work for all of those cases, and simplify things a bit. I'm happy to chat about it if you have questions.
Is there a planning document I can read? Also, could we basically treat worker-runner as the generic-worker, knowing that it spawns the task runner (the actual "generic-worker")? If so, we would then need to monitor both processes.
Comment 5 • 5 years ago (Reporter)
The README is the documentation for worker-runner. The worker-manager docs are at https://docs.taskcluster.net/docs/reference/core/worker-manager/static.
Yes, you can basically just run start-worker instead of generic-worker. The config and calling syntax are different, but other than that it should function identically. Yes, monitoring might need some updates.
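To make the "different config and calling syntax" point a little more concrete, here is a minimal Python sketch that builds a runner config for a static provider and the command that would replace a direct generic-worker invocation. The field names (providerType, rootURL, providerID, workerPoolID, workerGroup, workerID, staticSecret, implementation, path, configPath) and the `start-worker <config>` calling form are assumptions based on my reading of the worker-runner README, and every value passed in is a hypothetical placeholder. Check the README before relying on any of it.

```python
import json


def build_static_runner_config(root_url, provider_id, worker_pool_id,
                               worker_group, worker_id, static_secret,
                               worker_path, worker_config_path):
    """Return a dict shaped like a worker-runner config for a static provider.

    All key names are assumptions taken from the worker-runner README;
    verify them against the version you deploy.
    """
    return {
        "provider": {
            "providerType": "static",
            "rootURL": root_url,
            "providerID": provider_id,
            "workerPoolID": worker_pool_id,
            "workerGroup": worker_group,
            "workerID": worker_id,
            "staticSecret": static_secret,
        },
        "worker": {
            # worker-runner launches this binary in place of a direct call
            "implementation": "generic-worker",
            "path": worker_path,
            "configPath": worker_config_path,
        },
    }


def render_config(config):
    """Serialize the config as JSON, which most YAML parsers also accept."""
    return json.dumps(config, indent=2, sort_keys=True)


def start_worker_command(runner_config_path):
    """Argument list for launching worker-runner (assumed calling form)."""
    return ["start-worker", runner_config_path]
```

A service definition (systemd unit, launchd plist, Windows service) would then run the result of `start_worker_command(...)` where it previously ran generic-worker directly, and monitoring would need to watch both the runner process and the worker it spawns.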
Comment 6 • 5 years ago (Reporter)
^^ to add some detail to that:
https://github.com/taskcluster/taskcluster-worker-runner#generic-worker documents how to start generic-worker, including a link to a suggestion for how to design the service. https://github.com/taskcluster/taskcluster-worker-runner/blob/master/docs/deployment.md talks about how to deploy worker-runner itself.
You can see some examples for both Linux and Windows at
https://github.com/mozilla/community-tc-config/blob/822e6f85ce954ca8fc3dd285dbcd116424f41583/imagesets/generic-worker-ubuntu-18-04/bootstrap.sh#L51-L94
https://github.com/mozilla/community-tc-config/blob/822e6f85ce954ca8fc3dd285dbcd116424f41583/imagesets/generic-worker-win2012r2/bootstrap.ps1#L75-L141
I expect Mac startup to be similar to Linux, but I don't have the details. A PR for the taskcluster-worker-runner repo would be helpful if there are launchd plists or anything like that to share.
Comment 7 • 5 years ago
PR to make bitbar use taskcluster-worker-runner: https://github.com/bclary/mozilla-bitbar-docker/pull/40
Comment 9 • 5 years ago (Reporter)
Overall, this project is blocked on better lifecycle management.