Open Bug 1602946 Opened 5 years ago Updated 4 years ago

[tracker] use worker-runner everywhere

Categories

(Taskcluster :: Operations and Service Requests, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

People

(Reporter: dustin, Unassigned)

References

(Depends on 1 open bug)

Details

We want to ensure that all workers everywhere are using worker-runner, whether in static mode or via a worker-manager provider. This will unlock a bunch of improvements we want to make to worker management, including making worker-manager aware of all of the running workers and allowing more sophisticated tracking of worker status.

This will include:

  • on-prem hardware
  • bitbar
  • packet
  • community workers

It does not necessarily include scriptworkers, since those have a different sort of lifecycle. It wouldn't hurt, though!

Andrew, can you help me flesh out the list of "kinds" of workers in firefox-ci that might need attention? What flavors of hardware exist these days?

Also, do you see any complications in setting up the bitbar workers to operate with worker-runner and a "static" worker-manager provider?

Flags: needinfo?(aerickson)

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #1)

> Andrew, can you help me flesh out the list of "kinds" of workers in firefox-ci that might need attention? What flavors of hardware exist these days?

Here are the types of hardware workers that Relops runs that I know of:

  • Android HW @ Bitbar (Pixel 2 and Moto G5)
  • Windows/ARM laptops @ Bitbar (cc mcornmesser)
  • HP Moonshot devices (Linux and Windows, cc dhouse and mcornmesser)
  • Mac Minis (cc dhouse)

> Also, do you see any complications in setting up the bitbar workers to operate with worker-runner and a "static" worker-manager provider?

I don't know enough about worker-runner or static worker-manager providers yet, but from looking at the features worker-runner provides:

  • The android-hw workers at Bitbar currently receive their credentials and configuration from devicepool, so I'll have to figure out what devicepool will continue to provide and what worker-runner will take over.
  • The android-hw workers at Bitbar aren't dynamically allocated (we have fixed-size pools for each type of job) and never get pre-empted, so interaction with the worker-manager would be minimal.
  • The android-hw workers are short-lived: they exit after one run and are returned to the Bitbar pool after cleaning. This could confuse the worker metrics.
Flags: needinfo?(aerickson)
Depends on: 1558532
No longer depends on: 1558534

Got it, thanks. Please do have a look at worker-runner and at the static worker-manager provisioner. I think it could work for all of those cases, and simplify things a bit. I'm happy to chat about it if you have questions.

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #3)

> Got it, thanks. Please do have a look at worker-runner and at the static worker-manager provisioner. I think it could work for all of those cases, and simplify things a bit. I'm happy to chat about it if you have questions.

Is there a planning document I can read? Also, could we basically treat worker-runner as the generic-worker, knowing that it spawns the task runner (the actual "generic-worker"), so that we'd then need to monitor both processes?

The README is the documentation for worker-runner. The worker-manager docs are at https://docs.taskcluster.net/docs/reference/core/worker-manager/static.

Yes, you can basically just run start-worker instead of generic-worker. The config and calling syntax are different, but other than that it should function identically. Yes, monitoring might need some updates.

^^ to add some detail to that:

https://github.com/taskcluster/taskcluster-worker-runner#generic-worker documents how to start generic-worker, including a link to a suggestion for how to design the service. https://github.com/taskcluster/taskcluster-worker-runner/blob/master/docs/deployment.md talks about how to deploy worker-runner itself.

You can see some examples for both linux and windows at

https://github.com/mozilla/community-tc-config/blob/822e6f85ce954ca8fc3dd285dbcd116424f41583/imagesets/generic-worker-ubuntu-18-04/bootstrap.sh#L51-L94
https://github.com/mozilla/community-tc-config/blob/822e6f85ce954ca8fc3dd285dbcd116424f41583/imagesets/generic-worker-win2012r2/bootstrap.ps1#L75-L141
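
For a static (non-cloud) worker, the runner config is what changes most relative to a bare generic-worker setup. The following is a minimal sketch, not taken from the linked scripts, of generating such a config in Python. The field names (providerType, rootURL, providerID, workerPoolID, workerGroup, workerID, staticSecret, and the worker section's implementation, path and configPath) are assumptions based on the worker-runner README and should be checked against the version in use. Every value (the root URL, pool name, IDs, secret and file paths) is a hypothetical placeholder. The config is emitted as JSON, which is also valid YAML.

```python
import json


def build_runner_config(root_url, provider_id, worker_pool_id, worker_group,
                        worker_id, static_secret, worker_path, worker_config_path):
    """Return a worker-runner style config dict for a static provider.

    Field names are assumptions drawn from the worker-runner README;
    verify them against the release you deploy.
    """
    return {
        "provider": {
            "providerType": "static",
            "rootURL": root_url,
            "providerID": provider_id,
            "workerPoolID": worker_pool_id,
            "workerGroup": worker_group,
            "workerID": worker_id,
            # Shared secret registered for this worker in worker-manager.
            "staticSecret": static_secret,
        },
        "worker": {
            "implementation": "generic-worker",
            # Where the generic-worker binary lives (hypothetical path).
            "path": worker_path,
            # Where worker-runner writes the generated generic-worker config.
            "configPath": worker_config_path,
        },
    }


# All values below are placeholders for illustration only.
config = build_runner_config(
    root_url="https://tc.example.com",
    provider_id="static",
    worker_pool_id="proj-example/android-hw",
    worker_group="bitbar",
    worker_id="device-0001",
    static_secret="REPLACE-ME",
    worker_path="/usr/local/bin/generic-worker",
    worker_config_path="/etc/generic-worker/config.yml",
)

# JSON is a subset of YAML, so this text can be saved as the runner config.
runner_config_text = json.dumps(config, indent=2)
```

The resulting file would then be passed to start-worker as its runner-config argument in place of the old generic-worker invocation.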

I expect Mac startup to be similar to linux, but I don't have the details. A PR for the taskcluster-worker-runner repo would be helpful if there are launchd plists or anything like that to share.
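
As a starting point for that discussion, here is a hedged sketch of what a launchd job for worker-runner might look like, generated with Python's plistlib. This is not an existing configuration from any of the repos mentioned here. The job label and both file paths are hypothetical, and it assumes start-worker takes the runner config path as its only argument. Label, ProgramArguments and RunAtLoad are standard launchd keys.

```python
import plistlib


def build_launchd_plist(label, start_worker_path, runner_config_path):
    """Return XML plist bytes for a launchd job that runs start-worker."""
    job = {
        "Label": label,
        # Assumed invocation: start-worker <runner config path>.
        "ProgramArguments": [start_worker_path, runner_config_path],
        # Start the job when launchd loads it (e.g. at boot).
        "RunAtLoad": True,
    }
    return plistlib.dumps(job)


# Hypothetical label and paths, for illustration only.
plist_bytes = build_launchd_plist(
    label="com.example.worker-runner",
    start_worker_path="/usr/local/bin/start-worker",
    runner_config_path="/etc/taskcluster/runner.yml",
)
```

Whether the job should also be restarted automatically (for example with KeepAlive) depends on how the worker's lifecycle is managed, so that choice is left out of the sketch.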

Depends on: 1607935

PR to make bitbar use taskcluster-worker-runner: https://github.com/bclary/mozilla-bitbar-docker/pull/40

Overall, this project is blocked on better lifecycle management.

Assignee: dustin → nobody
Blocks: 1459710