Closed Bug 1546686 Opened 6 years ago Closed 5 years ago

Configure docker-worker to work with worker-manager in GCP

Categories

(Taskcluster :: Workers, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

No description provided.
Blocks: 1543232

https://github.com/taskcluster/docker-worker/pull/464

The master branch of tc-worker-runner now has all the pieces needed to run docker-worker under aws-provisioner. The next step will be to try that out for real, using ami-test or some such workerType.

The only hitch in the plan here is that docker-worker expects a good bit of information in its config that is not always available, such as the region, instance type, or public IP. In general that data seems to be used for logging identifiers and other debugging-related purposes. I've abstracted that as "provisionerMetadata" here, and encouraged treating it as a soft requirement. If there are "harder" requirements that are generally applicable, we could certainly add those to tc-worker-runner.
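
To illustrate the "soft requirement" idea, here's a minimal Go sketch; the type and field names are hypothetical, not the actual tc-worker-runner API:

// Hypothetical sketch; ProvisionerMetadata and its fields are illustrative,
// not the actual tc-worker-runner types.
package main

import "fmt"

// ProvisionerMetadata holds provider-specific details (region, instance
// type, public IP, ...) that are useful for logging and debugging but may
// be absent on some providers.
type ProvisionerMetadata map[string]string

// get treats every field as a soft requirement: return the value if the
// provider supplied it, otherwise a placeholder instead of an error.
func (pm ProvisionerMetadata) get(key string) string {
	if v, ok := pm[key]; ok {
		return v
	}
	return "unknown"
}

func main() {
	// A provider might only be able to fill in some fields.
	md := ProvisionerMetadata{"region": "us-west-2"}
	fmt.Printf("region=%s instanceType=%s publicIp=%s\n",
		md.get("region"), md.get("instanceType"), md.get("publicIp"))
}

The point is that a provider fills in whatever it can, and consumers fall back to a placeholder rather than failing when a field is absent.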

needinfo -> requesting a review of what's in the tc-worker-runner repo right now.

https://github.com/taskcluster/taskcluster-worker-runner

Flags: needinfo?(pmoore)
Flags: needinfo?(bstack)

Content of my TODO.txt right now, for reference:

* basic CI
* linting
* generate README from --help, check in CI
* add docs to tc repo
* factor out common code in providerconfig.go, workerimplconfig.go

* support caching configuration over restarts
* support setting permissions for files
* support unpacking files in secrets (?? or just make workers take it as config)
* support starting workers as another user
* support preventing access to metadata via firewall
* stay running, reboot or halt when worker exits (based partially on exit code; see the sketch after this list)
* manage autologin
* support polling for expired deployments (send a signal to the worker?)
* support termination notification
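
For the "stay running, reboot or halt" item, here's a rough Go sketch of the control flow I have in mind; the worker path and the exit-code meanings are assumptions for illustration, not an agreed contract:

// Rough sketch of "stay running, reboot or halt when worker exits".
// The worker path and exit-code meanings are assumptions for illustration.
package main

import (
	"log"
	"os/exec"
)

func main() {
	for {
		// Hypothetical worker invocation.
		cmd := exec.Command("/usr/local/bin/start-docker-worker")
		err := cmd.Run()

		exitCode := 0
		if exitErr, ok := err.(*exec.ExitError); ok {
			exitCode = exitErr.ExitCode()
		} else if err != nil {
			log.Fatalf("could not start worker: %v", err)
		}

		switch exitCode {
		case 0:
			// Clean exit: this worker is done, so power the instance off.
			exec.Command("shutdown", "-h", "now").Run()
			return
		case 1:
			// Assumed transient failure: restart the worker in place.
			log.Println("worker exited with status 1; restarting")
		default:
			// Anything else: reboot the instance and start fresh.
			exec.Command("shutdown", "-r", "now").Run()
			return
		}
	}
}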

The interface looks great. Initial thought:

"This looks like it does more than run workers, it manages workers... Let's call it worker-manager instead."

Then I realized what I had done.

Anything I can do to help out with this?

Flags: needinfo?(bstack)

Next target per bstack, once I get aws-provisioner working: docker-worker + gcp provider
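
For context, on GCP the per-instance details a provider needs come from the GCE metadata server. A minimal Go sketch of what a gcp provider in worker-runner would do; the "taskcluster" attribute name is an assumption for illustration, not necessarily what worker-manager uses:

// Minimal sketch of reading the GCE metadata server, which a gcp provider
// in worker-runner would need to do. The "taskcluster" attribute name is
// an assumption for illustration.
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

// getMetadata fetches one path from the metadata server; GCE requires the
// Metadata-Flavor header on every request.
func getMetadata(path string) (string, error) {
	req, err := http.NewRequest("GET",
		"http://metadata.google.internal/computeMetadata/v1/"+path, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	zone, _ := getMetadata("instance/zone")
	config, _ := getMetadata("instance/attributes/taskcluster")
	fmt.Println("zone:", zone)
	fmt.Println("worker config:", config)
}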

Blocks: 1558525

This bug is now more narrowly targeted at docker-worker / gcp-provider.

Flags: needinfo?(pmoore)
Component: Services → Workers

https://github.com/taskcluster/docker-worker/pull/465 for the taskcluster-worker-runner compatibility in docker-worker.

So this seems to be running workers, but they shut down with:

Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: reportError - level: fatal, tags: {}
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:  { Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:     at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:   errno: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:   code: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:   syscall: 'spawn shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:   path: 'shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker:   spawnargs: [ '-h', 'now' ] }
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Succesfully reported error to Sentry.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: 2019/07/03 22:24:31 exit status 1 

I'm not sure why this patch would change that behavior. The "spawn shutdown ENOENT" error means the child process couldn't find the shutdown binary, so my only guess is that maybe it's not passing $PATH along to the worker.
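
If that's the cause, the fix on the worker-runner side is to hand the parent environment (or at least an explicit PATH) to the worker process. A sketch in Go, with a hypothetical command line and config variable, not the actual worker-runner code:

// Sketch of spawning the worker with the parent environment passed
// through; the command line and env var here are hypothetical, not the
// actual worker-runner code.
package main

import (
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("node", "/home/ubuntu/docker-worker/bin/worker.js")
	// exec inherits the parent environment when Env is nil, but if Env is
	// built up by hand and PATH is omitted, child processes spawned by the
	// worker (like "shutdown -h now") fail with ENOENT. Start from
	// os.Environ() so PATH always reaches the worker.
	cmd.Env = append(os.Environ(), "DOCKER_WORKER_CONFIG=/etc/docker-worker.yml")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}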

Brian's been working on getting this set up in the taskcluster-dev deployment. Once that's up to the point of generating lots of errors, I can hack on those errors.

...and those have landed now, too.

So the current state is that we can successfully build an image that will run docker-worker. I think we're about ready to run docker-worker tasks in staging, then!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED