Closed Bug 1519892 Opened 7 years ago Closed 6 years ago

Set up some static workers for tc-staging

Categories

(Taskcluster :: Operations and Service Requests, task)

Type: task, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Until we have provisioning in staging, we'll need some workers. I'll find a hacky way (possibly just static on-demand instances) to run a few of those against staging so we can actually try running a Firefox CI push there.

John suggests running an AWS-provisioner / EC2-manager instance in Heroku to do this, running against https://taskcluster.net but providing credentials based on a specific development environment.

I'm going to meet with Pete tomorrow to see if we can't set up a Windows generic-worker AMI in the staging environment with some built-in secrets that can run a task or two. Then we can at least experiment with running specific tasks (such as builds) in the staging environment.

I got a worker set up in staging, test-provisioner/dustin-test, running the equivalent of win2012r2. Here's roughly what I did:

  • In https://github.com/taskcluster/generic-worker/tree/master/worker_types, copied win2012r2 to dustin-test
  • Set up AWS staging credentials
  • Ran ./worker-type.sh dustin-test update
    This failed at the end because it tries to talk to aws-provisioner and that doesn't work.
    However, it created an image in each of three regions, and left three stopped instances.
  • Deleted two of the stopped instances as unnecessary
  • SSH'd to the third of the stopped instances using Administrator and the generated password
  • Ran net stop "Generic Worker" to stop the worker (since otherwise it will likely reboot the instance)
  • Edited /c/generic-worker/generic-worker.conf, including pointing to two keys (openpgp, ed25519) that I generated with the generic-worker.exe command. I used static credentials I created in the UI.
  • Ran net start "Generic Worker" and created some jobs in the task creator.

Per discussion in our meeting today, we're going to wait until mid-March and see how things are looking with the deprecation of docker-worker in favor of generic-worker.

Component: Operations → Operations and Service Requests

I'm going to work on setting up some docker-worker instances running in AWS against my dev environment. Bug 1469617 has already made the required changes to docker-worker to enable it to run in other deployments.

I tried modifying docker-worker so that, when it is not given a securityToken in userdata, it runs without contacting the AWS provisioner and uses userdata.credentials instead. Then I started up an instance like that. Here's what I see in papertrail:


Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.214934] init: cloud-final main process (1450) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.393292] init: docker main process (966) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.393302] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.395058] init: docker post-start process (970) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.409215] init: docker main process (1580) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.409224] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.409344] init: docker post-start process (1581) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.423178] init: docker main process (1612) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.423187] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.423304] init: docker post-start process (1613) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.437096] init: docker main process (1644) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.437106] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.437221] init: docker post-start process (1645) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.450997] init: docker main process (1676) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.451007] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.451150] init: docker post-start process (1677) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.464713] init: docker main process (1708) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.464722] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.464840] init: docker post-start process (1709) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.478540] init: docker main process (1740) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.478549] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.478664] init: docker post-start process (1741) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.492289] init: docker main process (1772) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.492298] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.492416] init: docker post-start process (1773) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.505996] init: docker main process (1804) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.506006] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.506117] init: docker post-start process (1805) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.520303] init: docker main process (1836) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.520312] init: docker main process ended, respawning
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.520441] init: docker post-start process (1837) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.534515] init: docker main process (1868) terminated with status 1
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.534525] init: docker respawning too fast, stopped
Mar 22 18:47:53 docker-worker.aws-provisioner.us-east-1d.ami-0a20a3b7825644eb8.m5-large.i-01123ed3cb972b633 kernel: [   35.534679] init: docker post-start process (1869) terminated with status 1 

I'm not sure what that indicates -- did docker-worker not start up?
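One way to read a log like the one above is to count which upstart jobs are failing. The following is a minimal Python sketch, not part of any Taskcluster tooling; the function name `summarize` and the regular expressions are assumptions written to fit the lines shown here. On this excerpt, the only jobs that appear are cloud-final and docker, so the log by itself says the Docker daemon kept dying and says nothing directly about docker-worker.

```python
import re
from collections import Counter

# Matches upstart messages such as
# "init: docker main process (966) terminated with status 1".
TERMINATED = re.compile(
    r"init: (?P<job>[\w-]+) (?P<phase>main|post-start) process "
    r"\((?P<pid>\d+)\) terminated with status (?P<status>\d+)"
)
# Matches the message upstart prints when it stops restarting a job.
GAVE_UP = re.compile(r"init: (?P<job>[\w-]+) respawning too fast, stopped")


def summarize(lines):
    """Count failed processes per (job, phase) and collect jobs upstart gave up on."""
    failures = Counter()
    gave_up = set()
    for line in lines:
        m = TERMINATED.search(line)
        if m:
            failures[(m["job"], m["phase"])] += 1
            continue
        m = GAVE_UP.search(line)
        if m:
            gave_up.add(m["job"])
    return failures, gave_up


# A few lines from the excerpt, with the hostname prefix trimmed for brevity.
sample = [
    "kernel: [   35.214934] init: cloud-final main process (1450) terminated with status 1",
    "kernel: [   35.393292] init: docker main process (966) terminated with status 1",
    "kernel: [   35.393302] init: docker main process ended, respawning",
    "kernel: [   35.395058] init: docker post-start process (970) terminated with status 1",
    "kernel: [   35.534525] init: docker respawning too fast, stopped",
]
failures, gave_up = summarize(sample)
```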

https://github.com/taskcluster/docker-worker/pull/440

Note, too, that I created the AMIs in the production AWS account to avoid writing secrets to the staging account.

AMIs for the record:

==> Builds finished. The artifacts of successful builds are:
--> hvm-builder: AMIs were created:
eu-central-1: ami-0540c99ddedef4b82
us-east-1: ami-0a20a3b7825644eb8
us-west-1: ami-04302b58a806171cb
us-west-2: ami-045e9868386d40ba7

--> hvm-builder:
--> hvm-builder-trusted: AMIs were created:
eu-central-1: ami-00f4e53c9f689ff9f
us-east-1: ami-0d5b98997138074d5
us-west-1: ami-0302d140dbd0fe07a
us-west-2: ami-0811807dd9d5bc6a8

Oh, it looks like the launch wizard wiped out the extra storage.

Trying again with this userdata:

{
  "availabilityZone": "us-east-1d",
  "capacity": 1,
  "data": {
    "capacityManagement": {
      "diskspaceThreshold": 30000000000
    }
  },
  "instanceType": "c5d.4xlarge",
  "lastModified": "2019-03-25T19:01:17.078Z",
  "launchSpecGenerated": "2019-03-29T19:32:41.798Z",
  "price": 0.3255,
  "provisionerId": "test-provisioner",
  "region": "us-east-1",
  "spotBid": 8,
  "taskclusterRootUrl": "https://taskcluster-staging.net",
  "workerType": "test-wt",
  "credentials": {
    "clientId": "bug1519892",
    "accessToken": "<mumble>"
  }
}

I had to create a blank secret named worker-type:<provisionerId>/<workerType>.
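For the userdata above, the secret name follows directly from provisionerId and workerType. A small Python sketch follows; the helper names are hypothetical, and the payload shape (an empty "secret" object plus an "expires" timestamp) is an assumption about what a blank secret looks like, not something stated in this bug.

```python
from datetime import datetime, timedelta, timezone


def worker_type_secret_name(provisioner_id, worker_type):
    """Build the name in the worker-type:<provisionerId>/<workerType> form."""
    return f"worker-type:{provisioner_id}/{worker_type}"


def blank_secret_payload(days_valid=30):
    # Assumed payload shape: an empty secret plus an expiry timestamp.
    expires = datetime.now(timezone.utc) + timedelta(days=days_valid)
    return {"secret": {}, "expires": expires.isoformat()}


# Values taken from the userdata shown above.
name = worker_type_secret_name("test-provisioner", "test-wt")
payload = blank_secret_payload()
```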

I was able to run a task: https://taskcluster-staging.net/tasks/bJxZPgqjQ9et53V-0ubn4w. It failed because it needs a stateless-dns secret. But, close enough -- I'm convinced that docker-worker could indeed operate in a non-taskcluster.net environment.

Blocks: 1543232

Point proven; now to try it out.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED