1502371 - Support building (public) worker images in tc-builder

Assignee

Description

•

7 years ago

When we "ship" a version of TC, we should ship worker images for the various supported clouds, too. That's a tall order right now, but we can build some infrastructure to support it, and embed that in tc-builder.

Bug 1502183:
> There is stuff to build new images in the taskcluster-mozilla-terraform repo under the workers dir.

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Updated

•

7 years ago

Blocks: 1502183

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 1

•

7 years ago

Rough plan:

* set up taskcluster-builder environments in various clouds, using tc-infrastructure
  * dedicated gcp project + service account
  * production AWS account + permission-limited IAM user
  * packet.net? spoon.net? digitalocean? etc.
* put secrets for all of those into passwordstore
* run packer (via docker) as part of tc-builder
  * for gcp, use the image exporter post-processor to write the image to a gcs bucket
* include the outputs (dumped via the manifest post-processor) in the taskcluster.tf.json output
* set up taskcluster-terraform to re-import the images from the gcs bucket (somehow)
* we need to think about how to connect these with the runtime config of worker manager

I also want to be careful to get packer to tag everything it creates in these accounts, and have a "cleanup" mode in tc-builder that will seek and destroy old, abandoned resources. Otherwise it's too easy to ctrl-c and leave an instance running for months.
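As an illustration of the "include the outputs in the taskcluster.tf.json output" step, here is a minimal Python sketch. It assumes the JSON layout written by Packer's manifest post-processor (a `builds` list plus a `last_run_uuid`). The function names and the `worker_images` key are hypothetical; nothing in this bug settles how tc-builder would actually name or store these values.

```python
import json


def images_from_manifest(manifest):
    """Return {build name: artifact info} for the most recent Packer run.

    `manifest` is the parsed JSON written by Packer's manifest post-processor.
    """
    last_run = manifest.get("last_run_uuid")
    images = {}
    for build in manifest.get("builds", []):
        if build.get("packer_run_uuid") != last_run:
            continue  # skip builds recorded by earlier runs
        images[build["name"]] = {
            "builder_type": build["builder_type"],
            "artifact_id": build["artifact_id"],
        }
    return images


def merge_into_tf_json(tf_json, images, key="worker_images"):
    """Return a copy of the tf.json data with `images` stored under `key`.

    The key name is an assumption made for this sketch.
    """
    merged = dict(tf_json)
    merged[key] = images
    return merged


if __name__ == "__main__":
    # Made-up manifest data, for illustration only.
    example_manifest = {
        "builds": [
            {
                "name": "example-gcp",
                "builder_type": "googlecompute",
                "artifact_id": "example-image-name",
                "packer_run_uuid": "run-2",
            },
            {
                "name": "example-old",
                "builder_type": "googlecompute",
                "artifact_id": "stale-image-name",
                "packer_run_uuid": "run-1",
            },
        ],
        "last_run_uuid": "run-2",
    }
    images = images_from_manifest(example_manifest)
    print(json.dumps(merge_into_tf_json({}, images), indent=2))
```

Filtering on `last_run_uuid` matters because the manifest file accumulates entries across runs, so an unfiltered read could publish stale image identifiers.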

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 2

•

7 years ago

Worker / provisioning people, thoughts on the above?

I noticed that the docker-worker build process is baking in a lot of secrets. We probably don't want to do that with a public image? Is it reasonable to think we can get to a point where all secrets a worker needs to operate are supplied to it at startup, and thus not baked in?

I will likely hack something together (that just uses brian's hacked generic-worker in GCP) so we can get workers in dev/staging environments. Maybe the rest should be in an RFC? Any initial guidance is appreciated :)

Flags: needinfo?(wcosta)

Flags: needinfo?(pmoore)

Flags: needinfo?(jhford)

John Ford [:jhford] CET/CEST Berlin Time

Comment 3

•

7 years ago

In general, I think image building should be distinct and separate from provisioning. It sounds like that's the case here.

As you mentioned, garbage collecting is important here. If I understand the plan correctly, this would be done in a dedicated account? If so, we could tag things with a "shutoff after" timestamp, then go through the account daily and terminate anything that either has a shutoff-after timestamp in the past or is untagged and older than a day.

Is the idea here to keep building images as a manual process?
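The garbage-collection rule described above can be sketched in Python as a pure decision function. The resource shape (a dict with `tags` and `created`), the `shutoff-after` tag name, and the ISO 8601 timestamp format are all assumptions made for this example; a real cleanup mode would need to list and terminate resources through each cloud's own API.

```python
from datetime import datetime, timedelta, timezone


def should_terminate(resource, now, max_untagged_age=timedelta(days=1)):
    """Decide whether a resource is abandoned under the rule above.

    `resource` is a hypothetical dict with:
      - "tags": mapping of tag name to string value
      - "created": timezone-aware datetime of creation
    """
    shutoff_after = resource.get("tags", {}).get("shutoff-after")
    if shutoff_after is not None:
        # Tagged: terminate only once the shutoff time is in the past.
        return datetime.fromisoformat(shutoff_after) < now
    # Untagged: terminate once it is older than the grace period.
    return now - resource["created"] > max_untagged_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    abandoned = {"tags": {}, "created": now - timedelta(days=3)}
    fresh = {"tags": {}, "created": now - timedelta(hours=1)}
    print(should_terminate(abandoned, now))
    print(should_terminate(fresh, now))
```

Keeping the decision separate from the cloud calls makes the policy easy to test and to run in a dry-run mode before anything is actually destroyed.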

Flags: needinfo?(jhford)

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 4

•

7 years ago

Regarding provisioning, yes they are separate. But, I want to make sure we produce something that is easy to set up with provisioning and where it's easy to "upgrade" a deployment and get the newest worker images, just like such an upgrade would automatically use the newest service images.

Maybe we could have a concept of some "built-in" images that can be configured in the worker configuration rules, as an alternative to referencing explicit images such as images custom-built for a particular deployment. Or maybe the upgrade process calls worker-manager APIs directly to update some well-known rules with the new information? I'd like to avoid the case where on every upgrade of a deployment someone needs to go copy/paste a whole bunch of identifiers into the correct worker configuration rules. I realize neither side of this is implemented yet so it's all a little vague, but if you have ideas on what the best approach would be I'd love to hear them.

And yes, at the moment building everything is done locally (./taskcluster-builder [options]) but the door is open to doing so in some kind of automation -- perhaps on a tag of the taskcluster-builder repo (or some other repo containing a build spec). I think we'll have a clearer idea later of what the best choice will be.

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 5

•

7 years ago

https://github.com/djmitche/console-taskgraph/pull/8

Wander Lairson Costa

Comment 6

•

7 years ago

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #2)
> Worker / provisioning people, thoughts on the above?
>
> I noticed that the docker-worker build process is baking in a lot of
> secrets. We probably don't want to do that with a public image? Is it
> reasonable to think we can get to a point where all secrets a worker needs
> to operate are supplied to it at startup, and thus not baked in?
>
> I will likely hack something together (that just uses brian's hacked
> generic-worker in GCP) so we can get workers in dev/staging environments.
> Maybe the rest should be in an RFC? Any initial guidance is appreciated :)

The deployment scripts of docker-worker are very gecko/AWS specific, but docker-worker itself is not so much. What do you have in mind, exactly?

Flags: needinfo?(wcosta)

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 7

•

7 years ago

We'll need to tease that apart. I see some gecko-specific stuff in the generic-worker repo, too. I'd like the team to think about what the "generic" image(s) should look like, and how we can support building "custom" images. And how we can make deployment of those images more user-friendly (the question about worker configs I was asking John). These are probably more important questions for generic-worker than docker-worker. At the moment, we're still not planning to include docker-worker in new deployments.

Wander Lairson Costa

Comment 9

•

7 years ago

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #7)
> We'll need to tease that apart. I see some gecko-specific stuff in the
> generic-worker repo, too. I'd like the team to think about what the
> "generic" image(s) should look like, and how we can support building
> "custom" images. And how we can make deployment of those images more
> user-friendly (the question about worker configs I was asking John).
>
> These are probably more important questions for generic-worker than
> docker-worker. At the moment, we're still not planning to include
> docker-worker in new deployments.

I don't think we should mess with worker deployment. I believe we should provide the executable and detailed instructions on how to run it; creating cloud images should be left to the user (we can provide samples).

Pete Moore [:pmoore][:pete]

Comment 10

•

7 years ago

For the most part I expect users to set up their own builders with configuration which is specific to them. This is especially true for Windows/Mac where the worker host environment is also the host environment of the tasks, so e.g. toolchains specific to the user's tasks are typically installed on the host. If we provide images that don't have the necessary tools installed, they won't be usable.

I think the correct approach here is along the lines of https://github.com/taskcluster/taskcluster-rfcs/issues/122 - this still needs some fleshing out, but I believe that the best way of managing worker type host environments is to make it possible to create a worker type image from inside a taskcluster task, and leave it up to users to run appropriate tasks (and we can provide some sample formulas for them to use, which they can adapt).

The taskcluster platform is at the moment very nicely decoupled from the concerns of setting up workers (anybody can set up workers in any way they choose, and plug them into the taskcluster platform easily and securely), so I'm keen to avoid introducing dependencies here, or assumptions in our platform that workers have been set up in any particular way.

I also see the advantage of providing some usable "bare bones" worker types for the purposes of trying out the platform, though. Perhaps the simple solution is we just create a few example AMIs (and whatever the GCP equivalent is) and publish them publicly, with a link to a doc about how they were created. But I envision that users will (quite rightly) want to be in complete control of setting up their own workers, without that being bound to the platform.

Flags: needinfo?(pmoore)

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 11

•

7 years ago

OK, sounds good. I'll close this for now, then.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INVALID

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 12

•

7 years ago

Well, I will build these "bare bones" AMIs (which, running docker worker, are probably 100% sufficient).

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 13

•

7 years ago

From today's meeting, it seems a good plan is to ship docker-engine/docker-worker images as part of the build process, to get users started (and perhaps be enough for most users). We will have additional support for users building custom workers, but of course that needs lots of flexibility.

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 14

•

7 years ago

https://groups.google.com/forum/#!topic/packer-tool/Z1_gTkntNMk

Dustin J. Mitchell [:dustin] (he/him)

Assignee

Comment 15

•

7 years ago

Per discussion at this week's all-hands, workers are not part of the Taskcluster Platform product, so need not be included in tc-builder.

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Redeployability → Services

Categories

(Taskcluster :: Services, enhancement)

Tracking

(Not tracked)

People

(Reporter: dustin, Assigned: dustin)

Security

(public)
