Closed Bug 1382542 Opened 7 years ago Closed 5 years ago

Differences between docker-worker and generic-worker

Categories

(Firefox Build System :: Task Configuration, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: glandium, Unassigned)

In bug 1381772, I tried to use the same script to build sccache on Linux and Windows, and in the end succeeded. The build process is largely similar on both, but a few subtle (and seemingly unnecessary) differences between docker-workers and generic-workers made this harder than it should have been.

- The mercurial checkout is in $PWD/workspace/build/src on docker-workers, and $PWD/build/src on generic-workers
- The location where to put artifacts is $PWD/workspace/artifacts on docker-workers and $PWD/public/build on generic-workers (actually, maybe $PWD/public works, but the convention seems to be public/build, which makes artifact URLs differ between docker-worker and generic-worker builds)
- Not all the same TASKCLUSTER_* environment variables are set (for instance, I was looking at the variable that gives the worker type in docker-worker (the "gecko-n-b-os" string); that one doesn't exist on generic-worker).
- git is not installed on generic-workers.
- generic-worker apparently has a "mounts" mechanism, which is kind of like tooltool, that doesn't exist on docker-worker.

There are probably others.

Some of those could be addressed by exposing more environment variables (like setting $WORKSPACE everywhere, since most (all?) job scripts seem to be setting it themselves); it could also be argued that scripts should be started from the source directory.
I now realize some of those are entirely implementation details of taskcluster/taskgraph/transforms/job/toolchain.py.
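To illustrate the $WORKSPACE idea, here is a minimal sketch of a helper that maps each worker implementation to the directories listed above, so that a job script only ever refers to one set of names. The function name `task_dirs` and the keys `SRC_DIR` and `ARTIFACT_DIR` are hypothetical, chosen for this example; only the paths themselves come from this bug. Defining WORKSPACE as $PWD/workspace on docker-worker and $PWD on generic-worker is also an assumption, picked so that the checkout is $WORKSPACE/build/src on both.

```python
from pathlib import PurePath
from typing import Dict


def task_dirs(worker_impl: str, cwd: PurePath) -> Dict[str, PurePath]:
    """Return the directories a job script needs, keyed by a common name.

    Hypothetical helper: the key names are illustrative, and the paths
    are the ones described in this bug for each worker implementation.
    """
    if worker_impl == "docker-worker":
        workspace = cwd / "workspace"
        artifacts = workspace / "artifacts"
    elif worker_impl == "generic-worker":
        workspace = cwd
        artifacts = cwd / "public" / "build"
    else:
        raise ValueError(f"unknown worker implementation: {worker_impl}")

    return {
        "WORKSPACE": workspace,
        # The checkout ends up at the same relative location on both workers.
        "SRC_DIR": workspace / "build" / "src",
        "ARTIFACT_DIR": artifacts,
    }
```

A job script could then use `task_dirs(...)["SRC_DIR"]` and `task_dirs(...)["ARTIFACT_DIR"]` without branching on the worker implementation itself.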
The following are in-tree config changes:

- The mercurial checkout is in $PWD/workspace/build/src on docker-workers, and $PWD/build/src on generic-workers
- The location where to put artifacts is $PWD/workspace/artifacts on docker-workers and $PWD/public/build on generic-workers (actually, maybe $PWD/public works, but the convention seems to be public/build, which makes artifact URLs differ between docker-worker and generic-worker builds)
- git is not installed on generic-workers.

Taskcluster-worker, to which we want to transition, supports mounts, so I don't see mounts being implemented in docker-worker.
Component: General → Task Configuration
So apparently, public/build is also where things are put for mozharness builds... looks like the toolchain docker-worker job just ended up using public instead. Interestingly, both transform.job.common.docker_worker_add_public_artifacts and transform.job.toolchain.docker_worker_toolchain come from the same bug, but the latter doesn't use the former...
Depends on: 1382860
(In reply to Mike Hommey [:glandium] from comment #0)
> - Not all the same TASKCLUSTER_* environment variables are set (for
> instance, I was looking at the variable that gives the worker type in
> docker-worker (the "gecko-n-b-os" string), that one doesn't exist on
> generic-worker.

The environment variables I see set in docker-worker that aren't in generic-worker are:

https://github.com/taskcluster/docker-worker/blob/fd49c46163799d67ba65c6c10ed85af169c54a89/src/lib/task.js#L324-L327

> env.TASKCLUSTER_WORKER_TYPE = this.runtime.workerType;
> env.TASKCLUSTER_INSTANCE_TYPE = this.runtime.workerNodeType;
> env.TASKCLUSTER_WORKER_GROUP = this.runtime.workerGroup;
> env.TASKCLUSTER_PUBLIC_IP = this.runtime.publicIp;

These seem somewhat arbitrary; are they really useful? Note that the instance type is AWS-specific (so it doesn't apply to workers outside of AWS), and the worker type is set in the task definition, so a task should already know its worker type and shouldn't need to apply different logic based on it (presumably that should be handled by the task generation process). The worker group is an arbitrary name, which happens to match the AWS availability zone for AWS workers (again, anything that relies on its value is likely to break), and the public IP can comfortably be inferred by querying the instance, if really required. It is only used for livelog serving, and as such may not be the only public IP address on the machine, or the machine may not have any public IP address at all (in the case of workers sitting on desks, etc.). So I feel there is little value in implementing any of these in generic-worker, and that they could cause more problems than they solve. :/
(and the other issues are task generation issues, rather than issues with generic-worker itself, other than the last bullet point, which is that it has a cool feature that isn't on all other workers, so I guess that isn't something to fix in generic-worker).
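For a script that still wants the worker type on both workers, a minimal sketch of a lookup that does not depend on the docker-worker-only variable might look like this. The function name `worker_type` is hypothetical, and how the script obtains the task definition (passed in here as a plain mapping) is left as an assumption; `workerType` is the field name used in the task definition.

```python
from typing import Any, Mapping, Optional


def worker_type(env: Mapping[str, str], task_def: Mapping[str, Any]) -> Optional[str]:
    """Return the worker type, or None if it cannot be determined.

    Hypothetical helper: prefers the variable docker-worker exports and
    falls back to the task definition, which exists on both workers.
    """
    # docker-worker sets this variable; generic-worker does not.
    from_env = env.get("TASKCLUSTER_WORKER_TYPE")
    if from_env:
        return from_env
    # The task definition names its worker type regardless of the worker.
    return task_def.get("workerType")
```

Called as `worker_type(os.environ, task_definition)`, the same script would behave identically on docker-worker and generic-worker, although handling the difference at task generation time, as suggested above, avoids the lookup altogether.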

Should we close this bug as INVALID, or keep it open, so the in-tree task generation process for the two worker types can be aligned?
Flags: needinfo?(mh+mozilla)
Product: TaskCluster → Firefox Build System
Depends on: 1454037

Given that we are going to deprecate generic-worker (see Bug 1499055), this can probably be closed.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(mh+mozilla)