Open Bug 1952130 Opened 5 months ago Updated 3 months ago

/Users/task_* on mac workers doesn't start nearly empty

Categories

(Taskcluster :: Workers, defect)

Tracking

(Not tracked)

People

(Reporter: glandium, Unassigned)

References

(Blocks 2 open bugs)

Details

The macmini-r8-* workers do not start tasks from an empty slate. See for example this log of a task that just does find $HOME: https://firefox-ci-tc.services.mozilla.com/tasks/AJO9cm8_QpyjDheFiwXYAg/runs/0/logs/public/logs/live.log

Note that there are no mounts and no caches on the task. Yet, while one would expect the user directory to be fresh, we apparently start with state from previous builds: .mozbuild/srcdirs and many other things like .local/bin, .local/lib/python*, or .cargo (the latter actually unveiled a separate bug in either the Firefox build system or the scripts surrounding some mac tasks).

This doesn't seem like a desirable state, especially when different workers have different things around (for instance, only one of the 4 workers has a .cargo directory that causes problems).

Pete, Ryan, Andrew, any ideas what might be going on here? This is using multiuser so I'd have expected the task dir to start up clean.

Flags: needinfo?(rcurran)
Flags: needinfo?(pmoore)
Flags: needinfo?(aerickson)

Sorry, I am not sure. It doesn't appear there were any significant changes to that pool recently. See here and here.

:pmoore do you think upgrading gw in this case would make a difference?

Flags: needinfo?(rcurran)

This does seem like a bug. This workerType is using multiuser.

Flags: needinfo?(aerickson)

The numerical part of the task username encodes a unix timestamp of when the task user was created. The task user from comment 0 (task_169711821482996) would therefore have been created on Thu Oct 12 2023 15:43:34 GMT+0200 (Central European Summer Time). This could happen if numberOfTasksToRun is set to 1 in the config.

          numberOfTasksToRun                If zero, run tasks indefinitely. Otherwise, after
                                            this many tasks, exit. [default: 0]

If it were set to 1, generic worker would exit after running a single task (without rebooting). If the launch daemon is configured to restart worker runner whenever it stops, I could imagine it would just keep starting it up again. Ideally numberOfTasksToRun should be set to 0 (or not configured, since 0 is the default) so that generic worker takes care of creating new task users and rebooting, indefinitely. Setting it to a different value is intended for a few cases: you want to run one or more tasks and then perform some housekeeping (e.g. run puppet, install updates); you want to throw away a worker after running one task (e.g. for security reasons); or you expect a known number of tasks to be created for your worker pool and are happy to throw away the worker after that many tasks have run.
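For anyone wanting to check other task users the same way, here is a minimal sketch of the username decoding described above. It assumes the first ten digits after task_ are whole seconds since the Unix epoch and that any remaining digits are sub-second precision, which is consistent with the example from comment 0 but is not taken from the generic worker source. The function name is made up for illustration.

```python
from datetime import datetime, timezone


def task_user_created(username: str) -> datetime:
    """Decode the creation time embedded in a task username.

    Assumption: the first ten digits after "task_" are whole seconds since
    the Unix epoch; trailing digits are sub-second precision and are ignored.
    """
    prefix, _, digits = username.partition("_")
    if prefix != "task" or not digits.isdigit() or len(digits) < 10:
        raise ValueError(f"not a task username: {username!r}")
    return datetime.fromtimestamp(int(digits[:10]), tz=timezone.utc)


# The task user from comment 0.
print(task_user_created("task_169711821482996").isoformat())
# 2023-10-12T13:43:34+00:00 (i.e. 15:43:34 in GMT+0200)
```

A creation time far older than the worker's last reboot suggests the task user is being reused across tasks.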

Ryan, can you check if numberOfTasksToRun is set to a non-zero value on any other mac pools too? Thanks!

Flags: needinfo?(pmoore) → needinfo?(rcurran)

Hello :pete

I checked worker-runner-config.yaml on all the mac pools (l1/l3 builders and testers), and that value is set to 1.
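A rough sketch of how this check could be scripted across config files. It assumes the setting appears on its own line as an unquoted YAML key (numberOfTasksToRun: N); the sample fragment and the function name are hypothetical, and the real worker-runner-config.yaml layout may differ. It uses a plain regex rather than a YAML parser to stay dependency-free.

```python
import re

# Matches a line such as "  numberOfTasksToRun: 1", optionally followed by a comment.
_SETTING = re.compile(
    r"^\s*numberOfTasksToRun\s*:\s*(\d+)\s*(?:#.*)?$", re.MULTILINE
)


def tasks_to_run(config_text: str) -> int:
    """Return the configured numberOfTasksToRun, or 0 (the documented default) if absent."""
    match = _SETTING.search(config_text)
    return int(match.group(1)) if match else 0


# Hypothetical config fragment, for illustration only.
sample = """\
worker:
  implementation: generic-worker
workerConfig:
  numberOfTasksToRun: 1
"""

value = tasks_to_run(sample)
if value != 0:
    print(f"numberOfTasksToRun is {value}: the worker exits after that many tasks")
```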

Flags: needinfo?(rcurran)
Blocks: 1962505
No longer depends on: 1962505