Closed Bug 1552465 Opened 5 years ago Closed 5 years ago

generic-worker: only prepare new task environment when task has been claimed

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: pmoore)

References

Details

Attachments

(1 file)

In bug 1533694 we adapted the task environment setup logic so that on Windows there would be up to two task environments configured at any point - one for the currently running (or about-to-run) task, and the second for the subsequent task that should run after the next reboot.

This involving shovelling some platform-independent code around as well as adapting the Windows platform-specific code.

In the process, this introduced bug 1551164 which was discovered by DeepSpeech and Servo teams independently on macOS. Note, it also affected Linux, but by chance, no tasks were affected by the change. The bug was that on macOS and Linux if config setting numberOfTasksToRun was not set to 1, task users would get reused between tasks. This did not affect gecko since the macOS and Linux generic-worker workers have numberOfTasksToRun set to 1.

A fix for bug 1551164 was then written, together with an integration test, and finally rolled out in generic-worker 14.1.1 and the Servo macOS workers were upgraded.

Unfortunately, the story isn't over. On inspection of the macOS worker logs, I noticed that we are now preparing the task environment every time we attempt to fetch a task from the Queue, rather than every time we actually receive a task. In other words, we poll the Queue, and around every 20 seconds we either get a task or we don't. In release 14.1.1, even if we don't get a task, we create set up a new task environment (i.e. prepare a new task directory on macOS).

This is because the main loop inside main.go that claims and resolves tasks looks like this:

for {
   PrepareTaskEnvironment()
   task := ClaimWork()
   if task != nil {
        // process task
   }
}

Instead it should look more like this:

PrepareTaskEnvironment()
for {
    task := ClaimWork()
    if task != nil {
        // process task
        PrepareTaskEnvironment()
    }
}

Note, the details are a little more complicated than above due to some platform/engine combinations requiring reboots between tasks, not needing to prepare a task environment if there are no more tasks to run, etc - but the above code extract demonstrates the general issue.

Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Assignee: pmoore → nobody
Status: ASSIGNED → NEW

Released in generic-worker 14.1.2.

Assignee: nobody → pmoore
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: