Closed Bug 1575082 Opened 5 years ago Closed 3 years ago

gecko-3-b-linux tasks queued when 1/2 ready to work?

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: dhouse, Unassigned)

Attachments

(1 file)

gecko-3-b-linux and gecko-1-b-linux workers appear to be available to take tasks, yet it takes about 15 minutes for them to claim one despite being "warmed up". What is happening?

It looks like we over-provision workers (I'm assuming so that warmed-up workers are on hand for new tasks). However, those running workers are slow to take tasks, so tasks often enter the pendingQueue and wait 15 minutes before being picked up.

Today the queue for gecko-3-b-linux reached 242 while 196 workers had been running for more than an hour (since the last queue peak). Only 95 of those workers were actively running tasks when the queue grew to 242.
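To make the gap concrete, here is a rough sketch of the arithmetic behind those figures. It uses only the numbers quoted above. The helper name `utilization` and the one-task-per-worker assumption are illustrative and are not taken from the provisioner or the queue.

```python
def utilization(active: int, running: int) -> float:
    """Fraction of running workers that are currently executing a task."""
    if running <= 0:
        raise ValueError("running must be positive")
    return active / running


# Figures reported above for gecko-3-b-linux.
pending_tasks = 242
running_workers = 196  # workers up for more than an hour
active_workers = 95    # workers actually running a task

idle_workers = running_workers - active_workers
active_share = utilization(active_workers, running_workers)

# Assumption (not stated in this bug): each worker runs one task at a time.
still_pending = max(pending_tasks - idle_workers, 0)

print(f"idle workers: {idle_workers}")
print(f"active share: {active_share:.1%}")
print(f"pending after each idle worker claims one task: {still_pending}")
```

Under that assumption, roughly half of the long-running workers were idle, and if they had claimed work promptly they could have absorbed a large part of the pending queue.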

So, what is slow? Are these ready workers taking tasks without it showing up in the stats for 15 minutes (does the running count not include everything that is running?), or are workers going away, which would make our active/running stats deceiving, or is it something else?

In #ci, Aryx pointed out that "gecko-3-b-win2012 gets max ~2/3 active, gecko-1-b-win2012 in one case almost 90%, else ~50%". So this is common across the builders (with win2012 as a possible exception; is something configured differently there?).

Attaching a screenshot of the numbers for gecko-3-b-linux over the last 12h. It shows a pendingTasks queue and (over)provisioning in response, and a 15+ minute delay before the existing workers start running tasks: the solid green line rises slowly, rather than showing a rapid task start followed by a delay while waiting for the newly requested machines.

I verified from the aws-provisioner (https://tools.taskcluster.net/aws-provisioner/gecko-3-b-linux/) and queue (https://tools.taskcluster.net/provisioners/aws-provisioner-v1/worker-types?search=gecko-3-b-) views on tc-tools that these numbers are the current state:
gecko-3-b-linux 513 0
gecko-3-b-linux Running capacity 501
(matches the grafana stats we're seeing)

backlog cleanup

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID