Closed Bug 1591289 Opened 5 years ago Closed 5 years ago

AWS Provider is not starting gecko-1/decision workers.

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: tomprince, Assigned: owlish)

Details

There were 4 pending gecko-1/decision tasks (they have now passed their deadline), but no workers started to run them.

Looking at the worker-manager logs, it appears the estimator thought there were five instances running, so it didn't start any new ones.

Assignee: nobody → bugzeeeeee

I'm seeing a similar problem with pmoore-test/gwci-linux, but I will move these tasks to aws-provisioner-v1/gwci-linux for now so I'm not blocked. I just wanted to note in the bug that more than one worker pool might be affected.

Dustin has advised that there is a worker manager listWorkersForWorkerPool API call that should show which workers worker-manager considers to be alive.

I wonder if this is related to worker pools that started life under the gcp provider but then moved to the aws provider?

I looked at listWorkersForWorkerPool for pmoore-test/gwci-linux and got back 31 aws workers and 212 google workers. I believe maxCapacity is 10, so I guess this includes workers that worker-manager no longer considers active.

Is there a way to filter just the workers that worker manager considers to be active?

Brian pointed out that each worker in the response has a state property indicating whether it is running, stopped, etc.
It reports 74 running workers in gcp and 0 running workers in aws.
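A minimal sketch of the filtering described above, assuming the worker records returned by listWorkersForWorkerPool (all pages concatenated) are dicts carrying `state` and `providerId` fields. The function name and the sample provider IDs are hypothetical; fetching the records with the Taskcluster client is left out.

```python
from collections import Counter


def count_running_by_provider(workers):
    """Count workers whose state is "running", grouped by providerId.

    Assumption: `workers` is the list of worker records from
    listWorkersForWorkerPool, each a dict with "state" and "providerId".
    """
    counts = Counter()
    for worker in workers:
        if worker.get("state") == "running":
            counts[worker.get("providerId", "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample records; only the fields used above are shown.
    sample = [
        {"workerId": "w1", "providerId": "gcp", "state": "running"},
        {"workerId": "w2", "providerId": "gcp", "state": "stopped"},
        {"workerId": "w3", "providerId": "aws", "state": "stopped"},
    ]
    print(count_running_by_provider(sample))  # {'gcp': 1}
```

Workers in any other state (stopped, requested, and so on) are skipped, so a pool whose AWS workers are all stopped does not appear in the result at all.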

Since 74 > 10, I can see why worker-manager wouldn't want to spawn new instances. I'm not sure why the count is 74, as the max capacity was always 10, but maybe these are ghost instances that appear to be alive but aren't really.
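To illustrate why overcounted running workers block new instances, here is a deliberately simplified, hypothetical estimate: start at most as many workers as there are pending tasks, capped by the remaining headroom under maxCapacity. This is not the real worker-manager estimator, which also accounts for things like minimum capacity and workers still being requested.

```python
def instances_to_spawn(pending_tasks, running_capacity, max_capacity):
    """Hypothetical, simplified spawn estimate (not worker-manager's code).

    New workers are limited both by the number of pending tasks and by the
    headroom left between the capacity believed to be running and maxCapacity.
    """
    headroom = max(0, max_capacity - running_capacity)
    return min(pending_tasks, headroom)


if __name__ == "__main__":
    # 74 "running" ghost workers against maxCapacity 10: nothing is started.
    print(instances_to_spawn(4, 74, 10))  # 0
    # With an accurate count of 0 running workers, all 4 tasks get a worker.
    print(instances_to_spawn(4, 0, 10))  # 4
```

Under this model, any stale workers still counted as running eat into the headroom, which matches the symptom of pending tasks with no new instances.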

This was opened quite a while ago, and we have fixed quite a few bugs in the AWS provider since then. Is this still an issue?

Flags: needinfo?(mozilla)

I'm not aware of any current issues here.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mozilla)
Resolution: --- → INCOMPLETE

Why the INCOMPLETE status?

Flags: needinfo?(mozilla)

It is not clear what the underlying problem was, or what resolved it. Per https://wiki.mozilla.org/BMO/UserGuide/BugStatuses#Resolutions it seemed like the most reasonable resolution. (I don't have particularly strong feelings about which resolution to use.)

Flags: needinfo?(mozilla)

Ah, thank you for the link. I tried to search for something like that but couldn't find it!
