AWS Provider is not starting gecko-1/decision workers.
Categories: Taskcluster :: Services, defect
Tracking: Not tracked
People: Reporter: tomprince, Assigned: owlish
There were 4 pending gecko-1/decision tasks (they have now passed their deadline), but no workers were started to run them.
Looking at the worker-manager logs, it appears that the estimator believed five instances were running, so it didn't start any new ones.
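The failure mode above can be illustrated with a deliberately simplified capacity check. This is not worker-manager's actual estimator: the function name `workers_to_spawn` and the rule that instances already counted as running absorb pending tasks first are assumptions made for illustration only. The point is that instances wrongly counted as running suppress new launches.

```python
def workers_to_spawn(pending_tasks: int, running_capacity: int, max_capacity: int) -> int:
    """Hypothetical, simplified spawn rule (not the real worker-manager estimator).

    Assumes capacity believed to be running will pick up pending tasks first,
    and that the pool may never exceed max_capacity.
    """
    # Tasks not covered by capacity we believe is already running.
    shortfall = pending_tasks - running_capacity
    # Room left under the pool's ceiling (negative if already over it).
    headroom = max_capacity - running_capacity
    # Never spawn a negative number of workers.
    return max(0, min(shortfall, headroom))
```

With the numbers from the description (4 pending tasks, 5 instances believed to be running), this returns 0 for any `max_capacity`, so nothing is launched even if none of those 5 instances actually exist.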
Comment 1 • 5 years ago
I'm seeing a similar problem for pmoore-test/gwci-linux, but I'll move these tasks to aws-provisioner-v1/gwci-linux for now so I'm not blocked. I just wanted to note in the bug that more than one worker pool might be affected.
Dustin has advised that there is a worker-manager listWorkersForWorkerPool API call that should show which workers worker-manager considers to be alive.
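Listings like this are typically paginated, so counting workers means following every page. A minimal sketch, assuming each page is a dict with a `workers` list and an optional `continuationToken` (the usual Taskcluster pagination shape); `fetch_page` is a hypothetical callable standing in for one listWorkersForWorkerPool request, so no real client calls are assumed here.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def list_all_workers(fetch_page: Callable[[Optional[str]], Page]) -> List[Dict[str, Any]]:
    """Collect workers from every page of a paginated worker listing.

    fetch_page(token) is a hypothetical stand-in for a single
    listWorkersForWorkerPool request; token is None for the first page.
    """
    workers: List[Dict[str, Any]] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        workers.extend(page.get("workers", []))
        # Assumed: a missing or empty continuationToken means this was the last page.
        token = page.get("continuationToken")
        if not token:
            return workers
```

Injecting `fetch_page` keeps the paging loop independent of any particular client library and easy to test with canned pages.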
Comment 2 • 5 years ago
I wonder whether this is related to worker pools that started out under the gcp provider and later moved to the aws provider.
I looked at listWorkersForWorkerPool for pmoore-test/gwci-linux and got back 31 aws workers and 212 google workers. I believe maxCapacity is 10, so presumably this includes workers that worker-manager no longer considers to be active.
Is there a way to filter just the workers that worker manager considers to be active?
Comment 3 • 5 years ago
Brian pointed out that there is a state property in the response that indicates whether workers are running, stopped, etc.
It reports 74 running workers in gcp and 0 running workers in aws.
Since 74 > 10, I can see why worker-manager wouldn't want to spawn new instances. I'm not sure why the count is 74, as the max capacity was always 10, but perhaps these are ghost instances that appear to be alive but aren't really.
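Filtering on the state property and grouping by provider is what separates live workers from stale ones, and also shows whether leftovers from a previous provider are inflating the count. A minimal sketch, assuming each worker record is a dict with `providerId` and `state` fields and that `"running"` is the state of interest; the provider IDs in any sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def running_by_provider(workers: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count workers in the "running" state, grouped by providerId."""
    counts = Counter(w["providerId"] for w in workers if w.get("state") == "running")
    return dict(counts)


def capacity_check(workers: List[Dict[str, Any]], max_capacity: int) -> Tuple[int, bool]:
    """Return (total running, whether running already meets or exceeds max_capacity)."""
    running = sum(running_by_provider(workers).values())
    return running, running >= max_capacity
```

A tally like this is what exposes the discrepancy described above: 74 workers in the running state against a maxCapacity of 10 leaves no headroom, so no new instances would be requested regardless of how many tasks are pending.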
Comment 4 (Assignee) • 5 years ago
This was opened quite a while ago, and we have fixed quite a few bugs in the AWS provider since then. Is this still an issue?
Comment 5 (Reporter) • 5 years ago
I'm not aware of any current issues here.
Comment 7 (Reporter) • 5 years ago
It is not clear what the underlying problem was or what resolved it. Per https://wiki.mozilla.org/BMO/UserGuide/BugStatuses#Resolutions, it seemed like the most reasonable resolution. (I don't have particularly strong feelings about which resolution to use.)
Comment 8 (Assignee) • 5 years ago
Ah, thank you for the link! I tried searching for something like that but couldn't find it.