Number of active comm-t/win11-64-2009 workers keeps dropping to zero or near-zero
Categories
(Release Engineering :: Firefox-CI Administration, defect)
Tracking
(Not tracked)
People
(Reporter: darktrojan, Assigned: jmoss)
This could be a duplicate of bug 1735411, so feel free to dupe it. In this case it's only really been happening in the past few weeks so it may be a different problem.
At least once a day we're seeing a high number of pending Windows test jobs, while the list of workers shows only two workers actually doing anything. Sometimes there are zero workers doing anything and no pending tasks are getting cleared. After a number of hours we seem to get lots more workers and the tasks run.
It could be a matter of infrequent demand for workers. There are often periods of many hours where no pushes happen.
Comment 1•4 years ago
Comment 2•4 years ago
When did the Azure machines become the default for the jobs in comm? Any idea how much time passed between the switch to Azure and this problem first appearing?
Comment 3•4 years ago
The timing definitely lines up with the switch to Azure, which landed on Jan 15. Previously we only had occasional issues with decision tasks.
Comment 4•4 years ago
Marking this as S3. This is on the roadmap to be fixed properly; if it starts getting out of hand, NI me and I'll see if I can prioritize this higher with the team.
Reporter
Comment 5•4 years ago
We're currently seeing 3+ hour gaps between the build finishing and the tests starting. Not every push, but frequently. It's not the end of the world as we still have the other platforms, but it is annoying.
Also, comm-beta uses the same workers, so the push-to-release time there is longer.
Reporter
Comment 6•4 years ago
(Actually, I've just been contradicted on the last point: we can do the release steps without waiting for test results. But that feels a bit wrong.)
Reporter
Comment 7•2 years ago
I know this is an old bug and things may have changed somewhat, but this still happens. I've had a task waiting on try-comm-central for two hours while no workers were awake to take it. One eventually awoke (probably triggered by a build on comm-central or by another try run I started), but now there are 14 tasks waiting and one worker.
(In reply to Geoff Lankow (:darktrojan) from comment #7)
> I know this is an old bug and things may have changed somewhat, but this still happens. I've had a task waiting on try-comm-central for two hours while no workers were awake to take it. One eventually awoke (probably triggered by a build on comm-central or by another try run I started), but now there are 14 tasks waiting and one worker.
Which worker pool are you referring to? comm-t/win10-64-2004 or comm-t/win11-64-2009? We're getting rid of comm-t/win10-64-2004.
Reporter
Comment 9•2 years ago
Ah yes, I should've updated the bug title.