Switch to new GCP balrogworkers
Categories
(Release Engineering :: Release Automation, task)
Tracking
(firefox71 fixed)
Tracking | Status | |
---|---|---|
firefox71 | --- | fixed |
People
(Reporter: mtabara, Assigned: mtabara)
References
Details
Attachments
(2 files)
+++ This bug was initially created as a clone of Bug #1580476 +++
+++ This bug was initially created as a clone of Bug #1579476 +++
Rail gave me a tour in the new world today. It looks like we can switch to the new balrog workers hosted by cloudops at some point next week \o/
Still to investigate whether netflows are fine or not, but once that's confirmed, we can start with nightlies.
Assignee | ||
Comment 1•6 years ago
|
||
Brought all branches up to date with master in balrogscript and pushed a test patch to try. Most build jobs failed for some reason, so I won't be able to see all the per-locale balrog jobs as I had hoped. Nevertheless, the balrog-toplevel-submit job made it through the new GCP worker: https://tools.taskcluster.net/groups/IDqFsAmWTNi4jUtWi1aFLg/tasks/AJ5TtCMZRlGtnghMW-JB0g/details
Prepping a patch to auto-scale both environments, plus some measurements to understand how much capacity we need.
For a regular beta (including partner, from b8 onwards), we have ~1200 beetmover jobs but only ~500+ balrog ones, so balrog definitely needs fewer resources in GCP.
Old AWS puppet-based infrastructure:
- Beetmoverworkers -> 22 production, 10 in dev
- Balrogworkers -> 10 production, 10 in dev
GCP-based infrastructure:
- Beetmoverworkers -> max 20 replicas
- Balrogworkers -> I'm thinking of max 10 to begin with, to mirror what we currently have?
Average job runtime:
- beetmover -> ~50 seconds in a beta (in cloudops-infra this is set to 120 seconds)
- balrog -> ~20 seconds in a beta (will set this in cloudops to 60 seconds to cover the upper limit)
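For a rough feel of what these numbers imply, here is a back-of-the-envelope sketch. It uses only the approximate job counts, average runtimes and replica caps quoted above, and it assumes (unrealistically) that all jobs are queued at once and that each replica runs one job at a time, so treat the result as a lower bound rather than a prediction. The function name is made up for illustration.

```python
import math

# Approximate figures quoted in this comment.
JOBS = {"beetmover": 1200, "balrog": 500}
AVG_RUNTIME_S = {"beetmover": 50, "balrog": 20}
MAX_REPLICAS = {"beetmover": 20, "balrog": 10}


def drain_time_seconds(jobs: int, avg_runtime_s: float, replicas: int) -> float:
    """Lower bound on wall-clock time to run all jobs.

    Assumes every replica is always busy and handles one job at a time,
    so the jobs are processed in ceil(jobs / replicas) "waves".
    """
    waves = math.ceil(jobs / replicas)
    return waves * avg_runtime_s


for kind in JOBS:
    total = JOBS[kind] * AVG_RUNTIME_S[kind]
    drain = drain_time_seconds(JOBS[kind], AVG_RUNTIME_S[kind], MAX_REPLICAS[kind])
    print(f"{kind}: {total} worker-seconds, "
          f">= {drain / 60:.1f} min at {MAX_REPLICAS[kind]} replicas")
```

With the figures above this works out to about 50 minutes of beetmover work at 20 replicas and about 17 minutes of balrog work at 10 replicas, which is consistent with balrog needing the smaller pool.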
Assignee | ||
Comment 2•6 years ago
|
||
The try push's balrog toplevel-submit job https://tools.taskcluster.net/groups/b3jfIBkeTh6BqEwarNPh2A/tasks/X4eEsKI3Th-xyzMGjfeLlQ/details timed out because no worker picked it up. After double-checking the workers, it turned out the scopes were wrong: they were still polling gecko-t-balrog instead of gecko-1-balrog. We should be good now.
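To illustrate why the task sat unclaimed: a task is only picked up if some worker polls the worker type the task asks for. The snippet below is a toy check, not a Taskcluster API call; the function name and the two lists are hypothetical and simply mirror the mismatch described in this comment.

```python
def unclaimable_worker_types(requested, polled):
    """Return the worker types that tasks request but no worker polls."""
    return sorted(set(requested) - set(polled))


# Hypothetical data modelled on the situation above.
requested_by_tasks = ["gecko-1-balrog"]
polled_by_workers = ["gecko-t-balrog"]

print(unclaimable_worker_types(requested_by_tasks, polled_by_workers))
```

Any worker type this returns would leave its tasks pending until they time out, which matches what happened here before the scopes were fixed.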
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
Staging releases have been behaving very strangely for me lately, with builds failing throughout. I pushed to try to see whether it could be related to my patch (very unlikely), and it seems to have kicked off well so far.
I followed up by pushing my patch, along with another push to try.
If this goes green, I'll fire up a staging release based on the latest patch.
Assignee | ||
Comment 5•6 years ago
|
||
Comment 7•6 years ago
|
||
bugherder |
Assignee | ||
Comment 8•6 years ago
|
||
Note to self: if this morning's nightlies work as expected, I'll uplift this to beta too.
Assignee | ||
Comment 9•6 years ago
|
||
Tasks are green on central, but because we now have too many workers (20), we're hitting some conflicts in the blobs that we didn't have before. I'll file a separate bug.
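This comment does not say how the conflicts arise. Purely as an illustration, and assuming they come from several workers doing version-checked writes to the same blob, the toy model below shows why more workers overlapping means more rejected writes. BlobStore, VersionConflict and conflicts_in_one_round are made-up names for this sketch and are not Balrog's API.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the blob."""


class BlobStore:
    """Toy in-memory blob with a version counter (hypothetical, not Balrog)."""

    def __init__(self):
        self.blob = {}
        self.version = 0

    def read(self):
        # Hand back a copy plus the version it was read at.
        return dict(self.blob), self.version

    def write(self, new_blob, expected_version):
        # Reject the write if someone else updated the blob in the meantime.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.blob = new_blob
        self.version += 1


def conflicts_in_one_round(n_workers):
    """Worst case: every worker reads before any worker writes."""
    store = BlobStore()
    snapshots = [store.read() for _ in range(n_workers)]
    conflicts = 0
    for i, (blob, version) in enumerate(snapshots):
        blob[f"locale-{i}"] = "done"
        try:
            store.write(blob, version)
        except VersionConflict:
            conflicts += 1
    return conflicts


for n in (1, 10, 20):
    print(n, "workers ->", conflicts_in_one_round(n), "conflicting writes")
```

In this worst case only the first write of each round succeeds, so going from 10 to 20 overlapping workers takes the rejected writes from 9 to 19. Real traffic overlaps less than this, but the trend is the same.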
Updated•9 months ago
|
Comment 10•2 months ago
|
||
uplift |