Switch to new GCP treescript workers
Categories
(Release Engineering :: Release Automation: Other, task)
Tracking
(firefox71 fixed)
Tracking | Status
---|---
firefox71 | fixed
People
(Reporter: nthomas, Assigned: nthomas)
References
Details
Attachments
(2 files)
Release only worker
- tagging at the start of releases via release-early-tagging
- tagging and version bump via release-early-tagging
Will need to review instance sizing since this needs to clone gecko.
Assignee
Comment 1•5 years ago
gecko-1-tree can do early tagging ok - eg https://tools.taskcluster.net/groups/SpyLLQRFRCyQij-XGkio_A/tasks/L268SOv0TVmYnoSpOEk4Gg
- Initial clone sets up the hg share and takes 23 minutes (AWS unknown, we never do this with a static host)
- subsequent runs are 4m 40s (AWS ~ 3m 30s)
- that's for a CPU request of 1000m, memory request 4000M
Comment 2•5 years ago
(In reply to Nick Thomas [:nthomas] (UTC+12) from comment #1)
> gecko-1-tree can do early tagging ok - eg https://tools.taskcluster.net/groups/SpyLLQRFRCyQij-XGkio_A/tasks/L268SOv0TVmYnoSpOEk4Gg
> - Initial clone sets up the hg share and takes 23 minutes (AWS unknown, we never do this with a static host)
> - subsequent runs are 4m 40s (AWS ~ 3m 30s)
> - that's for a CPU request of 1000m, memory request 4000M
Beefy instances :) Seems like in AWS we're using a t2.medium for both treescriptworker1 and treescriptworker-dev1. Looking at the AWS docs, it seems we need 2 vCPUs and 4 GB of RAM. We've had this problem with signingworkers as well, where we kept bumping the memory and CPUs until we closed the runtime gap with the AWS counterparts.
Luckily we only need one instance here, no? (In the AWS world we only had one.)
Or is it worth allocating two?
Assignee
Comment 3•5 years ago
The first early tag is at about 16:00, and includes a clone. Two reruns at ~17:00 use an hg share. You can see we have higher limits than request - 1200m and 4500M - and that does get used.
The node itself is 2 vCPU and 7.5G, so we're heading toward single occupancy if we go much higher on requests. Limits might help a bit. Mercurial, being a Python app, is only going to use a single CPU, so extra CPU would mainly help with OS overhead.
Overall, maybe it's not a big deal given these are leaf tasks, and could just make the ship graph a little longer.
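As a rough sketch, the request/limit split described above would look something like this in the worker's pod spec (a hypothetical Kubernetes fragment; the actual deployment manifest is not part of this bug):

```yaml
# Hypothetical resources stanza matching the figures in this comment.
# Requests are what the scheduler reserves; limits cap actual usage.
resources:
  requests:
    cpu: 1000m      # 1 CPU reserved on the 2 vCPU / 7.5G node
    memory: 4000M
  limits:
    cpu: 1200m      # burst headroom that does get used in practice
    memory: 4500M
```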
Assignee
Comment 4•5 years ago
The creds should be good on gecko-3-tree - here's a maple early tagging from a week ago:
https://tools.taskcluster.net/groups/GM-nVD0MQ2W36kyRRRh8rA/tasks/c0VigaWARJyfaV2dYmIEGw/details
Assignee
Comment 5•5 years ago
Pushed by nthomas@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d41b80604a69 Switch to new GCP treescript workers, r=mtabara
Comment 7•5 years ago
bugherder
Assignee
Comment 8•5 years ago
I don't think we need any autoscale patch to match the one worker we have in AWS.
We could consider autoscaling between 0 and 1 given we only need treescript twice per release, which becomes 10x 5 mins = 50 minutes per week most of the beta cycle. One downside of having no instance live is that the first job will spend 20-25 minutes doing the initial clone of the hg share.
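The 50-minutes-per-week figure works out as a quick back-of-envelope calculation (figures assumed from this comment):

```python
# Back-of-envelope worker utilization during the beta cycle.
# Assumptions from the comment above: treescript runs twice per release,
# roughly five betas a week, ~5 minutes per task once the hg share exists.
tasks_per_release = 2
releases_per_week = 5
minutes_per_task = 5

busy_minutes_per_week = tasks_per_release * releases_per_week * minutes_per_task
print(busy_minutes_per_week)  # 50
```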
Aki and I speculated that we could pre-populate the hg share during the image build, and then only need to pull in the delta since then. Some of the time win from that may be eroded by the increased size of the image affecting compression time, as well as transfer to docker-hub and into the k8s cluster.
Kubernetes also has Volume and PersistentVolume support to share data between containers, but we'd want to make sure the backing store is fast because hg does so much I/O (i.e. not the slow AWS S3 block store we had on workers a while back). I'm also not sure if that is compatible with more than one container (race conditions in the share), or how storage vs GCE costs work out.
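If we went the volume route, a minimal sketch might be a PersistentVolumeClaim on a fast storage class (hypothetical names throughout; the storage class and size are guesses, not from this bug):

```yaml
# Hypothetical PVC for the hg share. ReadWriteOnce sidesteps the
# multi-container race-condition concern by limiting the volume to one node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hg-share         # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd  # hypothetical fast, SSD-backed class
  resources:
    requests:
      storage: 20Gi      # guess; a gecko clone is multiple GB
```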
At some point l10n-bumper will move into treescript and run jobs every hour.
Assignee
Comment 9•5 years ago
The gecko change was uplifted to beta by the sheriffs:
https://hg.mozilla.org/releases/mozilla-beta/rev/d41b80604a6952fcf20d08250b60ce065223d24b
Wasn't included in 71.0b2 but will be in b3.