Closed Bug 1594220 Opened 6 years ago Closed 6 years ago

taskcluster: scale out firefoxci deployment

Categories

(Cloud Services :: Operations: Taskcluster, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: miles, Assigned: brian)

References

Details

Current numbers of Heroku dynos per service:

queue.web: 25
queue.claimResolver: 4
queue.deadlineResolver: 3
queue.dependencyResolver: 4
auth.web: 10
index.web: 4
secrets.web: 2

Each dyno has roughly 1 CPU and 512MB of memory. These dyno counts roughly correspond to k8s pod replica counts, though we'd rather be overprovisioned than underprovisioned.
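As a sketch, the dyno-to-pod mapping above could look like the following Deployment fragment. The resource names, namespace, labels, and image are illustrative assumptions, not the actual manifests in cloudops-infra:

```yaml
# Hypothetical k8s Deployment mirroring queue.web's 25 Heroku dynos.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-web          # assumed name
  namespace: taskcluster   # assumed namespace
spec:
  replicas: 25             # matches queue.web's current dyno count
  selector:
    matchLabels:
      app: queue-web
  template:
    metadata:
      labels:
        app: queue-web
    spec:
      containers:
        - name: queue-web
          image: taskcluster/queue:latest   # placeholder image
```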

There is a push tomorrow that we'd like to be scaled out for. It's OK if this is done manually for now.

Component: Operations: Deployment Requests → Operations: Taskcluster
Assignee: nobody → bpitts
Status: NEW → ASSIGNED

I assume everything not in that list is using 1 dyno.

Are all taskcluster services universally configured to reserve 1 CPU and 512MB of RAM in Heroku?

Is there a way edunham or I could see historical resource utilization per service, to better fine-tune our initial requests? If not, that's fine; we can adjust downward after launch. We have only resource requests configured in k8s, no limits, so nothing will get throttled artificially. We just have to watch for overloaded nodes in the short term, and address overprovisioning causing us to run too many nodes in the long term.
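A requests-only resource stanza, as described above, would look roughly like this per container. The exact values are taken from the dyno figures earlier in this bug and are a sketch, not the deployed config:

```yaml
# Requests without limits: pods reserve capacity for scheduling,
# but k8s never CPU-throttles or OOM-kills them at an artificial ceiling.
resources:
  requests:
    cpu: "1"        # one Heroku dyno's worth of CPU
    memory: 512Mi   # matches the 512MB dyno memory
```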

Miles, do the resource changes at https://paste.mozilla.org/YowdwOJ0 look good to you? If so I can apply both https://github.com/mozilla-services/cloudops-infra/pull/1558 and them in the morning.

I've reserved 0.9 CPU instead of 1 because the node pool currently consists of n1-standard-2 instances, which have only about 1.94 allocatable CPUs (https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture). After tomorrow's push is done, I think we should consider creating a new node pool with a larger instance type, possibly with a higher CPU-to-memory ratio.
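Creating such a node pool might look like the following, assuming gcloud is the tool in use here; the pool name, cluster name, machine type, and node count are all hypothetical:

```shell
# Sketch: add a larger node pool alongside the existing n1-standard-2 pool.
# n1-highmem-4 (4 vCPU / 26GB) is one option for a higher memory:CPU ratio.
gcloud container node-pools create taskcluster-large \
  --cluster=CLUSTER_NAME \
  --machine-type=n1-highmem-4 \
  --num-nodes=6
```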

Flags: needinfo?(miles)

This is done. We now have 0.9 CPU and 500MB of RAM reserved for each service, plus the additional replicas described in the original request.

I'll file a follow-up bug for us to revisit the requests and replica counts in a couple of weeks.

From the Taskcluster dev side, my expectation is that once things settle down you can create HPAs for these seven services, so we won't need to scale up and down manually.
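For one of the seven services, such an HPA could be sketched as below. The name, replica bounds, and CPU target are illustrative assumptions, and clusters of this vintage may need the `autoscaling/v2beta2` API version instead of `autoscaling/v2`:

```yaml
# Hypothetical HorizontalPodAutoscaler scaling queue-web on CPU utilization,
# which the existing resource requests make possible.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-web          # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-web
  minReplicas: 5           # illustrative lower bound
  maxReplicas: 40          # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of requested CPU
```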

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
See Also: → 1594476
Flags: needinfo?(miles)