Closed Bug 1598295 Opened 5 years ago Closed 2 years ago

Request quota increase for GCP n1 builders in us-central1

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: coop, Assigned: fubar)

Attachments

(1 file)

Per Tom Bai from Google:

"Please submit a quota increase request through the GCP console (the UI can be a little confusing, see screenshot below)

Once submitted, a support ticket will be generated and in some cases these requests may get auto-approved. In case further review is needed, we can follow up on the ticket internally.

You may also want to submit increases for RAM, persistent disk and Local SSD quota to go along with the CPU quota request."

We haven't done the math in a few months, but here's what our avg and peak CPU needs were in the summer:

Avg CPU need: 29,504
Peak CPU need: 524,736

The config we're running in GCP has 32 vCPUs and 36 GB RAM, with a 375 GB local SSD attached as a scratch disk. We can use that to figure out what we need for those other resources. Math incoming:

Avg RAM need: (29,504 / 32) * 36 = 33,192 GB
Peak RAM need: (524,736 / 32) * 36 = 590,328 GB
Avg SSD need: (29,504 / 32) * 375 = 345,750 GB
Peak SSD need: (524,736 / 32) * 375 = 6,149,250 GB
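
For anyone double-checking, the arithmetic above can be reproduced with a short Python sketch. The function and constant names are hypothetical; the per-instance shape (32 vCPUs, 36 GB RAM, 375 GB local SSD) and the CPU figures come from this comment. It assumes instance counts are rounded up when the CPU total is not a multiple of 32, which does not affect either figure here.

```python
# Per-instance shape from the comment: 32 vCPUs, 36 GB RAM, 375 GB local SSD.
VCPUS_PER_INSTANCE = 32
RAM_GB_PER_INSTANCE = 36
SSD_GB_PER_INSTANCE = 375


def resource_needs(total_vcpus):
    """Return (instances, ram_gb, ssd_gb) needed to supply total_vcpus."""
    instances, remainder = divmod(total_vcpus, VCPUS_PER_INSTANCE)
    if remainder:
        # Assumption: a partial instance still costs a whole instance.
        instances += 1
    return (
        instances,
        instances * RAM_GB_PER_INSTANCE,
        instances * SSD_GB_PER_INSTANCE,
    )


avg = resource_needs(29_504)
peak = resource_needs(524_736)

for label, (instances, ram_gb, ssd_gb) in (("avg", avg), ("peak", peak)):
    print(f"{label}: {instances:,} instances, {ram_gb:,} GB RAM, {ssd_gb:,} GB SSD")
```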

Please double-check the math and let me know if you have any questions.

I would suggest that targeting our peak usage is probably overkill. I think 100,000 CPUs (== 3,125 instances) is a nice round number that will probably keep us afloat most days. We can always bump it further later.
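
If we go with a CPU cap rather than peak usage, the matching RAM and local SSD quotas follow from the same instance shape. A minimal sketch, assuming the 32 vCPU / 36 GB RAM / 375 GB SSD config described above; `quota_targets` is a hypothetical helper, not part of any tooling we run.

```python
def quota_targets(cpu_quota, vcpus=32, ram_gb=36, ssd_gb=375):
    """Derive instance count and matching RAM/SSD quotas from a CPU quota."""
    # Only whole instances can be launched, so round down.
    instances = cpu_quota // vcpus
    return {
        "instances": instances,
        "ram_gb": instances * ram_gb,
        "local_ssd_gb": instances * ssd_gb,
    }


targets = quota_targets(100_000)
print(targets)
```

With a 100,000 CPU quota this gives 3,125 instances, 112,500 GB of RAM, and 1,171,875 GB of local SSD.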

Assignee: relops → klibby

Filed requests to increase the CPU quota for both projects to 100,000:

fxci-production-level1-workers - ID:21358732
fxci-production-level3-workers - ID:21358748

Coop, can you forward on to the GCP folks?

Flags: needinfo?(coop)

Quota increased.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(coop)
Resolution: --- → FIXED

(In reply to Kendall Libby [:fubar] (he/him) from comment #3)

> Quota increased.

Did we also get quota increases for RAM, SSD, and IP addresses (per email from Google)?

Flags: needinfo?(klibby)

Reopening, since 36 e-mails arrived between 9 hours ago and 41 minutes ago for gecko-3/b-linux-gcp with the error:

"Quota 'CPUS' exceeded. Limit: 2400.0 in region us-central1."

example error ID: zp30vm2pSlGDrT1Yp1OB3A

I suspect we also need an answer to c#4.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

anyone know if there's a reason we only run linux workers in us-central1 on gcp?

in ec2 we spread work across multiple regions to mitigate against outages, capacity issues, quota limits and spot price spikes.

it appears that the gcp worker manager configuration for gecko-3/b-linux-gcp only includes settings for zones in us-central1. if this were updated to include all us regions, we would benefit from an immediate quota increase and mitigate against other problems arising from having all our eggs in the us-central1 basket.

(In reply to Rob Thijssen [:grenade (EET/UTC+0300)] from comment #6)

> anyone know if there's a reason we only run linux workers in us-central1 on gcp?

The details are in bug 1587958, but in short, we only have hg mirror and bundle clone support in us-central1.

fwiw, another 51 alerts within the last 4 hours...

Re-requested the quota increase on CPU, as it's still at 2400. Also requested increases in SSD and IPs to match. Cannot find a RAM quota; will ask in email.
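
One way to confirm whether an increase has landed is to read the regional quotas from the JSON printed by `gcloud compute regions describe us-central1 --format=json`. The sketch below assumes that output contains a `quotas` list whose entries have `metric`, `limit`, and `usage` fields, and that the metrics of interest are named `CPUS`, `LOCAL_SSD_TOTAL_GB`, and `IN_USE_ADDRESSES`. The helper name and the sample values are hypothetical, apart from the 2400 CPU limit reported in this bug.

```python
import json

# Assumed metric names for CPU, local SSD, and in-use IP address quotas.
WANTED = ("CPUS", "LOCAL_SSD_TOTAL_GB", "IN_USE_ADDRESSES")


def summarize_quotas(region_json, wanted=WANTED):
    """Map each wanted metric to (limit, usage) from a region description."""
    region = json.loads(region_json)
    by_metric = {q["metric"]: q for q in region.get("quotas", [])}
    return {
        metric: (by_metric[metric]["limit"], by_metric[metric]["usage"])
        for metric in wanted
        if metric in by_metric
    }


# Hypothetical sample standing in for real gcloud output.
sample = json.dumps({
    "name": "us-central1",
    "quotas": [
        {"metric": "CPUS", "limit": 2400.0, "usage": 2400.0},
        {"metric": "LOCAL_SSD_TOTAL_GB", "limit": 30000.0, "usage": 28125.0},
        {"metric": "DISKS_TOTAL_GB", "limit": 40960.0, "usage": 1000.0},
    ],
})

summary = summarize_quotas(sample)
for metric, (limit, usage) in summary.items():
    print(f"{metric}: {usage:,.0f} / {limit:,.0f}")
```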

Flags: needinfo?(klibby)

I really can't test this until TC 24.0.1 is deployed (bug 1601125) for the worker manager fixes. In my simple testing last week, we blew through the 50-instance cap almost immediately.

Depends on: 1601125
No longer depends on: 1601125
Depends on: 1601125
Status: REOPENED → RESOLVED
Closed: 5 years ago → 2 years ago
Resolution: --- → INVALID