Closed Bug 1323567 Opened 6 years ago Closed 5 years ago

Enable us-east-1e where possible, eg g-w732

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P5)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: nthomas, Assigned: aselagea)

Attachments

(2 files, 1 obsolete file)

At some point us-east-1e was added by Amazon, but we didn't have any subnets configured for it. Bug 1093656 disabled that availability zone to avoid spurious log messages from watch_pending.py. I think we can look at adding some subnets and turning 1e on. The motivation is a price spike on g2.2xlarge, which made me search for alternative capacity for g-w732.

An example location for disabling is https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/watch_pending.cfg#L102, but we're doing it on all our machine classes.

We'd need to add more subnets:
* for tests as we only have one /24 in 1e (subnet-42f0c078), while there are 5 /24's for tests 1a/1b/1c/1d (each /24 provides ~250 IP addresses)
* for non-try builds we don't have anything in us-east-1e, while there are 2x /25 in 1a/1b/1c/1d
* try builds are the same as non-try builds
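
As a rough check on the "~250 IP addresses" figure, here is a minimal sketch of the arithmetic. It assumes the standard AWS VPC behaviour of reserving five addresses in every subnet (the first four and the last one); the helper name usable_ips is purely illustrative and is not part of build-cloud-tools.

```python
# AWS reserves 5 addresses in every VPC subnet
# (the first four and the last one), so they cannot be assigned to instances.
AWS_RESERVED = 5


def usable_ips(prefix_len: int) -> int:
    """Return how many IPv4 addresses remain assignable in a subnet of this size."""
    total = 2 ** (32 - prefix_len)
    return total - AWS_RESERVED


print(usable_ips(24))  # 251
print(usable_ips(25))  # 123
```

So a /24 gives 251 assignable addresses and a /25 gives 123.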

We shouldn't need any changes to slavealloc, or any new masters - this is about the same number of max instances but can run over 4 zones instead of 3.
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
Assignee: nobody → aselagea
Attached patch bug_1323567.patch (obsolete) — Splinter Review
Added more subnets for the VPC corresponding to us-east-1.
Attachment #8820247 - Flags: feedback?(nthomas)
(In reply to Nick Thomas [:nthomas] from comment #0)

> We'd need to add more subnets:
> * for tests as we only have one /24 in 1e (subnet-42f0c078), while there are
> 5 /24's for tests 1a/1b/1c/1d (each /24 provides ~250 IP addresses)
> * for non-try builds we don't have anything in us-east-1e, while there are
> 2x /25 in 1a/1b/1c/1d
> * try builds are the same as non-try builds

Based on the patch above, the actual subnets that would need to be added in us-east-1e could look like this:

testers:
    - 10.134.160.0/24 => 251 usable IPs
    - 10.134.161.0/24 => 251 usable IPs
    - 10.134.162.0/24 => 251 usable IPs 
    - 10.134.163.0/24 => 251 usable IPs 

non-try builds:
    - 10.134.164.0/25 => 123 usable IPs
    - 10.134.165.0/25 => 123 usable IPs

try builds:
    - 10.134.168.0/25 => 123 usable IPs
    - 10.134.168.0/25 => 123 usable IPs

Also, is g-w732 the only instance type we'd like to enable in us-east-1e for now? (https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/watch_pending.cfg#L102)
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #2)

> try builds:
>     - 10.134.168.0/25 => 123 usable IPs
>     - 10.134.168.0/25 => 123 usable IPs

Err, the last one should be 10.134.169.0/25
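
A duplicate like this can be caught mechanically. The following is only a sketch using Python's standard ipaddress module, not something aws_manage_subnets.py is known to do; find_overlaps is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks in the list that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# The list as originally posted, including the duplicated try subnet.
proposed = [
    "10.134.160.0/24", "10.134.161.0/24", "10.134.162.0/24", "10.134.163.0/24",
    "10.134.164.0/25", "10.134.165.0/25",
    "10.134.168.0/25", "10.134.168.0/25",
]
print(find_overlaps(proposed))   # [('10.134.168.0/25', '10.134.168.0/25')]

# The same list with the last entry corrected to 10.134.169.0/25.
corrected = proposed[:-1] + ["10.134.169.0/25"]
print(find_overlaps(corrected))  # []
```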
Comment on attachment 8820247 [details] [diff] [review]
bug_1323567.patch

Thanks for introducing me to these files, somehow I'd missed them all this time! We don't need to stick to the 2x /25 for build and try, that's just the way we started off small in the early days in AWS. You're using only half of two /25s too. Instead let's use a /24, so I propose:

testers:
    - 10.134.160.0/24 => 251 usable IPs
    - 10.134.161.0/24 => 251 usable IPs
    - 10.134.162.0/24 => 251 usable IPs 
    - 10.134.163.0/24 => 251 usable IPs 

non-try builds:
    - 10.134.164.0/24 => 251 usable IPs

try builds:
    - 10.134.165.0/24 => 251 usable IPs

>diff --git a/configs/securitygroups.yml b/configs/securitygroups.yml
>     build-use1: 10.134.52.0/22
>+    build2-use1: 10.134.164.0/22

So this would drop back to a /24.

>     test4-use1: 10.134.60.0/22
>+    test5-use1: 10.134.160.0/22
>     try-use1: 10.134.64.0/22
>+    try2-use1: 10.134.168.0/22

And this changes base and size. Then matching changes in subnets.yml.
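
One way to see why the build entry has to shrink once try moves to 10.134.165.0/24: a /22 based at 10.134.164.0 spans 164 through 167 and would swallow the try range. This is a sketch with the standard ipaddress module; the variable names are illustrative and the check is my reading of the proposal, not part of the patch.

```python
import ipaddress

test_sg = ipaddress.ip_network("10.134.160.0/22")      # test5-use1 in the patch
build_patch = ipaddress.ip_network("10.134.164.0/22")  # build2-use1 in the patch
build_new = ipaddress.ip_network("10.134.164.0/24")    # proposed non-try build range
try_new = ipaddress.ip_network("10.134.165.0/24")      # proposed try range

# The test /22 covers exactly the four proposed tester /24s.
print([str(n) for n in test_sg.subnets(new_prefix=24)])

# A /22 for build would overlap the new try /24; a /24 does not.
print(build_patch.overlaps(try_new))  # True
print(build_new.overlaps(try_new))    # False
```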

I had a quick look at aws_manage_subnets.py and I'm not sure whether it's going to create the new subnets in use1e. Would be worth watching out for. And it looks like we'll need to add the new test subnets to configs/g-w732 once their names are known.

g-w732 is all I was planning on right now, but we might as well add the others too. We'd need to check that all the instance types we use are available in 1e, as some of the older ones may not be.
Attachment #8820247 - Flags: feedback?(nthomas) → feedback+
Attached file add_subnets_in_use1
Created a PR incorporating the changes suggested above. I think we'll need to manually create the new subnets in us-east-1e though. Will come up with another patch to update configs/g-w732 once we have the names of the subnets.

Will ask for review and merge once Nick returns from PTO. :-)
Attachment #8820247 - Attachment is obsolete: true
Comment on attachment 8823592 [details] [review]
add_subnets_in_use1

lgtm. Check with rail if you have any queries about how to create the subnets (script vs manual etc).
Attachment #8823592 - Flags: review+
Discussed this with :rail today and he suggested using the following script:
https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/scripts/aws_manage_subnets.py
Manually adding them is also a possibility, but, as :rail said, the script may complain about it in the future. We have this wiki page for the required steps (which I somehow missed): https://wiki.mozilla.org/ReleaseEngineering/How_To/Add_New_AWS_Subnets

Since we want the new subnets in us-east-1e only, I updated the PR to exclude the rest of the AZs.
https://github.com/mozilla-releng/build-cloud-tools/pull/274

If everything's ok, I will ask you to also do the merge since I don't have the permissions to do so :-). Thanks!
Flags: needinfo?(nthomas)
Merged at https://github.com/mozilla-releng/build-cloud-tools/commit/7d217a7215a40607081c988e5b5392c148d3efe3.

re the NetOps bug for firewall changes in SCL3 - please note the way Dustin translates for NetOps at bug 1254144 comment #6. We have test, build, and try this time though, so it's a little more complicated.
Flags: needinfo?(nthomas)
The following subnets have been added in us-east-1e:

- test:
    - 10.134.160.0/24: subnet-f6c678ca
    - 10.134.161.0/24: subnet-f4c678c8
    - 10.134.162.0/24: subnet-fbc678c7 
    - 10.134.163.0/24: subnet-f8c678c4

- build:
    - 10.134.164.0/24: subnet-f2c678ce

- try:
    - 10.134.165.0/24: subnet-ecc678d0
I think we'll need to update the signing server config to include the build and try subnets, see 
  https://hg.mozilla.org/build/puppet/file/default/manifests/moco-config.pp#l94
Thanks for spotting this! Attaching the patch for this update.
Attachment #8826192 - Flags: review?(nthomas)
Attachment #8826192 - Flags: review?(nthomas) → review+
Once the network changes in bug X are verified by the firewall tests we can continue with the steps in https://wiki.mozilla.org/ReleaseEngineering/How_To/Add_New_AWS_Subnets. At step 3 we don't need more instances, but do need to remove the blocks in "ignored_azs". In step 4 there will be several configs to update with the new subnets; here are just a few:
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/t-w732#L7
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/y-2008#L7
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/bld-linux64#L7
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/try-linux64#L7
Found in triaging. Is this something we still want to proceed with doing?

Note explaining the priority level: P5 doesn't mean we've lowered the priority; quite the contrary. However, we're aligning these levels to the buildduty quarterly deliverables, where P1-P3 are taken by our daily waterline KTLO operational tasks.
Priority: -- → P5
The need for this goes away with the taskcluster migration.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard