Closed Bug 1387540 Opened 8 years ago Closed 7 years ago

[tracker] Connect Gecko VPCs to Mozilla networks

Categories

(Taskcluster :: Operations and Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

We want to set up IPSec tunnels between VPCs used for Gecko work and the Mozilla network. That means we need to follow the Mozilla network numbering scheme. My plan is roughly:

* Create new VPCs based on the /16's allocated in 1387179, plus appropriate subnets etc.
* Configure the provisioner to put gecko-* workerTypes in that VPC
* Create VPN config for each VPC
* Work with netops to connect those VPNs to the Mozilla network

In the final analysis, we'll use two VPCs per region, one for Gecko stuff, and one for non-Gecko stuff like github-worker, tutorial, etc. That will give an added measure of isolation between Gecko and the rest and ensure there's no access to the Mozilla networks from non-Gecko workers.

Greg, any of that strike you as crazy?
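To illustrate the first step of the plan, here is a minimal sketch of carving one subnet per availability zone out of a VPC's /16, using Python's standard ipaddress module. The CIDR block, the AZ names, and the /20 subnet size are placeholder assumptions, not the allocation from 1387179 or the layout actually used.

```python
import ipaddress


def plan_subnets(vpc_cidr, availability_zones, new_prefix=20):
    """Assign one /new_prefix subnet to each AZ, in order, from the VPC block."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = list(network.subnets(new_prefix=new_prefix))
    if len(availability_zones) > len(subnets):
        raise ValueError("not enough subnets of that size in the VPC block")
    return {az: str(subnet) for az, subnet in zip(availability_zones, subnets)}


# Placeholder values only: 10.0.0.0/16 is NOT the real allocation.
plan = plan_subnets("10.0.0.0/16", ["eu-central-1a", "eu-central-1b"])
for az, cidr in plan.items():
    print(az, cidr)
```

A Gecko VPC and a non-Gecko VPC would each get their own /16 and their own call, so the two address ranges never overlap.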
Sounds like a good plan; nothing about that is crazy. I'm glad you brought up the idea of separating VPCs between Gecko and non-Gecko.
Assignee: nobody → dustin
This is coming along at https://github.com/taskcluster/taskcluster-terraform -- I have a VPC set up in eu-central-1 along with the beginnings of a VPN connection.
John, it looks like the AWS provisioner sets Placement.AvailabilityZone in the bid based on the selected AZ for provisioning. From our conversation, I think EC2 maps this to the "default" subnet in that AZ, which is in the default VPC. As far as I can tell, "default" in this sense is for compatibility with "classic EC2" and isn't especially manageable -- I can't find a UI option to set a subnet as default.

Here's a snippet of docs (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#using-spot-instances-request)

---
(Optional) For Availability Zones, the default is to let AWS choose the Availability Zones for your Spot instances. If you prefer, you can specify specific Availability Zones.

[EC2-VPC] Select one or more Availability Zones. If you have more than one subnet in an Availability Zone, select the appropriate subnet from Subnet. To add subnets, select Create new subnet to go to the Amazon VPC console. When you are done, return to the wizard and refresh the list.

[EC2-Classic] Select Select specific zone/subnet, and then select one or more Availability Zones.
---

As part of the work here, we are moving the Gecko workers to different subnets (and VPCs) than non-Gecko workers, so we will be provisioning into two subnets per availability zone (one Gecko, one non). To achieve this, I think we'll need to specify SubnetId in the worker type definition -- but that will limit the workerType to a single AZ per region, since that subnet is in a single AZ.

How can we configure things to run over multiple, specific subnets?
Flags: needinfo?(jhford)
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> As far as I can tell, "default" in this sense is for compatibility with "classic EC2" and isn't especially manageable -- I can't find a UI option to set a subnet as default

You're right that there isn't a way to specify the default subnet in the UI. You have to file a support ticket to request that it be set. My understanding is that the default subnet is set per VPC/AZ pairing. Once the default is set, we can modify the default subnet as we would any other subnet. The only thing we cannot change is whether it is default or non-default.

> As part of the work here, we are moving the Gecko workers to different subnets (and VPCs) than non-Gecko workers, so we will be provisioning into two subnets per availability zone (one Gecko, one non). To achieve this, I think we'll need to specify SubnetId in the worker type definition -- but that will limit the workerType to a single AZ per region, since that subnet is in a single AZ.
>
> How can we configure things to run over multiple, specific subnets?

Provided that we can set a different set of subnets as default in the second VPC, this shouldn't be an issue. I'm pretty sure that's possible.

If that's not possible, I wonder if it would be better to create a second EC2 account for Gecko and run a second instance of ec2-manager/provisioner for that second account.

If we definitely do not want a second EC2 account, as icky as it is, I could add another level of cascading options for per-AZ configuration to the worker type configuration. That'd let us specify the subnet on a per-AZ basis.
Flags: needinfo?(jhford)
Summarizing IRC:

- default subnets are only allowed on the default VPC https://aws.amazon.com/premiumsupport/knowledge-center/recreate-default-vpc/
- a second provisioner would require a new provisionerId and migrating everything Gecko to that -- a long process.

So it looks like a level of cascading launchSpec for per-AZ configuration is the way to go.
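A rough sketch of what a per-AZ cascade could look like: start from the worker type's base launchSpec, overlay the region's launchSpec, then overlay the AZ's launchSpec so that SubnetId can differ per AZ. The "availabilityZones" key, the shape of its entries, and all IDs below are hypothetical illustrations, not the provisioner's actual schema.

```python
def build_launch_spec(worker_type, region, az):
    """Merge launchSpec settings: base, then region, then AZ (last wins)."""
    spec = dict(worker_type.get("launchSpec", {}))

    region_cfg = next(
        (r for r in worker_type.get("regions", []) if r["region"] == region), None
    )
    if region_cfg is None:
        raise ValueError("worker type is not configured for region " + region)
    spec.update(region_cfg.get("launchSpec", {}))

    # Hypothetical per-AZ level; an AZ with no entry just uses the region settings.
    az_cfg = next(
        (a for a in worker_type.get("availabilityZones", [])
         if a["availabilityZone"] == az),
        {},
    )
    spec.update(az_cfg.get("launchSpec", {}))
    return spec


# Placeholder worker type; the AMI, security group and subnet IDs are made up.
gecko_worker = {
    "launchSpec": {"SecurityGroups": ["example-sg"]},
    "regions": [
        {"region": "eu-central-1", "launchSpec": {"ImageId": "ami-00000000"}},
    ],
    "availabilityZones": [
        {"availabilityZone": "eu-central-1a",
         "launchSpec": {"SubnetId": "subnet-aaaaaaaa"}},
        {"availabilityZone": "eu-central-1b",
         "launchSpec": {"SubnetId": "subnet-bbbbbbbb"}},
    ],
}

spec_a = build_launch_spec(gecko_worker, "eu-central-1", "eu-central-1a")
spec_b = build_launch_spec(gecko_worker, "eu-central-1", "eu-central-1b")
```

With this shape, a workerType can still bid in several AZs per region, and each bid carries the subnet that belongs to that AZ.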
Depends on: 1388436
Depends on: 1388481
Depends on: 1388792
Summary: Connect Gecko VPCs to Mozilla networks → [tracker] Connect Gecko VPCs to Mozilla networks
It's looking tough to get the provisioner to support multiple subnets -- and it's not a core requirement of this project. I'd like to try, instead, just switching the "default" VPC over to the Gecko workers VPC. I think this can be done with an AWS support ticket. I'll try in one less-used region first (eu-central-1) and make sure provisioning to that region continues to function, then switch regions one by one. Once the old VPCs/subnets are drained, we can delete them.

This checks all the boxes as required: hosts will not have external access, but will have access via VPNs/jumphosts, and the admin subnet has KMS access. It will leave the releng jumphosts with ssh/rdp/vnc access to all EC2 workers, even non-Gecko workers, but that shouldn't be a big concern.
Response:

---
Thanks for contacting AWS Support. I understand that you're looking to understand if you can mark a new / different VPC as a default VPC and make its subnets as default subnets.

You cannot mark an existing non-default VPC as a default VPC and you can only have one default VPC per region.

Having said that, I'm excited to inform you that we have announced a new long-awaited feature just recently (07/27) where-in, you can create a new default VPC [1] directly from your console or using AWS CLI. Here's the AWS what's new post that explains in detail on all that you will need to know about creating default VPC - https://aws.amazon.com/about-aws/whats-new/2017/07/create-a-new-default-vpc-using-aws-console-or-cli/.

Please let us know if it doesn't answer any of your questions and I'll be happy to assist you.
---

https://aws.amazon.com/about-aws/whats-new/2017/07/create-a-new-default-vpc-using-aws-console-or-cli/

---
Amazon Virtual Private Cloud (VPC) now allows customers to create a new default VPC directly from the console or by using the CLI. With this release, customers no longer need to contact AWS support if the default VPC has been deleted, as they can create a new default VPC by using this self-service feature. Customers can also take remedial actions against accidental deletion of default VPCs, by automating creation of a new default VPC using the API.
---

So I think that would involve deleting the existing default VPC and then creating a new default VPC. That will then require draining the region of any workers. Note that the VPN connections can stay in place, just linked to a different VPC.

So I think the question is, can we afford to, say, drain us-east-1 of workers and of any other instances in that VPC?
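As a sketch of the drain-then-recreate idea, the helpers below check whether any instances remain in a VPC and then ask EC2 for a new default VPC. They take an EC2 client as an argument and assume it behaves like a boto3 EC2 client (describe_instances with the vpc-id and instance-state-name filters, and create_default_vpc). The function names are made up for illustration, and deleting the old default VPC with its subnets and gateways is deliberately left out.

```python
def remaining_instance_ids(ec2, vpc_id):
    """Return IDs of instances in the VPC that have not been terminated."""
    kwargs = {
        "Filters": [
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "instance-state-name",
             "Values": ["pending", "running", "stopping", "stopped"]},
        ]
    }
    ids = []
    while True:
        page = ec2.describe_instances(**kwargs)
        for reservation in page["Reservations"]:
            ids.extend(inst["InstanceId"] for inst in reservation["Instances"])
        token = page.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def assert_drained(ec2, vpc_id):
    """Raise if the VPC still holds instances, so it is not yet safe to delete."""
    leftover = remaining_instance_ids(ec2, vpc_id)
    if leftover:
        raise RuntimeError("VPC %s still has instances: %s" % (vpc_id, leftover))


def create_new_default_vpc(ec2):
    """Create a default VPC in the client's region and return its ID.

    This only succeeds once the region no longer has a default VPC.
    """
    return ec2.create_default_vpc()["Vpc"]["VpcId"]
```

With boto3 installed and credentials configured, the client would come from something like boto3.client("ec2", region_name="eu-central-1"), starting with the less-used region.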
Depends on: 1393241
Depends on: 1400261
Depends on: 1400273
Depends on: 1400274
Amy, Greg, what would you like the security groups for these workers to look like from the inside? Do you want them always open to the rejh's (releng jumphosts), or do you want to manually add such SGs as required?
Flags: needinfo?(garndt)
Flags: needinfo?(arich)
Would this mean that, to access one of these machines via the jumphost, I would first need to add the right security group to the machine through AWS, and then use the jumphost to get access?
Flags: needinfo?(garndt)
Yes, well, and you and I don't have rejh access, so it would mean a TC person adds the SG, and then a relops person gets access through the rejh.
swapping the NI to kendall since he's been running the jumphost/security project.
Flags: needinfo?(arich) → needinfo?(klibby)
Sounds like that's a major pain and a blocker. If anything, the TC team should be out of the loop for accessing those machines, including applying the security groups. So either an SG needs to be applied on demand, and we should give the appropriate people access to do so, or we just have it applied by default.
Yeah, I'm not sure if the effective "two-man rule" is desirable.
We'll call this done. If we need to sort out further SGs or flows, we can do that in another bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(klibby)
Resolution: --- → FIXED
Component: Operations → Operations and Service Requests