Closed
Bug 1495917
Opened 6 years ago
Closed 5 years ago
Clean up of scl3 from securitygroups
Categories
(Release Engineering :: General, enhancement)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: apop, Assigned: bcrisan)
References
Details
Attachments
(1 file, 2 obsolete files)
After adding the scriptworkers into mdc1, we received a lot of broken checks from nagios-releng-mdc1. Then, we have found some scl3 entries under securitygroups.yml
Reporter | ||
Comment 1•6 years ago
|
||
while checking the securitygroups.yml, I have found many entries related scl3 and some related to aws Entries like this : # all slave VLANs look the same slave-vlan-inbound: - proto: tcp ports: [22] hosts: - {include: slaveapi-servers} - aws-manager1.srv.releng.scl3.mozilla.com - aws-manager2.srv.releng.scl3.mozilla.com aws-manager-ssh: proto: tcp ports: [22] hosts: - aws-manager1.srv.releng.scl3.mozilla.com - aws-manager2.srv.releng.scl3.mozilla.com Are these still needed ?
Flags: needinfo?(nthomas)
Comment 2•6 years ago
|
||
The answer isn't 100% certain. We have certainly no longer need slave-vlan-inbound, and aws-manager1 and aws-manager2 from scl3 are gone. aws-manager-ssh seems to be used so that puppet can be run after the initial instance creation. We may need to create a new aws-manager style of instance to create new instances, mostly scriptworker. jlorenzo was going to try running the code locally to see if that works OK.
Flags: needinfo?(nthomas) → needinfo?(jlorenzo)
Reporter | ||
Comment 3•6 years ago
|
||
Maybe a better question should be : What lines we still need and which ones we don't ?
Flags: needinfo?(nthomas)
Flags: needinfo?(jwatkins)
Comment 4•6 years ago
|
||
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #2) > jlorenzo was going to try running the code locally to see if that works OK. I found a workaround which doesn't require me to spin a machine up. So, I haven't looked closely at running the script locally. However, Mihai got some instructions from Aki which bypasses aws-manager, too. Did I get this correctly, Mihai? If so, I'm in favor of removing these entries.
Flags: needinfo?(jlorenzo) → needinfo?(mtabara)
Comment 5•6 years ago
|
||
(In reply to Johan Lorenzo [:jlorenzo] from comment #4) > (In reply to Nick Thomas [:nthomas] (UTC+13) from comment #2) > > jlorenzo was going to try running the code locally to see if that works OK. > > I found a workaround which doesn't require me to spin a machine up. So, I > haven't looked closely at running the script locally. However, Mihai got > some instructions from Aki which bypasses aws-manager, too. Did I get this > correctly, Mihai? If so, I'm in favor of removing these entries. Yes, Aki gave me some instructions which I want to test. If that works, we're safe to forget about the aws-manager as we'd have some backup solutions to ramp-up scriptworkers. However, I don't have headspace to test those until the end of this week or beginning of next week. If this bug is not urgent, maybe you can block it on me until I fully test those instructions? If this is needed sooner, I think we can maybe find someone else to give those a try?
Comment 6•6 years ago
|
||
Thanks for the context Mihai! aws-manager is already decommissioned. So is slc3. I don't think we should keep the vlan/ssh config around. Worst case scenario, if comment 5 doesn't work, we can still look back at the git history to craft something similar. Please remove the configuration related to aws-manager, Adrian. Please let me know if you need more information.
Flags: needinfo?(nthomas)
Flags: needinfo?(mtabara)
Flags: needinfo?(jwatkins)
Updated•6 years ago
|
Flags: needinfo?(apop)
Comment 7•6 years ago
|
||
(In reply to Johan Lorenzo [:jlorenzo] from comment #6) > Thanks for the context Mihai! > > aws-manager is already decommissioned. So is slc3. I don't think we should > keep the vlan/ssh config around. Worst case scenario, if comment 5 doesn't > work, we can still look back at the git history to craft something similar. > > Please remove the configuration related to aws-manager, Adrian. Please let > me know if you need more information. +1
Comment 8•6 years ago
|
||
(In reply to Johan Lorenzo [:jlorenzo] from comment #6) > Please remove the configuration related to aws-manager, Adrian. Please let > me know if you need more information. Okay, reading through the bug, it seems like we were always confident removing: slave-vlan-inbound, and aws-manager1 and aws-manager2 and Johan/Mihai confirmed that we can also remove aws-manager-ssh which addresses Nick's concerns.
Reporter | ||
Comment 9•6 years ago
|
||
Assignee: nobody → apop
Flags: needinfo?(apop)
Reporter | ||
Comment 10•6 years ago
|
||
Please check and review
Reporter | ||
Comment 11•6 years ago
|
||
Please check the patch that I've attached and review it. Thank you
Flags: needinfo?(mtabara)
Flags: needinfo?(jlorenzo)
Comment 12•6 years ago
|
||
(In reply to Adrian Pop from comment #11) > Please check the patch that I've attached and review it. > > Thank you Sure, will do. Btw, you can set r? directly on the patch by going to "Details" and asking "r?" + ":mtabara: or ":jlorenzo" so that the review request links to the patch. In this bug it's quite easy to spot it, but in a bug with more patches, it'd be confusing a bit.
Comment 13•6 years ago
|
||
Comment on attachment 9016913 [details] [diff] [review] patch for removing lines related to aws hosts with scl3 as data center Review of attachment 9016913 [details] [diff] [review]: ----------------------------------------------------------------- ::: configs/securitygroups.yml @@ +328,4 @@ > ports: > - {include: buildbot-http-portrange} > hosts: > + - {include: slaveapi-servers} Tiny nit: are there some trailing spaces here?
Attachment #9016913 -
Flags: review+
Updated•6 years ago
|
Flags: needinfo?(mtabara)
Comment 14•6 years ago
|
||
Comment on attachment 9016913 [details] [diff] [review] patch for removing lines related to aws hosts with scl3 as data center LGTM. Same comment as Mihai.
Flags: needinfo?(jlorenzo)
Attachment #9016913 -
Flags: review+
Reporter | ||
Comment 15•6 years ago
|
||
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #13) > Comment on attachment 9016913 [details] [diff] [review] > patch for removing lines related to aws hosts with scl3 as data center > > Review of attachment 9016913 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: configs/securitygroups.yml > @@ +328,4 @@ > > ports: > > - {include: buildbot-http-portrange} > > hosts: > > + - {include: slaveapi-servers} > > Tiny nit: are there some trailing spaces here? I've checked and I couldn't see them. About the other entries related to scl3 and scl3 host. I'll create a different patch and add it here for the review
Reporter | ||
Comment 16•6 years ago
|
||
removed the other hosts that had scl3 in their names and a few entries that had scl3.
Attachment #9019222 -
Flags: review?(nthomas)
Attachment #9019222 -
Flags: review?(jwatkins)
Reporter | ||
Comment 17•6 years ago
|
||
Please check and review my last patch. Thank you
Comment 18•6 years ago
|
||
Comment on attachment 9019222 [details] [diff] [review] bug_1495917_2.patch Review of attachment 9019222 [details] [diff] [review]: ----------------------------------------------------------------- It would be much easier to do these reviews in a github pull request. If you submit a PR, you can attach it to the bug by simply pasting the PR URL to the attachment box. A github pull request would allow myself, and other reviewers, to inspect the diff as an entire file. That would give a much better chance of reviews catching errors. ::: configs/securitygroups.yml @@ +31,1 @@ > - nagios1.private.releng.mdc1.mozilla.com The mdc2 nagios host should also be listed here @@ +31,3 @@ > - nagios1.private.releng.mdc1.mozilla.com > > # all slave VLANs look the same If you are removing the entire slave-vlan-inbound group, you should also remove its comment. @@ -179,1 @@ > slaveapi-servers: 10.26.48.16/31 slaveapi died with scl3 and should also be removed. In fact, anywhere you see the cidr 10.26.x.x/x, it should be removed since that was scl3s network range. @@ +279,4 @@ > - proto: tcp > ports: [22, 3389, 5900] > hosts: > + This becomes invalid without at least one host listed. Find out the mdc1 and mdc2 equivalents and add them. @@ +348,1 @@ > - nagios1.private.releng.mdc1.mozilla.com The MDC2 nagios host should also be listed @@ +379,1 @@ > - nagios1.private.releng.mdc1.mozilla.com Missing mdc2 nagios host @@ +402,1 @@ > - nagios1.private.releng.mdc1.mozilla.com missing mdc2 nagios host @@ +421,4 @@ > - 10.26.0.0/16 > - 10.130.0.0/16 > - 10.132.0.0/16 > + - 10.134.0.0/16 Trailing whitespace and missing mdc2 nagios host @@ +425,2 @@ > - nagios1.private.releng.mdc1.mozilla.com > - include: aws-manager-ssh aws-manager-ssh was removed in this patch, it should be removed everywhere else it is referenced.
Attachment #9019222 -
Flags: review?(jwatkins) → review-
Updated•6 years ago
|
Attachment #9019222 -
Flags: review?(nthomas)
Assignee | ||
Comment 19•6 years ago
|
||
With suggestions provided above, I made this commit (in my fork) https://github.com/bccrisan/build-cloud-tools/commit/faee3f8b9bf5ba1b310e2d477792f7c34ca5d5c5 There are few definitions that we may not need anymore and I'm not sure about them. Do we still need this: - outbound-ssh-upload-try - test-slave-vlan-outbound - try-slave-vlan-outbound - build-slave-vlan-outbound - slave-vlan-outbound - loaners - buildbot-master
Flags: needinfo?(nthomas)
Flags: needinfo?(jwatkins)
Assignee | ||
Comment 20•6 years ago
|
||
(In reply to Bogdan Crisan [:bcrisan] (GMT+2) from comment #19) > With suggestions provided above, I made this commit (in my fork) > https://github.com/bccrisan/build-cloud-tools/commit/ > faee3f8b9bf5ba1b310e2d477792f7c34ca5d5c5 > > There are few definitions that we may not need anymore and I'm not sure > about them. > Do we still need this: > - outbound-ssh-upload-try > - test-slave-vlan-outbound > - try-slave-vlan-outbound > - build-slave-vlan-outbound > - slave-vlan-outbound > - loaners > - buildbot-master These also: - buildbot-http-portrange: 8000-8999 - buildbot-rpc-portrange: 9000-9999
Comment 21•6 years ago
|
||
(In reply to Bogdan Crisan [:bcrisan] (GMT+2) from comment #19) > With suggestions provided above, I made this commit (in my fork) > https://github.com/bccrisan/build-cloud-tools/commit/ > faee3f8b9bf5ba1b310e2d477792f7c34ca5d5c5 > > There are few definitions that we may not need anymore and I'm not sure > about them. > Do we still need this: > - outbound-ssh-upload-try > - test-slave-vlan-outbound > - try-slave-vlan-outbound > - build-slave-vlan-outbound > - slave-vlan-outbound > - loaners > - buildbot-master Honestly, I'm not sure what is still needed either. Maybe someone in Releng would have a better sense. Since buildbot is dead it seems like we could rip out a bunch of things. Make sure if things are ripped out, there are no other ec2 instances inadvertently using the security groups being removed. And make sure you remove any references to the removed definitions.
Flags: needinfo?(jwatkins)
Comment 22•6 years ago
|
||
(In reply to Jake Watkins [:dividehex] from comment #21) > (In reply to Bogdan Crisan [:bcrisan] (GMT+2) from comment #19) > > With suggestions provided above, I made this commit (in my fork) > > https://github.com/bccrisan/build-cloud-tools/commit/ > > faee3f8b9bf5ba1b310e2d477792f7c34ca5d5c5 > > > > There are few definitions that we may not need anymore and I'm not sure > > about them. > > Do we still need this: > > - outbound-ssh-upload-try > > - test-slave-vlan-outbound > > - try-slave-vlan-outbound > > - build-slave-vlan-outbound > > - slave-vlan-outbound > > - loaners > > - buildbot-master > > Honestly, I'm not sure what is still needed either. Maybe someone in Releng > would have a better sense. Since buildbot is dead it seems like we could > rip out a bunch of things. Make sure if things are ripped out, there are no > other ec2 instances inadvertently using the security groups being removed. > And make sure you remove any references to the removed definitions. the *slave* and *master* ones I would imagine can be removed however, I'd be concerned that other things use these. Not sure about outbound-ssh-upload-try. Nick, do you know?
Comment 23•6 years ago
|
||
Mostly we can remove all of that, eg outbound-ssh-upload-build and outbound-ssh-upload-try are no longer needed. We will need to keep buildbot-master because of bm01 and bm77 running l10n-bumper and bouncer checks, but it can be shortened to buildbot-master: description: security group for buildbot masters regions: us-west-1: vpc-7a7dd613 us-west-2: vpc-cd63f2a4 us-east-1: vpc-b42100df inbound: # generic stuff - include: jumphost-admin-access - include: nagios-nrpe - include: global-ping outbound: # TODO: bug 1210137 - include: global-any Other things you could remove: proxxy, any include which isn't used by anything
Flags: needinfo?(nthomas)
Assignee | ||
Comment 24•6 years ago
|
||
So far I removed: - outbound-ssh-upload-build, - outbound-ssh-upload-try, - test-slave-vlan-outbound, - try-slave-vlan-outbound, - build-slave-vlan-outbound, - slave-vlan-outbound, - tests, - build, - try, - loaners (In reply to Nick Thomas [:nthomas] (UTC+13) from comment #23) > Mostly we can remove all of that, eg outbound-ssh-upload-build and > outbound-ssh-upload-try are no longer needed. That has been removed in https://github.com/bccrisan/build-cloud-tools/commit/24ca09cfac57b6a88ad510c2f2357cddad7d9fe8 > We will need to keep buildbot-master because of bm01 and bm77 running > l10n-bumper and bouncer checks, but it can be shortened to I was expecting that > buildbot-master: > description: security group for buildbot masters > regions: > us-west-1: vpc-7a7dd613 > us-west-2: vpc-cd63f2a4 > us-east-1: vpc-b42100df > inbound: > # generic stuff > - include: jumphost-admin-access > - include: nagios-nrpe > - include: global-ping > outbound: > # TODO: bug 1210137 > - include: global-any Replaced with ^^ in commit https://github.com/bccrisan/build-cloud-tools/commit/24ca09cfac57b6a88ad510c2f2357cddad7d9fe8 > Other things you could remove: proxxy, any include which isn't used by > anything I already removed a bunch of scl3 related proxxy's in https://github.com/bccrisan/build-cloud-tools/commit/faee3f8b9bf5ba1b310e2d477792f7c34ca5d5c5#diff-57d617afa8ef2d0089e9ce90bae279f4L177 I'm not entirely sure what others could be removed and are unused. Also, dividehex pointed out a few things (slave-vlan-inbound being removed but still referenced in several places) just realized that those places we may not need them (was referenced in tests, build, and try), those got removed and the vpn network and ssh jumphost should be replaced with mdc1/2 equivalents, which again, not necessary since "tests" were removed.
Comment 25•5 years ago
|
||
bcrisan: is there more clean up here required? Sounds like in comment 24 we resolved most known scl3 references. Should we resolve and follow up in subsequent bug if others find more old references?
Flags: needinfo?(bcrisan)
Assignee | ||
Comment 26•5 years ago
|
||
I think I covered all of the changes in comment 24, should I do a PR from my fork? (I'm not sure if it's safe since I did not updated it since the last time I worked on it)
Or create a patch file and attach it here?
Assignee: apop → bcrisan
Flags: needinfo?(bcrisan) → needinfo?(jlund)
Comment 27•5 years ago
|
||
you can create a PR and link the PR URL in a Bugzilla attachment here.
Flags: needinfo?(jlund)
Assignee | ||
Comment 28•5 years ago
|
||
Created PR #373
Attachment #9016913 -
Attachment is obsolete: true
Attachment #9019222 -
Attachment is obsolete: true
Assignee | ||
Comment 29•5 years ago
|
||
Merged and no conflict appeared meanwhile.
This can be marked as fixed. Please reopen if something else needs to be done.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•