Decommission bouncer in PHX1

RESOLVED FIXED

Status

Product: Infrastructure & Operations Graveyard
Component: WebOps: Product Delivery
Status: RESOLVED FIXED
Reported: 2 years ago
Last modified: 2 years ago

People

(Reporter: oremj, Assigned: w0ts0n)

Tracking

({spring-cleaning})

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1979] [vm-delete:11])

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Please wait until 10/22, if possible.

download.mozilla.org, bounceradmin.mozilla.com and sentry have been migrated to AWS (bug 1211734). Feel free to decommission those services on your side any time after 10/22.

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1979]

I'm specifically seeing active traffic on the bouncer dbs from the "download" user:

This set of statements clears out the "users who have connected" table, waits 10 seconds, then reports on all the non-monitoring users:

mysql> truncate performance_schema.users; do sleep(10); select * from performance_schema.users where user not in ('newrelic','repl','nagiosdaemon','collectd','root','checksum') order by user;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (10.00 sec)

+----------+---------------------+-------------------+
| USER     | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS |
+----------+---------------------+-------------------+
| download |                   4 |                 4 |
+----------+---------------------+-------------------+
1 row in set (0.00 sec)


This sets the mysql pager to grep for "download", so only the processlist rows containing "download" are shown. The number after "Sleep" is how long each connection has been idle in that state. All of these connections came from the load balancer and use the "bouncer" database.

mysql> \P grep download
PAGER set to 'grep download'
mysql> show full processlist;
| 2245925 | download    | 10.8.70.202:29476  | bouncer | Sleep       |     103 |                                                                             | NULL                  |
| 2245971 | download    | 10.8.70.202:36301  | bouncer | Sleep       |      59 |                                                                             | NULL                  |
| 2245988 | download    | 10.8.70.202:38891  | bouncer | Sleep       |      43 |                                                                             | NULL                  |
| 2246029 | download    | 10.8.70.202:44474  | bouncer | Sleep       |       6 |                                                                             | NULL                  |
15 rows in set (0.00 sec)
If desired I can get the actual commands being run, but I suspect that several connections in 10 seconds aren't from a cron job (unless it's running all the time?).
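A minimal sketch of one way to see the last statement issued on each of those connections, assuming performance_schema statement instrumentation is enabled on this server (this wasn't actually run here):

mysql> select t.processlist_id, t.processlist_host, s.sql_text
    -> from performance_schema.events_statements_current s
    -> join performance_schema.threads t using (thread_id)
    -> where t.processlist_user = 'download';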

Updated

2 years ago
Keywords: spring-cleaning
(Assignee)

Updated

2 years ago
Assignee: server-ops-webops → rwatson
Flags: needinfo?(oremj)
https://bugzilla.mozilla.org/show_bug.cgi?id=1216601

--- Comment #6 from Jeremy Orem [:oremj] <oremj@mozilla.com> 2015-10-23 12:55:58 EDT ---
As long as we have a backup of the db, it should be fine to decomm any time

-------------------

We do have backups, so I am allowing decom of bouncer dbs in phx1, specifically:

It is OK to shut down these servers and kill them with fire, following the https://mana.mozilla.org/wiki/display/SYSADMIN/Server+Decommissioning+Checklist (INCLUDING NAGIOS DOWNTIME! A sketch of scheduling that downtime follows the VIP list below.)
bouncer1.db.phx1.mozilla.com - aka 10.8.70.61 - https://inventory.mozilla.org/en-US/systems/show/11547/
bouncer2.db.phx1.mozilla.com - aka 10.8.70.62 - https://inventory.mozilla.org/en-US/systems/show/11448/
bouncer3.db.phx1.mozilla.com - aka 10.8.70.63 - https://inventory.mozilla.org/en-US/systems/show/11444/
bouncer4.db.phx1.mozilla.com - aka 10.8.70.64 - https://inventory.mozilla.org/en-US/systems/show/11320/

And the corresponding load balancer pools/VIPs:
db-bouncer-rw-pool
db-bouncer-ro-pool
db-bouncer-rw https://zlb1.ops.phx1.mozilla.com:9090/apps/zxtm/?name=db-bouncer-rw&section=Virtual%20Servers%3AEdit
db-bouncer-ro https://zlb1.ops.phx1.mozilla.com:9090/apps/zxtm/?name=db-bouncer-ro&section=Virtual%20Servers%3AEdit
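A minimal sketch of the Nagios downtime step called out above, using the standard SCHEDULE_HOST_DOWNTIME / SCHEDULE_HOST_SVC_DOWNTIME external commands; the command-file path, two-hour window, and author/comment fields below are assumptions, not values from this bug:

now=$(date +%s); end=$((now + 7200))
cmdfile=/var/spool/nagios/cmd/nagios.cmd   # assumed path; use the local install's command file
for h in bouncer1.db.phx1.mozilla.com bouncer2.db.phx1.mozilla.com \
         bouncer3.db.phx1.mozilla.com bouncer4.db.phx1.mozilla.com; do
  # fixed two-hour downtime for the host and for all of its services
  echo "[$now] SCHEDULE_HOST_DOWNTIME;$h;$now;$end;1;0;7200;webops;phx1 bouncer decom" >> "$cmdfile"
  echo "[$now] SCHEDULE_HOST_SVC_DOWNTIME;$h;$now;$end;1;0;7200;webops;phx1 bouncer decom" >> "$cmdfile"
done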
Flags: needinfo?(oremj)
Depends on: 1216601

Comment 4

2 years ago
@oremj: does this include the dev and stage instances of bouncer in PHX1 as well?
Flags: needinfo?(oremj)
(Reporter)

Comment 5

2 years ago
Yes, but they are now bouncer-bouncer.stage.mozaws.net and bouncer-bouncer-dev.stage.mozaws.net.
Flags: needinfo?(oremj)
So, the bouncer-dev and bouncer-stage dbs on the dev cluster that formerly lived in phx1 can be decom'd, then?
(Reporter)

Comment 7

2 years ago
That is okay with me.
OK, great. Sounds like we need a full list from webops of the web stuff and load balancers affected; then we can pass it on to the MOC for triage to decom.

Comment 9

2 years ago
While not a complete list, here are the VMs I'm decom'ing:

10.8.75.89  = bounceradm.private.phx1
10.8.70.61  = bouncer1.db.phx1
10.8.70.62  = bouncer2.db.phx1
10.8.70.63  = bouncer3.db.phx1
10.8.70.64  = bouncer4.db.phx1
10.8.81.150 = bouncer1.dev.webapp.phx1
10.8.81.151 = bouncer1.stage.webapp.phx1
10.8.81.152 = bouncer1.webapp.phx1
10.8.81.153 = bouncer2.webapp.phx1
10.8.81.154 = bouncer3.webapp.phx1
10.8.81.155 = bouncer4.webapp.phx1
10.8.81.156 = bouncer5.webapp.phx1
I probably missed zeus but I'm not looking that deeply right now.

No NFS, no netvault except for bounceradm.
nagios pulled in change 110460.  Powered off.

Comment 10

2 years ago
bounceradm was requested to come back from the dead and migrate.
I'm seeing:

        'bouncer01.zlb.phx.mozilla.net' => {
            parents => 'zlb9.ops.phx1.mozilla.com',
            hostgroups => [
                'zeus-vips',
                'bouncer-vip',
            ]
        },
        'tp-bouncer01-ro-zeus.phx.mozilla.com' => {
            parents => 'zlb1.ops.phx1.mozilla.com, zlb2.ops.phx1.mozilla.com',
            hostgroups => [
                'zeus-vips',
                'mysql-zeus-vips'
            ]
        },
        'tp-bouncer01-rw-zeus.phx.mozilla.com' => {
            parents => 'zlb1.ops.phx1.mozilla.com, zlb2.ops.phx1.mozilla.com',
            hostgroups => [
                'zeus-vips',
                'mysql-zeus-vips'
            ]
        },

I'm assuming these can be removed from nagios?
Yes, I can authorize the last 2 -
'tp-bouncer01-ro-zeus.phx.mozilla.com'
'tp-bouncer01-rw-zeus.phx.mozilla.com' 

Webops can verify 'bouncer01.zlb.phx.mozilla.net'
(Assignee)

Comment 13

2 years ago
Yup to: 
'bouncer01.zlb.phx.mozilla.net'

Traffic stopped on Oct 14th.
Bouncer entries removed. Thanks.
ccing pythian for knowledge.

Comment 16

2 years ago
comment 9 VMs:
Killed DNS, inventory, RHN, puppetdashboard.
Big puppet cleanup:
Sending        manifests/nodes/webapp.pp
Deleting       modules/webapp/manifests/bouncer/dev.pp
Deleting       modules/webapp/manifests/bouncer/stage.pp
Deleting       modules/webapp/files/bouncer-dev
Deleting       modules/webapp/files/bouncer-stage
Deleting       modules/webapp/templates/bouncer-dev
Deleting       modules/webapp/templates/bouncer-stage
Committed revision 111144.

Flows: cleared out changes for comment 6 and comment 12's zeus IPs (10.8.70.6[90]); will attach that since it's comprehensive.

VMs deleted from disk.  Spreadsheets updated.

Since this is in webops' hands:
* newrelic needs cleanup.
* Didn't touch zeus; I doubt it moved over to neo, so that's probably a no-op, but I didn't want to break anything there finding out.
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1979] → [kanban:https://webops.kanbanize.com/ctrl_board/2/1979] [vm-delete:11]

Comment 17

2 years ago
Created attachment 8699627 [details]
netops rules cleanup
(Assignee)

Comment 18

2 years ago
newrelic all checks out.
zlb = no-op indeed.

I think we are done here.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard