Decom the TBPL DBs (tbpl1.db.phx1.mozilla.com, tbpl2.db.phx1.mozilla.com, tbpl2-new.db.phx1.mozilla.com)

RESOLVED FIXED

Status

Infrastructure & Operations
MOC: Service Requests
3 years ago
3 years ago

People

(Reporter: emorley, Assigned: Usul)

Tracking

({spring-cleaning})

Details

(Reporter)

Description

3 years ago
TBPL has just been end of lifed (all content is being redirected, data import cron job stopped etc).

This bug is _just_ for decommissioning the DB nodes used by TBPL - other bugs (which will be added to bug 1054977's dep tree soon) will handle the moving of redirects to Zeus (they are currently served from the app root via generic), deletion of src/www directories from generic, removing of flows etc etc.

Sheeri, should this be in the DB related components or Infrastructure & Operations? Not sure what DB monitoring pieces need to be turned off by your team prior to hardware decom.

I'm presuming for this bug we'll need to:
1) Disable Nagios/...
2) Remove puppet entries
3) Power down/decom the hardware

The nodes in question are:
tbpl1.db.phx1.mozilla.com
tbpl2.db.phx1.mozilla.com
tbpl2-new.db.phx1.mozilla.com
(https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl&filterBy=Type:Database)

No data needs to be retained.
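
For tracking purposes, the presumed steps and the three nodes above could be kept in a small checklist. This is only a rough sketch: the step labels and the `DecomChecklist` helper are made up for illustration and are not part of any existing tooling.

```python
from dataclasses import dataclass, field

# Hostnames copied from the list above.
NODES = [
    "tbpl1.db.phx1.mozilla.com",
    "tbpl2.db.phx1.mozilla.com",
    "tbpl2-new.db.phx1.mozilla.com",
]

# The three presumed steps; the labels are illustrative, not official.
STEPS = [
    "disable monitoring",
    "remove puppet entries",
    "power down / decom hardware",
]


@dataclass
class DecomChecklist:
    """Hypothetical helper that records which (node, step) pairs are done."""

    done: set = field(default_factory=set)

    def mark(self, node: str, step: str) -> None:
        if node not in NODES or step not in STEPS:
            raise ValueError(f"unknown node or step: {node!r}, {step!r}")
        self.done.add((node, step))

    def remaining(self) -> list:
        # Ordered by step, then by node, so the report reads top to bottom.
        return [(n, s) for s in STEPS for n in NODES if (n, s) not in self.done]


checklist = DecomChecklist()
checklist.mark("tbpl1.db.phx1.mozilla.com", "disable monitoring")
for node, step in checklist.remaining():
    print(f"TODO {step}: {node}")
```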
Assignee: infra → nobody
Component: Infrastructure: Other → MOC: Service Requests
QA Contact: jdow → lypulong
(Assignee)

Updated

3 years ago
Assignee: nobody → ludovic

Updated

3 years ago
Keywords: spring-cleaning
(Reporter)

Comment 1

3 years ago
(In reply to Ed Morley (Away 23rd March -> 1st April) [:edmorley] from comment #0)
> I'm presuming for this bug we'll need to:
> 1) Disable Nagios/...

Oh and I guess we'll also need to disable the automated data expiration job added by bug 779290.
Yep! We have a protocol for all of this:

https://mana.mozilla.org/wiki/display/SYSADMIN/Server+Decommissioning+Checklist

This makes me so happy!
We can also remove the db from the shared dev instance, right?


tbpl1.db.phx1.mozilla.com aka 10.8.70.53 (warranty expired; known, on purpose)
tbpl2.db.phx1.mozilla.com aka 10.8.70.54 (warranty expired; known, on purpose)
tbpl2-new.db.phx1.mozilla.com aka 10.8.70.161 (warranty covered until 9/2015)


It is ok to power down these 3 servers!
Removed from Nagios in revision 103068.

Removed from puppet db stuff (includes newrelic, hiera, cron scripts/auto purge, configuration management and backups) in svn revision 103069.
Ran puppet on the newrelic proxy to remove tbpl from active newrelic instances.

Restarted Nagios to ensure all changes made in the previous step were kosher (they were!).
netvault isn't running (/etc/init.d/netvault isn't present)

None of the machines have NFS/external mounts

Puppet has been delayed for a year on all 3 servers.
 sudo shutdown -h now 

has been performed on all three machines.
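
For anyone repeating this on several hosts, the per-host command could be generated rather than typed three times. A minimal sketch, assuming ssh access and sudo rights on each host (neither is stated in this bug); it only prints the commands and does not run them.

```python
import shlex

# Hostnames from this bug.
HOSTS = [
    "tbpl1.db.phx1.mozilla.com",
    "tbpl2.db.phx1.mozilla.com",
    "tbpl2-new.db.phx1.mozilla.com",
]


def shutdown_command(host: str) -> list:
    """Return the argv that would halt one host over ssh (assumed access path)."""
    return ["ssh", host, "sudo", "shutdown", "-h", "now"]


for host in HOSTS:
    # Dry run: print only. To execute for real, pass the list to
    # subprocess.run(..., check=True) instead of printing it.
    print(shlex.join(shutdown_command(host)))
```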
I will continue with the steps tomorrow.
(Reporter)

Comment 9

3 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> We can also remove the db from the shared dev instance, right?

Yup - everything TBPL related can go :-)
w00t! tbpl dev db dropped, 40G of data GONE BABY GONE.
(Reporter)

Comment 11

3 years ago
I've removed the now inactive TBPL DB entries for the MySQL plugin:
https://rpm.newrelic.com/accounts/263620/plugins/2805?utf8=%E2%9C%93&id=2805&search[name]=tbpl

And also on the servers page:
https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl

(In reply to Sheeri Cabral [:sheeri] from comment #2)
> https://mana.mozilla.org/wiki/display/SYSADMIN/
> Server+Decommissioning+Checklist

Is there any way I can be given access to view that page? I get a "page not found" for it at the moment, and it would have saved me a bunch of head-scratching over which bugs I had to file. I totally get that some pages need to be private, but the permissions are too strict IMO for many pages under SYSADMIN at the moment :-(
(Reporter)

Updated

3 years ago
Blocks: 1152225
I'm not sure about permissions on that page, it's not really "mine" to change permissions on. I could export it and mail you a PDF of the page, but that doesn't really solve the problem that someone *like* you doesn't know what to do.

In general, if you have questions about IT stuff, #it is a great place to ask, too. I wish I had a better answer :(

Usually with decoms, everyone is in the loop and you could just say "OK now that we no longer need it, how do we decom this service?" Kind of like how you say "we have this new service, how do we get it into production?"
DNS records for tbpl dbs and the VIPs for tbpl dbs are deleted.

Deleted the Zeus LB pool and virtual server entries for tbpl-ro and tbpl-rw virtual IPs.


cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves to 10.2.74.89 - is that even still valid?
Flags: needinfo?(smani)
Changed status in inventory (2 machines are decom'd, 1 is a spare)

Removed networking from key/value pairs in inventory so DHCP entries are removed.

Deleted RHN profiles for these db machines.
Removed from puppet dashboard.

Removed dbs from puppet.
:fox2mike I can't needinfo you twice, but I'm noticing a lot of tbpl entries still in puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and puppet/trunk/manifests/nodes/elasticsearch.pp

Can your team (webops) double-check and make sure only the puppet entries we *need* are still in there?
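
A quick way to list the leftover references is to scan the manifests for the string "tbpl". The sketch below assumes a local checkout and that manifests end in `.pp`; `find_references` is a hypothetical helper, not an existing tool.

```python
from pathlib import Path


def find_references(root, needle="tbpl", suffix=".pp"):
    """Yield (path, line_number, line) for each line under root mentioning needle."""
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive match so "TBPL" in comments is caught too.
            if needle.lower() in line.lower():
                yield path, number, line.strip()


if __name__ == "__main__":
    # Path taken from the comment above; adjust to wherever the checkout lives.
    for path, number, line in find_references("puppet/trunk/manifests"):
        print(f"{path}:{number}: {line}")
```

Each hit still needs a human to decide whether the entry is one we *need* or just residue.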
Created bug 1152345 for de-racking.

All work except the 2 items in comment 13 and comment 16 is complete. Once those 2 questions are addressed, this bug can be resolved.
(Assignee)

Comment 18

3 years ago
I assigned the bug to myself to do it. Though I got sick, so you guys beat me to it.
Ludo - no worries! I jumped on this because tbpl db's have always been a problem, so it gave me SUCH JOY to kill 'em. :D

Glad you're feeling better!
(Reporter)

Comment 20

3 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #12)
> Usually with decoms, everyone is in the loop and you could just say "OK now
> that we no longer need it, how do we decom this service?" Kind of like how
> you say "we have this new service, how do we get it into production?"

Makes sense, thank you. Just wanted to avoid being lazy :-)

Updated

3 years ago
No longer blocks: 1152225
See Also: → bug 1152225
(In reply to Sheeri Cabral [:sheeri] from comment #13)

> cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves
> to 10.2.74.89 - is that even still valid?

Seems like someone has already deleted this. I've removed the PTR record as well and no, sjc1 is gone and 10.2 isn't valid. 

(In reply to Sheeri Cabral [:sheeri] from comment #16)
> :fox2mike I can't needinfo you twice, but I'm noticing a lot of tbpl entries
> still in puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and
> puppet/trunk/manifests/nodes/elasticsearch.pp

A needinfo set just a day ago and your statement above is why there's a disable needinfo option in Bugzilla now :) You need to give people some time to actually read and respond to bugs.
 
> Can your team (webops) double-check and make sure only the puppet entries we
> *need* are still in there?

Filed Bug 1153091 for netops to remove tbpl from smokeping configs
Filed Bug 1153092 for webops to figure out ES changes 

Everything else is fine, I cleaned out the following files (they had the Zeus VIP in there) :

modules/webapp/manifests/admin/genericrhel6.pp
modules/webapp/manifests/genericrhel6/prod.pp

And deleted the following file :

modules/webapp/files/genericrhel6-dev/etc-httpd/domains/tbpl-passwd

in sysadmins r103364.
Depends on: 1153091, 1153092
Flags: needinfo?(smani)
(Reporter)

Comment 22

3 years ago
Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS resolution error (or equivalent) - expected?
(Reporter)

Comment 23

3 years ago
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> resolution error (or equivalent) - expected?

Sorry meant to comment on bug 1152225.
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #23)
> (In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> > Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> > resolution error (or equivalent) - expected?
> 
> Sorry meant to comment on bug 1152225.

Replied there, but for history's sake on this bug: yes, *.allizom.org will resolve to an IP in DNS. For example :

shyam@katniss ~ $ host foobar.allizom.org
foobar.allizom.org has address 63.245.217.83

shyam@katniss ~ $ host testing.allizom.org
testing.allizom.org has address 63.245.217.83
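
The same check can be scripted: if a random label that nobody would have registered still resolves, the zone most likely has a wildcard record. A rough sketch only; `has_wildcard_dns` is a hypothetical helper, and the injectable `resolve` argument exists just so it can be exercised without network access.

```python
import socket
import uuid


def has_wildcard_dns(domain, resolve=socket.gethostbyname):
    """Return True if a random, almost certainly unregistered label resolves."""
    probe = f"{uuid.uuid4().hex}.{domain}"
    try:
        resolve(probe)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    # Needs working DNS; the result depends on the zone's current configuration.
    print(has_wildcard_dns("allizom.org"))
```

This explains why a removed site under such a zone shows an HTTP error from the web server rather than a DNS resolution error.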
Resolving, as there are bugs for the open issues, but the dbs themselves are decom'd.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED