TBPL has just been end-of-lifed (all content is being redirected, the data import cron job has been stopped, etc). This bug is _just_ for decommissioning the DB nodes used by TBPL - other bugs (which will be added to bug 1054977's dep tree soon) will handle moving the redirects to Zeus (they are currently served from the app root via generic), deleting the src/www directories from generic, removing the flows, etc.

Sheeri, should this be in the DB-related components or Infrastructure & Operations? I'm not sure what DB monitoring pieces need to be turned off by your team prior to hardware decom.

I'm presuming for this bug we'll need to:
1) Disable Nagios/...
2) Remove puppet entries
3) Power down/decom the hardware

The nodes in question are:
tbpl1.db.phx1.mozilla.com
tbpl2.db.phx1.mozilla.com
tbpl2-new.db.phx1.mozilla.com
(https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl&filterBy=Type:Database)

No data needs to be retained.
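For the record, a rough sketch of the per-host plan above (host names are from this comment; the three steps are just `echo` placeholders, since the actual Nagios/puppet tooling is site-specific):

```shell
# The three TBPL DB nodes named in this bug.
tbpl_hosts="tbpl1.db.phx1.mozilla.com tbpl2.db.phx1.mozilla.com tbpl2-new.db.phx1.mozilla.com"

# Print the decom plan for each host. The steps are placeholders only --
# real commands depend on local Nagios/puppet tooling.
decom_plan() {
  for h in $tbpl_hosts; do
    echo "1) disable Nagios checks for $h"
    echo "2) remove puppet entries for $h"
    echo "3) power down/decom $h"
  done
}
```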
Assignee: infra → nobody
Component: Infrastructure: Other → MOC: Service Requests
QA Contact: jdow → lypulong
(In reply to Ed Morley (Away 23rd March -> 1st April) [:edmorley] from comment #0)
> I'm presuming for this bug we'll need to:
> 1) Disable Nagios/...

Oh, and I guess we'll also need to disable the automated data expiration job added by bug 779290.
Yep! We have a protocol for all of this: https://mana.mozilla.org/wiki/display/SYSADMIN/Server+Decommissioning+Checklist

This makes me so happy!
We can also remove the db from the shared dev instance, right?

tbpl1.db.phx1.mozilla.com aka 10.8.70.53 (warranty expired; known, on purpose)
tbpl2.db.phx1.mozilla.com aka 10.8.70.54 (warranty expired; known, on purpose)
tbpl2-new.db.phx1.mozilla.com aka 10.8.70.161 (warranty covered until 9/2015)

It is OK to power down these 3 servers!
Removed from Nagios in revision 103068.

Removed from the puppet DB manifests (including newrelic, hiera, cron scripts/auto-purge, configuration management, and backups) in svn revision 103069.
Ran puppet on the newrelic proxy to remove tbpl from the active newrelic instances. Restarted Nagios to ensure all the changes made in the previous step were kosher (they were!).
netvault isn't running (/etc/init.d/netvault isn't present).
None of the machines have NFS/external mounts.
Puppet has been delayed for a year on all 3 servers.
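The first two checks above can be sketched as a small helper. This is not the exact procedure used here, just an illustration; the arguments let it be dry-run tested, and with no arguments it would inspect the live system:

```shell
# Pre-shutdown safety checks: netvault must be absent and no NFS mounts
# may remain. root and nfs_mounts default to the live system but can be
# overridden for a dry run (hypothetical helper, not the real tooling).
check_safe_to_halt() {
  root="${1:-/}"                                  # filesystem root to inspect
  nfs_mounts="${2-$(mount -t nfs 2>/dev/null)}"   # output of `mount -t nfs`
  if [ -e "$root/etc/init.d/netvault" ]; then
    echo "netvault still present"
    return 1
  fi
  if [ -n "$nfs_mounts" ]; then
    echo "NFS/external mounts still active"
    return 1
  fi
  echo "ok to shut down"
}
```

Once it prints "ok to shut down" on a node, `sudo shutdown -h now` is safe from a backups/mounts point of view.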
sudo shutdown -h now has been performed on all three machines.
I will continue with the steps tomorrow.
(In reply to Sheeri Cabral [:sheeri] from comment #3) > We can also remove the db from the shared dev instance, right? Yup - everything TBPL related can go :-)
w00t! tbpl dev db dropped, 40G of data GONE BABY GONE.
I've removed the now-inactive TBPL DB entries for the MySQL plugin:
https://rpm.newrelic.com/accounts/263620/plugins/2805?utf8=%E2%9C%93&id=2805&search[name]=tbpl

And also on the servers page:
https://rpm.newrelic.com/accounts/263620/servers?filterBy=Project:Tbpl

(In reply to Sheeri Cabral [:sheeri] from comment #2)
> https://mana.mozilla.org/wiki/display/SYSADMIN/
> Server+Decommissioning+Checklist

Is there any way I can be given access to view that page? It would have saved a bunch of head-scratching on my part figuring out which bugs I had to file - and I get a "page not found" for it at the moment. I totally get that some pages need to be private, but IMO the permissions are too strict for many pages under SYSADMIN at the moment :-(
I'm not sure about the permissions on that page; it's not really "mine" to change permissions on. I could export it and mail you a PDF, but that doesn't really solve the problem that someone *like* you doesn't know what to do. In general, if you have questions about IT stuff, #it is a great place to ask, too.

I wish I had a better answer :( Usually with decoms everyone is in the loop, so you could just say "OK, now that we no longer need it, how do we decom this service?" - kind of like how you'd say "we have this new service, how do we get it into production?"
DNS records for the tbpl dbs and the VIPs for the tbpl dbs are deleted. Deleted the Zeus LB pool and virtual server entries for the tbpl-ro and tbpl-rw virtual IPs.

cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves to 10.2.74.89 - is that even still valid?
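One way to spot leftovers like dm-tbpl01 is to loop `host` over the names and flag anything that still resolves. A sketch (the `check_gone` helper is hypothetical; it takes the lookup's exit status as an argument so it can be tested without network access):

```shell
# Report whether a DNS name was successfully cleaned up, given the name
# and the exit status of a lookup (0 = the name still resolved).
check_gone() {
  name="$1"
  lookup_rc="$2"
  if [ "$lookup_rc" -eq 0 ]; then
    echo "$name still resolves"
  else
    echo "$name gone"
  fi
}

# Real usage (requires DNS access):
#   host tbpl1.db.phx1.mozilla.com >/dev/null 2>&1
#   check_gone tbpl1.db.phx1.mozilla.com $?
```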
Changed status in inventory (2 machines are decom'd, 1 is a spare).
Removed networking from the key/value pairs in inventory so the DHCP entries are removed.
Deleted the RHN profiles for these db machines.
Removed from puppet dashboard. Removed dbs from puppet.
:fox2mike - I can't needinfo you twice, but I'm noticing a lot of tbpl entries still in the puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and puppet/trunk/manifests/nodes/elasticsearch.pp.

Can your team (webops) double-check and make sure only the puppet entries we *need* are still in there?
Created bug 1152345 for de-racking. All work except the 2 items in comment 13 and comment 16 is complete. Once those 2 questions are addressed, this bug can be resolved.
I had assigned the bug to myself to do it, but then I got sick, so you guys beat me to it.
Ludo - no worries! I jumped on this because tbpl db's have always been a problem, so it gave me SUCH JOY to kill 'em. :D Glad you're feeling better!
(In reply to Sheeri Cabral [:sheeri] from comment #12) > Usually with decoms, everyone is in the loop and you could just say "OK now > that we no longer need it, how do we decom this service?" Kind of like how > you say "we have this new service, how do we get it into production?" Makes sense, thank you. Just wanted to avoid being lazy :-)
(In reply to Sheeri Cabral [:sheeri] from comment #13)
> cc Shyam: I noticed that dm-tbpl01.mozilla.org is still in DNS and resolves
> to 10.2.74.89 - is that even still valid?

Seems like someone has already deleted this. I've removed the PTR record as well, and no, sjc1 is gone and 10.2 isn't valid.

(In reply to Sheeri Cabral [:sheeri] from comment #16)
> :fox2mike I can't needinfo you twice, but I'm noticing a lot of tbpl entries
> still in puppet manifests, like puppet/trunk/manifests/nodes/netops.pp and
> puppet/trunk/manifests/nodes/elasticsearch.pp

A needinfo set just a day ago plus your statement above is why there's a disable-needinfo option in Bugzilla now :) You need to give people some time to actually read and respond to bugs.

> Can your team (webops) double-check and make sure only the puppet entries we
> *need* are still in there?

Filed bug 1153091 for netops to remove tbpl from the smokeping configs.
Filed bug 1153092 for webops to figure out the ES changes.

Everything else is fine. I cleaned out the following files (they had the Zeus VIP in them):
modules/webapp/manifests/admin/genericrhel6.pp
modules/webapp/manifests/genericrhel6/prod.pp

And deleted the following file:
modules/webapp/files/genericrhel6-dev/etc-httpd/domains/tbpl-passwd

in sysadmins r103364.
Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS resolution error (or equivalent) - expected?
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> resolution error (or equivalent) - expected?

Sorry - meant to comment on bug 1152225.
(In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #23)
> (In reply to Ed Morley [:emorley] (formerly :edmorley) from comment #22)
> > Visiting https://tbpl-dev.allizom.org gives a 403 rather than a DNS
> > resolution error (or equivalent) - expected?
>
> Sorry meant to comment on bug 1152225.

Replied there, but for history's sake on this bug: yes, *.allizom.org will resolve to an IP in DNS. For example:

shyam@katniss ~ $ host foobar.allizom.org
foobar.allizom.org has address 188.8.131.52
shyam@katniss ~ $ host testing.allizom.org
testing.allizom.org has address 184.108.40.206
Resolving, as there are bugs for the open issues, but the dbs themselves are decom'd.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED