Closed Bug 1013350 Opened 11 years ago Closed 11 years ago

rename socorro-es[1-2].dev.webapp.phx1 to socorro-es[8-9].webapp.phx1 as they're actually prod

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dmaher, Assigned: rbryce)

Hello, the socorro-es[1-2].dev.webapp.phx1 nodes are actually production nodes; they're just named poorly. This is no bueno for a variety of reasons. I propose that they be pulled out of the cluster and renamed when we integrate the additional production nodes (bug 909884); they would therefore become socorro-es[8-9].webapp.phx1.
We just added the new socorro-es[4567] nodes to the cluster and they're currently getting shards moved over to them. This will probably need to sit for a few more days while that happens. Once it's complete, we can drain these two nodes (remove all shards from them), pull them out completely, rename them (reformat them even, if we care; it won't matter), and put them back in with the proper names.
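The drain step described above is commonly done with Elasticsearch's shard allocation filtering. The following is only a minimal sketch, not what was actually run for this bug. It assumes the cluster honours the `cluster.routing.allocation.exclude._name` transient setting and that the Elasticsearch node names match the hostnames, neither of which is confirmed here. The helper name `build_drain_settings` is hypothetical. It only builds the request body for `PUT /_cluster/settings` and sends nothing.

```python
import json


def build_drain_settings(node_names):
    """Return a request body that asks the cluster to move shards off the given nodes.

    Assumption: nodes are matched by Elasticsearch node name via the
    `_name` attribute; the real node names are not stated in this bug.
    """
    if not node_names:
        raise ValueError("at least one node name is required")
    return {
        "transient": {
            # Comma-separated list of nodes that should no longer hold shards.
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


if __name__ == "__main__":
    # Hostnames taken from this bug; using them as node names is an assumption.
    body = build_drain_settings(
        ["socorro-es1.dev.webapp.phx1", "socorro-es2.dev.webapp.phx1"]
    )
    print(json.dumps(body, indent=2))
```

Once the excluded nodes hold no shards, they can be stopped and removed without the cluster losing data, which matches the order of operations proposed above.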
socorro-es2.dev.webapp.phx1 is fully drained and removed from the cluster. Elasticsearch is stopped, and puppet is disabled for 99 days. You may proceed with renaming it at your leisure, either via reformat/reinstall or any other method that suits your fancy. :)
New name should be: socorro-es8.webapp.phx1.mozilla.com
Same IP is fine.
socorro-es1.dev.webapp.phx1 is now ready for renaming as well.
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hp_if_info.rb
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to parse template nagios/nagios-host.cfg.erb: Filepath: /usr/lib/ruby/1.8/resolv.rb Line: 93 Detail: no address for socorro-es1.dev.webapp.phx1.mozilla.com at /etc/puppet/modules/nagios/manifests/hosts/phx1.pp:5378 on node nagios1.private.phx1.mozilla.com
Error: Cached catalog for nagios1.private.phx1.mozilla.com failed: Could not parse YAML data for catalog nagios1.private.phx1.mozilla.com: allocator undefined for Proc
[root@nagios1.private.phx1 pradcliffe]# host socorro-es1.dev.webapp.phx1.mozilla.com
Host socorro-es1.dev.webapp.phx1.mozilla.com not found: 3(NXDOMAIN)
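The failure above comes from the Nagios host template looking up a hostname that no longer exists in DNS. A check along the following lines could flag such hosts before a puppet run. This is a minimal sketch only: the helper `unresolvable_hosts` is hypothetical and is not part of the puppet or nagios modules mentioned in this bug. The resolver is passed in as a parameter so the logic can be exercised without touching real DNS.

```python
import socket


def unresolvable_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that fail to resolve, preserving input order."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (raised for NXDOMAIN and similar) subclasses OSError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hostnames taken from this bug; the result depends on live DNS.
    hosts = [
        "socorro-es1.dev.webapp.phx1.mozilla.com",
        "socorro-es8.webapp.phx1.mozilla.com",
    ]
    for host in unresolvable_hosts(hosts):
        print("no address for", host)
```

Any hostname this reports would need to be removed from the monitoring manifests, or restored in DNS, before the catalog can compile.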
pir@wedge> svn ci -m "removing socorro-es[12].dev.webapp.phx1 because they've been taken out of DNS in bug 1013350"
Sending puppet/trunk/modules/nagios/manifests/hosts/phx1.pp
Sending puppet/trunk/modules/socorro/files/prod/etc/commanderconfig.py
Sending puppet/trunk/modules/socorro/manifests/search/prod.pp
Transmitting file data ...
Committed revision 88331.
Both servers were rekicked. We have a new KS profile that removes some packages we don't use and sets the recommended swap size for the memory. Both are repuppetized, and inventory records are up to date. Generic and HP monitoring are re-enabled. When you are ready, you can uncomment the entries in socorro-es[8-9]'s nagios config to add socorro-elasticsearch monitoring. {puppet/trunk/modules/nagios/manifests/hosts/phx1.pp}
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 1018890
Something has gone very haywire here, on both nodes... First off, they originally had only 2.0TB of storage (after formatting and mounting). Now they have 4.5TB? Secondly...
[root@socorro-es9.webapp.phx1 ~]# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb2   4.5T  1.6G  4.3T   1%    /
tmpfs       21G   0     21G    0%    /dev/shm
/dev/sda1   504M  66M   414M   14%   /boot
/dev/sda1 is /boot
/dev/sdb1 is swap
/dev/sdb2 is /
What's going on here? Where'd the extra space come from, and why the change in layout? Could someone double-check that this is all correct and things are the way they're supposed to be? I just want to be sure before we put these into service. :)
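A layout check like the one requested above can be partly automated by parsing the `df -h` output and comparing the device behind each mount point on both nodes. The sketch below is illustrative only: `parse_df` is a hypothetical helper, the sample text is the output quoted in this comment, and mount points containing spaces are not handled.

```python
def parse_df(text):
    """Parse `df -h` output into {mount_point: (filesystem, size)}."""
    mounts = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore wrapped or malformed rows
        filesystem, size, mount_point = fields[0], fields[1], fields[5]
        mounts[mount_point] = (filesystem, size)
    return mounts


# Output quoted from socorro-es9.webapp.phx1 in this bug.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sdb2 4.5T 1.6G 4.3T 1% /
tmpfs 21G 0 21G 0% /dev/shm
/dev/sda1 504M 66M 414M 14% /boot
"""

if __name__ == "__main__":
    for mount_point, (filesystem, size) in parse_df(SAMPLE).items():
        print(mount_point, filesystem, size)
```

Running the same parse on each node makes it easy to spot a root filesystem that has landed on an unexpected device.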
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reformatted /dev/sdb1 (ext4) on both socorro-es[8-9]. They are left unmounted so you can choose the mount point and method.
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda3   114G  1.4G  107G   2%    /
tmpfs       21G   0     21G    0%    /dev/shm
/dev/sda1   504M  66M   414M   14%   /boot
/dev/sdb1   5.5G  140M  5.1G   3%    /mnt/temp
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard