Closed
Bug 1013350
Opened 11 years ago
Closed 11 years ago
rename socorro-es[1-2].dev.webapp.phx1 to socorro-es[8-9].webapp.phx1 as they're actually prod
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dmaher, Assigned: rbryce)
References
Details
Hello,
The socorro-es[1-2].dev.webapp.phx1 nodes are actually production nodes; they're just named poorly. This is no bueno for a variety of reasons. I propose that they be pulled out of the cluster and renamed when we integrate the additional production nodes (bug 909884); they would therefore become socorro-es[8-9].webapp.phx1.
Comment 1•11 years ago
We just added the new socorro-es[4567] nodes to the cluster and they're currently getting shards moved over to them. This will probably need to sit for a few more days while this happens.
Once that's complete, we can drain these two nodes (move all shards off them), pull them out completely, rename them (even reformat them, if we care; it won't matter), and put them back in with the proper names.
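The bug does not say exactly how the nodes were drained. One common approach in Elasticsearch is the cluster-level shard allocation filter `cluster.routing.allocation.exclude._name` (there are also `._ip` and `._host` variants), set through `PUT /_cluster/settings`; the cluster then relocates shards off the excluded nodes. The sketch below only builds that request. The endpoint URL is hypothetical, and the node names are assumed to match each node's configured `node.name`, which may differ from its hostname.

```python
import json
import urllib.request

# Assumption: a hypothetical Elasticsearch HTTP endpoint reachable from this host.
ES_URL = "http://localhost:9200"


def drain_settings(node_names):
    """Build a cluster-settings body that excludes the given nodes from shard allocation.

    Assumes the names match each node's node.name setting (not necessarily its hostname).
    """
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


def build_request(node_names, es_url=ES_URL):
    """Return a PUT request for /_cluster/settings without sending it."""
    body = json.dumps(drain_settings(node_names)).encode("utf-8")
    return urllib.request.Request(
        es_url + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_request(
        ["socorro-es1.dev.webapp.phx1", "socorro-es2.dev.webapp.phx1"]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting, send it: urllib.request.urlopen(req)
```

Once the setting is applied, relocation progress can be watched with `GET /_cat/allocation?v` or `GET /_cat/shards`. A node is safe to stop when it holds no shards. Clearing the exclusion afterwards (setting it to an empty string) lets the renamed nodes receive shards again.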
Comment 2•11 years ago
socorro-es2.dev.webapp.phx1 is fully drained and removed from the cluster. Elasticsearch is stopped, and puppet is disabled for 99 days. You may proceed with renaming it at your leisure, either via reformat/reinstall or any other method that suits your fancy. :)
New name should be:
socorro-es8.webapp.phx1.mozilla.com
Same IP is fine.
Comment 3•11 years ago
socorro-es1.dev.webapp.phx1 is now ready for renaming as well.
Comment 4•11 years ago
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/hp_if_info.rb
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to parse template nagios/nagios-host.cfg.erb:
Filepath: /usr/lib/ruby/1.8/resolv.rb
Line: 93
Detail: no address for socorro-es1.dev.webapp.phx1.mozilla.com
at /etc/puppet/modules/nagios/manifests/hosts/phx1.pp:5378 on node nagios1.private.phx1.mozilla.com
Error: Cached catalog for nagios1.private.phx1.mozilla.com failed: Could not parse YAML data for catalog nagios1.private.phx1.mozilla.com: allocator undefined for Proc
[root@nagios1.private.phx1 pradcliffe]# host socorro-es1.dev.webapp.phx1.mozilla.com
Host socorro-es1.dev.webapp.phx1.mozilla.com not found: 3(NXDOMAIN)
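The puppet run above failed because the nagios host template resolves each configured hostname, and socorro-es1.dev.webapp.phx1.mozilla.com no longer had a DNS record. Here is a minimal sketch of a pre-check that flags such names before they reach the nagios config. The function name `unresolvable` is hypothetical, and it uses the system resolver, which may also consult /etc/hosts, rather than querying DNS directly the way `host` does.

```python
import socket


def unresolvable(hostnames):
    """Return the hostnames the system resolver cannot map to any address.

    Any resolution failure (NXDOMAIN, no network, etc.) surfaces in Python
    as socket.gaierror.
    """
    missing = []
    for name in hostnames:
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hostnames taken from this bug; adjust to whatever the nagios config lists.
    hosts = [
        "socorro-es1.dev.webapp.phx1.mozilla.com",
        "socorro-es8.webapp.phx1.mozilla.com",
    ]
    for name in unresolvable(hosts):
        print("no address for", name)
```

Running a check like this against the host list before committing would surface stale entries as a readable list instead of a template parse error on the nagios server.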
Comment 5•11 years ago
pir@wedge> svn ci -m "removing socorro-es[12].dev.webapp.phx1 because they've been taken out of DNS in bug 1013350"
Sending puppet/trunk/modules/nagios/manifests/hosts/phx1.pp
Sending puppet/trunk/modules/socorro/files/prod/etc/commanderconfig.py
Sending puppet/trunk/modules/socorro/manifests/search/prod.pp
Transmitting file data ...
Committed revision 88331.
Comment 6•11 years ago (Assignee)
Both servers were rekicked. We have a new kickstart (KS) profile that removes some packages we don't use and sets the recommended swap size for the amount of memory. Repuppetized; inventory records are up to date.
Generic and HP monitoring re-enabled. When you are ready, you can uncomment the socorro-elasticsearch monitoring in socorro-es[8-9]'s nagios config (puppet/trunk/modules/nagios/manifests/hosts/phx1.pp).
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 7•11 years ago
Something has gone very haywire here, on both nodes...
First off, they originally had only 2.0TB of storage (after formatting and mounting). Now they have 4.5TB?
Secondly...
[root@socorro-es9.webapp.phx1 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2       4.5T  1.6G  4.3T   1% /
tmpfs            21G     0   21G   0% /dev/shm
/dev/sda1       504M   66M  414M  14% /boot
/dev/sda1 is /boot
/dev/sdb1 is swap
/dev/sdb2 is /
What's going on here? Where'd the extra space come from, and why the change in layout?
Could someone double-check that this is all correct and things are the way they're supposed to be? I just want to be sure before we put these into service. :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•11 years ago (Assignee)
Reformatted /dev/sdb1 (ext4) on both socorro-es[8-9]. They are left unmounted pending your choice of mount point and method.
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       114G  1.4G  107G   2% /
tmpfs            21G     0   21G   0% /dev/shm
/dev/sda1       504M   66M  414M  14% /boot
/dev/sdb1       5.5G  140M  5.1G   3% /mnt/temp
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard