Closed Bug 880313 Opened 12 years ago Closed 12 years ago

Re-image elasticsearch6.metrics.scl3.mozilla.com with usual ES partition config

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: rbryce)

Details

(Whiteboard: IX Ticket ZNS-162408)

Following up from bug 848508 and bug 868518.
The elasticsearch[4578] machines all have similar disk partitioning, for example:
[mreid@elasticsearch4.metrics.scl3 ~]$ mount
/dev/sda2 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/sda3 on /data01 type ext4 (rw)
/dev/sdb1 on /data02 type ext4 (rw)
/dev/sdc1 on /data03 type ext4 (rw)
/dev/sdd1 on /data04 type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
elasticsearch6 currently has a different setup which appears to only use some of the disks (sda and sdd):
[mreid@elasticsearch6.metrics.scl3 ~]$ mount
/dev/sdd1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
Can elasticsearch6 be re-imaged with the same partitions as the other machines in the cluster?
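The comparison in this comment can be sketched as a small check that parses `mount` output and reports which expected mount points are absent. This is only an illustration: the names `EXPECTED_MOUNTS`, `parse_mounts`, and `missing_mounts` are hypothetical and not part of any tooling mentioned in this bug, and the expected list is an assumption taken from the elasticsearch4 output quoted above.

```python
# Hypothetical helper; names and the expected list are illustrative assumptions
# based on the elasticsearch4 layout quoted in this bug.
EXPECTED_MOUNTS = ("/", "/boot", "/data01", "/data02", "/data03", "/data04")


def parse_mounts(mount_output):
    """Map mount point -> device from lines shaped like
    '<device> on <mountpoint> type <fstype> (<options>)'."""
    mounts = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Skip prompts and anything that does not match the expected shape.
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            mounts[parts[2]] = parts[0]
    return mounts


def missing_mounts(mount_output, expected=EXPECTED_MOUNTS):
    """Return the expected mount points that do not appear in the output."""
    mounts = parse_mounts(mount_output)
    return [point for point in expected if point not in mounts]
```

Feeding it the elasticsearch6 output from this comment would flag the four /dataNN mount points as missing, while the elasticsearch4 output would yield an empty list.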
Assignee: server-ops → rbryce
Reimaged with the proper partitions.
[root@elasticsearch6 ~]# mount
/dev/sda2 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/sda5 on /data01 type ext4 (rw)
/dev/sdb1 on /data02 type ext4 (rw)
/dev/sdc1 on /data03 type ext4 (rw)
/dev/sdd1 on /data04 type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Seems like we lost this node again, iX and since you had some issues the other day, I didn't poke much:
[07:02:00] < nagios-scl3> | (IRC) Sun 07:01:59 PDT [556] elasticsearch6.metrics.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like we have a bad hard drive. Filing for a replacement with IX. Will update with IX ticket details shortly.
/dev/sdd is FAILED. IX ticket ZNS-162408 for replacement.
Whiteboard: IX Ticket ZNS-162408
The drives in this server are out of warranty. IX systems is sending a quote for a new drive today. Do we want to procure extra drives? I'm sure most of the drives in elasticsearch1-6 are in the same boat.
Haven't heard back about the drive quote. :dre please see comment 5. Another option is to see if we can extend the warranty on the servers with IX systems, or look to migrate these nodes to newer infrastructure.
Flags: needinfo?(deinspanjer)
Rick, I'm sorry, yes, we should pick up a small quantity of drives so we can replace others as they fail. Maybe 4 or 5?
Flags: needinfo?(deinspanjer)
New drives should arrive on 7-18-13.
Created a new KS profile for these nodes. It's called elasticsearch under the metrics pxe menu. Hard drive replaced, rekickstarted, and puppetized.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard