Closed Bug 1240039 Opened 9 years ago Closed 9 years ago

node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mlankford, Assigned: van)

Details

(Whiteboard: Case ID 4653266048)

6:38 AM <@nagios-scl3> Fri 06:38:58 PST [5236] node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0) (http://m.mozilla.org/HP+Log)
[root@node15.peach.metrics.scl3 ~]# hpssacli ctrl all show config

Smart Array P420i in Slot 0 (Embedded) (sn: 00143802518D400)

   Gen8 ServBP 12+2 at Port 1I, Box 1, OK

   array A (SATA, Unused Space: 0 MB)
      logicaldrive 1 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 3 TB, OK)

   array B (SATA, Unused Space: 0 MB)
      logicaldrive 2 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 3 TB, OK)

   array C (SATA, Unused Space: 0 MB)
      logicaldrive 3 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 3 TB, OK)

   array D (SATA, Unused Space: 0 MB)
      logicaldrive 4 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 3 TB, OK)

   array E (SATA, Unused Space: 0 MB)
      logicaldrive 5 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)

   array F (SATA, Unused Space: 0 MB)
      logicaldrive 6 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure) ****

   array G (SATA, Unused Space: 0 MB)
      logicaldrive 7 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 3 TB, OK)

   array H (SATA, Unused Space: 0 MB)
      logicaldrive 8 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 3 TB, OK)

   array I (SATA, Unused Space: 0 MB)
      logicaldrive 9 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA, 3 TB, OK)

   array J (SATA, Unused Space: 0 MB)
      logicaldrive 10 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SATA, 3 TB, OK)

   array K (SATA, Unused Space: 0 MB)
      logicaldrive 11 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SATA, 3 TB, OK)

   array L (SATA, Unused Space: 0 MB)
      logicaldrive 12 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SATA, 3 TB, OK)

   Enclosure SEP (Vendor ID HP, Model Gen8 ServBP 12+2) 378 (WWID: 5001438022DAAB39, Port: 1I, Box: 1)
   Expander 380 (WWID: 5001438022DAAB20, Port: 1I, Box: 1)
   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 379 (WWID: 500143802518D40F)
[root@node15.peach.metrics.scl3 ~]# /usr/lib64/nagios/plugins/custom/check_hplog -t l
WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)
[root@node15.peach.metrics.scl3 ~]# hpasmcli -s "clear iml;"
IML Log successfully cleared.
[root@node15.peach.metrics.scl3 ~]#
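For anyone who wants to pick the failing drive out of `hpssacli ctrl all show config` output without reading all twelve arrays by eye, here is a minimal sketch in Python. It assumes the `physicaldrive` lines have the shape shown in this bug; the function name `unhealthy_drives` and the `sample` text are illustrative only and not part of any HP tooling.

```python
import re

# Assumed line shape, taken from the output in this bug:
#   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)
DRIVE_RE = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+"
    r"\(port\s+(?P<port>[^:]+):box\s+(?P<box>\d+):bay\s+(?P<bay>\d+),\s*"
    r"(?P<iface>[^,]+),\s*(?P<size>[^,]+),\s*(?P<status>[^)]+)\)"
)


def unhealthy_drives(config_text):
    """Return (drive id, bay, status) for each physical drive whose status is not OK."""
    found = []
    for match in DRIVE_RE.finditer(config_text):
        status = match.group("status").strip()
        if status != "OK":
            found.append((match.group("id"), int(match.group("bay")), status))
    return found


# Hypothetical excerpt, trimmed from the controller output above.
sample = """
array E (SATA, Unused Space: 0 MB)
  logicaldrive 5 (2.7 TB, RAID 0, OK)
  physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)
array F (SATA, Unused Space: 0 MB)
  logicaldrive 6 (2.7 TB, RAID 0, OK)
  physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)
"""

print(unhealthy_drives(sample))
```

The sketch only inspects physical drive status. A logical drive can report a failure while its disk reports OK, so a fuller check would probably also parse the `logicaldrive` lines.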
Opened case #4653266048
Whiteboard: Case ID 4653266048
colo-trip: --- → scl3
pythian, RMA has arrived. please take this host out of rotation so i can rebuild it.
Assignee: server-ops-dcops → vle
acknowledged, we'll decommission node15 right away.
Node is currently being decommissioned, just pending HDFS block replication.
Hi,

The node has been decommissioned and set to maintenance mode. I corrected logical device 9 (array I), which was appearing as failed even though the disk was OK, and which was causing an I/O error. The failed disk is now unmounted and ready for replacement:

array F (SATA, Unused Space: 0 MB) /dev/sdf /data6
   logicaldrive 6 (2.7 TB, RAID 0, OK)
   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)

Regards,
Nicolas Parducci
Pythian SRE team
drive replaced and host rebuilt. will ship back RMA.

[vle@node15.peach.metrics.scl3 ~]$ df -hT | sort
cm_processes   tmpfs   32G     0   32G   0% /var/run/cloudera-scm-agent/process
/dev/md0       ext3   248M   76M  160M  32% /boot
/dev/md1       ext4    30G   16G   13G  56% /
/dev/sda4      ext4   2.7T  2.0T  532G  80% /data1
/dev/sdb4      ext4   2.7T  2.0T  579G  78% /data2
/dev/sdc1      ext4   2.7T  2.0T  619G  77% /data3
/dev/sdd1      ext4   2.7T  2.0T  574G  79% /data4
/dev/sde1      ext4   2.7T  2.0T  620G  77% /data5
/dev/sdf1      ext4   2.7T   73M  2.6T   1% /data6
/dev/sdg1      ext4   2.7T  2.0T  619G  77% /data7
/dev/sdh1      ext4   2.7T  2.0T  606G  77% /data8
/dev/sdi1      ext4   2.7T  2.2T  391G  86% /data9
/dev/sdj1      ext4   2.7T  2.0T  597G  78% /data10
/dev/sdk1      ext4   2.7T  2.0T  586G  78% /data11
/dev/sdl1      ext4   2.7T  2.0T  611G  77% /data12
Filesystem     Type   Size  Used Avail Use% Mounted on
tmpfs          tmpfs   32G     0   32G   0% /dev/shm
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Exited maintenance mode, all roles started on node. Node is in rotation. Thanks