node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)

Status: RESOLVED FIXED
Reported: 3 years ago
Last updated: 3 years ago
People: (Reporter: mlankford, Assigned: van)
Whiteboard: Case ID 4653266048

(Reporter)

Description

3 years ago
6:38 AM <@nagios-scl3> Fri 06:38:58 PST [5236] node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0) (http://m.mozilla.org/HP+Log)
(Reporter)

Comment 1

3 years ago
[root@node15.peach.metrics.scl3 ~]# hpssacli ctrl all show config

Smart Array P420i in Slot 0 (Embedded)    (sn: 00143802518D400)


   Gen8 ServBP 12+2 at Port 1I, Box 1, OK
   array A (SATA, Unused Space: 0  MB)


      logicaldrive 1 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 3 TB, OK)

   array B (SATA, Unused Space: 0  MB)


      logicaldrive 2 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 3 TB, OK)

   array C (SATA, Unused Space: 0  MB)


      logicaldrive 3 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 3 TB, OK)

   array D (SATA, Unused Space: 0  MB)


      logicaldrive 4 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 3 TB, OK)

   array E (SATA, Unused Space: 0  MB)


      logicaldrive 5 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)

   array F (SATA, Unused Space: 0  MB)


      logicaldrive 6 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)  ****

   array G (SATA, Unused Space: 0  MB)


      logicaldrive 7 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 3 TB, OK)

   array H (SATA, Unused Space: 0  MB)


      logicaldrive 8 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 3 TB, OK)

   array I (SATA, Unused Space: 0  MB)


      logicaldrive 9 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA, 3 TB, OK)

   array J (SATA, Unused Space: 0  MB)


      logicaldrive 10 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SATA, 3 TB, OK)

   array K (SATA, Unused Space: 0  MB)


      logicaldrive 11 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SATA, 3 TB, OK)

   array L (SATA, Unused Space: 0  MB)


      logicaldrive 12 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SATA, 3 TB, OK)

   Enclosure SEP (Vendor ID HP, Model Gen8 ServBP 12+2) 378  (WWID: 5001438022DAAB39, Port: 1I, Box: 1)

   Expander 380  (WWID: 5001438022DAAB20, Port: 1I, Box: 1)

   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 379  (WWID: 500143802518D40F)

[root@node15.peach.metrics.scl3 ~]# /usr/lib64/nagios/plugins/custom/check_hplog -t l
WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)
[root@node15.peach.metrics.scl3 ~]# hpasmcli -s "clear iml;"

IML Log successfully cleared.

[root@node15.peach.metrics.scl3 ~]#
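To spot the failing drive without eyeballing the whole controller report, the `hpssacli` output can be filtered for any physical drive whose status is not OK. A minimal sketch, parsing a saved sample of the output above rather than invoking `hpssacli` itself (which requires the HP Smart Array controller and root):

```shell
# Sketch: flag physical drives whose status is anything other than OK.
# The sample text is copied from the `hpssacli ctrl all show config`
# output above; on the real host you would pipe hpssacli in instead.
sample='      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)'

# Keep only physicaldrive lines that do not end with a clean OK status.
bad_drives=$(printf '%s\n' "$sample" | grep 'physicaldrive' | grep -v ', OK)')
printf '%s\n' "$bad_drives"
```

On this host that leaves only the bay 6 line with its "Predictive Failure" status, matching the Nagios warning.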
Opened case #4653266048
Whiteboard: Case ID 4653266048
colo-trip: --- → scl3
(Assignee)

Comment 3

3 years ago
Pythian: the RMA replacement has arrived. Please take this host out of rotation so I can rebuild it.
Assignee: server-ops-dcops → vle

Comment 4

3 years ago
Acknowledged, we'll decommission node15 right away.

Comment 5

3 years ago
Node is currently being decommissioned, just pending HDFS block replication.
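Decommissioning a DataNode waits for HDFS to re-replicate its blocks onto other nodes; progress can be followed with `hdfs dfsadmin -report`, which prints a "Decommission Status" for each node. A minimal sketch against a saved, hypothetical copy of that report (the IPs and second hostname are illustrative, not from this cluster):

```shell
# Sketch: read a node's decommission state from saved `hdfs dfsadmin
# -report` output. The report text is a made-up sample; on a live
# cluster you would pipe the real command output in instead.
report='Name: 10.0.0.15:50010 (node15.peach.metrics.scl3.mozilla.com)
Decommission Status : Decommission in progress
Name: 10.0.0.16:50010 (node16.peach.metrics.scl3.mozilla.com)
Decommission Status : Normal'

# A node is safe to power off once this reads "Decommissioned".
status=$(printf '%s\n' "$report" | grep -A1 'node15' | grep 'Decommission Status')
printf '%s\n' "$status"
```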

Comment 6

3 years ago
Hi,

Node has been decommissioned and set to maintenance mode. I corrected logical device 9 (array I), which was appearing as failed even though the disk was OK, and which was causing an I/O error.

The failed disk is currently unmounted and ready for replacement:
   array F (SATA, Unused Space: 0  MB) /dev/sdf /data6 
      logicaldrive 6 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)

Regards,

Nicolas Parducci
Pythian SRE team
(Assignee)

Comment 7

3 years ago
Drive replaced and host rebuilt. Will ship the failed drive back under the RMA.

[vle@node15.peach.metrics.scl3 ~]$ df -hT | sort
cm_processes   tmpfs   32G     0   32G   0% /var/run/cloudera-scm-agent/process
/dev/md0       ext3   248M   76M  160M  32% /boot
/dev/md1       ext4    30G   16G   13G  56% /
/dev/sda4      ext4   2.7T  2.0T  532G  80% /data1
/dev/sdb4      ext4   2.7T  2.0T  579G  78% /data2
/dev/sdc1      ext4   2.7T  2.0T  619G  77% /data3
/dev/sdd1      ext4   2.7T  2.0T  574G  79% /data4
/dev/sde1      ext4   2.7T  2.0T  620G  77% /data5
/dev/sdf1      ext4   2.7T   73M  2.6T   1% /data6
/dev/sdg1      ext4   2.7T  2.0T  619G  77% /data7
/dev/sdh1      ext4   2.7T  2.0T  606G  77% /data8
/dev/sdi1      ext4   2.7T  2.2T  391G  86% /data9
/dev/sdj1      ext4   2.7T  2.0T  597G  78% /data10
/dev/sdk1      ext4   2.7T  2.0T  586G  78% /data11
/dev/sdl1      ext4   2.7T  2.0T  611G  77% /data12
Filesystem     Type   Size  Used Avail Use% Mounted on
tmpfs          tmpfs   32G     0   32G   0% /dev/shm
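As a quick post-rebuild sanity check, the replaced disk's filesystem should be nearly empty while its siblings remain heavily used; the Use% column of `df` makes that easy to assert. A minimal sketch against a saved line of the output above (running `df` live would of course reflect whichever host you run it on):

```shell
# Sketch: confirm the freshly rebuilt /data6 is near-empty by reading
# the Use% column from a saved `df -hT` line.
line='/dev/sdf1      ext4   2.7T   73M  2.6T   1% /data6'

# In `df -hT` output, field 6 is Use%; strip the percent sign.
use_pct=$(printf '%s\n' "$line" | awk '{ gsub(/%/, "", $6); print $6 }')
echo "/data6 is ${use_pct}% used"
```

Here that yields 1% used, consistent with a just-formatted filesystem awaiting HDFS re-balancing.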
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED

Comment 8

3 years ago
Exited maintenance mode; all roles started on the node. Node is back in rotation.
Thanks