Closed Bug 1240039 Opened 8 years ago Closed 8 years ago

node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)

Categories

(Infrastructure & Operations :: DCOps, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mlankford, Assigned: van)

Details

(Whiteboard: Case ID 4653266048)

6:38 AM <@nagios-scl3> Fri 06:38:58 PST [5236] node15.peach.metrics.scl3.mozilla.com:HP Log is WARNING: WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0) (http://m.mozilla.org/HP+Log)
[root@node15.peach.metrics.scl3 ~]# hpssacli ctrl all show config

Smart Array P420i in Slot 0 (Embedded)    (sn: 00143802518D400)


   Gen8 ServBP 12+2 at Port 1I, Box 1, OK
   array A (SATA, Unused Space: 0  MB)


      logicaldrive 1 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 3 TB, OK)

   array B (SATA, Unused Space: 0  MB)


      logicaldrive 2 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 3 TB, OK)

   array C (SATA, Unused Space: 0  MB)


      logicaldrive 3 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 3 TB, OK)

   array D (SATA, Unused Space: 0  MB)


      logicaldrive 4 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 3 TB, OK)

   array E (SATA, Unused Space: 0  MB)


      logicaldrive 5 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)

   array F (SATA, Unused Space: 0  MB)


      logicaldrive 6 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)  ****

   array G (SATA, Unused Space: 0  MB)


      logicaldrive 7 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 3 TB, OK)

   array H (SATA, Unused Space: 0  MB)


      logicaldrive 8 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 3 TB, OK)

   array I (SATA, Unused Space: 0  MB)


      logicaldrive 9 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA, 3 TB, OK)

   array J (SATA, Unused Space: 0  MB)


      logicaldrive 10 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SATA, 3 TB, OK)

   array K (SATA, Unused Space: 0  MB)


      logicaldrive 11 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SATA, 3 TB, OK)

   array L (SATA, Unused Space: 0  MB)


      logicaldrive 12 (2.7 TB, RAID 0, OK)

      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SATA, 3 TB, OK)

   Enclosure SEP (Vendor ID HP, Model Gen8 ServBP 12+2) 378  (WWID: 5001438022DAAB39, Port: 1I, Box: 1)

   Expander 380  (WWID: 5001438022DAAB20, Port: 1I, Box: 1)

   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 379  (WWID: 500143802518D40F)

[root@node15.peach.metrics.scl3 ~]# /usr/lib64/nagios/plugins/custom/check_hplog -t l
WARNING 0002: Internal Storage Enclosure Device Failure (Bay 6, Box 1, Port 1I, Slot 0)
[root@node15.peach.metrics.scl3 ~]# hpasmcli -s "clear iml;"

IML Log successfully cleared.

[root@node15.peach.metrics.scl3 ~]#
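For anyone who would rather not scan the controller listing by eye, here is a minimal sketch of pulling out the drives that are not reporting OK. It assumes the `hpssacli ctrl all show config` output has already been captured as text and that the status is always the last comma-separated field inside the parentheses, as it is in the listing above. The function name `unhealthy_drives` and the `sample` string are illustrative, not part of any HP tooling.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)
PHYSICAL_DRIVE = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+\((?P<fields>[^)]*)\)"
)


def unhealthy_drives(config_text):
    """Return (drive_id, status) for every physical drive whose status is not OK."""
    flagged = []
    for match in PHYSICAL_DRIVE.finditer(config_text):
        # Assumption: the status is the last comma-separated field.
        status = match.group("fields").split(",")[-1].strip()
        if status != "OK":
            flagged.append((match.group("id"), status))
    return flagged


# Excerpt copied from the listing above (blank lines dropped).
sample = """\
   array E (SATA, Unused Space: 0  MB)
      logicaldrive 5 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)
   array F (SATA, Unused Space: 0  MB)
      logicaldrive 6 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)
"""

print(unhealthy_drives(sample))
```

Note that the logical drive on array F still reports OK while the physical drive reports Predictive Failure, so a check that looks only at logical drive status would miss this case.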
Opened case #4653266048
Whiteboard: Case ID 4653266048
colo-trip: --- → scl3
pythian, RMA has arrived. please take this host out of rotation so i can rebuild it.
Assignee: server-ops-dcops → vle
acknowledged, we'll decommission node15 right away.
Node is currently being decommissioned, just pending HDFS block replication.
Hi,

The node has been decommissioned and set to maintenance mode. I corrected logical device 9 (array I), which was appearing as failed even though the disk was OK, and which was causing an I/O error.

The failed disk is currently unmounted and ready for replacement:
   array F (SATA, Unused Space: 0  MB) /dev/sdf /data6 
      logicaldrive 6 (2.7 TB, RAID 0, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, Predictive Failure)

Regards,

Nicolas Parducci
Pythian SRE team
drive replaced and host rebuilt. will ship back RMA.

[vle@node15.peach.metrics.scl3 ~]$ df -hT | sort
cm_processes   tmpfs   32G     0   32G   0% /var/run/cloudera-scm-agent/process
/dev/md0       ext3   248M   76M  160M  32% /boot
/dev/md1       ext4    30G   16G   13G  56% /
/dev/sda4      ext4   2.7T  2.0T  532G  80% /data1
/dev/sdb4      ext4   2.7T  2.0T  579G  78% /data2
/dev/sdc1      ext4   2.7T  2.0T  619G  77% /data3
/dev/sdd1      ext4   2.7T  2.0T  574G  79% /data4
/dev/sde1      ext4   2.7T  2.0T  620G  77% /data5
/dev/sdf1      ext4   2.7T   73M  2.6T   1% /data6
/dev/sdg1      ext4   2.7T  2.0T  619G  77% /data7
/dev/sdh1      ext4   2.7T  2.0T  606G  77% /data8
/dev/sdi1      ext4   2.7T  2.2T  391G  86% /data9
/dev/sdj1      ext4   2.7T  2.0T  597G  78% /data10
/dev/sdk1      ext4   2.7T  2.0T  586G  78% /data11
/dev/sdl1      ext4   2.7T  2.0T  611G  77% /data12
Filesystem     Type   Size  Used Avail Use% Mounted on
tmpfs          tmpfs   32G     0   32G   0% /dev/shm
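A rough sketch of how the listing above could be sanity-checked after a rebuild: confirm that every expected data mount is present and spot the freshly rebuilt (nearly empty) one. It assumes the seven-column `df -hT` layout shown above; the function name `data_mount_usage`, the trimmed `sample`, and the 5% threshold are illustrative choices, not anything taken from this host's tooling.

```python
def data_mount_usage(df_text):
    """Map each /dataN mount point to its Use% as an integer."""
    usage = {}
    for line in df_text.splitlines():
        parts = line.split()
        # Expected columns: Filesystem Type Size Used Avail Use% Mounted-on.
        # The header splits into eight fields ("Mounted on") and is skipped.
        if len(parts) == 7 and parts[6].startswith("/data"):
            usage[parts[6]] = int(parts[5].rstrip("%"))
    return usage


# Excerpt copied from the sorted df output above.
sample = """\
/dev/sde1      ext4   2.7T  2.0T  620G  77% /data5
/dev/sdf1      ext4   2.7T   73M  2.6T   1% /data6
/dev/sdg1      ext4   2.7T  2.0T  619G  77% /data7
Filesystem     Type   Size  Used Avail Use% Mounted on
"""

usage = data_mount_usage(sample)

# Mounts expected in this excerpt; on the full host this would be /data1../data12.
expected = {f"/data{n}" for n in range(5, 8)}
missing = sorted(expected - set(usage))

# Assumed threshold: a mount under 5% used is treated as freshly rebuilt.
nearly_empty = [mount for mount, pct in usage.items() if pct < 5]

print("missing:", missing)
print("nearly empty:", nearly_empty)
```

In the listing above only /data6 would fall under that threshold, which matches the replaced drive in bay 6.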
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Exited maintenance mode, all roles started on node. Node is in rotation.
Thanks