Closed Bug 1235844 Opened 8 years ago Closed 8 years ago

vertica2.stage.metrics.scl3.mozilla.com:HP RAID is CRITICAL: RAID CRITICAL - HP Smart Array Failed: Smart Array E200i in Slot 0

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: mlankford, Assigned: van)

Details

Marlena

Reporter

Description

•

8 years ago

8:38 AM <@nagios-scl3> Wed 08:38:13 PST [5180] vertica2.stage.metrics.scl3.mozilla.com:HP RAID is CRITICAL: RAID CRITICAL - HP Smart Array Failed: Smart Array E200i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Smart Array P400 in Slot 3 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors) (http://m.mozilla.org/HP+RAID)

Marlena

Reporter

Comment 1

•

8 years ago

[root@vertica2.stage.metrics.scl3 ~]# hpacucli controller slot=3 show config

Smart Array P400 in Slot 3                (sn: P61630G9SVN6IK)


   Internal Drive Cage at Port 1I, Box 1, OK

   Internal Drive Cage at Port 2I, Box 1, OK
   array A (SAS, Unused Space: 0  MB)


      logicaldrive 1 (546.8 GB, RAID 6, OK)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 146 GB, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 146 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 146 GB, OK)

[root@vertica2.stage.metrics.scl3 ~]# hpacucli controller all show

Smart Array E200i in Slot 0 (Embedded)    (sn: PBACB0A9VVH1B9)
Smart Array P400 in Slot 3                (sn: P61630G9SVN6IK)

[root@vertica2.stage.metrics.scl3 ~]# hpacucli controller all show status

Smart Array E200i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

Smart Array P400 in Slot 3
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)


[root@vertica2.stage.metrics.scl3 ~]#
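
For readers who want to see which controller the alert actually concerns, here is a minimal sketch that parses the `hpacucli controller all show status` text pasted above and lists every status field that is not exactly `OK`. The function names `parse_controller_status` and `find_problems` are hypothetical, the sample is copied from this comment, and this is not the Nagios check that produced the alert.

```python
# Sample text copied from the `hpacucli controller all show status` output above.
SAMPLE = """\
Smart Array E200i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

Smart Array P400 in Slot 3
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)
"""


def parse_controller_status(text):
    """Return {controller name: {field: value}} (hypothetical helper)."""
    controllers = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line.startswith(" "):
            # Unindented lines name a controller.
            current = line.strip()
            controllers[current] = {}
        elif current is not None and ":" in line:
            # Indented lines are "Field: value" pairs for the current controller.
            field, _, value = line.strip().partition(":")
            controllers[current][field.strip()] = value.strip()
    return controllers


def find_problems(controllers):
    """List (controller, field, value) for every status that is not exactly OK."""
    return [
        (name, field, value)
        for name, fields in controllers.items()
        for field, value in fields.items()
        if value != "OK"
    ]


if __name__ == "__main__":
    for name, field, value in find_problems(parse_controller_status(SAMPLE)):
        print(f"{name}: {field} = {value}")
```

On this sample, only the Smart Array P400 in Slot 3 is flagged (cache and battery/capacitor status); the embedded E200i reports OK for both fields.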

Ryan C [:ryanc] (UTC-4)

Updated

•

8 years ago

Assignee: infra → server-ops-dcops

Component: Infrastructure: Other → DCOps

QA Contact: jdow

Van Le [:van]

Assignee

Comment 2

•

8 years ago

looks like another failed RAID battery in the p400 storage blade with no warranty information. the host has been out of warranty since feb 2011 so it'd be safe to assume they both expired at the same time. do we want to renew the warranty on these 2 devices?

Flags: needinfo?(mpressman)

Van Le [:van]

Assignee

Comment 3

•

8 years ago

also please note this is a G1 blade so perhaps it's also better to just upgrade and renew the service contract on the blade if we decide to keep the host?

QA Contact: jbarnell

Van Le [:van]

Assignee

Updated

•

8 years ago

colo-trip: --- → scl3

Matt Pressman [:mpressman]

Comment 4

•

8 years ago

I'm not sure we know what the time frame is with regard to the vertica servers. They were initially planned to be decommissioned last year and then this quarter, but I don't know if we want to spend more on the stage servers. As of right now, their only purpose is for testing upgrades and I don't see us upgrading without a future plan for the prod service. So, for right now, we can probably hold off on fixing this until a decision is made.

Flags: needinfo?(mpressman)

Van Le [:van]

Assignee

Comment 5

•

8 years ago

this is a stage server and it's only affecting the cache on the storage array. going to WONTFIX per c#4.

Assignee: server-ops-dcops → vle

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

Dave Williams [:daveio]

Comment 6

•

8 years ago

Just resolved the following:

vertica2.stage.metrics.scl3.mozilla.com:HP Health is CRITICAL: CHECK_NRPE: Socket timeout after 60 seconds. (http://m.mozilla.org/HP+Health)

with an hp-health service restart. Is this machine still appropriate for nagios monitoring?


Categories

(Infrastructure & Operations :: DCOps, task)
