Closed
Bug 1064892
Opened 10 years ago
Closed 9 years ago
vertica2.stage.metrics.scl3.mozilla.com:HP RAID is CRITICAL: RAID CRITICAL
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: rwatson, Unassigned)
References
Details
(Whiteboard: out of warranty [data: consultative])
nagios-scl3 Tue 07:12:19 PDT [5277] vertica2.stage.metrics.scl3.mozilla.com:HP RAID is CRITICAL: RAID CRITICAL - HP Smart Array Failed: Smart Array E200i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Smart Array P400 in Slot 3 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)

sudo hpacucli controller all show status

Smart Array E200i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

Smart Array P400 in Slot 3
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)
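For readers who want to see how an alert like this could be derived from the controller status, here is a minimal sketch. It assumes the `hpacucli controller all show status` output puts each controller name and each `Field: Value` pair on its own line, which is an assumption about layout. The function names and the rule "anything other than OK is a problem" are hypothetical and are not taken from the actual Nagios check. The exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
# Sample text built from the fields quoted in this bug; the one-field-per-line
# layout is an assumption.
SAMPLE = """\
Smart Array E200i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

Smart Array P400 in Slot 3
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)
"""


def parse_controller_status(text):
    """Group 'Field: Value' lines under the controller line that precedes them."""
    controllers = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Smart Array"):
            # A new controller section begins.
            current = line
            controllers[current] = {}
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            controllers[current][key] = value
    return controllers


def find_problems(controllers):
    """Return (controller, field, value) for every field whose value is not 'OK'."""
    return [
        (name, key, value)
        for name, fields in controllers.items()
        for key, value in fields.items()
        if value != "OK"
    ]


def nagios_exit_code(problems):
    """Hypothetical policy: any non-OK field is CRITICAL (2), otherwise OK (0)."""
    return 2 if problems else 0


if __name__ == "__main__":
    found = find_problems(parse_controller_status(SAMPLE))
    for name, key, value in found:
        print(f"{name}: {key} = {value}")
    raise SystemExit(nagios_exit_code(found))
```

With the sample above, only the P400 in Slot 3 is reported, for its cache and battery/capacitor fields; the embedded E200i is healthy.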
Updated•10 years ago
colo-trip: --- → scl3
Comment 1•10 years ago
<nagios-scl3> Tue 13:12:20 PDT [5034] vertica2.stage.metrics.scl3.mozilla.com:HP RAID is CRITICAL: RAID CRITICAL - HP Smart Array Failed: Smart Array E200i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Smart Array P400 in Slot 3 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors) (http://m.mozilla.org/HP+RAID)
Comment 2•10 years ago
This blade is a G1 and has been out of warranty since 2011. Any idea who the owner is? We should ask whether they want to upgrade this server (we have spare G6/G7), P2V it, or decommission it.
Assignee: server-ops-dcops → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Whiteboard: out of warranty
Comment 3•10 years ago
vertica is a metrics server. cc/ tmary srich
Comment 4•10 years ago
What's the impact here? Is Vertica accessible? This cannot be decom'd or P2V'd. I'm assuming the blade can be replaced, but not sure if both 1 and 2 need to be upgraded.
Comment 5•10 years ago
Note: this is a stage server.

Impact: Vertica is still accessible - production is not affected, and stage is still up. RAID functionality on the disks is compromised, so the system is not as fully redundant as we'd like it to be.

This is a stage server, which is why it's out of warranty, as are vertica1 and vertica2 in stage. We are working on testing out how much hardware we want/need for Vertica stage, but other issues have prevented us from moving forward the last 2 months. However, this is a priority for Q4.

For now, please upgrade with the spare g6. You can create it as vertica4.stage, and we'll add it to the cluster and decommission vertica2.stage. In Q4 we will come up with a plan for what we want to do with the rest of the hardware in Vertica stage, as well as Vertica production.
Comment 6•10 years ago
Thanks, Sheeri. Rick, Ashlee, Ryan: will you confirm this is not vertica2.stage.metrics.scl3-storage? Even though it's named stage, it's listed in Inventory as production.
Comment 7•10 years ago
We have 4 statuses in inventory: Building, Decommissioned, spare, and production. Stage servers are marked as production.
Comment 8•10 years ago
Ludo - Sean is asking if this is the storage blade, or the server blade.
Comment 9•10 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #8)
> Ludo - Sean is asking if this is the storage blade, or the server blade.

The alarm is for the battery on the *storage blade* itself. I can see that is not clear in the alert. The storage blade has an integrated raid controller, with a replaceable battery.

Storage Blade
Manufacturer: HP
Product Name: HP StorageWorks SB40c
Part Number: 411243-B21
System Board Spare Part Number: 430798-001
Serial Number: SGI833003L
ROM Version: 1.20
Comment 10•10 years ago
*nod* and from what I understand, the battery is only used if there's a power outage, to finish saving any disk changes. Is that right? Seems like a very low risk here, for a stage machine, if that's the case.
Updated•10 years ago
Whiteboard: out of warranty → out of warranty [consultative]
Updated•10 years ago
Whiteboard: out of warranty [consultative] → out of warranty [data: consultative]
Reporter | ||
Comment 11•9 years ago
Closing this bug out, as there are multiple bugs for decom in 2015 and, as Sheeri mentions, this is low risk.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard