Closed
Bug 804658
Opened 12 years ago
Closed 12 years ago
Brief outage of services on genericrhel6 prod pool
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bburton, Assigned: bburton)
References
Details
(Whiteboard: [service interrupt])
Failover of a component in the seamicro chassis that currently houses generic1-5.webapp.phx1 caused a brief outage of services hosted on the genericrhel6 cluster, including
* wiki.mozilla.org
* blog.mozilla.org
* tbpl.mozilla.org
* etc

Will post additional details shortly
Updated•12 years ago
Assignee: server-ops-webops → bburton
Priority: -- → P1
Whiteboard: [service interrupt]
Comment 1•12 years ago
Two factors played into this outage:
1. Our blade server was set to draining due to maintenance work and was never put back into rotation
2. All five of our seamicro nodes were in the single chassis that had a component failover

To rectify this we're taking the following actions:
1. We've re-enabled the blade and confirmed it's good to serve prod traffic
2. We're going to move generic4-5 to another seamicro chassis to prevent chassis failure causing an outage

We'll update this bug once #2 is complete and RF it
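The two factors above combine into one condition: every node actually serving traffic sat in one chassis. A minimal sketch of a check for that condition follows, assuming the node-to-chassis mapping and the set of in-rotation nodes are available from inventory. The function name, the chassis labels (`seamicro-a`, `seamicro-b`, `blade-enclosure`), and the `blade1` hostname are hypothetical and used only for illustration.

```python
from collections import Counter


def single_points_of_failure(placement, active):
    """Return chassis whose loss would take out every in-rotation node.

    placement: dict mapping node name -> chassis name (hypothetical inventory data)
    active:    iterable of node names currently in rotation
    """
    counts = Counter(placement[node] for node in active)
    total = sum(counts.values())
    # A chassis is a single point of failure if it holds all active nodes.
    return sorted(chassis for chassis, n in counts.items() if n == total)


# Layout at the time of the outage (chassis names are assumptions):
# five seamicro nodes in one chassis, blade still draining.
before = {f"generic{i}": "seamicro-a" for i in range(1, 6)}
before["blade1"] = "blade-enclosure"
active_before = [f"generic{i}" for i in range(1, 6)]

# Layout after the two actions: blade re-enabled, generic4-5 moved.
after = dict(before, generic4="seamicro-b", generic5="seamicro-b")
active_after = list(after)

print(single_points_of_failure(before, active_before))
print(single_points_of_failure(after, active_after))
```

With the layout described in this comment, the first call reports the shared chassis and the second reports nothing, which is the state the two remediation steps aim for.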
Comment 2•12 years ago
We used to have a nagios alert that would flag any zeus backend nodes not marked as active. Would this have caught that?
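For reference, a rough sketch of what such a check could look like. It is not the alert that previously existed. It assumes the per-node states have already been fetched from the load balancer by some other means (no Zeus API call is shown or implied), and the state strings and node names are hypothetical. The return codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def check_pool(states):
    """Flag any backend node whose state is not "active".

    states: dict mapping node name -> state string, e.g. "active" or
            "draining" (hypothetical values, fetched elsewhere).
    Returns (exit_code, message) in the style of a Nagios plugin.
    """
    not_active = sorted(node for node, state in states.items() if state != "active")
    active = len(states) - len(not_active)
    if active == 0:
        return CRITICAL, "CRITICAL: no active nodes in pool"
    if not_active:
        return WARNING, "WARNING: not active: " + ", ".join(not_active)
    return OK, f"OK: all {active} nodes active"


# Pool state resembling the one described in comment 1 (names are assumptions).
pool = {f"generic{i}": "active" for i in range(1, 6)}
pool["blade1"] = "draining"

code, message = check_pool(pool)
print(code, message)
```

A check along these lines would have warned about the blade left in draining, though it says nothing about all remaining nodes sharing one chassis.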
Comment 3•12 years ago
(In reply to Brandon Burton [:solarce] from comment #1)
> 2. We're going to move generic4-5 to another seamicro chassis to prevent
> chassis failure causing an outage

this is now complete per bug 804669.
Comment 4•12 years ago
Per :cturra's work, RF!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard