primary FWSM failed, secondary failed to complete failover transition

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations: Projects
--
critical
RESOLVED FIXED
8 years ago
2 years ago

People

(Reporter: bhearsum, Assigned: dmoore)

(Reporter)

Description

8 years ago
Nagios is going off about a bunch of things, I can't connect to mpt-vpn in any way, and we're starting to get alarms about build machines.
(Reporter)

Comment 1

8 years ago
Things appear to be back now; lowering severity.
Severity: blocker → critical

Comment 2

8 years ago
Assigning to dmoore for investigation.  Emailed him with some output already.

Short story:
The Primary FWSM failed, and the Standby did not complete the takeover until I rebooted the Primary (core1).

Looks like the outage lasted about 13 minutes. For RelEng, it would have affected any inter-VLAN traffic (Vlan90 to Vlan71).
Assignee: server-ops → dmoore

Updated

8 years ago
Summary: mpt-vpn, bm-vmware03, other things appear dead → primary FWSM failed, secondary failed to complete failover transition
(Assignee)

Comment 3

8 years ago
This was a failover failure. The failover sync process (doorbell_poll) crashed on the Active module, which then began sending incomplete health messages. As a result, the two modules could not negotiate a failover, and both were left in an intermediate state.

There is no pending fix from Cisco.
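
With no vendor fix available, one way to shorten detection time would be a monitoring check that alerts whenever the pair is not in a clean active/standby arrangement. The following is only a rough sketch of such a Nagios-style check, not something that was deployed for this bug. The state labels and the function name `check_failover_pair` are hypothetical; a real check would have to map them from whatever the modules actually report. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Hypothetical state labels (assumption): a real check would translate
# the device's own failover status into these strings.
ACTIVE, STANDBY = "active", "standby"

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_failover_pair(primary_state, secondary_state):
    """Return (exit_code, message) for a two-unit failover pair."""
    states = sorted([primary_state, secondary_state])
    if states == [ACTIVE, STANDBY]:
        return OK, "pair healthy: one active, one standby"

    active_count = states.count(ACTIVE)
    if active_count == 1:
        # Traffic is still being served, but there is no usable standby.
        return WARNING, f"no healthy standby: {primary_state}/{secondary_state}"
    if active_count == 2:
        return CRITICAL, "both units claim to be active"

    # Neither unit is active, e.g. both stuck in an intermediate state.
    return CRITICAL, f"no active unit: {primary_state}/{secondary_state}"
```

Treating "no active unit" as CRITICAL regardless of the exact intermediate state means the check would not need to know which state the modules ended up in, only that neither one is passing traffic.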

Updated

8 years ago
Component: Server Operations → Server Operations: Projects
(Assignee)

Updated

7 years ago
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard