Bug 528186 - primary FWSM failed, secondary failed to complete failover transition
Status: RESOLVED FIXED
Product: mozilla.org Graveyard
Classification: Graveyard
Component: Server Operations: Projects
Version: other
Platform: x86 Mac OS X
Importance: -- critical
Target Milestone: ---
Assigned To: Derek Moore [:dmoore]
QA Contact: matthew zeier [:mrz]
Mentors:
Depends on:
Blocks:
Reported: 2009-11-12 05:15 PST by Ben Hearsum (:bhearsum)
Modified: 2015-03-12 08:24 PDT
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments

Description Ben Hearsum (:bhearsum) 2009-11-12 05:15:45 PST
Nagios is going off about a bunch of things, I can't connect to mpt-vpn in any way, and we're starting to get alarms about build machines.
Comment 1 Ben Hearsum (:bhearsum) 2009-11-12 05:21:28 PST
Things appear back now...lowering sev
Comment 2 matthew zeier [:mrz] 2009-11-12 05:41:08 PST
Assigning to dmoore for investigation.  Emailed him with some output already.

Short story:
Primary FWSM failed and the Standby never fully finished the takeover until I rebooted the Primary (core1).

Looks like the outage lasted about 13 minutes.  For RelEng, it would have affected any inter-VLAN traffic (Vlan90 to Vlan71).
Comment 3 Derek Moore [:dmoore] 2009-11-18 10:22:17 PST
This was a failover failure. The failover sync process (doorbell_poll) crashed on the Active module, which meant it began sending incomplete health messages. The two modules could not successfully negotiate a failover, leaving both of them in an intermediate state.

There is no pending fix from Cisco.
