Closed Bug 652690 Opened 13 years ago Closed 12 years ago

Multicast heartbeats between lm-zlb01 and 02 failing

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: zandr, Unassigned)

Details

Attachments

(1 file)

While troubleshooting other issues, I observed that both lm-zlb nodes were reporting that they were not hearing heartbeats.

I can't tell you when they started failing, because I can't find any log messages that correlate with the problem. 

I do know that switching to unicast heartbeats solved the problem. In a 2-node cluster that's not a big deal, but I think this bears investigation.

The multicast heartbeats were on 239.100.1.1:9090/udp.
Unicast is now working on 9090/udp. The two hosts are 10.2.110.4 and 10.2.110.6, on vlan110 in sjc1.
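
One way to check whether multicast traffic for the heartbeat group is reaching a host on the VLAN is to join the group and wait for a datagram. The following is only a minimal sketch in Python, not how the load balancers themselves listen. The group and port are the ones quoted above; the function names, the 10-second timeout, and the assumption that the test port can be shared via SO_REUSEADDR are illustrative.

```python
import ipaddress
import socket

GROUP = "239.100.1.1"  # multicast group from the report
PORT = 9090            # UDP port from the report


def membership_request(group, local_ip="0.0.0.0"):
    """Pack the ip_mreq structure used by IP_ADD_MEMBERSHIP.

    local_ip selects the interface to join on; 0.0.0.0 lets the kernel pick.
    """
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def wait_for_heartbeat(group=GROUP, port=PORT, local_ip="0.0.0.0", timeout=10.0):
    """Join the group and return (sender, size) of the first datagram, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Assumption: sharing the port with another listener is acceptable here.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(
            socket.IPPROTO_IP,
            socket.IP_ADD_MEMBERSHIP,
            membership_request(group, local_ip),
        )
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return sender, len(data)
    finally:
        sock.close()


if __name__ == "__main__":
    result = wait_for_heartbeat()
    print("no multicast datagram received" if result is None else result)
```

If this times out on one host while the peer is known to be sending, the loss is somewhere between the two switch ports rather than in the application. It does not say where.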
I also see some failover events on this cluster from time to time. Other than getting about 35 traps every time it flaps, this is harmless. But it's unusual.

Are there interface counters on this switch that are showing signs of trouble? Seems like we might have cooked another switch card.
Here's what Nagios does when we get a flap event.
Some data.

bsx201-07a1 (zlb02):

Apr  7 17:29:11.648 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 110 is flapping between port Po1 and port Gi0/16
Apr  7 17:29:11.950 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.f2a0 in vlan 110 is flapping between port Po1 and port Gi0/15
Apr  7 17:29:12.152 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.6eb4 in vlan 84 is flapping between port Po1 and port Gi0/3
Apr  7 17:29:12.387 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 76 is flapping between port Po1 and port Gi0/16
Apr  7 17:29:12.697 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.f2a0 in vlan 76 is flapping between port Po1 and port Gi0/15
Apr  7 17:29:14.190 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.6e8c in vlan 84 is flapping between port Po1 and port Gi0/5
May  2 06:06:26.384 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/15, changed state to down
May  2 06:06:29.529 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/15, changed state to up

bsx201-07a2 (zlb01):

Nothing interesting.
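
To see which hosts and ports account for the MACFLAP_NOTIF messages above, the log lines can be tallied. This is a rough sketch in Python; the regular expression is written against the message format as it appears in the excerpt above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern matched against the MACFLAP_NOTIF lines as they appear in the excerpt.
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def tally_flaps(lines):
    """Count flap notifications per MAC address and per port.

    Each notification names two ports, so both are counted.
    Lines that are not MACFLAP notifications are skipped.
    """
    by_mac = Counter()
    by_port = Counter()
    for line in lines:
        match = MACFLAP.search(line)
        if match is None:
            continue
        by_mac[match["mac"]] += 1
        by_port[match["a"]] += 1
        by_port[match["b"]] += 1
    return by_mac, by_port


if __name__ == "__main__":
    import sys

    macs, ports = tally_flaps(sys.stdin)
    for mac, count in macs.most_common():
        print(mac, count)
    for port, count in ports.most_common():
        print(port, count)
```

Fed the switch log on standard input, it prints one count per MAC and per port. A port that appears in every notification is the common side of the flapping.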
Is that interface actually bouncing then?
These hosts are gone.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard