Multicast heartbeats between lm-zlb01 and 02 failing

RESOLVED INVALID

Status

Product: Infrastructure & Operations
Component: NetOps
Status: RESOLVED INVALID
Opened: 7 years ago
Last updated: 5 years ago

People

(Reporter: zandr, Unassigned)

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
While troubleshooting other issues, I observed that both lm-zlb nodes were reporting that they were not hearing heartbeats.

I can't tell you when they started failing, because I can't find any log messages that correlate with the problem.

I do know that switching to unicast heartbeats solved the problem, and in a 2-node cluster, that's not a big deal, but I think this bears investigation.

The multicast heartbeats were on 239.100.1.1:9090/udp.
Unicast is now working on 9090/udp. The two hosts are 10.2.110.4 and 10.2.110.6, on vlan110 in sjc1.
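
For anyone trying to reproduce this, here is a minimal sketch of a multicast reachability check, assuming Python 3 on a Linux host attached to vlan110. The group and port are the ones from this bug. The function names (build_mreq, listen, send_probe) and the "probe" payload are hypothetical, and this only checks whether any datagram for the group arrives, not whether it is a valid heartbeat.

```python
import socket
import struct

GROUP = "239.100.1.1"  # multicast group from this bug
PORT = 9090            # heartbeat UDP port from this bug


def build_mreq(group, iface="0.0.0.0"):
    """Pack an ip_mreq: group address followed by the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def listen(group=GROUP, port=PORT, iface="0.0.0.0", timeout=10.0):
    """Join the group; return (data, sender) for the first datagram, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, iface))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(65535)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send_probe(group=GROUP, port=PORT, payload=b"probe", ttl=1):
    """Send one datagram to the group; ttl=1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()


if __name__ == "__main__":
    print(listen())
```

Running listen() on one host while calling send_probe() from the other would show whether group traffic crosses the switch at all. If the load balancer software is already bound to 9090/udp on the same host, the bind may fail or compete with it, so a third host on the VLAN is probably the safer place to run this.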
(Reporter)

Comment 1

7 years ago
I also see some failover events on this cluster from time to time. Other than generating about 35 traps every time it flaps, this is harmless. But it's unusual.

Are there interface counters on this switch that are showing signs of trouble? Seems like we might have cooked another switch card.
(Reporter)

Comment 2

7 years ago
Created attachment 529379 [details]
Nagios alerts on a flap event

Here's what nagios does when we get a flap event.

Comment 3

7 years ago
Some data.

bsx201-07a1 (zlb02):

Apr  7 17:29:11.648 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 110 is flapping between port Po1 and port Gi0/16
Apr  7 17:29:11.950 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.f2a0 in vlan 110 is flapping between port Po1 and port Gi0/15
Apr  7 17:29:12.152 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.6eb4 in vlan 84 is flapping between port Po1 and port Gi0/3
Apr  7 17:29:12.387 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 76 is flapping between port Po1 and port Gi0/16
Apr  7 17:29:12.697 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.f2a0 in vlan 76 is flapping between port Po1 and port Gi0/15
Apr  7 17:29:14.190 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.6e8c in vlan 84 is flapping between port Po1 and port Gi0/5
May  2 06:06:26.384 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/15, changed state to down
May  2 06:06:29.529 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/15, changed state to up

bsx201-07a2 (zlb01):

Nothing interesting.
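
To make the MACFLAP lines above easier to read, here is a rough sketch that groups them by MAC address, assuming Python 3. The regex is written against the message format as it appears in this comment only; summarize_flaps and SAMPLE are hypothetical names, and SAMPLE is a subset of the lines pasted above.

```python
import re
from collections import defaultdict

# Matches MACFLAP_NOTIF messages in the form shown in this comment.
MACFLAP_RE = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)

SAMPLE = [
    "Apr  7 17:29:11.648 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 110 is flapping between port Po1 and port Gi0/16",
    "Apr  7 17:29:11.950 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7d5b.f2a0 in vlan 110 is flapping between port Po1 and port Gi0/15",
    "Apr  7 17:29:12.387 PDT: %SW_MATM-4-MACFLAP_NOTIF: Host 0022.64fc.58fc in vlan 76 is flapping between port Po1 and port Gi0/16",
    "May  2 06:06:26.384 PDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/15, changed state to down",
]


def summarize_flaps(lines):
    """Map each flapping MAC to the VLANs and ports it was reported on."""
    summary = defaultdict(lambda: {"vlans": set(), "ports": set()})
    for line in lines:
        match = MACFLAP_RE.search(line)
        if match is None:
            continue  # not a MACFLAP message (e.g. LINEPROTO up/down)
        entry = summary[match.group("mac")]
        entry["vlans"].add(int(match.group("vlan")))
        entry["ports"].update((match.group("a"), match.group("b")))
    return dict(summary)


if __name__ == "__main__":
    for mac, info in sorted(summarize_flaps(SAMPLE).items()):
        print(mac, sorted(info["vlans"]), sorted(info["ports"]))
```

Grouped this way, each MAC in the sample flaps between Po1 and a single access port, and 0022.64fc.58fc does so on both vlan 110 and vlan 76.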

Comment 4

7 years ago
Is that interface actually bouncing then?

Comment 5

6 years ago
These hosts are gone.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID
Product: mozilla.org → Infrastructure & Operations