Closed Bug 609006 Opened 14 years ago Closed 14 years ago

MoMo Asterisk server can't access MoCo's anymore (63.245.220.10)

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86_64
Linux
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gozer, Assigned: ravi)

Details

momo-vm-10*CLI> iax2 show peers
Name/Username    Host                 Mask             Port          Status    
junctionnetwork  66.227.100.30   (S)  255.255.255.255  4569          OK (61 ms)
mountainview/mo  63.245.220.10   (S)  255.255.255.255  4569 (T)      UNREACHABLE
2 iax2 peers [1 online, 1 offline, 0 unmonitored]

Broke relatively recently according to the logs I have:

[Nov  2 04:24:27] NOTICE[9732] chan_iax2.c: Peer 'mountainview' is now REACHABLE!
[Nov  2 04:25:31] NOTICE[9728] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE!
[Nov  2 04:39:41] NOTICE[9730] chan_iax2.c: Peer 'mountainview' is now REACHABLE!
[Nov  2 04:40:45] NOTICE[9732] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE!
[Nov  2 05:08:33] NOTICE[1076] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE
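
To put a number on how long the peer stays up each time, here is a minimal Python sketch that pairs each REACHABLE notice with the following UNREACHABLE one. The function name `up_durations` and the year are assumptions for illustration (the log lines carry no year, and it does not affect the differences).

```python
import re
from datetime import datetime

# Log lines copied from the report above.
LOG = """\
[Nov  2 04:24:27] NOTICE[9732] chan_iax2.c: Peer 'mountainview' is now REACHABLE!
[Nov  2 04:25:31] NOTICE[9728] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE!
[Nov  2 04:39:41] NOTICE[9730] chan_iax2.c: Peer 'mountainview' is now REACHABLE!
[Nov  2 04:40:45] NOTICE[9732] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE!
[Nov  2 05:08:33] NOTICE[1076] chan_iax2.c: Peer 'mountainview' is now UNREACHABLE
"""

EVENT = re.compile(
    r"^\[(?P<stamp>[^\]]+)\].*Peer '(?P<peer>[^']+)' is now "
    r"(?P<state>REACHABLE|UNREACHABLE)"
)

ASSUMED_YEAR = 2010  # assumption: the log has no year; only differences matter


def up_durations(log_text, peer):
    """Return seconds between each REACHABLE notice and the next UNREACHABLE."""
    durations = []
    up_since = None
    for line in log_text.splitlines():
        m = EVENT.match(line)
        if not m or m["peer"] != peer:
            continue
        stamp = " ".join(m["stamp"].split())  # collapse the padded day field
        when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        if m["state"] == "REACHABLE":
            up_since = when
        elif up_since is not None:
            durations.append(int((when - up_since).total_seconds()))
            up_since = None
    return durations


print(up_durations(LOG, "mountainview"))  # [64, 64]
```

Both up periods in this excerpt last 64 seconds; the final UNREACHABLE line has no preceding REACHABLE and is ignored.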
A side effect of this is that internal SIP users can't dial into MoCo extensions directly anymore.
Ravi made changes to the network touching telephony last night; this is probably fallout. Over to netops.
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
I suspect it's also affecting our ability to dial in to the 778 785 1540 dial-in number.
Severity: normal → critical
Interestingly, dundi isn't affected

momo-vm-10*CLI> dundi show peers
EID                  Host                Model      AvgTime  Status         
00:14:38:be:46:93    63.245.220.10   (S) Symmetric  Unavail  OK (1 ms)      
1 dundi peers [1 online, 0 offline, 0 unmonitored]
(In reply to comment #3)
> I suspect it's also affecting our ability to dial in to the 778 785 1540
> dial-in number.

Yes, correct. That dial in number routes to our server directly, and from there to MoCo, so it's also broken.
Worked for a minute just now.

[Nov  2 07:14:26] NOTICE[1070]: chan_iax2.c:7912 socket_process: Peer 'mountainview' is now REACHABLE! Time: 1020
[Nov  2 07:15:31] NOTICE[1076]: chan_iax2.c:8832 __iax2_poke_noanswer: Peer 'mountainview' is now UNREACHABLE! Time: 1020
We're seeing packet loss from at and beyond your core right now.  Still looking into things, but we're not dropping any packets on our side.  The fact the connection comes up for about a minute and then drops is suspect.
Assignee: network-operations → ravi
momo/mountainvi  63.245.221.100  (S)  255.255.255.255  4569 (T)      OK (15 ms)

Looks like it's working from our end currently.  We were having the same problem with the toronto office actually, and that seems to be working again now.  How's it on your end?
Tried to dial 92 from our YVR office phones, got busy signal.
(In reply to comment #8)
> momo/mountainvi  63.245.221.100  (S)  255.255.255.255  4569 (T)      OK (15 ms)
> 
> Looks like it's working from our end currently.  We were having the same
> problem with the toronto office actually, and that seems to be working again
> now.

What was the cause of the toronto office issue? 

> How's it on your end?

It's not our firewall for sure; it hasn't changed in over two weeks, and I've got a stable, healthy connection to junctionnetwork over iax2. Nothing in my fw is specific to either of these.
(In reply to comment #7)
> We're seeing packet loss from at and beyond your core right now.  Still looking
> into things, but we're not dropping any packets on our side.  The fact the
> connection comes up for about a minute and then drops is suspect.

Can you tell me a bit more about the packet losses? Not seeing anything on my end, and we are just a few hops away...
Just as another piece of information, DUNDI lookups to moco are failing as well
The Toronto problem was that we had the peer configured by domain name; when the DNS changed in the offices, the two ends were no longer seeing the same IPs. Changing the config on both sides to use IP addresses seemed to help. Your config, however, is already using the IP address on our end.
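
For anyone wanting to check for this kind of mismatch, here is a rough Python sketch that compares what a hostname resolves to against the IP the other side has configured. The function name `peer_ip_matches`, the hostname, and the addresses are all hypothetical; the resolver is injectable so the example runs without touching real DNS.

```python
import socket


def peer_ip_matches(hostname, configured_ip, resolve=socket.gethostbyname_ex):
    """Return (match, addresses): whether configured_ip is among the
    IPv4 addresses that hostname resolves to from this vantage point."""
    _name, _aliases, addresses = resolve(hostname)
    return configured_ip in addresses, addresses


# Stand-in resolver with made-up data, so the example is self-contained.
def fake_resolver(host):
    return host, [], ["192.0.2.10"]


# The remote side still has the old address configured: mismatch.
print(peer_ip_matches("pbx.example.invalid", "192.0.2.99", resolve=fake_resolver))
# (False, ['192.0.2.10'])
```

Running the same check from both offices (with the default resolver) would show whether each end sees the address the other end expects.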
And DUNDI debugging is showing me something else:

[Nov  3 08:34:25] Rx-Frame Retry[No] -- OSeqno: 000 ISeqno: 000 Type: ENCRYPT      (Command)
[Nov  3 08:34:25]      Flags: 00 STrans: 01871  DTrans: 00000 [63.245.220.10:33931]
[Nov  3 08:34:25]    ENTITY IDENT    : 00:14:38:be:46:93
[Nov  3 08:34:25]    KEYCRC32        : 2994019363
[Nov  3 08:34:25]    ENCDATA         : [IV 075e2a2bfe33050e8884076909578e62] 5 encrypted blocks

[Nov  3 08:34:25] 
[Nov  3 08:34:25]     Erx-Frame Retry[No] -- OSeqno: 000 ISeqno: 000 Type: DPDISCOVER   (Command)
[Nov  3 08:34:25]           Flags: 00 STrans: 01871  DTrans: 00000 [63.245.220.10:33931]
[Nov  3 08:34:25]         VERSION         : 1
[Nov  3 08:34:25]         DIRECT EID      : 00:14:38:be:46:93
[Nov  3 08:34:25]         DIRECT EID      : 00:1a:4b:0b:f0:38
[Nov  3 08:34:25]         DIRECT EID      : 00:14:38:be:46:93
[Nov  3 08:34:25]         CALLED NUMBER   : 1
[Nov  3 08:34:25]         CALLED CONTEXT  : interoffice-backup
[Nov  3 08:34:25]         TTL             : 1
[Nov  3 08:34:25] 
[Nov  3 08:34:25]     ETx-Frame Retry[No] -- OSeqno: 000 ISeqno: 001 Type: DPRESPONSE   (Response)
[Nov  3 08:34:25]           Flags: 00 STrans: 11650  DTrans: 01871 [63.245.220.10:33931] (Final)
[Nov  3 08:34:25]         CAUSE           : NOAUTH: Unsupported DUNDI Context
tcpdumping from my asterisk server, all I can see is a lot of unanswered IAX POKEs from my end with only a few responses from ringring.mv

[Nov  3 08:55:36] Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE   
[Nov  3 08:55:36]    Timestamp: 00011ms  SCall: 05886  DCall: 00000 [63.245.220.10:4569]
[Nov  3 08:55:37] Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE   
[Nov  3 08:55:37]    Timestamp: 00011ms  SCall: 05886  DCall: 00000 [63.245.220.10:4569]
[Nov  3 08:55:39] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE   
[Nov  3 08:55:39]    Timestamp: 00010ms  SCall: 13988  DCall: 00000 [63.245.220.10:38642]
[Nov  3 08:55:39] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: ACK    
[Nov  3 08:55:39]    Timestamp: 00016ms  SCall: 13988  DCall: 00001 [63.245.220.10:38642]

But I don't see any registration attempts or anything like it that I normally see with my junctionnetwork peer.

Has something changed about the MV Asterisk server?
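
To make the debug output above easier to compare, here is a small Python sketch that tallies frames by direction, subclass, and remote endpoint. The name `tally_frames` is made up for illustration, and the parsing assumes the two-line-per-frame layout shown in the paste.

```python
import re
from collections import Counter

# Debug lines copied from the comment above (trailing spaces trimmed).
DEBUG = """\
[Nov  3 08:55:36] Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE
[Nov  3 08:55:36]    Timestamp: 00011ms  SCall: 05886  DCall: 00000 [63.245.220.10:4569]
[Nov  3 08:55:37] Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE
[Nov  3 08:55:37]    Timestamp: 00011ms  SCall: 05886  DCall: 00000 [63.245.220.10:4569]
[Nov  3 08:55:39] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE
[Nov  3 08:55:39]    Timestamp: 00010ms  SCall: 13988  DCall: 00000 [63.245.220.10:38642]
[Nov  3 08:55:39] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: ACK
[Nov  3 08:55:39]    Timestamp: 00016ms  SCall: 13988  DCall: 00001 [63.245.220.10:38642]
"""

FRAME = re.compile(r"\] (?P<dir>Tx|Rx)-Frame Retry\[[^\]]*\].*Subclass: (?P<sub>\w+)")
ENDPOINT = re.compile(r"\[(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)\]")


def tally_frames(text):
    """Count frames keyed by (direction, subclass, 'ip:port')."""
    counts = Counter()
    pending = None  # (direction, subclass) from the most recent header line
    for line in text.splitlines():
        header = FRAME.search(line)
        if header:
            pending = (header["dir"], header["sub"])
            continue
        endpoint = ENDPOINT.search(line)
        if endpoint and pending:
            counts[(*pending, f"{endpoint['ip']}:{endpoint['port']}")] += 1
            pending = None
    return counts


for key, n in sorted(tally_frames(DEBUG).items()):
    print(n, *key)
```

On this excerpt the tally is two outgoing POKEs to 63.245.220.10:4569, and one incoming POKE plus one incoming ACK from 63.245.220.10:38642.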
And one more mystery for the win (from IRC):

[15:09] ravi: and it works, and essentially i did nothing on the srx side
[15:09] justdave: and in the end I've no clue what we did that finally made it work.
[15:10] justdave: because the config is basically the same as how it started out again
[15:10] ravi: yeah, so i'm not willing to say it is the srx any more than i am willing to say it is the *
[15:10] ravi: something is strange, but it works
[15:11] mrz: it broke 21 hours after ravi made a change
[15:11] mrz: and was working after the SRX upgrade
[15:11] ravi: we're all in the weeds enough that we should just let this be as-is, but keep a mental note about it for the future

Solution: just use a different IP for the MoCo MV Asterisk server.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard