Closed Bug 1136051 Opened 5 years ago Closed 5 years ago

signaling and ice unittests hit assertion

Categories

(Core :: WebRTC: Networking, defect, P2)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: drno, Assigned: drno)

References

Details

Attachments

(2 files, 2 obsolete files)

Attached file log.txt
When executing the signaling_unittests at home they hit this assertion:
https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c#401

when executing the Variants/SignalingTest.OfferAnswerNoTrickle/0 test.

This if condition
https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_component.c#1126
is true because the checks have not started yet, but the ice_state in fact is NR_ICE_MEDIA_STREAM_CHECKS_COMPLETED already.

Should we add CHECKS_COMPLETED to the if condition in in_component.c?

Or is the real problem that the caller first claims that its gathering state reached 'complete' and then later appears to be switching back to 'gathering'?

(ice/DEBUG) ICE(PC:1424761210350635): Read 64 bytes from IP4:73.202.152.46:37935/UDP
(stun/DEBUG) ICE(PC:1424761210350635): Message is STUN (other)[64]=0101002c2112a4421e2fde466a3cff09e4ba8c6b002000080001a56968d83c6c000800145a054cd371a0956711a48937b9b9ad0c882df096802800048abbe152
(stun/DEBUG) STUN-CLIENT(QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)): Received check response (my_addr=IP4:10.0.1.19:52016/UDP,peer_addr=IP4:73.202.152.46:37935/UDP)
(stun/DEBUG) STUN-CLIENT(QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)): Received [64]=0101002c2112a4421e2fde466a3cff09e4ba8c6b002000080001a56968d83c6c000800145a054cd371a0956711a48937b9b9ad0c882df096802800048abbe152
(stun/DEBUG) Parsing STUN message of 64 bytes
(stun/DEBUG) Parsed MsgType: BINDING-RESPONSE
(stun/DEBUG) Parsed Length: 44
(stun/DEBUG) Parsed Cookie: 2112a442
(stun/DEBUG) Parsed ID[12]=1e2fde466a3cff09e4ba8c6b
(stun/DEBUG) size = 44
(stun/DEBUG) Masked XOR-MAPPED-ADDRESS = IP4:104.216.60.108:42345/UDP
(stun/DEBUG) Unmasked XOR-MAPPED-ADDRESS = IP4:73.202.152.46:33915/UDP
(stun/DEBUG) Parsed XOR-MAPPED-ADDRESS: IP4:73.202.152.46:33915/UDP (unmasked) IP4:104.216.60.108:42345/UDP (masked)
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) size = 32
(stun/DEBUG) Computing MESSAGE-INTEGRITY
(stun/DEBUG) Computed MESSAGE-INTEGRITY [20]=5a054cd371a0956711a48937b9b9ad0c882df096
(stun/DEBUG) MESSAGE-INTEGRITY[20]=5a054cd371a0956711a48937b9b9ad0c882df096
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) size = 8
(stun/DEBUG) Computed FINGERPRINT 8abbe152
(stun/DEBUG) Parsed FINGERPRINT: 8abbe152
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) Starting to sanity check encoding
(stun/DEBUG) STUN-CLIENT(QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)): successfully received response; processing
(stun/DEBUG) STUN-CLIENT(QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)): Successfully parsed mode=10
(stun/DEBUG) STUN-CLIENT(QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)): Received mapped address: IP4:73.202.152.46:33915/UDP
(ice/DEBUG) ICE-PEER(PC:1424761210350635:default)/STREAM(1424761210350635 level=0)/CAND-PAIR(QfHC): STUN cb on pair addr = QfHC|IP4:10.0.1.19:52016/UDP|IP4:73.202.152.46:37935/UDP(host(IP4:10.0.1.19:52016/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)
(ice/INFO) ICE-PEER(PC:1424761210350635:default)/CAND-PAIR(zXSq): setting pair to state FROZEN: zXSq|IP4:73.202.152.46:33915/UDP|IP4:73.202.152.46:37935/UDP(srflx(IP4:10.0.1.19:52016/UDP|IP4:23.21.150.121:3478/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)
(ice/INFO) ICE(PC:1424761210350635)/CAND-PAIR(zXSq): Pairing candidate IP4:73.202.152.46:33915/UDP (64fbffff):IP4:73.202.152.46:37935/UDP (64fbffff) priority=7276691097017384959 (64fbffffc9f7ffff)
(registry/DEBUG) Fetched node 'ice.pref.type.peer_rflx' ('UCHAR')
(registry/DEBUG) Fetched node 'ice.pref.interface.en0' ('UCHAR')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_timeout' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_backoff_factor' ('double')
(registry/DEBUG) Fetched node 'stun.client.maximum_transmits' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.final_retransmit_backoff' ('UINT4')
(registry/DEBUG) Fetched node 'stun.allow_loopback' ('char')
(ice/INFO) ICE-PEER(PC:1424761210350635:default)/CAND-PAIR(zXSq): setting pair to state SUCCEEDED: zXSq|IP4:73.202.152.46:33915/UDP|IP4:73.202.152.46:37935/UDP(srflx(IP4:10.0.1.19:52016/UDP|IP4:23.21.150.121:3478/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530)
(ice/DEBUG) ICE-PEER(PC:1424761210350635:default)/CAND-PAIR(zXSq): Ensure that check timer is running for new pair zXSq|IP4:73.202.152.46:33915/UDP|IP4:73.202.152.46:37935/UDP(srflx(IP4:10.0.1.19:52016/UDP|IP4:23.21.150.121:3478/UDP)|candidate:3 1 UDP 1694236671 73.202.152.46 37935 typ srflx raddr 10.0.1.19 rport 58530).
Assertion failed: (0), function nr_ice_media_stream_start_checks, file /Users/nohlmeier/src/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c, line 401.
Process 31470 stopped
* thread #8: tid = 0x1bf24f, 0x00007fff8d2cd286 libsystem_kernel.dylib`__pthread_kill + 10, name = 'Socket Thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff8d2cd286 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8d2cd286:  jae    0x7fff8d2cd290            ; __pthread_kill + 20
   0x7fff8d2cd288:  movq   %rax, %rdi
   0x7fff8d2cd28b:  jmp    0x7fff8d2c8c53            ; cerror_nocancel
   0x7fff8d2cd290:  retq   
(lldb) bt
* thread #8: tid = 0x1bf24f, 0x00007fff8d2cd286 libsystem_kernel.dylib`__pthread_kill + 10, name = 'Socket Thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff8d2cd286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8cf4c42f libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff8bbe7b53 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff8bbafc39 libsystem_c.dylib`__assert_rtn + 321
    frame #4: 0x000000010124d550 signaling_unittests`nr_ice_media_stream_start_checks(pctx=0x0000000114a6a56c, stream=0x000000011abfc53c) + 64 at ice_media_stream.c:401
    frame #5: 0x00000001012485c3 signaling_unittests`nr_ice_component_insert_pair(pcomp=0x000000011abfc5ec, pair=0x000000011f3da74c) + 323 at ice_component.c:1128
    frame #6: 0x0000000101244cae signaling_unittests`nr_ice_candidate_pair_stun_cb(s=0x0000000000000000, how=0, cb_arg=0x000000011adf870c) + 1998 at ice_candidate_pair.c:282
    frame #7: 0x000000010125c866 signaling_unittests`nr_stun_client_process_response(ctx=0x000000011a4fb40c, msg=0x000000011610d860, len=64, peer_addr=0x000000011610d6e8) + 6006 at stun_client_ctx.c:742
    frame #8: 0x0000000101253875 signaling_unittests`nr_ice_socket_readable_cb(s=0x0000000114a8ca80, how=0, cb_arg=0x00000001186a28dc) + 1253 at ice_socket.c:114
    frame #9: 0x0000000100f55fe8 signaling_unittests`mozilla::NrSocketBase::fire_callback(this=0x0000000114a8ca80, how=0) + 184 at nr_socket_prsock.cpp:187
    frame #10: 0x0000000100f56596 signaling_unittests`mozilla::NrSocket::OnSocketReady(this=0x0000000114a8ca80, fd=0x00000001186cedc0, outflags=1) + 70 at nr_socket_prsock.cpp:197
    frame #11: 0x0000000100f5660a signaling_unittests`non-virtual thunk to mozilla::NrSocket::OnSocketReady(this=0x0000000114a8cb30, fd=0x00000001186cedc0, outflags=1) + 58 at nr_socket_prsock.cpp:200
    frame #12: 0x00000001032aafce XUL`nsSocketTransportService::DoPollIteration(this=0x0000000114a22260, wait=true) + 1262 at nsSocketTransportService2.cpp:882
    frame #13: 0x00000001032aa907 XUL`nsSocketTransportService::Run(this=0x0000000114a22260) + 279 at nsSocketTransportService2.cpp:732
    frame #14: 0x00000001032ab36c XUL`non-virtual thunk to nsSocketTransportService::Run(this=0x0000000114a22278) + 28 at nsSocketTransportService2.cpp:776
    frame #15: 0x00000001031075b8 XUL`nsThread::ProcessNextEvent(this=0x0000000114a8b4f0, aMayWait=false, aResult=0x000000011610fc4e) + 2088 at nsThread.cpp:855
    frame #16: 0x00000001031618c7 XUL`NS_ProcessNextEvent(aThread=0x0000000114a8b4f0, aMayWait=false) + 151 at nsThreadUtils.cpp:265
    frame #17: 0x000000010379c153 XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000114d18cc0, aDelegate=0x0000000114a74330) + 691 at MessagePump.cpp:339
    frame #18: 0x0000000103730115 XUL`MessageLoop::RunInternal(this=0x0000000114a74330) + 117 at message_loop.cc:233
    frame #19: 0x0000000103730025 XUL`MessageLoop::RunHandler(this=0x0000000114a74330) + 21 at message_loop.cc:226
    frame #20: 0x000000010372ffcd XUL`MessageLoop::Run(this=0x0000000114a74330) + 45 at message_loop.cc:200
    frame #21: 0x0000000103105a46 XUL`nsThread::ThreadFunc(aArg=0x0000000114a8b4f0) + 358 at nsThread.cpp:356
    frame #22: 0x0000000102d6d58f libnss3.dylib`_pt_root(arg=0x0000000114a1c2d0) + 463 at ptthread.c:212
    frame #23: 0x00007fff8cf4a268 libsystem_pthread.dylib`_pthread_body + 131
    frame #24: 0x00007fff8cf4a1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #25: 0x00007fff8cf4841d libsystem_pthread.dylib`thread_start + 13
Byron any ideas on my questions above?
Flags: needinfo?(docfaraday)
Flags: firefox-backlog+
Priority: -- → P2
How are we in NR_ICE_MEDIA_STREAM_CHECKS_COMPLETED without ever starting checks? The only place this transition occurs is when we have a nominated pair, which I don't see happening here.
Flags: needinfo?(docfaraday)
Sorry I forgot to update this ticket: I'm no longer able to reproduce the problem. So I'm closing it for now. I'll keep running the test regularly at home and re-open this if I re-encounter this problem.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Here we go again, this time in the ice unittests... same assertion:

[ RUN      ] IceConnectTest.TestConnectP2ThenP1
Candidate initialized: candidate:0 1 UDP 2126643455 10.0.1.19 55167 typ host
Candidate initialized: candidate:0 1 UDP 2126643455 10.0.1.19 62832 typ host
Candidate initialized: candidate:1 1 UDP 1690501119 73.202.152.46 40316 typ srflx raddr 10.0.1.19 rport 55167
Gathering complete for P1
CANDIDATES:
Stream P1
candidate:0 1 UDP 2126643455 10.0.1.19 55167 typ host
candidate:1 1 UDP 1690501119 73.202.152.46 40316 typ srflx raddr 10.0.1.19 rport 55167

Candidate initialized: candidate:1 1 UDP 1690501119 73.202.152.46 37330 typ srflx raddr 10.0.1.19 rport 62832
Gathering complete for P2
CANDIDATES:
Stream P2
candidate:0 1 UDP 2126643455 10.0.1.19 62832 typ host
candidate:1 1 UDP 1690501119 73.202.152.46 37330 typ srflx raddr 10.0.1.19 rport 62832

Returning candidate: candidate:0 1 UDP 2126643455 10.0.1.19 55167 typ host
Returning candidate: candidate:1 1 UDP 1690501119 73.202.152.46 40316 typ srflx raddr 10.0.1.19 rport 55167
Candidate: candidate:0 1 UDP 2126643455 10.0.1.19 55167 typ host
Candidate: candidate:1 1 UDP 1690501119 73.202.152.46 40316 typ srflx raddr 10.0.1.19 rport 55167
ICE checking P2
Returning candidate: candidate:0 1 UDP 2126643455 10.0.1.19 62832 typ host
Returning candidate: candidate:1 1 UDP 1690501119 73.202.152.46 37330 typ srflx raddr 10.0.1.19 rport 62832
Candidate: candidate:0 1 UDP 2126643455 10.0.1.19 62832 typ host
Candidate: candidate:1 1 UDP 1690501119 73.202.152.46 37330 typ srflx raddr 10.0.1.19 rport 62832
ICE checking P1
Stream ready P2:stream0 ct=1
Begin list of candidate pairs [

Local --> host 10.0.1.19:62832 codeword=dv0g
Remote --> host 10.0.1.19:55167 codeword=7o3U
state = 4 priority = 9133864093730734591 nominated = 1 selected = 1 codeword = c9j4

Local --> host 10.0.1.19:62832 codeword=dv0g
Remote --> srflx 73.202.152.46:40316 codeword=NvSu
state = 3 priority = 7260647024209691135 nominated = 0 selected = 0 codeword = V836

Local --> srflx 10.0.1.19:62832 codeword=O5a+
Remote --> srflx 73.202.152.46:40316 codeword=NvSu
state = 4 priority = 7260647023337406463 nominated = 0 selected = 0 codeword = q+Fx
]
ICE completed P2
Stream ready P1:stream0 ct=1
Begin list of candidate pairs [

Local --> host 10.0.1.19:55167 codeword=1ll9
Remote --> host 10.0.1.19:62832 codeword=wskn
state = 4 priority = 9133864093730734591 nominated = 1 selected = 1 codeword = hL4g

Local --> host 10.0.1.19:55167 codeword=1ll9
Remote --> srflx 73.202.152.46:37330 codeword=KUsB
state = 2 priority = 7260647024209691134 nominated = 0 selected = 0 codeword = 33cc
]
ICE completed P1
Assertion failed: (0), function nr_ice_media_stream_start_checks, file /Users/nohlmeier/src/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c, line 401.
Process 5494 stopped
* thread #8: tid = 0x12dce, 0x00007fff8dd43286 libsystem_kernel.dylib`__pthread_kill + 10, name = 'Socket Thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff8dd43286 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8dd43286:  jae    0x7fff8dd43290            ; __pthread_kill + 20
   0x7fff8dd43288:  movq   %rax, %rdi
   0x7fff8dd4328b:  jmp    0x7fff8dd3ec53            ; cerror_nocancel
   0x7fff8dd43290:  retq   
(lldb) bt
* thread #8: tid = 0x12dce, 0x00007fff8dd43286 libsystem_kernel.dylib`__pthread_kill + 10, name = 'Socket Thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff8dd43286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8737a42f libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff8c11bb53 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff8c0e3c39 libsystem_c.dylib`__assert_rtn + 321
    frame #4: 0x00000001000ca570 ice_unittest`nr_ice_media_stream_start_checks(pctx=0x00000001133c930c, stream=0x0000000116b6638c) + 64 at ice_media_stream.c:401
    frame #5: 0x00000001000c55e3 ice_unittest`nr_ice_component_insert_pair(pcomp=0x0000000116b6644c, pair=0x00000001133c9eec) + 323 at ice_component.c:1128
    frame #6: 0x00000001000c1cce ice_unittest`nr_ice_candidate_pair_stun_cb(s=0x0000000000000000, how=0, cb_arg=0x00000001133c9e4c) + 1998 at ice_candidate_pair.c:282
    frame #7: 0x00000001000d9886 ice_unittest`nr_stun_client_process_response(ctx=0x000000011332128c, msg=0x000000011470d830, len=64, peer_addr=0x000000011470d6b8) + 6006 at stun_client_ctx.c:742
    frame #8: 0x00000001000d0895 ice_unittest`nr_ice_socket_readable_cb(s=0x0000000116b42200, how=0, cb_arg=0x000000011303dc0c) + 1253 at ice_socket.c:114
    frame #9: 0x0000000100075bd8 ice_unittest`mozilla::NrSocketBase::fire_callback(this=0x0000000116b42200, how=0) + 184 at nr_socket_prsock.cpp:187
    frame #10: 0x0000000100076186 ice_unittest`mozilla::NrSocket::OnSocketReady(this=0x0000000116b42200, fd=0x00000001130e8350, outflags=1) + 70 at nr_socket_prsock.cpp:197
    frame #11: 0x00000001000761fa ice_unittest`non-virtual thunk to mozilla::NrSocket::OnSocketReady(this=0x0000000116b422b0, fd=0x00000001130e8350, outflags=1) + 58 at nr_socket_prsock.cpp:200
    frame #12: 0x0000000100dc833e XUL`nsSocketTransportService::DoPollIteration(this=0x00000001130188c0, wait=true) + 1262 at nsSocketTransportService2.cpp:920
    frame #13: 0x0000000100dc7ac0 XUL`nsSocketTransportService::Run(this=0x00000001130188c0) + 288 at nsSocketTransportService2.cpp:746
    frame #14: 0x0000000100dc86dc XUL`non-virtual thunk to nsSocketTransportService::Run(this=0x00000001130188d8) + 28 at nsSocketTransportService2.cpp:814
    frame #15: 0x0000000100c23a48 XUL`nsThread::ProcessNextEvent(this=0x0000000113027b00, aMayWait=false, aResult=0x000000011470fc4e) + 2088 at nsThread.cpp:855
    frame #16: 0x0000000100c7dd47 XUL`NS_ProcessNextEvent(aThread=0x0000000113027b00, aMayWait=false) + 151 at nsThreadUtils.cpp:265
    frame #17: 0x00000001012be003 XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000113308840, aDelegate=0x000000011305b700) + 691 at MessagePump.cpp:339
    frame #18: 0x0000000101251af5 XUL`MessageLoop::RunInternal(this=0x000000011305b700) + 117 at message_loop.cc:233
    frame #19: 0x0000000101251a05 XUL`MessageLoop::RunHandler(this=0x000000011305b700) + 21 at message_loop.cc:226
    frame #20: 0x00000001012519ad XUL`MessageLoop::Run(this=0x000000011305b700) + 45 at message_loop.cc:200
    frame #21: 0x0000000100c21ed6 XUL`nsThread::ThreadFunc(aArg=0x0000000113027b00) + 358 at nsThread.cpp:356
    frame #22: 0x000000010088955f libnss3.dylib`_pt_root(arg=0x0000000113018b40) + 463 at ptthread.c:212
    frame #23: 0x00007fff87378268 libsystem_pthread.dylib`_pthread_body + 131
    frame #24: 0x00007fff873781e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #25: 0x00007fff8737641d libsystem_pthread.dylib`thread_start + 13
(lldb)

What looks really bizarre to me is this part:

Local --> host 10.0.1.19:62832 codeword=dv0g
Remote --> srflx 73.202.152.46:40316 codeword=NvSu
state = 3 priority = 7260647024209691135 nominated = 0 selected = 0 codeword = V836

Local --> srflx 10.0.1.19:62832 codeword=O5a+
Remote --> srflx 73.202.152.46:40316 codeword=NvSu
state = 4 priority = 7260647023337406463 nominated = 0 selected = 0 codeword = q+Fx

The only difference between the two seems to be that the local once is a host candidate and in the second case server reflexive.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: signaling_unittests hit assertion → signaling and ice unittests hit assertion
Hmm but it "P1:stream0" which is trying to insert the srflx <-> srflx pair here in this case...
Does this basically mean we have nominated and select a pair already, because we successfully completed checks already even before we learned about any remote candidates through trickeling, but then receive more check request for srflx candidates?
Any idea Byron ^?
Flags: needinfo?(docfaraday)
This is really weird. The only way we can end up here is if pctx->checks_started is 0, since if it is not we would have to be in ACTIVE, and wouldn't hit the assert:

https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_component.c#1127

However, both streams have started checking, looking at the logging. That would mean that something must have restarted checking and set |checks_started| back to 0:

https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_peer_ctx.c#518

But this test case doesn't do that. My only guess is that somehow checking has started prematurely, then we call StartChecks, bail out early because we're completed already, then receive a check and hit the assert.
Flags: needinfo?(docfaraday)
An additional log line results in this:

(ice/NOTICE) ICE(P2): peer (P2:default) is now checking
ICE checking P2
(ice/DEBUG) ICE-PEER(P2:default)/CAND(QUzK): Pairing with peer candidate candidate:1 1 UDP 1690501119 73.202.152.46 36202 typ srflx raddr 10.0.1.19 rport 54048
(ice/INFO) ICE-PEER(P2:default)/CAND-PAIR(FcdO): setting pair to state FROZEN: FcdO|IP4:10.0.1.19:61268/UDP|IP4:73.202.152.46:36202/UDP(host(IP4:10.0.1.19:61268/UDP)|candidate:1 1 UDP 1690501119 73.202.152.46 36202 typ srflx raddr 10.0.1.19 rport 54048)
(ice/INFO) ICE(P2)/CAND-PAIR(FcdO): Pairing candidate IP4:10.0.1.19:61268/UDP (7ec200ff):IP4:73.202.152.46:36202/UDP (64c2ffff) priority=7260647024209691135 (64c2fffffd8401ff)
(registry/DEBUG) Fetched node 'ice.pref.type.peer_rflx' ('UCHAR')
(registry/DEBUG) Fetched node 'ice.pref.interface.en0' ('UCHAR')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_timeout' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_backoff_factor' ('double')
(registry/DEBUG) Fetched node 'stun.client.maximum_transmits' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.final_retransmit_backoff' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.allow_loopback' ('char')
(ice/DEBUG) ICE-PEER(P2:default)/CAND-PAIR(FcdO): Ensure that check timer is running for new pair FcdO|IP4:10.0.1.19:61268/UDP|IP4:73.202.152.46:36202/UDP(host(IP4:10.0.1.19:61268/UDP)|candidate:1 1 UDP 1690501119 73.202.152.46 36202 typ srflx raddr 10.0.1.19 rport 54048).
(ice/DEBUG) ICE-PEER(P2:default)/CAND(6F5v): Pairing local candidate srflx(IP4:10.0.1.19:61268/UDP|IP4:23.21.150.121:3478/UDP)
(stun/DEBUG) STUN-SERVER(server(IP4:10.0.1.19:0/UDP))/CLIENT(P2:default): Adding client for 45f8a870:cb5281c4
(ice/ERR) ICE(P2): peer (P2:default) setting checks_started to zero

The two call stacks for this are:

  * frame #0: 0x00000001000cfaf1 ice_unittest`nr_ice_peer_ctx_stream_started_checks(pctx=0x0000000113cc994c, stream=0x000000011675ed0c) + 49 at ice_peer_ctx.c:606
    frame #1: 0x00000001000ca5c1 ice_unittest`nr_ice_media_stream_start_checks(pctx=0x0000000113cc994c, stream=0x000000011675ed0c) + 241 at ice_media_stream.c:413
    frame #2: 0x00000001000c5583 ice_unittest`nr_ice_component_insert_pair(pcomp=0x000000011675edcc, pair=0x0000000113cca5cc) + 323 at ice_component.c:1128
    frame #3: 0x00000001000c53d4 ice_unittest`nr_ice_component_pair_candidate(pctx=0x0000000113cc994c, pcomp=0x000000011675edcc, lcand=0x0000000111f3b38c, pair_all_remote=0) + 532 at ice_component.c:825
    frame #4: 0x00000001000c569a ice_unittest`nr_ice_component_pair_candidates(pctx=0x0000000113cc994c, lcomp=0x000000011675e4cc, pcomp=0x000000011675edcc) + 122 at ice_component.c:850
    frame #5: 0x00000001000ca18b ice_unittest`nr_ice_media_stream_pair_candidates(pctx=0x0000000113cc994c, lstream=0x000000011675e40c, pstream=0x000000011675ed0c) + 107 at ice_media_stream.c:292
    frame #6: 0x00000001000cef65 ice_unittest`nr_ice_peer_ctx_pair_candidates(pctx=0x0000000113cc994c) + 373 at ice_peer_ctx.c:394
    frame #7: 0x0000000100089ba9 ice_unittest`mozilla::NrIceCtx::StartChecks(this=0x000000011675e040) + 41 at nricectx.cpp:745

AND

  * frame #0: 0x00000001000cf4ca ice_unittest`nr_ice_peer_ctx_start_checks2(pctx=0x0000000113dc93ac, allow_non_first=1) + 106 at ice_peer_ctx.c:519
    frame #1: 0x0000000100089cba ice_unittest`mozilla::NrIceCtx::StartChecks(this=0x000000011645e100) + 314 at nricectx.cpp:753

So basically calling nr_ice_peer_ctx_pair_candidates() in StartChecks() sets checks_started to 1, but it gets immediately reset to zero in nr_ice_peer_ctx_start_checks2() which gets called in StartChecks() right after.

During my unit test execution checks_started gets set to 1 again for P2 at some point (I'm guessing a peer reflexive adds another pair), but for P1 that never happens, so checks_started is stuck at 0.
I think the code which re-sets the checks_started in nr_ice_peer_ctx_start_checks2() for re-negotiation is causing the problem here. It just sets it hard to zero to handle new streams, but never bothers to set it (back) to one even if checks have been started.
Attachment #8574968 - Flags: feedback?(docfaraday)
My patch seems to fix the problem I have with ice_unittest and signaling_unittest. On top it also fixes the problems I had on my machine when trying to run these two tests with the ICE TCP patches.

But the strange thing is that even though the same resetting of the checks_started flag is present in the aurora code I don't get these problems on the aurora branch. Byron have we landed any ICE related changes in Nightly which could change the timing related to this?
Flags: needinfo?(docfaraday)
I am not aware of any changes since the landing of renegotiation.
Flags: needinfo?(docfaraday)
Comment on attachment 8574968 [details] [diff] [review]
bug_1136051_fix_ice_checks_started.patch

Review of attachment 8574968 [details] [diff] [review]:
-----------------------------------------------------------------

So, nr_ice_peer_ctx_pair_candidates should not be starting checks by itself. I think we can fix this premature transition by checking that the stream is not in NR_ICE_MEDIA_STREAM_UNPAIRED here: https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_component.c?from=nr_ice_component_insert_pair&case=true#1127

Maybe the check would look like:

if (ice_state == ACTIVE ||
    (ice_state == FROZEN && !checks_started))

::: media/mtransport/third_party/nICEr/src/ice/ice_peer_ctx.c
@@ +514,5 @@
>      /* Might have added some streams */
>      pctx->reported_done = 0;
>      NR_async_timer_cancel(pctx->done_cb_timer);
>      pctx->done_cb_timer = 0;
> +    r_log(LOG_ICE,LOG_ERR,"ICE(%s): peer (%s) setting checks_started to zero",pctx->ctx->label,pctx->label);

This should be at DEBUG, or maybe INFO if phrased as "Starting checks"

@@ +594,5 @@
>        r_log(LOG_ICE,LOG_NOTICE,"ICE(%s): peer (%s) no checks to start",pctx->ctx->label,pctx->label);
>        ABORT(R_NOT_FOUND);
>      }
> +    else {
> +      pctx->checks_started = 1;

I think this will prevent us from transitioning to checking later when we start sending checks.
Attachment #8574968 - Flags: feedback?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #13)
> Comment on attachment 8574968 [details] [diff] [review]
> bug_1136051_fix_ice_checks_started.patch
> 
> Review of attachment 8574968 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So, nr_ice_peer_ctx_pair_candidates should not be starting checks by itself.
> I think we can fix this premature transition by checking that the stream is
> not in NR_ICE_MEDIA_STREAM_UNPAIRED here:
> https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/
> nICEr/src/ice/ice_component.
> c?from=nr_ice_component_insert_pair&case=true#1127
> 
> Maybe the check would look like:
> 
> if (ice_state == ACTIVE ||
>     (ice_state == FROZEN && !checks_started))

Quick run of the ice and signaling unittest seem to verify that this fixes the problem as well.

I'll compare the logs to check how this changes the behavior.
The end of the previously failing ice unit test now looks like this:

(ice/DEBUG) ICE(P1): Read 64 bytes from IP4:73.202.152.46:34164/UDP
(stun/DEBUG) ICE(P1): Message is STUN (other)[64]=0101002c2112a4424129dee497c315257644fbad002000080001aa9f68d83c6c00080014c7a7245b2d2c91af22dc6ee0d578b64f8de2df2780280004dfda4206
(stun/DEBUG) STUN-CLIENT(+t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)): Received check response (my_addr=IP4:10.0.1.19:54245/UDP,peer_addr=IP4:73.202.152.46:34164/UDP)
(stun/DEBUG) STUN-CLIENT(+t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)): Received [64]=0101002c2112a4424129dee497c315257644fbad002000080001aa9f68d83c6c00080014c7a7245b2d2c91af22dc6ee0d578b64f8de2df2780280004dfda4206
(stun/DEBUG) Parsing STUN message of 64 bytes
(stun/DEBUG) Parsed MsgType: BINDING-RESPONSE
(stun/DEBUG) Parsed Length: 44
(stun/DEBUG) Parsed Cookie: 2112a442
(stun/DEBUG) Parsed ID[12]=4129dee497c315257644fbad
(stun/DEBUG) size = 44
(stun/DEBUG) Masked XOR-MAPPED-ADDRESS = IP4:104.216.60.108:43679/UDP
(stun/DEBUG) Unmasked XOR-MAPPED-ADDRESS = IP4:73.202.152.46:35725/UDP
(stun/DEBUG) Parsed XOR-MAPPED-ADDRESS: IP4:73.202.152.46:35725/UDP (unmasked) IP4:104.216.60.108:43679/UDP (masked)
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) size = 32
(stun/DEBUG) Computing MESSAGE-INTEGRITY
(stun/DEBUG) Computed MESSAGE-INTEGRITY [20]=c7a7245b2d2c91af22dc6ee0d578b64f8de2df27
(stun/DEBUG) MESSAGE-INTEGRITY[20]=c7a7245b2d2c91af22dc6ee0d578b64f8de2df27
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) size = 8
(stun/DEBUG) Computed FINGERPRINT dfda4206
(stun/DEBUG) Parsed FINGERPRINT: dfda4206
(stun/DEBUG) Before pedantic attr_info checks
(stun/DEBUG) After pedantic attr_info checks
(stun/DEBUG) Starting to sanity check encoding
(stun/DEBUG) STUN-CLIENT(+t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)): successfully received response; processing
(stun/DEBUG) STUN-CLIENT(+t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)): Successfully parsed mode=10
(stun/DEBUG) STUN-CLIENT(+t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)): Received mapped address: IP4:73.202.152.46:35725/UDP
(ice/DEBUG) ICE-PEER(P1:default)/STREAM(P1:stream0)/CAND-PAIR(+t/o): STUN cb on pair addr = +t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)
(ice/INFO) ICE-PEER(P1:default)/CAND-PAIR(6QEo): setting pair to state FROZEN: 6QEo|IP4:73.202.152.46:35725/UDP|IP4:73.202.152.46:34164/UDP(srflx(IP4:10.0.1.19:54245/UDP|IP4:23.21.150.121:3478/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)
(ice/INFO) ICE(P1)/CAND-PAIR(6QEo): Pairing candidate IP4:73.202.152.46:35725/UDP (64e0ffff):IP4:73.202.152.46:34164/UDP (64e0ffff) priority=7269091272642658303 (64e0ffffc9c1ffff)
(registry/DEBUG) Fetched node 'ice.pref.type.peer_rflx' ('UCHAR')
(registry/DEBUG) Fetched node 'ice.pref.interface.en0' ('UCHAR')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_timeout' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.retransmission_backoff_factor' ('double')
(registry/DEBUG) Fetched node 'stun.client.maximum_transmits' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.client.final_retransmit_backoff' ('UINT4')
(registry/DEBUG) Couldn't fetch node 'stun.allow_loopback' ('char')
(ice/INFO) ICE-PEER(P1:default)/CAND-PAIR(6QEo): setting pair to state SUCCEEDED: 6QEo|IP4:73.202.152.46:35725/UDP|IP4:73.202.152.46:34164/UDP(srflx(IP4:10.0.1.19:54245/UDP|IP4:23.21.150.121:3478/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)
(ice/DEBUG) ICE-PEER(P1:default)/CAND-PAIR(6QEo): Ensure that check timer is running for new pair 6QEo|IP4:73.202.152.46:35725/UDP|IP4:73.202.152.46:34164/UDP(srflx(IP4:10.0.1.19:54245/UDP|IP4:23.21.150.121:3478/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930).
(ice/INFO) ICE-PEER(P1:default)/CAND-PAIR(+t/o): setting pair to state FAILED: +t/o|IP4:10.0.1.19:54245/UDP|IP4:73.202.152.46:34164/UDP(host(IP4:10.0.1.19:54245/UDP)|candidate:1 1 UDP 1692467199 73.202.152.46 34164 typ srflx raddr 10.0.1.19 rport 56930)
(generic/DEBUG) Timer callback fired (set in nr_ice_peer_ctx_destroy:484)
(generic/DEBUG) Timer callback fired (set in nr_ice_ctx_destroy:454)
(generic/DEBUG) Timer callback fired (set in nr_ice_peer_ctx_destroy:484)
(generic/DEBUG) Timer callback fired (set in nr_ice_ctx_destroy:454)
(registry/DEBUG) raising event 'delete' on 'stun'
(registry/DEBUG) No callbacks found on 'stun'
(registry/DEBUG) No callbacks found on ''
(registry/INFO) delete of 'stun' succeeded
(registry/DEBUG) raising event 'delete' on 'ice'
(registry/DEBUG) No callbacks found on 'ice'
(registry/DEBUG) No callbacks found on ''
(registry/INFO) delete of 'ice' succeeded
[       OK ] IceConnectTest.TestConnectP2ThenP1 (3121 ms)

Not sure if it is okay that the new pair 6QEo get set to succeeded. But I guess that even though another pair previously got nominated and select we now know about another potential way to connect. Does this look sane to you Byron?
Flags: needinfo?(docfaraday)
I would expect the new pairs to be set to succeeded with no ill effect.
Flags: needinfo?(docfaraday)
Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f503e28ff39e
Attachment #8574968 - Attachment is obsolete: true
Attachment #8575702 - Flags: review?(docfaraday)
Blocks: 891551
Comment on attachment 8575702 [details] [diff] [review]
bug_1136051_fix_ice_checks_started.patch

Patch doesn't seem to contain anything...
Flags: needinfo?(drno)
Attachment #8575702 - Flags: review?(docfaraday)
Assignee: nobody → drno
See Also: → 1142156
No idea what happened last time: empty patch + a treeherder link to a different patch

New try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f142d68692eb
Attachment #8575702 - Attachment is obsolete: true
Flags: needinfo?(drno)
Attachment #8576179 - Flags: review?(docfaraday)
Attachment #8576179 - Flags: review?(docfaraday) → review+
Test re-run's turned green.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/14878e83ce63
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.