Closed Bug 1001959 Opened 10 years ago Closed 10 years ago

Intermittent test_dataChannel_basicDataOnly.html | application crashed [@ mozalloc_abort(char const * const)] after "ABORT: CRT ASSERT c:\PROGRA~2\MICROS~2.0\vc\include\list(207) : Assertion failed: list iterator not dereferencable"

Categories

(Core :: WebRTC: Networking, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32
Tracking Status
firefox30 --- unaffected
firefox31 --- fixed
firefox32 --- fixed
firefox-esr24 --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed

People

(Reporter: RyanVM, Assigned: bwc)

References

Details

(4 keywords)

Attachments

(1 file, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=38562279&tree=Mozilla-Inbound

Windows XP 32-bit mozilla-inbound debug test mochitest-3 on 2014-04-26 13:16:28 PDT for push a632ecfff394
slave: t-xp32-ix-070

12:19:17     INFO -  49 INFO TEST-INFO | MEMORY STAT vsize after test: 570236928
12:19:17     INFO -  0[155e3f18]: [GSM Task|tnp] ui.c:104: SIPCC-UI_API: 1/14, ui_call_state: event=1
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 1/14, fsm_change_state: DEF: STABLE -> CLOSED
12:19:17     INFO -  0[155e38b0]: [CCAPP Task|def] ccprovider.c:1613: SIPCC-SIP_CC_PROV: ccappUpdateSessionData: Call_STATE:. state=1, attr=0, cause=5, instance=0
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 14 : DCSM_READY --> DCSM_READY
12:19:17     INFO -  0[155e38b0]: [CCAPP Task|CC_SIPCCService] CC_SIPCCService.cpp:724: onCallEvent(CCAPI_CALL_EV_STATE, 1000E, [ONHOOK|]
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsmdef.c:1293: SIPCC-FSM: 0/0, fsmdef_init_dcb: call_not_counted_in_mnc_bt = FALSE
12:19:17     INFO -  0[155e38b0]: [CCAPP Task|def] ccapi_call.c:57: SIPCC-SIP_CC_PROV: CCAPI_Call_releaseCallInfo: ref=0AD41E50: count=2
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 0/14, fsm_change_state: DEF: CLOSED -> IDLE
12:19:17     INFO -  0[155e38b0]: [CCAPP Task|def] ccapi_call.c:57: SIPCC-SIP_CC_PROV: CCAPI_Call_releaseCallInfo: ref=0F133308: count=1
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] dcsm.c:397: SIPCC-DCSM: dcsm_update_gsm_state: 14 : DCSM_READY --> DCSM_READY
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/14, sm_process_event: DEF   :(STABLE:ONHOOK )
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 14  : fsm_get_fcb_by_call_id_and_type    : fcb= 0E73FFF8
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 14  : fsm_get_fcb_by_call_id_and_type    : fcb= 00000000
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 0/14, fsm_change_state: CNF: IDLE -> IDLE
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] dcsm.c:337: SIPCC-DCSM: dcsm_update_gsm_state: 14: Not handling for CNF
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 14  : fsm_get_fcb_by_call_id_and_type    : fcb= 0E740018
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 0/14, fsm_change_state: B2BCNF: IDLE -> IDLE
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] dcsm.c:337: SIPCC-DCSM: dcsm_update_gsm_state: 14: Not handling for B2BCNF
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 14  : fsm_get_fcb_by_call_id_and_type    : fcb= 0E740038
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsmxfr.c:501: SIPCC-FSM: fsmxfr_cleanup: Entered.
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsmxfr.c:451: SIPCC-FSM: fsmxfr_remove_fcb: Entered.
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] fsm.c:346: SIPCC-FSM: 0/14, fsm_change_state: XFR: IDLE -> IDLE
12:19:17     INFO -  0[155e3f18]: [GSM Task|def] dcsm.c:337: SIPCC-DCSM: dcsm_update_gsm_state: 14: Not handling for XFR
12:19:17     INFO -  0[155e3f18]: [GSM Task|fsm_sm] fsm.c:157: SIPCC-GSM_DBG_PTR: FSM 14  : fsm_get_fcb_by_call_id_and_type    : fcb= 00000000
12:19:17     INFO -  50 INFO TEST-INFO | MEMORY STAT vsizeMaxContiguous after test: 667680768
12:19:17     INFO -  51 INFO TEST-INFO | MEMORY STAT residentFast after test: 309776384
12:19:17     INFO -  0[28b100]: Deleting DataChannelConnection 1480d318
12:19:17     INFO -  0[28b100]: [main|def] ccapi_call.c:57: SIPCC-SIP_CC_PROV: CCAPI_Call_releaseCallInfo: ref=13DDBC58: count=1
12:19:17     INFO -  ###!!! ABORT: CRT ASSERT c:\PROGRA~2\MICROS~2.0\vc\include\list(207) : Assertion failed: list iterator not dereferencable
12:19:17     INFO -  Hit MOZ_CRASH() at c:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\memory\mozalloc\mozalloc_abort.cpp:30
12:19:17     INFO -  [Parent 1616] ###!!! ABORT: Tear-off objects remain in hashtable at shutdown.: '!mTable', file c:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\content\svg\content\src\nsSVGAttrTearoffTable.h, line 28
12:19:17     INFO -  TEST-INFO | Main app process: exit status 80000003
12:19:17  WARNING -  TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_dataChannel_basicDataOnly.html | application terminated with exit code 2147483651
12:19:17     INFO -  INFO | runtests.py | Application ran for: 0:00:41.494000
12:19:17     INFO -  INFO | zombiecheck | Reading PID log: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpuiqgyfpidlog
12:19:17     INFO -  ==> process 1616 launched child process 656 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=1616.14c19660.270030511 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 1616 "\\.\pipe\gecko-crash-server-pipe.1616" tab)
12:19:17     INFO -  ==> process 1616 launched child process 3128 ("C:\slave\test\build\application\firefox\plugin-container.exe" --channel=1616.e7a6960.1170175798 -greomni "C:\slave\test\build\application\firefox\omni.ja" -appomni "C:\slave\test\build\application\firefox\browser\omni.ja" -appdir "C:\slave\test\build\application\firefox\browser" - 1616 "\\.\pipe\gecko-crash-server-pipe.1616" tab)
12:19:25  WARNING -  PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_dataChannel_basicDataOnly.html | application crashed [@ mozalloc_abort(char const * const)]
12:19:25     INFO -  Crash dump filename: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpp8fknd\minidumps\933d8780-7731-49e0-88dd-d13ec23bd272.dmp
12:19:25     INFO -  Operating system: Windows NT
12:19:25     INFO -                    5.1.2600 Service Pack 3
12:19:25     INFO -  CPU: x86
12:19:25     INFO -       GenuineIntel family 6 model 30 stepping 5
12:19:25     INFO -       8 CPUs
12:19:25     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
12:19:25     INFO -  Crash address: 0x13b13d0
12:19:25     INFO -  Thread 0 (crashed)
12:19:25     INFO -   0  mozalloc.dll!mozalloc_abort(char const * const) [mozalloc_abort.cpp:a632ecfff394 : 30 + 0x2c]
12:19:25     INFO -      eip = 0x013b13d0   esp = 0x00127614   ebp = 0x0012761c   ebx = 0x0012f784
12:19:25     INFO -      esi = 0x10261440   edi = 0x10260670   eax = 0x00000000   ecx = 0x19c9528e
12:19:25     INFO -      edx = 0x10361f48   efl = 0x00000216
12:19:25     INFO -      Found by: given as instruction pointer in context
12:19:25     INFO -   1  xul.dll!MSCRTReportHook [nsAppRunner.cpp:a632ecfff394 : 1462 + 0x8]
12:19:25     INFO -      eip = 0x032293e8   esp = 0x00127624   ebp = 0x00127628
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -   2  MSVCR100D.dll + 0x11babc
12:19:25     INFO -      eip = 0x1031babd   esp = 0x00127630   ebp = 0x0012f6c0
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -   3  MSVCR100D.dll + 0x114661
12:19:25     INFO -      eip = 0x10314662   esp = 0x0012f6c8   ebp = 0x0012f6e0
12:19:25     INFO -      Found by: previous frame's frame pointer
12:19:25     INFO -   4  MSVCR100D.dll + 0x11461a
12:19:25     INFO -      eip = 0x1031461b   esp = 0x0012f6e8   ebp = 0x0012f708
12:19:25     INFO -      Found by: previous frame's frame pointer
12:19:25     INFO -   5  MSVCP100D.dll + 0x4991a
12:19:25     INFO -      eip = 0x104c991b   esp = 0x0012f710   ebp = 0x0012f724
12:19:25     INFO -      Found by: previous frame's frame pointer
12:19:25     INFO -   6  xul.dll!std::_List_const_iterator<std::_List_val<sigslot::_connection_base1<mozilla::NrIceMediaStream *,sigslot::single_threaded> *,std::allocator<sigslot::_connection_base1<mozilla::NrIceMediaStream *,sigslot::single_threaded> *> > >::operator*() [list : 207 + 0x15]
12:19:25     INFO -      eip = 0x02056d17   esp = 0x0012f72c   ebp = 0x0012f794
12:19:25     INFO -      Found by: previous frame's frame pointer
12:19:25     INFO -   7  xul.dll!sigslot::_signal_base1<mozilla::NrIceMediaStream *,sigslot::single_threaded>::disconnect_all() [sigslot.h:a632ecfff394 : 736 + 0x7]
12:19:25     INFO -      eip = 0x0205b5d2   esp = 0x0012f748   ebp = 0x0012f794
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -   8  xul.dll!sigslot::_signal_base1<mozilla::NrIceMediaStream *,sigslot::single_threaded>::~_signal_base1<mozilla::NrIceMediaStream *,sigslot::single_threaded>() [sigslot.h:a632ecfff394 : 716 + 0x4]
12:19:25     INFO -      eip = 0x0205b8d5   esp = 0x0012f79c   ebp = 0x0012f7b8
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -   9  xul.dll!mozilla::NrIceMediaStream::~NrIceMediaStream() [nricemediastream.cpp:a632ecfff394 : 187 + 0x2e]
12:19:25     INFO -      eip = 0x0205bb53   esp = 0x0012f7a4   ebp = 0x0012f7b8
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -  10  xul.dll!mozilla::NrIceMediaStream::Release() [nricemediastream.h:a632ecfff394 : 197 + 0x65]
12:19:25     INFO -      eip = 0x020022df   esp = 0x0012f7b0   ebp = 0x0012f7b8
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -  11  xul.dll!mozilla::RefPtr<mozilla::NrIceMediaStream>::`scalar deleting destructor'(unsigned int) + 0x11
12:19:25     INFO -      eip = 0x02021315   esp = 0x0012f7c0   ebp = 0x0012f7c8
12:19:25     INFO -      Found by: call frame info
We're in ~NrIceMediaStream() (on MainThread), and have called disconnect_all(), which apparently hit the "list iterator not dereferencable" assertion.

The interesting part of the stack is probably the StoreLongTermICEStatisticsImpl_m()/~RTCStatsQuery().  Perhaps the lifetime of the NrIceMediaStream changed due to this and broke an unguarded assumption (raced against teardown elsewhere?)

12:19:25     INFO -  15  xul.dll!sipcc::RTCStatsQuery::~RTCStatsQuery() [PeerConnectionImpl.cpp:a632ecfff394 : 157 + 0xa]
12:19:25     INFO -      eip = 0x020294de   esp = 0x0012f810   ebp = 0x0012f9a4
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -  16  xul.dll!nsAutoPtr<sipcc::RTCStatsQuery>::~nsAutoPtr<sipcc::RTCStatsQuery>() [nsAutoPtr.h:a632ecfff394 : 72 + 0xc]
12:19:25     INFO -      eip = 0x02029a9e   esp = 0x0012f81c   ebp = 0x0012f9a4
12:19:25     INFO -      Found by: call frame info
12:19:25     INFO -  17  xul.dll!mozilla::dom::StoreLongTermICEStatisticsImpl_m [WebrtcGlobalInformation.cpp:a632ecfff394 : 228 + 0x7]
12:19:25     INFO -      eip = 0x020353de   esp = 0x0012f824   ebp = 0x0012f9a4
Assignee: nobody → docfaraday
So, if this is a race, it isn't a new one. We've just made it happen more reliably. I can give it a look.
Perhaps a weakPtr to NrIceMediaStream would be least disruptive to shutdown.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #3)
> Perhaps a weakPtr to NrIceMediaStream would be least disruptive to shutdown.

Nah, I doubt it, since if we acquire a RefPtr from it we'll be back in the same situation. My suspicion is that we need to null out the RefPtr on STS, like we do for NrIceCtx; we also need to put a thread assert in the d'tor for NrIceMediaStream so it is clear what thread it must be destroyed on.
(or, refactor so that everything that can be put in a RefPtr can also be destroyed on any thread, through use of private impl pattern)
Blocks: 1002236
Work in progress, needs testing.
The thread assert in ~NrIceMediaStream wasn't really safe. This should be good enough I guess.
Attachment #8413803 - Attachment is obsolete: true
Comment on attachment 8413854 [details] [diff] [review]
Give up references to NrIceMediaStream on STS instead of main.

Review of attachment 8413854 [details] [diff] [review]:
-----------------------------------------------------------------

Also added a wrap around some functions that are only used in the MOZILLA_INTERNAL_API case, to silence a warning.
Attachment #8413854 - Flags: review?(jib)
Comment on attachment 8413854 [details] [diff] [review]
Give up references to NrIceMediaStream on STS instead of main.

Review of attachment 8413854 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +1900,2 @@
>  static bool isDone(PCImplIceConnectionState state) {
>    return state != PCImplIceConnectionState::Checking &&

or return state > PCImplIceConnectionState::Checking;

@@ +1900,3 @@
>  static bool isDone(PCImplIceConnectionState state) {
>    return state != PCImplIceConnectionState::Checking &&
>           state != PCImplIceConnectionState::New;

How come we're still using PCImplIceConnectionState over RTCIceConnectionState?
Attachment #8413854 - Flags: review?(jib) → review+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #10)
> Comment on attachment 8413854 [details] [diff] [review]
> Give up references to NrIceMediaStream on STS instead of main.
> 
> Review of attachment 8413854 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm
> 
> @@ +1900,3 @@
> >  static bool isDone(PCImplIceConnectionState state) {
> >    return state != PCImplIceConnectionState::Checking &&
> >           state != PCImplIceConnectionState::New;
> 
> How come we're still using PCImplIceConnectionState over
> RTCIceConnectionState?

If this is MOZILLA_INTERNAL_API only, I guess we could do that, but it is a little extra conversion code.
Needinfo self to check back on try push.
Flags: needinfo?(docfaraday)
Try push looks ok, requesting checkin.
Flags: needinfo?(docfaraday)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7a1905545937
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
We should probably uplift this fix. Making sure that everything applies cleanly now.
Comment on attachment 8413854 [details] [diff] [review]
Give up references to NrIceMediaStream on STS instead of main.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 

   906990 (although 970690 made the race much more common)

User impact if declined: 

   Rare crashes on webrtc call end.

Testing completed (on m-c, etc.): 

   The usual CI testing on m-c, plus some by-hand testing to ensure that the threading is right.

Risk to taking this patch (and alternatives if risky): 

   Pretty low.

String or IDL/UUID changes made by this patch:

   None.
Attachment #8413854 - Flags: approval-mozilla-aurora?
Attachment #8413854 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.