Closed Bug 1013238 Opened 6 years ago Closed 6 years ago

Intermittent Shutdown | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) ("Assertion failure: PeerConnectionCtx::isActive(), at .../PeerConnectionCtx.cpp:306")

Categories

(Core :: WebRTC, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32
Tracking Status
firefox30 --- unaffected
firefox31 --- unaffected
firefox32 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: cbook, Assigned: jib)

References

()

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(1 file, 2 obsolete files)

WINNT 6.2 mozilla-inbound debug test mochitest-3 on 2014-05-20 02:49:21 PDT for push 162d32ed3b9c

slave: t-w864-ix-102

https://tbpl.mozilla.org/php/getParsedLog.php?id=40001100&tree=Mozilla-Inbound

Assertion failure: PeerConnectionCtx::isActive(), at c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp:306
TEST-UNEXPECTED-FAIL | Shutdown | application terminated with exit code 2147483651
PROCESS-CRASH | Shutdown | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *)]

03:23:22     INFO -  Operating system: Windows NT
03:23:22     INFO -                    6.2.9200
03:23:22     INFO -  CPU: x86
03:23:22     INFO -       GenuineIntel family 6 model 30 stepping 5
03:23:22     INFO -       8 CPUs
03:23:22     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
03:23:22     INFO -  Crash address: 0x7133c144
03:23:22     INFO -  Thread 0 (crashed)
03:23:22     INFO -   0  xul.dll!sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) [PeerConnectionCtx.cpp:162d32ed3b9c : 305 + 0x20]
03:23:22     INFO -      eip = 0x7133c144   esp = 0x00f5e028   ebp = 0x00f5e064   ebx = 0x094792a0
03:23:22     INFO -      esi = 0x00000000   edi = 0x00f5e098   eax = 0x00000000   ecx = 0xcd49d183
03:23:22     INFO -      edx = 0x7384e4d8   efl = 0x00000216
03:23:22     INFO -      Found by: given as instruction pointer in context
03:23:22     INFO -   1  xul.dll!nsTimerImpl::Fire() [nsTimerImpl.cpp:162d32ed3b9c : 555 + 0x5]
03:23:22     INFO -      eip = 0x70d9d720   esp = 0x00f5e06c   ebp = 0x00f5e0ec
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   2  xul.dll!nsTimerEvent::Run() [nsTimerImpl.cpp:162d32ed3b9c : 639 + 0xe]
03:23:22     INFO -      eip = 0x70d9d883   esp = 0x00f5e0f4   ebp = 0x00f5e11c
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   3  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:162d32ed3b9c : 715 + 0xd]
03:23:22     INFO -      eip = 0x70da074f   esp = 0x00f5e124   ebp = 0x00f5e180
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   4  xul.dll!NS_InvokeByIndex [xptcinvoke.cpp:162d32ed3b9c : 70 + 0x2]
03:23:22     INFO -      eip = 0x70da3f2f   esp = 0x00f5e188   ebp = 0x00f5e19c
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   5  xul.dll!CallMethodHelper::Invoke() [XPCWrappedNative.cpp:162d32ed3b9c : 2397 + 0xe]
03:23:22     INFO -      eip = 0x71aa397b   esp = 0x00f5e1a4   ebp = 0x00f5e1c8
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   6  xul.dll!CallMethodHelper::Call() [XPCWrappedNative.cpp:162d32ed3b9c : 1738 + 0x6]
03:23:22     INFO -      eip = 0x71aaf6e3   esp = 0x00f5e1b8   ebp = 0x00f5e1c8
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   7  xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:162d32ed3b9c : 1705 + 0x11]
03:23:22     INFO -      eip = 0x71ab6289   esp = 0x00f5e1d0   ebp = 0x00f5e2a0
03:23:22     INFO -      Found by: call frame info
03:23:22     INFO -   8  mozjs.dll!js::ExclusiveContext::leaveCompartment(JSCompartment *) [jscntxtinlines.h:162d32ed3b9c : 426 + 0x8]
03:23:22     INFO -      eip = 0x7051709a   esp = 0x00f5e1d8   ebp = 0x00f5e2a0
03:23:22     INFO -      Found by: stack scanning
03:23:22     INFO -   9  xul.dll!XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [XPCWrappedNativeJSOps.cpp:162d32ed3b9c : 1273 + 0xc]
https://tbpl.mozilla.org/php/getParsedLog.php?id=40019615&tree=Mozilla-Inbound
Keywords: crash
Summary: Intermittent Shutdown | application terminated with exit code 2147483651 | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) | Assertion failure: PeerConnectionCtx::isActive(), at PeerConnectionCtx.cpp:306 → Intermittent Shutdown | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) | Assertion failure: PeerConnectionCtx::isActive(), at PeerConnectionCtx.cpp:306
I'm going to go out on a limb that this is related to bug 941751.
Very likely a recent regression instead of the older timer-killed-on-wrong-thread bug  Pretty high frequency, so regression range should be fairly small.
About 6 pushes before jib's two patches for webrtc telemetry landed, so that's almost certainly the cause.
Assignee: nobody → jib
Depends on: 970685
Change assert to conditional for timer event stragglers on shutdown.
Attachment #8426225 - Flags: review?(rjesup)
Comment on attachment 8426225 [details] [diff] [review]
Fix timer event crash on shutdown in recent PeerConnectionCtx change

Review of attachment 8426225 [details] [diff] [review]:
-----------------------------------------------------------------

r+, but I want to make sure we're not just wallpapering a real problem.  It wasn't clear from the comment in the bug if this is simply a now-expected-known-safe-case, or just quieting it.
Attachment #8426225 - Flags: review?(rjesup) → review+
The original patch had this check so I feel good re-adding it. We somehow convinced ourselves the test was not needed (see Bug 970685 comment 6). Bringing Byron for thoughts.

If a timer was just destroyed on the same thread, does it guarantee there can't be timer events from that timer already on the main thread event queue? If there are, do they get pulled and/or neutralized? If they do then this may be the wrong fix.

I can't see how the assert is hitting for another reason though.

Even if we're unsure about this being a final fix or a wallpaper fix, I propose we land it to stop the noise of the robots.
Flags: needinfo?(docfaraday)
Yeah, something about the timers firing after we're destroyed the timer object feels very scary to me. I'll have to examine the implementation.
Haven't found the implementation yet (dxr doesn't know about it), but the usage pattern I see for nsITimer involves calling Cancel() explicitly on it. Presumably, something else is retaining a reference, so the timer lives on after PeerConnectionCtx goes away. Let me keep looking.
Yep:

http://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp?from=AddTimerInternal&case=true#420

Looks like we need to explicitly Cancel(), just releasing our ptr is not enough. The timer will continue to fire until that happens.
Flags: needinfo?(docfaraday)
Calling Cancel() in ~PeerConnectionCtx() should do the trick, right?
Flags: needinfo?(jib)
(In reply to Byron Campen [:bwc] from comment #45)
> Looks like we need to explicitly Cancel(), just releasing our ptr is not
> enough. The timer will continue to fire until that happens.

Nice work, thanks!
Attachment #8426225 - Attachment is obsolete: true
Attachment #8426761 - Flags: review?(rjesup)
Flags: needinfo?(jib)
Summary: Intermittent Shutdown | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) | Assertion failure: PeerConnectionCtx::isActive(), at PeerConnectionCtx.cpp:306 → Intermittent Shutdown | application crashed [@ sipcc::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer *,void *) ("Assertion failure: PeerConnectionCtx::isActive(), at .../PeerConnectionCtx.cpp:306")
Attachment #8426761 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/cb41521cda40
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Duplicate of this bug: 1013372
Duplicate of this bug: 1014529
You need to log in before you can comment on or make changes to this bug.