WebRTC crash [@ mozilla::NrIceCtx::GetGlobalAttributes]

VERIFIED FIXED in mozilla19

Status

Type: defect
Priority: P1
Severity: critical
Status: VERIFIED FIXED
7 years ago
7 years ago

People

(Reporter: posidron, Assigned: anant)

Tracking

(Blocks 1 bug, {crash, testcase})

Version: Trunk
Target Milestone: mozilla19
Points: ---
Bug Flags: in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [WebRTC], [blocking-webrtc+], crash signature)

Attachments

(5 attachments, 2 obsolete attachments)

Posted file testcase (obsolete)
Console output before the crash happened:

!!! Real PeerConnection constructor called OMG !!!

!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : calling initialize
!!! Queue for {f47faeb4-14f9-9144-b5b6-b6f83b388375} is currently: []
!!! Queue for {f47faeb4-14f9-9144-b5b6-b6f83b388375} is currently: []
!!! mozPeerConnection constructor called [object Window @ 0x11bda5990 (native @ 0x11bda3f00)]
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : connect() called
PeerConnectionImpl::Connect()
PeerConnectionImpl::Connect() returned
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : connect() returned
PeerConnectionImpl::ConnectThread() finished
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : createOffer called
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : queued createOffer
!!! Queue for {f47faeb4-14f9-9144-b5b6-b6f83b388375} is currently: [createOffer,]
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : createOffer returned
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : close called
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : onStateChange called: 1
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : close returned
OnMessage: mChannel 0x0 mConnection 0x11cef9cd0
PeerConnectionImpl:: got NotifyConnection
Delivering PeerConnection onconnection
!!! {f47faeb4-14f9-9144-b5b6-b6f83b388375} : onConnection called
!!! in executeNext: !!! Queue for {f47faeb4-14f9-9144-b5b6-b6f83b388375} is currently: [createOffer,]
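The console output above suggests one plausible ordering: createOffer is queued while the connection is still being set up, close() runs, and when onConnection fires, executeNext replays the stale createOffer against a connection that has already been closed. The sketch below models only that ordering; QueuedPeerConnection and its methods are hypothetical and are not the real PeerConnection.js queue.

```javascript
// Hypothetical, simplified model of the operation queue seen in the log
// ("queued createOffer", "in executeNext"). Not the real PeerConnection.js.
class QueuedPeerConnection {
  constructor() {
    this.queue = [];          // operations waiting for onConnection
    this.connected = false;   // set when onConnection fires
    this.closed = false;
    this.ranAfterClose = [];  // operations that executed on a closed PC
  }

  queueOrRun(name, fn) {
    if (!this.connected) {
      this.queue.push({ name, fn });  // cf. "queued createOffer"
      return;
    }
    this.run(name, fn);
  }

  run(name, fn) {
    if (this.closed) this.ranAfterClose.push(name);
    fn();
  }

  createOffer() {
    // In the crash report, this path ends in NrIceCtx::GetGlobalAttributes.
    this.queueOrRun("createOffer", () => {});
  }

  close() {
    this.closed = true;  // note: nothing here touches the queue
  }

  onConnection() {
    this.connected = true;
    this.executeNext();
  }

  executeNext() {
    // Replays queued work without checking whether close() was called.
    while (this.queue.length > 0) {
      const op = this.queue.shift();
      this.run(op.name, op.fn);
    }
  }
}

// Same order of events as the console output.
const pc = new QueuedPeerConnection();
pc.createOffer();   // queued
pc.close();
pc.onConnection();  // replays the stale createOffer
console.log(pc.ranAfterClose); // [ 'createOffer' ]
```

In this model, the fix can go either at the public entry points or where the queue is drained; work queued before close() is only caught by the second.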
Posted file callstack
This is probably the same problem as bug 791165.
Eric, rv is SECSuccess here:

Breakpoint 2, nr_crypto_nss_random_bytes (buf=0x7fff5fbf4980 "^?m\024", len=4) at nricectx.cpp:91
91	  if (rv != SECSuccess)
gdb $ p rv
$2 = SECSuccess
Oh, I see. This is part of a known defect with respect to locking the state of the PCImpl around access from other threads.
This test case routinely gives me the following assertion:

(gdb) bt
#0  0x00007fff91624ce2 in __pthread_kill ()
#1  0x00007fff8f77b7d2 in pthread_kill ()
#2  0x00007fff8f76ca7a in abort ()
#3  0x00000001000762ad in PR_Assert (s=0x104d31ac7 "mIceState != kIceGathering", file=0x104d2de6f "/Users/ekr/dev/alder/media/webrtc/signaling//./src/peerconnection/PeerConnectionImpl.h", ln=371) at /Users/ekr/dev/alder/nsprpub/pr/src/io/prlog.c:554
#4  0x000000010430be7d in sipcc::PeerConnectionImpl::CheckApiState (this=0x131d00840) at PeerConnectionImpl.h:371
#5  0x0000000104309630 in sipcc::PeerConnectionImpl::CreateOffer (this=0x131d00840, hints=0x11c9280b0 "") at PeerConnectionImpl.cpp:712
#6  0x00000001033dd769 in NS_InvokeByIndex_P (that=0x131d00840, methodIndex=4, paramCount=1, params=0x7fff5fbf88d8) at /Users/ekr/dev/alder/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162
#7  0x00000001027c0e54 in CallMethodHelper::Invoke (this=0x7fff5fbf8898) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedNative.cpp:3105
#8  0x00000001027bf20c in CallMethodHelper::Call (this=0x7fff5fbf8898) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedNative.cpp:2439
#9  0x00000001027bb806 in XPCWrappedNative::CallMethod (ccx=@0x7fff5fbf8a48, mode=XPCWrappedNative::CALL_METHOD) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedNative.cpp:2405
#10 0x00000001027cb8cf in XPC_WN_CallMethod (cx=0x10b14cc80, argc=1, vp=0x10f3b1148) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1469
#11 0x000000010459d732 in js::CallJSNative (cx=0x10b14cc80, native=0x1027cb640 <XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)>, args=@0x7fff5fbf8cf0) at jscntxtinlines.h:372
#12 0x00000001045945bd in js::InvokeKernel (cx=0x10b14cc80, args={<JS::CallReceiver> = {usedRval_ = false, argv_ = 0x10f3b1158}, argc_ = 1}, construct=js::NO_CONSTRUCT) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:367
#13 0x00000001045dccb3 in js::Invoke (cx=0x10b14cc80, args=@0x7fff5fbf9178, construct=js::NO_CONSTRUCT) at jsinterp.h:119
#14 0x000000010451b28c in js_fun_apply (cx=0x10b14cc80, argc=2, vp=0x10f3b1128) at /Users/ekr/dev/alder/js/src/jsfun.cpp:951
#15 0x000000010459d732 in js::CallJSNative (cx=0x10b14cc80, native=0x10451aba0 <js_fun_apply(JSContext*, unsigned int, JS::Value*)>, args=@0x7fff5fbf9380) at jscntxtinlines.h:372
#16 0x00000001045945bd in js::InvokeKernel (cx=0x10b14cc80, args={<JS::CallReceiver> = {usedRval_ = false, argv_ = 0x10f3b1138}, argc_ = 2}, construct=js::NO_CONSTRUCT) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:367
#17 0x00000001045892ad in js::Interpret (cx=0x10b14cc80, entryFrame=0x10f3b1030, interpMode=js::JSINTERP_NORMAL) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:2454
#18 0x000000010457dec4 in js::RunScript (cx=0x10b14cc80, script=0x11e0b9030, fp=0x10f3b1030) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:324
#19 0x0000000104594696 in js::InvokeKernel (cx=0x10b14cc80, args={<JS::CallReceiver> = {usedRval_ = false, argv_ = 0x10f3b1030}, argc_ = 0}, construct=js::NO_CONSTRUCT) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:378
#20 0x00000001045dccb3 in js::Invoke (cx=0x10b14cc80, args=@0x7fff5fbfbb00, construct=js::NO_CONSTRUCT) at jsinterp.h:119
#21 0x0000000104594c59 in js::Invoke (cx=0x10b14cc80, thisv=@0x7fff5fbfbbb0, fval=@0x7fff5fbfbc08, argc=0, argv=0x7fff5fbfc2f8, rval=0x7fff5fbfbfa8) at /Users/ekr/dev/alder/js/src/jsinterp.cpp:411
#22 0x000000010448ad62 in JS_CallFunctionValue (cx=0x10b14cc80, objArg=0x131a48d00, fval={data = {asBits = 18445477441442188160, debugView = {payload47 = 5127834496, tag = JSVAL_TAG_OBJECT}, s = {payload = {i32 = 832867200, u32 = 832867200, why = 832867200}}, asDouble = -nan(0xb800131a48b80), asPtr = 0xfffb800131a48b80, asWord = 18445477441442188160, asUIntPtr = 18445477441442188160}}, argc=0, argv=0x7fff5fbfc2f8, rval=0x7fff5fbfbfa8) at /Users/ekr/dev/alder/js/src/jsapi.cpp:5974
#23 0x00000001027b003b in nsXPCWrappedJSClass::CallMethod (this=0x131d01510, wrapper=0x131d016e0, methodIndex=16, info=0x10d1653b0, nativeParams=0x7fff5fbfc650) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedJSClass.cpp:1433
#24 0x00000001027a3cab in nsXPCWrappedJS::CallMethod (this=0x131d016e0, methodIndex=16, info=0x10d1653b0, params=0x7fff5fbfc650) at /Users/ekr/dev/alder/js/xpconnect/src/XPCWrappedJS.cpp:580
#25 0x00000001033df1de in PrepareAndDispatch (self=0x131d01760, methodIndex=16, args=0x7fff5fbfc7b0, gpregs=0x7fff5fbfc730, fpregs=0x7fff5fbfc760) at /Users/ekr/dev/alder/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_darwin.cpp:121
#26 0x00000001033ddc6b in SharedStub () at /Users/ekr/dev/alder/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_darwin.cpp:35
#27 0x00000001043121bc in sipcc::PeerConnectionObserverDispatch::Run (this=0x10b575290) at PeerConnectionImpl.cpp:170
#28 0x00000001033ac4dd in nsThread::ProcessNextEvent (this=0x10b602b60, mayWait=false, result=0x7fff5fbfce33) at /Users/ekr/dev/alder/xpcom/threads/nsThread.cpp:624
#29 0x000000010331a892 in NS_ProcessPendingEvents_P (thread=0x10b602b60, timeout=20) at nsThreadUtils.cpp:170
#30 0x0000000102e1ffaf in nsBaseAppShell::NativeEventCallback (this=0x10b645640) at /Users/ekr/dev/alder/widget/xpwidgets/nsBaseAppShell.cpp:97
#31 0x0000000102db64cc in nsAppShell::ProcessGeckoEvents (aInfo=0x10b645640) at /Users/ekr/dev/alder/widget/cocoa/nsAppShell.mm:402
#32 0x00007fff95fc14f1 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ ()
#33 0x00007fff95fc0d5d in __CFRunLoopDoSources0 ()
#34 0x00007fff95fe7b49 in __CFRunLoopRun ()
#35 0x00007fff95fe7486 in CFRunLoopRunSpecific ()
#36 0x00007fff90b6c4d3 in RunCurrentEventLoopInMode ()
#37 0x00007fff90b73781 in ReceiveNextEventCommon ()
#38 0x00007fff90b7360e in BlockUntilNextEventMatchingListInMode ()
#39 0x00007fff8d338e31 in _DPSNextEvent ()
#40 0x00007fff8d338735 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#41 0x0000000102db4ca7 in -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (self=0x10b648c40, _cmd=0x7fff8dc106b0, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x7fff7cd6bae0, flag=1 '\001') at /Users/ekr/dev/alder/widget/cocoa/nsAppShell.mm:168
#42 0x00007fff8d335071 in -[NSApplication run] ()
#43 0x0000000102db6f4c in nsAppShell::Run (this=0x10b645640) at /Users/ekr/dev/alder/widget/cocoa/nsAppShell.mm:756
#44 0x0000000102a604d7 in nsAppStartup::Run (this=0x10b6455b0) at /Users/ekr/dev/alder/toolkit/components/startup/nsAppStartup.cpp:273
#45 0x0000000101012f8e in XREMain::XRE_mainRun (this=0x7fff5fbfeb68) at /Users/ekr/dev/alder/toolkit/xre/nsAppRunner.cpp:3834
#46 0x00000001010136af in XREMain::XRE_main (this=0x7fff5fbfeb68, argc=1, argv=0x7fff5fbff758, aAppData=0x1000081c0) at /Users/ekr/dev/alder/toolkit/xre/nsAppRunner.cpp:3911
#47 0x0000000101013acf in XRE_main (argc=1, argv=0x7fff5fbff758, aAppData=0x1000081c0, aFlags=0) at /Users/ekr/dev/alder/toolkit/xre/nsAppRunner.cpp:3987
#48 0x0000000100001ebc in do_main (argc=1, argv=0x7fff5fbff758) at /Users/ekr/dev/alder/browser/app/nsBrowserApp.cpp:174
#49 0x0000000100001785 in main (argc=1, argv=0x7fff5fbff758) at /Users/ekr/dev/alder/browser/app/nsBrowserApp.cpp:279

This appears to be an Anant issue.
Assignee: nobody → anant
Posted file testcase (obsolete)
Attachment #661309 - Attachment is obsolete: true
Blocks: 792125
Depends on: webrtc-big-lock
Whiteboard: [WebRTC] → [WebRTC], [blocking-webrtc+]
Priority: -- → P1
Flags: in-testsuite?
Just FYI, the testcase no longer works and needs an update.
Posted file testcase
Updated testcase for m-c changeset: 110609:cb573b9307e5
Attachment #661776 - Attachment is obsolete: true
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+],[fuzzblocker]
Duplicate of this bug: 801747
Verified the callstack is the same.
Posted file testcase
Improved testcase.
It is an error to provide a null callback (at least once bug 780790 lands), so this test case will have to be updated.
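Why the test case has to change can be shown with an up-front argument check. This is a minimal sketch assuming a two-callback createOffer(onSuccess, onError); requireFunction and createOfferChecked are hypothetical helpers, not the actual behavior introduced by bug 780790.

```javascript
// Hypothetical sketch of rejecting null callbacks at the API boundary.
// The function names are illustrative, not the real PeerConnection.js API.
function requireFunction(value, argName) {
  if (typeof value !== "function") {
    throw new TypeError(`${argName} must be a function, got ${value}`);
  }
}

function createOfferChecked(onSuccess, onError) {
  requireFunction(onSuccess, "onSuccess");
  requireFunction(onError, "onError");
  // Past this point both callbacks are safe to invoke later.
  return "accepted";
}

// A test case written as createOffer(null, null) now fails fast:
try {
  createOfferChecked(null, null);
} catch (e) {
  console.log(e.message); // onSuccess must be a function, got null
}
```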
Here is a public crash report: bp-cf9314b8-b78f-4128-b83f-eeab72121019

0 	XUL 	mozilla::NrIceCtx::GetGlobalAttributes 	nricectx.cpp:394
1 	XUL 	vcmGetIceParams 	VcmSIPCCBinding.cpp:624
2 	XUL 	fsmdef_ev_createoffer 	fsmdef.c:2901
3 	XUL 	sm_process_event 	sm.c:48
4 	XUL 	fim_process_event 	fim.c:636
5 	XUL 	gsm_process_msg 	gsm.c:132
6 	XUL 	GSMTask 	gsm.c:324
7 	libsystem_c.dylib 	libsystem_c.dylib@0x4e8be 	
8 	libsystem_c.dylib 	libsystem_c.dylib@0x51b74 	
9 	XUL 	XUL@0x154393f
Crash Signature: [@ mozilla::NrIceCtx::GetGlobalAttributes]
Summary: WebRTC crash [@NrIceCtx::GetGlobalAttributes] → WebRTC crash [@ mozilla::NrIceCtx::GetGlobalAttributes]
Duplicate of this bug: 794648
Posted patch crashtest v1
Crashtest for the last testcase on this bug. It no longer passes null for the callback arguments, and I have also stripped it down to a minimized case, because we crash in createOffer(), which should simply fail if the connection is already closed.
Attachment #673119 - Flags: review?(rjesup)
Duplicate of this bug: 803540
Crash Signature: [@ mozilla::NrIceCtx::GetGlobalAttributes] → [@ mozilla::NrIceCtx::GetGlobalAttributes()] [@ mozilla::NrIceCtx::GetGlobalAttributes]
OS: Mac OS X → All
Hardware: x86_64 → All
Looking at this test case, the problem appears to be, as you say, that createOffer() should just fail immediately once you have called close(). We're going to be adding state enforcement in SIPCC, but PC.js should just refuse all requests once .close() has been called.
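The JS-layer approach (refuse every request once .close() has been called) can be sketched as a closed-state check at each public entry point. This is a minimal illustration, not the patch in attachment 673579; GuardedPeerConnection, checkClosed, and the error message are hypothetical.

```javascript
// Hypothetical sketch of "refuse all requests once close() has been called".
// Names are illustrative; this is not the landed PeerConnection.js patch.
class GuardedPeerConnection {
  constructor() {
    this.closed = false;
    this.calls = [];  // operations that reached the (simulated) backend
  }

  // Every public entry point calls this before doing any work.
  checkClosed() {
    if (this.closed) {
      throw new Error("PeerConnection is closed");
    }
  }

  createOffer() {
    this.checkClosed();
    this.calls.push("createOffer");
  }

  createAnswer() {
    this.checkClosed();
    this.calls.push("createAnswer");
  }

  close() {
    this.closed = true;  // idempotent: closing twice is harmless
  }
}

const pc = new GuardedPeerConnection();
pc.createOffer();
pc.close();
try {
  pc.createOffer();
} catch (e) {
  console.log(e.message); // PeerConnection is closed
}
console.log(pc.calls); // [ 'createOffer' ]
```

An entry-point check only covers calls made after close(); operations that were queued before close() would still need the same check wherever the queue is drained.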
Comment on attachment 673579
Suppress calls to PC after close has been called

Review of attachment 673579:
-----------------------------------------------------------------

This will fix the problem at the JS layer. We may want additional fixes in PCImpl.
Attachment #673579 - Flags: review?(anant)
Attachment #673579 - Flags: review?(anant) → review+
Attachment #673579 - Flags: checkin?(anant)
Comment on attachment 673579
Suppress calls to PC after close has been called

https://hg.mozilla.org/integration/mozilla-inbound/rev/585b254104f2
Attachment #673579 - Flags: checkin?(anant) → checkin+
whimboo: Can you retest?
Comment on attachment 673579
Suppress calls to PC after close has been called

Review of attachment 673579:
-----------------------------------------------------------------

Should we [leave_open] this, or open a new bug for PCImpl, or wait until something is reported?  I'm ambivalent.
Attachment #673579 - Flags: review+
I was thinking of bug 784519.
(In reply to Eric Rescorla from comment #21)
> whimboo: Can you retest?

With the patch applied, which is currently on inbound, we still crash with the attached crashtest. Please run the following command yourself after you have qimported it:

TEST_PATH=dom/media/tests/crashtests/crashtests.list make -C $OBJ_DIR crashtest

So I would say we should not mark this bug as fixed.
I'm not really sure how to read the crashtest logs, but I don't see a crash here, and when I ran it against your test individually, things were fine.

Can you please provide a stack trace?
(In reply to Eric Rescorla from comment #25)
> I'm not really sure how to read the crashtest logs but I don't see a crash
> here, and when I ran it against your test individually, things were fine.
> 
> Can you please provide a stack trace?

https://crash-stats.mozilla.com/report/index/bp-b9bb706d-dbb5-495e-bbd6-bfaf42121022
Blocks: 803535
(In reply to Jason Smith [:jsmith] from comment #26)
> https://crash-stats.mozilla.com/report/index/bp-b9bb706d-dbb5-495e-bbd6-bfaf42121022

This is a Nightly build that doesn't include the fix, so you would have to test a tinderbox build instead.

(In reply to Eric Rescorla from comment #25)
> Can you please provide a stack trace?

I don't actually get a stack trace; that's a problem on my side that I still have to figure out. So I will apply the patch and start a build to check again.
I re-tested with a tinderbox inbound build on 10/22. I can't reproduce the crash with any of the test cases attached here or with the test case attached to bug 803535, and the attached crashtest doesn't crash either.

Can you re-test?
So the tinderbox build works fine, but applying the patch to my local tree and opening the testcase immediately freezes Firefox. Eric, I'm on OS X, if that makes a difference.
https://hg.mozilla.org/mozilla-central/rev/585b254104f2
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Keywords: verifyme
After the merge and a new build, I don't see the hang anymore. Maybe something else that was still waiting on inbound was involved.

Randell, can you please review the crashtest so we can get it landed?
Attachment #673119 - Flags: review?(rjesup) → review+
Verified on 10/23. The crashtest needs to be landed too; can someone push that test to inbound?
Status: RESOLVED → VERIFIED
Keywords: verifyme → checkin-needed
The crashtest was probably pushed too early. While the crash no longer happens, the test times out. Eric, shouldn't createOffer call onErrorCallback in this situation? Right now that doesn't seem to happen. If it should, we should get this fixed in another bug; let me know and I can file one.
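If createOffer on a closed connection is meant to fail through its error callback rather than returning silently, a crashtest waiting on either callback can finish instead of timing out. This is a minimal sketch assuming a two-callback createOffer(onSuccess, onError) signature; CallbackPeerConnection and the error message are hypothetical, and setTimeout stands in for whatever asynchronous dispatch the real implementation uses.

```javascript
// Hypothetical sketch: report createOffer() on a closed connection through
// the error callback, so a test waiting on either callback does not hang.
class CallbackPeerConnection {
  constructor() {
    this.closed = false;
  }

  createOffer(onSuccess, onError) {
    if (this.closed) {
      // Deliver the failure asynchronously, like a real async error.
      setTimeout(() => onError(new Error("createOffer called after close")), 0);
      return;
    }
    // Placeholder offer; real SDP generation is out of scope here.
    setTimeout(() => onSuccess({ type: "offer", sdp: "" }), 0);
  }

  close() {
    this.closed = true;
  }
}

// Shape of the crashtest: close first, then expect the error callback.
const conn = new CallbackPeerConnection();
conn.close();
conn.createOffer(
  () => console.log("unexpected success"),
  (err) => console.log("error callback:", err.message)
);
```

Calling back asynchronously even on the error path keeps callers from observing a callback before createOffer() has returned.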
Whiteboard: [WebRTC], [blocking-webrtc+],[fuzzblocker] → [WebRTC], [blocking-webrtc+]