Closed Bug 877515 Opened 11 years ago Closed 10 years ago

Intermittent test_peerConnection_offerRequiresReceiveVideoAudio.html | Exited with code -1073741819 [@ mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject *>,mozilla::ErrorResult &,mozilla::dom::CallbackObject::ExceptionHandling)]

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Assigned: mccr8)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [WebRTC][blocking-webrtc-])

Crash Data

https://tbpl.mozilla.org/php/getParsedLog.php?id=23559379&tree=Mozilla-Inbound

Windows 7 32-bit mozilla-inbound opt test mochitest-3 on 2013-05-29 17:25:02 PDT for push c711779d7be6
slave: t-w732-ix-008

17:42:11     INFO -  18801 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveVideoAudio.html | Got offer: {"type":"offer","sdp":"v=0\r\no=Mozilla-SIPUA-24.0a1 16569 0 IN IP4 0.0.0.0\r\ns=SIP Call\r\nt=0 0\r\na=ice-ufrag:35801cb1\r\na=ice-pwd:7387c18bc6b623be763abb2b21a5bb51\r\na=fingerprint:sha-256 A3:65:24:29:A9:DF:CF:2B:D3:CC:63:2F:25:65:CF:30:4D:A5:CD:08:89:F4:1A:6C:CA:F5:B1:45:C7:E2:50:97\r\nm=audio 53951 RTP/SAVPF 109 0 8 101\r\nc=IN IP4 63.245.214.82\r\na=rtpmap:109 opus/48000/2\r\na=ptime:20\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=rtpmap:101 telephone-event/8000\r\na=fmtp:101 0-15\r\na=recvonly\r\na=candidate:0 1 UDP 2111832319 10.26.40.158 56292 typ host\r\na=candidate:1 1 UDP 1692467199 63.245.214.82 53951 typ srflx raddr 10.26.40.158 rport 56292\r\na=candidate:0 2 UDP 2111832318 10.26.40.158 56293 typ host\r\nm=video 57925 RTP/SAVPF 120\r\nc=IN IP4 63.245.214.82\r\na=rtpmap:120 VP8/90000\r\na=recvonly\r\na=candidate:0 1 UDP 2111832319 10.26.40.158 56294 typ host\r\na=candidate:1 1 UDP 1692467199 63.245.214.82 57925 typ srflx raddr 10.26.40.158 rport 56294\r\na=candidate:0 2 UDP 2111832318 10.26.40.158 56295 typ host\r\na=candidate:1 2 UDP 1692467198 63.245.214.82 48454 typ srflx raddr 10.26.40.158 rport 56295\r\n"}
17:42:11     INFO -  18802 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveVideoAudio.html | Run step: PC_LOCAL_SET_LOCAL_DESCRIPTION
17:42:13  WARNING -  TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveVideoAudio.html | Exited with code -1073741819 during test run
17:42:13     INFO -  INFO | automation.py | Application ran for: 0:13:57.170000
17:42:13     INFO -  INFO | zombiecheck | Reading PID log: c:\users\cltbld\appdata\local\temp\tmpirn5vbpidlog
17:42:13     INFO -  ==> process 2132 launched child process 216
17:42:13     INFO -  ==> process 2132 launched child process 3068
17:42:13     INFO -  ==> process 2132 launched child process 3976
17:42:13     INFO -  ==> process 2132 launched child process 1612
17:42:13     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 216
17:42:13     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3068
17:42:13     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3976
17:42:13     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 1612
17:42:13     INFO -  mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1369870053/firefox-24.0a1.en-US.win32.crashreporter-symbols.zip
17:42:13     INFO -  Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1369870053/firefox-24.0a1.en-US.win32.crashreporter-symbols.zip
17:42:21  WARNING -  PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_offerRequiresReceiveVideoAudio.html | application crashed [@ mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject *>,mozilla::ErrorResult &,mozilla::dom::CallbackObject::ExceptionHandling)]
17:42:21     INFO -  Crash dump filename: c:\users\cltbld\appdata\local\temp\tmp21fx6u\minidumps\4ddf1ece-e767-4759-b2a2-3c45e0813532.dmp
17:42:21     INFO -  Operating system: Windows NT
17:42:21     INFO -                    6.1.7601 Service Pack 1
17:42:21     INFO -  CPU: x86
17:42:21     INFO -       GenuineIntel family 6 model 30 stepping 5
17:42:21     INFO -       8 CPUs
17:42:21     INFO -  Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
17:42:21     INFO -  Crash address: 0x8
17:42:21     INFO -  Thread 0 (crashed)
17:42:21     INFO -   0  xul.dll!mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject *>,mozilla::ErrorResult &,mozilla::dom::CallbackObject::ExceptionHandling) [CallbackObject.cpp:c711779d7be6 : 46 + 0x3]
17:42:21     INFO -      eip = 0x683649d3   esp = 0x0021e5ac   ebp = 0x0021e5bc   ebx = 0x00000000
17:42:21     INFO -      esi = 0x0021e60c   edi = 0x0021e6f8   eax = 0x00000008   ecx = 0x00000000
17:42:21     INFO -      edx = 0x00000001   efl = 0x00010246
17:42:21     INFO -      Found by: given as instruction pointer in context
17:42:21     INFO -   1  xul.dll!mozilla::dom::mozRTCPeerConnectionJSImpl::SetLocalDescription(mozilla::dom::mozRTCSessionDescription &,mozilla::dom::Optional<mozilla::dom::OwningNonNull<mozilla::dom::VoidFunction> > const &,mozilla::dom::Optional<mozilla::dom::OwningNonNull<mozilla::dom::RTCPeerConnectionErrorCallback> > const &,mozilla::ErrorResult &,mozilla::dom::CallbackObject::ExceptionHandling) [RTCPeerConnectionBinding.cpp:c711779d7be6 : 2716 + 0x32]
17:42:21     INFO -      eip = 0x682ce6d4   esp = 0x0021e5c4   ebp = 0x0021e6d0
17:42:21     INFO -      Found by: call frame info
17:42:21     INFO -   2  xul.dll!mozilla::dom::mozRTCPeerConnectionBinding::setLocalDescription [RTCPeerConnectionBinding.cpp:c711779d7be6 : 595 + 0x1d]
17:42:21     INFO -      eip = 0x682d150b   esp = 0x0021e6d8   ebp = 0x0021e748
17:42:21     INFO -      Found by: call frame info
17:42:21     INFO -   3  xul.dll!mozilla::dom::mozRTCPeerConnectionBinding::genericMethod [RTCPeerConnectionBinding.cpp:c711779d7be6 : 2113 + 0x13]
17:42:21     INFO -      eip = 0x682cbd55   esp = 0x0021e750   ebp = 0x0021e780
17:42:21     INFO -      Found by: call frame info
Hmm...a WebIDL crash? Boris - Do you know?
Flags: needinfo?(bzbarsky)
That looks like a null aCallback was passed to CallSetup.  Andrew, any idea how that can happen?  We only null out the callback pointer on unlink...
Flags: needinfo?(bzbarsky) → needinfo?(continuation)
Crash Signature: [@ mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject*>, mozilla::ErrorResult&, mozilla::dom::CallbackObject::ExceptionHandling)]
Component: WebRTC → DOM
QA Contact: jsmith
Whiteboard: [WebRTC][blocking-webrtc-]
bz, did you mean the JS code called SetLocalDescription with a null callback, or something internal messed up and did so?
I mean that somehow we have a null pointer where we expect to have the pointer to the chrome-side JS object.  At least at first glance that's what's going on.
It looks like it is crashing one this line:
  46   xpc_UnmarkGrayObject(aCallback);

xpc_UnmarkGrayObject does a null check (doing nothing in that case), so I don't think aCallback is null.  My usual guess for these kinds of things where we null crash at the start of a method is that |this| is null.  I'm not sure how that would happen, though, except for a use-after-unlink.

As bz suggested on IRC, one possible cause for this is a failure to do unmarkGray somewhere.  It seems like it would be weird that the outer JS representation would be only reachable from C++ in a test, which is what would be needed for it to be gray...
Flags: needinfo?(continuation)
Assignee: nobody → continuation
jesup caught this in a debugger earlier.  Here's some IRC discussion (I took a bit of liberty with editing this to make it take less lines):
16:10	bz	do we know which test this is happening in?
16:10	bz	And which JS code we're generally running?
16:11	jesup	test_peerConnection_offerRequiresReceiveAudio.html

16:12	bz	So presumably the timeout handler is 84 window.setTimeout(_executeNext, 0);
16:13	bz	(in pc.js) We know the mozRTCPeerConnection has been unlinked, right?

16:15	bz	so basically mozRTCPeerConnection::mImpl is null
16:15	bz	as is its mWindow. er, mParent
16:15	bz	So it's been unlinked, but we're being called from JS

16:20	jesup	In this case the script is from setTimeout I believe

16:24	bz	so in this case we have setTimeout running script that does a call on our object
16:25	bz	so I would think we were reachable from content JS all along
16:25	bz	and then, why and where did we get unlinked?
jesup: opt build on windows, --disable-debug --disable-strip, run twice under debugger with ./mach mochitest-plain --debugger=path-to-windbg.exe  dom/media/tests/mochitests/
jesup: with an opt build with --enable-debug, I ran it many times with no failures.  hit on second run with --disable-debug
That was a quote from IRC on how jesup managed to reproduce this.
Randell, if you can manage to reproduce it again with XPCOM_CC_LOG_ALL=1 and get the last 4 or so GC and CC logs, and the address of the object we're crashing on, that would be useful.  I'll investigate it myself.  Maybe I can force the timing of CC somehow to make it happen more.
Flags: needinfo?(rjesup)
Sorry - I ran a bunch more times and never saw it again.  I finally hit the bug I was trying to find, and have sat with it in the debugger so I can play puppet for Michael Tuexen (SCTP library author).  Don't count on my catching it...
Flags: needinfo?(rjesup)
Thanks.  I was hoping you had a magical machine that would easily reproduce this. :)  I'll try messing with when we run the CC...
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
As I mentioned on IRC, these September failures show a different pathology than the original problem recorded in May: even though the top frame is the same, and even though it's the same test case, we're in the middle of a different operation here. All the recent failures are in get_signalingState, while the earlier failure was in setLocalDescription.

They might be related, but they might not. Given that we've attached all the logs to this bug, I don't see much point in splitting it now -- but, for anyone else troubleshooting this problem: we can't assume that this is the same problem that we saw back in May.

Not least of all because we're seeing this on several different platforms, whereas the previous issue appears to have been Windows-only. Updating the bug flags accordingly.
OS: Windows 7 → All
Depends on: 928221
Depends on: 933447
This hasn't happened since October.
Status: REOPENED → RESOLVED
Closed: 11 years ago10 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.