Closed Bug 837026 Opened 13 years ago Closed 12 years ago

Frequent Linux/OSX test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html | Test timed

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: RyanVM, Unassigned)

References

Details

(Whiteboard: [WebRTC][blocking-webrtc+])

This popped up on inbound tonight with no clear cause for where it's coming from. We're leaving the tests enabled for now so rjesup and abr can more easily investigate what is going on, but that can change if needed. https://tbpl.mozilla.org/php/getParsedLog.php?id=19341615&tree=Mozilla-Inbound Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitest-2 on 2013-01-31 17:11:34 PST for push 4c73235f1aa2 slave: talos-r4-snow-076 26004 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out. args: ['/usr/sbin/screencapture', '-C', '-x', '-t', 'png', '/var/folders/Hs/HsDn6a9SG8idoIya6p9mtE+++TI/-Tmp-/mozilla-test-fail_MsCDgc'] SCREENSHOT: <see log> 26005 INFO TEST-END | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | finished in 312006ms 26006 INFO TEST-START | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html ++DOMWINDOW == 13 (0x12d58d8e8) [serial = 3061] [outer = 0x12cd7b7c8] [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 --DOMWINDOW == 12 (0x110351d38) [serial = 3060] [outer = 0x12cd7b7c8] [url = http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html] [Parent 352] WARNING: 1 sort operation has occurred for the SQL statement '0x10baa8e30'. See https://developer.mozilla.org/En/Storage/Warnings details.: file ../../../storage/src/mozStoragePrivateHelpers.cpp, line 110 26007 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Test timed out. Not taking screenshot here: see the one that was previously logged 26008 INFO TEST-END | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | finished in 330009ms 26009 INFO TEST-START | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html ++DOMWINDOW == 13 (0x110351d38) [serial = 3062] [outer = 0x12cd7b7c8] [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 --DOMWINDOW == 12 (0x12d58d8e8) [serial = 3061] [outer = 0x12cd7b7c8] [url = http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html] 26010 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | Test timed out. Not taking screenshot here: see the one that was previously logged 26011 INFO TEST-END | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | finished in 330550ms 26012 INFO TEST-START | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html ++DOMWINDOW == 13 (0x153406f88) [serial = 3063] [outer = 0x12cd7b7c8] [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 [Parent 352] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301 --DOMWINDOW == 12 (0x110351d38) [serial = 3062] [outer = 0x12cd7b7c8] [url = http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html] [Parent 352] WARNING: 1 sort operation has occurred for the SQL statement '0x10baa8e30'. See https://developer.mozilla.org/En/Storage/Warnings details.: file ../../../storage/src/mozStoragePrivateHelpers.cpp, line 110 26013 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html | Test timed out. Not taking screenshot here: see the one that was previously logged 26014 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up. 26015 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 4 remaining tests. 26016 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html finished in a non-clean fashion, probably because it didn't call SimpleTest.finish()
I have an initial hunch that there's an event we are waiting on that isn't firing.
These tests seriously need more log statements about progress of the test. That would make determining the issue here 40 times easier.
Depends on: 837028
Blocks: 837037
Thanks Jason for the logger enhancements. So what's happening here is that we fail in: 58 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Calling createOffer on local peer connection [..] 59 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out. This is happening across all peerconnection tests. Finally we even crash: Thread 21 (crashed) 0 libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:a7f7309da69d : 30 + 0x0] eip = 0x00651f0d esp = 0xa6cfefb0 ebp = 0xa6cfefc8 ebx = 0x00653120 esi = 0x00c59844 edi = 0x00000000 eax = 0x0000000a ecx = 0x00000001 edx = 0x00c5a32c efl = 0x00010246 Found by: given as instruction pointer in context 1 libmozalloc.so!mozalloc_handle_oom(unsigned int) [mozalloc_oom.cpp:a7f7309da69d : 27 + 0xe] eip = 0x00651f52 esp = 0xa6cfefd0 ebp = 0xa6cfefe8 ebx = 0x00653120 esi = 0x00070800 edi = 0x00000000 Found by: call frame info 2 libmozalloc.so!moz_xmalloc [mozalloc.cpp:a7f7309da69d : 56 + 0x8] eip = 0x00651b8d esp = 0xa6cfeff0 ebp = 0xa6cff008 ebx = 0x00653120 esi = 0x00070800 edi = 0x00000000 Found by: call frame info 3 libxul.so + 0x15a2273 eip = 0x02259274 esp = 0xa6cff010 ebp = 0xa6cff058 ebx = 0x032fd160 esi = 0xa8f69740 edi = 0x00000000 Found by: call frame info The stack is barely usable. :(
Crash Signature: [@ mozalloc_abort(char const*)]
Summary: Frequent Linux/OSX test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html | Test timed out. → Frequent Linux/OSX test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html | Test timed out and crash [@ mozalloc_abort(char const*)
The timeout happens here: + info("Calling createOffer on local peer connection"); aPCLocal.createOffer(onCreateOfferSuccess, unexpectedCallbackAndFinish); We have an error handler assigned so we timeout inside of the createOffer method with whether onSuccess nor onError called. Sorry for adding crash here, which is indeed only that the framwork killed Firefox.
Crash Signature: [@ mozalloc_abort(char const*)]
Summary: Frequent Linux/OSX test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html | Test timed out and crash [@ mozalloc_abort(char const*) → Frequent Linux/OSX test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html | Test timed
Another thing I'm noticing as well is that onaddstream isn't getting a callback either after calling addStream.
onaddstream is only fired for remote streams but not when you add a local stream to the peer connection object via addStream().
(In reply to Henrik Skupin (:whimboo) from comment #38) > onaddstream is only fired for remote streams but not when you add a local > stream to the peer connection object via addStream(). Ah, right.
Hmm...maybe bug 834383 regressed this? That's the only guess I've got looking at the changesets over the past day.
(In reply to Jason Smith [:jsmith] from comment #40) > Hmm...maybe bug 834383 regressed this? That's the only guess I've got > looking at the changesets over the past day. See bug 834383 comment 35. Ryan, has the backout of that mentioned patch shown something?
Flags: needinfo?(ryanvm)
We were still seeing the issue after we backed it out.
Flags: needinfo?(ryanvm)
And note that ryan relanded bug 834383, and later philor clobbbered the tree, and in the 7 or 8 pushes since then that have gotten to M3, things seem much better, knock on wood.
That's right, but doesn't satisfy us? I feel it's scary. Why does a clobber make things better in terms of test failures? I thought it's only necessary if (major) build changes have happened. I will at least try to reproduce the problem on OS X with a local build from inbound.
(In reply to Henrik Skupin (:whimboo) from comment #44) > That's right, but doesn't satisfy us? I feel it's scary. Why does a clobber > make things better in terms of test failures? I thought it's only necessary > if (major) build changes have happened. I will at least try to reproduce the > problem on OS X with a local build from inbound. I'm doing the same, using m-i c0b435ca93f9 (which showed the mochi-2 failures pretty reliably, at least on OS X). If I can capture this locally, I can probably start to get some traction around where the problem is.
I'm starting to wonder whether these failures share a pathology with Bug 823056 and Bug 835476. As I mention in 835476, there are a handful of ways the race between ccapp_thread and main thread can be lost, and it's conceivable that one of them leave ccapp_thread spinning in linked_ptr_internal::depart(). If this happens, then any messages sent from the main thread will just sit in the queue indefinitely. Obviously, this theory would be easy to dispel or confirm if we could get at the thread stacks. Barring that, one telltale symptom of this condition would be that one CPU core should be running very hot, at or near 100% user. Is there some way to get this information off the test systems when things fail, or is this something we could easily add?
Do any of these bugs involve .gypi (not .gyp!) change? There's a known bug that .gypi files aren't included in dependencies; if you change a .gypi you must change all dependent .gyp files (trivially) or force a clobber. I however *really* doubt this is the issue (.gypi are solely in media/webrtc/trunk except for media/webrtc/webrtc_config.gypi which only affects files in trunk at this time.
(In reply to Adam Roach [:abr] from comment #45) > I'm doing the same, using m-i c0b435ca93f9 (which showed the mochi-2 > failures pretty reliably, at least on OS X). If I can capture this locally, > I can probably start to get some traction around where the problem is. FWIW, this didn't reproduce the error for me locally. Given that the oranges were cleared by a clobber build, I suspect that this isn't something we're going to be able to easily re-create. I'm marking this as depending on Bug 823056, since I have a strong suspicion that these errors were being caused by the same set of races.
Depends on: 823056
I'm not able to reproduce this either. So we might want to wait a bit and continue watching tbpl if those failures occur again or not.
philor -- I know the title does partially match, but the underlying cause of this bug is that the WebRTC signaling system hangs on initialization. The fingerprint to look for here isn't *one* of these four test cases failing, but *all* *four* of them failing. The log from Comment 50 is a different problem. It may need a new bug opened.
Ok, so I can see this problem on my Win7 box with a Nightly debug build. What logs would you like to have? As I can see we do not execute any single check in each of those peer connection tests but run into a timeout.
Assignee: nobody → adam
We're pretty certain this bug went away with the tree clobber, as it has not resurfaced for over a week. I've opened Bug 839677 to cover the bug shown in the log in comment 50. I've opened Bug 839679 to cover the bug shown in the log in comment 52. We're closing this bug to help prevent it from being an attractive nuisance for tinderbox stars.
Assignee: adam → nobody
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.