Closed
Bug 853858
Opened 12 years ago
Closed 12 years ago
Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html, test_peerConnection_bug827843.html, test_peerConnection_bug | Test timed out
Categories
(Core :: WebRTC: Signaling, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla23
People
(Reporter: abr, Assigned: abr)
References
Details
(Keywords: intermittent-failure, Whiteboard: [WebRTC],[blocking-webrtc-][qa-])
Attachments
(2 files, 1 obsolete file)
9.47 KB,
patch
|
ehugg
:
review+
|
Details | Diff | Splinter Review |
4.71 KB,
patch
|
whimboo
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #839677 +++
RyanVM
https://tbpl.mozilla.org/php/getParsedLog.php?id=20886816&tree=Mozilla-Inbound
Ubuntu 12.04 x64 mozilla-inbound opt test mochitest-2 on 2013-03-20 08:43:11
slave: tst-linux64-ec2-324
09:06:53 INFO - 26689 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 2•12 years ago
|
||
Assignee | ||
Comment 3•12 years ago
|
||
Comment on attachment 729798 [details] [diff] [review]
Additional instrumentation for on-hook event
This bug is one of a class of bug where a call is torn down with an ONHOOK event, but no corresponding messaging happens towards the PeerConnection. Since no callback is issued, the corresponding tests time out.
This patch does three things to help shed some light on why calls are sometimes town down without sending an event back to the PeerConnection. (1) All calls to cc_onhook and friends now indicate the source file and line number from which it was called; (2) PeerConnectionImpl has additional logging messages that are printed when it is about to invoke endCall (which ultimately causes an ONHOOK event to be triggered); and (3) Logging of messages sent to the "ui" (PeerConnectionImpl) has been increased from Debug to Notice so that it will be emitted during mochi tests.
Attachment #729798 -
Flags: review?(ethanhugg)
Comment 4•12 years ago
|
||
Comment on attachment 729798 [details] [diff] [review]
Additional instrumentation for on-hook event
Review of attachment 729798 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/webrtc/signaling/src/sipcc/core/gsm/ccapi.c
@@ +1422,5 @@
> pmsg->active_list = active_list;
>
> CC_DEBUG_ENTRY(__FUNCTION__, src_id, dst_id, call_id, line, cc_msg_name(pmsg->msg_id));
> + DEF_DEBUG("(%d/%d) On-hook called from %s:%d",
> + line, call_id, filename, fileline);
Nit: I believe both line_t and callid_t are unsigned.
::: media/webrtc/signaling/src/sipcc/core/includes/phone_debug.h
@@ +108,5 @@
> #define PHN_DEBUG_SNTP_PACKET if (DebugSNTPPacket)
>
> /* TNP adapter debugs */
> #define TNP_DEBUG(format,...) if (TNPDebug) \
> + CSFLogNotice("tnp", format, ## __VA_ARGS__)
You may want to mention here why you are redirecting debug to notice.
Attachment #729798 -
Flags: review?(ethanhugg) → review+
Updated•12 years ago
|
Whiteboard: [WebRTC],[blocking-webrtc+] [qa-] → [WebRTC],[blocking-webrtc+]
Updated•12 years ago
|
Whiteboard: [WebRTC],[blocking-webrtc+] → [WebRTC],[blocking-webrtc-]
Assignee | ||
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Whiteboard: [WebRTC],[blocking-webrtc-] → [WebRTC],[blocking-webrtc-][leave open]
Assignee | ||
Comment 7•12 years ago
|
||
Okay, I think I see what's going on here, thanks to the log in Bug 854255 comment 7. The telltale clue is that things are cruising along just fine and then *BAM* out of left field, someone destroys the PC.
How can this happen? Well, the PC is rooted in the content javascript context. By my understanding, the only way it can be destroyed is by the content javascript relinquishing all references to it. Once that happens, the garbage collector will eventually find it and clean it up.
Turning to the test at hand; the mochitest itself starts the chain of events like this:
runTest(function () {
var test = new PeerConnectionTest();
test.setMediaConstraints([{audio: true}, {video: true}],
[{audio: true}, {video: true}]);
test.run();
}, true);
That "PeerConnectionTest" object contains the two PCs that will be used for the test.
Now, keep in mind that the "test.run()" function call only executes the first step in the chain of events and then returns. And then the runTest() function returns. And the "test" variable goes out of scope.
That last step is critical: once the "test" variable goes out of scope, the entire graph of objects comprising PeerConnectionTest, its two PCs, their observers, and their supporting C++ objects becomes unrooted, and subject to collection the next time the garbage collector runs.
If that happens before the test is finished, then your PCs go away (along with all your testing objects), and no more callbacks get fired.
The following test cases all reuse this pattern, and will consequently share this pathology:
* test_peerConnection_basicAudio.html (this bug)
* test_peerConnection_basicAudioVideo.html (Bug 854255)
* test_peerConnection_basicAudioVideoCombined.html (Bug 845538)
* test_peerConnection_basicVideo.html
* test_peerConnection_bug827843.html
* test_peerConnection_bug840344.html (Bug 841496)
I'm duping those other bugs to this one, and renaming this one so it will match in tbpl for star purposes.
I think the solution is to move the "var test = new PeerConnectionTest();" into a global scope so it survives for the duration of the test. I'll attach a patch shortly.
Summary: Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out → Intermittent test_peerConnection_basicAudio.html, test_peerConnection_basicAudioVideo.html, test_peerConnection_basicAudioVideoCombined.html, test_peerConnection_basicVideo.html, test_peerConnection_bug827843.html, test_peerConnection_bug | Test timed out
Assignee | ||
Comment 11•12 years ago
|
||
Assignee | ||
Comment 12•12 years ago
|
||
This patch is untested -- I'm waiting for a tree clobber to finish so I can give it a try.
Assignee | ||
Comment 13•12 years ago
|
||
Comment on attachment 733333 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
This works locally, although I've never been able to replicate the timeout on a local machine (so I can't verify that the problem is gone)...
Attachment #733333 -
Flags: review?(rjesup)
Comment 14•12 years ago
|
||
Comment on attachment 733333 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
Review of attachment 733333 [details] [diff] [review]:
-----------------------------------------------------------------
Go for it!
Attachment #733333 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 15•12 years ago
|
||
Whiteboard: [WebRTC],[blocking-webrtc-][leave open] → [WebRTC],[blocking-webrtc-]
Comment 16•12 years ago
|
||
Sorry, backed this out because it's failing on Android and B2G:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b76be1ee5350
Comment 17•12 years ago
|
||
Ah, I know why this failed. In global scope, we don't do the user agent checking, so this is always going to fail on alternative platforms right now.
A simple fix to this problem I think would be to define the var in global scope, but only create the object within the runTest object.
Comment 18•12 years ago
|
||
Comment 19•12 years ago
|
||
Comment on attachment 733756 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
Either one should do for this sort of bustage fix
https://tbpl.mozilla.org/?tree=Try&rev=257ff0591dc6
(Oranges are not us; mostly infra)
Attachment #733756 -
Flags: review?(jsmith)
Attachment #733756 -
Flags: review?(hskupin)
Comment 20•12 years ago
|
||
Comment on attachment 733756 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
Review of attachment 733756 [details] [diff] [review]:
-----------------------------------------------------------------
Yay for complicate dependencies. This looks fine now and should also pass on Android.
Attachment #733756 -
Flags: review?(jsmith)
Attachment #733756 -
Flags: review?(hskupin)
Attachment #733756 -
Flags: review+
Updated•12 years ago
|
Attachment #733333 -
Attachment is obsolete: true
Comment 21•12 years ago
|
||
Updated•12 years ago
|
Whiteboard: [WebRTC],[blocking-webrtc-] → [WebRTC],[blocking-webrtc-][webrtc-uplift]
Comment 22•12 years ago
|
||
Backed out for leaking the world on Windows mochitest-2.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7156b703f3d
https://tbpl.mozilla.org/php/getParsedLog.php?id=21469097&tree=Mozilla-Inbound
The tbpl leak analyzer says 10 DOMWINDOWS leaked.
Comment 23•12 years ago
|
||
Well, so much for my idea for working around this problem. We fix the platform issue and now we have leaks.
Clint - Why is mochitest blowing up here with the global reference we are doing in this patch?
Flags: needinfo?(ctalbert)
Comment 24•12 years ago
|
||
I looked into it in detail this morning and chatted with Ryan. This really, really looks like some other random orange - the log is full of no-window and out-of-memory errors that don't seem connected to our code while trying to shut down (and that may be why cleanup didn't happen for a ton of stuff). There's no way moving the scope of this JS var caused this, and all the pushes between this and the backout were green as well (or other known oranges).
I'm going to ask Ryan to let me re-land it (we'd discussed that I would likely do so earlier).
Updated•12 years ago
|
Flags: needinfo?(ctalbert)
Comment 25•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/8262a337d5be
The leaks occurred in Win7 M2, and the logs show none of the tests in this patch ran in M2. Relanded.
Comment 26•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Updated•12 years ago
|
status-firefox21:
--- → disabled
status-firefox22:
--- → affected
status-firefox23:
--- → fixed
tracking-firefox22:
--- → ?
Comment 27•12 years ago
|
||
Comment on attachment 733756 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Initial test
User impact if declined: Random oranges in tests - this patch just updates the tests to make them not sensitive to GC.
Testing completed (on m-c, etc.): Just landed on m-c.
Risk to taking this patch (and alternatives if risky): minimal to no risk and the risk is only to tests run on 22 in automation.
String or IDL/UUID changes made by this patch:
Attachment #733756 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Whiteboard: [WebRTC],[blocking-webrtc-][webrtc-uplift] → [WebRTC],[blocking-webrtc-][webrtc-uplift][qa-]
Comment 28•12 years ago
|
||
Comment on attachment 733756 [details] [diff] [review]
Move PeerConnectionTest instance reference into global scope
Basically NPOTB, and fixes orange. a+
Attachment #733756 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•12 years ago
|
Comment 29•12 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 31•12 years ago
|
||
Per comment 29, removing webrtc-uplift flag
Whiteboard: [WebRTC],[blocking-webrtc-][webrtc-uplift][qa-] → [WebRTC],[blocking-webrtc-][qa-]
Comment hidden (Legacy TBPL/Treeherder Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•