Closed Bug 826044 Opened 11 years ago Closed 11 years ago

Debug builds leak for WebRTC crashtests on mozilla-inbound

Categories

(Core :: WebRTC, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: whimboo, Assigned: jesup)

References

(Blocks 1 open bug)

Details

(Whiteboard: [WebRTC][blocking-webrtc+][qa-])

Attachments

(3 files, 7 obsolete files)

Today I pushed a patch which should enable the WebRTC crashtests on mozilla-central. Not sure why but on mozilla-inbound we got various leaks reported which we haven't seen on Alder before. Reason (as figured out at this time) is that we have leak checking disabled for crash tests. Beh! :( I will enable leak checks on alder first so we can better diagnose the problem.

Leaks we currently see:
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 465275 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 132 instances of AtomImpl with size 24 bytes each (3168 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of BackstagePass with size 24 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of BodyRule with size 16 bytes each (32 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 4 instances of Comment with size 68 bytes each (272 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 2 instances of DOMStorageImpl with size 100 bytes each (200 bytes total)
Attachment #697187 - Flags: review?(rjesup)
Attachment #697187 - Flags: review?(rjesup) → review+
Blocks: 812648
No longer blocks: 814531
Whiteboard: [MemShrink]
Not sure what's going on here exactly. But I can constantly reproduce it by running the following crash tests in succession: 780790.html and 791330.html. It doesn't matter which code we exercise in the first test, but when I remove the onload handler in 791330.html the leaks don't appear anymore.

Randell told me on IRC that he can see the leaks when running the automation but not when manually doing those checks and quitting the browser afterward. Sadly I can't test this given that my debug build is totally slow (up to 6 minutes until I see the first browser window), so I'm doing a clobber atm.
Running one test at a time, from ./mach crashtest dom/media/tests/crashtests

With just the first one or the last one, no leaks.  All others (run alone) leak for me.  Fedora 17.

If I load these crashtests into a real browser, I see NO leaks in any of the tests. (Start browser, load file, wait to completion, quit browser.)

Something is interacting with the automation rig.  Timing issue perhaps.
So even with enabled leak checks I don't see any leaks with the latest Alder build. Everything is green for crashtests. I have triggered a clobber build in case it will change something. If not, we have to trust our local testing and have to try on inbound again when we think we are good. :/
This isn't a memshrink issue.  This appears to be an automation issue.
Assignee: nobody → hskupin
Priority: -- → P1
Whiteboard: [MemShrink] → [WebRTC], [blocking-webrtc+]
Ted, would you have some pointers how to investigate and fix those leaks? I would like to continue here tomorrow while I'm on the train. Thanks.
If you can't reproduce them locally I don't know that I can offer you any advice. I'm also not really a leak-solving expert, you might want to ask khuey or dbaron.

Is your local debug build built with --disable-optimize? The tinderbox builds are optimized nowadays.
Mine were --disable-optimize; I see leaks when run locally (all but first and last tests).  Same build, loading the tests into the real browser (fresh profile) and then quitting: no leaks.
So I did some additional tests earlier today and as what I can say now is that those leaks are not related to the automation framework. They are tightly connected to peer connection instances and when the browser gets closed. As an example have the following crashtest:

<!DOCTYPE HTML>
<html class="reftest-wait">
<!--
https://bugzilla.mozilla.org/show_bug.cgi?id=791270
-->
<head>
  <meta charset="utf-8">
  <title>Simple PeerConnection Video Test</title>
  <script type="application/javascript">
    function finish() {
      document.documentElement.removeAttribute("class");
    }

    var pc = new mozRTCPeerConnection();
    pc.close();

    setTimeout(finish, 3500);
  </script>
</head>
<body>
</body>
</html>


This will not leak as long as the setTimeout value is large enough. For me it's about 3.5s to let the test pass. Once you decrease the value you will see the leak. You can also make the test pass when you comment out both pc lines. So something inside the PC code has to be the cause.
Assignee: hskupin → nobody
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC][blocking-webrtc+][automation-blocked]
Assignee: nobody → rjesup
Randell, were you able to make some progress here in the last 6 days? Anything you want me to check?
Note: What's shown in comment 9 doesn't match 791270.html, so I'll refer to it as 826044.html.

What I'm seeing on my mac (10.8.2) from running tests individually w/m-c:
- using 3 runs or more each
- with the following set (the latter may alter timing slightly):
  MOZ_WEBRTC_LEAKING_TESTS=1
  MOZ_WEBRTC_TESTS=1
  NSPR_LOG_MODULES=signaling:3

default-preferences pref(media.peerconnection.enabled,true)

load 780790.html - no leaks
load 791270.html - no leaks (but error in log each time *)
load 791278.html - no leaks
load 791330.html - no leaks (but each run takes 5 minutes!)
load 799419.html - no leaks
load 801227.html - LEAKS (110 AtomImpl,1 BsPass,1 BodyRule,3 Comment,2 DOMStrgImpl)
load 802982.html - LEAKS (107 AtomImpl,1 BsPass,2 BodyRule,2 Comment,1 DR_State)
load 812785.html - no leaks
load 826044.html - no leaks (but crashed once **)

The number of leaked objects stays the same from run to run.

*)
 0:03.89 ************************************************************
 0:03.89 * Call to xpconnect wrapped JSObject produced this error:  *
 0:03.89 [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [IPeerConnection.addStream]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: file:///Users/Jan/moz/mozilla-central/obj-x86_64-apple-darwin12.2.1-debug/dist/NightlyDebug.app/Contents/MacOS/components/PeerConnection.js :: PeerConnection.prototype._executeNext :: line 321"  data: no]
 0:03.89 ************************************************************

**) TEST-UNEXPECTED-FAIL| Shutdown | Exited with code 1 during test run

Interestingly, 802982.html never calls mozRTCPeerConnection, just mozGetUserMedia.
801227.html calls both.
Assignee: rjesup → jib
Forgot to say that for 826044.html I tried setTimeout values of 3500m, 10 ms, 1 ms and directly calling the function without setTimeout(). No leaks.
> 0:03.89 * Call to xpconnect wrapped JSObject produced this error:  *
> 0:03.89 [Exception... "Component returned failure code: 0x80004005 
>   (NS_ERROR_FAILURE) [IPeerConnection.addStream]"  nsresult: "0x80004005 

This is from pc.addStream(undefined). We could probably validate this input synchronously and throw a nicer error.

> Interestingly, 802982.html never calls mozRTCPeerConnection, just mozGetUserMedia.
> 801227.html calls both.

And if I comment out peerconnection from 801227.html then it leaks the same.

To summarize, all I need to make it leak every time is this:

<!DOCTYPE HTML>
<html class="reftest-wait">
<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=801227 -->
<head>
  <meta charset="utf-8">
  <title>Abort due to page reload</title>
  <script type="application/javascript">
    <!-- If I remove the localStorage code then it doesn't leak -->
    var index = localStorage.index || 0;
    if (index < 3) {
      localStorage.index = index + 1;
      window.location.reload();
    }
    function finish() {
      delete localStorage["index"];
      document.documentElement.removeAttribute("class");
    }
    navigator.mozGetUserMedia({ audio: true, fake: true }, function (aStream) {
      finish();
    }, finish);
  </script>
</head>
<body>
</body>
</html>

or this:

<!DOCTYPE html>
<html>
<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=802982 -->
<head>
  <meta charset="utf-8">
  <title>Excessive getUserMedia calls</title>
  <script type="application/javascript">
  function boom() {
    navigator.mozGetUserMedia({}, {}, {});
  }
  </script>
</head>
<body onload="boom();"></body>
</html>

Ergo, I'm seeing leaks from mozGetUserMedia, not mozRTCPeerConnection.
Blocks: 796463
Making this a blocker to pref'ing on -- at least until we understand what's going on here: are they true leaks in the core code or are they artifacts of the tests themselves?
I can make them happen in a regular browser debug session.
Attached patch enable WebRTC crashtests (obsolete) — Splinter Review
We'll file 3 bugs: one for leaks from bug 802982 if the permission UI is enabled; one for leaks in 801227, and one for Javascript: Security error, line 14 in 822197
Attachment #714615 - Flags: review?(jsmith)
the change to 802982.html is because there was a funky 16-bit char at the start
Comment on attachment 714615 [details] [diff] [review]
enable WebRTC crashtests

Review of attachment 714615 [details] [diff] [review]:
-----------------------------------------------------------------

review+ with one small nit above comments.

::: dom/media/tests/crashtests/crashtests.list
@@ +3,5 @@
>  load 780790.html
>  load 791270.html
>  load 791278.html
>  load 791330.html
>  load 799419.html

I'd actually leave both of these tests commented out with a comment above explaining why we've prefed them off. Possibly marking a TODO pointing to the bug that's being fixed here.
Attachment #714615 - Flags: review?(jsmith) → review+
Comment on attachment 714615 [details] [diff] [review]
enable WebRTC crashtests

Review of attachment 714615 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, wait a sec. Change that - you need to pref this on in the global crash test list as well. review-
Attachment #714615 - Flags: review+ → review-
(In reply to Jason Smith [:jsmith] from comment #20)
> Comment on attachment 714615 [details] [diff] [review]
> enable WebRTC crashtests
> 
> Review of attachment 714615 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Actually, wait a sec. Change that - you need to pref this on in the global
> crash test list as well. review-

Specifically you need to modify "testing\crsahtest\crashtests.list" to have our tests prefed on by default:

# Bug 801682 - Because of leaks we can't enable media crashtests yet
# Bug 811873 - mozRTCPeerConnection doesn't support remote browser yet
# skip-if(browserIsRemote) include ../../dom/media/tests/crashtests/crashtests.list

^^^ Change that line
Attached patch enable WebRTC crashtests (obsolete) — Splinter Review
Attachment #714615 - Attachment is obsolete: true
Attached patch enable WebRTC crashtests (obsolete) — Splinter Review
Attachment #714630 - Attachment is obsolete: true
Attachment #714631 - Flags: review?(jsmith)
Comment on attachment 714631 [details] [diff] [review]
enable WebRTC crashtests

Review of attachment 714631 [details] [diff] [review]:
-----------------------------------------------------------------

review+ - small nit

::: testing/crashtest/crashtests.list
@@ -26,5 @@
>  include ../../dom/bindings/crashtests/crashtests.list
>  include ../../dom/indexedDB/crashtests/crashtests.list
>  
> -# Bug 801682 - Because of leaks we can't enable media crashtests yet
> -# Bug 811873 - mozRTCPeerConnection doesn't support remote browser yet

I'd still keep the comment about "remote browser" around here. The other is fine to get rid of
Attachment #714631 - Flags: review?(jsmith) → review+
Blocks: 841958
Blocks: 841940
Blocks: 841930
Depends on: 839647
Depends on: 841981
Added the two dependencies that are causing failures. The third failure though on that list I'm puzzled about - 802982.html appears to be failing to load. Don't know why...
Attachment #697187 - Attachment is obsolete: true
(In reply to Jason Smith [:jsmith] from comment #28)
> Added the two dependencies that are causing failures. The third failure
> though on that list I'm puzzled about - 802982.html appears to be failing to
> load. Don't know why...

I have a suspicion that there's a time limit to load, and that test does a ton of getUserMedias in the on-load.  Switching to reftest-wait and re-Trying
Still retriggering, but looking good with reftest-wait:  compare
https://tbpl.mozilla.org/?tree=Try&rev=d6e178388988
to
https://tbpl.mozilla.org/?tree=Try&rev=f4a2fe79e180
(the one that fails to load it appears always to be 10.8 opt)

(the other oranges are all known; I haven't bothered starring)
Ok, we now have a fail-to-load, but on a different test - bu again one that does a ton of stuff all inside onload.  This should be a indicator to us that's a bad pattern.  Will retry with that fixed too (this one is reftest-wait, but does everything in onload());
Ok, we're looking quite green on this:

https://tbpl.mozilla.org/?tree=Try&rev=a0bf58900913
whole bunch of fixes for the crashtests.  A bunch needed to be reftest-wait - plain reeftest with tests that take noticable time (at least under load) breaks the onload timeout in crashtests.  a couple more broke due to bug 842075; adding temporary try {}'s fixes those.  799419 wasn't in the last try push (converted to reftest-wait), so I pushed another to be sure, and if it fails I'll re-remove it.
this one actually will enable them (and not on android/b2g)
Attachment #714845 - Attachment description: enable WebRTC crashtests → Fix Webrtc crashtests, disable known permaoranges
Attachment #714845 - Flags: review?(jsmith)
Attachment #714847 - Flags: review?(jsmith)
Attachment #714631 - Attachment is obsolete: true
Assignee: jib → rjesup
799419 needed finish() of course...
Attachment #714845 - Attachment is obsolete: true
Attachment #714845 - Flags: review?(jsmith)
Attachment #714847 - Attachment is obsolete: true
Attachment #714847 - Flags: review?(jsmith)
Attachment #714859 - Flags: review?(jsmith)
Keywords: mlk
Comment on attachment 714847 [details] [diff] [review]
enable WebRTC crashtests

Oops, bzexport wasn't helpful here
Attachment #714847 - Attachment is obsolete: false
Attachment #714847 - Flags: review?(jsmith)
Attachment #714859 - Attachment description: enable WebRTC crashtests → Fix Webrtc crashtests, disable known permaoranges
and one more test that needed the same treatment.  When too much is done in onload, the results are random and depend on what other tests you've run in the list.  Also, this needed the try{} around addIceCandidate.  New try pushed
Attachment #714859 - Attachment is obsolete: true
Attachment #714859 - Flags: review?(jsmith)
Attachment #714870 - Flags: review?(jsmith)
Attachment #714847 - Flags: review?(jsmith) → review+
Comment on attachment 714870 [details] [diff] [review]
Fix Webrtc crashtests, disable known permaoranges

Review of attachment 714870 [details] [diff] [review]:
-----------------------------------------------------------------

review- only for 802982.html

::: dom/media/tests/crashtests/802982.html
@@ +17,5 @@
>      for (var j = 0; j < 100; ++j) {
>        navigator.mozGetUserMedia({}, {}, {});
>      }
>    }
> +  finish(); // we're not waiting for success/error callbacks here

Why is this outside of the boom function? Shouldn't this be at the end of the boom function?
Attachment #714870 - Flags: review?(jsmith) → review-
Attachment #714870 - Attachment is obsolete: true
Comment on attachment 714924 [details] [diff] [review]
Fix Webrtc crashtests, disable known permaoranges

Yup - oversight (that turns out not to matter on Try...)
Attachment #714924 - Flags: review?(jsmith)
I've repushed the try to verify it's ok with this change since the test wasn't getting real coverage before (though it was for most of my tries; this was a later change).
Attachment #714924 - Flags: review?(jsmith) → review+
https://hg.mozilla.org/mozilla-central/rev/31fa1546a849
https://hg.mozilla.org/mozilla-central/rev/8e9f7c541947
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [WebRTC][blocking-webrtc+][automation-blocked] → [WebRTC][blocking-webrtc+][automation-blocked][qa-]
Whiteboard: [WebRTC][blocking-webrtc+][automation-blocked][qa-] → [WebRTC][blocking-webrtc+][qa-]
You need to log in before you can comment on or make changes to this bug.