Closed Bug 1073352 Opened 10 years ago Closed 10 years ago

mochitests in debug builds with e10s enabled report window and docshell leaks

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
e10s m4+ ---

People

(Reporter: markh, Assigned: mccr8)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files, 4 obsolete files)

If you have a look at the "holly" branch on TBPL (which runs all tests with e10s enabled) you will see many windows and OSX tests fail due to leaks reported by the parent process.  (It seems leak reporting in the content process is currently disabled - bug 1051230).

To see these failures, just look at holly on TBPL and select almost any of the tests that are orange in the debug builds but green in the release builds for the same platform - the most comment leak is along the lines of "leaked 1 window(s) until shutdown [url = about:blank]".  These leaks are generally reproducible locally by running the mochitests with the --e10s command-line option.

There is some speculation that the abrupt termination of the child process might be causing these leaks - in which case the possibility exists they are *real* leaks rather than simply an issue with leak detection code.  If they are real leaks we probably need to get a handle of them, so requesting e10s tracking.
Whiteboard: [memshrink] → [MemShrink:P2]
Assignee: nobody → wmccloskey
Eddy has been working recently on tracking a leak in our devtools debugger tests with E10S and he finally ended up with a minimal test case that doesn't even opens the devtools and still leaks.
I too was looking at a leak in the devtools inspector tests today with E10S and found out too that this wasn't due to the inspector itself.
I think this bug is what we're seeing.
Blocks: 1094749
No longer blocks: 1094749
Depends on: 1096614
I'm not entirely sure if this is a dupe, but it looks similar.  The leaked until shutdown things are all in the child process, not the parent.  This is hopefully just an instance of the GC/CC not running enough due to another leak test being broken.
Summary: mochitests in debug builds with e10s enabled report leaks from the parent process. → mochitests in debug builds with e10s enabled report window and docshell leaks
We should report the type of process, or at least the PID, that leaked the docshells.  That's in the ++-- info, so it should just require a little bit of test harness hacking.
Blocks: 1090929
Blocks: 1103839
Any progress on this Bill? This is blocking us from enabling the devtools tests for debug e10s builds.
Flags: needinfo?(wmccloskey)
(In reply to Andrew McCreight [:mccr8] from comment #4)
> We should report the type of process, or at least the PID, that leaked the
> docshells.  That's in the ++-- info, so it should just require a little bit
> of test harness hacking.

I filed bug 1118852 for that.

(In reply to Eddy Bruel [:ejpbruel] from comment #5)
> Any progress on this Bill? This is blocking us from enabling the devtools
> tests for debug e10s builds.

This is next on my list, after the XPCOM leak detector is enabled for e10s, assuming Bill is okay with that.
Assignee: wmccloskey → continuation
Yeah, please feel free to take this Andrew.
Flags: needinfo?(wmccloskey)
This is Eddy's minimal test example from bug 1094749, just rebased.
Thanks for the update fellows!
I fixed some of the leaks in the test case by adding a bunch of calls to
  Services.obs.notifyObservers(null, "child-mmu-request", null);
in various places, but not all of them.

I suspect the problem is that because we're triggering collections in the child process via a message, they are asynchronous, so the parent ends up printing out the "TEST-START | Shutdown" message that the leak detector uses to decide that everything should be cleaned up by now before the child process is done collecting things.

Maybe I could do something like print out with process is shutting down, then log a message in the child once the shutdown collecting is done?  Then the leak detector could have different end points for different processes.  I'm not sure what test harness code exactly runs in the child process.
What I'd like to do is to set up some testing-only code that listens for a message from the parent process, and have that run once per process, rather than once per test.  For non-test code, it seems like I can do that by adding something to browser/base/content/content.js, but I'm not sure how to do that for testing code.  Do you have any idea, Joel?
Flags: needinfo?(jmaher)
If I set things up to actually GC/CC in the child, and print out when we're done, than it looks like that fixes the DOMWINDOW leaks.  We're still left with a few DOCSHELLs that don't die until a little later.  Given that I've seen a number of weird leak issues with how the content child stuff holds onto docshells, this is probably a legit thing.  Once I get the DOMWINDOW stuff cleaned up more I'll look into why the docshells are alive.
for testing only code I can think of either:
* something which is in the main product but turned on via a pref
* something added to each harness (i.e. mochitest, reftest, marionette, xpcshell)

Obviously adding something to the product would be simpler, we could set a pref or two to control it and have some api calls to get status and clear, etc.

In the harnesses, lets look at mochitest plain.  We run the browser with a single instance of TestRunner [1] for the browser session.  Ideally you could attach something there.  

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/TestRunner.js
Flags: needinfo?(jmaher)
Ok, thanks.  I'm actually just interested in mochitest browser chrome for now.  I can't tell what the equivalent there would be.
for browser-chrome, this is the entry point for the main window that controls the full session:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest/browser-test.js
This adds a new frame script thing that listens for a particular message, that gets sent by the browser test code.

When it gets that message, it runs GC+CC, then prints out its PID and that it is done.

The test harness then can check per-process whether the collection has finished yet.

For the test case, this eliminates the DOMWINDOW leaks.

There are still three DOCSHELL leaks, maybe because TabChild or somebody is holding them alive. I need to investigate those.
Depends on: 1129755
Depends on: 1130128
Blocks: 1096614
No longer depends on: 1096614
The old version was causing leaks because the precise GC runnable was created in the frame script, which kept alive the compartment of the frame script.

Bill helped me with this new version, which instead creates the runnable in a .jsm, so it doesn't keep things alive.

I also had to add an additional CC in the precise GC runnable to ensure that some meta element was cleaned up quickly enough.

I still need to clean this patch up, but it works well enough to enable at least some dev tools tests.
Attachment #8556081 - Attachment is obsolete: true
I have this sort of working, but I added a check to ensure that we actually get a Completed ShutdownLeaks collections message in each process we created a window or docshell in, but this causes a lot of failures.  I think the issue is that some tests (such as browser/base/content/test/newtab/browser_newtab_background_captures.js ) create their own child processes, which ends up being shut down before we send the "browser-test:collect-request" message, which I think happens around when we decide to shut down the parent process.  This means that if we leak-until-shutdown a window or docshell for one of those processes, we won't detect it.  So maybe we need to do some kind of pre-pre-XPCOM shutdown thing that runs this code.

Anyways, we're not detecting these leaks right now anyways, and I think it will work okay-ish for e10s where the content process lives about as long as the chrome process, so I'll probably leave dealing with that for followup work.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c95644d5a5f6
This reworks how the Mochitest DOMWINDOW and DOCSHELL leak detector works. Rather than
collecting immediately in the top-level script, it sends a message to all processes
telling them to carry out collections. Each process prints out a message when it has
finished the collections. This message is used by the test harness to decide when windows
and docshells for that process should be have been destroyed.
Attachment #8563688 - Flags: review?(jmaher)
This enables a few of the debugger tests. The first test,
browser_dbg_aaa_run_first_leaktest.js, is intentionally disabled,
because it appears to have a legitimate leak.
Attachment #8563689 - Flags: review?(ejpbruel)
Attachment #8560766 - Attachment is obsolete: true
Attachment #8560767 - Attachment is obsolete: true
Attachment #8545411 - Attachment is obsolete: true
Blocks: 1132717
Comment on attachment 8563688 [details] [diff] [review]
part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s.

Review of attachment 8563688 [details] [diff] [review]:
-----------------------------------------------------------------

code wise I don't see any issues here.  How will this work on b2g and android?  Does this work well in single process land?
Attachment #8563688 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #22)
> code wise I don't see any issues here.  How will this work on b2g and
> android?  Does this work well in single process land?

The frame script is run on every process, including the main one, so that should handle single process just fine.  I think Android will just work like desktop single process.  I'll take a look at a log from my try run to see what is going on with B2G.
Backed out for causing bc3 docshell leaks on Windows and OSX.

https://hg.mozilla.org/integration/mozilla-inbound/rev/44d89ddc145a
Bah, those failures were in my full try push, too, I just didn't notice it because it was listed after failures from another patch I didn't land.
Comment on attachment 8563689 [details] [diff] [review]
part 2 - Enable some devtools tests.

Review of attachment 8563689 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. That's a lot of debugger tests!

Patch LGTM, provided it is green on try ofc.
Attachment #8563689 - Flags: review?(ejpbruel) → review+
The sequence for the last 3 releases of the last doc shell seems to go something like this:

1. We force a CC before we print the message, which destroys an nsHTMLDocument, which in turn causes an HttpChannelChild to die, which does a release on the docshell, putting us at a refcount of 2.
2. From a MessageChannel::DispatchAsyncMessage, we end up in HttpChannelChild::OnStopRequest, which eventually causes us to destroy a nsDocumentOpenInfo, which does another release on the docshell, putting us at a refcount of 1.
3. MessageChannel::NotifyMaybeChannelError() calls into PContentChild::OnChannelClose(), which then seems to start tearing down IPDL stuff, which eventually ends up destroying an HttpChannelChild, which finally ends up destroying the docshell.

The problem is that the message is printed out after 1, so we report this docshell as a leak.  Maybe if I rejigger where the shutdown CCs are done to somehow be after 3, when the ContentChild somehow knows it is going away, then that will fix this (and the other problems I've been having).
The OnChannelClose() doesn't seem to be happening until XRE_ShutdownChildProcess(), so I think that just moving around where do GC/CCs and print out the message isn't going to be enough.
Blocks: 1135956
Blocks: 1135958
In non-e10s mode, the thumbnail tests create a new child process.  The lifetime of these is unrelated to when the test finishes, so the current method in my patch does not work right, either by leaking when it isn't clear if they are real, or by not running the checker at all.  This side-steps those issues by not running the DOM window leak checker in child processes if we're not running in e10s mode.  It still runs in the parent process.  I filed some followup bugs to look into this, but I think they are lower priority than other e10s leak work, as the thumbnail processes likely do not live very long.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=73813dfbb555
Attachment #8568579 - Flags: review?(jmaher)
Comment on attachment 8568579 [details] [diff] [review]
part 1b - In non-e10s, only run the shutdown leak detector in the parent process.

Review of attachment 8568579 [details] [diff] [review]:
-----------------------------------------------------------------

this is only for browser-chrome.  Do we have a solution for mochitest-plain/chrome?
Attachment #8568579 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #31)
> this is only for browser-chrome.  Do we have a solution for
> mochitest-plain/chrome?

I think this leak detector only runs in browser-chrome.  Maybe it would be worth thinking about using in other test frame works.
ok, thought I would ask for completeness :)
Status: NEW → RESOLVED
Closed: 10 years ago
Keywords: leave-open
Resolution: --- → FIXED
Blocks: 1139021
Blocks: 1120569
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: