Closed Bug 1073352 Opened 10 years ago Closed 10 years ago

mochitests in debug builds with e10s enabled report window and docshell leaks

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

e10s

m4+

---

People

(Reporter: markh, Assigned: mccr8)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files, 4 obsolete files)

small failing example. not for landing. 10 years ago Andrew McCreight [:mccr8] 3.17 KB, patch		Details \| Diff \| Splinter Review
Make the DOMWINDOW and DOCSHELL leak detector work with e10s. WIP 10 years ago Andrew McCreight [:mccr8] 8.40 KB, patch		Details \| Diff \| Splinter Review
part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s. WIP 10 years ago Andrew McCreight [:mccr8] 9.06 KB, patch		Details \| Diff \| Splinter Review
part 2 - Enable some devtools tests. 10 years ago Andrew McCreight [:mccr8] 2.84 KB, patch		Details \| Diff \| Splinter Review
part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s. 10 years ago Andrew McCreight [:mccr8] 8.80 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
part 2 - Enable some devtools tests. 10 years ago Andrew McCreight [:mccr8] 3.02 KB, patch	ejpbruel : review+	Details \| Diff \| Splinter Review
part 1b - In non-e10s, only run the shutdown leak detector in the parent process. 10 years ago Andrew McCreight [:mccr8] 1.66 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review

Mark Hammond [:markh] [:mhammond]

Reporter

Description

•

10 years ago

If you have a look at the "holly" branch on TBPL (which runs all tests with e10s enabled) you will see many windows and OSX tests fail due to leaks reported by the parent process.  (It seems leak reporting in the content process is currently disabled - bug 1051230).

To see these failures, just look at holly on TBPL and select almost any of the tests that are orange in the debug builds but green in the release builds for the same platform - the most comment leak is along the lines of "leaked 1 window(s) until shutdown [url = about:blank]".  These leaks are generally reproducible locally by running the mochitests with the --e10s command-line option.

There is some speculation that the abrupt termination of the child process might be causing these leaks - in which case the possibility exists they are *real* leaks rather than simply an issue with leak detection code.  If they are real leaks we probably need to get a handle of them, so requesting e10s tracking.

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: ? → m5+

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

tracking-e10s: m5+ → m4+

Nicholas Nethercote [inactive]

Updated

•

10 years ago

Whiteboard: [memshrink] → [MemShrink:P2]

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

Assignee: nobody → wmccloskey

Patrick Brosset <:pbro>

Comment 1

•

10 years ago

Eddy has been working recently on tracking a leak in our devtools debugger tests with E10S and he finally ended up with a minimal test case that doesn't even opens the devtools and still leaks.
I too was looking at a leak in the devtools inspector tests today with E10S and found out too that this wasn't due to the inspector itself.
I think this bug is what we're seeing.

Eddy Bruel [:ejpbruel]

Updated

•

10 years ago

Blocks: 1094749

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

No longer blocks: 1094749

Depends on: 1096614

Andrew McCreight [:mccr8]

Assignee

Comment 3

•

10 years ago

I'm not entirely sure if this is a dupe, but it looks similar.  The leaked until shutdown things are all in the child process, not the parent.  This is hopefully just an instance of the GC/CC not running enough due to another leak test being broken.

Summary: mochitests in debug builds with e10s enabled report leaks from the parent process. → mochitests in debug builds with e10s enabled report window and docshell leaks

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

10 years ago

We should report the type of process, or at least the PID, that leaked the docshells.  That's in the ++-- info, so it should just require a little bit of test harness hacking.

Eddy Bruel [:ejpbruel]

Updated

•

10 years ago

Blocks: 1090929

Eddy Bruel [:ejpbruel]

Updated

•

10 years ago

Blocks: 1103839

Eddy Bruel [:ejpbruel]

Comment 5

•

10 years ago

Any progress on this Bill? This is blocking us from enabling the devtools tests for debug e10s builds.

Flags: needinfo?(wmccloskey)

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

10 years ago

(In reply to Andrew McCreight [:mccr8] from comment #4)
> We should report the type of process, or at least the PID, that leaked the
> docshells.  That's in the ++-- info, so it should just require a little bit
> of test harness hacking.

I filed bug 1118852 for that.

(In reply to Eddy Bruel [:ejpbruel] from comment #5)
> Any progress on this Bill? This is blocking us from enabling the devtools
> tests for debug e10s builds.

This is next on my list, after the XPCOM leak detector is enabled for e10s, assuming Bill is okay with that.

Assignee: wmccloskey → continuation

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 7

•

10 years ago

Yeah, please feel free to take this Andrew.

Flags: needinfo?(wmccloskey)

Andrew McCreight [:mccr8]

Assignee

Comment 8

•

10 years ago

Attached patch small failing example. not for landing. (obsolete) — Details — Splinter Review

This is Eddy's minimal test example from bug 1094749, just rebased.

Eddy Bruel [:ejpbruel]

Comment 9

•

10 years ago

Thanks for the update fellows!

Andrew McCreight [:mccr8]

Assignee

Comment 10

•

10 years ago

I fixed some of the leaks in the test case by adding a bunch of calls to
  Services.obs.notifyObservers(null, "child-mmu-request", null);
in various places, but not all of them.

I suspect the problem is that because we're triggering collections in the child process via a message, they are asynchronous, so the parent ends up printing out the "TEST-START | Shutdown" message that the leak detector uses to decide that everything should be cleaned up by now before the child process is done collecting things.

Maybe I could do something like print out with process is shutting down, then log a message in the child once the shutdown collecting is done?  Then the leak detector could have different end points for different processes.  I'm not sure what test harness code exactly runs in the child process.

Andrew McCreight [:mccr8]

Assignee

Comment 11

•

10 years ago

What I'd like to do is to set up some testing-only code that listens for a message from the parent process, and have that run once per process, rather than once per test.  For non-test code, it seems like I can do that by adding something to browser/base/content/content.js, but I'm not sure how to do that for testing code.  Do you have any idea, Joel?

Flags: needinfo?(jmaher)

Andrew McCreight [:mccr8]

Assignee

Comment 12

•

10 years ago

If I set things up to actually GC/CC in the child, and print out when we're done, than it looks like that fixes the DOMWINDOW leaks.  We're still left with a few DOCSHELLs that don't die until a little later.  Given that I've seen a number of weird leak issues with how the content child stuff holds onto docshells, this is probably a legit thing.  Once I get the DOMWINDOW stuff cleaned up more I'll look into why the docshells are alive.

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

10 years ago

for testing only code I can think of either:
* something which is in the main product but turned on via a pref
* something added to each harness (i.e. mochitest, reftest, marionette, xpcshell)

Obviously adding something to the product would be simpler, we could set a pref or two to control it and have some api calls to get status and clear, etc.

In the harnesses, lets look at mochitest plain.  We run the browser with a single instance of TestRunner [1] for the browser session.  Ideally you could attach something there.  

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/TestRunner.js

Flags: needinfo?(jmaher)

Andrew McCreight [:mccr8]

Assignee

Comment 14

•

10 years ago

Ok, thanks.  I'm actually just interested in mochitest browser chrome for now.  I can't tell what the equivalent there would be.

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

•

10 years ago

for browser-chrome, this is the entry point for the main window that controls the full session:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest/browser-test.js

Andrew McCreight [:mccr8]

Assignee

Comment 16

•

10 years ago

Attached patch Make the DOMWINDOW and DOCSHELL leak detector work with e10s. WIP (obsolete) — Details — Splinter Review

This adds a new frame script thing that listens for a particular message, that gets sent by the browser test code.

When it gets that message, it runs GC+CC, then prints out its PID and that it is done.

The test harness then can check per-process whether the collection has finished yet.

For the test case, this eliminates the DOMWINDOW leaks.

There are still three DOCSHELL leaks, maybe because TabChild or somebody is holding them alive. I need to investigate those.

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Depends on: 1129755

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Depends on: 1130128

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1096614

No longer depends on: 1096614

Andrew McCreight [:mccr8]

Assignee

Comment 17

•

10 years ago

Attached patch part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s. WIP (obsolete) — Details — Splinter Review

The old version was causing leaks because the precise GC runnable was created in the frame script, which kept alive the compartment of the frame script.

Bill helped me with this new version, which instead creates the runnable in a .jsm, so it doesn't keep things alive.

I also had to add an additional CC in the precise GC runnable to ensure that some meta element was cleaned up quickly enough.

I still need to clean this patch up, but it works well enough to enable at least some dev tools tests.

Andrew McCreight [:mccr8]

Assignee

Comment 18

•

10 years ago

Attached patch part 2 - Enable some devtools tests. (obsolete) — Details — Splinter Review

Attachment #8556081 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Comment 19

•

10 years ago

I have this sort of working, but I added a check to ensure that we actually get a Completed ShutdownLeaks collections message in each process we created a window or docshell in, but this causes a lot of failures.  I think the issue is that some tests (such as browser/base/content/test/newtab/browser_newtab_background_captures.js ) create their own child processes, which ends up being shut down before we send the "browser-test:collect-request" message, which I think happens around when we decide to shut down the parent process.  This means that if we leak-until-shutdown a window or docshell for one of those processes, we won't detect it.  So maybe we need to do some kind of pre-pre-XPCOM shutdown thing that runs this code.

Anyways, we're not detecting these leaks right now anyways, and I think it will work okay-ish for e10s where the content process lives about as long as the chrome process, so I'll probably leave dealing with that for followup work.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c95644d5a5f6

Andrew McCreight [:mccr8]

Assignee

Comment 20

•

10 years ago

Attached patch part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s. — Details — Splinter Review

This reworks how the Mochitest DOMWINDOW and DOCSHELL leak detector works. Rather than
collecting immediately in the top-level script, it sends a message to all processes
telling them to carry out collections. Each process prints out a message when it has
finished the collections. This message is used by the test harness to decide when windows
and docshells for that process should be have been destroyed.

Attachment #8563688 - Flags: review?(jmaher)

Andrew McCreight [:mccr8]

Assignee

Comment 21

•

10 years ago

Attached patch part 2 - Enable some devtools tests. — Details — Splinter Review

This enables a few of the debugger tests. The first test,
browser_dbg_aaa_run_first_leaktest.js, is intentionally disabled,
because it appears to have a legitimate leak.

Attachment #8563689 - Flags: review?(ejpbruel)

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Attachment #8560766 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Attachment #8560767 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Attachment #8545411 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1132717

Joel Maher ( :jmaher ) (UTC -8)

Comment 22

•

10 years ago

Comment on attachment 8563688 [details] [diff] [review]
part 1 - Make the DOMWINDOW and DOCSHELL leak detector work with e10s.

Review of attachment 8563688 [details] [diff] [review]:
-----------------------------------------------------------------

code wise I don't see any issues here.  How will this work on b2g and android?  Does this work well in single process land?

Attachment #8563688 - Flags: review?(jmaher) → review+

Andrew McCreight [:mccr8]

Assignee

Comment 23

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #22)
> code wise I don't see any issues here.  How will this work on b2g and
> android?  Does this work well in single process land?

The frame script is run on every process, including the main one, so that should handle single process just fine.  I think Android will just work like desktop single process.  I'll take a look at a log from my try run to see what is going on with B2G.

Andrew McCreight [:mccr8]

Assignee

Comment 24

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1ffecdc47c6d

Keywords: leave-open

Andrew McCreight [:mccr8]

Assignee

Comment 25

•

10 years ago

Backed out for causing bc3 docshell leaks on Windows and OSX.

https://hg.mozilla.org/integration/mozilla-inbound/rev/44d89ddc145a

Andrew McCreight [:mccr8]

Assignee

Comment 26

•

10 years ago

Bah, those failures were in my full try push, too, I just didn't notice it because it was listed after failures from another patch I didn't land.

Eddy Bruel [:ejpbruel]

Comment 27

•

10 years ago

Comment on attachment 8563689 [details] [diff] [review]
part 2 - Enable some devtools tests.

Review of attachment 8563689 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. That's a lot of debugger tests!

Patch LGTM, provided it is green on try ofc.

Attachment #8563689 - Flags: review?(ejpbruel) → review+

Andrew McCreight [:mccr8]

Assignee

Comment 28

•

10 years ago

The sequence for the last 3 releases of the last doc shell seems to go something like this:

1. We force a CC before we print the message, which destroys an nsHTMLDocument, which in turn causes an HttpChannelChild to die, which does a release on the docshell, putting us at a refcount of 2.
2. From a MessageChannel::DispatchAsyncMessage, we end up in HttpChannelChild::OnStopRequest, which eventually causes us to destroy a nsDocumentOpenInfo, which does another release on the docshell, putting us at a refcount of 1.
3. MessageChannel::NotifyMaybeChannelError() calls into PContentChild::OnChannelClose(), which then seems to start tearing down IPDL stuff, which eventually ends up destroying an HttpChannelChild, which finally ends up destroying the docshell.

The problem is that the message is printed out after 1, so we report this docshell as a leak.  Maybe if I rejigger where the shutdown CCs are done to somehow be after 3, when the ContentChild somehow knows it is going away, then that will fix this (and the other problems I've been having).

Andrew McCreight [:mccr8]

Assignee

Comment 29

•

10 years ago

The OnChannelClose() doesn't seem to be happening until XRE_ShutdownChildProcess(), so I think that just moving around where do GC/CCs and print out the message isn't going to be enough.

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1135956

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1135958

Andrew McCreight [:mccr8]

Assignee

Comment 30

•

10 years ago

Attached patch part 1b - In non-e10s, only run the shutdown leak detector in the parent process. — Details — Splinter Review

In non-e10s mode, the thumbnail tests create a new child process.  The lifetime of these is unrelated to when the test finishes, so the current method in my patch does not work right, either by leaking when it isn't clear if they are real, or by not running the checker at all.  This side-steps those issues by not running the DOM window leak checker in child processes if we're not running in e10s mode.  It still runs in the parent process.  I filed some followup bugs to look into this, but I think they are lower priority than other e10s leak work, as the thumbnail processes likely do not live very long.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=73813dfbb555

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Attachment #8568579 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 31

•

10 years ago

Comment on attachment 8568579 [details] [diff] [review]
part 1b - In non-e10s, only run the shutdown leak detector in the parent process.

Review of attachment 8568579 [details] [diff] [review]:
-----------------------------------------------------------------

this is only for browser-chrome.  Do we have a solution for mochitest-plain/chrome?

Attachment #8568579 - Flags: review?(jmaher) → review+

Andrew McCreight [:mccr8]

Assignee

Comment 32

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #31)
> this is only for browser-chrome.  Do we have a solution for
> mochitest-plain/chrome?

I think this leak detector only runs in browser-chrome.  Maybe it would be worth thinking about using in other test frame works.

Joel Maher ( :jmaher ) (UTC -8)

Comment 33

•

10 years ago

ok, thought I would ask for completeness :)

Andrew McCreight [:mccr8]

Assignee

Comment 34

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/24c4bac4ad05
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ad956fdf8c5

Ryan VanderMeulen [:RyanVM]

Comment 35

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/24c4bac4ad05
https://hg.mozilla.org/mozilla-central/rev/7ad956fdf8c5

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Keywords: leave-open

Resolution: --- → FIXED

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1139021

Andrew McCreight [:mccr8]

Assignee

Updated

•

10 years ago

Blocks: 1120569

You need to log in before you can comment on or make changes to this bug.