Last Comment Bug 683953 - Browser-chrome mochitests should show statistics about leaked DOMWindows and DocShells
: Browser-chrome mochitests should show statistics about leaked DOMWindows and ...
Status: RESOLVED FIXED
[MemShrink:P1]
:
Product: Testing
Classification: Components
Component: BrowserTest (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla13
Assigned To: Tim Taubert [:ttaubert]
:
Mentors:
Depends on: LifetimeTesting 730773 740242
Blocks: 730800 SeaMonkey_bc-leaks bc-leaks 730797 772388
  Show dependency treegraph
 
Reported: 2011-09-01 10:34 PDT by Tim Taubert [:ttaubert]
Modified: 2012-07-10 04:10 PDT (History)
19 users (show)
bugzillamozillaorg_serge_20140323: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch v1 (15.21 KB, patch)
2011-09-01 10:34 PDT, Tim Taubert [:ttaubert]
ted: feedback-
Details | Diff | Splinter Review
patch v2 (10.52 KB, patch)
2012-01-12 10:02 PST, Tim Taubert [:ttaubert]
no flags Details | Diff | Splinter Review
patch v3 (11.83 KB, patch)
2012-01-12 14:49 PST, Tim Taubert [:ttaubert]
ted: review+
Details | Diff | Splinter Review
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells (11.09 KB, patch)
2012-02-17 06:52 PST, Tim Taubert [:ttaubert]
no flags Details | Diff | Splinter Review
Part 2 - Add a docShell's history ID to the debug output (1.28 KB, patch)
2012-02-17 06:56 PST, Tim Taubert [:ttaubert]
justin.lebar+bug: review+
Details | Diff | Splinter Review
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells (11.09 KB, patch)
2012-02-17 15:29 PST, Tim Taubert [:ttaubert]
ted: review+
Details | Diff | Splinter Review

Description Tim Taubert [:ttaubert] 2011-09-01 10:34:38 PDT
Created attachment 557565 [details] [diff] [review]
patch v1

It would be nice if we could show statistics about DOMWindows / DocShells that are kept alive until the end of the test run. That would help with detecting new and fixing existing leaks.

I attached a patch that parses the log file after a test run and assigns leaked DOMWindows (with their urls) + DocShells to the tests that created them. My python skills are not the best so don't hesitate if you have some suggestions :) An example of the patch from a try server run can be seen here:

https://tbpl.mozilla.org/php/getParsedLog.php?id=6231192&full=1
(Just search for "ShutdownLeaks")

There are some open questions, though:

1) At the moment that's a "TEST-INFO". We should possibly make that a "TEST-UNEXPECTED-FAIL" (in the far future) but there are just too many leaks at the moment.

2) We rely on the ++/--DOMWINDOW/DOCSHELL lines. They're not printed with MOZ_QUIET=1 so that's why my patch touches those lines. Maybe there's another way of getting those messages(?) but that's what my script was written for in the first place.

3) Because we rely on debug statements that means we have these statistics in debug builds only (which would be sufficient IMHO). Maybe we'd want them in opt builds, too?
Comment 1 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-09-01 13:19:56 PDT
I think a better approach would be to use the API that bug 633670 will add to test  to see if mochitest windows go away when we expect them to (idk if running GC after every test will be prohibitively time-consuming, might have to figure something out here).
Comment 2 Tim Taubert [:ttaubert] 2011-09-05 10:54:41 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1)
> I think a better approach would be to use the API that bug 633670 will add
> to test  to see if mochitest windows go away when we expect them to (idk if
> running GC after every test will be prohibitively time-consuming, might have
> to figure something out here).

Yeah, I thought the same but running GC after each test might be somewhat time consuming. Maybe bug 633670 could also add an API to find out if an object would be collected on the next GC run? Dunno if that is feasible without a whole GC run.
Comment 3 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-09-05 14:32:06 PDT
(In reply to Tim Taubert [:ttaubert] from comment #2)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1)
> > I think a better approach would be to use the API that bug 633670 will add
> > to test  to see if mochitest windows go away when we expect them to (idk if
> > running GC after every test will be prohibitively time-consuming, might have
> > to figure something out here).
> 
> Yeah, I thought the same but running GC after each test might be somewhat
> time consuming. Maybe bug 633670 could also add an API to find out if an
> object would be collected on the next GC run? Dunno if that is feasible
> without a whole GC run.

That's unknowable without running the GC, unfortunately.
Comment 4 Dietrich Ayala (:dietrich) 2011-09-06 21:09:18 PDT
Would be good to know if the effect of post-test-GC is going to be a factor, so we're not working directly against the GoFast team.

This patch seems to work into the harness fairly un-intrusively, which is nice.

Tim, might be good to get :dbaron's feedback review on this as well since he's written the current leak detection automation.
Comment 5 Tim Taubert [:ttaubert] 2011-09-07 06:23:38 PDT
Comment on attachment 557565 [details] [diff] [review]
patch v1

(In reply to Dietrich Ayala (:dietrich) from comment #4)
> Would be good to know if the effect of post-test-GC is going to be a factor,
> so we're not working directly against the GoFast team.

Yeah, I'm afraid that these efforts could be undermined by a post-test-GC approach.

> This patch seems to work into the harness fairly un-intrusively, which is
> nice.

All it does is parse the log after a test run. Maybe we could even disable it on the build servers (with MOZ_QUIET=1 there wouldn't be anything to parse anyway). But it's really useful for fixing leaks on your local machine. You could just run the tests in a debug build and you'd have those statistics to check your patch against. At the moment we have no leak-until-shutdown detection at all, so we're not even noticing when we just wrote some piece of leaking code that is not catched by dbaron's leak detector.

Asking dbaron for feedback on this approach.
Comment 6 Justin Lebar (not reading bugmail) 2011-09-20 14:01:52 PDT
Comment on attachment 557565 [details] [diff] [review]
patch v1

Why do you need to get rid of the MOZ_QUIET environment variable?  Presumably this isn't set on the test machines.
Comment 7 Tim Taubert [:ttaubert] 2011-09-26 16:41:54 PDT
(In reply to Justin Lebar [:jlebar] from comment #6)
> Why do you need to get rid of the MOZ_QUIET environment variable? 
> Presumably this isn't set on the test machines.

Hmmm.. I'm quite sure that it occurred in the build logs I looked at but now I can't find it anymore. So this part could be removed from the patch.
Comment 8 Ted Mielczarek [:ted.mielczarek] 2011-09-27 13:10:01 PDT
Comment on attachment 557565 [details] [diff] [review]
patch v1

Review of attachment 557565 [details] [diff] [review]:
-----------------------------------------------------------------

I was starting to do a real thorough review then I realized this was just "feedback?", so I've just put some main points here. I think overall this is not a bad approach, but I worry about whether the methodology is sound without GCing after every test. Can we reliably get all windows to go away after each test without GC?

Overall the biggest thing I'd change about this is to add some code to make the DOMWINDOW spew go to a file instead of stdout, so you don't have to muck with the stdout-handling loop as much.

::: build/automation.py.in
@@ +766,4 @@
>        (line, didTimeout) = self.readWithTimeout(logsource, timeout)
>        while line != "" and not didTimeout:
> +        if logFile:
> +          logFile.write(line)

It seems like it would be nicer to add support for an environment variable that could be set to redirect the DOMWINDOW output to a file, instead of having to peel it out of stdout. Then you could simply parse that file instead of doing this.

::: build/automationutils.py
@@ +388,5 @@
>      return ["arch", "-arch", "i386"] + cmd
>    # otherwise just execute the command normally
>    return cmd
> +
> +class ShutdownLeakParser:

This could use a short docstring comment explaining what it does.

@@ +396,5 @@
> +
> +  def __del__(self):
> +    self.logFile.close()
> +
> +  def execute(self):

Maybe call this .parse() ?

@@ +402,5 @@
> +
> +    if len(leakingTests):
> +      total = 0
> +      for test in leakingTests:
> +        total += test["leakCount"]

You could just say:
total = sum(test["leakCount"] for test in leakingTests)

@@ +407,5 @@
> +
> +      self.logger.info("TEST-INFO | ShutdownLeaks | leaked %d DOMWindows and/or DocShells until shutdown", total)
> +
> +    for test in leakingTests:
> +      self.logger.info("")

This seems weird. You're just trying to get a blank line?

::: docshell/base/nsDocShell.cpp
@@ +778,5 @@
>  
>  #ifdef DEBUG
>    // We're counting the number of |nsDocShells| to help find leaks
>    ++gNumberOfDocShells;
> +  printf("++DOCSHELL %p == %ld\n", (void*) this, gNumberOfDocShells);

Er, why are you removing this feature?

::: testing/mochitest/runtests.py
@@ +654,5 @@
>        timeout = 330.0 # default JS harness timeout is 300 seconds
>  
> +    # it's a debug build, we can parse leaked DOMWindows and docShells
> +    if Automation.IS_DEBUG_BUILD:
> +      tmpfd, logFileName = tempfile.mkstemp(suffix='debuglog')

I usually use NamedTemporaryFile, but it doesn't make a whole lot of difference, really:
http://docs.python.org/library/tempfile.html#tempfile.NamedTemporaryFile
Comment 9 Martijn Wargers [:mwargers] (not working for Mozilla) 2011-09-27 14:18:32 PDT
Fennec on Windows leaks just by opening/closing, see bug 679709. This would make it perma-orange.
Comment 10 Dietrich Ayala (:dietrich) 2011-10-20 14:54:28 PDT
IIUC, new leaks of this class could be checked-in, and there's no automation tracking and properly coloring the tree for it. Is that a correct understanding?
Comment 11 Dão Gottwald [:dao] 2011-10-20 23:18:53 PDT
(In reply to Dietrich Ayala (:dietrich) from comment #10)
> IIUC, new leaks of this class could be checked-in, and there's no automation
> tracking and properly coloring the tree for it. Is that a correct
> understanding?

yes
Comment 12 Dietrich Ayala (:dietrich) 2011-10-21 08:14:19 PDT
Memshrink team, can we make this P1? Your progress could be being rolled back on a daily basis, since we don't actually catch these regressions. Dao and Tim have been periodically doing these manually, but there's a high human cost, and potential to miss regressions.

Now that we have other trees to fail on, turning this on should not result in an m-c perma-shutdown as long as we quickly identify and mark the permaoranges (which we could corral owners or assign Memshrinkers to work on).
Comment 13 Tim Taubert [:ttaubert] 2012-01-12 10:02:36 PST
Created attachment 588084 [details] [diff] [review]
patch v2

(In reply to Ted Mielczarek [:ted, :luser] from comment #8)
> I was starting to do a real thorough review then I realized this was just
> "feedback?", so I've just put some main points here. I think overall this is
> not a bad approach, but I worry about whether the methodology is sound
> without GCing after every test. Can we reliably get all windows to go away
> after each test without GC?

This patch just tracks windows and docShells created by tests and checks if they're still around after the GC that happens on test suite shutdown. So we don't need a GC after each test.

> Overall the biggest thing I'd change about this is to add some code to make
> the DOMWINDOW spew go to a file instead of stdout, so you don't have to muck
> with the stdout-handling loop as much.

I tried using the NSPR logging facility but that didn't work out well. I need the '++DOMWINDOW/DOCSHELL' lines in sync with the actual 'TEST-START/FINISH' lines. So there's only this one possibility to read our own stdout.

What I changed is that instead of a logFileName argument the patch now passes a logger argument. So we don't need to save the whole stdout to a file and read it again but rather we can pass in a logger that parses every line when it's read from stdout.

> ::: build/automationutils.py
> @@ +388,5 @@
> >      return ["arch", "-arch", "i386"] + cmd
> >    # otherwise just execute the command normally
> >    return cmd
> > +
> > +class ShutdownLeakParser:
> 
> This could use a short docstring comment explaining what it does.

Done.

> @@ +396,5 @@
> > +
> > +  def __del__(self):
> > +    self.logFile.close()
> > +
> > +  def execute(self):
> 
> Maybe call this .parse() ?

Done.

> @@ +402,5 @@
> > +
> > +    if len(leakingTests):
> > +      total = 0
> > +      for test in leakingTests:
> > +        total += test["leakCount"]
> 
> You could just say:
> total = sum(test["leakCount"] for test in leakingTests)

Done.

> @@ +407,5 @@
> > +
> > +      self.logger.info("TEST-INFO | ShutdownLeaks | leaked %d DOMWindows and/or DocShells until shutdown", total)
> > +
> > +    for test in leakingTests:
> > +      self.logger.info("")
> 
> This seems weird. You're just trying to get a blank line?

Yeah... fixed.

> ::: docshell/base/nsDocShell.cpp
> @@ +778,5 @@
> >  
> >  #ifdef DEBUG
> >    // We're counting the number of |nsDocShells| to help find leaks
> >    ++gNumberOfDocShells;
> > +  printf("++DOCSHELL %p == %ld\n", (void*) this, gNumberOfDocShells);
> 
> Er, why are you removing this feature?

Not sure (anymore) what I've done there. Reverted those changes.

> ::: testing/mochitest/runtests.py
> @@ +654,5 @@
> >        timeout = 330.0 # default JS harness timeout is 300 seconds
> >  
> > +    # it's a debug build, we can parse leaked DOMWindows and docShells
> > +    if Automation.IS_DEBUG_BUILD:
> > +      tmpfd, logFileName = tempfile.mkstemp(suffix='debuglog')
> 
> I usually use NamedTemporaryFile, but it doesn't make a whole lot of
> difference, really:
> http://docs.python.org/library/tempfile.html#tempfile.NamedTemporaryFile

I removed the whole thing and all the stdout data is now passed to a logger instead of being written to a log file.

--

The current threshold is 150 leaked DOMWindows and DocShells which seems way too low. I pushed to try and we seem to have a varying count:

https://tbpl.mozilla.org/?tree=Try&rev=5aeaf8aee9fb

The lowest, OS X:
TEST-UNEXPECTED-FAIL | ShutdownLeaks | leaked 227 DOMWindows and/or DocShells until shutdown

The highest, Windows:
TEST-UNEXPECTED-FAIL | ShutdownLeaks | leaked 4180 DOMWindows and/or DocShells until shutdown
Comment 14 Dão Gottwald [:dao] 2012-01-12 10:59:02 PST
(In reply to Tim Taubert [:ttaubert] from comment #13)
> The lowest, OS X:
> TEST-UNEXPECTED-FAIL | ShutdownLeaks | leaked 227 DOMWindows and/or
> DocShells until shutdown
> 
> The highest, Windows:
> TEST-UNEXPECTED-FAIL | ShutdownLeaks | leaked 4180 DOMWindows and/or
> DocShells until shutdown

I've now and again looked at the Windows logs and have never seen such high numbers. Also, the number I'm seeing on Linux is about 135.
Comment 15 Marco Bonardo [::mak] 2012-01-12 11:16:29 PST
Dao, do you test on Win7? there is quite a difference between xp and win7 on that tryrun, 241 vs 4180. Weird, either wrong measure or some of the win7 specific features leak a bunch.
Comment 16 Dão Gottwald [:dao] 2012-01-12 11:20:24 PST
(In reply to Dão Gottwald [:dao] from comment #14)
> the number I'm seeing on Linux is about 135.

That's 129 windows and 6 docshells. On Windows XP on mozilla-central I see 131 + 6. This might be due to a slightly different set of tests being run.
Comment 17 Tim Taubert [:ttaubert] 2012-01-12 12:04:11 PST
Not sure what I'm doing wrong... investigating. Looking manually definitely confirms your counts.
Comment 18 Tim Taubert [:ttaubert] 2012-01-12 14:49:28 PST
Created attachment 588194 [details] [diff] [review]
patch v3

Alright, I was using the pointer address as the window identifier, which seems a bad idea because it gets re-used and obviously a _lot_ more on Windows 7 than XP. I'm now using the serial and added one to the docShells' debug outputs. Waiting for try.
Comment 19 Tim Taubert [:ttaubert] 2012-01-13 02:29:41 PST
Ok, the counts look a lot saner now (DOMWindows + DocShells):

https://tbpl.mozilla.org/?tree=Try&rev=14ba0deab13b

They don't exceed the threshold of 150 leaks so they're not shown as oranges. The counts seem pretty stable so I think we should apply a per-OS limit for leaks.

Linux:   123, 123, 123
Linux64: 123, 123, 123
OS X:     86,  84,  84
OS X 64:  94,  92,  96
Win:     125, 125, 125
Win XP:  123, 123, 123

(That is three runs per OS.)
Comment 20 Tim Taubert [:ttaubert] 2012-01-25 09:14:37 PST
Hey Ted, can you please review this patch? Or do a quick feedback pass if you think it's not ready yet? If you're too busy can you please suggest someone else to review it? Thanks!
Comment 21 Ted Mielczarek [:ted.mielczarek] 2012-01-31 05:04:01 PST
I'll review this today. Sorry for the delay.
Comment 22 Ted Mielczarek [:ted.mielczarek] 2012-01-31 10:28:53 PST
Comment on attachment 588194 [details] [diff] [review]
patch v3

Review of attachment 588194 [details] [diff] [review]:
-----------------------------------------------------------------

I have a bunch of comments here, but they're mostly nits, so r=me with those fixed.

::: build/automationutils.py
@@ +452,5 @@
>      return ["arch", "-arch", "i386"] + cmd
>    # otherwise just execute the command normally
>    return cmd
> +
> +class ShutdownLeakLogger:

class ShutdownLeakLogger(object):
to make this a new-style class (probably doesn't change anything in this implementation, but it's good practice).

@@ +460,5 @@
> +  the GC) to the tests that created them and prints leak statistics.
> +  """
> +  tests = []
> +  leakedWindows = {}
> +  leakedDocShells = {}

This probably doesn't matter for this particular class, but assigning to class members like this means that all instances of this class will share the same object here. You can fix this by assigning these in the constructor instead.

@@ +477,5 @@
> +      elif line.startswith("--DOCSHELL"):
> +        match = re.match("--DOCSHELL .*\[id = (.+?)\]", line)
> +        self.leakedDocShells[match.group(1)] = None
> +    elif line.startswith("TEST-START"):
> +      fileName = line[13:].strip().replace("chrome://mochitests/content/browser/", "")

the [13:] seems really brittle. Can you make the parsing a little more flexible?

@@ +478,5 @@
> +        match = re.match("--DOCSHELL .*\[id = (.+?)\]", line)
> +        self.leakedDocShells[match.group(1)] = None
> +    elif line.startswith("TEST-START"):
> +      fileName = line[13:].strip().replace("chrome://mochitests/content/browser/", "")
> +      self.currentTest = {"fileName": fileName, "windows": [], "docShells": []}

If all you're saving is the serial number of windows and docShells, you could use set() here instead of lists, then use .add(), and your "x in test["windows"]" test will continue to work fine, but be somewhat faster.

@@ +481,5 @@
> +      fileName = line[13:].strip().replace("chrome://mochitests/content/browser/", "")
> +      self.currentTest = {"fileName": fileName, "windows": [], "docShells": []}
> +      self.tests.append(self.currentTest)
> +    elif line.startswith("INFO TEST-END"):
> +      self.currentTest = None

I wonder if it would make sense to track window/docShell cleanup as well and remove self.currentTests from self.tests if its .windows and .docShells are empty. That way you don't have to hang on to thousands of test, only ones whose windows outlive their test run. (Or is this all tests, does the test window always outlive its own test?)

@@ +487,5 @@
> +      self.seenShutdown = True
> +    elif self.currentTest:
> +      if line.startswith("++DOMWINDOW"):
> +        match = re.match("\+\+DOMWINDOW == \d+.*\[serial = (.+?)\]", line)
> +        self.currentTest["windows"].append(match.group(1))

This could probably use a comment saying that we're saving the serial number of the window. (And likewise for the docshells below.)

@@ +490,5 @@
> +        match = re.match("\+\+DOMWINDOW == \d+.*\[serial = (.+?)\]", line)
> +        self.currentTest["windows"].append(match.group(1))
> +      elif line.startswith("++DOCSHELL"):
> +        match = re.match("\+\+DOCSHELL .*\[id = (.+?)\]", line)
> +        self.currentTest["docShells"].append(match.group(1))

If we somehow create a window or docShell in between tests (if the test harness does it, say), we won't catch that, right?

@@ +495,5 @@
> +
> +  def parse(self):
> +    leakingTests = self._parseLeakingTests()
> +
> +    if len(leakingTests):

You can just say "if leakingTests:"

@@ +497,5 @@
> +    leakingTests = self._parseLeakingTests()
> +
> +    if len(leakingTests):
> +      total = sum(test["leakCount"] for test in leakingTests)
> +      msgType = "INFO" if total < 130 else "UNEXPECTED-FAIL"

Can you make 130 into a variable somewhere? I assume we'll want to change this at some point, it would make it easier to find.

@@ +498,5 @@
> +
> +    if len(leakingTests):
> +      total = sum(test["leakCount"] for test in leakingTests)
> +      msgType = "INFO" if total < 130 else "UNEXPECTED-FAIL"
> +      self.logger.info("TEST-%s | ShutdownLeaks | leaked %d DOMWindows and/or DocShells until shutdown", msgType, total)

Is there any value in counting DOMWindows and DocShells separately?

@@ +504,5 @@
> +    for test in leakingTests:
> +      self.logger.info("\n[%s]", test["fileName"])
> +
> +      for url, count in self._zipLeakedWindows(test["leakedWindows"]):
> +        self.logger.info("  %dx [%s]", count, url)

You should probably put "window" in this output line somewhere, or a "windows" line before these are output.

@@ +506,5 @@
> +
> +      for url, count in self._zipLeakedWindows(test["leakedWindows"]):
> +        self.logger.info("  %dx [%s]", count, url)
> +
> +      if len(test["leakedDocShells"]):

Again, just |if test["leakedDocShells"]:| works.

@@ +507,5 @@
> +      for url, count in self._zipLeakedWindows(test["leakedWindows"]):
> +        self.logger.info("  %dx [%s]", count, url)
> +
> +      if len(test["leakedDocShells"]):
> +        self.logger.info("  %dx docShells", len(test["leakedDocShells"]))

Is the intent here to print "10x docShells"? I'd just leave out the x personally.

@@ +524,5 @@
> +    return sorted(leakingTests, key=itemgetter("leakCount"), reverse=True)
> +
> +  def _zipLeakedWindows(self, leakedWindows):
> +    counts = []
> +    counted = {}

Use a set() here for counted.

::: docshell/base/nsDocShell.cpp
@@ +783,5 @@
>    // We're counting the number of |nsDocShells| to help find leaks
>    ++gNumberOfDocShells;
>    if (!PR_GetEnv("MOZ_QUIET")) {
> +      printf("++DOCSHELL %p == %ld [id = %ld]\n", (void*) this,
> +             gNumberOfDocShells, mHistoryID);

You probably need someone else to sign off on the changes to nsDocShell.cpp
Comment 23 Tim Taubert [:ttaubert] 2012-02-17 06:52:39 PST
Created attachment 598224 [details] [diff] [review]
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells

Asking for review again because I changed some parts and did some refactoring to especially make the log() method easier to read.

(In reply to Ted Mielczarek [:ted, :luser] from comment #22)
> class ShutdownLeakLogger(object):
> to make this a new-style class (probably doesn't change anything in this
> implementation, but it's good practice).

Done.

> This probably doesn't matter for this particular class, but assigning to
> class members like this means that all instances of this class will share
> the same object here. You can fix this by assigning these in the constructor
> instead.

Done.

> the [13:] seems really brittle. Can you make the parsing a little more
> flexible?

Done.

> If all you're saving is the serial number of windows and docShells, you
> could use set() here instead of lists, then use .add(), and your "x in
> test["windows"]" test will continue to work fine, but be somewhat faster.

Done.

> I wonder if it would make sense to track window/docShell cleanup as well and
> remove self.currentTests from self.tests if its .windows and .docShells are
> empty. That way you don't have to hang on to thousands of test, only ones
> whose windows outlive their test run. (Or is this all tests, does the test
> window always outlive its own test?)

I think there definitely are tests that don't keep things alive after they ended. The current patch does now remove windows and docShells if they're destroyed while the test still runs and doesn't add the test if it didn't leak anything.

> If we somehow create a window or docShell in between tests (if the test
> harness does it, say), we won't catch that, right?

Yes, we wouldn't catch that though I'm not sure if we should track that.

> Can you make 130 into a variable somewhere? I assume we'll want to change
> this at some point, it would make it easier to find.

Done.

> Is there any value in counting DOMWindows and DocShells separately?

Why not, done.

> You should probably put "window" in this output line somewhere, or a
> "windows" line before these are output.

Done.

> Is the intent here to print "10x docShells"? I'd just leave out the x
> personally.

Done.

> Use a set() here for counted.


Done.

> You probably need someone else to sign off on the changes to nsDocShell.cpp

Split off to a second patch.
Comment 24 Tim Taubert [:ttaubert] 2012-02-17 06:56:40 PST
Created attachment 598225 [details] [diff] [review]
Part 2 - Add a docShell's history ID to the debug output
Comment 25 Justin Lebar (not reading bugmail) 2012-02-17 07:54:02 PST
(In reply to Dietrich Ayala (:dietrich) from comment #12)
> Memshrink team, can we make this P1?

Marking for re-triage, per this request.
Comment 26 Justin Lebar (not reading bugmail) 2012-02-17 08:10:12 PST
Comment on attachment 598225 [details] [diff] [review]
Part 2 - Add a docShell's history ID to the debug output

You're welcome to make this printf nicer, if you'd like.  e.g.:

"DocShell created (%p, id=%d).  %d docshell(s) currently alive."
Comment 27 Tim Taubert [:ttaubert] 2012-02-17 13:24:17 PST
(In reply to Justin Lebar [:jlebar] from comment #26)
> You're welcome to make this printf nicer, if you'd like.  e.g.:
> 
> "DocShell created (%p, id=%d).  %d docshell(s) currently alive."

I kind of liked that DOMWINDOW and DOCSHELL printfs have the same format, like "[key = value]" so I can parse them all the same. I'm not sure if there might be other log parsing tools around that rely on the current format?
Comment 28 Justin Lebar (not reading bugmail) 2012-02-17 15:25:14 PST
Oh, that's a good point about matching the DOMWINDOW printfs.  Carry on, then!
Comment 29 Tim Taubert [:ttaubert] 2012-02-17 15:29:03 PST
Created attachment 598400 [details] [diff] [review]
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells

Removed the use of a Python 2.6 feature.
Comment 30 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-02-21 14:08:34 PST
We decided to make this a P2 again because most of the bugs that I've seen for browser-chrome leaks are bugs in tests and not in our code.  If that's not the case, let me know.
Comment 31 Dão Gottwald [:dao] 2012-02-21 14:40:54 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> We decided to make this a P2 again because most of the bugs that I've seen
> for browser-chrome leaks are bugs in tests and not in our code.  If that's
> not the case, let me know.

I wouldn't be so sure about this. We did see quite a few real bugs over time. Bugs in tests might be the majority, but then it's probably something like a 60% majority rather than 95%.
Comment 32 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-02-21 14:42:17 PST
(In reply to Dão Gottwald [:dao] from comment #31)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> > We decided to make this a P2 again because most of the bugs that I've seen
> > for browser-chrome leaks are bugs in tests and not in our code.  If that's
> > not the case, let me know.
> 
> I wouldn't be so sure about this. We did see quite a few real bugs over
> time. Bugs in tests might be the majority, but then it's probably something
> like a 60% majority rather than 95%.

Ok, thanks for the numbers.  I'll make this a P1 since it's close to parity then.
Comment 33 Ted Mielczarek [:ted.mielczarek] 2012-02-22 05:15:46 PST
Comment on attachment 598400 [details] [diff] [review]
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells

Review of attachment 598400 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, just really minor nits.

::: build/automationutils.py
@@ +508,5 @@
> +    id = self._parseValue(line, "serial")
> +
> +    if self.currentTest:
> +      windows = self.currentTest["windows"]
> +      windows.add(id) if created else windows.discard(id)

This is not idiomatic Python. You should use the if/else ternary only for expressions. For statements like this, just use normal if/else blocks.

@@ +518,5 @@
> +    id = self._parseValue(line, "id")
> +
> +    if self.currentTest:
> +      docShells = self.currentTest["docShells"]
> +      docShells.add(id) if created else docShells.discard(id)

Same thing here.

@@ +523,5 @@
> +    elif self.seenShutdown and not created:
> +      self.leakedDocShells.add(id)
> +
> +  def _parseValue(self, line, name):
> +    return re.search("\[" + name + " = (.+?)\]", line).group(1)

I'd usually write this like "\[%s = (.+?)\]" % name, but it's not really that bad either way. (If you were doing >1 substitution, I'd always recommend % formatting.)
Comment 36 Serge Gautherie (:sgautherie) 2012-02-27 01:30:11 PST
Comment on attachment 598400 [details] [diff] [review]
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells

Review of attachment 598400 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/automationutils.py
@@ +479,5 @@
> +      self.currentTest = {"fileName": fileName, "windows": set(), "docShells": set()}
> +    elif line.startswith("INFO TEST-END"):
> +      # don't track a test if no windows or docShells leaked
> +      if self.currentTest["windows"] and self.currentTest["docShells"]:
> +        self.tests.append(self.currentTest)

I just had a quick glance, but just in case:
shouldn't it be "or" rather than "and", wrt the comment?
Comment 37 Serge Gautherie (:sgautherie) 2012-02-27 01:35:39 PST
Comment on attachment 598400 [details] [diff] [review]
Part 1 - Parse log output and collect statistics about leaked DOMWindows and DocShells

Review of attachment 598400 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/automationutils.py
@@ +458,5 @@
> +  Parses the mochitest run log when running a debug build, assigns all leaked
> +  DOM windows (that are still around after test suite shutdown, despite running
> +  the GC) to the tests that created them and prints leak statistics.
> +  """
> +  MAX_LEAK_COUNT = 130

Should a bug be filed to regularly Try lowering this value?
Comment 38 Tim Taubert [:ttaubert] 2012-02-27 01:45:48 PST
(In reply to Serge Gautherie (:sgautherie) from comment #36)
> I just had a quick glance, but just in case:
> shouldn't it be "or" rather than "and", wrt the comment?

and/or - not sure there's a different meaning.

(In reply to Serge Gautherie (:sgautherie) from comment #37)
> > +  MAX_LEAK_COUNT = 130
> 
> Should a bug be filed to regularly Try lowering this value?

I reduced the value to 120 before landing (based on the current numbers) and will file a bug about applying a per-OS threshold. We will check and compare the numbers from time to time and file a bug if needed. For now the main goal should be to not introduce new leaks.
Comment 39 Serge Gautherie (:sgautherie) 2012-02-27 02:14:47 PST
(In reply to Tim Taubert [:ttaubert] from comment #38)

> (In reply to Serge Gautherie (:sgautherie) from comment #36)
> > shouldn't it be "or" rather than "and", wrt the comment?
> 
> and/or - not sure there's a different meaning.

As I read this "if" line, it looks like a test is appended if it leaks both window(s) and docshell(s) only. Am I just misunderstanding this line? Or is that intended?


> (In reply to Serge Gautherie (:sgautherie) from comment #37)
> > > +  MAX_LEAK_COUNT = 130
> > 
> > Should a bug be filed to regularly Try lowering this value?
> 
> I reduced the value to 120 before landing (based on the current numbers) and

(I noticed it.)

> will file a bug about applying a per-OS threshold. We will check and compare

(Why not.)

> the numbers from time to time and file a bug if needed. For now the main

I was suggesting to just file a/1 bug and report/lower in it,
so it's not forgotten, everyone can follow, other apps can compare to it, etc.

> goal should be to not introduce new leaks.

(Sure.)
Comment 40 Tim Taubert [:ttaubert] 2012-02-27 02:21:30 PST
(In reply to Serge Gautherie (:sgautherie) from comment #39)
> As I read this "if" line, it looks like a test is appended if it leaks both
> window(s) and docshell(s) only. Am I just misunderstanding this line? Or is
> that intended?

Oops, I didn't check the if condition, only the comment. That's not indented, it should be "or" as well. Thanks for pointing that out.

> I was suggesting to just file a/1 bug and report/lower in it,
> so it's not forgotten, everyone can follow, other apps can compare to it,
> etc.

I think a meta bug that tracks these efforts does make sense.
Comment 41 Serge Gautherie (:sgautherie) 2012-02-27 02:40:28 PST
(In reply to Tim Taubert [:ttaubert] from comment #40)

> Oops, I didn't check the if condition, only the comment. That's not
> indented, it should be "or" as well. Thanks for pointing that out.

You filed bug 730773.

> I think a meta bug that tracks these efforts does make sense.

Meta would be fine...
Comment 42 Dão Gottwald [:dao] 2012-02-27 05:48:16 PST
--DOMWINDOW == 0 (0x49106e8) [serial = 1608] [outer = (nil)] [url = chrome://browser/content/bookmarks/bookmarksPanel.xul]

This is bug 728426 -- Any idea why it doesn't show up in the output generated here?
Comment 43 Tim Taubert [:ttaubert] 2012-02-27 06:18:14 PST
(In reply to Dão Gottwald [:dao] from comment #42)
> This is bug 728426 -- Any idea why it doesn't show up in the output
> generated here?

I'd say that's because of bug 730773. Probably there is no docShell leaked (at least temporarily) and that's why we don't track this test.

Yeah, that's it. I pushed bug 730773 to try:

https://tbpl.mozilla.org/?tree=Try&rev=602fc521eb1a

[browser/components/places/tests/browser/browser_views_liveupdate.js]
  1 window(s) [url = about:blank]
  1 window(s) [url = chrome://browser/content/bookmarks/bookmarksPanel.xul]
Comment 44 Serge Gautherie (:sgautherie) 2012-02-27 08:15:43 PST
Maybe dumbs questions, but:

How is this run for browser-chrome suite only?

Does it print a "TEST-PASS | automationutils.processLeakLog() | no leaks detected!"-like line when no leaks? (I think it doesn't, but should.)

Note You need to log in before you can comment on or make changes to this bug.