Bring back the shutdown leak detector

RESOLVED FIXED in mozilla28

Status

Testing
General
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: emorley, Assigned: ttaubert)

Tracking

(Blocks: 1 bug)

Trunk
mozilla28
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(3 attachments, 6 obsolete attachments)

(Reporter)

Description

5 years ago
The cycle collection analysis added by bug 728294 should have caught the recent issues in bug 932781 and dependents, but didn't.

My money is on bug 863447 regressing it:
https://hg.mozilla.org/mozilla-central/rev/bed3081376ca

Either way, we must fix it & then the inevitable hoards of leaks we have accumulated in the meantime, before the trees can reopen.
(Reporter)

Comment 1

5 years ago
(This bug is one of three holding the trees closed - bug 932781 comment 11)
Severity: critical → blocker
Duplicate of this bug: 932891
Whiteboard: [MemShrink]
Blocks: 658738

Comment 3

5 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #3)
> The cycle collection analysis added by bug 728294 should have caught the
> recent issues in bug 932781 and dependents, but didn't.
> 
> My money is on bug 863447 regressing it:
> https://hg.mozilla.org/mozilla-central/rev/bed3081376ca

Try push w/ backout: https://tbpl.mozilla.org/?tree=Try&rev=b86779999d73

I'll try this locally as well but am not sure that'll actually be faster than this try push...

Comment 4

5 years ago
(In reply to :Gijs Kruitbosch from comment #3)
> (In reply to Ed Morley [:edmorley UTC+1] from comment #3)
> > The cycle collection analysis added by bug 728294 should have caught the
> > recent issues in bug 932781 and dependents, but didn't.
> > 
> > My money is on bug 863447 regressing it:
> > https://hg.mozilla.org/mozilla-central/rev/bed3081376ca
> 
> Try push w/ backout: https://tbpl.mozilla.org/?tree=Try&rev=b86779999d73
> 
> I'll try this locally as well but am not sure that'll actually be faster
> than this try push...

FWIW, locally at least this doesn't help

Comment 5

5 years ago
Tim and I are looking at this
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
(Assignee)

Comment 6

5 years ago
I wonder, isn't bug 932867 a different case of leak than what the CC graph analyzer is supposed to catch? I thought the analyzer catches when there is some JS property that holds a reference to a DOM Element or Window that has been removed from the DOM Tree.

How ever the FrameWorker stuff inserts an iframe into the hiddenDOMWindow and just never uses it again. So it's a valid DOM Element kept alive but the hiddenWindow, we just never use it again.

Is that a leak we should catch?

Comment 7

5 years ago
That is not a "leak". We just end up creating more and more iframes, but never remove them from
DOM.
We could and should perhaps count window objects which are alive, and make sure that count doesn't
increase much over the time.
(Assignee)

Comment 8

5 years ago
Bug 839193 (which turns up in khuey's log [1] as well) seems to have introduced a test leak by never removing observers, right? Shouldn't that be detected by our leak detection? It also doesn't show up in about:cc.

[1] https://khuey.pastebin.mozilla.org/3378934

Comment 9

5 years ago
(In reply to Tim Taubert [:ttaubert] from comment #8)
> Bug 839193 (which turns up in khuey's log [1] as well) seems to have
> introduced a test leak by never removing observers, right? Shouldn't that be
> detected by our leak detection? It also doesn't show up in about:cc.
> 
> [1] https://khuey.pastebin.mozilla.org/3378934

Status update: Tim and I found that enabling allTraces() did make the test suite report a bunch of stuff as leaking, but that is likely overreporting (ie reporting things which aren't actually leaks). Just blanket using that doesn't seem like it'd help us here.

Tim also noticed that when http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#1629 runs, if (MOZ_UNLIKELY(cb.WantDebugInfo())) is false. This by itself may not affect things too much, but it's strange that that flag is not present, as AFAICT it shouldn't be.
(In reply to Tim Taubert [:ttaubert] from comment #8)
> Bug 839193 (which turns up in khuey's log [1] as well) seems to have
> introduced a test leak by never removing observers, right? Shouldn't that be
> detected by our leak detection? It also doesn't show up in about:cc.
What kind of observers? We try to mark strong observers black "certainly alive", since we have
tons of stuff using observer service and we need to get the normal CC graphs small.
It is a thin line between a leak and valid observer service use.
I guess we should put back part 1 from bug 728294. That would catch some leaks.
But we need something where we check the number of windows more often.
fwiw, I don't think bug 863447 has blame here, it is not faking or hiding anything, we care to detect things that are indefinitely leaked, not things that are "leaked" for some milliseconds while async work completes.
(In reply to Marco Bonardo [:mak] from comment #12)
> fwiw, I don't think bug 863447 has blame here, it is not faking or hiding
> anything, we care to detect things that are indefinitely leaked, not things
> that are "leaked" for some milliseconds while async work completes.

I agree.  This appears to be a different "type" of leak that we weren't checking before.  Based on that I don't think we should block reopening the tree on this.
(Assignee)

Comment 14

5 years ago
+1. This is an old feature we should bring back but this could take a little while to get right with all the try runs etc. I don't think we should block trees any further.
(Assignee)

Comment 15

5 years ago
Created attachment 825175 [details] [diff] [review]
Bring back the shutdown leak detector

This seems to work fine and is mostly code that we removed in bug 728294. I'm not sure how to detect if we're in a debug build (which isn't actually necessary as we just don't have any lines to parse) and we should not run with webapprtChrome=true.
Attachment #825175 - Flags: feedback?(ted)
(Assignee)

Comment 16

5 years ago
Created attachment 825177 [details] [diff] [review]
Fix leaks in dom/ mochitests

Both tests are leaking according to the shutdown leak detector. They're not anymore with these fixes applied.
Attachment #825177 - Flags: review?(bugs)

Updated

5 years ago
Attachment #825177 - Flags: review?(bugs) → review+
(Reporter)

Updated

5 years ago
Whiteboard: [MemShrink] → [MemShrink] [leave open]
(Assignee)

Comment 18

5 years ago
Here's a try run with most of the recent leak fixes and the leak detection patch:

https://tbpl.mozilla.org/?tree=Try&rev=2a58c0f29f88
(Assignee)

Comment 19

5 years ago
Created attachment 825296 [details] [diff] [review]
Bring back the shutdown leak detector, v2

Added some code that removes the false positives caused by the sidebar and the BrowserNewTabPreloader.
Attachment #825175 - Attachment is obsolete: true
Attachment #825175 - Flags: feedback?(ted)
Attachment #825296 - Flags: feedback?(ted)
Comment on attachment 825296 [details] [diff] [review]
Bring back the shutdown leak detector, v2

Review of attachment 825296 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/automationutils.py
@@ +558,5 @@
> +    if line[2:11] == "DOMWINDOW":
> +      self._logWindow(line)
> +    elif line[2:10] == "DOCSHELL":
> +      self._logDocShell(line)
> +    elif line.startswith("TEST-START"):

Kinda wish this was using the existing currentTest stuff, but it's not that big of a deal.

::: testing/mochitest/runtests.py
@@ +833,5 @@
>      args.append('-foreground')
>      if testUrl:
>         args.append(testUrl)
>  
> +    #if debug build and not webapprtChrome

You can say if mozinfo.info["debug"] and not options.webapprtChrome:

@@ +1052,5 @@
>    ### output processing
>  
>    class OutputHandler(object):
>      """line output handler for mozrunner"""
> +    def __init__(self, harness, utilityPath, symbolsPath=None, dump_screen_on_timeout=True, shutdownLeaks = None):

nit: style here seems to be no spaces around =
Attachment #825296 - Flags: feedback?(ted) → feedback+
(Assignee)

Comment 21

5 years ago
Created attachment 825366 [details] [diff] [review]
Bring back the shutdown leak detector, v3

Here's a new patch that fixes the false positive for the TabView window. It doesn't address ted's comments, yet.
Attachment #825296 - Attachment is obsolete: true
Updating the summary to what seems to be the current goal.
Summary: Work out why the cycle collection leak analysis from bug 728294 is no longer working → Bring back the shutdown leak detector
(Reporter)

Updated

5 years ago
Severity: blocker → critical
(Assignee)

Updated

5 years ago
Depends on: 933699
(Assignee)

Comment 24

5 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #20)
> > +    #if debug build and not webapprtChrome
> 
> You can say if mozinfo.info["debug"] and not options.webapprtChrome:

'options' isn't available in runApp() so I think I'll just pass a 'webapprtChrome' argument?
Ah. Yeah, do that.
Depends on: 932880
(Assignee)

Comment 26

5 years ago
Created attachment 826089 [details] [diff] [review]
Bring back the shutdown leak detector, v4

Patch addresses Ted's comments and should fix some more false positives. Needs fx-team tip (until that is merged).
Attachment #825366 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Depends on: 933551
(Assignee)

Updated

5 years ago
Depends on: 934091
(Assignee)

Comment 27

5 years ago
Created attachment 826283 [details] [diff] [review]
Bring back the shutdown leak detector, v5

Latest patch that needs the patch from bug 934091 to work.
Attachment #826089 - Attachment is obsolete: true
(Assignee)

Comment 28

5 years ago
Created attachment 826307 [details] [diff] [review]
Fix leaks in /browser/base/content/test/general/

Some more leak fixes.
Attachment #826307 - Flags: review?(jaws)
Comment on attachment 826307 [details] [diff] [review]
Fix leaks in /browser/base/content/test/general/

>+  function windowClosed(subject) {
>+    if (subject == win) {
>+      Services.obs.removeObserver(newDocumentShown, "document-shown");
>+      Services.obs.removeObserver(windowClosed, "domwindowclosed");
>+    }
>+  }
>+
>+  // Make sure to remove the 'document-shown' observer in case the window
>+  // is being closed right after it was opened to avoid leaking.
>   Services.obs.addObserver(newDocumentShown, "document-shown", false);
>+  Services.obs.addObserver(windowClosed, "domwindowclosed", false);

Can you use the unload event?
(Assignee)

Comment 30

5 years ago
(In reply to Dão Gottwald [:dao] from comment #29)
> >+  // Make sure to remove the 'document-shown' observer in case the window
> >+  // is being closed right after it was opened to avoid leaking.
> >   Services.obs.addObserver(newDocumentShown, "document-shown", false);
> >+  Services.obs.addObserver(windowClosed, "domwindowclosed", false);
> 
> Can you use the unload event?

I just tried and it doesn't work. Looks like we're closing the window way too early to even receive an unload event.
(Assignee)

Updated

5 years ago
Depends on: 934206
Attachment #826307 - Flags: review?(jaws) → review+
Comment on attachment 826283 [details] [diff] [review]
Bring back the shutdown leak detector, v5

Review of attachment 826283 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/automationutils.py
@@ +603,5 @@
> +      self.leakedWindows[id] = self._parseValue(line, "url")
> +
> +  def _logDocShell(self, line):
> +    created = line[:2] == "++"
> +    id = self._parseValue(line, "pid") + "." + self._parseValue(line, "id")

FWIW, I was using this and found a problem here - self._parseValue may return None, causing a TypeError trying to add a string and None.  This was probably caused by me applying just this patch - there's no "pid" in docShell lines in my output.

(The symptoms were also really bizarre - exceptions weren't written anywhere and it seemed to cause the Fx process to simply hang writing to stdout - I doubt they are related to this patch though, but it sure confused me and took me quite some time to work out the problem was here...)
(Assignee)

Comment 35

5 years ago
(In reply to Mark Hammond [:markh] from comment #31)
> FWIW, I was using this and found a problem here - self._parseValue may
> return None, causing a TypeError trying to add a string and None.  This was
> probably caused by me applying just this patch - there's no "pid" in
> docShell lines in my output.

Good point, I will address this in the next version of the patch and add some better error handling.
(Assignee)

Comment 37

5 years ago
Created attachment 827536 [details] [diff] [review]
Bring back the shutdown leak detector, v6

Together with the patches from bug 934091 and bug 934206 we have a green try run without any leaks \o/. I modified the error handling to explicitly print an error message should we somehow ever get interleaved log lines again. I added some cleanup to browser-test.js to avoid false positives for features that explicitly may keep DOMWindows and DocShells alive until shutdown.

https://tbpl.mozilla.org/?tree=Try&rev=571ad96c3957
Attachment #826283 - Attachment is obsolete: true
Attachment #827536 - Flags: review?(ted)
(Assignee)

Comment 38

5 years ago
Created attachment 829885 [details] [diff] [review]
Bring back the shutdown leak detector, v7

Small fix for Android mochitests: added a default value for the new |webapprtChrome| argument in build/automation.py.in. All running fine now:

https://tbpl.mozilla.org/?tree=Try&rev=6dc945b3212e
Attachment #827536 - Attachment is obsolete: true
Attachment #827536 - Flags: review?(ted)
Attachment #829885 - Flags: review?(ted)
(Assignee)

Comment 39

5 years ago
Latest try push with the leak detector:

https://tbpl.mozilla.org/?tree=Try&rev=21a5d107d49f
Whiteboard: [MemShrink] [leave open] → [MemShrink:P1] [leave open]
(Reporter)

Updated

5 years ago
Blocks: 937997
Blocks: 938691
ttaubert, how close is this to landing?
No longer blocks: 937997
(Assignee)

Comment 41

5 years ago
(In reply to Nicholas Nethercote [:njn] from comment #40)
> ttaubert, how close is this to landing?

This is waiting for review and should be good to go then.
ted: eight day review ping.  This is an important patch.  Thanks.
Comment on attachment 829885 [details] [diff] [review]
Bring back the shutdown leak detector, v7

Review of attachment 829885 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I was travelling last week and have been generally behind on reviews.
Attachment #829885 - Flags: review?(ted) → review+
(Assignee)

Comment 44

5 years ago
https://hg.mozilla.org/integration/fx-team/rev/53627e59f761
Whiteboard: [MemShrink:P1] [leave open] → [MemShrink:P1]
(Assignee)

Comment 47

5 years ago
Sorry about that. I pushed to try before but ran only mochitest-bc tests.
(Assignee)

Comment 48

5 years ago
Pushed to try again with some minor fixes to account for metro:

https://tbpl.mozilla.org/?tree=Try&rev=04ab4ad2f6ba
(Assignee)

Comment 49

5 years ago
Pushed to try again with a smaller and better fix:

https://tbpl.mozilla.org/?tree=Try&rev=db126ca7ee5a

Seems all green, waiting a little more to make sure.
(Assignee)

Comment 50

5 years ago
Try was green-ish. Landed again:

https://hg.mozilla.org/integration/fx-team/rev/45dda1081309
https://hg.mozilla.org/mozilla-central/rev/45dda1081309

\m/ frlz this time :)
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
(Reporter)

Updated

5 years ago
Depends on: 943474
You need to log in before you can comment on or make changes to this bug.