Closed Bug 1267693 Opened 4 years ago Closed 3 years ago

ActionButton causes Firefox to reliably leak browser.xul windows

Categories

(Add-on SDK Graveyard :: General, defect)

defect
Not set

Tracking

(firefox47 wontfix, firefox48 wontfix, firefox49 fixed, firefox-esr45 wontfix, firefox50 fixed)

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- fixed
firefox-esr45 --- wontfix
firefox50 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files, 10 obsolete files)

2.55 KB, patch
gkrizsanits
: review+
Details | Diff | Splinter Review
5.96 KB, patch
gkrizsanits
: review+
Details | Diff | Splinter Review
6.09 KB, patch
gkrizsanits
: review+
mccr8
: feedback+
Details | Diff | Splinter Review
STR:

In a fresh profile:

1) Open firefox with about:memory in a tab
2) Take an initial measurement
3) Open a new browser window
4) Close the new browser window
5) In the about:memory tab click "minimize"
6) Take a new measurement and verify there is no browser.xul under top(none)/detached
7) Install mozilla-tree-status: https://addons.mozilla.org/en-US/firefox/addon/mozilla-tree-status/
8) Open a new browser window
9) Close the new browser window
10) In the about:memory tab click minimize again
11) Take a new measurement

At this point you should see there is a browser.xul window under top(none)/detached.

I've reproduced this in FF46 to FF49.  I'm on windows 10.

This seems like a fairly simple addon.  The code is here:

  https://github.com/jsantell/mozilla-tree-status

So I'm writing this as a bug against the add-on sdk for now.
Not sure if it's the add-on or the SDK, but bug 1117816 has some possibly related issues for the SDK in general.
Yes, this is almost certainly a bug in chrome JS code. Vimperator has the same issue. (bug 873163) We can't clear chrome-chrome references automatically like we can with chrome-content references.
I hacked a bunch of stuff out of the addon until it was basically just:

let button = ActionButton({
  id: "tree-status-button",
  icon: "./tree_warning.svg",
  label: "status unknown",
});

This was enough to cause it to leak.  Removing the ActionButton stopped the leak.

There is an ActionButton.destroy(), but none of the documentation says you have to manually call it when the window closes.
Summary: mozilla-tree-status add-on causes firefox to reliably leak browser.xul windows → ActionButton causes Firefox to reliably leak browser.xul windows
Whiteboard: [MemShrink] → [MemShrink:P2]
markh, any idea what the problem is here? (You're listed as an Add-On SDK peer.) bkelly did a nice job of reducing this to a trivial addon in comment 3.
Flags: needinfo?(markh)
(In reply to Nicholas Nethercote [:njn] from comment #4)
> markh, any idea what the problem is here? (You're listed as an Add-On SDK
> peer.)

I fixed that ;) I really shouldn't be a peer for this, so I removed myself.

> bkelly did a nice job of reducing this to a trivial addon in comment
> 3.

I did dig a little though - if I take that addon and change index.js to the single line:

> const { events } = require("sdk/window/events");

I can observe the leak (ie, no need to reference an ActionButton at all). If I comment out the line at https://dxr.mozilla.org/mozilla-central/rev/369a5ee3a2880a4a98df3a00bf3db8d8f36b181b/addon-sdk/source/lib/sdk/event/chrome.js#55, the leak goes away. I believe the call to observe at https://dxr.mozilla.org/mozilla-central/rev/369a5ee3a2880a4a98df3a00bf3db8d8f36b181b/addon-sdk/source/lib/sdk/window/events.js#40 is at fault for reasons I don't understand - when the notification for the new window fires, we end up emitting the chrome window as an arg to a number of listeners - I had trouble working out exactly what was going on there, but I suspect the issue is that a reference to the chromeWindow being passed around there is being held somewhere subtle.

I notice bug 1117816 has references to leaks on a require of sdk/windows and sdk/event/dom, so I'm guessing this is a dupe of one of them. :irakli has some comments in those bugs, so I'll move the needinfo to him.
Flags: needinfo?(markh) → needinfo?(rFobic)
Duplicate of this bug: 1266915
I'm tired of hitting addon leaks.  I'm going to investigate this and other addon leaks I can find.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(rFobic)
The first part of the leak is due to the event/utils.js module caching the last seen event object on `input.value`.  If the event is a DOM event or object, then you are going to end up leaking the global until the next event comes in.

This patch caches the event using a weak reference in `input._weakValue` instead.  It then installs an `input.value` getter that extracts the underlying cached event.

While theoretically the event could be GC'd between when its first dispatched and someone checks `input.value`, I think its very unlikely.  This should be safe for typical uses of the event channel system.

I'll do some try builds to verify, of course.
Attachment #8768940 - Flags: review?(gkrizsanits)
The other part of the leak here is due to window/events.js capturing window objects in closures which are then incorporated into the event channel chain.  This patch removes these closures.

There is one part of the code, however, that wants to explicitly filter events based on the window.document value.  To accomplish this I use a weak reference to the window.
Attachment #8768942 - Flags: review?(gkrizsanits)
Sorry, there were some additional changes in the last patch that are not needed here.

See comment 9 for the patch explanation.
Attachment #8768942 - Attachment is obsolete: true
Attachment #8768942 - Flags: review?(gkrizsanits)
Attachment #8768949 - Flags: review?(gkrizsanits)
The mozilla-tree-status plugin no longer causes windows to leak with these two patches.  I'm going to write a test as a P3 patch, but won't get to that until tomorrow.

Here is a try build to make sure I didn't break anything else:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=90e8c715ad0f
Comment on attachment 8768940 [details] [diff] [review]
P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gkrizsanits

Try is unhappy with this change it seems.  Let me see what I can do.
Attachment #8768940 - Flags: review?(gkrizsanits)
Updated to fix try failures.

The patch now aggressively defines both a weak getter and setter for value on the Input prototype.  This lets us do the right thing for InputPort automatically.

I also created an Output type to be used by the various pieces of code that create new output channels since these also get .value set on them.

Finally, the patch will add the weak getter/setter if there is no value property defined in receive() to catch other object types that might be passed to us.  We then just call input.value to cache the message.

This will still create a strong ref and potentially leak objects if someone passes their own { value: foo } as the channel, but hopefully that is rare.

Passes tests locally.  I will do another try build before flagging for review again.
Attachment #8768940 - Attachment is obsolete: true
Hmm.  Maybe I should have just made the various code do `output = new Input()` instead of creating an Output type.
The jetpack tests showed that this one was not doing what was intended either.  New try build coming.
Attachment #8768949 - Attachment is obsolete: true
Attachment #8768949 - Flags: review?(gkrizsanits)
Attachment #8769307 - Attachment description: bug1267693_p1_eventutils.patch → P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gabor
The jetpack tests are looking green.  Here is a full try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d5e7dfbde742

I'm still trying to figure out how to write a test for this.
Local testing suggests making Reactor.prototype.value implement the weak reference semantics might have been too aggressive.  Lets back off on that for now:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=462b35cd7233
The last try build is finally green on repeated pushes.

Please see comment 13 for the origin patch description.
Attachment #8769307 - Attachment is obsolete: true
Attachment #8769405 - Flags: review?(gkrizsanits)
> Please see comment 13 for the origin patch description.

Err, I mean comment 8.
Comment on attachment 8769289 [details] [diff] [review]
P2 Avoid leaking windows via event channel closures. r=gabor

Please see comment 9 for the original patch description.
Attachment #8769289 - Flags: review?(gkrizsanits)
I've been investigating how to write a test.  The main difficulty is detecting a leaked window before shutdown.  I think my best bet is to use the same technique Nick used in bug 1261869.  I'll try to make that work on Monday.
Comment on attachment 8769405 [details] [diff] [review]
P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gabor

Review of attachment 8769405 [details] [diff] [review]:
-----------------------------------------------------------------

I'm probably not the best person to review this since this is the first time I've seen this part of the SDK... But since the original authors are not very responsive when it comes to review patches, I tend to be the one who ends up reviewing them if I can, since at least I know some parts of the SDK a bit.

That being said I more or less understand what is going on. What I don't get about this module is conceptually when does it supposed to clear these cached values on input. I mean I get that when an |end| event is received we clear the input from the weakmap that holds a reference onto it (given that output did not get GC'd already which I doubt anyone could tell when will happen but that should have the same effect) but if someone somewhere referencing input then we will leak. And with all the closures and complex usage of input, it's easy to imagine that it can happen. I think at the latest, when |end| is emitted, the cached value should be cleared but probably earlier when we no longer need it. I cannot even guess what was the plan originally to avoid leak for this part: |let output = { value: inputs[0].value };|... Since I have a hard time deciding when should we break these references exactly and people with deeper understanding of this wonderful module are not responsive, I will not mind just doing the weakref workaround if that pleases the jetpack tests.

Thanks a lot for working on this and for the detailed comments in the patch.

::: addon-sdk/source/lib/sdk/event/utils.js
@@ +113,5 @@
>      input[receive](input, message);
>    else
>      emit(input, "data", message);
>  
> +  // Ideally our input will extend Input and already provide a weak value

I'm not sure I'm following this part of the comment here.
Attachment #8769405 - Flags: review?(gkrizsanits) → review+
Comment on attachment 8769289 [details] [diff] [review]
P2 Avoid leaking windows via event channel closures. r=gabor

Review of attachment 8769289 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm.

::: addon-sdk/source/lib/sdk/window/events.js
@@ +32,4 @@
>  // Function registers single shot event listeners for relevant window events
>  // that forward events to exported event stream.
>  function eventsFor(window) {
> +  // NOTE: Do no use pass a closure from this function into a stream

Do not pass
Attachment #8769289 - Flags: review?(gkrizsanits) → review+
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #27)
> > +  // Ideally our input will extend Input and already provide a weak value
> 
> I'm not sure I'm following this part of the comment here.

The value passed to receive() may just be some arbitrary {} object.  Or it could be an object that extend Input.prototype, like InputPort.  I've already set up the weak value getter/setter on Input.prototype, but other arbitrary objects won't have it.  If they've already set a .value, then we basically just have to use their implementation instead of monkey-patching our weak value setter/getter.  Trying to overwrite it blows up in practice because some things freeze their objects to avoid this kind of thing.
After much gnashing of teeth I came up with this test.  Its quite slow and not perfect.  I put it in its own test directory to try to avoid bad interactions with the existing tests.

Lets see how it does in automation:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=64cd3cc58830
The previous test ran into trouble because of the timing of the self-support hidden window being opened during the middle of the test.

This patch will delay the test until the self-support hidden window is loaded.  It passes in both opt and debug builds locally, although debug has to wait almost 30 seconds for the next self-support window to appear.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=805820336cd7
Attachment #8770623 - Attachment is obsolete: true
The last patch upload was stale due to a stuck SMB connection.  This should be updated.
Attachment #8770705 - Attachment is obsolete: true
Blocks: 1286673
So we wait for the next self-support window.... but, it only retries 5 times.  So if its already done, just continue immediately.
Attachment #8770709 - Attachment is obsolete: true
This adds a simple .done getter to SelfServiceBackend that is used by the P3 patch.

I could have added some kind of .pause() function, but that would be more intrusive and I wanted to have minimal impact on this important production code.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee3216187481
> // Inspect the heap for any Window objects.  This is very slow to run.

maybe i'm missing something, wouldn't it be more efficient to use observers to track new windows and then verify their collection by keeping weak refs to them?
(In reply to The 8472 from comment #35)
> maybe i'm missing something, wouldn't it be more efficient to use observers
> to track new windows and then verify their collection by keeping weak refs
> to them?

We don't fire observer messages when things are cycle collected or garbage collected.  Its possible to leak things even after the window close events and observer topics fire.
I meant only use observers to track *creation* while using weak refs to probe whether GC/CCs were successful.
(In reply to The 8472 from comment #37)
> I meant only use observers to track *creation* while using weak refs to
> probe whether GC/CCs were successful.

That's a good idea.  I'll try it out!
Is the problem that you can't use the existing window leak detector in BC tests because this is a different test suite? Or does the BC leak detector not work for this.
(In reply to Andrew McCreight [:mccr8] from comment #39)
> Is the problem that you can't use the existing window leak detector in BC
> tests because this is a different test suite? Or does the BC leak detector
> not work for this.

My impression was the existing leak detector was for shutdown leaks.  In this case the window leaks are cleaned up when the addon module is unloaded.  So I don't think these leak through shtudown.  In practice, though, people don't unload their addons very often, so these runtime leaks are still a problem.
Browser chrome tests have a leak detector that look at the ++ and --DOMWINDOW annotations to determine a set of nsGlobalWindows that are created during a test, and fail if they are not cleaned up by the end of the test (there's a bunch of CC/GC before the end).
(In reply to Andrew McCreight [:mccr8] from comment #41)
> Browser chrome tests have a leak detector that look at the ++ and
> --DOMWINDOW annotations to determine a set of nsGlobalWindows that are
> created during a test, and fail if they are not cleaned up by the end of the
> test (there's a bunch of CC/GC before the end).

Right, but I need to unload the module before the end of the test.  So I need to check for window leaks before the end of the test.  Is there a way to trigger this from within the test?

Also, are jetpack tests browser chrome mochitests?  Its not obvious to me if they are.
(In reply to Ben Kelly [:bkelly] from comment #42)
> Right, but I need to unload the module before the end of the test.  So I
> need to check for window leaks before the end of the test.  Is there a way
> to trigger this from within the test?

No.

> Also, are jetpack tests browser chrome mochitests?  Its not obvious to me if
> they are.

I don't think they are.

Ok, so it sounds like you need to implement something yourself. I just wanted to make sure you weren't reinventing the wheel.
(In reply to Andrew McCreight [:mccr8] from comment #43)
> Ok, so it sounds like you need to implement something yourself. I just
> wanted to make sure you weren't reinventing the wheel.

It would be nice if I could since it already handles things like the selfsupport window, etc.
Wouldn't it also make sense to add some addons to AWSY? If webextensions or the sdk are an API surface that can cause leaks that probably should be exercised too.
Testing for leaks in specific APIs is better done in our main testing suite. AWSY is more like a fallback for things we fail to properly test.
The test is a lot faster and simpler with suggestion from comment 37.

Also, I stole the browser-chrome test code to disable the selfsupport window.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a8f89c2085b
Attachment #8770958 - Attachment is obsolete: true
Attachment #8770960 - Attachment is obsolete: true
Comment on attachment 8771038 [details] [diff] [review]
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor

Try looks good.  This adds a test that catches this leak.
Attachment #8771038 - Flags: review?(gkrizsanits)
Comment on attachment 8771038 [details] [diff] [review]
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor

Review of attachment 8771038 [details] [diff] [review]:
-----------------------------------------------------------------

This test is awesome, thanks!

::: addon-sdk/source/test/leak/leak-utils.js
@@ +9,5 @@
> +const Startup = Cu.import("resource://gre/modules/sdk/system/Startup.js", {}).exports;
> +
> +// Adapted from the SpecialPowers.exactGC() code.  We don't have a
> +// window to operate on so we cannot use the exact same logic.  We
> +// use 6 GC iterations here as that is what is needed to clean up

I don't know about this part at all. It would be great if someone who knows more about the GC than me could take a look at this part. The rest of the patch seems great though so I can totally r+ that.
Attachment #8771038 - Flags: review?(gkrizsanits) → review+
Comment on attachment 8771038 [details] [diff] [review]
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor

Review of attachment 8771038 [details] [diff] [review]:
-----------------------------------------------------------------

Andrew, can you rubber stamp my gc() function here?  Its mostly copied from the SpecialPowers one without the window specific bits.
Attachment #8771038 - Flags: feedback?(continuation)
Comment on attachment 8771038 [details] [diff] [review]
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor

I filed bug 1287143 to remove the spurious window argument. I won't ask you to wait for that to land, but could you please define a local exactGC() here and call that instead however many times you need to? And then add a comment referring to bug 1287143, and I can remove your exactGC when I land that.
Attachment #8771038 - Flags: feedback?(continuation) → feedback+
I spoke with Andrew in IRC since addon-sdk jetpack tests don't seem to have access to SpecialPowers:

1:22 PM <bkelly> mccr8: I guess I also thought I didn't have access to SpecialPowers in the jetpack test
1:22 PM <mccr8> bkelly: oh yeah I don't actually know if you do or not.
1:22 PM → Coldblackice joined (anonz@moz-nu5n95.oc.cox.net)
1:23 PM <mccr8> bkelly: if there's no special powers in those tests then I guess what you have is fine
1:25 PM <bkelly> mccr8: nothing in addon-sdk uses special powers... not sure how I would bring it in
1:25 PM <mccr8> bkelly: yeah what you have is fine then
See Also: → 1287193
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1626b1814ae
P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gabor a=kwierso
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c92ff47d137
P2 Avoid leaking windows via event channel closures. r=gabor a=kwierso
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebc0f45e1ea5
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor a=kwierso
Blocks: 1287235
https://hg.mozilla.org/mozilla-central/rev/d1626b1814ae
https://hg.mozilla.org/mozilla-central/rev/9c92ff47d137
https://hg.mozilla.org/mozilla-central/rev/ebc0f45e1ea5
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
I have been using the nightly in which this landed without restarting for the last two days. It feels a lot less laggy. Normally performance would degrade far more with uptime. No hard numbers though.
I'd like to get this in to aurora if possible.
Comment on attachment 8769405 [details] [diff] [review]
P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gabor

Approval Request Comment
[Feature/regressing bug #]: Addon-sdk which is used both internally to firefox and in addons.
[User impact if declined]: Memory leaks degrading performance over time.
[Describe test coverage new/current, TreeHerder]: New jetpack test included that verifies memory leak is fixed.  Existing jetpack tests exist to protect against regressions.
[Risks and why]: There is some risk here since I have made an addon visible property a weak reference.  If there is an addon depending on input.value being held alive for a long time then this change might break them.  On the other hand, any such addon is essentially depending on a leak in the system.  I'd like to uplift to only aurora so there is still time on beta to catch these kinds of problems.
[String/UUID change made/needed]: None
Attachment #8769405 - Flags: approval-mozilla-aurora?
Comment on attachment 8769289 [details] [diff] [review]
P2 Avoid leaking windows via event channel closures. r=gabor

See comment 57.
Attachment #8769289 - Flags: approval-mozilla-aurora?
Comment on attachment 8771038 [details] [diff] [review]
P3 Add a test to detect window leaks when using sdk/window/events. r=gabor

See comment 57.
Attachment #8771038 - Flags: approval-mozilla-aurora?
I think bug 1268898 would have to go along with it. I've seen paths to gc roots through the modules patched in both bugs.
Blocks: 1288440
Comment on attachment 8769405 [details] [diff] [review]
P1 Cache the most recent event in a channel weakly to avoid leaking DOM objects. r=gabor

Improve the memory usage, taking it.
We will have time to fix potential regressions during the beta cycle.
Attachment #8769405 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8769289 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8771038 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.