Closed Bug 751420 Opened 12 years ago Closed 12 years ago

Wallflower addon causes every website visited to be retained as a ghost window

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jdm, Unassigned)

References

Details

(Keywords: regression, Whiteboard: [MemShrink])

Attachments

(1 file)

STR:
1. install Wallflower
2. open a tab to some unique page
3. close the tab
4. open about:compartments
5. notice the page is now in the ghost windows table
6. disable Wallflower
7. refresh about:compartments, notice the ghost window table is now empty

This behaviour is rock-solid reproducible for me, and on an instance of FF that I had running for about four days, I was seeing:
* memory usage completely out of proportion to the numbers I was used to
* about:memory taking multiple seconds to load (indicating many objects to report upon)
* about:compartments displaying a huge list of zombie compartments and ghost windows (potentially every page I had visited since starting FF four days before)

After disabling Wallflower, I'm seeing a 4x memory reduction, the ghost window table is empty, and FF feels significantly snappier.
And for the record, this is visible on today's nightly for me.
Here's the source code for the add-on:

https://github.com/autonome/Wallflower/blob/master/lib/main.js

Likely a leak in the SDK code somewhere. I'll package up and attach a version of the add-on that uses the latest version of the SDK for you to test with.
Attached file add-on with latest sdk
Can you see if the leak still exists w/ this version of the add-on?
No leaks!
It's disturbing to think how many add-ons are still using old, leaky versions of the Add-on SDK.  Is there any way to encourage add-ons authors to rebuild with the latest, leak-free SDK?

It's also disturbing that this manifested in a Nightly that had the patch from bug 695480 in it.  Any ideas why?
Uploaded new version to AMO. At #164 in the queue, so might be a while before update is pushed out.
This is a regression!  I tested with a just-before-695480 version, and got no zombie compartments.  Then I tested with a day-or-two-after-695480 version, and got heaps of zombie compartments.  This is with Wallflower version 1, from https://addons.mozilla.org/en-US/firefox/addon/wallflower-1/.
Keywords: regression
And in the with-695480 version, I got heaps of these along the way:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/observer-service.js", line 176, in 
    this.callback(subject, data);
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 525, in _documentUnloa
    this._workerCleanup();
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 556, in _workerCleanu
    this._contentWorker._destructor();
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 323, in _destructo
    delete sandbox.__proto__;
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/content-proxy.js", line 647, in 
    return name in obj || name in overload || name == "__isWrappedProxy";
TypeError: can't access dead object

So maybe there's some kind of bad interaction between bug 695480 and the old SDK version?
We think it's regressed *by* bug 695480?  That would be even more of a reason to get all add-ons on the old SDK updated ASAP (bug 751466).
I just retested with revision cc5254f9825f, which is the one where bug 695480 was added, and the zombie compartments are present.  So yes, bug 695480 did cause this :(
Blocks: hueyfix
So I guess the exception we're throwing is preventing some cleanup code from running.
So, the reason 695480 causes this is indeed what I stated in comment 11.

The reason that 695480 doesn't clean this up is because the leak looks something like:

Array of sandbox objects jetpack keeps -> Sandbox JS Object -> Content window nsISupports* (via private slot) -> Content window JS Object.

Since the cross-compartment reference here is in C++ the magic in 695480 can't touch it.

Jetpack appears to run cleanup code that removes the sandbox from the array and allows it to be GCd, which ultimately allows the content objects to be collected.
I'm about to get all "dead object" exception removed on SDK master. It always is the same, it ended up breaking cleanup code. So instead of fixing a leak it create some :(
In all these cases, I don't think we were leaking anything; we were just calling destroy/cleanup/unregister/remove method in wrong order. i.e some of these ended up being called after inner-window-destroyed/outer-window-destroyed.

Here is the tracking bug 749526 for "dead object" exception happening on the SDK.
I've approved the update.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Alexandre, can you give us an idea of what kinds of things an add-on would need to do in order to trigger this fail?
> Since the cross-compartment reference here is in C++ the magic in 695480 can't touch it.

Can we explicitly fix this by keeping track of the sandboxes for a window on the window and nuking their links to the window when the window goes away?
(In reply to Boris Zbarsky (:bz) from comment #16)
> Can we explicitly fix this by keeping track of the sandboxes for a window on
> the window and nuking their links to the window when the window goes away?

More or less.  It's just code I have to find time to write.
Can Bobby or I help?
(In reply to Justin Lebar [:jlebar] from comment #15)
> Alexandre, can you give us an idea of what kinds of things an add-on would
> need to do in order to trigger this fail?

Affected addons:
This particular exception will be thrown with SDK addons versions up to 1.3 (included). I removed this unnecessary cleanup code (delete sandbox.__proto__) through bug 679363.
All addons using Page-mod API or tab.attach() method are concerned.

When/What are we leaking:
I don't know why but bug 695480 doesn't prevent leaking the content document.
So that all content documents matched by the addon page-mod or tab.attach() method are going to be leaked. It is *really* bad. Note that everything is going to be released on addon disabling/uninstall.
(page-mod have an `include` attribute that aims to define on which websites you want to inject a content script. Some addons are using `*`, aka "run on all websites".)

Code details:
The exception comes from this file/line:
https://github.com/mozilla/addon-sdk/blob/584a5528839fbb74a08962ca5533e9f7d94ae139/packages/api-utils/lib/content/worker.js#L275
  delete sandbox.__proto__;
sandbox has its prototype set to a JS proxy to the content window. In proxy code we try to access to "__proto__" attribute of a dead window.
This code is executed on inner-window-destroyed event, so I'm wondering if it is processed just after C++ code from bug 695480?
Can't we just delay this C++ nuke code "some time after" so that all such cleanup code can be successfully executed? It sounds like a simple solution and would allow keeping simple cleanup code!
(In reply to Alexandre Poirot (:ochameau) from comment #19)
> Code details:
> The exception comes from this file/line:
> https://github.com/mozilla/addon-sdk/blob/
> 584a5528839fbb74a08962ca5533e9f7d94ae139/packages/api-utils/lib/content/
> worker.js#L275
>   delete sandbox.__proto__;
> sandbox has its prototype set to a JS proxy to the content window. In proxy
> code we try to access to "__proto__" attribute of a dead window.
> This code is executed on inner-window-destroyed event, so I'm wondering if
> it is processed just after C++ code from bug 695480?

The code from bug 695480 runs just *after* inner-window-destroyed.  Based on that, I bet this is the same underlying problem as bug 751621.
Depends on: 751621
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> The code from bug 695480 runs just *after* inner-window-destroyed.

Actually, it doesn't, because those observers get run on an event :-/
Should a follow-up bug be filed based on comment 18 and comment 21?
You need to log in before you can comment on or make changes to this bug.