Last Comment Bug 751420 - Wallflower addon causes every website visited to be retained as a ghost window
: Wallflower addon causes every website visited to be retained as a ghost window
Status: RESOLVED FIXED
[MemShrink]
: regression
Product: Core
Classification: Components
Component: General (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
:
Mentors:
Depends on: 751621
Blocks: hueyfix 749526 752877
  Show dependency treegraph
 
Reported: 2012-05-02 16:45 PDT by Josh Matthews [:jdm] (on vacation until Dec 5)
Modified: 2012-05-11 13:20 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
add-on with latest sdk (125.85 KB, application/x-xpinstall)
2012-05-02 17:07 PDT, Dietrich Ayala (:dietrich)
no flags Details

Description Josh Matthews [:jdm] (on vacation until Dec 5) 2012-05-02 16:45:11 PDT
STR:
1. install Wallflower
2. open a tab to some unique page
3. close the tab
4. open about:compartments
5. notice the page is now in the ghost windows table
6. disable Wallflower
7. refresh about:compartments, notice the ghost window table is now empty

This behaviour is rock-solid reproducible for me, and on an instance of FF that I had running for about four days, I was seeing:
* memory usage completely out of proportion to the numbers I was used to
* about:memory taking multiple seconds to load (indicating many objects to report upon)
* about:compartments displaying a huge list of zombie compartments and ghost windows (potentially every page I had visited since starting FF four days before)

After disabling Wallflower, I'm seeing a 4x memory reduction, the ghost window table is empty, and FF feels significantly snappier.
Comment 1 Josh Matthews [:jdm] (on vacation until Dec 5) 2012-05-02 16:46:03 PDT
And for the record, this is visible on today's nightly for me.
Comment 2 Dietrich Ayala (:dietrich) 2012-05-02 17:02:46 PDT
Here's the source code for the add-on:

https://github.com/autonome/Wallflower/blob/master/lib/main.js

Likely a leak in the SDK code somewhere. I'll package up and attach a version of the add-on that uses the latest version of the SDK for you to test with.
Comment 3 Dietrich Ayala (:dietrich) 2012-05-02 17:07:16 PDT
Created attachment 620528 [details]
add-on with latest sdk

Can you see if the leak still exists w/ this version of the add-on?
Comment 4 Josh Matthews [:jdm] (on vacation until Dec 5) 2012-05-02 17:15:06 PDT
No leaks!
Comment 5 Nicholas Nethercote [:njn] 2012-05-02 17:42:35 PDT
It's disturbing to think how many add-ons are still using old, leaky versions of the Add-on SDK.  Is there any way to encourage add-ons authors to rebuild with the latest, leak-free SDK?

It's also disturbing that this manifested in a Nightly that had the patch from bug 695480 in it.  Any ideas why?
Comment 6 Dietrich Ayala (:dietrich) 2012-05-02 17:53:18 PDT
Uploaded new version to AMO. At #164 in the queue, so might be a while before update is pushed out.
Comment 7 Nicholas Nethercote [:njn] 2012-05-02 23:11:23 PDT
This is a regression!  I tested with a just-before-695480 version, and got no zombie compartments.  Then I tested with a day-or-two-after-695480 version, and got heaps of zombie compartments.  This is with Wallflower version 1, from https://addons.mozilla.org/en-US/firefox/addon/wallflower-1/.
Comment 8 Nicholas Nethercote [:njn] 2012-05-02 23:14:46 PDT
And in the with-695480 version, I got heaps of these along the way:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/observer-service.js", line 176, in 
    this.callback(subject, data);
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 525, in _documentUnloa
    this._workerCleanup();
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 556, in _workerCleanu
    this._contentWorker._destructor();
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/worker.js", line 323, in _destructo
    delete sandbox.__proto__;
  File "resource://jid1-ub4sjepvr2m4qq-at-jetpack-api-utils-lib/content/content-proxy.js", line 647, in 
    return name in obj || name in overload || name == "__isWrappedProxy";
TypeError: can't access dead object

So maybe there's some kind of bad interaction between bug 695480 and the old SDK version?
Comment 9 Justin Lebar (not reading bugmail) 2012-05-02 23:16:13 PDT
We think it's regressed *by* bug 695480?  That would be even more of a reason to get all add-ons on the old SDK updated ASAP (bug 751466).
Comment 10 Nicholas Nethercote [:njn] 2012-05-02 23:42:23 PDT
I just retested with revision cc5254f9825f, which is the one where bug 695480 was added, and the zombie compartments are present.  So yes, bug 695480 did cause this :(
Comment 11 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-03 06:46:11 PDT
So I guess the exception we're throwing is preventing some cleanup code from running.
Comment 12 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-03 08:14:31 PDT
So, the reason 695480 causes this is indeed what I stated in comment 11.

The reason that 695480 doesn't clean this up is because the leak looks something like:

Array of sandbox objects jetpack keeps -> Sandbox JS Object -> Content window nsISupports* (via private slot) -> Content window JS Object.

Since the cross-compartment reference here is in C++ the magic in 695480 can't touch it.

Jetpack appears to run cleanup code that removes the sandbox from the array and allows it to be GCd, which ultimately allows the content objects to be collected.
Comment 13 Alexandre Poirot [:ochameau] 2012-05-03 08:31:21 PDT
I'm about to get all "dead object" exception removed on SDK master. It always is the same, it ended up breaking cleanup code. So instead of fixing a leak it create some :(
In all these cases, I don't think we were leaking anything; we were just calling destroy/cleanup/unregister/remove method in wrong order. i.e some of these ended up being called after inner-window-destroyed/outer-window-destroyed.

Here is the tracking bug 749526 for "dead object" exception happening on the SDK.
Comment 14 Jorge Villalobos [:jorgev] 2012-05-03 09:17:45 PDT
I've approved the update.
Comment 15 Justin Lebar (not reading bugmail) 2012-05-03 15:43:14 PDT
Alexandre, can you give us an idea of what kinds of things an add-on would need to do in order to trigger this fail?
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2012-05-06 22:18:02 PDT
> Since the cross-compartment reference here is in C++ the magic in 695480 can't touch it.

Can we explicitly fix this by keeping track of the sandboxes for a window on the window and nuking their links to the window when the window goes away?
Comment 17 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-06 22:20:37 PDT
(In reply to Boris Zbarsky (:bz) from comment #16)
> Can we explicitly fix this by keeping track of the sandboxes for a window on
> the window and nuking their links to the window when the window goes away?

More or less.  It's just code I have to find time to write.
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2012-05-06 23:03:11 PDT
Can Bobby or I help?
Comment 19 Alexandre Poirot [:ochameau] 2012-05-07 06:29:36 PDT
(In reply to Justin Lebar [:jlebar] from comment #15)
> Alexandre, can you give us an idea of what kinds of things an add-on would
> need to do in order to trigger this fail?

Affected addons:
This particular exception will be thrown with SDK addons versions up to 1.3 (included). I removed this unnecessary cleanup code (delete sandbox.__proto__) through bug 679363.
All addons using Page-mod API or tab.attach() method are concerned.

When/What are we leaking:
I don't know why but bug 695480 doesn't prevent leaking the content document.
So that all content documents matched by the addon page-mod or tab.attach() method are going to be leaked. It is *really* bad. Note that everything is going to be released on addon disabling/uninstall.
(page-mod have an `include` attribute that aims to define on which websites you want to inject a content script. Some addons are using `*`, aka "run on all websites".)

Code details:
The exception comes from this file/line:
https://github.com/mozilla/addon-sdk/blob/584a5528839fbb74a08962ca5533e9f7d94ae139/packages/api-utils/lib/content/worker.js#L275
  delete sandbox.__proto__;
sandbox has its prototype set to a JS proxy to the content window. In proxy code we try to access to "__proto__" attribute of a dead window.
This code is executed on inner-window-destroyed event, so I'm wondering if it is processed just after C++ code from bug 695480?
Can't we just delay this C++ nuke code "some time after" so that all such cleanup code can be successfully executed? It sounds like a simple solution and would allow keeping simple cleanup code!
Comment 20 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-07 07:09:13 PDT
(In reply to Alexandre Poirot (:ochameau) from comment #19)
> Code details:
> The exception comes from this file/line:
> https://github.com/mozilla/addon-sdk/blob/
> 584a5528839fbb74a08962ca5533e9f7d94ae139/packages/api-utils/lib/content/
> worker.js#L275
>   delete sandbox.__proto__;
> sandbox has its prototype set to a JS proxy to the content window. In proxy
> code we try to access to "__proto__" attribute of a dead window.
> This code is executed on inner-window-destroyed event, so I'm wondering if
> it is processed just after C++ code from bug 695480?

The code from bug 695480 runs just *after* inner-window-destroyed.  Based on that, I bet this is the same underlying problem as bug 751621.
Comment 21 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-07 14:21:22 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> The code from bug 695480 runs just *after* inner-window-destroyed.

Actually, it doesn't, because those observers get run on an event :-/
Comment 22 Manoj 2012-05-07 21:30:37 PDT
Should a follow-up bug be filed based on comment 18 and comment 21?
Comment 23 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-05-07 21:31:16 PDT
Yeah I'll do that tomorrow.

Note You need to log in before you can comment on or make changes to this bug.