Closed Bug 607601 Opened 9 years ago Closed 9 years ago

Page-mod are leaking the full document

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86_64
Windows 7
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nabor.gilgalad, Assigned: ochameau)

References

Details

(Keywords: APIchange, memory-leak, relnote)

Attachments

(2 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 AND Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b6) Gecko/20100101 Firefox/4.0b6

I wrote a small test scenario and hopefully it is correct. 
My main.js 
var pageMod = require("page-mod"); 
var counter = 0; 
pageMod.add(new pageMod.PageMod({ 
        include : ["http://www.google.de/"], 
        contentScriptURL : [packaging.getURLForData("/test.js")], 
        contentScriptWhen : "ready", 
        onAttach : function(worker, mod) { 
                worker.on("message", function(data) { 
                        if (data === "getCounter") { 
                                worker.postMessage({ 
                                        "counter" : counter++ 
                                }); 
                        } 
                }); 
        } 
})); 

My test.js 
self.on("message", function(data) { 
        var maxCounter = 1000; 
        if (data && data.counter < maxCounter) { 
                console.debug("Counter:", data.counter, "/", maxCounter); 
                // give it some time, so it may garbage collect... 
                setTimeout(function() { 
                        location.reload(); 
                }, 100); 
        } 
}); 

var skynet = { 
        init : function() { 
                var elem = document.createElement("div"); 
                document.body.appendChild(elem); 
                elem.innerHTML = "Hallo Welt"; 
                postMessage("getCounter"); 
        } 
}; 

skynet.init(); 
Try it (open www.google.de) an watch the task manager. I tried it with 
Windows 7 64Bit and Firefox 3.6.11 and Firefox 4b6 (32 Bit). 
I have seen that it starts pretty fast and than, after it reaches 200 
to 300 it get's very slow and at the end Firefox uses 600 MByte RAM 
and gives nothing back. 

Reproducible: Always

Steps to Reproduce:
See at the Details section
Actual Results:  
Firefox uses more than 600MB RAM and this value doesn't decrease anymore.
The Add-on SDK is no longer a Mozilla Labs experiment and has become a big enough project to warrant its own Bugzilla product, so the "Add-on SDK" product has been created for it, and I am moving its bugs to that product.

To filter bugmail related to this change, filter on the word "looptid".
Component: Jetpack SDK → General
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack-sdk → general
Attached patch First fix (obsolete) — Splinter Review
I confirm that there is a huuuuge leak on pagemod. Worker object is never destroyed until pagemod destroy that commonly never happen until firefox close!

So here is a patch that fix this leak. 
But I need some more time to figure out if this is the right location to do that.
Assignee: nobody → poirot.alex
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 521590 [details] [diff] [review]
First fix

>+    window.addEventListener('unload', function unload(event) {
>+      if (event.target != window.document) return;
>+      worker.destroy();
>+    }, true);

This is bad, see bug 642004 for why.
Here is results with 25 reload of google.de:
# Current memory usage:
before loads: 47519340
after 25 loads: 56977490

# With the previous patch applied:
before loads: 47508256
after 25 loads: 49107354

There is still a leak but way way lighter.

I checked if patch from bug 642306 (leak with page-worker) lower memory usage for this page-mod example but unfortonately it doesn't :(
(In reply to comment #3)
> Comment on attachment 521590 [details] [diff] [review]
> First fix
> 
> >+    window.addEventListener('unload', function unload(event) {
> >+      if (event.target != window.document) return;
> >+      worker.destroy();
> >+    }, true);
> 
> This is bad, see bug 642004 for why.

Wha this is crazy! Thanks for the info!!!
Priority: -- → P1
Target Milestone: --- → 1.0
Still need some more thought.
Mainly the location of this fix:
I can't do that in worker, because (at least) Tab.attach generate a worker that stay alive across location change. If we can remove this "feature", we could destroy all workers upon document unload and be safe in any cases.
(In reply to comment #6)
> I can't do that in worker, because (at least) Tab.attach generate a worker that
> stay alive across location change. If we can remove this "feature", we could
> destroy all workers upon document unload and be safe in any cases.

Erm, that sounds like a bug, not a feature.  Page mods' lives should be the lives of the pages they modify, even if they are attached to those pages via the Tab.attach() method.
Attached patch Cleanup workers from worker.js (obsolete) — Splinter Review
Here is a new patch with unload tracking directly into worker.js,
so all kind of workers unregister automatically all internal references to the target document.
Note to ease the review:
- I split worker.destroy in two parts because traits composition is very complicated. As Panel, PageWorker and Symbiont compose with Worker, we can't remove message listener on document unload.
But in PageMod and Tab.attach cases we want an automatic full destruction, that's why I've added a detach event and watch for it in page-mod.js and tab.js.
- I've updated test-tabs as tab.attach worker is now destroyed on document unload 
- test-content-worker needed some modifications because of about:blank document that is loaded before any document on new iframes/windows, that mess with this destroy on unload ...

This patch doesn't change the leak from previous patch, but I found others sources of leaks. I'm waiting for this patch to land before working on these as I may modify same piece of code!
Attachment #521590 - Attachment is obsolete: true
Attachment #521658 - Attachment is obsolete: true
Attachment #525364 - Flags: review?(myk)
Blocks: 639626
One of these leak is described in bug 649334.
There is a pending platform leak around sanboxes, bug 646575.
Depends on: 649334, 646575
Summary: Memory Leak in Jetpack 0.9 → Page-mod are leaking the full document
Duplicate of this bug: 639626
I applied this patch locally, and I can confirm that it fixes bug 639626.
Blocks: mlk-fx5+
No longer blocks: 639626
Can someone tell me what products are affected by this leak?  Eg. are multiple add-ons affected?
Blocks: 639626
(In reply to comment #12)
> Can someone tell me what products are affected by this leak?  Eg. are multiple
> add-ons affected?

Yes.  Any add-ons built with older versions of the SDK which use the page-mods API are affected.  Also, see bug 642306 which affects add-ons using the page-worker API in a similar way.
I think you might want to use the notifications added in bug 484710 instead of listening for unload on the <browser>, perhaps in a follow-up bug.

Otherwise, depending on how unload event on the <browser> element works, it could happen that the worker is destroyed when the page is going to the bfcache, and will not work properly when the page is restored from the bfcache.
Attachment #525364 - Flags: review?(myk)
Thanks Nickolay for this tip! Bfcache is really not easy to play with.
At the end it's easier to use these event than searching for the parent <browser>.

I've fixed two additional things with this new patch :
- use `once` instead of `on` in page-mod.js and tab.js (remove a leak)
- fix annotator example by using worker's detach event instead of breaking bfcache by using unload events

Myk: I didn't gave this patch to review to Irakli because of the API change of worker (detach event), but again, I'm still new to review system. So do not hesitate to tell me if I'm wrong!
Attachment #525364 - Attachment is obsolete: true
Attachment #525708 - Flags: review?(myk)
Blocks: 649615
Comment on attachment 525708 [details] [diff] [review]
Same patch but using inner-window-destroyed instead of unload.

(In reply to comment #15)
> Myk: I didn't gave this patch to review to Irakli because of the API change of
> worker (detach event), but again, I'm still new to review system. So do not
> hesitate to tell me if I'm wrong!

You can ask me for API review and then ask someone else for code review, but you can also ask me for both at the same time, as you did here, so no worries, this request is fine!


>diff --git a/packages/addon-kit/tests/test-tabs.js b/packages/addon-kit/tests/test-tabs.js

>+    function checkEnd() {
>+      if (detachEventCount!=2) return;

Nit: spaces around "!=", and block on next line, i.e.:

      if (detachEventCount!=2)
        return;


>diff --git a/packages/api-utils/lib/content/worker.js b/packages/api-utils/lib/content/worker.js

>     if ('onError' in options)
>-        this.on('error', options.onError);
>+      this.on('error', options.onError);
>     if ('onMessage' in options)
>-        this.on('message', options.onMessage);
>+      this.on('message', options.onMessage);

Nice catch!


>-    WorkerGlobalScope(this); // will set this._port pointing to the private API
>+    WorkerGlobalScope(this); // will set this._port pointing to the private API

Nit: this line and some others introduce trailing whitespace unnecessarily.


>+    // Track for document unload to destroy this worker.
>+    // We can't watch for unload event on website's window object as it 
>+    // prevents bfcache to work: 

Nit: Track for document unload -> Track document unload
Nit: website -> page
Nit: to work -> from working


>+    // https://developer.mozilla.org/En/Working_with_BFCache
>+    this._windowID = this._window.
>+      QueryInterface(Ci.nsIInterfaceRequestor).
>+      getInterface(Ci.nsIDOMWindowUtils).
>+      currentInnerWindowID;

Nit: for readability, indent such multi-line references like this:

    this._windowID = this._window.
                     QueryInterface(Ci.nsIInterfaceRequestor).
                     getInterface(Ci.nsIDOMWindowUtils).
                     currentInnerWindowID;


>+    observers.add("inner-window-destroyed", 
>+      this._documentUnload = this._documentUnload.bind(this));

Nit: for readability, indent such multi-line argument lists like this:

    observers.add("inner-window-destroyed", 
                  this._documentUnload = this._documentUnload.bind(this));


Or, in cases (unlike this one) where you need horizontal space, like this:

    observers.add(
      "inner-window-destroyed", 
      this._documentUnload = this._documentUnload.bind(this)
    );


>-   * Tells _port to unload itself and removes all the references from itself.
>+   * Destroy completely the worker: inside and ousite references

Nit: ousite -> outside
Or even better: inside and ousite -> internal and external


>+    if (!this._port) return;

Nit:

  if (!this._port)
    return;


>diff --git a/packages/api-utils/tests/test-content-worker.js b/packages/api-utils/tests/test-content-worker.js

>+  test.assertEqual(window.document.location.href, "about:blank", 
>+    "window starts by loading about:blank");
...
>+    test.assertNotEqual(window.document.location.href, "about:blank", 
>+      "window is now on the right document");
...
>+        test.assertEqual(worker.url, window.document.location.href, 
>+          "worker.url still works");
...
>+    test.assertEqual(worker.url, window.document.location.href, 
>+      "worker.url works");

Previous indentation nits apply here too.


Other than the nits, the only substantive issue is that the "attach" event fires on PageMod objects, but the "detach" event fires on Worker objects, which seems inconsistent.  I can't think of a better solution, though, so r=myk.
Attachment #525708 - Flags: review?(myk) → review+
Here is the updated patch with review comments addressed, that landed here:
https://github.com/mozilla/addon-sdk/commit/b2f8d12ec746d9cd3018a1888f85fe20165c183d

Thanks for the review, it is a big win for the SDK: huge leak killed and detach event was expected by multiple developers!
Attachment #525708 - Attachment is obsolete: true
Blocks: 651102
Alexandre, this bug is marked as depending on bug 646575 and bug 649334, but it looks to me that they are related but not actually blocking this bug.  Should they be changed to block bug mlk-fx5+?  (We could create a tracking bug for all add-on SDK leaks, but that doesn't seem necessary, mlk-fx5+ seems good enough.)

Also, I know almost nothing about the add-on SDK... is it hosted on github?  Ie. is that the final landing?  What's the critiria for marking this bug as resolved?

Also, will this change make it into add-ons in time for Firefox 5?

Sorry for all the questions.  And great work, thank you!
The big leak that lead to this bug creation has been fixed with this patch.
It's going to be included in the next SDK release, ie 1.0b5 that is going to be released on May 4.
Then, there is others kind of leak (bug 646575 and bug 649334), that end up cause some leaks in page-mod and others API. I'm new to bugzilla and adding them as dependent was a way to tell: "hey there is others king of leaks that mess with page-mod".

Addon-sdk is hosted on github and the SDK release and code are not heavily linked to Firefox ones. As the SDK doesn't contain any code in Firefox codebase, we are releasing on our schedule. But, our goal is to ship the 1.0 version in the same time than Firefox5 to gain some visibility.
Thanks for the answers, Alexandre.  W.r.t. bug dependencies, usually you mark bug B as depending on bug A if bug B can't be fixed until bug A has been fixed.  That isn't the case here, so I'll change the other bugs' dependencies.  If two bugs are related but not dependent you can mention that in the bug comments.
No longer depends on: 646575
No longer depends on: 649334
(In reply to comment #14)
> I think you might want to use the notifications added in bug 484710 instead of
> listening for unload on the <browser>, perhaps in a follow-up bug.
> 
> Otherwise, depending on how unload event on the <browser> element works, it
> could happen that the worker is destroyed when the page is going to the
> bfcache, and will not work properly when the page is restored from the bfcache.
There's nothing special about the <browser> element, except that adding an unload event listener doesn't affect whether a page can go into bfcache. The reason you need those other notifications is that they're the only way to find out if a page ever unloads from within bfcache.
Right. I guessed that (but was not sure if) 'unload' fires on the <browser> when a page goes into the bfcache. If that's the case, destroying the page-mod worker on unload would be premature.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
We need to document there change, as the API has a new event and doesn't behave the same, more details in bug 651102.
Keywords: APIchange
Keywords: relnote
Keywords: mlk
No longer blocks: mlk-fx5+
You need to log in before you can comment on or make changes to this bug.