Closed Bug 642306 Opened 9 years ago Closed 9 years ago

page-worker.Page leaks like crazy

Categories

(Add-on SDK Graveyard :: General, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jwatt, Assigned: ochameau)

References

()

Details

(Keywords: memory-leak)

Attachments

(2 files, 3 obsolete files)

I'm trying to write an add-on that can run an analysis script over a load of webpages and generate summary reports. Unfortunately it seems that page-worker.Page leaks like crazy, so I run out of memory after about 2000 pages. Yes, I'm calling destroy() on each Page instance, and there's only one Page instance alive at any given time.

I've put up a reduced test add-on at http://hg.jwatt.org/page-worker-leak/
Which other than an icon only contains:

http://hg.jwatt.org/page-worker-leak/raw-file/tip/lib/main.js
I've updated the add-on to use a data:// URI instead of a separate icon file, so the only non-generated content is now just main.js.

I've also added a boolean config variable 'reuseSamePageWorkerPage' to main.js to allow testing of both (a) using a new page-worker.Page for each page load, and (b) reusing the same page-worker.Page object for each page load.
With reuseSamePageWorkerPage=false I can load pages until I run out or RAM, which doesn't take too long. I run out of RAM after approximately 3500 page loads, at which point FF is using 2.5 GB of "Real Mem".

With reuseSamePageWorkerPage=true I can load a lot more pages and churn through RAM a lot more slowely, but when FF gets to only 0.6 GB "Real Mem" or so (after ~5500 page loads) I get NS_ERROR_OUT_OF_MEMORY errors even though there are a couple of GB of RAM still available. More specifically I get the error:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/observer-service.js", line 174, in null
    this.callback(subject, data);
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/symbiont.js", line 169, in _onStart
    this._onInit();
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 339, in Worker
    WorkerGlobalScope(this); // will set this._port pointing to the private API
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/traits.js", line 142, in Trait
    return self.constructor.apply(self, arguments) || self._public;
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 178, in WorkerGlobalScope
    Cu.evalInSandbox(es5code.contents, sandbox, "1.8", es5code.filename);
[Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIXPCComponents_Utils.evalInSandbox]"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/securable-module.js -> resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js :: WorkerGlobalScope :: line 178"  data: no]

multiple times, followed by some other errors, then the following error repeatedly until I kill FF:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/observer-service.js", line 174, in 
    this.callback(subject, data);
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/symbiont.js", line 165, in _onStart
    let window = HAS_DOCUMENT_ELEMENT_INSERTED ? domObj.defaultView : domObj;
InternalError: too much recursion
Alex: you recently tackled a memory leak in the widgets module.  Is this something you might be able to look into?
Assignee: nobody → poirot.alex
Attached patch Fix symbiont and hidden-frame (obsolete) — Splinter Review
There is two major leaks:
1/ The biggest one is in symbiont, that create an hidden frame without removing it
2/ We do not remove custom xul hostFrame in hidden-frame module until module unload, we could remove it when we remove the last hidden frame.
Attachment #521174 - Flags: review?(myk)
FYI, here is statistics running this test:
Without the fix:
 Before running test: 40689240
 After running test:  69442404
With the fix:
 Before running test: 40593028
 After running test:  43298368
Oops I mixed the statistics/leak tests.
Previous statistics was for this test.

Now, here is statistics running the hidden frame test:
Without the fix:
 Before running test: 36697964
 After running test:  37527320 (+829356)
With the fix:
 Before running test: 36598728
 After running test:  37250628 (+651900)
So we are able to free 177456 extra memory with it.
Nice! :)

What needs to happen before I can have people use my add-on without being bitten by these leaks? Do I just need to rebuild the add-on with a version of the Add-on SDK with this fix, or does Firefox need to be update? Or both?
Yes, just rebuild you addon with a fixed SDK, nothing depends on internal Firefox source code!
So if you can take the last version. Today, the 1.0b4rc3 version, from here:
https://ftp.mozilla.org/pub/mozilla.org/labs/jetpack/addon-sdk-1.0b4rc3.zip
Apply the patch I provided here and rebuild a new XPI!

Finally keep us in touch to confirm that everything is now working fine :)
Glad to hear Firefox doesn't need an update! :)

So I took 1.0b4rc3 and applied the patch, and tested with the test add-on linked from this bug. In summary:

reuseSamePageWorkerPage = false
  The leak is vastly reduced, but I'm still seeing a leak of about 18 KB
  per page worker instantiated and destroyed to load about:blank.

reuseSamePageWorkerPage = true
  I can only load about:blank into the page worker about 2700 times
  before I get the following error and the page worker stops working:

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/observer-service.js", line 174, in null
    this.callback(subject, data);
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/symbiont.js", line 182, in _onStart
    this._onInit();
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 339, in Worker
    WorkerGlobalScope(this); // will set this._port pointing to the private API
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/traits.js", line 142, in Trait
    return self.constructor.apply(self, arguments) || self._public;
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 178, in WorkerGlobalScope
    Cu.evalInSandbox(es5code.contents, sandbox, "1.8", es5code.filename);
[Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIXPCComponents_Utils.evalInSandbox]"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/securable-module.js -> resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js :: WorkerGlobalScope :: line 178"  data: no]
For the |reuseSamePageWorkerPage = true| case with the NS_ERROR_OUT_OF_MEMORY error, I should note that there are a couple of GB of RAM free when that happens.
Comment on attachment 521174 [details] [diff] [review]
Fix symbiont and hidden-frame

>+    // We need to remove observers from current iframe,
>+    // if we call initFrame with another one or 
>+    // if we call initFrame multiple times like page-worker
>+    if (this._frame)
>+      this._cleanUpFrame();

Erm, I'm not quite sure what this comment is saying.  Is it saying that initFrame has to remove observers before adding new ones, whether it is being called with a new frame or an existing one?  If so, perhaps the following wording would be a bit clearer:

  We need to remove observers from the current iframe
  before adding new ones to avoid leaking the iframe,
  whether initFrame is being called with a new iframe
  or an existing one (which happens when page-worker
  calls initFrame multiple times with the same iframe).


>+  if (cache.length == 0 && hostFrame && hostFrame !== hiddenWindow) {
>+    hiddenWindow.document.body.removeChild(hostFrame);
>+    hostFrame = null;
>+    hostDocument = null;
>+    isHostFrameReady = false;
>+  }

Won't this result in a lot of unnecessary host frame creation and deletion if addon code creates and destroys a number of pages in sequence?  It seems like it would be better to leave the host frame around, once created, until the hidden-frame module itself is unloaded.


Otherwise this looks good, although it doesn't completely solve the problems Jonathan is experiencing.  It's possible that there is an underlying platform issue, as I note other references to such errors around the web:

  http://superuser.com/questions/104907/firefox-temporarily-freezing-on-most-web-page-load-ns-error-out-of-memory
  http://userscripts.org/topics/50343


In any case, a patch with the review comments above addressed will get r=myk, but let's leave this bug open pending resolution of the additional problems.
Attachment #521174 - Flags: review?(myk) → review-
Blocks: 639626
I've remove the hostframe removal and changed the comment.
landed:
https://github.com/mozilla/addon-sdk/commit/6c48eba07fe537439cadc63ef5c2b39b7d0eac8a
Attachment #521174 - Attachment is obsolete: true
(In reply to comment #13)
> Created attachment 522868 [details] [diff] [review]
> Fix hidden frame leak in symbiont, commited version
> 
> I've remove the hostframe removal and changed the comment.
> landed:
> https://github.com/mozilla/addon-sdk/commit/6c48eba07fe537439cadc63ef5c2b39b7d0eac8a

FWIW, this poatch has not improved bug 639626 at all.
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → 1.0
I tested repulled my addon-sdk git repo from master and built another version of Bugzilla Tweaks, and I can confirm that bug 639626 has *not* been fixed by this.  It can be reproduced extremely easily by the steps to reproduce in bug 639626 comment 0.
Ehsan: Thanks for this new test! In fact I've mixed this bug and bug 607601 the later one may fix bugzilla tweaks or at least dratiscally lower the memory leak.
The current bug is more about page-worker leaks than page-mod ones.
Here is one another quite big leak that is spread over almost all jetpack code.
We are used to register destructor in unload module, but by doing so we are leaking all objects because we never unregister them when we call destructor manually.

First I'm adding unload.unregister method and I'm calling this method in worker and symbiont. I'm going to fill another bug to apply this on all our codebase!

This patch allow to save 1M of memory on the attached unit test when we instanciate a new page-worker at each loop!
Attachment #522868 - Attachment is obsolete: true
Attachment #525377 - Flags: review?(myk)
(In reply to comment #17)
> Created attachment 525377 [details] [diff] [review]
> Unregister destructor from unload module
> 
> Here is one another quite big leak that is spread over almost all jetpack code.
> We are used to register destructor in unload module, but by doing so we are
> leaking all objects because we never unregister them when we call destructor
> manually.

Isn't this what unload.ensure() is for?
See Atul's (year-old) comments here about when to use unload.when() vs. ensure(): bug 547417 comment 19 and 21
about unload.ensure, see my response in bug 649334
No longer blocks: 639626
Attachment #525377 - Flags: review?(myk)
I submited a new patch in bug 649334 to fix this particular unload problem, that exists in multiple APIs.
Attachment #525377 - Attachment is obsolete: true
Alexandre, should this bug now be marked as a duplicate of bug 649334?
No, because there may be others leaks left around page-worker.
Jonathan: Can you test again with current master of the SDK ?
As bug 649334 just landed, another quite important leak is fixed!

I've runned your script and it seems to be working fine, Firefox appears to get back to the starting memory use value, or at least get very close to it.
I cloned the git repo and did 'bin/activate', but it doesn't seem to work properly since 'cfx' isn't recognized.
oops - didn't source it
Yeah, so for |reuseSamePageWorkerPage = false| this appears to be fixed. I'm seeing:

 Pages  Real Mem
-----------------
  1187  553.8
  5065  592.7
 10236  601.9
 15026  585.9
 21131  595.1
 25146  591.0
 30076  600.9
 35111  588.8
 40147  610.7
 45041  588.2
 50088  591.7
 56073  596.2
 60051  591.1
 65017  591.3
 70140  613.1
 75527  588.6
 80015  614.0
 90083  603.9
102141  609.6

|reuseSamePageWorkerPage = true| still has the problem mentioned in comment 10, but that's probably better addressed as a separate bug.
You can't load more than ~3000 workers before it throws?
You get the same exception ? I got another one that stop the test but I still had memory available. An exception about weak references.

Yes, we could open a seperate bug, dedicated to leak on page worker reuse.
I get to 2630 loads of about:blank?<incremented> in the same Page, and then it stops with the following output in the console. (I have tons of RAM left.)

error: An exception occurred.
Traceback (most recent call last):
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/observer-service.js", line 174, in null
    this.callback(subject, data);
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/symbiont.js", line 191, in _onStart
    this._onInit();
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 340, in Worker
    WorkerGlobalScope(this); // will set this._port pointing to the private API
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/traits.js", line 142, in Trait
    return self.constructor.apply(self, arguments) || self._public;
  File "resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js", line 179, in WorkerGlobalScope
    Cu.evalInSandbox(es5code.contents, sandbox, "1.8", es5code.filename);
[Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIXPCComponents_Utils.evalInSandbox]"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/securable-module.js -> resource://jid0-r5rj13fz1ughflxgwp9cwfttvfe-api-utils-lib/content/worker.js :: WorkerGlobalScope :: line 179"  data: no]
I haven't seen an exception about weak references.

I think we should close this as fixed by the push in comment 13 + the fix in bug 649334 and I'll open another bug for the evalInSandbox problem.
Sure, thanks for your help.
Such bug report really help us bring the best 1.0 possible version!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Pleasure. Thanks for working on fixing things. :)

I filed bug 651531 for the evalInSandbox NS_ERROR_OUT_OF_MEMORY issue.
No longer blocks: mlk-fx5+
You need to log in before you can comment on or make changes to this bug.