Closed
Bug 790201
Opened 12 years ago
Closed 12 years ago
Nuke social worker sandbox on shutdown
Categories
(Firefox Graveyard :: SocialAPI, defect)
Firefox Graveyard
SocialAPI
Tracking
(firefox17+ verified, firefox18 verified, firefox19 verified)
VERIFIED
FIXED
Firefox 19
People
(Reporter: jaws, Assigned: markh)
Details
(Keywords: memory-leak, Whiteboard: [Fx17])
Attachments
(3 files, 4 obsolete files)
4.70 KB,
patch
|
Felipe
:
review+
Gavin
:
feedback+
|
Details | Diff | Splinter Review |
2.42 KB,
patch
|
mixedpuppy
:
review+
Felipe
:
review+
|
Details | Diff | Splinter Review |
8.65 KB,
patch
|
Felipe
:
review+
lsblakk
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
After enabling the Social API, there are reports from people that the heap-unclassified is growing to a very large percentage.
This needs to be investigated and we need to make sure that Firefox is not the source of any of these memory leaks.
As a side note, by design these social API panels have a lifetime equal to the browser window they reside in, so references may not be removed until shutdown in the majority of cases.
Reporter | ||
Updated•12 years ago
|
Whiteboard: [Fx17]
Comment 1•12 years ago
|
||
This fixes most of the issues that I found while analyzing the code this week. Not everything is necessary but I believe they all increase correctness. Testing with Jared's demo provider I don't see any window leaks during runtime, and the nukeSandbox call gets rid of the shutdown leak.
Attachment #660789 -
Flags: review?(gavin.sharp)
Comment 2•12 years ago
|
||
Another thing that we need to look into more (but I don't see it making any difference), is that if I call nukeSandbox in the middle of a session (not on shutdown), this message listener (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/social/MessagePortWorker.js#99) being dispatched (not the JS code of the function -- the js stack is empty at the time) will throw "can't access dead object" errors. So this mean something along the way there has a reference back to something that existed in the sandbox, which might or might not be a problem
Assignee | ||
Comment 3•12 years ago
|
||
Comment on attachment 660789 [details] [diff] [review]
Patch
Drive-by:
+ Services.obs.addObserver(function cleanupSandbox () {
+ Services.obs.removeObserver(cleanupSandbox, "xpcom-shutdown");
+ Cu.nukeSandbox(sandbox);
+ }, "xpcom-shutdown", false);
+
Would it be better to call terminateWorker() in that notification (or maybe that is too late for the other DOM cleanup)?
But that looks great (and I learned about nukeSandbox :)
Comment 4•12 years ago
|
||
Comment on attachment 660789 [details] [diff] [review]
Patch
I don't think the event handler removals are necessary.
We should get a better bug on file about the need to unwrap for XHR, that WFM one isn't going anywhere :)
Comment 5•12 years ago
|
||
(In reply to :Felipe Gomes from comment #2)
> Another thing that we need to look into more (but I don't see it making any
> difference), is that if I call nukeSandbox in the middle of a session (not
> on shutdown), this message listener
> (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/social/
> MessagePortWorker.js#99) being dispatched (not the JS code of the function
> -- the js stack is empty at the time) will throw "can't access dead object"
> errors. So this mean something along the way there has a reference back to
> something that existed in the sandbox, which might or might not be a problem
Not sure I follow exactly, but if you nuke the sandbox while the provider panels are still active, their use of mozSocial.getWorker or ports they've previously obtained to the worker will throw, right? Is that what you're seeing?
Assignee | ||
Comment 6•12 years ago
|
||
Comment on attachment 660789 [details] [diff] [review]
Patch
In general this looks good to me. I'm happy with the event listeners being removed and as Gavin didn't explicitly say to remove that code, I'll leave that up to you.
In the same vein though, I think it would be nice if the xpcom shutdown observer was removed in terminateWorker - once terminateWorker has been called (which should happen in normal usage), the shutdown code is no longer necessary. Another alternative might be to have a single observer that iterates over _workerCache and nukes all sandboxes it finds. This really is a nit though as we don't expect many workers to be created and terminated in a normal run, so again, I'll leave that to your judgement.
Attachment #660789 -
Flags: review?(gavin.sharp) → review+
Comment 7•12 years ago
|
||
I've removed the removeEventListeners as they don't seem to be necessary at this poing, and I'll continue investigating
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #4)
> We should get a better bug on file about the need to unwrap for XHR, that
> WFM one isn't going anywhere :)
Filed bug 798660
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9237d1ae5ff
tracking-firefox17:
--- → ?
Summary: Memory leak when Social API is enabled → Nuke social worker sandbox on shutdown
Comment 8•12 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/9b9678df7672 because something in the push was hitting "browser_frameworker.js | sub-test testEarlyClose failed: [Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]" nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)" location: "JS frame :: resource://gre/modules/FrameWorker.jsm :: terminate :: line 239" data: no]" and "browser_frameworker.js | check that websockets worked - Got FAILED calling WebSocket constructor: TypeError: WebSocket is not a constructor, expected ok" and I didn't know which (though I presume this one), and didn't know interconnectedness among them.
Updated•12 years ago
|
Assignee | ||
Comment 9•12 years ago
|
||
I had a play with this and can reproduce the errors. Some of the errors (eg, "Can't access dead object") are caused by the nukeSandbox being done before the hidden iframe has unloaded (infact, in at least 1 test we terminate the worker before the documentelementinserted (sp?) event has fired, so this.sandbox is null)
I attempted to solve this by doing all the termination in an unload handler for the document, which solved some of the problems but still leaves a number more related directly to the removal of the XPCNativeWrapper.unwrap - it causes the WebSocket test to outright fail, and most other tests to dump obscure exceptions and c++ stack traces - so something is still very wrong there. These obscure assertions and WebSocket failure happens even when the nukeSandbox call is commented out.
This new patch is obviously not ready to be checked in (see the:
// XXX - markh - this is necessary to avoid lots of obscure assertions
comment and commented out code around it - but it does pass the tests and doesn't fill the log with assertions. I *know* WebSocket is one thing that needs it, but I'm not sure what others do - I've just re-enabled the unwrapping for all - and I probably won't get back to this until Monday.
There is also a change to toolkit/components/social/test/browser/worker_relative.js which looks like a genuine problem in the test which starts happening due to the worker termination waiting for the unload event)
*sob*
Assignee | ||
Comment 10•12 years ago
|
||
It appears many of those assertions were caused by the patch adding XMLHttpRequest to the sandbox twice - once the "normal" version and once the unwrapped version. By not doing that and by adding WebSocket to the "whitelist" those assertions and test failures go away.
I figured that instead of the xpcom shutdown observer, we might as well *always* add an unload event handler that does the cleanup, and terminate simply removes the node from the document triggering that unload. App shutdown should also cause the unload to fire, meaning it gets cleaned up in that case too. By referencing |sandbox| in the unload handler, we no longer need to store the sandbox in |this|.
Assuming this looks OK, we should file another bug for the need to unwrap WebSocket like we already have done for XMLHttpRequest.
Like my previous version of the patch, this also includes a fix for a legitimate test problem that wasn't seen before due to the timing characteristics of the previous shutdown - this shutdown process actually happens a little later after the worker window has completely finished doing its thing and unload fires.
Attachment #660789 -
Attachment is obsolete: true
Attachment #670701 -
Attachment is obsolete: true
Attachment #670979 -
Flags: feedback?(gavin.sharp)
Attachment #670979 -
Flags: feedback?(felipc)
Comment 11•12 years ago
|
||
Comment on attachment 670979 [details] [diff] [review]
New saner version
>diff --git a/toolkit/components/social/FrameWorker.jsm b/toolkit/components/social/FrameWorker.jsm
>+ workerWindow.addEventListener("uload", function unloadListener() {
>+ workerWindow.removeEventListener("iload", unloadListener);
uload/iload? o_O
> terminate: function terminate() {
>+ if (!this.url in workerCache) {
"!" binds tighter than "in", so you're testing "false in workerCache" here AFAICT (i.e. you'll never hit this).
I wonder whether moving the destruction of the ports to during unload cause trouble for any port-close handlers that try to make use of window-dependent APIs?
Assignee | ||
Comment 12•12 years ago
|
||
Fixed the obvious typos in the event handler and the "!" thang...
Attachment #670979 -
Attachment is obsolete: true
Attachment #670979 -
Flags: feedback?(gavin.sharp)
Attachment #670979 -
Flags: feedback?(felipc)
Attachment #670994 -
Flags: feedback?(gavin.sharp)
Attachment #670994 -
Flags: feedback?(felipc)
Assignee | ||
Comment 13•12 years ago
|
||
FWIW, I just built 17 from the beta tree and without this patch a shutdown reports:
=> mFreeCount: 77674 -- LEAKED 2799 !!!
=> mAdoptFreeCount: 4708 -- LEAKED 4 !!!
and with the patch it reports no leaks. On trunk, no string leaks are reported either way. So this patch has a noticeable effect on 17 and as such may still be a candidate for uplift.
Assignee | ||
Comment 14•12 years ago
|
||
Rebased to avoid a conflict on m-c and as discussed on IRC, added a comment about why storing the sandbox in |this| is no longer necessary.
Attachment #670994 -
Attachment is obsolete: true
Attachment #670994 -
Flags: feedback?(gavin.sharp)
Attachment #670994 -
Flags: feedback?(felipc)
Attachment #671331 -
Flags: review?(gavin.sharp)
Attachment #671331 -
Flags: review?(felipc)
Comment 15•12 years ago
|
||
Comment on attachment 671331 [details] [diff] [review]
Rebased and new comments re this.sandbox
Review of attachment 671331 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/social/FrameWorker.jsm
@@ +118,4 @@
> 'location'];
> workerAPI.forEach(function(fn) {
> try {
> + // Bug 756173 - XHR has issues in a sandbox and need
update this bug number with the new filed about the issue: bug 798660. I'll update that bug to also mention WebSocket
@@ +252,5 @@
> + Cu.reportError("FrameWorker: failed to close pending port. " + ex);
> + }
> + }
> + worker.pendingPorts = null;
> +
nit: whitespace on blank line
Attachment #671331 -
Flags: review?(felipc) → review+
Comment 16•12 years ago
|
||
Comment on attachment 671331 [details] [diff] [review]
Rebased and new comments re this.sandbox
Looks good to me.
Attachment #671331 -
Flags: review?(gavin.sharp) → feedback+
Assignee | ||
Comment 17•12 years ago
|
||
This patch is failing on try with:
TEST-INFO | chrome://mochitests/content/browser/toolkit/components/social/test/browser/browser_workerAPI.js | Console message: [JavaScript Error: "TypeError: pending is null" {file: "resource://gre/modules/FrameWorker.jsm" line: 215}]
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/toolkit/components/social/test/browser/browser_workerAPI.js | Test timed out
https://tbpl.mozilla.org/php/getParsedLog.php?id=16183641&tree=Try
which implies we are executing a 'load' handler after the 'unload' handler has set the array to null. While the patch could avoid this by setting the arrays back to an empty array instead of null, that smells too much - we probably should work out why...
Assignee | ||
Comment 18•12 years ago
|
||
The reload of workers broke the original patch. Attaching modifications to the original patch necessary to get the nukeSandbox changes working with reload.
Note this is *on top of* the other patch - I'll fold them into 1 for checkin, but figured this might make the reload specific changes easier to review.
Ideally Shane *and* Felipe could review - Shane as he understands the reload code best, and Felipe so I can carry the original r+ forward.
Try at https://tbpl.mozilla.org/?tree=Try&rev=bcd7c250ba0a
Attachment #672616 -
Flags: review?(mixedpuppy)
Attachment #672616 -
Flags: review?(felipc)
Updated•12 years ago
|
Attachment #672616 -
Flags: review?(mixedpuppy) → review+
Updated•12 years ago
|
Attachment #672616 -
Flags: review?(felipc) → review+
Assignee | ||
Comment 19•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Flags: in-testsuite+
Comment 20•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 19
Updated•12 years ago
|
Assignee: felipc → mhammond
Comment 21•12 years ago
|
||
[Approval Request Comment]
Bug caused by (feature/regressing bug #): social api
User impact if declined: better clean-up on shutdown
Testing completed (on m-c, etc.): landed on m-c last week
Risk to taking this patch (and alternatives if risky): touches social-only worker reload/shutdown
String or UUID changes made by this patch: none
Attachment #675253 -
Flags: review+
Attachment #675253 -
Flags: approval-mozilla-beta?
Attachment #675253 -
Flags: approval-mozilla-aurora?
Comment 22•12 years ago
|
||
Comment on attachment 675253 [details] [diff] [review]
Rollup patch for aurora/beta
needed for social-api, been on m-c a week, approving.
Attachment #675253 -
Flags: approval-mozilla-beta?
Attachment #675253 -
Flags: approval-mozilla-beta+
Attachment #675253 -
Flags: approval-mozilla-aurora?
Attachment #675253 -
Flags: approval-mozilla-aurora+
Comment 23•12 years ago
|
||
Comment 24•12 years ago
|
||
Marking verified fixed since this landed with tests. Please add verifyme keyword if there's something you need QA to test.
Status: RESOLVED → VERIFIED
Updated•6 years ago
|
Product: Firefox → Firefox Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•