onunload frame navigation leaves child document alive too long

RESOLVED WORKSFORME

Status

()

RESOLVED WORKSFORME
7 years ago
6 years ago

People

(Reporter: jruderman, Unassigned)

Tracking

(Blocks: 3 bugs, {memory-leak, testcase})

Trunk
memory-leak, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(7 attachments, 13 obsolete attachments)

247 bytes, text/html
Details
5.19 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.31 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
2.98 KB, patch
smaug
: review+
Details | Diff | Splinter Review
24.06 KB, patch
mossop
: review+
Details | Diff | Splinter Review
7.75 KB, patch
Details | Diff | Splinter Review
1.79 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
(Reporter)

Description

7 years ago
Created attachment 543847 [details]
testcase

1. Run a debug build (so you can see --DOMWINDOW lines in the terminal).
2. Load the testcase.
3. Click the link.
4. In a new tab, open about:memory.
5. Click the "Minimize memory usage" button.

Result: "testcase" and "child_1" are collected, but "child_2" remains.

If I close the tab or navigate it to a third page, "child_2" finally becomes collectable.
Blocks: 659860
Whiteboard: [MemShrink]
OS: Linux → All
Hardware: x86_64 → All
OS: All → Linux
Hardware: All → x86_64
Whiteboard: [MemShrink] → [MemShrink:P2]
Assignee: nobody → continuation
I wonder whether the session history is holding on to things somehow here...
Jesse, can you still reproduce this?  I haven't actually looked at the DOMWINDOW lines, because I'm not sure how to generate that on the Mac.  If somebody could tell me how to do that, that would be great.  I guess mochitests do that, so I should poke around in the scripts for that.

Looking at the cycle collector logs, I don't see the nsDocuments for child_1 or child_2 sticking around for long.  In fact, on two runs, they never showed up in the CC logs at all, suggesting that they were cleaned up by regular reference counting, not cycle collection.

Ignoring various about:memory, about:blank, etc. documents, this is what I see when grepping for nsDocument (xhtml) in all of the logs:

cc-edges-1.log:0x100f18000 [rc=10] nsDocument (xhtml) data:text/html,child_1
cc-edges-1.log:0x105810200 [rc=10] nsDocument (xhtml) https://bug669240.bugzilla.mozilla.org/attachment.cgi?id=543847

cc-edges-2.log:0x106e69000 [rc=24] nsDocument (xhtml) data:text/html,Bye
cc-edges-2.log:0x105810200 [rc=11] nsDocument (xhtml) https://bug669240.bugzilla.mozilla.org/attachment.cgi?id=543847
cc-edges-2.log:0x1068f3c00 [rc=8] nsDocument (xhtml) data:text/html,child_2
cc-edges-2.log:0x100f18000 [rc=7] nsDocument (xhtml) data:text/html,child_1

cc-edges-3.log:0x106e69000 [rc=11] nsDocument (xhtml) data:text/html,Bye

nsDocuments are the only thing labelled with "child_", but I guess DOMWINDOW is recording activity related to nsGlobalWindows instead?

There are 15 nsGlobalWindows to start with, and after the child documents go away, the number of nsGlobalWindows drops to 13.  If child_1, child_2 and testcase all had their own global window, this would suggest that we're hanging on to a global window for too long.  I'll try to figure out how to get the global window from the document.
The DOMWINDOW lines should appear if you make a debug build.
Well, they didn't. ;)
I think the problem is that stdout and stderr are sinkholed somewhere by default on the Mac.  Or at least they are the way I run it, with open.
Ah, that's probably it.  The un-idomatic

 $ dist/Nightly.app/Contents/MacOS/firefox

gives me stdout on my mac.
Thanks!  That worked.

I ran it with DEBUG_CC, and the unexpected garbage logging spit out this:

nsCycleCollector: nsGlobalWindow 0x118225488 was not collected due to 1
  external references (1 total - 0 known)
  An object expected to be garbage could be reached from it by the path:
    nsGlobalWindow 0x118225488

I'll have to look into it tomorrow more to figure out what exactly the interleaving of CCs and shutting down is.

Updated

7 years ago
Assignee: continuation → azakai

Comment 9

7 years ago
Dumping the CC log, it looks like the two windows (inner and outer) that should be collected but aren't, are held on to by a chain of length 12,418 (!), beginning with the root |JS Object (RegExpStatics)|.
This may be InstallTrigger related. I can't find anything wrong in that code though. But not creating the InstallTrigger getter (in toolkit/mozapps/extensions/content/extensions-content.js) makes the problem go away.

Summary of the problem: We have a page with a link and an iframe. When the link is clicked we browse to Bye (a data url). The page's unload was set to browse the iframe to child_2 (also a data url). So when the link is clicked, we expect unload to be called which causes the iframe to browse to child_2. child_2 is created, and the problem is that it stays open as long as the tab with Bye is open - even though it no longer exists since we browsed to Bye.

Analysis: Looking at the window that is leaked (child_2), the CC graph shows it is held on to by its XPCWrappedNative and that is held on to by a |JS Object (Window)|, which I am guessing is the Window's global object (the JS Window object holds onto the XPCWrappedNative through xpc_GetJSPrivate). That seems ok.

Now, that JS Window is a [root] in the CC graph. Looking in the GC log to see why, we get

0xaff31218 Window acb30340        via XPCWrappedNative::mFlatJSObject(0xaff31178 Window).shape(0xaff58ab0 shape).InstallTrigger getter(0xaff3c3d0 Proxy).proto(0xaff3c0f8 Proxy).parent

(This is what prompted me to try removing the InstallTrigger to see what happens.) So, we are held on to by aff31178, which turns out to be another |JS Object (Window)|. Looking at its xpc_GetJSPrivate, to see what native window it corresponds to (I am inferring that from what we saw earlier in this comment - I hope that's right?), turns out that is the DOMWindow created for the Bye page (the page we are navigated to when we click the link), and *not* child_2 which we started with. Odd...

When the link is clicked, the Bye window is created, and the child_2 window immediately after it. (This seems slightly odd to me, I would have guessed we first shut down the current page, call it's unload etc., before creating the next one.) We end up with the new Bye window pointing to the child_2 window, through the chain we see above, summarized as

(Bye == ) mFlatJSObject->shape->InstallTrigger getter->proto->parent (== child_2)

That sort of looks like the InstallTrigger on Bye is one that should be on child_2 instead. Or as if the child_2 one is reused on Bye. In other words, the InstallTrigger getter created on Bye is incorrect somehow.
mrbkap on irc said that the 'Proxy' stuff in the dumps might be security wrappers. So that may be a factor here.
Jesse on irc suggested that reimplementing the InstallTrigger instantiation mechanism using nsIDOMGlobalPropertyInitializer might resolve this. Looking at nsIDOMGlobalPropertyInitializer, it appears to be a potentially slimmer way of doing things (currently we create a getter in JS on each new window). So it might be a win in two ways.

Mossop, would it be ok if I wrote a patch for that?

(Note that even if that does fix this bug, I suspect there remains an underlying problem, maybe with security wrappers as mrbkap hinted.)
(In reply to comment #12)
> Jesse on irc suggested that reimplementing the InstallTrigger instantiation
> mechanism using nsIDOMGlobalPropertyInitializer might resolve this. Looking
> at nsIDOMGlobalPropertyInitializer, it appears to be a potentially slimmer
> way of doing things (currently we create a getter in JS on each new window).
> So it might be a win in two ways.
> 
> Mossop, would it be ok if I wrote a patch for that?

Sure

Updated

7 years ago
Blocks: 675412
Bug 669096 looks similar to this. Perhaps the causes are related.
Marking as depending on bug 669096 since bug 669096 comment 12 and 13 sound like the exact same issue the investigation here turned up. Sorry for bugspam.
Depends on: 669096
OS: Linux → All
Hardware: x86_64 → All
Created attachment 552271 [details] [diff] [review]
Part 1: Make messageManager accessible from window

I have patches that rewrite the InstallTrigger to use nsIDOMGlobalPropertyInitializer. They work perfectly locally, and they resolve this bug (and hopefully bug 675412 too). However I get failures on tryserver that I don't see locally (on all platforms, including the one I am running locally), http://tbpl.mozilla.org/?tree=Try&pusher=azakai@mozilla.com&rev=fa3534efd36b

I am not sure what could be causing that difference. I'm putting up the patches in hopes someone will notice something I am forgetting. Also advice on what to do when there are such discrepancies between local and tryserver would be welcome.

This is part 1, which allows getInterface on a window to get to the window's messageManager. Works in both the single and multiprocess case. This is needed in this bug since we have the window, but not the messageManager, and there is no other way to get to it.
Created attachment 552273 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer

This is the main patch. It switches from a frameScript that we load into using nsIDOMGlobalPropertyInitializer.

It's mostly straightforward, with few changes to the previous InstallTrigger code. The only interesting bit is the use of WeakMap to store the messageManagers: We don't want to add our message listener more than once to the same messageManager, but also don't want to keep a list of all the messageManagers we have seen. (If each window had a messageManager this would be simple, but instead each frame/tab does - so multiple windows can share a single messageManager.) WeakMaps exactly solve this, we use the messageManagers as keys in the map, and they are weak refs - when they are collected, they vanish from the map, giving the exact behavior we need here.
If you haven't already, you can download the Tryserver build and run it locally.  If you can reproduce it, you could then debug what the problem is.
Created attachment 552274 [details] [diff] [review]
diff between extensions-content.js and InstallTrigger.js

For convenience, here is a diff -w between the old extensions-content.js code that implemented InstallTriggers, and the new InstallTrigger.js.
Created attachment 552278 [details] [diff] [review]
Part 3: Prevent crash in nsFrameScriptExecutor::Traverse

This last patch fixes a crash that started happening with the previous two patches. I'm not sure exactly what change triggered it. What happens is that nsFrameScriptExecutor::Traverse is called during shutdown, when nsContentUtils::XPConnect() is already NULL.

I am not sure if this is a problem with the previous two patches, or whether it was always a problem but just became noticeable with them. There appears to be one other place in the code that makes the assumption that nsContentUtils::XPConnect() is non-NULL in calls to Traverse, in nsJSContext.
Attachment #552278 - Flags: feedback?(Olli.Pettay)
(In reply to Andrew McCreight [:mccr8] from comment #18)
> If you haven't already, you can download the Tryserver build and run it
> locally.  If you can reproduce it, you could then debug what the problem is.

Thanks! I forgot about that. It does fail locally with the tryserver build, now to find out what is different between my build and tryserver's...
Comment on attachment 552278 [details] [diff] [review]
Part 3: Prevent crash in nsFrameScriptExecutor::Traverse

NS_CYCLE_COLLECTION_NOTE_EDGE_NAME should be inside the if() {} .
Attachment #552278 - Flags: feedback?(Olli.Pettay) → feedback+
Comment on attachment 552271 [details] [diff] [review]
Part 1: Make messageManager accessible from window

This exposes message manager to content pages without any security checks.
Attachment #552271 - Flags: review-
(In reply to Alon Zakai (:azakai) from comment #20)
> This last patch fixes a crash that started happening with the previous two
> patches. I'm not sure exactly what change triggered it. What happens is that
> nsFrameScriptExecutor::Traverse is called during shutdown, when
> nsContentUtils::XPConnect() is already NULL.

Won't this cause leaks given that we won't note the edge into the context?

Updated

7 years ago
Blocks: 678365
Created attachment 552566 [details] [diff] [review]
Part 4: Disable a crashtest that fails due to a new assertion

I think most of the tryserver failures are now fixed. A last issue is dealt with in this patch, we need to disable a crashtest since it fails due to the new assertion that now happens (bug 678365).

Waiting on talos to confirm that these patches help perf (or at least do not regress).
Created attachment 552568 [details] [diff] [review]
Part 1: Make messageManager accessible from window v2

Changed Part 1 to take into account the security issue smaug mentioned.

(This also requires a tiny change to Part 2, to do |.getInterface(Ci.nsIDOMWindowUtils).messageManager| instead of getInterface for the messageManager directly.)
Attachment #552271 - Attachment is obsolete: true

Updated

7 years ago
Attachment #552568 - Flags: review?(Olli.Pettay)
Created attachment 552695 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer v2

(The "diff between extensions-content.js and InstallTrigger.js" attachment should hopefully make reviewing this more convenient.)
Attachment #552273 - Attachment is obsolete: true
Attachment #552695 - Flags: review?(dtownsend)
Created attachment 552696 [details] [diff] [review]
Part 3: Prevent crashes when nsContentUtils::XPConnect() is null
Attachment #552278 - Attachment is obsolete: true
Attachment #552696 - Flags: review?(Olli.Pettay)
Created attachment 552697 [details] [diff] [review]
Part 4: Disable crashtests that fail due to new assertion

This patch disables all the crashtests that use InstallTrigger, as they now assert due to bug 678365.

Asking mrbkap to review since most are in xpconnect.
Attachment #552566 - Attachment is obsolete: true
Attachment #552697 - Flags: review?(mrbkap)
Comment on attachment 552697 [details] [diff] [review]
Part 4: Disable crashtests that fail due to new assertion

Can you add the magic marking to say that these assert 1 time instead of skipping them entirely?

asserts(1) load ...
Attachment #552697 - Flags: review?(mrbkap)
(In reply to Blake Kaplan (:mrbkap) from comment #30)
> Comment on attachment 552697 [details] [diff] [review]
> Part 4: Disable crashtests that fail due to new assertion
> 
> Can you add the magic marking to say that these assert 1 time instead of
> skipping them entirely?
> 
> asserts(1) load ...

Tried that, but now the problem is that these tests do not assert 100% reliably, so there are failures with |assertion count 0 is less than expected 1 assertions|. Is there a way to allow 0 *or* 1 assertions?
(In reply to Alon Zakai (:azakai) from comment #31)
> (In reply to Blake Kaplan (:mrbkap) from comment #30)
> > Comment on attachment 552697 [details] [diff] [review]
> > Part 4: Disable crashtests that fail due to new assertion
> > 
> > Can you add the magic marking to say that these assert 1 time instead of
> > skipping them entirely?
> > 
> > asserts(1) load ...
> 
> Tried that, but now the problem is that these tests do not assert 100%
> reliably, so there are failures with |assertion count 0 is less than
> expected 1 assertions|. Is there a way to allow 0 *or* 1 assertions?

http://mxr.mozilla.org/mozilla-central/source/content/xbl/crashtests/crashtests.list#8
Created attachment 553623 [details] [diff] [review]
Part 4: Disable crashtests that fail due to intermittent new assertion

Thanks khuey!
Attachment #552697 - Attachment is obsolete: true
Attachment #553623 - Flags: review?(mrbkap)
Comment on attachment 553623 [details] [diff] [review]
Part 4: Disable crashtests that fail due to intermittent new assertion

Great, thanks.
Attachment #553623 - Flags: review?(mrbkap) → review+
Comment on attachment 552568 [details] [diff] [review]
Part 1: Make messageManager accessible from window v2

>+nsDOMWindowUtils::GetMessageManager(nsIContentFrameMessageManager** aResult)
>+{
>+  *aResult = nsnull;
>+
>+  if (!IsUniversalXPConnectCapable()) {
>+    return NS_ERROR_DOM_SECURITY_ERR;
>+  }
>+
>+  nsPIDOMWindow* innerWindow = mWindow->GetCurrentInnerWindow();
>+  if (!innerWindow)
>+    return NS_ERROR_FAILURE;
if (expr) {
  stmt;
}
or, in this case,
NS_ENSURE_TRUE(innerWindow, NS_ERROR_FAILURE);


>+      if (fl)
>+        messageManager = do_QueryInterface(fl->GetTabChildGlobalAsEventTarget());
if (expr) {
  stmt;
}



>+    }
>+  } else {
>+    // Multi-process case. Get the TabChildGlobal
>+    nsCOMPtr<nsIDocShell> docShell = innerWindow->GetDocShell();
>+    if (docShell) {
>+      nsRefPtr<TabChild> tabChild = GetTabChildFrom(docShell);
GetTabChildFrom is null safe, so no need for if (docshell)

>+      if (tabChild)
>+        messageManager = tabChild->GetGlobal();
if (expr) {
  stmt;
}


>+    }
>+  }
>+
>+  if (messageManager) {
>+    *aResult = messageManager.get();
>+    NS_ADDREF(*aResult);
>+  }

No need for if() nor addref.

messageManager.forget(aResult);

> 
>+    nsIContentFrameMessageManager* GetGlobal() { return mTabChildGlobal; }
s/GetGlobal/GetTabChildGlobal/
Attachment #552568 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 552696 [details] [diff] [review]
Part 3: Prevent crashes when nsContentUtils::XPConnect() is null

Hmm, why are we traversing even nsJSContext so late that 
nsContentUtils::XPConnect() is null.

Does also nsXPConnect::GetXPConnect() return null in those cases?
Any chance you could attach a diff -w between extensions-content.js and InstallTrigger.js?
Created attachment 555193 [details] [diff] [review]
diff between extensions-content.js and InstallTrigger.js -U 8 -w

Updated -U 8 -w diff between extensions-content.js and InstallTrigger.js.
Attachment #552274 - Attachment is obsolete: true
Comment on attachment 552695 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer v2

Review of attachment 552695 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/extensions/InstallTrigger.js
@@ +181,5 @@
> +
> +    /**
> +     * @see amInstallTrigger.cpp
> +     * TODO: When e10s lands on m-c, consider removing amInstallTrigger.cpp
> +     *       See bug 571166

Remove this TODO while you're here

@@ +240,5 @@
> +InstallTriggerManager.prototype = {
> +  classID: Components.ID("{2d24aee2-c4a6-4c66-b1e9-b600d9013157}"),
> +  QueryInterface: XPCOMUtils.generateQI([Ci.nsIDOMGlobalPropertyInitializer]),
> +
> +  messageManagers: new WeakMap(),

This will mean every instance of InstallTriggerManager has the same WeakMap, what is the lifecycle of nsIDOMGlobalPropertyInitializer?
Comment on attachment 552696 [details] [diff] [review]
Part 3: Prevent crashes when nsContentUtils::XPConnect() is null

Waiting for answer to my questions.
Attachment #552696 - Flags: review?(Olli.Pettay) → review-
(In reply to Olli Pettay [:smaug] from comment #40)
> Comment on attachment 552696 [details] [diff] [review]
> Part 3: Prevent crashes when nsContentUtils::XPConnect() is null
> 
> Waiting for answer to my questions.

Sorry for the delay, I am still working on that. It looks like using xpcprivate's nsXPConnect::GetXPConnect() does the right thing, however, for some reason I can't get it to build properly on Windows. Since I don't have a Windows machine I am doing this asynchronously using tryserver (where sometimes my pushes get ignored...).

(In reply to Dave Townsend (:Mossop) from comment #39)
> @@ +240,5 @@
> > +InstallTriggerManager.prototype = {
> > +  classID: Components.ID("{2d24aee2-c4a6-4c66-b1e9-b600d9013157}"),
> > +  QueryInterface: XPCOMUtils.generateQI([Ci.nsIDOMGlobalPropertyInitializer]),
> > +
> > +  messageManagers: new WeakMap(),
> 
> This will mean every instance of InstallTriggerManager has the same WeakMap,
> what is the lifecycle of nsIDOMGlobalPropertyInitializer?

It's an XPCOM JS component, I never saw a formal definition of the lifecycle of such things, but assumed a singleton was created and never destroyed? Is that wrong?
(In reply to Alon Zakai (:azakai) from comment #41)
> (In reply to Olli Pettay [:smaug] from comment #40)
> > Comment on attachment 552696 [details] [diff] [review]
> > Part 3: Prevent crashes when nsContentUtils::XPConnect() is null
> > 
> > Waiting for answer to my questions.
> 
> Sorry for the delay, I am still working on that. It looks like using
> xpcprivate's nsXPConnect::GetXPConnect() does the right thing, however, for
> some reason I can't get it to build properly on Windows. Since I don't have
> a Windows machine I am doing this asynchronously using tryserver (where
> sometimes my pushes get ignored...).
> 
> (In reply to Dave Townsend (:Mossop) from comment #39)
> > @@ +240,5 @@
> > > +InstallTriggerManager.prototype = {
> > > +  classID: Components.ID("{2d24aee2-c4a6-4c66-b1e9-b600d9013157}"),
> > > +  QueryInterface: XPCOMUtils.generateQI([Ci.nsIDOMGlobalPropertyInitializer]),
> > > +
> > > +  messageManagers: new WeakMap(),
> > 
> > This will mean every instance of InstallTriggerManager has the same WeakMap,
> > what is the lifecycle of nsIDOMGlobalPropertyInitializer?
> 
> It's an XPCOM JS component, I never saw a formal definition of the lifecycle
> of such things, but assumed a singleton was created and never destroyed? Is
> that wrong?

Is it created as a service or a regular component? The former would presumably mean singleton per process, the latter means there could be multiple instances per process but it may still be a singleton depending on what the code instantiating does.
Ah, right, thanks. Ok, I read the platform code that uses this and checked at runtime to make sure, and it does create multiple instances. So a small modification is indeed needed here.

I just pushed to try with this fix and a fix for smaug's issue, hopefully it will be green and then I'll put up updated patches.
Created attachment 555555 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer v3

Looks good on try so far. Here is an updated Part 2, that creates a separate WeakMap per Manager.
Attachment #552695 - Attachment is obsolete: true
Attachment #555555 - Flags: review?(dtownsend)
Attachment #552695 - Flags: review?(dtownsend)
Created attachment 555557 [details] [diff] [review]
diff between extensions-content.js and InstallTrigger.js -U 8 -w

Updated diff -U 8 -w for InstallTrigger.js
Attachment #555193 - Attachment is obsolete: true
Created attachment 555559 [details] [diff] [review]
Part 3: Prevent crashes when nsContentUtils::XPConnect() is null v2

Updated Part 3. Using nsXPConnect::GetXPConnect() works when nsContentUtils::XPConnect() is already null. (Why Windows was so picky about the position of the #include here, I do not know...)
Attachment #552696 - Attachment is obsolete: true
Attachment #555559 - Flags: review?(Olli.Pettay)
Comment on attachment 555555 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer v3

Review of attachment 555555 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/extensions/InstallTrigger.js
@@ +235,5 @@
> + */
> +function InstallTriggerManager() {
> +  this.callbacks = {};
> +  this.messageManagers = new WeakMap();
> +}

I don't think this is right for what seems to be the lifecycle of nsIDOMGlobalPropertyInitializer. It appears that a new instance is created for every DOM window that accesses InstallTrigger. That means currently you have one messageManagers weakmap per DOM window so you're never actually going to get a hit in the map for any message manager. The previous patch which put the weakmap on the prototype was actually better in this regard as it would be shared across all instances however I dislike using the prototype like that.

Instead I think you should force InstallTriggerManager to be a singleton with a custom xpcom nsIFactory. See http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/addonManager.js#239

@@ +262,5 @@
> +          });
> +        }
> +      } catch(e) {
> +        Cu.reportError(e);
> +      }

The jar flushing stuff needs to exist exactly once per content process, this patch makes it happen once per DOM window that requests InstallTrigger. That might be zero or multiple times per content process. It probably needs to remain in a content script.
Attachment #555555 - Flags: review?(dtownsend) → review-
Created attachment 555878 [details] [diff] [review]
Part 2: Use nsIDOMGlobalPropertyInitializer v4

That makes perfect sense. Here is an updated patch.
Attachment #555555 - Attachment is obsolete: true
Attachment #555878 - Flags: review?(dtownsend)
Created attachment 555880 [details] [diff] [review]
diff between extensions-content.js and InstallTrigger.js -U 8 -w

Updated convenience diff between the two main files.
Attachment #555557 - Attachment is obsolete: true

Updated

7 years ago
Blocks: 673432
(Reporter)

Updated

7 years ago
Summary: onunload frame navigation leaves child document alive too long → Port InstallTrigger to nsIDOMGlobalPropertyInitializer (was: onunload frame navigation leaves child document alive too long)

Updated

7 years ago
No longer blocks: 673432
Attachment #555878 - Flags: review?(dtownsend) → review+
This latest version leaks on tryserver. The issue is that the previous version used one WeakMap per window, while the latest uses a single WeakMap for everything. There are some leaking issues with WeakMaps, but they were not an issue as long as the WeakMap depended on a window (when the window went away, so did the WeakMap, and things got freed), but with a single WeakMap we keep a hold on objects forever, leaking them.

In theory even the previous version should have leaked while the window was still alive, but our tools don't appear to detect brief leaks like that. Ironically this bug began as such a "hold on too long, but don't leak forever" leak, and the patches here did fix the symptoms as reported, but apparently we were just minimizing the temporary leak and not eliminating it.

I don't see a reasonable workaround here, so probably this should wait for the relevant WeakMap bugs. At that point the leak should be completely fixed here.
Depends on: 680937, 668855
All the dependent bugs are fixed.  Can we land this now?
I'm not sure if it will work, unfortunately. :-/  My fix for bug 680937 uses wrapper preservation, which I think is necessary, but this means that it only works for nsWrapperCaches.  It looks to me like the weak map is used with nsIContentFrameMessageManager keys, and those only look to implement nsISupports, not nsWrapperCache.

You can probably hack around this by creating a key object that is doubly linked with the frame message manager.  This key will have the same lifetime as the frame message manager and can be retrieved from the manager.

It would work something like this:

var messageManager = getMessageManager(aWindow);
if (!this.messageManagers.has(messageManager.key)) {
  messageManager.addMessageListener(MSG_INSTALL_CALLBACK, this);
  var key = {manager:messageManager};
  messageManager.key = key;
  this.messageManagers.set(key, true);
}

Basically, you are doing manual "wrapper preservation".  Come to think of it, this doesn't require bug 680937, only bug 668855.
Or we could just make the message manager preserve its own wrapper.
Sure, if you want to do the direct thing.

I wonder if setting an expando property on the message manager might be enough to preserve the wrapper...
Created attachment 583626 [details] [diff] [review]
Part -1: Make messageManager preserve its wrapper
Attachment #583626 - Flags: review?(bugs)
Created attachment 583660 [details] [diff] [review]
Part 0: Allow things other than nodes to opt in to WeakMaps
Attachment #583660 - Flags: review?(continuation)
Comment on attachment 583626 [details] [diff] [review]
Part -1: Make messageManager preserve its wrapper

I don't understand why this is needed.
nsInProcessTabChildGlobal is an nsDOMEventTargetWrapperCache, so it has
wrapper cache.
Also TabChildGlobal is an nsDOMEventTargetWrapperCache
Attachment #583626 - Flags: review?(bugs) → review-
Comment on attachment 583660 [details] [diff] [review]
Part 0: Allow things other than nodes to opt in to WeakMaps

As per our IRC discussion, I think it makes sense to use a more local hack to fix this while we figure out what a good general fix would be.
Attachment #583660 - Flags: review?(continuation) → review-
Created attachment 583692 [details] [diff] [review]
Part 0: Use a more localized hack.
Attachment #583626 - Attachment is obsolete: true
Attachment #583660 - Attachment is obsolete: true
Attachment #583692 - Flags: review?(continuation)
Comment on attachment 583692 [details] [diff] [review]
Part 0: Use a more localized hack.

Review of attachment 583692 [details] [diff] [review]:
-----------------------------------------------------------------

A test to ensure that we don't drop these kinds of keys would be good, but if it is too hard to do that's okay, as I imagine it will just make everything fall apart anyways. r=me
Attachment #583692 - Flags: review?(continuation) → review+
I attempted to write a test, but was unsuccessful at getting ahold of a message manager.  The crash that we discussed on IRC was caused by PreciseGCRunnable running after XPConnect begins shutting down, and isn't related to this patch.
Unfortunately with all these patches I see some hangs in weakmap marking on try :-/
Depends on: 707313
The hang is fixed, but there's still a leak on mochitest-browser-chrome to track down.
Assignee: azakai → continuation
A couple of the nsGlobalWindows that are alive at shutdown are being held alive by a ChromeWindow that is itself alive via this path:

via file:///Users/amccreight/mz/cent4/obj-dbg/dist/NightlyDebug.app/Contents/MacOS/components/InstallTrigger.js :
0x12a7e4600 [BackstagePass 146efc2b0]
    --[gSingleton]-> 0x126eb5c40 [Object <no private>]
    --[callbacks]-> 0x126eb5c80 [Object <no private>]
    --[3]-> 0x14c12de40 [Object <no private>]      // this is actually 38 not 3...
    --[callback]-> 0x14cc9c520 [Proxy <no private>]
    --[type]-> 0x153bb16a0 [type_object //example.com/browser/toolkit/mozapps/extensions/test/xpinstall/installtrigger.html?%7B%22Unsigned%20XPI%22%3A%22http%3A%2F%2Fexample.com%2Fbrowser%2Ftoolkit%2Fmozapps%2Fextensions%2Ftest]
    --[type_proto]-> 0x14cc9c340 [Proxy <no private>]
    --[type]-> 0x153bb1250 [type_object ivate>]
    --[type_proto]-> 0x14e7c2c80 [Proxy <no private>]
    --[shape]-> 0x12303fec0 [shape <no private>]
    --[base]-> 0x14b7d9158 [base_shape rivate>]
    --[parent]-> 0x118c39c40 [ChromeWindow 11b6e3400]

gSingleton and the callbacks are the most suspicious part of this, IMO.

gSingleton is the name of a global variable added in part 2 of the patch:
+var gSingleton;

+      // Add callback Id, done here, so only if we actually got here
+      params.callbackId = gSingleton.addCallback(aCallback, params.uris);

It looks like it gets cleared in receiveMessage:
+    if (callbackObj.urls.length == 0)
+      this.callbacks[callbackId] = null;

I guess that never happened?

I'll poke around a little and try to narrow down what test is leaking.
Unsurprisingly, running just the installer tests like this causes leaks:
TEST_PATH=toolkit/mozapps/extensions/test/xpinstall/ make -C obj-dbg/ mochitest-browser-chrome

I wasn't able to figure out how to run individual tests, but running that directory doesn't take too long.

Anyways, with that, every nsGlobalWindow is leaked via one of the callbacks in singleton.  There are four callbacks:

0x118cf5cc0 Object <no private>
> 0x11cbbc580 type
> 0x14a01d948 shape
> 0x1452819c0 13
> 0x145274d80 22
> 0x12ab9dd00 33
> 0x14a022d40 34

And there's at least one nsGlobalWindow leaked via each one.

Each of these callbacks has a single URL in it ( http://example.com/browser/toolkit/mozapps/extensions/test/xpinstall/unsigned.xpi in all cases).  It looks like the callback is cleared only when the URL list is empty, so that would explain why they were never cleared.  I'm not a JS expert, but the code that deletes URLs from the list looks okay to me ( callbackObj.urls.splice(callbackObj.urls.indexOf(url), 1); ).  In fact, that whole function looks just copied over from existing code, so I'm not sure what the problem is.
I don't think I know enough about this code to go any further. If somebody wants to investigate, I can help with the leak hunting angle.
Assignee: continuation → nobody
Depends on: 695480
I believe this leak ended up getting fixed by bug 695480.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Resolution: FIXED → WORKSFORME
"this leak" being the one reported in comment 0, not the one with the patch porting it to nsIDOMGlobalPropertyInitializer, right? Seems misleading to close this as WORKSFORME with its current summary.

Seems like there are still potential benefits to porting this to nsIDOMGlobalPropertyInitializer, so maybe that should be tracked in a separate bug.
Summary: Port InstallTrigger to nsIDOMGlobalPropertyInitializer (was: onunload frame navigation leaves child document alive too long) → onunload frame navigation leaves child document alive too long
(Reporter)

Comment 69

6 years ago
> I believe this leak ended up getting fixed by bug 695480.

I thought bug 695480 only limited the size of leaks rather than eliminating them entirely.
> I thought bug 695480 only limited the size of leaks rather than eliminating
> them entirely.

Nope, it eliminates them.  That's why it's so good :)
(In reply to Nicholas Nethercote [:njn] from comment #70)
> Nope, it eliminates them.  That's why it's so good :)
You can still leak within a chrome compartment, I think.

Anyways, if the patch in this bug still has value outside of fixing a leak, feel free to reopen this bug.
You need to log in before you can comment on or make changes to this bug.