Closed Bug 526897 Opened 15 years ago Closed 14 years ago

Probable DOMWindow leak using GMail

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set
normal

RESOLVED DUPLICATE of bug 497808
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: roc, Assigned: peterv)

Details

(Keywords: memory-leak)

Attachments

(6 files)

Attached file Log
Browsing for a while, I noticed that Firefox was getting bloated. So I closed all my windows. I noticed that closing the window containing Google Calendar and GMail released a *ton* of DOMWindows, mostly associated with GMail internal URLs. Possibly this is a GMail bug, I'm not sure. Whoever's bug it is, it's bad.
Flags: blocking1.9.2?
It is remotely possible that the Coscripter extension caused this, so I'll try again without that extension to see if I can reproduce the problem.
Flags: blocking1.9.2?
Attached file log #2
Here, the only extension enabled was DOM Inspector. This is a recent trunk build. The browser was up for a few days I guess ... this still looks very bad.
Just to be clear, that log is part of shutdown.

This bug really worries me a lot.
Me too. I've been trying to reproduce with cycle collector debugging and logging turned on, but haven't yet managed to.
What's the slowdown for turning those on? How many logs would I generate over a few days? If I can dogfood with those on, I will, just tell me how.
Peter asked how I get URLs containing stuff like "cat=Bugzilla&search=cat". I think just clicking on labels to view the messages that match the label does it. For example, I can see them with the following steps:
-- Open a new window and navigate to Gmail (https://mail.google.com/mail/?nocheckbrowser#inbox)
-- Click on my Bugzilla label
-- Close the window
This is still happening :-(. Peter, did you manage to reproduce it?
Still happening. This makes me very sad.
blocking2.0: --- → ?
I just got into a state where closing the GMail tab freed up 200 DOMWindows.
Well, that could be a bug in gmail. Perhaps something in it is keeping references to already-removed iframe windows or something. Or perhaps it keeps iframes alive too long.

Or maybe this is indeed a gecko problem. I'll try to reproduce this.
Some preliminary results attached, I'll try to run it over longer periods later, but it started being painful. :)
So Alexander's log is interesting.  It's showing a lot of windows go away but only a few docshells.  That implies that these were no longer "live" frames but that something was keeping those windows alive...  But being no longer "live" they shouldn't have been entraining their DOMs either, I would think.

roc, can you try applying the patch from bug 523885 and seeing whether it affects what you see at all?
I'll re-test, but with the patch from bug 523885 there seems to be a crash when using gmail while CC runs.
(In reply to comment #10)
> Well, that could be a bug in gmail.

It certainly could be. On the other hand, if it works better in other browsers, the end user isn't going to care.
I think bug 523885 might help here. The debug_cc stuff needs to be fixed, but at
least DOMWindow count doesn't increase. 
And debug_cc didn't complain about non-collectable Windows either.

But I'm not 100% sure, since I can't always reproduce the problem.
On my latest trunk build, enabling DEBUG_CC makes the browser unusable for real dogfooding :-(.
I suspect this was not fixed by bug 523885. I haven't been able to keep a build up for several days yet (running out of battery while on the road, etc), but I just accidentally closed my GMail tab and 31 GMail DOMWindows were released.

Alex, can you confirm this?
Well, bug 523885 got backed out. (Perhaps you had a build with the patch)
I'll give it a try once the patch in bug 523885 has landed.
Attached file log
I used a build that definitely contains the fix for bug 523885 for several days. Closing the GMail window freed up 300 DOMWindows (I guess only about 150 inners if every inner has an outer), so this bug is definitely still happening. :-(

We need some kind of plan here. Can someone write a patch that will let me dump the heap graph after a few days, without having to run with DEBUG_CC all that time?
I note that somehow accessibility seems to be turned on in my builds, perhaps that contributes to the problem.

I also note that we seem to be leaking some DOM Windows in general until shutdown, but that is probably a different bug since these GMail DOM Windows are cleaned up just by closing the window.
(In reply to comment #22)
> We need some kind of plan here. Can someone write a patch that will let me dump
> the heap graph after a few days, without having to run with DEBUG_CC all that
> time?

It's certainly possible to do; the node/edge description code for the cycle collector doesn't depend on having DEBUG_CC defined; it just depends on the flags set on the nsCycleCollectionTraversalCallback.  It's probably a bit of work, though.
The other thing that might get you a bit of information is using tools/footprint/leak-gauge.pl on incomplete logs; it would give you the URLs associated with the windows that are still alive.
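For illustration, here is a rough sketch of the bookkeeping this implies: pair up window creation and destruction lines by address and report what is left. This is not leak-gauge.pl itself. The `--DOMWINDOW` line shape follows the format seen in these shutdown logs; the `++DOMWINDOW` shape and the function name `live_windows` are assumptions.

```python
import re

# Matches lines such as:
#   --DOMWINDOW == 55 (0x1fcbdf00) [serial = 7] [outer = 0x0] [url = https://...]
# The "++" form is assumed to look the same, possibly without a [url = ...] field.
LINE_RE = re.compile(
    r"(?P<op>\+\+|--)DOMWINDOW == \d+ \((?P<addr>0x[0-9a-fA-F]+)\)"
    r"(?:.*?\[url = (?P<url>[^\]]*)\])?"
)


def live_windows(log_lines):
    """Return {address: url_or_None} for windows created but not yet destroyed."""
    alive = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # unrelated log output
        if m.group("op") == "++":
            alive[m.group("addr")] = m.group("url")
        else:
            alive.pop(m.group("addr"), None)
    return alive
```

On an incomplete log, whatever remains in the returned dict is a window that was still alive when the log was cut off. If the creation lines carry no URL, only the addresses are available.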
(In reply to comment #23)
> I note that somehow accessibility seems to be turned on in my builds, perhaps
> that contributes to the problem.

Just built with accessibility, or with caret browsing enabled or something like that? It's disabled by default on OS X, so unless Alexander turned it on in his build it shouldn't be enabled for him.

I'm still trying to reproduce this. I'm seeing some windows stay around, but so far, if I wait long enough, they eventually go away. I need to switch between labels a lot to get that. I keep hovering around 27 windows with just Gmail open.
It's built with accessibility, caret browsing is not enabled. I don't know why the accessibility service is actually running, but it seems to be producing warnings, so it must be.

How about writing the patch that would let us get a CC-heap-dump at any time without DEBUG_CC? I bet that would be really useful for lots of situations.
Attached patch CC logging
Don't really have the time for a better solution, but here's a quick and dirty patch that adds a boolean to garbageCollect (on nsIDOMWindowUtils). You need privileges to call that, but if passed true it should dump .dot files with the whole graph from the two cycle collections. If we have the addresses of some of these leaking windows we might be able to get some data out of that. Ideally you want the reverse graph, but that's a bit expensive to generate.
I haven't yet tested the patch, let me know how it goes.
I think this patch makes us leak a lot. I applied the patch and crashed with OOM after less than a day of browser uptime (not much actual user activity during that time).
Maybe it's not a leak, but a problem with peak memory usage for very large graphs.

It would probably help if edge names were nsIAtoms instead of strings...

I've modified the patch to only actually store edge names if we are going to output a graph. That should help me not crash, at least until I'm ready to draw a graph...
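A toy sketch of those two ideas, assuming nothing about the real patch: keep edge names only when a graph will be written, and share one string object per distinct name (roughly what atoms would buy). `GraphBuilder`, `want_edge_names` and `note_edge` are hypothetical names, not cycle collector APIs.

```python
import sys


class GraphBuilder:
    """Toy graph recorder; names here are hypothetical, not the cycle collector's."""

    def __init__(self, want_edge_names):
        self.want_edge_names = want_edge_names
        self.edges = []  # list of (src, dst, name_or_None)

    def note_edge(self, src, dst, name):
        if self.want_edge_names:
            # Interning makes repeated names such as "__parent__" share one string object.
            stored = sys.intern(name)
        else:
            # No graph output requested, so do not keep the name at all.
            stored = None
        self.edges.append((src, dst, stored))
```

With `want_edge_names=False` the per-edge cost is just the two endpoints, which is the part that should matter for peak memory on very large graphs.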
Yeah, sorry. Thought about that after I attached the patch but didn't realize it was important.
This is a log showing us freeing about 75 GMail DOM windows when I closed my Gmail window.

Before I closed the window, I was able to capture a cycle collector graph here:
http://people.mozilla.com/~roc/leaked-gmail-windows-cycle-graph.dot.bz2

Let me know if there's anything else I can do.
I've been trying to analyze the log. It's hard because it's so big: there are an enormous number of interconnected objects (also, we don't log whether a JS object is rooted).

There are at least some timeouts that keep windows alive, and it looks like they're still running (they have a reference count of two, we know about one edge to them):

0x2ce81070 [nsTimeout]
    --[mScriptHandler]-> 0x1adc4330 [nsJSScriptTimeoutHandler]
    --[]-> 0x1a391a40 [JS Object (Function) (global=1bcc4060)]
    --[__parent__]-> 0x1bcc4060 [JS Object (Window) (global=1bcc4060)]
    --[LV]-> 0x1a59be40 [JS Object (Object) (global=1bcc4060)]
    --[Sh]-> 0x1a5967c0 [JS Object (Object) (global=1bcc4060)]
    --[h]-> 0x1a596840 [JS Object (Object) (global=1bcc4060)]
    --[w]-> 0x1a596860 [JS Object (Object) (global=1bcc4060)]
    --[inbox/12808e9df7f26d2d]-> 0x288d88a0 [JS Object (Object) (global=1bcc4060)]
    --[sS]-> 0x1ab2b4a0 [JS Object (Object) (global=1bcc4060)]
    --[w]-> 0x1aacba40 [JS Object (Object) (global=1bcc4060)]
    --[ia]-> 0x1ab2b760 [JS Object (Array) (global=1bcc4060)]
    --[array_dslots[43]]-> 0x1a7e44a0 [JS Object (Object) (global=1bcc4060)]
    --[La]-> 0x1a803240 [JS Object (Array) (global=1a8bac80)]
    --[__parent__]-> 0x1a8bac80 [JS Object (Window) (global=1a8bac80)]
    --[__proto__]-> 0x1a8bb1e0 [JS Object (XPC_WN_ModsAllowed_NoCall_Proto_JSClass - Window) (global=1a8b4840)]
    --[__parent__]-> 0x1a8b4840 [JS Object (Window) (global=1a8b4840)]
    --[xpc_GetJSPrivate(obj)]-> 0x2de35d60 [XPCWrappedNative (Window)]
    --[]-> 0x2de41bf0 [nsGlobalWindow]

The timeout itself looks to be associated with a window that's connected to a live XULWindow (I found edges going from a XULWindow through a bunch of iframes to the window that holds the timeout). So far it looks like this is due to the way the app is written, but it's hard to know for sure due to the multiple levels of nested iframes.
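For anyone who wants to process paths like the one above mechanically, a minimal sketch follows. It assumes the text layout used in this comment (a head line `0xADDR [label]` followed by `--[edge]-> 0xADDR [label]` steps); the helper names `parse_chain` and `windows_on_path` are made up for illustration.

```python
import re

HEAD_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+) \[(.*)\]\s*$")
STEP_RE = re.compile(r"^\s*--\[(.*)\]-> (0x[0-9a-fA-F]+) \[(.*)\]\s*$")


def parse_chain(lines):
    """Parse one path into (nodes, edges).

    nodes: list of (address, label) in path order.
    edges: list of (src_address, edge_name, dst_address).
    """
    nodes, edges = [], []
    for line in lines:
        step = STEP_RE.match(line)
        if step and nodes:
            name, addr, label = step.groups()
            # Each step hangs off the previous node on the path.
            edges.append((nodes[-1][0], name, addr))
            nodes.append((addr, label))
            continue
        head = HEAD_RE.match(line)
        if head:
            nodes.append(head.groups())
    return nodes, edges


def windows_on_path(nodes):
    """Addresses of nodes whose label mentions a window object."""
    return [addr for addr, label in nodes if "Window" in label]
```

Running `windows_on_path` over a parsed path gives the window-related objects a single timeout ends up holding, which makes it easier to see where one global hands off to another.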
In particular for the timeout I mentioned before:

0x1fcbdf00 [nsGlobalWindow]
    --[mDocument]-> 0x7203400 [nsDocument]
    --[mSubDocuments entry->mSubDocument]-> 0x7061000 [nsDocument]
    --[mScriptGlobalObject]-> 0x22a4e770 [nsGlobalWindow]
    --[]-> 0x2ce81070 [nsTimeout]

From the other log:
--DOMWINDOW == 55 (0x1fcbdf00) [serial = 7] [outer = 0x0] [url = https://mail.google.com/mail/?ui=2&shva=1#inbox]

That looks like the main Gmail window that you just closed?
Yes, probably.

Maybe timeouts don't keep windows alive in other browsers?
Actually, assuming we've navigated away from these windows, and they weren't bfcached (since they're in IFRAMEs), the timeout can never fire again (right?) so we could drop the reference?
Have we navigated away from the window that the timeout is associated with? The wrapper for |0x22a4e770 [nsGlobalWindow]| is still in 0x2160c1f0's mInnerWindowHolders. I'm trying to figure out why 0x2160c1f0 is still alive, but so far it looks like it's just a live window.
I don't really know, but surely only a small subset of those 75 DOM windows correspond to the current documents of visible IFRAMEs?
timeouts are kept alive by the inner window, right?
Would it be possible that when document.open creates a new inner window,
we somehow don't clean up the old inner window properly?
(just a guess. I don't even know if gmail uses document.open)
Figuring out why that window is alive is not easy, but it is suspicious that
the only edges to it seem to come from the wrapper. Also:

    --[gBrowser]-> 0x1a1628c0 [JS Object (XULElement) (global=1169940)]
    --[mCurrentBrowser]-> 0x1a160dc0 [JS Object (XULElement) (global=1169940)]
    --[_contentWindow]-> 0x1a1733a0 [XPCNativeWrapper (Window) (global=1a146d00)]
    --[0]-> 0x1bcc43c0 [XPCNativeWrapper (Window) (global=1bcd3f80)]
    --[__parent__]-> 0x1bcd3f80 [JS Object (Window) (global=1bcd3f80)]
    --[xpc_GetJSPrivate(obj)]-> 0x2160dd00 [XPCWrappedNative (Window)]
    --[]-> 0x2160c1f0 [nsGlobalWindow]

I wonder if this is caused by the XPCNativeWrapper for a window not clearing
numerical properties when the corresponding properties on the XPCWrappedNative
have gone away, until someone accesses them.

(In reply to comment #38)
> I don't really know, but surely only a small subset of those 75 DOM windows
> correspond to the current documents of visible IFRAMEs?

Sure, but this looks like timeouts on one window holding all those other windows alive. What matters is if that one window is alive.
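A simplified way to picture that argument is plain graph reachability: as long as one live window reaches the timeout, everything the timeout's closure reaches stays alive too. The sketch below uses a made-up three-level graph; the node names are illustrative and do not correspond to real objects in the log.

```python
from collections import deque


def reachable(graph, roots):
    """Return the set of nodes reachable from roots in an adjacency-dict graph."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical, heavily simplified graph: node names are illustrative only.
graph = {
    "live_window": ["timeout"],
    "timeout": ["closure"],
    "closure": ["old_window_1", "old_window_2"],
}

# Everything kept alive while the live window still owns the timeout.
held = reachable(graph, ["live_window"])

# The same graph with the window's edge to the timeout dropped.
without_timeout = reachable({**graph, "live_window": []}, ["live_window"])
```

Here `held` contains both old windows, while `without_timeout` contains only the live window, which is why the liveness of that one window decides the fate of all the others.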
Blocking, we need to figure this out for 1.9.3 it seems.
Assignee: nobody → peterv
blocking2.0: ? → beta1+
blocking2.0: beta1+ → beta2+
--> BetaN, I don't think this blocks a specific beta, might actually only be final+ depending on how invasive we suspect the fix might be.
blocking2.0: beta2+ → betaN+
This may be related, pasting my email from one of the lists for reference:

Been browsing for a single day (restarted this morning to apply the latest nightly update). No Flash active, since I'm running the Mac OS X 64-bit builds.

firefox-bin is currently at 701 MB with 3 (yes, three) tabs open — Gmail, nothing else — and I've let it sit for half an hour in case it was slow to free up memory or something.

My (hopefully non-controversial and well-tested) add-ons:

    * AdBlock Plus 1.2.1
    * Add-on Compatibility Reporter 0.5
    * BarTab 1.5.1
    * Bugzilla Tweaks 1.2
    * Firebug 1.6X.0a18
    * Firefox Sync 1.4.1
    * Google Gears 0.5.36.0 (using it with Firefox 3.6, but doesn't do anything in 4.0)
    * LeechBlock 0.5
    * Site Preferences 0.1.1
    * Test Pilot 1.0a3

I then closed the remaining three tabs, so no tabs/windows were open, just the process — and memory is now steady at 659 MB (and 20 threads) after waiting for a bit. Without a single tab open.
I'm definitely seeing some leaking in Fx4b1 if I let it run for a few days with a bunch of tabs open. I too have a number of extensions installed, so on next restart, I'll try running without them to see if I have the same problem. I don't really use Gmail however.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Component: DOM → DOM: Core & HTML