Closed
Bug 720512
Opened 13 years ago
Closed 12 years ago
huge number of orphan DOM nodes when Gmail is open for a long time
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: mccr8, Assigned: mccr8)
References
Details
(Whiteboard: [Snappy:P1])
Attachments
(1 file, 2 obsolete files)
9.16 KB,
text/plain
|
Details |
I've noticed that if I leave Gmail up for a couple of days, then CC times start getting bad, around 650ms consistently. GC times remain reasonable.
Memory usage is not awesome, at around 400mb, but not totally abysmal. Heap-unclassified is a little high at 37%.
Presumably Gmail is leaking something that makes the cycle collector sad. I'm going to analyze the CC logs to try to figure out what is going on, and try to figure out if we can do better.
This is certainly a Snappy issue. May be a MemShrink thing, too, so I'll mark it as such for now.
Comment 1•13 years ago
|
||
Andrew says reloading the tab drops the CC times, which backs up it being gmail holding onto stuff.
Updated•13 years ago
|
Assignee: nobody → continuation
Whiteboard: [MemShrink], [Snappy] → [Snappy]
Assignee | ||
Comment 2•13 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #1)
> Andrew says reloading the tab drops the CC times, which backs up it being
> gmail holding onto stuff.
Sure, but it could be that we're leaking stuff within content somehow.
Assignee | ||
Comment 3•13 years ago
|
||
I had this problem again, post CC improvements. I did some analysis of the logs. Here are the top 8 types of nodes:
60471 nsGenericElement (xhtml)
20314 nsGenericDOMDataNode
14247 JS Object
4865 nsEventListenerManager
4259 nsDOMCSSAttributeDeclaration
2813 XPCWrappedNative
1556 nsGenericElement (XUL)
1183 JS Script
Huge number of DOM nodes.
It looks like the bulk of these (around 71000 of them) are orphan DOM nodes, eg DOMs that aren't currently in a document. There are almost 300 separate orphan DOM trees. I'm going to try to merge each of these together into a single node to see if I can make any more sense out of the graph that way, as it will get rid of 71k nodes from the graph.
Assignee | ||
Comment 4•13 years ago
|
||
I'm going to try running this build for a few days with the new DOM bindings turned off and we'll see if that matters.
Assignee | ||
Comment 5•13 years ago
|
||
Same problem with them turned off. about 71575 nodes in orphaned gmail DOMs.
Updated•13 years ago
|
Whiteboard: [Snappy] → [Snappy:P1]
Assignee | ||
Comment 6•13 years ago
|
||
I had a similar problem just now, after my browser had been running for a number of days with Gmail open. I'd been getting some long GC times, often around a second. CC times maybe were a little bad, but not too bad. When I closed Gmail (and some other tabs so maybe it was something else), I started getting 2.6 second CCs. I had about a half dozen of these, until finally it freed the 400k or so nodes and it went back to normal. I forgot to grab a log though. CC and GC times dropped back to fast again (100ms GC and 52ms CC).
Comment 7•13 years ago
|
||
I have Cww's CC log about this issue, I think.
Something somewhere is keeping an element alive.
Comment 8•13 years ago
|
||
Andrew, if you can reproduce this, could you check whether you've used find.
Although bug 669845 is fixed, I think find might keep some elements alive too long.
Also, could you perhaps check what object is the root which keeps the stuff alive,
and when reloading, what ends up releasing it.
(52ms CC time when not loading pages isn't really normal. Median CC time is somewhere between 10 and 20)
Comment 9•13 years ago
|
||
I have 50-60ms CC times and 200-220ms GC with this from about:tabs
13.0a1 (2012-03-03), running since last Saturday night on a private pull & build
781 tabs across 21 groups in 21 windows
29 tabs have been loaded
about:memory
1,203.41 MB ── resident
1,141.36 MB ── explicit
2,074.08 MB ── vsize
about 200MB system-principal; 50MB about:blank (unloaded tabs); 43MB plus.google.com, 23MB gmail; 283MB window-objects; 231MB (20%) heap-unclassified
Assignee | ||
Comment 10•13 years ago
|
||
Randell, could you file a separate issue for that please. This bug is about a specific issue with long CCs while just having a single tab open. Thanks.
(In reply to Olli Pettay [:smaug] from comment #8)
> Andrew, if you can reproduce this, could you check whether you've used find.
> Although bug 669845 is fixed, I think find might keep some elements alive
> too long.
It is hard to reproduce. You have to keep Gmail open for 4 or 5 days. It is better than before. The CC times are not really that bad, it is mostly the GC that is having problems. But given that this problem feels like it is a slow leak of live data, that's what you'd expect.
Woah, I just had a 2 second GC. Awesome.
GC(T+257365.3) TotalTime: 1966.6ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: SET_NEW_DOCUMENT, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0
mark: 135.8, mark-roots: 8.4, mark-other: 19.0, sweep: 756.3, sweep-obj: 27.9, sweep-string: 5.2, sweep-script: 0.9, sweep-shape: 8.1, discard-code: 14.8, discard-analysis: 669.5, xpconnect: 5.0, deallocate: 0.2
I've also seen a few like this:
GC(T+257507.4) TotalTime: 1890.8ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: CC_WAITING, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0
mark: 140.9, mark-roots: 14.2, mark-other: 19.0, sweep: 332.2, sweep-obj: 50.3, sweep-string: 4.7, sweep-script: 0.9, sweep-shape: 7.1, discard-code: 10.5, discard-analysis: 243.5, xpconnect: 4.8, deallocate: 0.2
Notice that there's about a second of GC time unaccounted for in the breakdown!
> Also, could you perhaps check what object is the root which keeps the stuff
> alive,
> and when reloading, what ends up releasing it.
Sure, I'll try to take a log next time it gets bad. Though maybe I should use bad GC times as my benchmark.
Assignee | ||
Comment 11•13 years ago
|
||
Last time I looked at it, the problem was Gmail had a huge number of disconnected DOM nodes.
Assignee | ||
Comment 12•13 years ago
|
||
I'm also getting the occasional weirdly long CC.
CC(T+258720.2) duration: 531ms, suspected: 5678, visited: 9648 RCed and 5530 GCed, collected: 1988 RCed and 61 GCed (2049 waiting for GC)
ForgetSkippable 30 times before CC, min: 0 ms, max: 19 ms, avg: 1 ms, total: 44 ms, removed: 11794
CC(T+258737.9) duration: 72ms, suspected: 1471, visited: 9428 RCed and 5512 GCed, collected: 2168 RCed and 68 GCed (2236 waiting for GC)
ForgetSkippable 7 times before CC, min: 0 ms, max: 49 ms, avg: 10 ms, total: 75 ms, removed: 2393
These CCs both have about the same size CC graph and collected about the same number of objects, but the first one took much longer. It has more suspected nodes, but that doesn't seem like it should be enough. I guess I should turn on detailed timer information.
Assignee | ||
Comment 13•13 years ago
|
||
Nice, I reloaded Gmail just now, and I'm now getting crazy CC times:
CC(T+259244.4) duration: 3385ms, suspected: 208, visited: 226245 RCed and 134109 GCed, collected: 42 RCed and 0 GCed (203 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, removed: 209
I guess something is holding all of the dead Gmail gunk alive. I have a CC log for it, though logs this huge are hard to analyze.
Doing a find in a different page (on Wikipedia) didn't help anything.
Assignee | ||
Updated•13 years ago
|
Summary: long CC times on gmail after being open for a long time → huge number of orphan DOM nodes when Gmail is open for a long time
Comment 14•13 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=387d15cf5cdd
Make DOMEventTarget helper to not keep window alive.
Comment 15•13 years ago
|
||
Looks like applying the patch to m-c needs some --fuzz
Comment 16•13 years ago
|
||
Attachment #603649 -
Attachment is obsolete: true
Assignee | ||
Comment 17•13 years ago
|
||
I browsed for maybe 5 hours using this patch, then reloaded Gmail and I got this:
CC(T+17874.7) duration: 126ms, suspected: 14978, visited: 23966 RCed and 2803 GCed, collected: 20695 RCed and 284 GCed (20979 waiting for GC)
ForgetSkippable 3 times before CC, min: 6 ms, max: 17 ms, avg: 10 ms, total: 32 ms, removed: 6882
CC(T+17880.9) duration: 19ms, suspected: 458, visited: 3235 RCed and 2524 GCed, collected: 319 RCed and 3 GCed (322 waiting for GC)
ForgetSkippable 6 times before CC, min: 0 ms, max: 12 ms, avg: 2 ms, total: 17 ms, removed: 974
CC(T+17887.1) duration: 11ms, suspected: 104, visited: 2950 RCed and 2521 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC)
ForgetSkippable 6 times before CC, min: 0 ms, max: 11 ms, avg: 2 ms, total: 15 ms, removed: 809
In other words, this appears to have fixed the problem. When I reload Gmail, I get a bunch of nodes in the graph, but they are collected right away. And it doesn't seem quite as bad. I'll leave my Gmail tab up for another day or so and see if that continues to hold.
Assignee | ||
Comment 18•13 years ago
|
||
By "doesn't seem quite as bad", I mean it doesn't seem like there are quite as many orphan nodes.
Comment 19•13 years ago
|
||
Sounds good. The patch is pretty mechanical but need some reviews...
Assignee | ||
Comment 20•13 years ago
|
||
Perhaps I spoke too soon. I'm getting half-second CCs now.
CC(T+46568.4) duration: 519ms, suspected: 90, visited: 22754 RCed and 133975 GCed, collected: 57 RCed and 2 GCed (59 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 24 ms, avg: 12 ms, total: 25 ms, removed: 256
Comment 21•13 years ago
|
||
Comment on attachment 603681 [details] [diff] [review]
v2
>- mScriptContext = aScriptContext;
> if (aOwnerWindow) {
>- mOwner = aOwnerWindow->IsOuterWindow() ?
>- aOwnerWindow->GetCurrentInnerWindow() : aOwnerWindow;
>+ BindToOwner(aOwnerWindow->IsOuterWindow() ?
>+ aOwnerWindow->GetCurrentInnerWindow() : aOwnerWindow);
> } else {
>- mOwner = nsnull;
>+ BindToOwner(aOwnerWindow);
> }
BindToOwner(aOwnerWindow); makes us use the right version of BindToOwner(nsnull).
> nsXMLHttpRequest::GetLoadGroup() const
> {
>- if (mState & XML_HTTP_REQUEST_BACKGROUND) {
>+ if (mState & XML_HTTP_REQUEST_BACKGROUND) {
Apparently I added few extra spaces here. I'll fix.
The idea with the patch is the nsDOMEventTargetHelper objects, like XHR, doesn't own the window or relevant scriptcontext anymore.
Instead the nsDOMEventTargetHelper::mOwner <-> nsGlobalWindow connection is a raw pointer and cleared when either one of the objects is deleted.
mHasHadOwner is there to prevent event handling on objects which don't have mOwner anymore (shouldn't change any behavior, since before the patch
owner shouldn't be the current inner window anymore, if eventtargethelper is the only one which owns it, and in that case CheckInnerWindowCorrectness
would fail.).
Attachment #603681 -
Flags: review?(jst)
Comment 22•13 years ago
|
||
Comment on attachment 603681 [details] [diff] [review]
v2
Ah, better to investigate the CC issues then some more.
Attachment #603681 -
Flags: review?(jst)
Comment 23•13 years ago
|
||
Though, I think we should take the patch anyway. It is just useless to have strong
pointers in nsDOMEventTargetHelper.
I move the patch to another bug.
Updated•13 years ago
|
Attachment #603681 -
Attachment is obsolete: true
Assignee | ||
Comment 24•13 years ago
|
||
Well, maybe it did fix some problem. I'm seeing orphan DOMs from Google Reader, and some pages I had closed many hours before, but Gmail doesn't seem to show up in any important number (this is with all tabs closed).
Assignee | ||
Comment 25•13 years ago
|
||
One of the pages that was leaking appears to be held alive by setDesktopBackground.xul. I'll file a new bug blocking bug 728407.
Assignee | ||
Comment 26•13 years ago
|
||
Okay, so, there are still a ton of Gmail orphan nodes in the WantAllTraces dump I made, so I guess this didn't actually fix it? Well, maybe it didn't fix the leak, but it may have fixed the problem where it takes a long time for things to be freed after you close the tab.
One of the orphan DOMs is being kept alive by two elements in the DOM that have a missing edge:
0x15fec8580 [nsGenericElement (xhtml) div id=':1cu' class='anM' https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4]
0x12c3add00 [nsGenericElement (xhtml) span class='hb' https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4]
Comment 27•13 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #26)
> Okay, so, there are still a ton of Gmail orphan nodes in the WantAllTraces
> dump I made, so I guess this didn't actually fix it? Well, maybe it didn't
> fix the leak, but it may have fixed the problem where it takes a long time
> for things to be freed after you close the tab.
Well, I was hoping that it could fix the latter problem, not the first one.
So, bug 734057 could be for the latter problem, and this one could be used for the original problem.
>
> One of the orphan DOMs is being kept alive by two elements in the DOM that
> have a missing edge:
>
> 0x15fec8580 [nsGenericElement (xhtml) div id=':1cu' class='anM'
> https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4]
> 0x12c3add00 [nsGenericElement (xhtml) span class='hb'
> https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4]
This looks like the Cww's log I have, and bz had similar problem with nytimes.
Could you try to debug what releases those roots when you close the tab.
Assignee | ||
Comment 28•13 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #27)
> Could you try to debug what releases those roots when you close the tab.
Sure. This is a problem that comes up over and over again so I'm tempted to come up with a more general solution.
Assignee | ||
Comment 29•13 years ago
|
||
Here's another weird thing with the patched build. I was logged out by gmail, so it booted me back to the login screen. I logged in again, and I started getting huge CCs:
CC(T+112902.0) duration: 1822ms, suspected: 53, visited: 136442 RCed and 207090 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 30 ms, avg: 15 ms, total: 31 ms, removed: 92
Those objects won't go away, even when I closed the window. They've hung around like this for a few minutes. When I did a dump, one of the orphan DOM nodes was being held alive by a path that starts out like this:
0x14ef23c00 [nsDOMEventTargetHelper ]
--[mOnReadystatechangeListener]-> 0x1565f8fa0 [nsDOMEventListenerWrapper]
--[mListener]-> 0x15d6d3010 [nsXPCWrappedJS (nsIDOMEventListener)] --[mJSObj]-> 0x14698e8c0 [JS Object (Function - $Xb)]
--[]-> 0x14dbf52e0 [JS Object (Function - $Xb)] --[]-> 0x155eea340 [JS Object (Object)]
The full path is kind of interesting, I'll attach it. It may shed light on why we're leaking everything in Gmail.
Assignee | ||
Comment 30•13 years ago
|
||
Comment 31•13 years ago
|
||
Jason, is it possible that necko keeps stuff alive forever, or very long time?
nsDOMEventTargetHelper is XMLHttpRequest.
The patch for bug 734057 could be made more effective if the hashtable was
enumerated not only in dtor but also in nsGlobalWindow::CleanUp.
Could you Andrew try that?
Ah, and could you make nsDOMEventTargetHelper::Disconnect() virtual.
Then in XHR add Disconnect(). It should call nsDOMEventTargetHelper::Disconnect()
and make mListener of mOnReadystatechangeListener null.
Comment 32•13 years ago
|
||
If you try that, make sure to clear also mOnFooListeners in nsXHREventTarget
Assignee | ||
Comment 33•13 years ago
|
||
Sure, I can look into that after I finish up this other security bug I've been avoiding for 3 months.
Comment 34•13 years ago
|
||
> Jason, is it possible that necko keeps stuff alive forever
Generally it shouldn't, of course, but I'm not an expert in XHR, so jonas might know better than me (looks like you may have figured this out on IRC with khuey already?)
I have no ideas here.
Are we sure the problem isn't simply Google holding JS-references to these XHR objects? I.e. do we know if the XHR still has a JS-wrapper, and if it does, what's keeping it alive.
We've certainly seen bugs in gmail before resulting in vast number of objects being held alive.
(In reply to Jonas Sicking (:sicking) from comment #35)
> I have no ideas here.
>
> Are we sure the problem isn't simply Google holding JS-references to these
> XHR objects? I.e. do we know if the XHR still has a JS-wrapper, and if it
> does, what's keeping it alive.
>
> We've certainly seen bugs in gmail before resulting in vast number of
> objects being held alive.
There are two unknown native edges. Look at the trace mccr8 posted.
Comment 37•13 years ago
|
||
(In reply to Jonas Sicking (:sicking) from comment #35)
> I have no ideas here.
>
> Are we sure the problem isn't simply Google holding JS-references to these
> XHR objects?
In that case XHR wouldn't be in the CC log at all.
The XPCWrappedNative is keeping the XHR alive. Doesn't that mean that the JS-wrapper is alive?
Comment 39•13 years ago
|
||
Sure, but XPCWrappedNative wouldn't be black here.
Anyhow, there are unknown edges. Those are the interesting ones.
bug 734057 should help here.
Assignee | ||
Comment 40•12 years ago
|
||
I haven't noticed this in a while.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•