Closed Bug 720512 Opened 13 years ago Closed 12 years ago

huge number of orphan DOM nodes when Gmail is open for a long time

Categories

(Core :: XPCOM, defect)

Product:

Component:

Platform:

x86_64

macOS

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

(Whiteboard: [Snappy:P1])

Attachments

(1 file, 2 obsolete files)

guess fix 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] 82.03 KB, patch		Details \| Diff \| Splinter Review
v2 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] 82.05 KB, patch		Details \| Diff \| Splinter Review
path2orphan 13 years ago Andrew McCreight [:mccr8] 9.16 KB, text/plain		Details

Andrew McCreight [:mccr8]

Assignee

Description

•

13 years ago

I've noticed that if I leave Gmail up for a couple of days, then CC times start getting bad, around 650ms consistently. GC times remain reasonable. Memory usage is not awesome, at around 400mb, but not totally abysmal. Heap-unclassified is a little high at 37%. Presumably Gmail is leaking something that makes the cycle collector sad. I'm going to analyze the CC logs to try to figure out what is going on, and try to figure out if we can do better. This is certainly a Snappy issue. May be a MemShrink thing, too, so I'll mark it as such for now.

Randell Jesup [:jesup] (needinfo me)

Comment 1

•

13 years ago

Andrew says reloading the tab drops the CC times, which backs up it being gmail holding onto stuff.

Nicholas Nethercote [inactive]

Updated

•

13 years ago

Assignee: nobody → continuation

Whiteboard: [MemShrink], [Snappy] → [Snappy]

Andrew McCreight [:mccr8]

Assignee

Comment 2

•

13 years ago

(In reply to Randell Jesup [:jesup] from comment #1) > Andrew says reloading the tab drops the CC times, which backs up it being > gmail holding onto stuff. Sure, but it could be that we're leaking stuff within content somehow.

Andrew McCreight [:mccr8]

Assignee

Comment 3

•

13 years ago

I had this problem again, post CC improvements. I did some analysis of the logs. Here are the top 8 types of nodes: 60471 nsGenericElement (xhtml) 20314 nsGenericDOMDataNode 14247 JS Object 4865 nsEventListenerManager 4259 nsDOMCSSAttributeDeclaration 2813 XPCWrappedNative 1556 nsGenericElement (XUL) 1183 JS Script Huge number of DOM nodes. It looks like the bulk of these (around 71000 of them) are orphan DOM nodes, eg DOMs that aren't currently in a document. There are almost 300 separate orphan DOM trees. I'm going to try to merge each of these together into a single node to see if I can make any more sense out of the graph that way, as it will get rid of 71k nodes from the graph.

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

13 years ago

I'm going to try running this build for a few days with the new DOM bindings turned off and we'll see if that matters.

Andrew McCreight [:mccr8]

Assignee

Comment 5

•

13 years ago

Same problem with them turned off. about 71575 nodes in orphaned gmail DOMs.

(dormant account)

Updated

•

13 years ago

Whiteboard: [Snappy] → [Snappy:P1]

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

13 years ago

I had a similar problem just now, after my browser had been running for a number of days with Gmail open. I'd been getting some long GC times, often around a second. CC times maybe were a little bad, but not too bad. When I closed Gmail (and some other tabs so maybe it was something else), I started getting 2.6 second CCs. I had about a half dozen of these, until finally it freed the 400k or so nodes and it went back to normal. I forgot to grab a log though. CC and GC times dropped back to fast again (100ms GC and 52ms CC).

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

13 years ago

I have Cww's CC log about this issue, I think. Something somewhere is keeping an element alive.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

13 years ago

Andrew, if you can reproduce this, could you check whether you've used find. Although bug 669845 is fixed, I think find might keep some elements alive too long. Also, could you perhaps check what object is the root which keeps the stuff alive, and when reloading, what ends up releasing it. (52ms CC time when not loading pages isn't really normal. Median CC time is somewhere between 10 and 20)

Randell Jesup [:jesup] (needinfo me)

Comment 9

•

13 years ago

I have 50-60ms CC times and 200-220ms GC with this from about:tabs 13.0a1 (2012-03-03), running since last Saturday night on a private pull & build 781 tabs across 21 groups in 21 windows 29 tabs have been loaded about:memory 1,203.41 MB ── resident 1,141.36 MB ── explicit 2,074.08 MB ── vsize about 200MB system-principal; 50MB about:blank (unloaded tabs); 43MB plus.google.com, 23MB gmail; 283MB window-objects; 231MB (20%) heap-unclassified

Andrew McCreight [:mccr8]

Assignee

Comment 10

•

13 years ago

Randell, could you file a separate issue for that please. This bug is about a specific issue with long CCs while just having a single tab open. Thanks. (In reply to Olli Pettay [:smaug] from comment #8) > Andrew, if you can reproduce this, could you check whether you've used find. > Although bug 669845 is fixed, I think find might keep some elements alive > too long. It is hard to reproduce. You have to keep Gmail open for 4 or 5 days. It is better than before. The CC times are not really that bad, it is mostly the GC that is having problems. But given that this problem feels like it is a slow leak of live data, that's what you'd expect. Woah, I just had a 2 second GC. Awesome. GC(T+257365.3) TotalTime: 1966.6ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: SET_NEW_DOCUMENT, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0 mark: 135.8, mark-roots: 8.4, mark-other: 19.0, sweep: 756.3, sweep-obj: 27.9, sweep-string: 5.2, sweep-script: 0.9, sweep-shape: 8.1, discard-code: 14.8, discard-analysis: 669.5, xpconnect: 5.0, deallocate: 0.2 I've also seen a few like this: GC(T+257507.4) TotalTime: 1890.8ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: CC_WAITING, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0 mark: 140.9, mark-roots: 14.2, mark-other: 19.0, sweep: 332.2, sweep-obj: 50.3, sweep-string: 4.7, sweep-script: 0.9, sweep-shape: 7.1, discard-code: 10.5, discard-analysis: 243.5, xpconnect: 4.8, deallocate: 0.2 Notice that there's about a second of GC time unaccounted for in the breakdown! > Also, could you perhaps check what object is the root which keeps the stuff > alive, > and when reloading, what ends up releasing it. Sure, I'll try to take a log next time it gets bad. Though maybe I should use bad GC times as my benchmark.

Andrew McCreight [:mccr8]

Assignee

Comment 11

•

13 years ago

Last time I looked at it, the problem was Gmail had a huge number of disconnected DOM nodes.

Andrew McCreight [:mccr8]

Assignee

Comment 12

•

13 years ago

I'm also getting the occasional weirdly long CC. CC(T+258720.2) duration: 531ms, suspected: 5678, visited: 9648 RCed and 5530 GCed, collected: 1988 RCed and 61 GCed (2049 waiting for GC) ForgetSkippable 30 times before CC, min: 0 ms, max: 19 ms, avg: 1 ms, total: 44 ms, removed: 11794 CC(T+258737.9) duration: 72ms, suspected: 1471, visited: 9428 RCed and 5512 GCed, collected: 2168 RCed and 68 GCed (2236 waiting for GC) ForgetSkippable 7 times before CC, min: 0 ms, max: 49 ms, avg: 10 ms, total: 75 ms, removed: 2393 These CCs both have about the same size CC graph and collected about the same number of objects, but the first one took much longer. It has more suspected nodes, but that doesn't seem like it should be enough. I guess I should turn on detailed timer information.

Andrew McCreight [:mccr8]

Assignee

Comment 13

•

13 years ago

Nice, I reloaded Gmail just now, and I'm now getting crazy CC times: CC(T+259244.4) duration: 3385ms, suspected: 208, visited: 226245 RCed and 134109 GCed, collected: 42 RCed and 0 GCed (203 waiting for GC) ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, removed: 209 I guess something is holding all of the dead Gmail gunk alive. I have a CC log for it, though logs this huge are hard to analyze. Doing a find in a different page (on Wikipedia) didn't help anything.

Andrew McCreight [:mccr8]

Assignee

Updated

•

13 years ago

Summary: long CC times on gmail after being open for a long time → huge number of orphan DOM nodes when Gmail is open for a long time

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 14

•

13 years ago

Attached patch guess fix (obsolete) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=387d15cf5cdd Make DOMEventTarget helper to not keep window alive.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

13 years ago

Looks like applying the patch to m-c needs some --fuzz

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

13 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=79633aa76bdf

Attachment #603649 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Comment 17

•

13 years ago

I browsed for maybe 5 hours using this patch, then reloaded Gmail and I got this: CC(T+17874.7) duration: 126ms, suspected: 14978, visited: 23966 RCed and 2803 GCed, collected: 20695 RCed and 284 GCed (20979 waiting for GC) ForgetSkippable 3 times before CC, min: 6 ms, max: 17 ms, avg: 10 ms, total: 32 ms, removed: 6882 CC(T+17880.9) duration: 19ms, suspected: 458, visited: 3235 RCed and 2524 GCed, collected: 319 RCed and 3 GCed (322 waiting for GC) ForgetSkippable 6 times before CC, min: 0 ms, max: 12 ms, avg: 2 ms, total: 17 ms, removed: 974 CC(T+17887.1) duration: 11ms, suspected: 104, visited: 2950 RCed and 2521 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC) ForgetSkippable 6 times before CC, min: 0 ms, max: 11 ms, avg: 2 ms, total: 15 ms, removed: 809 In other words, this appears to have fixed the problem. When I reload Gmail, I get a bunch of nodes in the graph, but they are collected right away. And it doesn't seem quite as bad. I'll leave my Gmail tab up for another day or so and see if that continues to hold.

Andrew McCreight [:mccr8]

Assignee

Comment 18

•

13 years ago

By "doesn't seem quite as bad", I mean it doesn't seem like there are quite as many orphan nodes.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 19

•

13 years ago

Sounds good. The patch is pretty mechanical but need some reviews...

Andrew McCreight [:mccr8]

Assignee

Comment 20

•

13 years ago

Perhaps I spoke too soon. I'm getting half-second CCs now. CC(T+46568.4) duration: 519ms, suspected: 90, visited: 22754 RCed and 133975 GCed, collected: 57 RCed and 2 GCed (59 waiting for GC) ForgetSkippable 2 times before CC, min: 0 ms, max: 24 ms, avg: 12 ms, total: 25 ms, removed: 256

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 21

•

13 years ago

Comment on attachment 603681 [details] [diff] [review] v2 >- mScriptContext = aScriptContext; > if (aOwnerWindow) { >- mOwner = aOwnerWindow->IsOuterWindow() ? >- aOwnerWindow->GetCurrentInnerWindow() : aOwnerWindow; >+ BindToOwner(aOwnerWindow->IsOuterWindow() ? >+ aOwnerWindow->GetCurrentInnerWindow() : aOwnerWindow); > } else { >- mOwner = nsnull; >+ BindToOwner(aOwnerWindow); > } BindToOwner(aOwnerWindow); makes us use the right version of BindToOwner(nsnull). > nsXMLHttpRequest::GetLoadGroup() const > { >- if (mState & XML_HTTP_REQUEST_BACKGROUND) { >+ if (mState & XML_HTTP_REQUEST_BACKGROUND) { Apparently I added few extra spaces here. I'll fix. The idea with the patch is the nsDOMEventTargetHelper objects, like XHR, doesn't own the window or relevant scriptcontext anymore. Instead the nsDOMEventTargetHelper::mOwner <-> nsGlobalWindow connection is a raw pointer and cleared when either one of the objects is deleted. mHasHadOwner is there to prevent event handling on objects which don't have mOwner anymore (shouldn't change any behavior, since before the patch owner shouldn't be the current inner window anymore, if eventtargethelper is the only one which owns it, and in that case CheckInnerWindowCorrectness would fail.).

Attachment #603681 - Flags: review?(jst)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

•

13 years ago

Comment on attachment 603681 [details] [diff] [review] v2 Ah, better to investigate the CC issues then some more.

Attachment #603681 - Flags: review?(jst)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 23

•

13 years ago

Though, I think we should take the patch anyway. It is just useless to have strong pointers in nsDOMEventTargetHelper. I move the patch to another bug.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

13 years ago

Depends on: 734057

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

13 years ago

Attachment #603681 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Comment 24

•

13 years ago

Well, maybe it did fix some problem. I'm seeing orphan DOMs from Google Reader, and some pages I had closed many hours before, but Gmail doesn't seem to show up in any important number (this is with all tabs closed).

Andrew McCreight [:mccr8]

Assignee

Comment 25

•

13 years ago

One of the pages that was leaking appears to be held alive by setDesktopBackground.xul. I'll file a new bug blocking bug 728407.

Andrew McCreight [:mccr8]

Assignee

Comment 26

•

13 years ago

Okay, so, there are still a ton of Gmail orphan nodes in the WantAllTraces dump I made, so I guess this didn't actually fix it? Well, maybe it didn't fix the leak, but it may have fixed the problem where it takes a long time for things to be freed after you close the tab. One of the orphan DOMs is being kept alive by two elements in the DOM that have a missing edge: 0x15fec8580 [nsGenericElement (xhtml) div id=':1cu' class='anM' https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4] 0x12c3add00 [nsGenericElement (xhtml) span class='hb' https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4]

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 27

•

13 years ago

(In reply to Andrew McCreight [:mccr8] from comment #26) > Okay, so, there are still a ton of Gmail orphan nodes in the WantAllTraces > dump I made, so I guess this didn't actually fix it? Well, maybe it didn't > fix the leak, but it may have fixed the problem where it takes a long time > for things to be freed after you close the tab. Well, I was hoping that it could fix the latter problem, not the first one. So, bug 734057 could be for the latter problem, and this one could be used for the original problem. > > One of the orphan DOMs is being kept alive by two elements in the DOM that > have a missing edge: > > 0x15fec8580 [nsGenericElement (xhtml) div id=':1cu' class='anM' > https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4] > 0x12c3add00 [nsGenericElement (xhtml) span class='hb' > https://mail.google.com/mail/?ui=2&view=bsp&ver=ohhl4rw8mbn4] This looks like the Cww's log I have, and bz had similar problem with nytimes. Could you try to debug what releases those roots when you close the tab.

Andrew McCreight [:mccr8]

Assignee

Comment 28

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #27) > Could you try to debug what releases those roots when you close the tab. Sure. This is a problem that comes up over and over again so I'm tempted to come up with a more general solution.

Andrew McCreight [:mccr8]

Assignee

Comment 29

•

13 years ago

Here's another weird thing with the patched build. I was logged out by gmail, so it booted me back to the login screen. I logged in again, and I started getting huge CCs: CC(T+112902.0) duration: 1822ms, suspected: 53, visited: 136442 RCed and 207090 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC) ForgetSkippable 2 times before CC, min: 0 ms, max: 30 ms, avg: 15 ms, total: 31 ms, removed: 92 Those objects won't go away, even when I closed the window. They've hung around like this for a few minutes. When I did a dump, one of the orphan DOM nodes was being held alive by a path that starts out like this: 0x14ef23c00 [nsDOMEventTargetHelper ] --[mOnReadystatechangeListener]-> 0x1565f8fa0 [nsDOMEventListenerWrapper] --[mListener]-> 0x15d6d3010 [nsXPCWrappedJS (nsIDOMEventListener)] --[mJSObj]-> 0x14698e8c0 [JS Object (Function - $Xb)] --[]-> 0x14dbf52e0 [JS Object (Function - $Xb)] --[]-> 0x155eea340 [JS Object (Object)] The full path is kind of interesting, I'll attach it. It may shed light on why we're leaking everything in Gmail.

Andrew McCreight [:mccr8]

Assignee

Comment 30

•

13 years ago

Attached file path2orphan — Details

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 31

•

13 years ago

Jason, is it possible that necko keeps stuff alive forever, or very long time? nsDOMEventTargetHelper is XMLHttpRequest. The patch for bug 734057 could be made more effective if the hashtable was enumerated not only in dtor but also in nsGlobalWindow::CleanUp. Could you Andrew try that? Ah, and could you make nsDOMEventTargetHelper::Disconnect() virtual. Then in XHR add Disconnect(). It should call nsDOMEventTargetHelper::Disconnect() and make mListener of mOnReadystatechangeListener null.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 32

•

13 years ago

If you try that, make sure to clear also mOnFooListeners in nsXHREventTarget

Andrew McCreight [:mccr8]

Assignee

Comment 33

•

13 years ago

Sure, I can look into that after I finish up this other security bug I've been avoiding for 3 months.

Comment 34

•

13 years ago

> Jason, is it possible that necko keeps stuff alive forever Generally it shouldn't, of course, but I'm not an expert in XHR, so jonas might know better than me (looks like you may have figured this out on IRC with khuey already?)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 35

•

13 years ago

I have no ideas here. Are we sure the problem isn't simply Google holding JS-references to these XHR objects? I.e. do we know if the XHR still has a JS-wrapper, and if it does, what's keeping it alive. We've certainly seen bugs in gmail before resulting in vast number of objects being held alive.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 36

•

13 years ago

(In reply to Jonas Sicking (:sicking) from comment #35) > I have no ideas here. > > Are we sure the problem isn't simply Google holding JS-references to these > XHR objects? I.e. do we know if the XHR still has a JS-wrapper, and if it > does, what's keeping it alive. > > We've certainly seen bugs in gmail before resulting in vast number of > objects being held alive. There are two unknown native edges. Look at the trace mccr8 posted.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 37

•

13 years ago

(In reply to Jonas Sicking (:sicking) from comment #35) > I have no ideas here. > > Are we sure the problem isn't simply Google holding JS-references to these > XHR objects? In that case XHR wouldn't be in the CC log at all.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 38

•

13 years ago

The XPCWrappedNative is keeping the XHR alive. Doesn't that mean that the JS-wrapper is alive?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 39

•

13 years ago

Sure, but XPCWrappedNative wouldn't be black here. Anyhow, there are unknown edges. Those are the interesting ones. bug 734057 should help here.

Andrew McCreight [:mccr8]

Assignee

Comment 40

•

12 years ago

I haven't noticed this in a while.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.