Closed Bug 700645 Opened 13 years ago Closed 13 years ago

IRC Cloud causes heap-unclassified blowup, long CC times

Categories

(Core :: General, defect)

defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: asa, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Snappy])

Attachments

(5 files)

After a week or so of trying to narrow it down, I've concluded that it's IRC Cloud that's leaking and leading to my massive heap unclassified memory usage. 

The reason I believe it's IRC Cloud is that when I reload my IRC Cloud tab, my memory usage returns to normal.

I leave IRC Cloud running all the time in a pinned tab. I also interact with it quite a bit. Last night, before going to bed, I looked at my memory usage in about:memory and everything looked reasonable. This morning I reloaded about:memory and it was crazy large. 

1,652.12 MB (100.0%) -- explicit
├──1,407.96 MB (85.22%) -- heap-unclassified
├────192.27 MB (11.64%) -- js
│    ├───58.89 MB (03.56%) -- gc-heap-chunk-dirty-unused
│    ├───29.43 MB (01.78%) -- compartment(https://twitter.com/#!/mozillapeople/awesome)
│    │   ├──18.70 MB (01.13%) -- gc-heap
│    │   │  ├──10.09 MB (00.61%) -- (5 omitted)
│    │   │  └───8.61 MB (00.52%) -- arena
│    │   │      ├──8.44 MB (00.51%) -- unused
│    │   │      └──0.16 MB (00.01%) -- (2 omitted)
│    │   └──10.74 MB (00.65%) -- (8 omitted)
│    ├───22.33 MB (01.35%) -- compartment([System Principal], 0x6e69000)
│    │   ├──11.91 MB (00.72%) -- (8 omitted)
│    │   └──10.42 MB (00.63%) -- gc-heap
│    │      └──10.42 MB (00.63%) -- (7 omitted)
│    ├───21.66 MB (01.31%) -- compartment(https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23ux)
│    │   ├──16.56 MB (01.00%) -- gc-heap
│    │   │  ├──11.89 MB (00.72%) -- arena
│    │   │  │  ├──11.73 MB (00.71%) -- unused
│    │   │  │  └───0.16 MB (00.01%) -- (2 omitted)
│    │   │  └───4.67 MB (00.28%) -- (5 omitted)
│    │   └───5.10 MB (00.31%) -- (8 omitted)
│    ├───18.79 MB (01.14%) -- compartment(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced#1)
│    │   ├──10.12 MB (00.61%) -- gc-heap
│    │   │  └──10.12 MB (00.61%) -- (6 omitted)
│    │   └───8.67 MB (00.52%) -- (8 omitted)
│    ├───15.82 MB (00.96%) -- compartment(https://bugzilla.mozilla.org/query.cgi)
│    │   ├───8.59 MB (00.52%) -- gc-heap
│    │   │   └──8.59 MB (00.52%) -- (6 omitted)
│    │   └───7.23 MB (00.44%) -- (8 omitted)
│    ├───13.00 MB (00.79%) -- compartment(http://maps.google.com/maps?q=2%20Harrison%20St)
│    │   └──13.00 MB (00.79%) -- (8 omitted)
│    └───12.34 MB (00.75%) -- (9 omitted)
├─────15.95 MB (00.97%) -- images
│     ├──15.64 MB (00.95%) -- content
│     │  ├──15.64 MB (00.95%) -- used
│     │  │  ├──13.84 MB (00.84%) -- uncompressed-heap
│     │  │  └───1.79 MB (00.11%) -- (2 omitted)
│     │  └───0.00 MB (00.00%) -- (1 omitted)
│     └───0.31 MB (00.02%) -- (1 omitted)
├─────14.11 MB (00.85%) -- storage
│     └──14.11 MB (00.85%) -- sqlite
│        ├───8.61 MB (00.52%) -- places.sqlite
│        │   ├──8.32 MB (00.50%) -- cache-used [3]
│        │   └──0.29 MB (00.02%) -- (2 omitted)
│        └───5.50 MB (00.33%) -- (13 omitted)
├─────12.14 MB (00.73%) -- (8 omitted)
└──────9.70 MB (00.59%) -- layout
       └──9.70 MB (00.59%) -- (12 omitted)

Other Measurements
    1.10 MB -- canvas-2d-pixel-bytes
   12.14 MB -- gfx-d2d-surfacecache
   10.81 MB -- gfx-d2d-surfacevram
   14.76 MB -- gfx-surface-image
    0.00 MB -- gfx-surface-win32
1,497.26 MB -- heap-allocated
1,535.26 MB -- heap-committed
      2.47% -- heap-committed-unallocated-fraction
    3.44 MB -- heap-dirty
  503.73 MB -- heap-unallocated
          3 -- js-compartments-system
          8 -- js-compartments-user
  136.00 MB -- js-gc-heap
   27.37 MB -- js-gc-heap-arena-unused
    0.00 MB -- js-gc-heap-chunk-clean-unused
   58.89 MB -- js-gc-heap-chunk-dirty-unused
     63.43% -- js-gc-heap-unused-fraction
    6.59 MB -- js-total-analysis-temporary
   15.17 MB -- js-total-mjit
   31.10 MB -- js-total-objects
   13.00 MB -- js-total-scripts
   13.16 MB -- js-total-shapes
   18.10 MB -- js-total-strings
    3.31 MB -- js-total-type-inference
1,812.75 MB -- private
1,866.88 MB -- resident
    0.70 MB -- shmem-allocated
    0.70 MB -- shmem-mapped
2,557.27 MB -- vsize

Then I reloaded the IRC Cloud tab and the memory usage returned to normal:

276.98 MB (100.0%) -- explicit
├──179.81 MB (64.92%) -- js
│  ├───73.23 MB (26.44%) -- gc-heap-chunk-dirty-unused
│  ├───26.53 MB (09.58%) -- compartment(https://twitter.com/#!/mozillapeople/awesome)
│  │   ├──17.50 MB (06.32%) -- gc-heap
│  │   │  ├──11.46 MB (04.14%) -- arena
│  │   │  │  ├──11.31 MB (04.08%) -- unused
│  │   │  │  └───0.15 MB (00.06%) -- (2 omitted)
│  │   │  ├───3.15 MB (01.14%) -- objects
│  │   │  │   ├──1.65 MB (00.60%) -- non-function
│  │   │  │   └──1.50 MB (00.54%) -- function
│  │   │  ├───1.47 MB (00.53%) -- (3 omitted)
│  │   │  └───1.42 MB (00.51%) -- shapes
│  │   │      └──1.42 MB (00.51%) -- (2 omitted)
│  │   ├───6.10 MB (02.20%) -- (7 omitted)
│  │   └───2.94 MB (01.06%) -- mjit-code
│  │       ├──2.70 MB (00.97%) -- method
│  │       └──0.24 MB (00.09%) -- (2 omitted)
│  ├───18.39 MB (06.64%) -- compartment(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced#1)
│  │   ├──10.12 MB (03.65%) -- gc-heap
│  │   │  ├───4.38 MB (01.58%) -- objects
│  │   │  │   ├──3.52 MB (01.27%) -- non-function
│  │   │  │   └──0.85 MB (00.31%) -- (1 omitted)
│  │   │  ├───2.21 MB (00.80%) -- arena
│  │   │  │   ├──2.13 MB (00.77%) -- unused
│  │   │  │   └──0.08 MB (00.03%) -- (2 omitted)
│  │   │  ├───2.11 MB (00.76%) -- shapes
│  │   │  │   ├──1.66 MB (00.60%) -- tree
│  │   │  │   └──0.45 MB (00.16%) -- (1 omitted)
│  │   │  └───1.42 MB (00.51%) -- (3 omitted)
│  │   ├───3.68 MB (01.33%) -- (5 omitted)
│  │   ├───2.59 MB (00.93%) -- script-data
│  │   └───2.00 MB (00.72%) -- mjit-code
│  │       ├──1.90 MB (00.69%) -- method
│  │       └──0.10 MB (00.04%) -- (2 omitted)
│  ├───15.77 MB (05.69%) -- compartment([System Principal], 0x6e69000)
│  │   ├───8.98 MB (03.24%) -- gc-heap
│  │   │   ├──3.54 MB (01.28%) -- objects
│  │   │   │  ├──2.74 MB (00.99%) -- function
│  │   │   │  └──0.80 MB (00.29%) -- (1 omitted)
│  │   │   ├──2.46 MB (00.89%) -- shapes
│  │   │   │  ├──2.13 MB (00.77%) -- tree
│  │   │   │  └──0.33 MB (00.12%) -- (1 omitted)
│  │   │   ├──1.92 MB (00.69%) -- arena
│  │   │   │  ├──1.86 MB (00.67%) -- unused
│  │   │   │  └──0.06 MB (00.02%) -- (2 omitted)
│  │   │   └──1.07 MB (00.39%) -- (4 omitted)
│  │   ├───2.63 MB (00.95%) -- mjit-code
│  │   │   ├──2.55 MB (00.92%) -- method
│  │   │   └──0.07 MB (00.03%) -- (2 omitted)
│  │   ├───2.61 MB (00.94%) -- (6 omitted)
│  │   └───1.55 MB (00.56%) -- script-data
│  ├───13.00 MB (04.69%) -- compartment(http://maps.google.com/maps?q=2%20Harrison%20St)
│  │   ├───8.17 MB (02.95%) -- gc-heap
│  │   │   ├──4.45 MB (01.61%) -- arena
│  │   │   │  ├──4.38 MB (01.58%) -- unused
│  │   │   │  └──0.07 MB (00.02%) -- (2 omitted)
│  │   │   ├──1.99 MB (00.72%) -- (4 omitted)
│  │   │   └──1.73 MB (00.63%) -- objects
│  │   │      └──1.73 MB (00.63%) -- (2 omitted)
│  │   ├───2.70 MB (00.98%) -- (6 omitted)
│  │   └───2.13 MB (00.77%) -- mjit-code
│  │       ├──2.07 MB (00.75%) -- method
│  │       └──0.05 MB (00.02%) -- (2 omitted)
│  ├───11.85 MB (04.28%) -- compartment(https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23ux)
│  │   ├───8.41 MB (03.04%) -- gc-heap
│  │   │   ├──5.50 MB (01.98%) -- arena
│  │   │   │  ├──5.41 MB (01.95%) -- unused
│  │   │   │  └──0.09 MB (00.03%) -- (2 omitted)
│  │   │   ├──1.55 MB (00.56%) -- objects
│  │   │   │  └──1.55 MB (00.56%) -- (2 omitted)
│  │   │   └──1.36 MB (00.49%) -- (4 omitted)
│  │   ├───2.00 MB (00.72%) -- (7 omitted)
│  │   └───1.44 MB (00.52%) -- mjit-code
│  │       └──1.44 MB (00.52%) -- (3 omitted)
│  ├────8.69 MB (03.14%) -- compartment(https://bugzilla.mozilla.org/query.cgi)
│  │    ├──5.03 MB (01.82%) -- gc-heap
│  │    │  ├──2.82 MB (01.02%) -- (5 omitted)
│  │    │  └──2.21 MB (00.80%) -- objects
│  │    │     └──2.21 MB (00.80%) -- (2 omitted)
│  │    ├──2.16 MB (00.78%) -- (6 omitted)
│  │    └──1.50 MB (00.54%) -- mjit-code
│  │       ├──1.41 MB (00.51%) -- method
│  │       └──0.09 MB (00.03%) -- (2 omitted)
│  ├────6.81 MB (02.46%) -- compartment(atoms)
│  │    ├──4.80 MB (01.73%) -- string-chars
│  │    └──2.01 MB (00.73%) -- gc-heap
│  │       ├──1.80 MB (00.65%) -- strings
│  │       └──0.21 MB (00.08%) -- (1 omitted)
│  ├────2.16 MB (00.78%) -- runtime
│  │    ├──2.00 MB (00.72%) -- atoms-table
│  │    └──0.16 MB (00.06%) -- (1 omitted)
│  ├────2.13 MB (00.77%) -- gc-heap-chunk-admin
│  └────1.25 MB (00.45%) -- (6 omitted)
├───47.99 MB (17.33%) -- heap-unclassified
├───16.92 MB (06.11%) -- images
│   ├──16.60 MB (05.99%) -- content
│   │  ├──16.60 MB (05.99%) -- used
│   │  │  ├──14.81 MB (05.35%) -- uncompressed-heap
│   │  │  ├───1.79 MB (00.65%) -- raw
│   │  │  └───0.00 MB (00.00%) -- (1 omitted)
│   │  └───0.00 MB (00.00%) -- (1 omitted)
│   └───0.31 MB (00.11%) -- (1 omitted)
├───13.99 MB (05.05%) -- storage
│   └──13.99 MB (05.05%) -- sqlite
│      ├───8.49 MB (03.06%) -- places.sqlite
│      │   ├──8.19 MB (02.96%) -- cache-used [3]
│      │   └──0.29 MB (00.11%) -- (2 omitted)
│      ├───4.11 MB (01.48%) -- (12 omitted)
│      └───1.39 MB (00.50%) -- cookies.sqlite
│          └──1.39 MB (00.50%) -- (3 omitted)
├────8.40 MB (03.03%) -- layout
│    ├──4.97 MB (01.79%) -- (10 omitted)
│    ├──1.91 MB (00.69%) -- shell(https://twitter.com/#!/mozillapeople/awesome)
│    │  ├──1.63 MB (00.59%) -- arenas
│    │  └──0.28 MB (00.10%) -- (2 omitted)
│    └──1.53 MB (00.55%) -- shell(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced)
│       └──1.53 MB (00.55%) -- (2 omitted)
├────5.20 MB (01.88%) -- dom
├────2.46 MB (00.89%) -- (6 omitted)
└────2.21 MB (00.80%) -- spell-check

Other Measurements
    1.10 MB -- canvas-2d-pixel-bytes
   12.18 MB -- gfx-d2d-surfacecache
   10.81 MB -- gfx-d2d-surfacevram
   15.73 MB -- gfx-surface-image
    0.00 MB -- gfx-surface-win32
  123.46 MB -- heap-allocated
  167.69 MB -- heap-committed
     26.36% -- heap-committed-unallocated-fraction
    3.59 MB -- heap-dirty
  521.53 MB -- heap-unallocated
          3 -- js-compartments-system
          8 -- js-compartments-user
  136.00 MB -- js-gc-heap
   26.62 MB -- js-gc-heap-arena-unused
    0.00 MB -- js-gc-heap-chunk-clean-unused
   73.23 MB -- js-gc-heap-chunk-dirty-unused
     73.41% -- js-gc-heap-unused-fraction
    2.16 MB -- js-total-analysis-temporary
   13.34 MB -- js-total-mjit
   21.21 MB -- js-total-objects
   12.99 MB -- js-total-scripts
   12.42 MB -- js-total-shapes
    9.20 MB -- js-total-strings
    3.17 MB -- js-total-type-inference
  443.03 MB -- private
  500.09 MB -- resident
    0.70 MB -- shmem-allocated
    0.70 MB -- shmem-mapped
1,210.88 MB -- vsize

So, that blow-up happened without interaction. Here's what else might be special about my use of IRC Cloud. 

I have 7 channels open, including a couple of medium-volume ones like #developers, #fx-team and #ux.  I also have a couple of query chats with individuals open. Finally, I have an archive of a few previous one-on-one chats, but I don't think those are loaded except when you manually fetch them.
Any ideas where that memory is being used? In the first dump the irccloud js compartment seems to only be using 01.31%

About 70% is unaccounted for in that trace.
Oh sorry, I misread the indentation. It's all in heap-unclassified I guess.
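
For anyone else reading the dump, the shares can be recomputed straight from the pasted lines. This is only a sketch: parseEntry is a hypothetical helper written for this comment, and it assumes every entry has the "<size> MB (<pct>%) -- <name>" shape seen in the first dump.

```javascript
// Hypothetical helper: pull size, percentage and name out of one
// about:memory tree line such as "├──1,407.96 MB (85.22%) -- heap-unclassified".
function parseEntry(line) {
  const m = /([\d,]+\.\d+) MB \((\d+\.\d+)%\) -- (.+)$/.exec(line);
  if (!m) return null; // not a tree entry
  return {
    mb: parseFloat(m[1].replace(/,/g, "")),
    pct: parseFloat(m[2]),
    name: m[3].trim(),
  };
}

const explicit = parseEntry("1,652.12 MB (100.0%) -- explicit");
const unclassified = parseEntry("├──1,407.96 MB (85.22%) -- heap-unclassified");
const irccloud = parseEntry(
  "│    ├───21.66 MB (01.31%) -- compartment(https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23ux)"
);

// Share of "explicit" taken by an entry, recomputed from the MB figures.
const share = (entry) => (100 * entry.mb) / explicit.mb;
console.log(share(unclassified).toFixed(2) + "% heap-unclassified");
console.log(share(irccloud).toFixed(2) + "% irccloud compartment");
```

The recomputed shares agree with the percentages printed in the tree, so the irccloud compartment really is tiny and nearly everything sits in heap-unclassified.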
Blocks: 696761
Whiteboard: [MemShrink]
At the moment with IRCCloud we are letting history grow forever in the DOM. We'll probably end up trimming older messages to keep a leaner DOM, but it would be interesting to know exactly where the memory is leaking, as it's not listed as DOM or JS here.
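
A minimal sketch of what such trimming could look like, assuming a fixed per-channel cap. BoundedHistory, the cap of 3 and the row ids are all made up for illustration; this is not IRCCloud's code.

```javascript
// Hypothetical sketch: keep at most `limit` rows per channel and hand
// evicted rows to a callback (in a browser, that callback would remove
// the row's element from the DOM).
class BoundedHistory {
  constructor(limit, onEvict) {
    this.limit = limit;
    this.onEvict = onEvict;
    this.rows = [];
  }

  push(row) {
    this.rows.push(row);
    // Drop the oldest rows once the cap is exceeded.
    while (this.rows.length > this.limit) {
      this.onEvict(this.rows.shift());
    }
  }
}

const evicted = [];
const channelHistory = new BoundedHistory(3, (row) => evicted.push(row.id));
for (let i = 1; i <= 5; i++) {
  channelHistory.push({ id: "e127_" + i, text: "message " + i });
}
console.log(channelHistory.rows.map((r) => r.id)); // the three newest rows
console.log(evicted); // the two oldest rows
```

Trimming bounds the DOM, but it only helps if nothing else still references the evicted rows.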
> We'll probably end up trimming older messages to keep a leaner DOM, but it would be interesting to 
> know exactly where the memory is leaking, as it's not listed as DOM or JS here.

Yes.  It's a bug on our end that all the memory is going into heap-unclassified.
Based on comment 4, I'm taking out "memory leak" from the bug summary.
Summary: IRC Cloud causes huge memory leak but all in heap unclassified → IRC Cloud causes heap-unclassified blowup
(In reply to James Wheare from comment #4)
> At the moment with IRCCloud we are letting history grow forever in the DOM.
> We'll probably end up trimming older messages to keep a leaner DOM, but it
> would be interesting to know exactly where the memory is leaking, as it's
> not listed as DOM or JS here.

That does sound like a possible culprit.  How do you store the history?  That might help somebody (not me...) figure out what reporter might be missing.  Or since you are presumably storing the entire history in the same way from the get go, njn may have some luck getting some insights with only a little bit of DMD.

The JS reporters are generally fairly complete, so I'd think it is more likely that it is some other kind of reporter that is missing.
Blocks: DarkMatter
History is stored in a series of divs, grouped by channel, most of which have ids, e.g:

<div class="row messageRow type_buffer_msg highlight chat" id="e127_225136">

Most of them are not referenced by javascript. There is some pretty hairy CSS used for layout though.
> History is stored in a series of divs, grouped by channel, most of which have ids

Is there a significant amount of text in the divs?  We're still not reporting textruns... but I'd expect those to mostly expire anyway...

In any case, if we can just get some DMD stacks for this stuff, esp ones with 8+ frames (to make sure we figure out what's going on if this is string memory), I'm pretty sure things will become clear.  "Just" need to reproduce under DMD!
In each div there are a few spans that contain the timestamp, author and message. Not a lot of text no, but a bunch of floating and box model trickery to align them correctly. With some elements that are positioned off screen so that they're invisible but show up when copy/pasted. The off screen elements are all absolutely positioned in the same spot at: left: -999px; top: -999px;

The layout should be easily inspectable with Firebug or equivalent. It might be worth taking a look at the DOM structure in Tilt too http://blog.mozilla.com/tilt/

If anyone needs an invite to IRCCloud to debug this let me know your email address.
I ran IRCCloud for a few hours this morning.

During lunch, while I was watching a flash video, the browser suddenly started freezing for long periods of time (20s?).  I closed the IRCCloud tab, and things got better, but still unacceptable.

It doesn't look like this heap-unclassified growth is gradual.

I now have 2000ms CC times, 1GB heap-unclassified, and a zombie IRCCloud compartment.
re the zombie compartment, I'm running ABP 1.3.11a.3198 as my only extension.  (I saw the zombie before I enabled about:telemetry.)
Sorry to spam the comments.

I'm running the 2011-11-08 nightly, http://hg.mozilla.org/mozilla-central/rev/81dedcc49ac0, which aiui has peterv's xpcom wrapper fixes.
Justin is getting 300k+ ref counted nodes in his CC graph (that's the upper end of the bucket), but it sounds like his JS nodes are reasonable.  This is consistent with his really long CC times and reasonable GC times.
> "Just" need to reproduce under DMD!

I'm trying to reproduce without DMD first.  I left IRCCloud running last night, no dice, I'll leave it running longer.

When we get a blow-up I always suspect JS first, but our JS coverage in about:memory is pretty darn good.  Our DOM coverage is not as good, AIUI.
I would recommend joining some high traffic channels, e.g. #freenode, #ubuntu on freenode. And then make sure you select them at least once before you leave it running. The history is only rendered to the DOM after you first select a channel, but then will continue unabated.
Summary: IRC Cloud causes heap-unclassified blowup → IRC Cloud causes heap-unclassified blowup, long CC times
I analyzed Justin's 524MB CC log to see what objects are in it [1].  Here are the top 9 kinds of objects that appear:

 2297899 nsGenericDOMDataNode
 1108899 nsGenericElement (xhtml) a
   76117 nsGenericElement (xhtml) span
   42758 JS Object (Object)
   26777 JS Object (Array)
    9698 nsGenericElement (xhtml) div
    8831 nsGenericElement (xhtml) li
    8477 JS Object (HTMLLIElement)
    8191 JS Object (HTMLAnchorElement)

The first two seem bad...

[1] https://github.com/amccreight/heapgraph/blob/master/cc/live_census.py
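
To illustrate the idea of the census (this is not the linked script), here is a small sketch that counts node labels. It assumes node lines look like "0x... [rc=N] label" and edge lines start with "> ", as in the log excerpts pasted in this bug; the sample addresses and refcounts are made up.

```javascript
// Count how often each node label appears in a cycle collector log.
function census(logText) {
  const counts = new Map();
  for (const line of logText.split("\n")) {
    const m = /^0x[0-9a-f]+ \[[^\]]*\] (.+)$/.exec(line.trim());
    if (!m) continue; // edge lines and anything else are skipped
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  // Most common labels first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sampleLog = [
  "0x7f1d1ccdf800 [rc=5] nsGenericElement (xhtml) div",
  "> 0x7f1d20b10720 mNodeInfo",
  "> 0x7f1d1ccdff80 mAttrsAndChildren[i]",
  "0x7f1d1ccdff80 [rc=299] nsGenericElement (xhtml) span",
  "> 0x7f1d3529c8a0 mAttrsAndChildren[i]",
  "0x7f1d3529c8a0 [rc=2] nsGenericElement (xhtml) a",
  "0x7f1d1b164d80 [rc=1] nsGenericDOMDataNode",
  "0x7f1d1b164e00 [rc=1] nsGenericDOMDataNode",
].join("\n");

for (const [label, count] of census(sampleLog)) {
  console.log(String(count).padStart(8) + " " + label);
}
```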
Hmm.  nsGenericDOMDataNode does report both itself and its text as part of the DOM memory reporter....

One other possible non-dmd option.  I seem to recall roc at some point playing with heap dumps on Windows.  I wonder whether we could just look around the heap for what's allocated in the cases when 90+% is unclassified?
We're currently making quite extensive use of the jQuery .data() storage, for what it's worth. But only to store javascript objects that are referenced elsewhere, so it shouldn't add to memory usage, unless it's making copies somewhere.
This probably isn't surprising, but about 23% of the lines in the file (3,511,941 of 15,234,657) are mAttrsAndChildren[i] edges:

host-7-174:jlebar amccreight$ grep "mAttrsAndChildren\[i\]" cc-edges.log | wc
 3511941 10535823 133458474
host-7-174:jlebar amccreight$ wc cc-edges.log 
 15234657 48136852 524045490 cc-edges.log

jst was digging around in the reporter for nsGenericDOMDataNode.  It looks like there was actually a bug there that was fixed on the 21st (caught by clang): bug 695324.  jst says that it could cause some pretty severe undercounting.  But I assume Justin is running a nightly more recent than that.
Comment 14:

> I'm running the 2011-11-08 nightly,
> http://hg.mozilla.org/mozilla-central/rev/81dedcc49ac0, which aiui has
> peterv's xpcom wrapper fixes.

My DOM reporter counts 3,352,073 bytes.  That's 3,352,073 / 2,297,899 ~= 1.5 bytes per nsGenericDOMDataNode object, excluding all other DOM objects.

Sounds like the reporter is not functioning as we expect.
So it looks like the DOM memory reporter walks all extant windows and for each one it walks the window's document (if any) and for each document it walks the non-anonymous nodes in that document.

That's going to miss disconnected subtrees of various sorts.  It might also miss documents whose inner window has already gone away (assuming this can happen, of course).
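
A toy model of that blind spot, with plain objects standing in for windows, documents and nodes. None of this is Gecko code and all the names are made up; it only shows why a walk that starts from windows cannot see a subtree that script created but never inserted.

```javascript
// Toy model: nodes are plain objects with a children array.
function node(name, children = []) {
  return { name, children };
}

// Count a subtree by walking down from its root.
function countSubtree(root) {
  return 1 + root.children.reduce((sum, child) => sum + countSubtree(child), 0);
}

// A reporter that starts from windows only sees nodes attached to a document.
function reportFromWindows(windows) {
  return windows
    .filter((win) => win.document) // windows without a document contribute nothing
    .reduce((sum, win) => sum + countSubtree(win.document), 0);
}

const attached = node("document", [node("div", [node("span"), node("#text")])]);
// Created by script, never inserted: reachable from JS but not from any document.
const detached = node("a", [node("#text")]);

const windows = [{ document: attached }, { document: null }];
const allLiveNodes = countSubtree(attached) + countSubtree(detached);

console.log(reportFromWindows(windows) + " nodes reported");
console.log(allLiveNodes + " nodes actually alive");
```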

The question is what we can do about it.  Adding every DOM subtree root to some sort of global hashset sounds expensive.  Can we assume that we only need to worry about stuff with JS wrappers to find subtrees (not subtree roots, note!) and walk some sort of xpconnect hashtables for now?  At least until the new DOM bindings are done?

In any case, 2.3e6 textnodes should be about 156MB just for the objects, not counting their actual text.  1.1e6 <a> elements is 110MB, again not counting things like child lists, attributes, etc.
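
As a rough cross-check of those figures, the sketch below derives the per-object sizes the estimates imply and compares the total with what the DOM reporter counted. It assumes 1 MB means 10^6 bytes, which the estimate above does not state.

```javascript
// Rounded counts and size estimates as quoted in this bug.
const textNodes = 2.3e6;       // nsGenericDOMDataNode objects
const anchors = 1.1e6;         // <a> elements
const textNodesMB = 156;       // estimate for the text node objects alone
const anchorsMB = 110;         // estimate for the <a> objects alone
const reportedBytes = 3352073; // what the DOM memory reporter counted

// Assumption: 1 MB is taken as 10^6 bytes.
const MB = 1e6;
const bytesPerTextNode = (textNodesMB * MB) / textNodes;
const bytesPerAnchor = (anchorsMB * MB) / anchors;
console.log(Math.round(bytesPerTextNode) + " bytes per text node (implied)");
console.log(Math.round(bytesPerAnchor) + " bytes per <a> (implied)");

// Estimated object memory versus what the reporter actually saw.
const estimatedBytes = (textNodesMB + anchorsMB) * MB;
const ratio = estimatedBytes / reportedBytes;
console.log("estimate is about " + Math.round(ratio) + "x the reported DOM size");
```

Even before counting text, attributes or child lists, the estimate is dozens of times larger than the reported DOM total, which fits the reporter not seeing these nodes at all.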

Sounds like irccloud has a whole bunch of text-and-anchors that it's created but not put in the DOM.  Is that expected for history?  Are millions of nodes expected there?
Note that my IRCCloud tab is closed.  Something is keeping it alive; there's a zombie compartment.  That something may be keeping alive these millions of nodes as well.
Ah so for every URL we detect we're creating an anchor that never gets appended to the DOM and only gets assigned to a local variable within a function. It's used so we can easily access the location API on the element and get access to protocol, host and pathname properties, letting the browser parse the URL for us.

I can push a change to the site that parses the URL parts without this dummy anchor node and you can verify if the leak is still there.
> that never gets appended to the DOM and only gets assigned to a local variable within a
> function.

Are you creating text nodes under those anchors too?  Can you link to the relevant code?

Please don't change anything about the site!
Don't worry I shan't change anything until you give the go ahead :) And it can always be reverted.

Here's the code, using jQuery. No text nodes, just an href

            var dummyLink = $('<a>').attr('href', url)[0];
            if (window.location.protocol == dummyLink.protocol &&
                window.location.host     == dummyLink.host     &&
                window.location.pathname == dummyLink.pathname) {
                target = '';
            }

I'll have a look at the jQuery source to see what it's doing under the hood.
I'm pretty sure jQuery will just do a document.createElement followed by element.setAttribute in this case.
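
For comparison, a sketch of the same protocol/host/pathname check done without creating an element. It assumes the standard URL constructor is available in the target browsers, which may not hold for every browser in use; isSamePage is a hypothetical name, not something from the site's code.

```javascript
// Sketch only: compare a link target with the current page without
// creating a dummy anchor element.
function isSamePage(url, currentHref) {
  let parsed;
  try {
    // Relative URLs are resolved against the current page, as an href would be.
    parsed = new URL(url, currentHref);
  } catch (e) {
    return false; // unparseable input is treated as "not the same page"
  }
  const current = new URL(currentHref);
  return (
    current.protocol === parsed.protocol &&
    current.host === parsed.host &&
    current.pathname === parsed.pathname
  );
}

const here = "https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23ux";
console.log(isSamePage("https://irccloud.com/#!/other", here));
console.log(isSamePage("http://maps.google.com/maps?q=2%20Harrison%20St", here));
```

In the snippet above, a true result corresponds to the branch that sets target to the empty string.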
Assignee: nobody → continuation
Whiteboard: [MemShrink] → [MemShrink:P1]
Andrew, what's the plan for fixing this, and what assistance would you like, from me or others?

It seems to me that the main issue is the zombie compartment.  If we could fix the zombie compartment, the high heap-unclassified might go away; it might be that all these extra DOM objects are attached to the zombie window, but aren't being counted in our DOM memory reporter because the window should be dead.
Depends on: 701041
I filed Bug 701041 on getting the DOM memory reporter to piggyback on the cycle collector.  That could solve the heap-unclassified problem here.

The other subproblem here is the zombie compartment, and possibly other junk that is leaking.  I'll investigate the CC graph you gave me, Justin, to see what is holding some of those DOM nodes alive.  If the compartment is being held alive via DOM then that might give some insight.
Here's what is keeping one of the nsGenericDOMDataNodes alive.  I just picked this one at random.

The nsDocument itself is being held alive by an unknown edge:
0x7f1d2bf16000 nsDocument (xhtml) https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23developers

I assume that is the document for the page?
    Root 0x7f1d2bf16000 is a ref counted object with 2 unknown edge(s).
    known edges:
       0x7f1d29e28800 [nsPresContext] --[mDocument]-> 0x7f1d2bf16000
       0x7f1d214e8240 [nsEventStateManager] --[mDocument]-> 0x7f1d2bf16000
       0x7f1d211ea880 [nsNodeInfoManager] --[mDocument]-> 0x7f1d2bf16000
       0x7f1d209960b0 [XPCWrappedNative (HTMLDocument)] --[mIdentity]-> 0x7f1d2bf16000

There's also a nsGenericElement (xhtml) form, a nsGenericDOMDataNode, and a nsGenericElement (xhtml) hr with missing edges holding the node alive.

The paths are not identical, but they start out quickly going into JS, passing through a large number of JS Objects, Functions, Calls and a Window, then pass through a JS Object (HTMLLIElement):

    --[]-> 0x7f1d1340ff98 [JS Object (Object) (global=7f1d215da060)]
    --[]-> 0x7f1d125adef8 [JS Object (HTMLLIElement) (global=7f1d215da060)]
    --[xpc_GetJSPrivate(obj)]-> 0x7f1cff681280 [nsGenericElement (xhtml) li]
    --[GetParent()]-> 0x7f1d2160d470 [nsGenericElement (xhtml) ul]
    --[mAttrsAndChildren[i]]-> 0x7f1d1b164b00 [nsGenericElement (xhtml) li]
    --[mAttrsAndChildren[i]]-> 0x7f1d3529c8a0 [nsGenericElement (xhtml) a]
    --[mAttrsAndChildren[i]]-> 0x7f1d1b164d80 [nsGenericDOMDataNode]
I should say, the initial paths into JS are not identical, but they eventually converge, so the tail of all of the paths look like what I pasted in the last comment.
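
For readers unfamiliar with how such a path is found, here is a toy version of the search: a breadth-first walk over labelled edges that returns one path from a root to a target. It is not the heapgraph code, and the node names are shortened stand-ins for the addresses in the log.

```javascript
// Toy reachability search over an edge list.
// Each edge is [from, label, to]; the result is one path from root to target.
function findPath(edges, root, target) {
  const outgoing = new Map();
  for (const [from, label, to] of edges) {
    if (!outgoing.has(from)) outgoing.set(from, []);
    outgoing.get(from).push({ label, to });
  }
  // Breadth-first search, remembering how each node was first reached.
  const cameFrom = new Map([[root, null]]);
  const queue = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === target) break;
    for (const { label, to } of outgoing.get(current) || []) {
      if (!cameFrom.has(to)) {
        cameFrom.set(to, { from: current, label });
        queue.push(to);
      }
    }
  }
  if (!cameFrom.has(target)) return null;
  // Walk back from the target to rebuild the path.
  const path = [];
  for (let at = target; cameFrom.get(at); at = cameFrom.get(at).from) {
    path.unshift("--[" + cameFrom.get(at).label + "]-> " + at);
  }
  return path;
}

const edges = [
  ["JS Object (HTMLLIElement)", "xpc_GetJSPrivate(obj)", "li#1"],
  ["li#1", "GetParent()", "ul"],
  ["ul", "mAttrsAndChildren[i]", "li#2"],
  ["li#2", "mAttrsAndChildren[i]", "a"],
  ["a", "mAttrsAndChildren[i]", "nsGenericDOMDataNode"],
];
console.log(findPath(edges, "JS Object (HTMLLIElement)", "nsGenericDOMDataNode").join("\n"));
```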
> The nsDocument itself is being held alive by an unknown edge:
> 0x7f1d2bf16000 nsDocument (xhtml)
> https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23developers
>
> I assume that is the document for the page?

I'm not sure if this answers your question, but that's the URL of the zombie compartment.
What does it mean to see a JSScript in the chain?
Good point, Kyle.  I'm not really sure. Looking at the raw graph, the JSScript 0x7f1d20f87980 (which is in the chain) has about 360 outgoing edges.  The second one in the chain only has a single outgoing edge.

There are about 13000 JSScripts in the CC graph.  For some reason my census script only finds 3999 which is odd...
billm, can you answer comments 35/36?
The reason my census script only found 4000 was that I am skipping marked ones, which makes sense.

I talked to bill.  The scripts are basically the raw representation of the actual functions, I believe.  He said it isn't that surprising to see a JSScript with a bunch of outgoing edges.  It looks like the real problem is that the second JSScript points to its global (which I guess is normal), but then the global holds a bunch of things.  So probably not a problem with the script itself.

I'm going to write a script to see what exactly is holding onto the nsGenericDOMDataNode and the a elements.  They vastly outnumber everything else, so there must be something that is holding a huge number of references to them.
Just to reiterate, it seems like there are three things wrong here:
1. The zombie compartment.
2. The huge number of anchor elements and nsGenericDOMDataNode.
3. The heap-unclassified issue.

For #2, it would help to know the context for the JS code that creates the anchor tags in comment 28. Specifically, are there any other functions nested inside of the function that creates the anchor? Or does it call functions that might create closures that live a long time?
So the log in comment 32 shows that there is an <li> which has an <a> child which has a textnode child.  The <li> also has a <ul> parent.  Furthermore, the <li> is closed over by a function that's reachable from the Window.  Since it's reachable, it can't go away, and since it closes over the <li> the <li> can't go away and neither can any of its kids.

That shouldn't be related to the dummyLink thing, unless that's added as a child to an <li> at some point.

James, does the <li> thing spark anything on your end?
It looks like there are 1108831 nsGenericDOMDataNodes being held in "nsGenericElement (xhtml) a" nodes.  The only other one that is even remotely common is 75854 being held by "nsGenericElement (xhtml) span".  This may be incomplete because in that data collection I am only counting a node once if it holds two different nodes.

I then counted which nodes are holding <a> nodes.  There are 1.1 million in nsGenericDOMDataNodes, and another 1.1 million in... spans (nsGenericElement (xhtml) span).  This is weird because there are only 76k spans in the graph, so some of them must be gigantic.
OK, so the <li> thing is not relevant.  What we care about is a <span> holding an <a> via its childNodes, and finding out what's keeping that <span> alive...
I looked at the number of out edges from each span in the graph.  The largest has 2954.  So, it isn't concentrated into a single span.

Here are the sizes of the largest 11 spans: 2954, 2952, 2950, 2948, 2946, 2944, 2942, 2940, 2938, 2936, 2934.  Notice a pattern?  It continues that way, dropping by 2 every time as far as I can tell from skimming things, until 94!  Then there are 2 spans with 93 children, 1 with 92, 4 with 91, etc, breaking the pattern.
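
A quick arithmetic check of what that pattern adds up to, assuming it really does run unbroken from 2954 down to 94 in steps of 2, and that each size includes two non-child edges (mNodeInfo and GetParent()), as in the 300-edge span examined in this comment.

```javascript
// Span sizes (out edges) run from 2954 down to 94 in steps of 2.
const largest = 2954;
const smallest = 94;
const step = 2;

const spanCount = (largest - smallest) / step + 1;
// Sum of an arithmetic series: count * (first + last) / 2.
const totalOutEdges = (spanCount * (largest + smallest)) / 2;
// Assumption: every span has exactly two edges that are not children.
const totalChildren = totalOutEdges - 2 * spanCount;

console.log(spanCount + " spans in the pattern");
console.log(totalOutEdges + " out edges in total");
console.log(totalChildren + " children in total");
```

Under those assumptions the pattern alone accounts for roughly 2.2 million children, which is the same order as the anchor and text node counts in the census.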

I looked in detail at a span with 300 out edges.  This is an mNodeInfo, a GetParent(), and 298 elements of mAttrsAndChildren.  The span has 299 references.  I'm assuming 298 of them are parent pointers from the mAttrsAndChildren.  The final reference is an mAttrsAndChildren in a div entry:

0x7f1d1ccdf800 [rc=5] nsGenericElement (xhtml) div
> 0x7f1d12ca4500 Preserved wrapper
> 0x7f1d20b10720 mNodeInfo
> 0x7f1d1ccdf980 mAttrsAndChildren[i]
> 0x7f1d1ccdfa80 mAttrsAndChildren[i]
> 0x7f1d1ccdfe80 mAttrsAndChildren[i]
> 0x7f1d1ccdff80 mAttrsAndChildren[i]  <-- this is Sparta.  I mean, the node with 300 out edges.

As for the mAttrsAndChildren entries of the span node, based on a spot check, they seem to alternate between <a> and nsGenericDOMDataNodes, which explains why all of the sizes are even.  The other children of the div are also spans, but small ones.

I'm working on figuring out the full reason the largest node is alive, but doing a full log analysis that looks at edges is extremely slow.
> Here are the sizes of the largest 11 spans:

Nice.  So that's about 2 million kids in those spans.  It sounds like something keeps cloning nodes and adding 2 kids to the clone and then leaking all the nodes or something....

So yeah, we need to figure out why that div is alive...  What do the paths to it look like?  Or is getting those the slow thing?
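
To make that guess concrete, here is a toy model of "clone, add 2 kids, leak the old copy". It is purely illustrative: it is not IRCCloud's code, and nothing in this bug confirms that this is what actually happens.

```javascript
// Toy model: every event builds a fresh copy of the combined row with two
// more children; optionally the superseded copies are never released.
function simulate(events, releaseOld) {
  const retained = []; // rows that something still references
  let current = { children: [] };
  for (let i = 0; i < events; i++) {
    const next = { children: current.children.concat(["<a>", "#text"]) };
    if (!releaseOld) retained.push(current); // the old copy stays alive
    current = next;
  }
  retained.push(current);
  return retained.reduce((sum, row) => sum + row.children.length, 0);
}

console.log(simulate(1000, true));  // only the latest row is alive
console.log(simulate(1000, false)); // every intermediate row is alive
```

If old copies are released, live children grow linearly with the number of events; if they are all retained, the total grows quadratically, which is how a modest number of events could end up as millions of nodes.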
This took about 40 minutes to generate.  Yikes.

Anyways, it looks almost exactly like what is keeping the nsGenericDOMDataNode alive.  Exact same roots, almost the same paths.  Looks like some JS-y stuff is leaking through a window, which eventually reached a JS Object (HTMLDivElement), whatever that is, which has as a Parent() the div node holding the span.
Comment on attachment 573333
what is keeping a mega span / div alive

So this looks like there is a path from the window to some function that closes over an array.  That array contains objects, some of which hold on to the <div> element(s).

James, any ideas?

Andrew, is there a way to change the log output as follows:

1)  For Function objects output the function's name, if any.
2)  For nsGenericElements, output the ID, if any.

That would possibly help narrow things down here... maybe.
Not sure if it's relevant anymore, but there are no function closures created around the dummyLink.

I think I need a more detailed description of the DOM/JS structures involved here. Function names, object property names, element ids, classes, and full paths from the window/document root would all be helpful.

I'm not entirely sure what's meant by "edges" in this context.
(In reply to James Wheare from comment #47)
> I'm not entirely sure what's meant by "edges" in this context.

I just mean a reference.  So, if I say there's an edge from a div to a span, I just mean, the div contains a pointer to a span.
Attached file Combined join/part div
Attached file Single join div
OK so here's something to chew on.

We have some code that combines IRC joins/parts/quit messages etc into single lines. To progressively build up these lines, we stuff references to the individual message divs in an array. We then store this array in an object keyed on a DOM id of the combined div using the jQuery data() method. So each time a new line comes in we look up the existing lines and rewrite the combined div.

I've attached what the HTML for both a combined line, and a single join line look like.
Oh and these combined rows can grow very large in some channels where people join and leave quite frequently without saying anything.
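
For illustration, a sketch of the same grouping done with plain records instead of references to the individual message divs. This is hypothetical and is not the code in renderer.js; the ids, nicks and output format are made up.

```javascript
// Hypothetical sketch: group join/part/quit events per combined row,
// keeping plain records rather than DOM elements.
const groups = new Map(); // combined row id -> array of { nick, type }

function addEvent(combinedId, nick, type) {
  if (!groups.has(combinedId)) groups.set(combinedId, []);
  groups.get(combinedId).push({ nick, type });
  return renderCombined(groups.get(combinedId));
}

// Build the text of the combined line from scratch each time.
function renderCombined(events) {
  const byType = new Map();
  for (const { nick, type } of events) {
    if (!byType.has(type)) byType.set(type, []);
    byType.get(type).push(nick);
  }
  return [...byType.entries()]
    .map(([type, nicks]) => nicks.join(", ") + " " + type)
    .join("; ");
}

addEvent("e127_1", "alice", "joined");
addEvent("e127_1", "bob", "joined");
console.log(addEvent("e127_1", "alice", "left"));
```

With this layout the only DOM node that has to stay alive per group is the combined row itself; whether that matters here depends on what is actually holding the nodes in the CC log.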
James, that sounds like exactly the sort of thing Andrew is seeing in the log.

So what he's seeing is an array of objects; each object has a reference to a div.  Each div has a span with a few thousand kids.  Based on your attachment, the combined div's <span class="message"> looks exactly like such a <span>.

But is it expected to have 3000 or more combined divs around at once?  Comment 50 doesn't sound like it...

I'd really like to see your code that manages this combined div stuff.

Andrew, add to the requests from comment 46 a dump of the class attribute?  ;)
Well if you're in 7 channels and people are joining and leaving a lot you could get a lot of these divs. They probably won't all have thousands of kids though.

The code is in this file:
https://irccloud.com/static/js/app/renderer.js

Look for the groupJoinPart function (defined on line 172), which will get called for each successive join/part line. It's a bit complex but note particularly line 201:

> row.data('rows', rows)

(aside: yes that file is embarrassingly monolithic, I know :) I'm knee deep in a refactoring branch atm, which incidentally is doing this part quite a bit differently and may not suffer from the same issues)
> They probably won't all have thousands of kids though.

Right; the issue here is having 1500 divs with numbers of kids ranging from 100 to 3000 exactly in increments of 2....

Thanks for the link to the code; I'll take a look at that tonight!
Boris: Unfortunately I'm not sure how much of that information I can get.  The JS dump already tries to put the function name in if it can figure it out.  The JS dumping code also doesn't give edge names in opt builds which makes things more confusing to figure out.

As for the DOM stuff, I'm not entirely sure where the ids and class attributes are stored, but I don't think I have that in the CC dump either.  This is literally all of the information I have about the span:

0x7f1c5ed95b00 [rc=2] nsGenericElement (xhtml) span
> 0x7f1d20b109a0 mNodeInfo
> 0x7f1c5ed72500 GetParent()
> 0x7f1c5ed95b80 mAttrsAndChildren[i]

I looked at the mNodeInfo object, but it doesn't seem to have much that looks very interesting.
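
To make concrete how little each record carries, here is a small sketch that parses the excerpt above. It assumes only the format visible in that excerpt (a node line of the form `<addr> [rc=N] <label>` followed by `> <addr> <edge name>` lines); other record shapes in a real CC dump are not handled, and `parseCCExcerpt` is a hypothetical helper name.

```javascript
// Parse node and edge lines in the format of the excerpt above.
// Assumption: a node line is "<addr> [rc=N] <label>" and each following
// "> <addr> <edge name>" line is an outgoing edge of that node.
function parseCCExcerpt(text) {
  const nodes = [];
  let current = null;
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    const node = /^(0x[0-9a-f]+) \[rc=(\d+)\] (.+)$/.exec(line);
    if (node) {
      current = { addr: node[1], refCount: Number(node[2]), label: node[3], edges: [] };
      nodes.push(current);
      continue;
    }
    const edge = /^> (0x[0-9a-f]+) (.+)$/.exec(line);
    if (edge && current) {
      current.edges.push({ to: edge[1], name: edge[2] });
    }
  }
  return nodes;
}

const excerpt = [
  '0x7f1c5ed95b00 [rc=2] nsGenericElement (xhtml) span',
  '> 0x7f1d20b109a0 mNodeInfo',
  '> 0x7f1c5ed72500 GetParent()',
  '> 0x7f1c5ed95b80 mAttrsAndChildren[i]',
].join('\n');

const parsed = parseCCExcerpt(excerpt);
console.log(parsed[0].label);        // nsGenericElement (xhtml) span
console.log(parsed[0].edges.length); // 3
```

The parsed record has an address, a refcount, a label, and three named edges, and nothing that identifies the element's id or class.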
(In reply to Andrew McCreight [:mccr8] from comment #55)
> Boris: Unfortunately I'm not sure how much of that information I can get. 
> The JS dump already tries to put the function name in if it can figure it
> out.  The JS dumping code also doesn't give edge names in opt builds which
> makes things more confusing to figure out.

How hard would it be to fix that?

> As for the DOM stuff, I'm not entirely sure where the ids and class
> attributes are stored, but I don't think I have that in the CC dump either. 

We should fix that too. That's definitely not hard.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #56)
> How hard would it be to fix that?
I think the question would be what the performance impact is.  It would add a very predictable branch every time the JS GC steps along the edge of a graph.  I can ask Bill about that.

Anyways, having the debug info here wouldn't tell us the name of the function.  I don't know anything about the JS engine's function representation to know why it gives up on producing a name in this case.

> We should fix that too. That's definitely not hard.

Okay, I can look into that.
> Boris: Unfortunately I'm not sure how much of that information I can get.

I didn't mean from this log.  This log is what it is, right?  I meant for future.  Logging the id and classes of an element should not be hard, esp if we want to condition it on some sort of env var or something.

As for the JS functions, they could just be anonymous.  Which is annoying, but not unexpected.

I'm not going to finish reading through renderer.js tonight; will pick it up again tomorrow morning.
(In reply to Andrew McCreight [:mccr8] from comment #57)
> I think the question would be what the performance impact is.  It would add
> a very predictable branch every time the JS GC steps along the edge of a
> graph.  I can ask Bill about that.

Now that the GC and the CC use different mark paths, I doubt it would affect performance much at all. We should fix this.
(In reply to Boris Zbarsky (:bz) from comment #58)
> I didn't mean from this log.
Ah, sorry, I misunderstood.  I'll file bugs on that and on the JS_TraceChildren edge information in non-debug builds.
Depends on: 701415
Depends on: 701423
dvander found a page with very similar symptoms.  See bug 701443.
It's possible that the zombie compartment is coming from Adblock Plus (bug 701477), so I'm going to try to reproduce with a clean profile.
I ran irccloud continuously for something like 3 days on a clean profile, no add-ons, and didn't see the bad spike that Asa and Justin saw.  Memory usage did grow slowly and continuously; by the time I gave up, "explicit" was 819MB and "heap-unclassified" was only 24%.  (about:memory output is below.)

I reloaded just before giving up and "explicit" dropped back to 342MB.  Then I closed the tab and the irccloud.com compartment went away, so I don't see the Zombie.  So there's some evidence for the theory that ABP is the problem.



Main Process

Explicit Allocations
819.78 MB (100.0%) -- explicit
├──456.04 MB (55.63%) -- js
│  ├──341.86 MB (41.70%) -- compartment(https://irccloud.com/)
│  │  ├──239.80 MB (29.25%) -- gc-heap
│  │  │  ├──105.75 MB (12.90%) -- objects
│  │  │  │  ├──103.96 MB (12.68%) -- non-function
│  │  │  │  └────1.78 MB (00.22%) -- (1 omitted)
│  │  │  ├───85.17 MB (10.39%) -- strings
│  │  │  ├───37.13 MB (04.53%) -- arena
│  │  │  │   ├──33.49 MB (04.09%) -- unused
│  │  │  │   └───3.63 MB (00.44%) -- (2 omitted)
│  │  │  ├────6.97 MB (00.85%) -- (2 omitted)
│  │  │  └────4.78 MB (00.58%) -- shapes
│  │  │       └──4.78 MB (00.58%) -- (2 omitted)
│  │  ├───60.89 MB (07.43%) -- string-chars
│  │  ├───20.70 MB (02.53%) -- analysis-temporary
│  │  ├────9.31 MB (01.14%) -- object-slots
│  │  ├────6.89 MB (00.84%) -- (4 omitted)
│  │  └────4.27 MB (00.52%) -- type-inference
│  │       └──4.27 MB (00.52%) -- (3 omitted)
│  ├───65.65 MB (08.01%) -- compartment(atoms)
│  │   ├──61.88 MB (07.55%) -- string-chars
│  │   └───3.77 MB (00.46%) -- (1 omitted)
│  ├───14.62 MB (01.78%) -- compartment([System Principal], 0x7f6c3cc25000)
│  │   ├───8.85 MB (01.08%) -- gc-heap
│  │   │   ├──4.72 MB (00.58%) -- (6 omitted)
│  │   │   └──4.13 MB (00.50%) -- objects
│  │   │      └──4.13 MB (00.50%) -- (2 omitted)
│  │   └───5.77 MB (00.70%) -- (8 omitted)
│  ├────9.34 MB (01.14%) -- gc-heap-chunk-dirty-unused
│  ├────8.00 MB (00.98%) -- stack
│  ├────7.92 MB (00.97%) -- gc-heap-decommitted
│  ├────4.32 MB (00.53%) -- runtime
│  │    └──4.32 MB (00.53%) -- (2 omitted)
│  ├────4.28 MB (00.52%) -- gc-heap-chunk-admin
│  └────0.04 MB (00.00%) -- (2 omitted)
├──200.40 MB (24.45%) -- heap-unclassified
├──134.01 MB (16.35%) -- dom
├───16.49 MB (02.01%) -- layout
│   ├──14.80 MB (01.81%) -- shell(https://irccloud.com/)
│   │  ├──14.59 MB (01.78%) -- arenas
│   │  └───0.22 MB (00.03%) -- (1 omitted)
│   └───1.68 MB (00.21%) -- (4 omitted)
├────6.83 MB (00.83%) -- (8 omitted)
└────6.01 MB (00.73%) -- storage
     └──6.01 MB (00.73%) -- sqlite
        └──6.01 MB (00.73%) -- (14 omitted)

Resident Set Size (RSS) Breakdown

Proportional Set Size (PSS) Breakdown

Virtual Size Breakdown

Swap Usage Breakdown

Other Measurements
    0.00 MB -- canvas-2d-pixel-bytes
    0.05 MB -- gfx-surface-image
    0.10 MB -- gfx-surface-xlib
  527.88 MB -- heap-allocated
  575.14 MB -- heap-committed
      8.21% -- heap-committed-unallocated-fraction
    2.16 MB -- heap-dirty
   53.11 MB -- heap-unallocated
          2 -- js-compartments-system
          2 -- js-compartments-user
  274.00 MB -- js-gc-heap
   33.92 MB -- js-gc-heap-arena-unused
    0.00 MB -- js-gc-heap-chunk-clean-unused
    9.34 MB -- js-gc-heap-chunk-dirty-unused
    7.92 MB -- js-gc-heap-decommitted
     18.68% -- js-gc-heap-unused-fraction
   21.60 MB -- js-total-analysis-temporary
    4.32 MB -- js-total-mjit
  119.86 MB -- js-total-objects
    8.92 MB -- js-total-scripts
    9.74 MB -- js-total-shapes
  212.30 MB -- js-total-strings
    7.46 MB -- js-total-type-inference
         17 -- page-faults-hard
 61,382,712 -- page-faults-soft
  873.73 MB -- resident
1,365.76 MB -- vsize
(In reply to Nicholas Nethercote [:njn] from comment #63)
> I reloaded just before giving up and "explicit" dropped back to 342MB.  Then
> I closed the tab and the irccloud.com compartment went away, so I don't see
> the Zombie.  So there's some evidence for the theory that ABP is the problem.

I don't run ABP. So maybe mine is a different issue?
(In reply to Asa Dotzler [:asa] from comment #64)
> 
> I don't run ABP. So maybe mine is a different issue?

Oh, then ABP probably isn't causing the spike.
http://people.mozilla.org/~jlebar/cc-edges-2.xz

This is a cc dump from a browser with adblock plus disabled where I've been running IRC Cloud all day.  about:memory looks normal, and I'm seeing ~130ms CC times.
Here's the 5 most common nodes from Justin's new log:

  281325 nsGenericDOMDataNode
  152395 nsGenericElement (xhtml) span
   66320 nsGenericElement (xhtml) a
   17470 nsGenericElement (xhtml) div
    8138 nsGenericElement (xhtml) li

More spans, oddly enough.  Way less <a>s.
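
A tally like the one above can be sketched as follows. This is an illustration, not the script that produced those numbers: it assumes node lines look like `<addr> [rc=N] <label>` (as in the span excerpt earlier in this bug) and skips everything else. `topLabels` and the sample addresses are hypothetical.

```javascript
// Count node labels; assumes node lines look like "<addr> [rc=N] <label>".
function topLabels(lines, limit) {
  const counts = new Map();
  for (const line of lines) {
    const m = /^0x[0-9a-f]+ \[[^\]]*\] (.+)$/.exec(line.trim());
    if (!m) continue; // skip edge lines and anything else
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  // Most frequent labels first.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

const sample = [
  '0x1 [rc=1] nsGenericDOMDataNode',
  '> 0x2 GetParent()',
  '0x2 [rc=3] nsGenericElement (xhtml) span',
  '0x3 [rc=1] nsGenericDOMDataNode',
  '0x4 [rc=1] nsGenericDOMDataNode',
  '0x5 [rc=2] nsGenericElement (xhtml) span',
  '0x6 [rc=1] nsGenericElement (xhtml) a',
];
const top = topLabels(sample, 2);
console.log(top[0][0], top[0][1]); // nsGenericDOMDataNode 3
console.log(top[1][0], top[1][1]); // nsGenericElement (xhtml) span 2
```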
(In reply to Asa Dotzler [:asa] from comment #64)
> (In reply to Nicholas Nethercote [:njn] from comment #63)
> > I reloaded just before giving up and "explicit" dropped back to 342MB.  Then
> > I closed the tab and the irccloud.com compartment went away, so I don't see
> > the Zombie.  So there's some evidence for the theory that ABP is the problem.
> 
> I don't run ABP. So maybe mine is a different issue?

But you run some extensions, right?  Which ones?  It's possible that one of them is exhibiting the same bug as ABP here.

I've been running IRCCloud all night without ABP and I'm seeing CC times at 250ms.  Which is pretty reasonable, considering that I have 140 MB reported in the DOM and another 192 MB (28%) heap-unclassified.

Memory usage does seem to be increasing, so it's still possible that I'll reach this blowup state.  But it seems unlikely, as heap-unclassified isn't increasing quickly enough.
(My last comment might not have sent out e-mails, so...see comment 68 if you didn't get mail.)
It's possible the problem only occurs when you're in a certain combination of channels and have selected them at least once. Just to reiterate comment 17 :) Can people describe the steps used to set IRCCloud running, both those who are seeing the problem and those who aren't? E.g. which channels, which one is in the foreground, are you actively using it or just leaving it alone in a tab?
Channels:
  IRCCloud #feedback, #foo
  Mozilla #b2g #build #content #developers #memshrink #mobile #planning, plus one private message window
  freenode ##kernel ##linux #debian #freenode #gentoo #ubuntu, plus a nickserv PM

I've been using it, but mostly in #b2g, #developers, and #memshrink.  I've periodically gone through and selected each channel.

The IRCCloud UI is getting slow, but CC times are still reasonable.  (That is, so far, I only see the bugs on your end.  :)
(In reply to Justin Lebar [:jlebar] from comment #68)
> (In reply to Asa Dotzler [:asa] from comment #64)
> > (In reply to Nicholas Nethercote [:njn] from comment #63)
> > > I reloaded just before giving up and "explicit" dropped back to 342MB.  Then
> > > I closed the tab and the irccloud.com compartment went away, so I don't see
> > > the Zombie.  So there's some evidence for the theory that ABP is the problem.
> > 
> > I don't run ABP. So maybe mine is a different issue?
> 
> But you run some extensions, right?  Which ones?  It's possible that one of
> them is exhibiting the same bug as ABP here.

No. I was running nightly tester tools 3.1.5.1 but I uninstalled that and continued to see the problem. I run no other extensions.
This morning I woke up to 

2,125.56 MB (100.0%) -- explicit
├──1,959.13 MB (92.17%) -- heap-unclassified
├────107.61 MB (05.06%) -- js
│    ├───38.27 MB (01.80%) -- (15 omitted)
│    ├───33.24 MB (01.56%) -- compartment(https://twitter.com/mozillapeople/#/awesome)
│    │   ├──18.31 MB (00.86%) -- gc-heap
│    │   │  └──18.31 MB (00.86%) -- (6 omitted)
│    │   └──14.93 MB (00.70%) -- (8 omitted)
│    ├───12.57 MB (00.59%) -- compartment(https://www.yammer.com/mozilla.com/?m=568992015)
│    │   └──12.57 MB (00.59%) -- (9 omitted)
│    ├───12.23 MB (00.58%) -- compartment(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced#1)
│    │   └──12.23 MB (00.58%) -- (8 omitted)
│    └───11.29 MB (00.53%) -- compartment([System Principal], 0x6c68000)
│        └──11.29 MB (00.53%) -- (9 omitted)
├─────17.94 MB (00.84%) -- layout
│     └──17.94 MB (00.84%) -- (15 omitted)
├─────14.80 MB (00.70%) -- dom
├─────13.24 MB (00.62%) -- (8 omitted)
└─────12.85 MB (00.60%) -- storage
      └──12.85 MB (00.60%) -- sqlite
         └──12.85 MB (00.60%) -- (13 omitted)

which doesn't even show a compartment for IRC Cloud (which is alive and working and I can comment in and see new chat activity.)  When I reload IRC Cloud, I get this: 

243.13 MB (100.0%) -- explicit
├──139.16 MB (57.24%) -- js
│  ├───44.74 MB (18.40%) -- compartment(https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23ux)
│  │   ├──25.31 MB (10.41%) -- gc-heap
│  │   │  ├──17.15 MB (07.06%) -- objects
│  │   │  │  ├──15.85 MB (06.52%) -- non-function
│  │   │  │  └───1.30 MB (00.54%) -- function
│  │   │  ├───5.92 MB (02.44%) -- strings
│  │   │  ├───1.31 MB (00.54%) -- shapes
│  │   │  │   └──1.31 MB (00.54%) -- (2 omitted)
│  │   │  └───0.92 MB (00.38%) -- (3 omitted)
│  │   ├───6.70 MB (02.76%) -- string-chars
│  │   ├───4.93 MB (02.03%) -- analysis-temporary
│  │   ├───3.69 MB (01.52%) -- mjit-code
│  │   │   ├──3.59 MB (01.48%) -- method
│  │   │   └──0.09 MB (00.04%) -- (2 omitted)
│  │   ├───2.15 MB (00.88%) -- object-slots
│  │   └───1.96 MB (00.81%) -- (4 omitted)
│  ├───32.62 MB (13.41%) -- compartment(https://twitter.com/mozillapeople/#/awesome)
│  │   ├──18.30 MB (07.53%) -- gc-heap
│  │   │  ├───7.16 MB (02.95%) -- objects
│  │   │  │   ├──4.03 MB (01.66%) -- non-function
│  │   │  │   └──3.13 MB (01.29%) -- function
│  │   │  ├───5.07 MB (02.09%) -- arena
│  │   │  │   ├──4.91 MB (02.02%) -- unused
│  │   │  │   └──0.16 MB (00.07%) -- (2 omitted)
│  │   │  ├───2.75 MB (01.13%) -- shapes
│  │   │  │   ├──1.55 MB (00.64%) -- tree
│  │   │  │   └──1.20 MB (00.49%) -- (1 omitted)
│  │   │  ├───1.67 MB (00.69%) -- scripts
│  │   │  └───1.65 MB (00.68%) -- (2 omitted)
│  │   ├───3.83 MB (01.58%) -- object-slots
│  │   ├───2.75 MB (01.13%) -- string-chars
│  │   ├───2.46 MB (01.01%) -- shapes-extra
│  │   │   ├──1.97 MB (00.81%) -- tree-tables
│  │   │   └──0.49 MB (00.20%) -- (3 omitted)
│  │   ├───2.29 MB (00.94%) -- script-data
│  │   ├───1.66 MB (00.68%) -- analysis-temporary
│  │   └───1.32 MB (00.54%) -- (3 omitted)
│  ├───12.85 MB (05.29%) -- compartment(https://www.yammer.com/mozilla.com/?m=568992015)
│  │   ├───8.73 MB (03.59%) -- gc-heap
│  │   │   ├──3.37 MB (01.39%) -- arena
│  │   │   │  ├──3.30 MB (01.36%) -- unused
│  │   │   │  └──0.07 MB (00.03%) -- (2 omitted)
│  │   │   ├──2.27 MB (00.94%) -- objects
│  │   │   │  ├──1.26 MB (00.52%) -- non-function
│  │   │   │  └──1.01 MB (00.42%) -- (1 omitted)
│  │   │   ├──1.87 MB (00.77%) -- shapes
│  │   │   │  └──1.87 MB (00.77%) -- (2 omitted)
│  │   │   └──1.22 MB (00.50%) -- (3 omitted)
│  │   ├───2.84 MB (01.17%) -- (7 omitted)
│  │   └───1.28 MB (00.53%) -- script-data
│  ├───12.02 MB (04.94%) -- compartment(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced#1)
│  │   ├───6.74 MB (02.77%) -- gc-heap
│  │   │   ├──2.50 MB (01.03%) -- (4 omitted)
│  │   │   ├──2.46 MB (01.01%) -- objects
│  │   │   │  ├──1.63 MB (00.67%) -- non-function
│  │   │   │  └──0.83 MB (00.34%) -- (1 omitted)
│  │   │   └──1.78 MB (00.73%) -- shapes
│  │   │      ├──1.35 MB (00.56%) -- tree
│  │   │      └──0.43 MB (00.18%) -- (1 omitted)
│  │   ├───2.73 MB (01.12%) -- (6 omitted)
│  │   └───2.54 MB (01.05%) -- script-data
│  ├───11.71 MB (04.82%) -- compartment([System Principal], 0x6c68000)
│  │   ├───7.38 MB (03.04%) -- gc-heap
│  │   │   ├──3.48 MB (01.43%) -- objects
│  │   │   │  ├──2.74 MB (01.13%) -- function
│  │   │   │  └──0.74 MB (00.30%) -- (1 omitted)
│  │   │   ├──2.42 MB (00.99%) -- shapes
│  │   │   │  ├──2.11 MB (00.87%) -- tree
│  │   │   │  └──0.31 MB (00.13%) -- (1 omitted)
│  │   │   └──1.49 MB (00.61%) -- (5 omitted)
│  │   ├───2.85 MB (01.17%) -- (7 omitted)
│  │   └───1.49 MB (00.61%) -- script-data
│  ├────6.43 MB (02.65%) -- compartment(atoms)
│  │    ├──4.51 MB (01.86%) -- string-chars
│  │    └──1.92 MB (00.79%) -- gc-heap
│  │       ├──1.86 MB (00.77%) -- strings
│  │       └──0.05 MB (00.02%) -- (1 omitted)
│  ├────5.00 MB (02.06%) -- compartment(https://bugzilla.mozilla.org/show_bug.cgi?id=700645#c68)
│  │    ├──3.37 MB (01.39%) -- gc-heap
│  │    │  └──3.37 MB (01.39%) -- (6 omitted)
│  │    └──1.63 MB (00.67%) -- (7 omitted)
│  ├────4.85 MB (01.99%) -- (8 omitted)
│  ├────3.57 MB (01.47%) -- compartment(https://etherpad.mozilla.org/StabilityProgram)
│  │    ├──1.82 MB (00.75%) -- gc-heap
│  │    │  └──1.82 MB (00.75%) -- (6 omitted)
│  │    └──1.75 MB (00.72%) -- (8 omitted)
│  ├────2.42 MB (01.00%) -- gc-heap-decommitted
│  ├────1.71 MB (00.70%) -- compartment(http://wilksnet.com/2011/11/10/switching-from-android-to-ios/)
│  │    └──1.71 MB (00.70%) -- (8 omitted)
│  └────1.25 MB (00.51%) -- gc-heap-chunk-admin
├───53.96 MB (22.20%) -- heap-unclassified
├───16.16 MB (06.65%) -- layout
│   ├───6.24 MB (02.57%) -- shell(https://twitter.com/mozillapeople/#/awesome)
│   │   ├──5.96 MB (02.45%) -- arenas
│   │   └──0.28 MB (00.12%) -- (2 omitted)
│   ├───5.64 MB (02.32%) -- (11 omitted)
│   ├───1.50 MB (00.62%) -- shell(https://mail.mozilla.com/zimbra/?app=calendar&client=advanced#1)
│   │   └──1.50 MB (00.62%) -- (2 omitted)
│   ├───1.48 MB (00.61%) -- shell(https://www.yammer.com/mozilla.com/)
│   │   └──1.48 MB (00.61%) -- (3 omitted)
│   └───1.31 MB (00.54%) -- shell(https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23developers)
│       └──1.31 MB (00.54%) -- (3 omitted)
├───13.20 MB (05.43%) -- storage
│   └──13.20 MB (05.43%) -- sqlite
│      ├───7.00 MB (02.88%) -- places.sqlite
│      │   ├──6.69 MB (02.75%) -- cache-used [3]
│      │   └──0.31 MB (00.13%) -- (2 omitted)
│      ├───2.76 MB (01.13%) -- (10 omitted)
│      ├───2.05 MB (00.85%) -- other
│      └───1.39 MB (00.57%) -- cookies.sqlite
│          ├──1.38 MB (00.57%) -- cache-used
│          └──0.01 MB (00.01%) -- (2 omitted)
├───11.93 MB (04.91%) -- dom
├────3.56 MB (01.47%) -- images
│    ├──3.30 MB (01.36%) -- content
│    │  ├──3.30 MB (01.36%) -- used
│    │  │  ├──2.25 MB (00.92%) -- raw
│    │  │  └──1.05 MB (00.43%) -- (2 omitted)
│    │  └──0.00 MB (00.00%) -- (1 omitted)
│    └──0.27 MB (00.11%) -- (1 omitted)
├────2.94 MB (01.21%) -- (6 omitted)
└────2.21 MB (00.91%) -- spell-check
I don't know why your heap-unclassified has blown up while mine hasn't (although mine is slowly creeping up there; it's at 30% now), but the fact that reloading the page makes all the memory go away means you probably don't have the zombie compartment I had.

The high heap-unclassified may be covered by other outstanding bugs, and the slow cc times may be unavoidable when there are so many DOM nodes sitting around...
> which doesn't even show a compartment for IRC Cloud

Because you're not looking at the verbose "show everything" version and the JS usage for irc cloud may well be so small in that case it's getting lumped into the "omitted" bucket.
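
A sketch of how that lumping can hide a compartment. The 0.5% cutoff is an assumption for illustration (the smallest entries listed in the dumps in this bug sit near that value, but the exact rule isn't stated here), and the compartment sizes other than the 2,125.56 MB total are made up. `collapseSmall` is a hypothetical name, not about:memory's code.

```javascript
// Collapse entries below a cutoff fraction of the total into "(N omitted)".
// The 0.5% cutoff is an assumption for illustration.
function collapseSmall(entries, total, cutoff = 0.005) {
  const shown = [];
  let omittedCount = 0;
  let omittedMB = 0;
  for (const e of entries) {
    if (e.mb / total >= cutoff) {
      shown.push(e);
    } else {
      omittedCount += 1;
      omittedMB += e.mb;
    }
  }
  if (omittedCount > 0) {
    shown.push({ name: `(${omittedCount} omitted)`, mb: omittedMB });
  }
  return shown;
}

// Hypothetical numbers: an 8 MB compartment in a 2,125.56 MB total is ~0.38%.
const view = collapseSmall(
  [
    { name: 'compartment(twitter)', mb: 33.24 },
    { name: 'compartment(irccloud)', mb: 8.0 },
    { name: 'compartment(other)', mb: 3.0 },
  ],
  2125.56
);
console.log(view.map((e) => e.name)); // [ 'compartment(twitter)', '(2 omitted)' ]
```

With a total this large, a compartment of several megabytes can fall under the cutoff and disappear from the non-verbose view.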

Given your reload results, it's pretty clear that the real issue is that the irc cloud script ends up holding on to a ton (as in, millions) of nodes that are not in the DOM.  The only question is whether that's due to a bug in our code or their code...  I still plan to look at the script linked in comment 53, but if someone else wants to look too, that would be great.
I'm seeing this one too on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111118 Firefox/11.0a1 ID:20111118031005

FWIW Nightly gives back most of the memory if I refresh (F5) IRCcloud's tab.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][snappy]
One of the DOM elements (the one in the second attachment I gave above) is actually part of an IRCcloud document.  Smaug says it could be a data document and not the currently displayed document.  I'm currently working on an analysis that traces out the parent pointers to clump together DOM nodes (using my favorite algorithm, union-find).  Currently it looks like there are actually a bunch of separate DOM nodes, so I'll have to see if they are in documents or not.  I suspect they are not.
Whiteboard: [MemShrink:P1][snappy] → [MemShrink:P1]
Taras, why doesn't this bug fall under the purview of snappy?
(In reply to Justin Lebar [:jlebar] from comment #78)
> Taras, Why doesn't this bug fall under the purview of snappy?

Because memshrink people are on it. If you disagree, feel free to renominate it.
So long as bugs like this would normally qualify as [snappy], I'm happy.
With the cycle collector, high memory usage and poor responsiveness are often related.
(In reply to Andrew McCreight [:mccr8] from comment #81)
> With the cycle collector, high memory usage and poor responsiveness are
> often related.

That is true but I explicitly spun this bug off from the "long CC" times bug I filed because we wanted to tackle it independently and it might end up being fixed by the site.
(In reply to Taras Glek (:taras) from comment #79)
> (In reply to Justin Lebar [:jlebar] from comment #78)
> > Taras, Why doesn't this bug fall under the purview of snappy?
> 
> Because memshrink people are on it. If you disagree, feel free to renominate
> it.

I'm happy for a bug to be tagged with both.  Having said that, with the creation of project Snappy I wonder if MemShrink's scope should be clarified/reduced to purely be about reducing memory consumption (and things that support that goal: about:memory, regression detection infrastructure, etc.)  In which case things like long GC/CC pauses would not be MemShrink bugs but Snappy bugs.

This bug would be Snappy for the CC pauses and MemShrink for the high heap-unclassified -- double-tagging bugs is fine.  But it's probably not worth overthinking this too much.
Armed with more knowledge about how the DOM works, I wrote another analysis that breaks DOM nodes in clumps by following GetParent() chains: two nodes will be in the same clump if following their parent chains ends up at the same node.

The first few aren't surprising.  There are 144k nodes associated with the document https://irccloud.com/#!/ircs://irc.mozilla.org:6697/%23developers, then there are 9000 nodes that are children of a bugzilla document.

Then there are 5085 nodes in a clump that is not the child of any document.  The top node is a <ul>.  This clump is from the IRCcloud document, but not owned by it.

Next down the line is a clump of 4458 nodes owned by the document chrome://global/content/console.xul.

Then things get a little weird.  There's a clump of 4439 topped by a <div> not owned by a document, again part of the IRCcloud document.  Then there are clumps of size 4436, 4433, 4430, 4427, etc., down to 155.  This would seem to account for most of the nodes in the graph (though my numbers aren't consistent with my other analysis so maybe I'm off here...).  This roughly corresponds to the large spans I found before.  So, the webpage is creating a ton of large-ish disconnected DOMs, which as we have seen in bug 702813, the cycle collector is not good at handling, and are not accounted for in the current DOM memory reporter.
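
The clumping described in this comment can be sketched with a small union-find. This is an illustration under assumptions, not the analysis script itself: `clumpSizes` and `parentEdges` are hypothetical names, and the input is assumed to be a list of [child, parent] address pairs taken from the GetParent() edges in a CC log.

```javascript
// Union-find over node addresses; two nodes end up in the same clump when
// their GetParent() chains meet.
class UnionFind {
  constructor() {
    this.parent = new Map();
  }
  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path straight at the root.
    while (this.parent.get(x) !== root) {
      const next = this.parent.get(x);
      this.parent.set(x, root);
      x = next;
    }
    return root;
  }
  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// `parentEdges` is a hypothetical list of [child, parent] address pairs
// taken from the GetParent() edges in a CC log.
function clumpSizes(allNodes, parentEdges) {
  const uf = new UnionFind();
  for (const n of allNodes) uf.find(n);
  for (const [child, parent] of parentEdges) uf.union(child, parent);
  const sizes = new Map();
  for (const n of allNodes) {
    const root = uf.find(n);
    sizes.set(root, (sizes.get(root) || 0) + 1);
  }
  // Largest clumps first.
  return [...sizes.values()].sort((a, b) => b - a);
}

const nodes = ['doc', 'ul', 'li1', 'li2', 'div', 'span'];
const edges = [['ul', 'doc'], ['li1', 'ul'], ['li2', 'ul'], ['span', 'div']];
console.log(clumpSizes(nodes, edges)); // [ 4, 2 ]
```

In the toy input, the `div` and its `span` form a clump of their own because no parent chain connects them to `doc`, which is the shape of the disconnected subtrees reported above.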
smaug pointed out that it is a little weird that there are DOMs that are part of documents that are being traversed by the cycle collector, and he hypothesized that they may be part of a data DOM (I think that's what they are called).  He landed bug 703654 so we can look into that in the future.  But from my analysis here, the connected DOMs are "only" a few hundred thousand of the nodes, so that isn't really causing the problem alone.
Timeless is also experiencing this issue.  It shows up consistently after < 24 hours, as he updates to the new Nightly each day.  Huge heap-unclassified, large memory usage, slow browser.  He's started running a separate Firefox instance so that it doesn't blow up his browser session.  It is a separate profile, so it shouldn't be too bad.  Timeless gave me an invite, so I'll try to see if I can reproduce it.
To reiterate comment 10: if anyone else needs an invite I am more than happy to hand them out. I just need an email address.
(In reply to Olli Pettay [:smaug] from comment #88)
> Anyone who can see this, want to try
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/opettay@mozilla.
> com-fbae0becadca/ ?

I'm on it. Will report back when I learn anything.
(In reply to Olli Pettay [:smaug] from comment #88)
> Anyone who can see this, want to try
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/opettay@mozilla.
> com-fbae0becadca/ ?
Note, the build does not reduce memory usage (which can be caused by the page itself), but
may reduce the time it takes to run CC.
I managed to reproduce this locally.  When I just subbed to a bunch of Mozilla channels I didn't hit it, but adding in the whatwg and mercurial channels seemed to trigger it this morning.  Though maybe I added them yesterday and it didn't really hit then either.  Anyways, memory was steadily cranking up to about 3 gigs.  With smaug's patch, I was getting some long pauses (1.2 seconds or so) intermixed with shorter pauses (around 300ms).  The browser was surprisingly usable given that, but I haven't tried to recreate it in an unmodified browser.
OS: Windows 7 → All
Hardware: x86_64 → All
Something has changed in IRCCloud, our code, or my usage, but I have not been able to reproduce this heap-unclassified blow-up for about four or five days now.
I think Timeless was still hitting it within the last few days.

I've been stymied trying to put together some builds with additional measurement stuff and haven't tried to recreate it since comment 91.
Has anyone tried http://timtaubert.de/2011/12/firefox-add-on-websockets-for-irccloud/ to see if it changes
the behavior? Maybe the leak is caused by the Flash part of IRCCloud.
Could anyone test this:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/opettay@mozilla.com-1c7954136907/

Would be interesting to know CC times and whether purple cleanup takes lots of time.
(In reply to Olli Pettay [:smaug] from comment #94)
> Has anyone tried
> http://timtaubert.de/2011/12/firefox-add-on-websockets-for-irccloud/

Doesn't make any difference (with a default trunk build). Minutes after startup I managed to get up to ~500ms CC times. After closing the IRCCloud tab I got this:

CC(T+4499.7) collected: 202596 (202596 waiting for GC), suspected: 180, duration: 822 ms.

Now I'm back to average times of 20-70ms.
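
For anyone comparing CC times across runs, a line like the one quoted above can be pulled apart with a regular expression. This sketch assumes exactly the format of that one line; other builds may print it differently, and `parseCCLine` is a hypothetical helper name.

```javascript
// Pull the numbers out of a CC timing line in the format quoted above.
function parseCCLine(line) {
  const m = /^CC\(T\+([\d.]+)\) collected: (\d+) \((\d+) waiting for GC\), suspected: (\d+), duration: (\d+) ms\.$/.exec(line.trim());
  if (!m) return null; // not a line in the assumed format
  return {
    timestamp: Number(m[1]),
    collected: Number(m[2]),
    waitingForGC: Number(m[3]),
    suspected: Number(m[4]),
    durationMs: Number(m[5]),
  };
}

const cc = parseCCLine('CC(T+4499.7) collected: 202596 (202596 waiting for GC), suspected: 180, duration: 822 ms.');
console.log(cc.collected);  // 202596
console.log(cc.durationMs); // 822
```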
As in the other bug, this seems to be a case of a leaky webapp that the CC does not handle well, rather than a problem with the browser per se.  The work to make about:memory see this kind of memory usage has been spun off into bug 704623.
Whiteboard: [MemShrink:P1] → [Snappy]
Cycle collector improvements landed in Nightly, so maybe pause times will not be as bad.  On the other hand, you'll still run out of memory...
No reports here for a couple of months, so I'm going to close this as works-for-me.  Please reopen if you are still having problems.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME