nsDocument leak due to nsTypeAheadFind

RESOLVED FIXED

People

(Reporter: roc, Assigned: smaug)

Version: unspecified
Platform: x86, Windows 7
Keywords: memory-leak
Tracking flags: blocking2.0 betaN+
Attachments: 1

I noticed that my Firefox process' "private working set" had ballooned to 1.8GB so I decided to investigate. about:memory showed 1.8GB in "win32/private bytes". None of the categories in about:memory showed much memory usage (but I don't have jemalloc compiled in here).

I closed a few tabs, including some large tinderbox logs and a tbpl/try page that had been open for a day or two. win32/private bytes dropped to around 1GB. At this point I mainly just had a long-running GMail session open. Here's the paste of about:memory at that point:

win32/privatebytes                    1,009,246,208
win32/workingset                      1,003,237,376
xpconnect/js/gcchunks                    65,011,712
storage/sqlite/pagecache                 39,165,392
storage/sqlite/other                      1,916,232
gfx/d2d/surfacecache                      2,307,752
gfx/d2d/surfacevram                      11,647,148
images/chrome/used/raw                            0
images/chrome/used/uncompressed             338,648
images/chrome/unused/raw                          0
images/chrome/unused/uncompressed             8,384
images/content/used/raw                      71,001
images/content/used/uncompressed            786,292
images/content/unused/raw                     8,803
images/content/unused/uncompressed           13,936
layout/all                                3,742,100
layout/bidi                                     664
gfx/surface/image                         1,445,068
gfx/surface/win32                                 0
content/canvas/2d_pixel_bytes             5,525,392

I dumped the CC graph at this point and uploaded it to http://people.mozilla.org/~roc/leak/cc-edges-3.log.bz2

I then closed the GMail tab and waited for a while. At this point I had just a blank tab and the about:memory tab open. Here's about:memory:

win32/privatebytes                      863,457,280
win32/workingset                        868,630,528
xpconnect/js/gcchunks                    65,011,712
storage/sqlite/pagecache                 39,169,624
storage/sqlite/other                      1,916,232
gfx/d2d/surfacecache                      1,657,404
gfx/d2d/surfacevram                       9,848,352
images/chrome/used/raw                            0
images/chrome/used/uncompressed             411,620
images/chrome/unused/raw                          0
images/chrome/unused/uncompressed                 0
images/content/used/raw                      34,080
images/content/used/uncompressed              8,736
images/content/unused/raw                         0
images/content/unused/uncompressed                0
layout/all                                1,706,955
layout/bidi                                       0
gfx/surface/image                           862,248
gfx/surface/win32                                 0
content/canvas/2d_pixel_bytes               713,664
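
To put a number on "none of the categories showed much memory usage", here is a rough sketch that adds up the non-zero categories from this second paste and compares the total with win32/privatebytes. It naively assumes the categories do not overlap and that all of them count toward private bytes, which may well not hold (for example for gfx/d2d/surfacevram), so treat the result as an order-of-magnitude estimate only.

```python
# Values in bytes, copied from the second about:memory paste in this report.
private_bytes = 863_457_280

categories = {
    "xpconnect/js/gcchunks": 65_011_712,
    "storage/sqlite/pagecache": 39_169_624,
    "storage/sqlite/other": 1_916_232,
    "gfx/d2d/surfacecache": 1_657_404,
    "gfx/d2d/surfacevram": 9_848_352,
    "images/chrome/used/uncompressed": 411_620,
    "images/content/used/raw": 34_080,
    "images/content/used/uncompressed": 8_736,
    "layout/all": 1_706_955,
    "gfx/surface/image": 862_248,
    "content/canvas/2d_pixel_bytes": 713_664,
}

# Naive assumption: categories are disjoint and all live in private bytes.
accounted = sum(categories.values())
unaccounted = private_bytes - accounted

print(f"accounted:   {accounted:>13,}")
print(f"unaccounted: {unaccounted:>13,} ({unaccounted / private_bytes:.0%})")
```

Under those assumptions the bulk of the private bytes is not attributed to any reporter, which is why the CC graph and the heap dump are the more useful artifacts here.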

I dumped the CC graph at this point and uploaded it to http://people.mozilla.org/~roc/leak/cc-edges-4.log.bz2

I also downloaded vmmap.exe and grabbed its output, and made a full heap dump of the firefox process, so I can make those available if needed.

Is this enough information to do something with?
blocking2.0: --- → ?
Peterv, what other info would be helpful here? I do think we need to get more of a handle on the leak situation before we ship, since we've certainly regressed leak-wise since 3.6, but I don't think it's known yet exactly which bug or bugs are responsible for the bulk of the new leaks...
Assignee: nobody → peterv
blocking2.0: ? → betaN+
(In reply to comment #0)
> At this point I mainly just had a long-running GMail session open.

I suppose the document that's present in the second log with "https://mail.google.com/mail/?ui=2&shva=1#label/www-" as the partial url is from that session. However, most other documents from gmail seem to have gone away since they're not in the second log. It *could* mean that they weren't suspected in the second CC, but we certainly broke some edges to them so that'd be weird.

> I then closed the GMail tab and waited for a while. At this point I had just a
> blank tab and the about:memory tab open.

At that point it looks like there are a number of documents alive from bugzilla and about:blank (apart from a bunch of chrome stuff and the gmail document mentioned above).

It certainly looks like we did collect a bunch of stuff (first log is 44M, second log is 16.5M), but I'll look a bit into the remaining documents.
The bugzilla documents look like they're all alive because we're missing an edge; the two known edges are part of cycles through a parser and an event listener. No idea what the missing edge would be :-(.
I have the memory dump. How can we analyze it to identify the missing edges?
I'm not sure, I've never done that. Usually we use refcount logging.

Can we load the dump in a debugger?
Yes, I think so.

Maybe I can search the dump for pointers to the object we have missing edges to. Can you give me some addresses to search for?
0x253e67b0 nsDocument (xhtml) https://bugzilla.mozilla.org/buglist.cgi?cmdtype=run [2/3]
0x4162ee38 nsDocument (xhtml) https://bugzilla.mozilla.org/show_bug.cgi?id=612128 [2/3]
0x30c272e0 nsDocument (xhtml) https://bugzilla.mozilla.org/show_bug.cgi?id=605618 [2/3]

Known edges are XPCWrappedNative.mIdentity and nsContentSink.mDocument.
If you want me to search I could do that too, but I'll need to download the dump from somewhere.
There are 30 occurrences of 0x253e67b0, 28 occurrences of 0x4162ee38 and 32 occurrences of 0x30c272e0. I guess we have some non-addrefing references around.
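
For anyone who wants to repeat the occurrence count outside a debugger, here is a minimal sketch of the idea: scan a raw byte buffer for a 32-bit little-endian pointer value. The function name `find_pointer` and the alignment filter are my own choices for illustration. The offsets it returns are offsets into the buffer, not virtual addresses; mapping them back to addresses in a real dump needs the dump's memory-range table, which is not handled here.

```python
import struct

def find_pointer(buf: bytes, address: int, align: int = 4):
    """Return buffer offsets where `address` is stored as a 32-bit LE value.

    Only offsets that are multiples of `align` are reported, on the
    assumption that real pointer fields are naturally aligned.
    """
    needle = struct.pack("<I", address)
    hits = []
    pos = buf.find(needle)
    while pos != -1:
        if pos % align == 0:
            hits.append(pos)
        pos = buf.find(needle, pos + 1)
    return hits

# Synthetic buffer: the value at offsets 8 and 20 (aligned) and 13 (unaligned).
needle = struct.pack("<I", 0x253E67B0)
sample = bytes(8) + needle + bytes(1) + needle + bytes(3) + needle

aligned_hits = find_pointer(sample, 0x253E67B0)
all_hits = find_pointer(sample, 0x253E67B0, align=1)
print(aligned_hits, all_hits)
```

A hit only says that the bit pattern is present; each one still has to be inspected in a debugger to see which object owns the field.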
OK, searching for references to 0x253e67b0 (WinDbg can search debuggee memory, hooray!), I found an nsRange at 0x3e7a69e8 which has 0x253e67b0 as its mRoot. The nsRange has a refcount of 3. This would seem to be your missing reference.

nsRange participates in cycle collection and tries to traverse mRoot. So why wouldn't this show up in your cycle collector dump? That address doesn't seem to be listed at all.

If it helps, mStartParent and mEndParent are both 0x28b15b18.
Interestingly if you look at 28B15B18 in the dump it's
28B15B18 [root] [1/3]
which adds to the evidence that the nsRange was not traversed.

I considered the possibility that the nsRange is leftover from being destroyed, but I guess its refcount field should not be 3 in that case. The entire object looks completely OK (offset fields are both 7, boolean flag fields are 0 or 1).
Are you going to need to know what's keeping the nsRange alive?
(In reply to comment #10)
> nsRange participates in cycle collection and tries to traverse mRoot. So why
> wouldn't this show up in your cycle collector dump? That address doesn't seem
> to be listed at all.

Because we're probably missing a part of the cycle that goes from the document to the range. Unfortunately we'll probably have to figure out multiple edges to have a picture of the whole cycle.
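
A toy model may help show why one missing edge keeps everything alive. This is only a sketch of the general idea, not the real cycle collector: it reads the "[2/3]" entries earlier in the thread as "2 of 3 references explained by edges in the graph", treats any node with unexplained references as externally held, and keeps everything reachable from such nodes. The node names are illustrative; "finder" and the doc-to-finder edge are hypothetical stand-ins for whatever closes the cycle.

```python
def unexplained_refs(refcounts, edges):
    """Map node -> number of references not explained by known edges."""
    known = {node: 0 for node in refcounts}
    for _src, dst in edges:
        if dst in known:
            known[dst] += 1
    return {n: refcounts[n] - known[n]
            for n in refcounts if refcounts[n] > known[n]}

def collectable(refcounts, edges):
    """Nodes not reachable from any node that has unexplained references."""
    live = set(unexplained_refs(refcounts, edges))
    worklist = list(live)
    while worklist:
        node = worklist.pop()
        for src, dst in edges:
            if src == node and dst in refcounts and dst not in live:
                live.add(dst)
                worklist.append(dst)
    return set(refcounts) - live

# The graph as logged: the document has refcount 3 but only two known edges.
partial_counts = {"doc": 3, "wrapper": 1, "sink": 1}
partial_edges = [("wrapper", "doc"), ("sink", "doc"),
                 ("doc", "wrapper"), ("doc", "sink")]

# The same graph if the range and its (hypothetical) holder were traversed.
full_counts = dict(partial_counts, range=1, finder=1)
full_edges = partial_edges + [("range", "doc"), ("finder", "range"),
                              ("doc", "finder")]

print(unexplained_refs(partial_counts, partial_edges))
print(collectable(partial_counts, partial_edges))
print(sorted(collectable(full_counts, full_edges)))
```

In the partial graph the document keeps one unexplained reference, so nothing in its cycle can be freed; once every holder along the chain is visible to the traversal, the whole group becomes collectable.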
(In reply to comment #10)
> The nsRange has a refcount of 3.

I misinterpreted nsCycleCollectingAutoRefCnt, the refcount is 1.
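
For reference, a sketch of how a raw field value of 3 can mean a refcount of 1. This assumes the field is a tagged word that stores (refcount << 1) | 1 when the low bit is set and something else (such as a pointer) when it is clear. That encoding is an assumption on my part and is not stated in this thread, so check it against the nsCycleCollectingAutoRefCnt definition in the tree you are debugging.

```python
def decode_tagged_refcnt(tagged: int):
    """Decode a raw tagged refcount word under the ASSUMED encoding.

    Low bit set   -> the remaining bits hold the refcount.
    Low bit clear -> not an inline refcount; return None.
    """
    if tagged & 1:
        return tagged >> 1
    return None

raw_value = 3  # what the field looks like in the dump
print(decode_tagged_refcnt(raw_value))
```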
OK, the nsRange is kept alive by an nsTypeAheadFind, field mSearchRange. nsTypeAheadFind is not a cycle-collection participant. It has two references into it, I think from JS objects but I'm not sure. The references don't appear to be from normal C++ objects.
OK, here are some steps to reproduce a leak.

0) Enable DOMWINDOW "++/--" logging
1) Start Firefox. Navigate to some innocuous page, say https://bugzilla.mozilla.org.
2) Open FAYT and search for something in the page so there's a match.
3) Open a new window and then close the old window.
4) Wait. Notice that no Windows for https://bugzilla.mozilla.org are released.
5) Close the last window. Notice that two Windows for https://bugzilla.mozilla.org are released.

If I repeat those steps but skip step 2, two Windows for https://bugzilla.mozilla.org are released at step 4 (although you do have to wait a little bit).
Note, lowercase 'window' means actual browser window. New windows open at about:home, although I don't think it matters.
I hope that's enough information for you to reproduce and find the problem. I don't know how to figure out what's keeping the nsTypeAheadFind object alive.
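
If it helps with reproducing, the bookkeeping in steps 4 and 5 can be done by a small script instead of by eye. This is a sketch that assumes log lines of the shape "++DOMWINDOW == <count> (<address>) ..." and "--DOMWINDOW == <count> (<address>) ...", with the address in hex and an optional 0x prefix. That line shape is an assumption, so adjust the regular expression to whatever your build prints. `unreleased_windows` is a name I made up for illustration.

```python
import re

# ASSUMED line shape: "++DOMWINDOW == 3 (0x1234abcd) [...]" or
# "--DOMWINDOW == 2 (1234ABCD) [...]"; adjust to the actual log output.
LINE_RE = re.compile(r"^(\+\+|--)DOMWINDOW == \d+ \((?:0x)?([0-9A-Fa-f]+)\)")

def unreleased_windows(log_lines):
    """Return addresses of windows created (++) but not yet released (--)."""
    live = []  # kept in creation order
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        op, addr = match.group(1), match.group(2).lower()
        if op == "++":
            live.append(addr)
        elif addr in live:
            live.remove(addr)
    return live

sample_log = [
    "++DOMWINDOW == 1 (0x1a) [serial = 1]",
    "++DOMWINDOW == 2 (0x2b) [serial = 2]",
    "some unrelated output",
    "--DOMWINDOW == 1 (0x1a) [serial = 1]",
]
still_alive = unreleased_windows(sample_log)
print(still_alive)
```

Running it on the log captured at step 4, with and without step 2, should show the extra windows that stay alive after using FAYT.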
(Assignee)

Comment 19

8 years ago
Patch coming for the comment 16 case.
Assignee: peterv → Olli.Pettay
(Assignee)

Comment 20

8 years ago
Created attachment 500012
patch

I don't really know who should review this.
FAYT hasn't been changed a lot lately.
Attachment #500012 - Flags: review?(peterv)
(Assignee)

Updated

8 years ago
Status: NEW → ASSIGNED
Comment on attachment 500012
patch

r=jst
Attachment #500012 - Flags: review?(peterv) → review+
(Assignee)

Comment 22

8 years ago
http://hg.mozilla.org/mozilla-central/rev/250bf984b8bc
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Assignee)

Comment 23

8 years ago
If there is still some other leak to fix, better to file a new bug for that.

Updated

8 years ago
Summary: Major leak detected → nsDocument leak due to nsTypeAheadFind

Updated

8 years ago
Keywords: mlk