Closed Bug 693016 Opened 13 years ago Closed 12 years ago

tbpl leaks memory badly

Categories

(Tree Management Graveyard :: TBPL, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jesup, Assigned: Swatinem)

References

Details

Attachments

(3 files)

I had 438MB of tbpl in about:memory; and closing the tab took me from 2.2GB resident to 1.1GB resident.  It had been open for several days as an app tab; I'm running 9.0a2 (2011-10-01).

Tagging Memshrink because it might be a bug in FF/GC/CC and not tbpl; remove if it's clearly a web app bug.


Before:

│  ├────438.69 MB (23.29%) -- compartment(https://tbpl.mozilla.org/?tree=Mozilla-Inbound)
│  │    ├──180.17 MB (09.57%) -- string-chars
│  │    ├──158.70 MB (08.43%) -- gc-heap
│  │    │  ├───86.57 MB (04.60%) -- objects
│  │    │  ├───39.65 MB (02.11%) -- arena-unused
│  │    │  ├───29.53 MB (01.57%) -- strings
│  │    │  └────2.95 MB (00.16%) -- (5 omitted)
│  │    ├───97.14 MB (05.16%) -- object-slots
│  │    └────2.68 MB (00.14%) -- (7 omitted)
│  ├────294.64 MB (15.64%) -- gc-heap-chunk-dirty-unused
....

├────765.45 MB (40.64%) -- heap-unclassified
├─────42.65 MB (02.26%) -- layout
│     └──42.65 MB (02.26%) -- (125 omitted)
├─────25.24 MB (01.34%) -- (6 omitted)
└─────20.75 MB (01.10%) -- dom


After:


│  ├──149.94 MB (17.50%) -- gc-heap-chunk-dirty-unused
...
├──345.48 MB (40.32%) -- heap-unclassified
├───34.00 MB (03.97%) -- layout
│   └──34.00 MB (03.97%) -- (122 omitted)
├───12.07 MB (01.41%) -- dom
I'll note that the browser was almost unusable (Win 7, x64) until the tab was closed.  Mouse cursor would disappear if over FF for 5-10 sec at a time or more.
Attached image Image of cpu/mem use
Image from sysinternals ProcMon - this is the CPU/mem use of just Firefox.

I suspect the regular mem use peaks are attempts to GC/CC, and as you see one core is pegged pretty much all the time
Unfortunately, there are no tools in firefox to tell you what objects on the js heap use that memory.

I guess you can’t reproduce such high mem usage when you use the arrows to load up as much history as has accumulated in that time? And hover over all the bug links to fill up the bug cache (that has been improved considerably lately).

Chrome has a good memory profiler that helped me found the callback problem that I fixed in the other bug. So maybe let it run for a few days in chrome, compare the memory numbers and make a memory profile in chrome to see which objects are using up that memory?
Also another idea: What extensions are you using? Firebug is responsible for a zombie compartment on tbpl, maybe it also introduces leaks in between.
Can someone test this with two browsers side by side, like say Firefox and Chrome? If they both behave the same then the problem is likely in tbpl, if not, we need to investigate further.

Leaving this unprioritized until we have an answer to that.
Also, see bug 674919.
No extensions that could cause this - just session manager and test pilot.  May well be tbpl and a dup of 674919.
I'll run this through DMD.
Assignee: nobody → nnethercote
Depends on: DMD
Whiteboard: [memshrink] → [MemShrink:P2]
I ran three TBPL tabs overnight for 12 hours under an FF10 trunk build, the memory usage went up slightly but nothing very notable (e.g. resident went from 183MB to 223MB).  The heap-unclassified was 18.7% at the start and only 19.2% at the end.  I can think of two reasons why I might seeing a much smaller heap-unclassified:

- FF10 has better memory reporting?

- The effect is more noticeable when the browser is being used normally, rather than sitting untouched overnight?

So, no great insights there.

I did a similar run under DMD in parallel.  The output wasn't terribly surprising.  There were a couple of not-entirely-common reports:

==1034== Unreported: 3,258,368 (cumulative: 6,929,968) bytes in 6,364 heap block(s) in record 2 of 15908:
==1034==  Requested bytes unreported: 3,258,368 / 3,258,368
==1034==  Slop      bytes unreported: 0 / 0
==1034==    at 0x402A063: malloc (vg_replace_malloc.c:263)
==1034==    by 0x6FD8221: js_malloc (Utility.h:166)
==1034==    by 0x6FD82DE: JSRuntime::malloc_(unsigned long, JSContext*) (jscntxt.h:700)
==1034==    by 0x6FD83CF: JSContext::malloc_(unsigned long) (jscntxt.h:1203)
==1034==    by 0x8034EEC: JSRope::flatten(JSContext*) (String.cpp:148)
==1034==    by 0x7E6D180: js::StringBuffer::append(JSString*) (String.h:786)
==1034==    by 0x7E6D4B1: js::ValueToStringBuffer(JSContext*, JS::Value const&, js::StringBuffer&) (jsstrinlines.h:235) 
==1034==    by 0x7E5EE7C: array_toString_sub(JSContext*, JSObject*, int, JSString*, JS::Value*) (jsarray.cpp:1671)


==1034== Unreported: 1,990,144 (cumulative: 13,174,768) bytes in 15,548 heap block(s) in record 5 of 15908:
==1034==  Requested bytes unreported: 1,865,760 / 1,865,760
==1034==  Slop      bytes unreported: 124,384 / 124,384
==1034==    at 0x402A063: malloc (vg_replace_malloc.c:263)
==1034==    by 0x403BFFA: moz_xmalloc (mozalloc.cpp:103)
==1034==    by 0x6CD4733: NS_NewTextNode(nsIContent**, nsNodeInfoManager*) (mozalloc.h:230)
==1034==    by 0x70B1563: nsHtml5TreeOperation::AppendText(unsigned short const*, unsigned int, nsIContent*, nsHtml5TreeOpExecutor*) (nsHtml5TreeOperation.cpp:191)
==1034==    by 0x70B2E11: nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) (nsHtml5TreeOperation.cpp:500)
==1034==    by 0x70B64E5: nsHtml5TreeOpExecutor::FlushDocumentWrite() (nsHtml5TreeOpExecutor.cpp:632)
==1034==    by 0x7073E29: nsHtml5Parser::ParseHtml5Fragment(nsAString_internal const&, nsIContent*, nsIAtom*, int, bool, bool) (nsHtml5Parser.cpp:559)
==1034==    by 0x6C26FA8: nsContentUtils::ParseFragmentHTML(nsAString_internal const&, nsIContent*, nsIAtom*, int, bool, bool) (nsContentUtils.cpp:3641)


==1034== Unreported: 1,467,264 (cumulative: 19,278,480) bytes in 3,821 heap block(s) in record 9 of 15908:
==1034==  Requested bytes unreported: 1,436,696 / 1,436,696
==1034==  Slop      bytes unreported: 30,568 / 30,568
==1034==    at 0x4028AB6: calloc (vg_replace_malloc.c:566)
==1034==    by 0x406DA42: PR_Calloc (prmem.c:475)
==1034==    by 0x417DF39: PORT_ZAlloc_Util (secport.c:139)
==1034==    by 0x41DDC84: ssl3_NewSessionID (ssl3con.c:5872)
==1034==    by 0x41E03EB: ssl3_HandleHandshakeMessage (ssl3con.c:5214)
==1034==    by 0x41E227B: ssl3_HandleRecord (ssl3con.c:8779)
==1034==    by 0x41E2C1E: ssl3_GatherCompleteHandshake (ssl3gthr.c:209)
==1034==    by 0x41E3BB7: ssl_GatherRecord1stHandshake (sslcon.c:1258)
Let tbpl run in a pinned tab from start of browser run until today (8 days).  Never focused it that I remember.

Browser memory usage grew until 32-bit address space exhaustion.  

tbpl was 895MB; on reload of the tab it was 13MB.  explicit went from 3.2GB to 2.3GB.  The only other 'unusual' tab was techradar.com, which was 256MB.  Atoms was 100MB, which may or may not be reasonable.  Total is around 125 tabs at a guess.

heap-unclassified went from 39% (1275MB) to 37% (894MB)

Win32 build, Aurora 20111126.

Only active extensions are Flashblock and SessionManager.
> Atoms was 100MB, which may or may not be reasonable.

That is rather high.  And 77MB of it is gc-heap/arenas/unused, which suggests there's some awful fragmentation happening.
Looking at the about:memory again, I just noticed the 100M worth of function objects.
So this is either a bug in tbpl where we attach functions to objects instead of using prototype.
Or it may be that the js engine does not garbage collect our closures correctly, for whatever reason.

Just to clarify:
    var obj = {fn: function(){}}
and
    function(){ function fn(){} }
do create *new* function objects, do they?
Are there any other patterns I should look for?
So I see just one obvious problem with tbpl in this regard: 
http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/962932eba448/js/UserInterface.js#l1255
_installComparisonClickHandler and _installTooltips creates *new* functions that it binds to event listeners.

We can move both to unique functions, adding the ComparisonClickHandler to an onclick="" node on the html itself.
The createPopup function needs to be bound using jquery, as we unbind it when it is first called, but we can at least avoid creating a *new* callback for every .bind().

Otherwise we have *A LOT* of anonymous .map() .filter() .forEach callbacks which I believe create a *new* function object on each call, right? They should be garbage collected when they go out of scope, unless the scope is kept alive for some reason?
How can I debug such cases?
Patch for the identified problem above.
Attachment #586658 - Flags: review?(mstange)
In past couple of days, tbpl has been eating much more ram than usual. Usually I can leave tbpl open all day or longer and the compartment is usually around 12-13meg.  The following is from a run of ~12 hrs,  note the start usage of the compartment vs the ending when I restarted the browser.  This appears to have cropped up in past couple of days.  Usually I've not seen 'mjit' using any ram except for brief periods during update of tbpl, but its hanging around and growing now, and appears to me to not being cleaned up ?  

20.39 MB (11.17%) -- compartment(https://tbpl.mozilla.org/)
│   │  ├───9.69 MB (05.31%) -- mjit-code
│   │  ├───7.05 MB (03.86%) -- gc-heap

24.74 MB (12.73%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──13.63 MB (07.01%) -- mjit-code
│  │   ├───7.36 MB (03.78%) -- gc-heap

41.13 MB (20.13%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──20.25 MB (09.91%) -- mjit-code
│  │   ├──10.39 MB (05.09%) -- gc-heap

29.35 MB (16.14%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──21.69 MB (11.93%) -- mjit-code
│  │   ├───4.75 MB (02.61%) -- gc-heap

62.02 MB (26.44%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──48.06 MB (20.49%) -- mjit-code
│  │   ├───5.43 MB (02.31%) -- analysis-temporary
│  │   ├───5.29 MB (02.25%) -- gc-heap

97.69 MB (33.18%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──85.38 MB (29.00%) -- mjit-code
│  │   ├───8.70 MB (02.95%) -- gc-heap
│  │   │   ├──4.67 MB (01.59%) -- arena
│  │   │   │  ├──4.61 MB (01.57%) -- unused
│  │   │   │  └──0.06 MB (00.02%) -- (2 omitted)
│  │   │   ├──2.28 MB (00.77%) -- (4 omitted)

───99.47 MB (34.88%) -- compartment(https://tbpl.mozilla.org/)
│  │   ├──87.06 MB (30.53%) -- mjit-code
│  │   ├───8.71 MB (03.05%) -- gc-heap
│  │   │   ├──4.65 MB (01.63%) -- arena


Subtracting the 87meg for mjit-code from the total compartment of 99 meg leave ~12meg as I stated above.
CC'ing some mjit guys due to comment 16.
Depends on: 720359
There was an mjit code leak introduced by bug 706914, I filed bug 720359 for that.  Thanks!
Comment on attachment 586658 [details] [diff] [review]
avoid creating new function objects

Sorry for the mad lag.

>diff --git a/js/UserInterface.js b/js/UserInterface.js

>+  _createTooltipPopup: function UserInterface__createTooltipPopup() {
>+    $(this).unbind();

Maybe add a comment about "this" being the li.push node here, not the UserInterface object.

Anyway, to really get a grip on TBPL's memory use, we should add a mode to TBPL in which the input data is fixed so that bad states can be reproduced easily and don't depend on changing outside effects. Then we can simulate long runs by running a fixed number of refreshes as fast as possible, and then we have a nice testcase with which we can test what code changes have an effect on memory use.
Attachment #586658 - Flags: review?(mstange) → review+
(In reply to Markus Stange from comment #19)
> Comment on attachment 586658 [details] [diff] [review]
> avoid creating new function objects
> 
> Sorry for the mad lag.
> 
> >diff --git a/js/UserInterface.js b/js/UserInterface.js
> 
> >+  _createTooltipPopup: function UserInterface__createTooltipPopup() {
> >+    $(this).unbind();
> 
> Maybe add a comment about "this" being the li.push node here, not the
> UserInterface object.

Done, I’ve pushed it: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/a22530c1052a
Depends on: 722400
Depends on: 722887
Some dependent bugs have been fixed... are TBPL leaks still a problem?
Assignee: n.nethercote → nobody
I haven't noticed TBPL leaks recently; I'll check (lost power over the weekend and had to restart, and used a fresh m-c build).
Still leaking:  (nightly built maybe a week ago and running for a long time with a tbpl tab open, though I often was viewing logs from it, reloading, etc)

│  ├────221,494,776 B (04.17%) -- compartment(https://tbpl.mozilla.org/?tree=Alder)
│  │    ├──128,471,040 B (02.42%) -- gc-heap
│  │    │  ├───45,596,032 B (00.86%) -- objects
│  │    │  │   ├──28,923,872 B (00.54%) ── non-function
│  │    │  │   └──16,672,160 B (00.31%) ── function
│  │    │  ├───30,541,952 B (00.57%) ── strings
│  │    │  ├───20,513,416 B (00.39%) -- shapes
│  │    │  │   ├──11,181,320 B (00.21%) ── dict
│  │    │  │   ├───5,533,280 B (00.10%) ── tree
│  │    │  │   └───3,798,816 B (00.07%) ── base
│  │    │  ├───17,286,464 B (00.33%) ── scripts
│  │    │  ├────9,726,984 B (00.18%) -- arena
│  │    │  │    ├──7,487,640 B (00.14%) ── unused
│  │    │  │    ├──1,235,664 B (00.02%) ── padding
│  │    │  │    └──1,003,680 B (00.02%) ── headers
│  │    │  └────4,806,192 B (00.09%) ── type-objects
│  │    ├───26,273,336 B (00.49%) ── string-chars
│  │    ├───22,716,096 B (00.43%) -- shapes-extra
│  │    │   ├──13,978,624 B (00.26%) ── tree-tables
│  │    │   ├───5,066,752 B (00.10%) ── dict-tables
│  │    │   ├───2,629,632 B (00.05%) ── compartment-tables
│  │    │   └───1,041,088 B (00.02%) ── tree-shape-kids
│  │    ├───22,288,320 B (00.42%) ── script-data
│  │    ├───11,685,200 B (00.22%) -- objects
│  │    │   ├──10,854,016 B (00.20%) ── slots
│  │    │   ├─────791,072 B (00.01%) ── elements
│  │    │   └──────40,112 B (00.00%) ── misc
│  │    ├────4,493,440 B (00.08%) ── analysis-temporary
│  │    ├────4,006,480 B (00.08%) -- type-inference
│  │    │    ├──2,983,056 B (00.06%) ── object-main
│  │    │    ├────765,792 B (00.01%) ── tables
│  │    │    └────257,632 B (00.00%) ── script-main
│  │    └────1,560,864 B (00.03%) -- mjit
│  │         ├──1,310,720 B (00.02%) ── code
│  │         └────250,144 B (00.00%) ── data
> Still leaking

Is this a leak?  TBPL is using lots of memory, sure.  But this isn't evidence that Firefox is leaking memory.

Closing because I think it's unlikely that there's something other than fragmentation here (which we're covering elsewhere).  Please re-open if that's incorrect!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
tbpl normally uses far less memory than that.  My current (non-busy Try tbpl tab) uses 30MB.  Loading mozilla-inbound into tbpl is all of 40MB.  Using >222MB certainly sounds like a leak (or if you prefer, buggy webapp holding onto things).  I'm not saying core/etc is leaking, I'm saying it appears tbpl chews memory to the point where it's problematic to leave open.  I don't see how fragmentation would cause such a huge value to be reported in about:memory for tbpl.

It is definitely better than it was, but it's not fixed.
> I don't see how fragmentation would cause such a huge value to be reported in 
> about:memory for tbpl.

Yes.  Earlier in the bug, you talked about FF's memory usage *after* closing TBPL; that's where fragmentation might come into play.  But it sounds like you're saying that's no longer an issue.

> My current (non-busy Try tbpl tab) uses 30MB.  Loading mozilla-inbound into tbpl is all 
> of 40MB.  Using >222MB certainly sounds like a leak (or if you prefer, buggy webapp 
> holding onto things).

I wouldn't be surprised if it grows over time and/or never releases memory.  You don't get 222MB just as soon as you open the tab, right?

> I'm not saying core/etc is leaking, I'm saying it appears tbpl chews memory to the 
> point where it's problematic to leave open.

It sounds like you should re-open (or file a new bug, since things have changed a lot since comment 0) in the TBPL component.
This bug indirectly mentions fragmentation in comment 0, but has always been about explicit memory use/leaks by tbpl (tbpl was shown at 400+MB in comment 0, see all the other comments as well).  Reopening, since tbpl still leaks, just slower.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
And removing MemShrink, because this is about a site bug.
Whiteboard: [MemShrink:P2]
Situation seems better than it used to be (comment 18, comment 20, dependent bugs fixed) / not worth devoting any more time on this now, since TBPL2 underway.
Status: REOPENED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
Assignee: nobody → arpad.borsos
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: