Closed Bug 653817 Opened 13 years ago Closed 12 years ago

js/gc-heap memory never fully reclaimed

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED INCOMPLETE

People

(Reporter: jim.van.peursem, Assigned: n.nethercote)

References

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2])

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

When using JS-heavy web sites, the about:memory values for js/gc-heap, js/string-data and js/mjit-code can grow large, which isn't a big surprise. However, when the tabs associated with those sites are closed, string-data and mjit-code appear to go back down to about where they were before the sites were loaded, but js/gc-heap never returns to normal.

I've reproduced this on Mac, Windows, and Fennec.

Reproducible: Always

Steps to Reproduce:
1. Launch a fresh Firefox, open an about:memory tab, and observe the values.
2. Create a few new tabs and go to JS-heavy sites like Google Mail + Calendar + Docs; edit a few docs or copy/paste a large (e.g. few-hundred-KB) text string into a new Google document.
3. Observe the about:memory value for js/gc-heap.
4. Close all tabs except about:memory.
5. Wait and observe about:memory.

Actual Results:  
Depending on which platform you're on and how high you drive memory allocation, you'll get different final resting amounts, but never close to the starting point. For example, on the Mac a fresh launch put js/gc-heap in the ~5MB range. I created a new Google text document and copy/pasted in 291KB of text from another document. After the document was fully rendered, js/gc-heap was 100MB. I closed the document and waited an hour; js/gc-heap dropped to 27MB, not the 5MB starting point. I did the same on Windows: the peak js/gc-heap was 147MB, returning to 32MB. On Fennec I had less memory, so I used a smaller test. I got js/gc-heap up to 68MB; after closing all tabs except about:memory, the content process's js/gc-heap was 9.4MB.

Expected Results:  
While I can understand that the final numbers might not return to the original in every case, it doesn't seem right for tens of megabytes of memory to remain when all content tabs are closed. I'm concerned about memory not being reclaimed properly and/or an increasingly "bloated" feel the longer Firefox runs, especially when trying to use it on memory-limited devices. Every byte matters.
Did you notice a difference in behavior relative to Firefox 3.6?
Blocks: mlk-fx5+
Kevin, do you want to try your tool on this bug?
I looked at this with Massif, opening a single Gmail tab and then closing it.  The single biggest thing I see is that 2MB (out of about 26MB of heap) of JSScripts are still held onto.  That seems high.
Here's some data from Massif -- the top 24 heap allocation points left after I opened 10 Gmail tabs and then closed them, leaving open only about:memory.

I'm not really sure what to make of this.
At the end, the majority of the JSScripts left are XDR scripts.  I wonder if they're involved with running the browser.

o1o> 05.09% (2,225,356B): JSScript::NewScript() (jsutil.h:239)
| o2o> 03.62% (1,580,817B): js_XDRScript() (jsscript.cpp:549)
| | o3o> 03.53% (1,540,964B): js_XDRFunctionObject() (jsfun.cpp:1904)
| | | o4o> 03.53% (1,540,964B): js_XDRScript() (jsscript.cpp:673)
| | |   o5o> 02.92% (1,278,012B): JS_XDRScriptObject ()
| | |   | o6o> 01.74% (762,260B): mozJSComponentLoader::ReadScript() (mozJSComponentLoader.cpp:365)
| | |   | | o7o> 01.74% (762,260B): mozJSComponentLoader::GlobalForLocation() (mozJSComponentLoader.cpp:1034)
| | |   | |   o8o> 01.09% (478,003B): mozJSComponentLoader::LoadModuleImpl() (mozJSComponentLoader.cpp:688)
| | |   | |   | o9o> 01.09% (478,003B): mozJSComponentLoader::LoadModule() (mozJSComponentLoader.cpp:615)
| | |   | |   |   o10> 01.09% (478,003B): nsComponentManagerImpl::KnownModule::Load() (nsComponentManager.cpp:954)
| | |   | |   |     o11> 01.09% (478,003B): nsFactoryEntry::GetFactory() (nsComponentManager.cpp:1976)
| | |   | |   |       o12> 00.70% (307,128B): nsComponentManagerImpl::CreateInstance() (nsComponentManager.cpp:1212)
| | |   | |   |       | o13> 00.70% (305,888B): nsComponentManagerImpl::GetService() (nsComponentManager.cpp:1508)
| | |   | |   |       | | o14> 00.70% (305,888B): nsJSCID::GetService() (xpcjsid.cpp:839)
| | |   | |   |       | |   o15> 00.70% (305,888B): NS_InvokeByIndex_P (xptcinvoke_x86_64_unix.cpp:195)
HashMgr is hunspell spell checking. 2.5MB of memory might be reasonable for that if it doesn't grow over time.

pcache1Alloc is SQLite storage, probably used by Places for history/bookmarks/etc. 3.5MB might also be reasonable for that if it doesn't grow over time.

All those nsSegmentedBuffers look suspicious to me. They're used for I/O, but I/O memory should have been freed if you're quiescent, I suspect.

Can you open GMail 10 times, close all the windows, then open GMail 10 times again and close those, and compare before and after the second 10 GMails?
A sizeable chunk is the spell-checking dictionary:

o1o> 06.31% (2,755,787B): HashMgr::add_word(char const*, int, int, unsigned short*, int, char const*, bool) (hashmgr.cpp:176)
| o2o> 06.26% (2,735,531B): HashMgr::load_tables(char const*, char const*) (hashmgr.cpp:520)
| | o3o> 06.26% (2,735,531B): HashMgr::HashMgr(char const*, char const*, char const*) (hashmgr.cpp:89)
| |   o4o> 06.26% (2,735,531B): Hunspell::Hunspell(char const*, char const*, char const*) (hunspell.cpp:79)
| |     o5o> 06.26% (2,735,531B): mozHunspell::SetDictionary(unsigned short const*) (mozHunspell.cpp:168)
| |       o6o> 06.26% (2,735,531B): mozSpellChecker::SetCurrentDictionary(nsAString_internal const&) (mozSpellChecker.cpp:386)

I guess it's reasonable that this stays around.

Really, the whole question of this bug is understanding what things are
likely to stick around for the full length of the process.
Attached file final snapshot
Here's the full Massif data.
Here's the nsSegmentedBuffer info in full:

o1o> 05.68% (2,482,176B): nsSegmentedBuffer::AppendNewSegment() (nsSegmentedBuffer.cpp:103)
| o2o> 05.68% (2,482,176B): nsStorageStream::Write() (nsStorageStream.cpp:195)
| | o3o> 05.68% (2,482,176B): nsInputStreamTeeWriteEvent::Run() (nsInputStreamTee.cpp:130)
| | | o4o> 05.68% (2,482,176B): nsThread::ProcessNextEvent() (nsThread.cpp:618)
| | |   o5o> 05.68% (2,482,176B): NS_ProcessNextEvent_P() (nsThreadUtils.cpp:250)
| | |     o6o> 05.68% (2,482,176B): nsThread::ThreadFunc() (nsThread.cpp:273)
| | |       o7o> 05.68% (2,482,176B): _pt_root (ptthread.c:187)
| | |         o8o> 05.68% (2,482,176B): start_thread (pthread_create.c:300)
| | |           o9o> 05.68% (2,482,176B): clone (clone.S:112)

roc, I'll try your experiment tomorrow, I don't have time today (it'll take
a while).
A better growth test for hunspell may be to open a big text document in Google Docs (or any online editor) to see whether more spelling errors and/or more words checked increases the retained allocation in HashMgr.

Any good tests to try for pcache1Alloc growth? Is this affected at all by the safe-browsing database?

Any speculation on what's causing the nsSegmentedBuffers to be retained? In his simple test, that's a couple of MB. I wonder if it grows with increased network traffic on JS-heavy sites?
Whoever is seeing the segmented buffers: I assume you're running a clean profile with no extensions?

The way nsStorageStream works is that you write data to it once, then can read it multiple times.  That means that the data in the nsStorageStream has to stick around for as long as the nsStorageStream does (and also for as long as any of the input streams that can read the data do, etc).

Looking at mxr, our nsStorageStream users are: necko memory cache, XHR (for XML uploads only), startup cache, web browser persist ("save as"), the JS component loader, about:cache.

That seems to be it.  The stack in comment 9 shows a storage stream in combination with an input stream tee, which sort of sounds like memory cache is the most likely thing, if this is a vanilla browser.
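
To make the lifetime point concrete, here is a toy sketch of the write-once/read-many pattern described above. It is not Gecko's actual nsStorageStream code (the real API differs); it only illustrates why data written to such a stream must stay allocated for as long as the stream, or any reader created from it, exists:

// Toy model of the write-once / read-many-times pattern described above.
// Not Gecko code: nsStorageStream's real API differs.  This only shows why
// the written data must outlive every reader created from the stream.
#include <algorithm>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

struct Storage {                        // the segments, written once
    std::vector<char> data;
};

class Reader {                          // plays the role of an input stream
public:
    explicit Reader(std::shared_ptr<const Storage> s) : mStorage(std::move(s)) {}
    size_t Read(char* buf, size_t len) {
        size_t n = std::min(len, mStorage->data.size() - mPos);
        std::copy_n(mStorage->data.begin() + mPos, n, buf);
        mPos += n;
        return n;
    }
private:
    std::shared_ptr<const Storage> mStorage;   // keeps the data alive
    size_t mPos = 0;
};

class StorageStream {                   // plays the role of nsStorageStream
public:
    void Write(const char* buf, size_t len) {
        mStorage->data.insert(mStorage->data.end(), buf, buf + len);
    }
    Reader NewInputStream() { return Reader(mStorage); }
private:
    std::shared_ptr<Storage> mStorage = std::make_shared<Storage>();
};

int main() {
    Reader r = [] {
        StorageStream s;
        std::string payload(2 * 1024 * 1024, 'x');   // ~2MB, like a cache entry
        s.Write(payload.data(), payload.size());
        return s.NewInputStream();
    }();                                // the stream itself is gone here, but...
    char buf[16];
    std::printf("still readable: %zu bytes\n", r.Read(buf, sizeof(buf)));
    // ...the 2MB stays allocated until the last reader goes away as well.
}

If the memory cache is the consumer here, its cached entries play the role of Storage above, which would explain buffers lingering long after the pages that produced them are closed.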
(In reply to comment #11)
> Whoever is seeing the segmented buffers; I assume that you're running a clean
> profile with no extensions?

No extensions and an almost-clean profile (it's been used to visit gmail a number of times, and little else).
So the nsStorageStream could just be the Necko memory cache? I guess jvp/njn could set the max cache size to something very small, or just flush the cache, to see if that's the problem.

You could probably also disable spellchecking to get an idea how much memory it's using.
> So the nsStorageStream could just be the Necko memory cache?

That's my best guess so far, yes.  Testing with a limited-size memory cache might be of use.
(In reply to comment #14)
> > So the nsStorageStream could just be the Necko memory cache?
> 
> That's my best guess so far, yes.  Testing with a limited-size memory cache
> might be of use.

I experimented a bit tonight. I'm not on a debug build at the moment, so I don't know a way to do any detailed tracing. I set browser.config.memory.capacity = 1024 and layout.spellcheckDefault=0. Not sure if that's exactly the Necko memory cache you're referring to. Is there a different setting I should be modifying?

Starting js/gc-heap was about 3MB. Loaded up gmail, calendar, docs, opened a big text file, big complex spreadsheet, and 7 other misc content tabs. js/gc-heap grew to 45MB. I then closed all tabs except about:memory and left the browser idle for 30 minutes. js/gc-heap quickly dropped to 33.5MB and stayed there for the duration.

In case it helps, I had browser.session.max_history=4 and browser.sessionhistory.max_total_viewers=1 during this test.
Attached file more Massif stats
I did the open-close-reopen-close experiment, but only with 2 tabs because I was having difficulty with higher numbers.

The top 20 allocation sites for the two minima are attached.  I'll shortly add another attachment that presents the same data in an easier-to-compare fashion.
In this version entries from the two minima are paired.  If an entry appeared in one but not the other, it's paired with "n/a".  So "n/a" after an entry means the entry only appeared in the first minimum, and "n/a" before an entry means the entry only appeared in the second minimum.  (Well, this data is only the top 20 from each minimum.)
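
For reference, the pairing can be reproduced with something like the following. This is a rough sketch of the idea, not the actual script used to produce the attachment, and the site names and byte counts in main() are made-up examples:

// Rough sketch of the pairing described above (not the actual script used):
// entries are keyed by allocation site, and a site missing from one of the
// two minima is printed as "n/a" on that side.
#include <cstdio>
#include <map>
#include <set>
#include <string>

using SiteBytes = std::map<std::string, long>;   // site -> bytes at that minimum

void PrintPaired(const SiteBytes& first, const SiteBytes& second) {
    std::set<std::string> sites;                 // union of sites from both minima
    for (const auto& kv : first)  sites.insert(kv.first);
    for (const auto& kv : second) sites.insert(kv.first);
    for (const auto& site : sites) {
        std::string a = first.count(site)  ? std::to_string(first.at(site))  : "n/a";
        std::string b = second.count(site) ? std::to_string(second.at(site)) : "n/a";
        std::printf("%-60s %12s %12s\n", site.c_str(), a.c_str(), b.c_str());
    }
}

int main() {
    // Example data only; the real numbers come from the Massif snapshots.
    SiteBytes firstMin  = {{"JSScript::NewScript (jsutil.h:239)", 2225356},
                           {"HashMgr::add_word (hashmgr.cpp:176)", 2755787}};
    SiteBytes secondMin = {{"JSScript::NewScript (jsutil.h:239)", 2300000},
                           {"nsSegmentedBuffer::AppendNewSegment", 2482176}};
    PrintPaired(firstMin, secondMin);
}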
jvp, the problem with this bug report is that it's not clear that there's actually a leak.  It's possible that the heap increase you're seeing is due to stuff that is supposed to hang around for as long as Firefox is running.

An experiment that would be useful would be to do like roc suggested to me -- do your experiment twice in a row (in a single execution of Firefox), and see if the memory usage after the second run is higher than the first run.  If it is, that is stronger evidence of a leak.
(In reply to comment #18)
> jvp, the problem with this bug report is that it's not clear that there's
> actually a leak.  It's possible that the heap increase you're seeing is due to
> stuff that is supposed to hang around so long as Firefox is running.

Agreed. Note that I didn't say "leak" in the problem description. But the overall footprint grows and is not meaningfully returned, which is problematic, especially in an embedded environment.

> An experiment that would be useful would be to do like roc suggested to me --
> do your experiment twice in a row (in a single execution of Firefox), and see
> if the memory usage after the second run is higher than the first run.  If it
> is, that is stronger evidence of a leak.

Good idea. I spent some time today running a long experiment. I measured both the firefox-bin size as reported by procrank (USS, in KB) and js/gc-heap as reported by about:memory. I loaded eight tabs of content, then closed them, then opened them again, then closed them again. For the first load I kept the tabs open, left the browser idle, and sampled memory consumption every ten minutes for 150 minutes.

			firefox-bin (KB)	js/gc-heap (bytes)
Fresh Launch		 47,972K	 3,145,728
+gmail			 99,940K	10,485,760
+calendar		102,896K	13,631,488
+docs dir		123,768K	16,777,216
+text doc		152,256K	20,971,520
+spreadsheet		209,512K	35,651,584
+espn.com		208,916K	35,651,584
+att.net		213,168K	35,651,584
+engadgetmobile		224,712K	36,700,160
Wait 10 minutes		231,248K	40,894,464
Wait 20 minutes		233,636K	40,894,464
Wait 30 minutes		238,572K	40,894,464
Wait 40 minutes		240,488K	42,991,616
Wait 50 minutes		242,260K	42,991,616
Wait 60 minutes		242,352K	42,991,616
Wait 70 minutes		250,016K	45,088,768
Wait 80 minutes		246,536K	45,088,768
Wait 90 minutes		251,844K	45,088,768
Wait 150 minutes	251,448K	49,283,072
Close all tabs		226,328K	38,797,312
Wait 30 minutes		226,252K	38,797,312
Wait 90 minutes		226,156K	38,797,312
Load all tabs		271,528K	38,797,312
Close all tabs		234,799K	38,797,312
Load all tabs		269,200K	38,797,312
Close all tabs		232,852K	38,797,312
Load all tabs		261,532K	38,797,312
Close all tabs		231,936K	37,748,736

A couple of interesting things to note. From the fresh launch to the first full close, you'll notice that the working set of the browser increased by 178MB. That's the problem I'm trying to highlight. I can understand that some additional services (e.g. the spell checker) are lazily loaded, but 178MB is a HUGE increase, especially on an embedded device. Note that I had spell checking disabled and browser.cache.memory.capacity=1024 and browser.sessionhistory.max_total_viewers=1.

Second, note that with the browser sitting "idle", memory use grew significantly. Granted, these are some JS hungry sites with live content. But after closing all tabs, the js/gc-heap and total process footprint remains very large.

Third, the peak/valley profile is interesting: 48MB, 251MB, 226MB, 272MB, 235MB, 270MB, 233MB, 262MB, 232MB. So yes, back-to-back loads/closes of identical content result in significant growth... but after a few load/close cycles it levels off.

Is it a leak? Maybe. Maybe not. At a minimum it's a significant increase in memory profile without sufficient reduction when content is unloaded. This needs more scrutiny. Something isn't right here.
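
For anyone who wants to repeat the USS sampling above without procrank: USS is approximately the sum of Private_Clean and Private_Dirty in /proc/<pid>/smaps, so on Linux a loop like the following (an illustrative sketch, not the tool actually used for the table above) gives comparable numbers:

// Illustrative only -- the table above was produced with procrank.  This
// approximates the same USS figure by summing Private_Clean + Private_Dirty
// from /proc/<pid>/smaps and printing it every ten minutes.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>
#include <thread>

static long UssKB(const std::string& pid) {
    std::ifstream smaps("/proc/" + pid + "/smaps");
    std::string line;
    long kb = 0;
    while (std::getline(smaps, line)) {
        if (line.rfind("Private_Clean:", 0) == 0 ||
            line.rfind("Private_Dirty:", 0) == 0) {
            kb += std::strtol(line.c_str() + 14, nullptr, 10);  // "...:   123 kB"
        }
    }
    return kb;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <firefox-bin pid>\n", argv[0]);
        return 1;
    }
    for (;;) {   // same ten-minute cadence as the table above
        std::printf("USS: %ld KB\n", UssKB(argv[1]));
        std::fflush(stdout);
        std::this_thread::sleep_for(std::chrono::minutes(10));
    }
}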
In today's Nightly (2011-05-04) bug 654106 was fixed. That leak likely accounts for a fair amount of memory growth over time. I suggest another test with today's Nightly to eliminate this variable.
jvp:  thanks for the details.  Giving specific sites and specific measurements is really helpful.  Yes, it's hard to say if it's a leak but I agree there should be more drop-off after closing the tabs.

If you can repeat the experiment as Hughman suggests in comment 20, that would be helpful.  Also, I wonder if opening different sites the second time around would give different results.  (Sorry if that sounds like "hey reporter, do a bunch more work!"... these bugs are difficult to diagnose.)
Applied the patch and did a build tonight. My first reaction after just a quick test is that the rate of growth is MUCH better with the patch applied. It's definitely part of the problem. I'll spend more time tomorrow trying to reproduce a test like the one above and get more accurate numbers, so we can see what's left.
Great!  Thanks for the feedback, I'm looking forward to detailed numbers.
Depends on: 654106
(In reply to comment #23)
> Great!  Thanks for the feedback, I'm looking forward to detailed numbers.

I ran the same test early this morning. As you can see, it makes an improvement on closing tabs. But as you can also see, it's only a partial solution.

			  firefox-bin (KB)	js/gc-heap (bytes)
Fresh Launch		48,700	3,145,728
+gmail			91,252	11,534,336
+calendar			104,408	12,582,912
+docs dir			123,460	15,728,640
+text doc			149,704	19,922,944
+spreadsheet		203,388	36,700,160
+espn.com		215,308	36,700,160
+att.net			226,492	36,700,160
+engadgetmobile	232,036	36,700,160
Wait 10 minutes	235,600	38,797,120
Wait 20 minutes	236,152	39,845,888
Wait 30 minutes	235,924	39,845,888
Wait 40 minutes	234,480	39,845,888
Wait 50 minutes	238,248	41,943,040
Wait 60 minutes	234,964	42,991,616
Wait 70 minutes	236,444	42,991,616
Wait 90 minutes	240,064	45,088,768
Close all tabs		108,908	34,603,008
Load all tabs		239,220	34,603,008
Close all tabs		121,552	34,603,008
Load all tabs		240,100	34,603,008
Close all tabs		123,692	34,603,008
Load all tabs		245,759	37,748,736
Close all tabs		127,588	36,700,160
Wait 30 minutes	122,040	35,651,584
Wait 60 minutes	121,832	35,651,584

From fresh launch to end state (tabs closed except about:memory), there's still ~75MB of growth that isn't reclaimed. That's about 250% of the original. Something's still either leaking or needlessly hanging onto memory.
> From fresh launch to end state (tabs closed except about:memory), there's still
> ~75MB of growth that isn't reclaimed. That's about 250% of the original.

Just for clarity, that's down from close to 500%, so a big improvement.

> Something's still either leaking or needlessly hanging onto memory.

Yeah, still seems likely.
I just blogged about this bug and bug 654106;  the post may be interesting to people following this bug:
http://blog.mozilla.com/nnethercote/2011/05/06/another-leak-fixed-part-2/
Nicholas, can you run a Massif before/after snapshot of an experiment similar to what I did above (loading tabs once and then closing them)?
jvp, I already did that with gmail.  You want me to do it with the mix of sites that you used?
(In reply to comment #28)
> jvp, I already did that with gmail.  You want me to do it with the mix of
> sites that you used?

Yeah, two thoughts. First, your Gmail Massif run was done before the other memory leak was found and fixed, so some of that leak may have factored into your numbers; it would be good to get a fresh run with the patch applied. Second, the Massif output could be compared (at least partially) against the overall numbers I was seeing, which might give some more clues.
Nicholas?
Sorry, haven't got round to another run yet.
Assignee: general → nnethercote
Keywords: mlk
Blocks: mlk-fx4
No longer blocks: mlk-fx5+
I think this affects me a lot: gc-heap starts around 100 MB after loading my session and grows steadily.  The most JavaScript-heavy tab is Facebook, of course.  This is after 712:03 of runtime (since May 27).

Main Process

Explicit Allocations
835.94 MB (100.0%) -- explicit
├──358.77 MB (42.92%) -- heap-unclassified
├──261.32 MB (31.26%) -- js
│  ├──224.00 MB (26.80%) -- gc-heap
│  ├───22.72 MB (02.72%) -- mjit-code
│  ├────8.98 MB (01.07%) -- tjit-data
│  │    ├──6.35 MB (00.76%) -- allocators-reserve
│  │    └──2.63 MB (00.31%) -- (1 omitted)
│  └────5.62 MB (00.67%) -- (2 omitted)
├──121.29 MB (14.51%) -- images
│  ├──120.29 MB (14.39%) -- content
│  │  ├──120.29 MB (14.39%) -- used
│  │  │  ├───94.03 MB (11.25%) -- uncompressed
│  │  │  └───26.25 MB (03.14%) -- raw
│  │  └────0.00 MB (00.00%) -- (1 omitted)
│  └────1.01 MB (00.12%) -- (1 omitted)
├───73.58 MB (08.80%) -- storage
│   └──73.58 MB (08.80%) -- sqlite
│      ├──64.12 MB (07.67%) -- places.sqlite
│      │  ├──63.61 MB (07.61%) -- cache-used
│      │  └───0.52 MB (00.06%) -- (2 omitted)
│      └───9.45 MB (01.13%) -- (16 omitted)
├───12.78 MB (01.53%) -- layout
│   ├──12.78 MB (01.53%) -- all
│   └───0.00 MB (00.00%) -- (1 omitted)
└────8.20 MB (00.98%) -- gfx
     └──8.20 MB (00.98%) -- surface
        └──8.20 MB (00.98%) -- image

Other Measurements
1,685.37 MB -- vsize
  966.00 MB -- heap-committed
  920.80 MB -- resident
  810.85 MB -- heap-used
  155.15 MB -- heap-unused
    3.71 MB -- canvas-2d-pixel-bytes
    3.45 MB -- heap-dirty

I plan to restart it soon so if there is any more information I can get from this process, let me know how.
Whiteboard: [MemShrink:P2]
jvp:  bug 656120 has been fixed.  It causes GC to run more often during idle times, preventing
Depends on: 656120
Ugh, let me try again.

jvp:  bug 656120 has been fixed.  It causes GC to run more often during idle times, preventing the gc-heap from getting so big, and has helped with a ton of bug reports about leak-like behaviour.  Furthermore, we have some evidence that when all tabs are closed, the size of the gc-heap is smaller as a result of less fragmentation -- see bug 666101 comment 5 and subsequent comments.

So, would you mind trying again with a Nightly from June 21 or later?  If you want to do a strict comparison, you could do so against a Nightly from June 20.  Thanks!
Depends on: 666058
jvp, could you try with recent versions of Firefox?
I think this is basically caused by fragmentation, so compacting GC (bug 619558) will fix it.  We also have bug 668809, which is more general.  So I'll close this.
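
To spell out the fragmentation argument (a toy model, not SpiderMonkey's actual allocator): GC'd objects live in fixed-size arenas, and an arena can only be returned to the OS once every cell in it is dead, so the heap can stay large even after most objects have been collected. A compacting GC avoids this by moving the survivors together:

// Toy model of GC-heap fragmentation (not SpiderMonkey code): cells live in
// fixed-size arenas, and only completely empty arenas can be released.
#include <cstdio>
#include <cstdlib>
#include <vector>

constexpr int kCellsPerArena = 256;   // arbitrary; real arena/chunk sizes differ

int main() {
    const int arenas = 400;
    std::vector<std::vector<bool>> live(arenas,
                                        std::vector<bool>(kCellsPerArena, true));

    // "Close the tabs": collect 95% of the cells, scattered at random.
    std::srand(42);
    for (auto& arena : live)
        for (int c = 0; c < kCellsPerArena; ++c)
            if (std::rand() % 100 < 95) arena[c] = false;

    int releasable = 0, liveCells = 0;
    for (const auto& arena : live) {
        bool empty = true;
        for (bool cell : arena) { if (cell) { empty = false; ++liveCells; } }
        if (empty) ++releasable;
    }
    std::printf("live cells: %d of %d (%.1f%%)\n", liveCells,
                arenas * kCellsPerArena,
                100.0 * liveCells / (arenas * kCellsPerArena));
    std::printf("arenas still held: %d of %d\n", arenas - releasable, arenas);
    // With a few percent of survivors at random positions, almost no arena is
    // completely empty, so nearly the whole heap stays mapped.  A compacting
    // GC would move the survivors into a handful of arenas and free the rest.
}

With only ~5% of cells surviving but scattered randomly, essentially every arena still contains something live, which matches the "closed all tabs but js/gc-heap stays tens of MB" behaviour reported here.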
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE