Closed Bug 625434 Opened 14 years ago Closed 14 years ago

HTML5/Canvas demo ends up using lots of memory

Categories

(Core :: General, defect, P1)

x86
All
defect

Tracking

RESOLVED FIXED
mozilla2.0b10
Tracking Status
blocking2.0 --- final+

People

(Reporter: jag+mozilla, Assigned: bzbarsky)

Details

(Keywords: memory-footprint, Whiteboard: [softblocker])

Attachments

(3 files)

Tested with 4.0b8 and a recent trunk build (both 64-bit) on Mac.

If I load the page at http://www.effectgames.com/demos/canvascycle/?sound=0 and let it sit there for a while -- 20 - 30 minutes -- "Memory in use" steadily goes up from ~38MB to a few GB[0]. Closing the tab releases this memory; or at least: "Memory in use" goes back down to ~38MB.

Running the same test in Chrome 8, memory used for that tab fluctuates between 20MB and 75MB.

I'm assuming here that 1) Chrome is running the same script Firefox is, and 2) the script isn't itself hanging onto memory (if it were, Chrome's flat usage would mean Chrome is dropping things it shouldn't).

I've attached a chart showing various values I grabbed from about:memory once every minute. Not shown are the ones that remained steady during the test, most of which didn't seem relevant anyway. Here are a few that might be useful:

Name                          before  during
gfx/image/surface              266kB   275kB
content/canvas/2d_pixel_bytes    0 B   1.2MB
js/mjit-code                    85kB   1.6MB -> 1.0MB
js/gc-heap                       5MB   ~79MB

As you can see from the chart, memory use doesn't really increase until after 2 minutes (roughly 2m24s ± 2s if my math is right), and "layout/all" accounts for about 40% of the growth.

[0] I've seen 4GB, but by then swapping has slowed everything down so much it takes a few minutes for the about:memory page to reload
Forgot to mention: the tab with the demo needs to be visible to trigger this.

With both tabs in the same window and the about:memory tab focused, memory usage is steady. With the demo tab focused, memory usage goes up as described. For convenience during testing I had the about:memory tab open in a second window.
Interesting.  If you disable image discarding, do you see the increase even when the tab is not focused?
> image.mem.discardable = false

No difference

> layout.throttled_frame_rate = 60

This explains comment 1. With that set I now see the problem even with the demo tab not focused.
OK, so that's just us not doing much restyling/layout in that background tab.  Gotcha.

I wonder whether we're seeing actual memory use in the presshell arena or fragmentation...
I can reproduce this on Windows 7. If it is fragmentation, jemalloc isn't helping.
OS: Mac OS X → All
I meant fragmentation in the frame arena.  If you're compiling, you could try turning off frame arenas to test that...
I uncommented the DEBUG_TRACEMALLOC_PRESARENA #define in nsPresArena.h. Now memory usage seems to be holding steady at about 50MB. I think you're on to something.
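For anyone following along who isn't familiar with the frame arena being toggled here, a rough sketch of the distinction (a hypothetical ToyPresArena, not the actual nsPresArena code; only the #define name is taken from above): with the switch defined, each allocation is an individual malloc that heap tools like malloc_history can attribute, while the normal arena carves allocations out of pooled chunks that are only returned at teardown, which is why arena growth shows up under "layout/all".

  // Illustrative sketch only; not the real nsPresArena.
  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // #define DEBUG_TRACEMALLOC_PRESARENA 1   // stand-in for the real switch

  class ToyPresArena {
    static const std::size_t kChunkSize = 4096;
    std::vector<char*> mChunks;
    std::size_t mOffset = kChunkSize;          // force a new chunk on first use
  public:
    void* Allocate(std::size_t aSize) {
  #ifdef DEBUG_TRACEMALLOC_PRESARENA
      return std::malloc(aSize);               // each allocation visible to heap profilers
  #else
      // Bump-allocate within pooled chunks (oversize requests not handled in this toy).
      if (mOffset + aSize > kChunkSize) {
        mChunks.push_back(static_cast<char*>(std::malloc(kChunkSize)));
        mOffset = 0;
      }
      void* p = mChunks.back() + mOffset;
      mOffset += aSize;
      return p;                                // only reclaimed at arena teardown
  #endif
    }
    ~ToyPresArena() {
      for (char* c : mChunks) std::free(c);
    }
  };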
This sounds really bad. Is this a recent regression?
And does it depend on enabling/disabling GL acceleration?
Hrm, so I let it run all night (in this case, roughly 7 hours) and memory usage is now up to 5GB (malloc/allocated, malloc/mapped, malloc/zone0/committed, malloc/zone0/allocated). "layout/all" is staying at 32kB, which I guess is to be expected with the above change.

Interestingly, the browser was quite responsive and reloaded the about:memory page fairly quickly (I blinked and missed it), as opposed to taking a few minutes. malloc vs. arenas? I'm not laying blame; I doubt the arenas were intended to handle allocation counts this large. Just curious.


roc: I have no idea whether it's recent. Let me run this test with GL acceleration flipped (any other low hanging fruit?), then I'll see if I can get a regression range.
So you're still getting the memory increase, but it's in a different bucket now?
Yeah, I'm still seeing the memory increase, but it's now going straight to malloc instead of mFrameArena. Note that (with that #define commented out) "layout/all" only accounts for 40% of the memory leaked.

I checked 4.0b1 32-bit and 64-bit and could reproduce it there. I can't seem to trigger it on 3.6.13 though.
Seems like this should be possible to catch with valgrind/leakcheck -- I assume it should be reproducible on linux?
That depends.  jag, with arenas disabled do we still free the memory on tab close?
Yeah, with frame arenas disabled we still free the memory on tab/window close. Just be aware that it may take a bit longer before you'll start to see memory used go up. Once it does though it'll keep going.
OK, so it sounds like we need to diff two heaps here, basically...
To stop the animation:

  CC.stop();

Memory used stays level after that.

At roc's suggestion I ran malloc_history (with -all_by_count):

malloc_history Report Version:  2.0
Process:         firefox-bin [89428]
Path:            /Users/jag/moz-hg/ff-opt/dist/Minefield.app/Contents/MacOS/firefox-bin
Load Address:    0x100000000
Identifier:      org.mozilla.minefield
Version:         4.0b10pre (4.0b10pre)
Code Type:       X86-64 (Native)
Parent Process:  bash [89314]

Date/Time:       2011-01-14 21:42:17.663 +0100
OS Version:      Mac OS X 10.6.6 (10J567)
Report Version:  6

7315120 calls for 370931678 bytes: thread_7fff70f12ca0 |moz_xmalloc | malloc | malloc_zone_malloc 

1450060 calls for 104404320 bytes: thread_7fff70f12ca0 |nsRuleNode::Transition(nsIStyleRule*, unsigned char, unsigned char) | malloc | malloc_zone_malloc 

1450024 calls for 162402688 bytes: thread_7fff70f12ca0 |nsRuleNode::ComputeBackgroundData(void*, nsCSSStruct const&, nsStyleContext*, nsRuleNode*, nsRuleNode::RuleDetail, int) | malloc | malloc_zone_malloc 

5102 calls for 2005539 bytes: thread_7fff70f12ca0 |JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) | malloc | malloc_zone_malloc
Ah, and -all_by_size is of course what I was looking for. Adding these lines to cover everything over 1MB:

155 calls for 35096320 bytes: thread_7fff70f12ca0 |ChangeTable | malloc | malloc_zone_malloc 

199 calls for 3675729 bytes: thread_7fff70f12ca0 |JS_ArenaAllocate | malloc | malloc_zone_malloc 

1475 calls for 3241752 bytes: thread_7fff70f12ca0 |JSObject::allocSlots(JSContext*, unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

1228 calls for 1405624 bytes: thread_7fff70f12ca0 |XPT_ArenaMalloc | calloc | malloc_zone_calloc 

3331 calls for 1229176 bytes: thread_7fff70f12ca0 |moz_xrealloc | realloc | malloc_zone_realloc 

2 calls for 1228816 bytes: thread_7fff70f12ca0 |gfxQuartzSurface::gfxQuartzSurface(gfxSize const&, gfxASurface::gfxImageFormat, int) | _moz_cairo_quartz_surface_create | malloc | malloc_zone_malloc 

1 calls for 1228800 bytes: thread_7fff70f12ca0 |js::ArrayBuffer::create(JSContext*, unsigned int, js::Value*, js::Value*) | calloc | malloc_zone_calloc
So this doesn't seem to leak on Win32, and looking at the code it creates one ImageData object and reuses it.  (Watching Win32 Task Manager private bytes.)  I am going to guess this is Mac-only.
I've definitely seen this happen on Windows 7. Running inside VMWare, fwiw.

STR:

1) load the URL and leave the tab visible
2) wait for 10-15 minutes
3) notice memory use soar

Step 2 is the tricky part. In an unmodified build I usually see memory use start climbing after 2.5-3 minutes. With that #define uncommented it can take quite a bit longer. This probably also depends on the FPS.
On x64-linux, I ran it on massif out to about 200 billion insns
(several minutes worth) and profiled both at the malloc heap level
and the process-wide (page) level.

For page level profiling, I can't see any leaking: the process size
pulls up quite rapidly to about 740MB and then stays absolutely at
that size for the rest of the run.

For heap profiling, basically it doesn't leak, but it isn't dead
flat either.  It assumes a modest sawtooth shape over time, with 
roughly one tooth every 50 billion insns.  I assume that this is 
either CC or GC dumping stuff periodically.  The heap component
that appears to be varying here is allocated by the following
chain

   js::mjit::stubs::SetElem<0>(js::VMFrame&)
      array_setProperty(JSContext*, JSObject*, long, 
          JSObject::ensureDenseArrayElements
              JSObject::growSlots
                  presumably calling malloc/calloc/new

with about 3.5MB allocation just before GC, falling to circa 500k
immediately after.

but I don't see how that can have cranked space use up into the
gigabytes range, unless GC is getting deferred indefinitely.

So basically I don't think I've found anything.  I'll see if I can
redo the same experiment on OSX.
Modify step 1 as such:

1) load the URL, click "Show Options" if need be, leave the tab visible.
I think I have a reduced testcase (basically the animated palette).
This simpler testcase shows the bug for me and bz, who has an idea (or two?) for fixes.
I just noticed that the testcase is missing an & 255 for the r value (or modulo 0x1000000 on j), not that anyone will get that far in any normal testing ;-)

For the curious, the preliminary patch (suggested by bz & dbaron) I'll be testing overnight:

diff --git a/layout/style/nsStyleSet.cpp b/layout/style/nsStyleSet.cpp
--- a/layout/style/nsStyleSet.cpp
+++ b/layout/style/nsStyleSet.cpp
@@ -1092,17 +1092,17 @@ nsStyleSet::NotifyStyleContextDestroyed(
   if (mInReconstruct)
     return;
 
-  if (mUnusedRuleNodeCount == kGCInterval) {
+  if (mUnusedRuleNodeCount >= kGCInterval) {
     GCRuleTrees();
   }
 }
(In reply to comment #25)
> [...] preliminary patch [...]

Do we have any specific, unambiguous profiler-based evidence about
exactly where the extra space is being allocated?

I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on
Massif, and didn't see any leakage.


(In reply to comment #16)
> OK, so it sounds like we need to diff two heaps here, basically...

We don't need anything that high-tech.  For a leak of this magnitude,
all we need is to be able to run it on Massif and reproduce the leak.
Then a minute or two peering at the data with massif-visualizer should
make it clear where the allocations are coming from.  This is what we
did with various JS engine leaks.
What bz saw is that in certain cases mUnusedRuleNodeCount can reach kGCInterval + 1 without ever being exactly kGCInterval when the check runs, so GCRuleTrees(), normally used to garbage collect nsRuleNodes, never gets called. Once that's happened, mUnusedRuleNodeCount will continue to grow, so no more GC for nsRuleNodes, at least until document tear-down time.

With the change from comment 25 I haven't been able to reproduce the problem so far.
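For reference, a stripped-down sketch of the trigger logic in question (only mUnusedRuleNodeCount, kGCInterval, GCRuleTrees() and NotifyStyleContextDestroyed() are names from the patch and comments; the surrounding structure is simplified, not the actual nsStyleSet):

  // Simplified sketch of the rule tree GC trigger; illustrative, not real code.
  static const unsigned kGCInterval = 300;   // GC threshold (300 per a later comment)

  struct ToyStyleSet {
    unsigned mUnusedRuleNodeCount = 0;

    // A rule node goes unused somewhere; no GC check happens on this path.
    void RuleNodeBecameUnused() { ++mUnusedRuleNodeCount; }

    // The GC check only runs when a style context is destroyed.
    void NotifyStyleContextDestroyed() {
      // Buggy form: fires only if the count lands exactly on the threshold.
      //   if (mUnusedRuleNodeCount == kGCInterval) { GCRuleTrees(); }
      // Patched form: also recovers after the count has overshot.
      if (mUnusedRuleNodeCount >= kGCInterval) {
        GCRuleTrees();
      }
    }

    void GCRuleTrees() {
      // ... reclaim unused rule nodes ...
      mUnusedRuleNodeCount = 0;
    }
  };

With the equality check, any path that bumps the count past kGCInterval between checks permanently disables rule tree GC for that document, which matches the explosion in live rule nodes described above.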
blocking2.0: --- → ?
> Do we have any specific, unambiguous profiler-based evidence about
> exactly where the extra space is being allocated?

"Maybe".  We have comment 17, which shows tons of allocations for rulenodes and style data.  But in this case that's expected.  What's also expected is that they would be freed; I don't know whether the comment 17 data excludes things that have been freed...

What we also have is me running the "simpler testcase" locally while logging the number of live rulenodes and seeing the number stay stable in the 1600-1800 range for a while, then explode into the millions.  That's definitely fixed by the proposed patch.

> I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on
> Massif, and didn't see any leakage.

This _could_ randomly happen, I suppose, if you happen to always resolve pseudo style at the right time.  Seems odd, though.  I assume "leakage" here would include "presshell arena gets really big", right?

> We don't need anything that high-tech.

Yeah, not for this case.  I was initially worried that the big leak might not be reproducible and hoping that looking at memory usage changes even without the big leak would give some hints.  In retrospect, it wouldn't have, though.
Blocks: 576726
Oh, and the other question that arose while debugging this is the issue of rulenodes not releasing their parents.  That was added in bug 514773 and we sort of discussed it at the time.  Doing that means that we gc whenever we get to 300 unused _leaves_ as opposed to 300 unused rulenodes.  That seems fine to me; if we switched to releasing parents when we transition to a zero refcount, we would need to bump the gc threshold, imo.

I'll add some comments explaining that, though.
Oh, and just for the record the issue was that probing for a pseudo style could increase the number of unused rulenodes (due to the restriction rule) without triggering StyleContextDestroyed.  Thus we would not attempt to gc the ruletree at that point.  If the probing increased the unused count to precisely kGCInterval, then later, when we did try to gc, the unused rulenode count would already be kGCInterval+1, so we would lose.
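To make that sequence concrete, a hypothetical walk-through (the numbers are invented for illustration; the threshold value follows the previous comment):

  // Hypothetical trace of the failure described above; not real engine code.
  #include <cstdio>

  static const unsigned kGCInterval = 300;

  int main() {
    unsigned unused = kGCInterval - 1;  // 299: just below the threshold

    // Pseudo-style probing adds an unused rule node (the restriction rule)
    // without triggering StyleContextDestroyed, so no GC check runs:
    ++unused;                           // exactly kGCInterval, unnoticed

    // By the time the check in NotifyStyleContextDestroyed next runs,
    // more rule nodes have gone unused:
    ++unused;                           // kGCInterval + 1 (or more)
    bool buggy   = (unused == kGCInterval);  // false: GC never fires again
    bool patched = (unused >= kGCInterval);  // true: GC recovers

    std::printf("buggy fires: %d, patched fires: %d\n", buggy, patched);
    return 0;
  }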
Assignee: nobody → bzbarsky
blocking2.0: ? → final+
Priority: -- → P1
Whiteboard: [softblocker]
Whiteboard: [softblocker] → [need review][softblocker]
Comment on attachment 504792 [details] [diff] [review]
Make sure we don't get into a situation where we're failing to gc the ruletree for a page.

r=dbaron
Attachment #504792 - Flags: review?(dbaron) → review+
Whiteboard: [need review][softblocker] → [need landing][softblocker]
Pushed http://hg.mozilla.org/mozilla-central/rev/92ca75d707ae
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [need landing][softblocker] → [softblocker]
Target Milestone: --- → mozilla2.0b10