Closed Bug 625434 Opened 14 years ago Closed 14 years ago

HTML5/Canvas demo ends up using lots of memory

Categories

(Core :: General, defect, P1)

x86
All
defect

Tracking

RESOLVED FIXED
mozilla2.0b10
Tracking Status
blocking2.0 --- final+

People

(Reporter: jag+mozilla, Assigned: bzbarsky)

Details

(Keywords: memory-footprint, Whiteboard: [softblocker])

Attachments

(3 files)

Tested with 4.0b8 and a recent trunk build (both 64-bit) on Mac.

If I load the page at http://www.effectgames.com/demos/canvascycle/?sound=0 and let it sit there for a while -- 20 - 30 minutes -- "Memory in use" steadily goes up from ~38MB to a few GB[0]. Closing the tab releases this memory; or at least: "Memory in use" goes back down to ~38MB.

Running the same test in Chrome 8, memory used for that tab fluctuates between 20MB and 75MB.

I'm assuming here that 1) Chrome is running the same script Firefox is, and 2) the script isn't itself hanging onto memory (if it were, Chrome's flat usage would mean Chrome is dropping things it shouldn't).

I've attached a chart showing various values I grabbed from about:memory once every minute. Not shown are the ones that remained steady during the test, most of which didn't seem relevant anyway. Here are a few that might be useful:

Name                          before  during
gfx/image/surface              266kB   275kB
content/canvas/2d_pixel_bytes    0 B   1.2MB
js/mjit-code                    85kB   1.6MB -> 1.0MB
js/gc-heap                       5MB   ~79MB

As you can see from the chart, memory use doesn't really increase until after 2 minutes (roughly 2m24s ± 2s if my math is right), and "layout/all" accounts for about 40% of the growth.

[0] I've seen 4GB, but by then swapping has slowed everything down so much it takes a few minutes for the about:memory page to reload
Forgot to mention: the tab with the demo needs to be visible to trigger this.

With both tabs in the same window and the about:memory tab focused, memory usage is steady. With the demo tab focused, memory usage goes up as described. For convenience during testing I had the about:memory tab open in a second window.
Interesting.  If you disable image discarding, do you see the increase even when the tab is not focused?
> image.mem.discardable = false

No difference

> layout.throttled_frame_rate = 60

This explains comment 1. With that set I now see the problem even with the demo tab not focused.
OK, so that's just us not doing much restyling/layout in that background tab.  Gotcha.

I wonder whether we're seeing actual memory use in the presshell arena or fragmentation...
I can reproduce this on Windows 7. If it is fragmentation, jemalloc isn't helping.
OS: Mac OS X → All
I meant fragmentation in the frame arena.  If you're compiling, you could try turning off frame arenas to test that...
I uncommented the DEBUG_TRACEMALLOC_PRESARENA #define in nsPresArena.h. Now memory usage seems to be holding steady at about 50MB. I think you're on to something.
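For anyone following along who isn't familiar with the frame arena being toggled here, a rough sketch of the distinction (a hypothetical ToyPresArena, not the actual nsPresArena code; only the #define name is taken from above): with the switch defined, each allocation is an individual malloc that heap tools like malloc_history can attribute, while the normal arena carves allocations out of pooled chunks that are only returned at teardown, which is why arena growth shows up under "layout/all".

  // Illustrative sketch only; not the real nsPresArena.
  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // #define DEBUG_TRACEMALLOC_PRESARENA 1   // stand-in for the real switch

  class ToyPresArena {
    static const std::size_t kChunkSize = 4096;
    std::vector<char*> mChunks;
    std::size_t mOffset = kChunkSize;          // force a new chunk on first use
  public:
    void* Allocate(std::size_t aSize) {
  #ifdef DEBUG_TRACEMALLOC_PRESARENA
      return std::malloc(aSize);               // each allocation visible to heap profilers
  #else
      // Bump-allocate within pooled chunks (oversize requests not handled in this toy).
      if (mOffset + aSize > kChunkSize) {
        mChunks.push_back(static_cast<char*>(std::malloc(kChunkSize)));
        mOffset = 0;
      }
      void* p = mChunks.back() + mOffset;
      mOffset += aSize;
      return p;                                // only reclaimed at arena teardown
  #endif
    }
    ~ToyPresArena() {
      for (char* c : mChunks) std::free(c);
    }
  };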
This sounds really bad. Is this a recent regression?
And does it depend on enabling/disabling GL acceleration?
Hrm, so I let it run all night (in this case, roughly 7 hours) and memory usage is now up to 5GB (malloc/allocated, malloc/mapped, malloc/zone0/committed, malloc/zone0/allocated). "layout/all" is staying at 32kB, which I guess is to be expected with the above change.

Interestingly, the browser was quite responsive and reloaded the about:memory page fairly quickly (I blinked and missed it), as opposed to taking a few minutes. malloc vs. arenas? I'm not laying blame; I doubt the arenas were intended to handle allocation counts this large. Just curious.


roc: I have no idea whether it's recent. Let me run this test with GL acceleration flipped (any other low hanging fruit?), then I'll see if I can get a regression range.
So you're still getting the memory increase, but it's in a different bucket now?
Yeah, I'm still seeing the memory increase, but it's now going straight to malloc instead of mFrameArena. Note that (with that #define commented out) "layout/all" only accounts for 40% of the memory leaked.

I checked 4.0b1 32-bit and 64-bit and could reproduce it there. I can't seem to trigger it on 3.6.13 though.
Seems like this should be possible to catch with valgrind/leakcheck -- I assume it should be reproducible on linux?
That depends.  jag, with arenas disabled do we still free the memory on tab close?
Yeah, with frame arenas disabled we still free the memory on tab/window close. Just be aware that it may take a bit longer before you'll start to see memory used go up. Once it does though it'll keep going.
OK, so it sounds like we need to diff two heaps here, basically...
To stop the animation:

  CC.stop();

Memory used stays level after that.

At roc's suggestion I ran malloc_history (with -all_by_count):

malloc_history Report Version:  2.0
Process:         firefox-bin [89428]
Path:            /Users/jag/moz-hg/ff-opt/dist/Minefield.app/Contents/MacOS/firefox-bin
Load Address:    0x100000000
Identifier:      org.mozilla.minefield
Version:         4.0b10pre (4.0b10pre)
Code Type:       X86-64 (Native)
Parent Process:  bash [89314]

Date/Time:       2011-01-14 21:42:17.663 +0100
OS Version:      Mac OS X 10.6.6 (10J567)
Report Version:  6

7315120 calls for 370931678 bytes: thread_7fff70f12ca0 |moz_xmalloc | malloc | malloc_zone_malloc 

1450060 calls for 104404320 bytes: thread_7fff70f12ca0 |nsRuleNode::Transition(nsIStyleRule*, unsigned char, unsigned char) | malloc | malloc_zone_malloc 

1450024 calls for 162402688 bytes: thread_7fff70f12ca0 |nsRuleNode::ComputeBackgroundData(void*, nsCSSStruct const&, nsStyleContext*, nsRuleNode*, nsRuleNode::RuleDetail, int) | malloc | malloc_zone_malloc 

5102 calls for 2005539 bytes: thread_7fff70f12ca0 |JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) | malloc | malloc_zone_malloc
Ah, and -all_by_size is of course what I was looking for. Adding these lines to cover everything over 1MB:

155 calls for 35096320 bytes: thread_7fff70f12ca0 |ChangeTable | malloc | malloc_zone_malloc 

199 calls for 3675729 bytes: thread_7fff70f12ca0 |JS_ArenaAllocate | malloc | malloc_zone_malloc 

1475 calls for 3241752 bytes: thread_7fff70f12ca0 |JSObject::allocSlots(JSContext*, unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc 

1228 calls for 1405624 bytes: thread_7fff70f12ca0 |XPT_ArenaMalloc | calloc | malloc_zone_calloc 

3331 calls for 1229176 bytes: thread_7fff70f12ca0 |moz_xrealloc | realloc | malloc_zone_realloc 

2 calls for 1228816 bytes: thread_7fff70f12ca0 |gfxQuartzSurface::gfxQuartzSurface(gfxSize const&, gfxASurface::gfxImageFormat, int) | _moz_cairo_quartz_surface_create | malloc | malloc_zone_malloc 

1 calls for 1228800 bytes: thread_7fff70f12ca0 |js::ArrayBuffer::create(JSContext*, unsigned int, js::Value*, js::Value*) | calloc | malloc_zone_calloc
So this doesn't seem to leak on Win32, and looking at the code it creates one ImageData object and reuses it.  (Watching Win32 Task Manager private bytes.)  I am going to guess this is Mac-only.
I've definitely seen this happen on Windows 7. Running inside VMWare, fwiw.

STR:

1) load the URL and leave the tab visible
2) wait for 10-15 minutes
3) notice memory use soar

Step 2 is the tricky part. In an unmodified build I usually see memory use start climbing after 2.5-3 minutes. With that #define uncommented it can take quite a bit longer. This probably also depends on the FPS.
On x64-linux, I ran it on massif out to about 200 billion insns
(several minutes worth) and profiled both at the malloc heap level
and the process-wide (page) level.

For page level profiling, I can't see any leaking: the process size
pulls up quite rapidly to about 740MB and then stays absolutely at
that size for the rest of the run.

For heap profiling, basically it doesn't leak, but it isn't dead
flat either.  It assumes a modest sawtooth shape over time, with 
roughly one tooth every 50 billion insns.  I assume that this is 
either CC or GC dumping stuff periodically.  The heap component
that appears to be varying here is allocated by the following
chain

   js::mjit::stubs::SetElem<0>(js::VMFrame&)
      array_setProperty(JSContext*, JSObject*, long, 
          JSObject::ensureDenseArrayElements
              JSObject::growSlots
                  presumably calling malloc/calloc/new

with about 3.5MB allocation just before GC, falling to circa 500k
immediately after.

but I don't see how that can have cranked space use up into the
gigabytes range, unless GC is getting deferred indefinitely.

So basically I don't think I've found anything.  I'll see if I can
redo the same experiment on OSX.
Modify step 1 as such:

1) load the URL, click "Show Options" if need be, leave the tab visible.
I think I have a reduced testcase (basically the animated palette).
This simpler testcase shows the bug for me and bz, who has an idea (or two?) for fixes.
I just noticed that the testcase is missing an & 255 for the r value (or modulo 0x1000000 on j), not that anyone will get that far in any normal testing ;-)

For the curious, the preliminary patch (suggested by bz & dbaron) I'll be testing overnight:

diff --git a/layout/style/nsStyleSet.cpp b/layout/style/nsStyleSet.cpp
--- a/layout/style/nsStyleSet.cpp
+++ b/layout/style/nsStyleSet.cpp
@@ -1092,17 +1092,17 @@ nsStyleSet::NotifyStyleContextDestroyed(
   if (mInReconstruct)
     return;
 
-  if (mUnusedRuleNodeCount == kGCInterval) {
+  if (mUnusedRuleNodeCount >= kGCInterval) {
     GCRuleTrees();
   }
 }
(In reply to comment #25)
> [...] preliminary patch [...]

Do we have any specific, unambiguous profiler-based evidence about
exactly where the extra space is being allocated?

I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on
Massif, and didn't see any leakage.


(In reply to comment #16)
> OK, so it sounds like we need to diff two heaps here, basically...

We don't need anything that high-tech.  For a leak of this magnitude,
all we need is to be able to run it on Massif and reproduce the leak.
Then a minute or two peering at the data with massif-visualizer should
make it clear where the allocations are coming from.  This is what we
did with various JS engine leaks.
What bz saw is that in certain cases mUnusedRuleNodeCount can reach kGCInterval + 1 without ever being exactly kGCInterval when the check runs, so GCRuleTrees(), normally used to garbage collect nsRuleNodes, never gets called. Once that's happened, mUnusedRuleNodeCount will continue to grow, so no more GC for nsRuleNodes, at least until document tear-down time.

With the change from comment 25 I haven't been able to reproduce the problem so far.
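For reference, a stripped-down sketch of the trigger logic in question (only mUnusedRuleNodeCount, kGCInterval, GCRuleTrees() and NotifyStyleContextDestroyed() are names from the patch and comments; the surrounding structure is simplified, not the actual nsStyleSet):

  // Simplified sketch of the rule tree GC trigger; illustrative, not real code.
  static const unsigned kGCInterval = 300;   // GC threshold (300 per a later comment)

  struct ToyStyleSet {
    unsigned mUnusedRuleNodeCount = 0;

    // A rule node goes unused somewhere; no GC check happens on this path.
    void RuleNodeBecameUnused() { ++mUnusedRuleNodeCount; }

    // The GC check only runs when a style context is destroyed.
    void NotifyStyleContextDestroyed() {
      // Buggy form: fires only if the count lands exactly on the threshold.
      //   if (mUnusedRuleNodeCount == kGCInterval) { GCRuleTrees(); }
      // Patched form: also recovers after the count has overshot.
      if (mUnusedRuleNodeCount >= kGCInterval) {
        GCRuleTrees();
      }
    }

    void GCRuleTrees() {
      // ... reclaim unused rule nodes ...
      mUnusedRuleNodeCount = 0;
    }
  };

With the equality check, any path that bumps the count past kGCInterval between checks permanently disables rule tree GC for that document, which matches the explosion in live rule nodes described above.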
blocking2.0: --- → ?
> Do we have any specific, unambiguous profiler-based evidence about
> exactly where the extra space is being allocated?

"Maybe".  We have comment 17, which shows tons of allocations for rulenodes and style data.  But in this case that's expected.  What's also expected is that they would be freed; I don't know whether the comment 17 data excludes things that have been freed...

What we also have is me running the "simpler testcase" locally while logging the number of live rulenodes and seeing the number stay stable in the 1600-1800 range for a while, then explode into the millions.  That's definitely fixed by the proposed patch.

> I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on
> Massif, and didn't see any leakage.

This _could_ randomly happen, I suppose, if you happen to always resolve pseudo style at the right time.  Seems odd, though.  I assume "leakage" here would include "presshell arena gets really big", right?

> We don't need anything that high-tech.

Yeah, not for this case.  I was initially worried that the big leak might not be reproducible and hoping that looking at memory usage changes even without the big leak would give some hints.  In retrospect, it wouldn't have, though.
Blocks: 576726
Oh, and the other question that arose while debugging this is the issue of rulenodes not releasing their parents.  That was added in bug 514773 and we sort of discussed it at the time.  Doing that means that we gc whenever we get to 300 unused _leaves_ as opposed to 300 unused rulenodes.  That seems fine to me; if we switched to releasing parents when we transition to a zero refcount, we would need to bump the gc threshold, imo.

I'll add some comments explaining that, though.
Oh, and just for the record the issue was that probing for a pseudo style could increase the number of unused rulenodes (due to the restriction rule) without triggering StyleContextDestroyed.  Thus we would not attempt to gc the ruletree at that point.  If the probing increased the unused count to precisely kGCInterval, then later, when we did try to gc, the unused rulenode count would already be kGCInterval+1, so we would lose.
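To make that sequence concrete, a hypothetical walk-through (the numbers are invented for illustration; the threshold value follows the previous comment):

  // Hypothetical trace of the failure described above; not real engine code.
  #include <cstdio>

  static const unsigned kGCInterval = 300;

  int main() {
    unsigned unused = kGCInterval - 1;  // 299: just below the threshold

    // Pseudo-style probing adds an unused rule node (the restriction rule)
    // without triggering StyleContextDestroyed, so no GC check runs:
    ++unused;                           // exactly kGCInterval, unnoticed

    // By the time the check in NotifyStyleContextDestroyed next runs,
    // more rule nodes have gone unused:
    ++unused;                           // kGCInterval + 1 (or more)
    bool buggy   = (unused == kGCInterval);  // false: GC never fires again
    bool patched = (unused >= kGCInterval);  // true: GC recovers

    std::printf("buggy fires: %d, patched fires: %d\n", buggy, patched);
    return 0;
  }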
Assignee: nobody → bzbarsky
blocking2.0: ? → final+
Priority: -- → P1
Whiteboard: [softblocker]
Whiteboard: [softblocker] → [need review][softblocker]
Comment on attachment 504792 [details] [diff] [review]
Make sure we don't get into a situation where we're failing to gc the ruletree for a page.

r=dbaron
Attachment #504792 - Flags: review?(dbaron) → review+
Whiteboard: [need review][softblocker] → [need landing][softblocker]
Pushed http://hg.mozilla.org/mozilla-central/rev/92ca75d707ae
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [need landing][softblocker] → [softblocker]
Target Milestone: --- → mozilla2.0b10