Bug 625434
Opened 14 years ago
Closed 14 years ago
HTML5/Canvas demo ends up using lots of memory
Categories
(Core :: General, defect, P1)
Tracking
RESOLVED
FIXED
mozilla2.0b10
Tracking | Status
---|---
blocking2.0 | final+
People
(Reporter: jag+mozilla, Assigned: bzbarsky)
References
Details
(Keywords: memory-footprint, Whiteboard: [softblocker])
Attachments
(3 files)
Tested with 4.0b8 and a recent trunk build (both 64-bit) on Mac. If I load the page at http://www.effectgames.com/demos/canvascycle/?sound=0 and let it sit there for a while -- 20-30 minutes -- "Memory in use" steadily goes up from ~38MB to a few GB[0]. Closing the tab releases this memory; or at least, "Memory in use" goes back down to ~38MB.

Running the same test in Chrome 8, memory used for that tab fluctuates between 20MB and 75MB. I'm assuming here that 1) Chrome is running the same script Firefox is, and 2) the script isn't itself hanging onto stuff (which would imply Chrome is dropping things it shouldn't).

I've attached a chart showing various values I grabbed from about:memory once every minute. Not shown are the ones that remained steady during the test, most of which didn't seem relevant anyway. Here are a few that might be useful:

Name | before | during
---|---|---
gfx/image/surface | 266kB | 275kB
content/canvas/2d_pixel_bytes | 0B | 1.2MB
js/mjit-code | 85kB | 1.6MB -> 1.0MB
js/gc-heap | 5MB | ~79MB

As you can see from the chart, memory use doesn't really increase until after 2 minutes (roughly 2m24s ± 2s if my math is right), and "layout/all" accounts for about 40% of the growth.

[0] I've seen 4GB, but by then swapping has slowed everything down so much it takes a few minutes for the about:memory page to reload.
Reporter
Comment 1•14 years ago
Forgot to mention: the tab with the demo needs to be visible to trigger this. With both in the same window and the about:memory tab focused memory usage is steady. With the demo tab focused memory usage goes up as described. For my testing and convenience I had the about:memory tab open in a second window.
Assignee
Comment 2•14 years ago
Interesting. If you disable image discarding, do you see the increase even when the tab is not focused?
Reporter
Comment 3•14 years ago
> image.mem.discardable = false

No difference.

> layout.throttled_frame_rate = 60

This explains comment 1. With that set I now see the problem even with the demo tab not focused.
Assignee
Comment 4•14 years ago
OK, so that's just us not doing much restyling/layout in that background tab. Gotcha. I wonder whether we're seeing actual memory use in the presshell arena or fragmentation...
Reporter
Comment 5•14 years ago
I can reproduce this on Windows 7. If it is fragmentation, jemalloc isn't helping.
OS: Mac OS X → All
Assignee
Comment 6•14 years ago
I meant fragmentation in the frame arena. If you're compiling, you could try turning off frame arenas to test that...
Reporter
Comment 7•14 years ago
I uncommented the DEBUG_TRACEMALLOC_PRESARENA #define in nsPresArena.h. Now memory usage seems to be holding steady at about 50MB. I think you're on to something.
This sounds really bad. Is this a recent regression?
And does it depend on enabling/disabling GL acceleration?
Reporter
Comment 10•14 years ago
Hrm, so I let it run all night (in this case, roughly 7 hours) and memory usage is now up to 5GB (malloc/allocated, malloc/mapped, malloc/zone0/committed, malloc/zone0/allocated). "layout/all" is staying at 32kB, which I guess is to be expected with the above change.

Interestingly, the browser was quite responsive and reloaded the about:memory page fairly quickly (I blinked and missed it), as opposed to taking a few minutes. malloc vs. arenas? Not laying blame -- I don't think arenas were intended to handle such large allocation counts -- just curious.

roc: I have no idea whether it's recent. Let me run this test with GL acceleration flipped (any other low-hanging fruit?), then I'll see if I can get a regression range.
Assignee
Comment 11•14 years ago
So you're still getting the memory increase, but it's in a different bucket now?
Reporter
Comment 12•14 years ago
Yeah, I'm still seeing the memory increase, but it's now going straight to malloc instead of mFrameArena. Note that (with that #define commented out) "layout/all" only accounts for 40% of the memory leaked.

I checked 4.0b1 32-bit and 64-bit and could reproduce it there. I can't seem to trigger it on 3.6.13, though.
Seems like this should be possible to catch with valgrind/leakcheck -- I assume it should be reproducible on linux?
Assignee
Comment 14•14 years ago
That depends. jag, with arenas disabled do we still free the memory on tab close?
Reporter
Comment 15•14 years ago
Yeah, with frame arenas disabled we still free the memory on tab/window close. Just be aware that it may take a bit longer before you'll start to see memory used go up. Once it does though it'll keep going.
Assignee
Comment 16•14 years ago
OK, so it sounds like we need to diff two heaps here, basically...
Reporter
Comment 17•14 years ago
To stop the animation: CC.stop(); memory used stays level after that.

At roc's suggestion I ran malloc_history (with -all_by_count):

malloc_history Report Version: 2.0
Process: firefox-bin [89428]
Path: /Users/jag/moz-hg/ff-opt/dist/Minefield.app/Contents/MacOS/firefox-bin
Load Address: 0x100000000
Identifier: org.mozilla.minefield
Version: 4.0b10pre (4.0b10pre)
Code Type: X86-64 (Native)
Parent Process: bash [89314]
Date/Time: 2011-01-14 21:42:17.663 +0100
OS Version: Mac OS X 10.6.6 (10J567)
Report Version: 6

7315120 calls for 370931678 bytes: thread_7fff70f12ca0 |moz_xmalloc | malloc | malloc_zone_malloc
1450060 calls for 104404320 bytes: thread_7fff70f12ca0 |nsRuleNode::Transition(nsIStyleRule*, unsigned char, unsigned char) | malloc | malloc_zone_malloc
1450024 calls for 162402688 bytes: thread_7fff70f12ca0 |nsRuleNode::ComputeBackgroundData(void*, nsCSSStruct const&, nsStyleContext*, nsRuleNode*, nsRuleNode::RuleDetail, int) | malloc | malloc_zone_malloc
5102 calls for 2005539 bytes: thread_7fff70f12ca0 |JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) | malloc | malloc_zone_malloc
Reporter
Comment 18•14 years ago
Ah, and -all_by_size is of course what I was looking for. Adding these lines to cover everything over 1MB:

155 calls for 35096320 bytes: thread_7fff70f12ca0 |ChangeTable | malloc | malloc_zone_malloc
199 calls for 3675729 bytes: thread_7fff70f12ca0 |JS_ArenaAllocate | malloc | malloc_zone_malloc
1475 calls for 3241752 bytes: thread_7fff70f12ca0 |JSObject::allocSlots(JSContext*, unsigned long) | malloc | malloc_zone_malloc
3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc
3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc
3 calls for 1572984 bytes: thread_7fff70f12ca0 |js::InitJIT(js::TraceMonitor*) | operator new(unsigned long) | malloc | malloc_zone_malloc
1228 calls for 1405624 bytes: thread_7fff70f12ca0 |XPT_ArenaMalloc | calloc | malloc_zone_calloc
3331 calls for 1229176 bytes: thread_7fff70f12ca0 |moz_xrealloc | realloc | malloc_zone_realloc
2 calls for 1228816 bytes: thread_7fff70f12ca0 |gfxQuartzSurface::gfxQuartzSurface(gfxSize const&, gfxASurface::gfxImageFormat, int) | _moz_cairo_quartz_surface_create | malloc | malloc_zone_malloc
1 calls for 1228800 bytes: thread_7fff70f12ca0 |js::ArrayBuffer::create(JSContext*, unsigned int, js::Value*, js::Value*) | calloc | malloc_zone_calloc
So this doesn't seem to leak on Win32; looking at the code, it creates one ImageData object and reuses it. (Watching Win32 task manager private bytes.) I'm going to guess this is Mac-only.
Reporter
Comment 20•14 years ago
I've definitely seen this happen on Windows 7. Running inside VMWare, fwiw.

STR:
1) Load the URL and leave the tab visible.
2) Wait for 10-15 minutes.
3) Notice memory use soar.

Step 2 is the tricky part. In an unmodified build I usually see memory use start climbing after 2.5-3 minutes. With that #define un-commented it can take quite a bit longer. This probably also depends on the FPS.
Comment 21•14 years ago
On x64-linux, I ran it on massif out to about 200 billion insns (several minutes' worth) and profiled both at the malloc heap level and the process-wide (page) level.

For page-level profiling, I can't see any leaking: the process size pulls up quite rapidly to about 740MB and then stays absolutely at that size for the rest of the run.

For heap profiling, basically it doesn't leak, but it isn't dead flat either. It assumes a modest sawtooth shape over time, with roughly one tooth every 50 billion insns. I assume that this is either CC or GC dumping stuff periodically. The heap component that appears to be varying here is allocated by the following chain:

js::mjit::stubs::SetElem<0>(js::VMFrame&)
array_setProperty(JSContext*, JSObject*, long, ...)
JSObject::ensureDenseArrayElements
JSObject::growSlots

presumably calling malloc/calloc/new with about 3.5MB allocated just before GC, falling to circa 500k immediately after. But I don't see how that can have cranked space use up to the gigabytes range, unless GC is getting deferred indefinitely.

So basically I don't think I've found anything. I'll see if I can redo the same experiment on OSX.
Reporter
Comment 22•14 years ago
Modify step 1 as follows: 1) Load the URL, click "Show Options" if need be, and leave the tab visible.
Reporter
Comment 23•14 years ago
I think I have a reduced testcase (basically the animated palette).
Reporter
Comment 24•14 years ago
This simpler testcase shows the bug for me and bz, who has an idea (or two?) for fixes.
Reporter
Comment 25•14 years ago
I just noticed that the testcase is missing an "& 255" for the r value (or a modulo 0x1000000 on j), not that anyone will get that far in any normal testing ;-)

For the curious, the preliminary patch (suggested by bz & dbaron) I'll be testing overnight:

diff --git a/layout/style/nsStyleSet.cpp b/layout/style/nsStyleSet.cpp
--- a/layout/style/nsStyleSet.cpp
+++ b/layout/style/nsStyleSet.cpp
@@ -1092,17 +1092,17 @@ nsStyleSet::NotifyStyleContextDestroyed(
   if (mInReconstruct)
     return;

-  if (mUnusedRuleNodeCount == kGCInterval) {
+  if (mUnusedRuleNodeCount >= kGCInterval) {
     GCRuleTrees();
   }
 }
Comment 26•14 years ago
(In reply to comment #25)
> [...] preliminary patch [...]

Do we have any specific, unambiguous profiler-based evidence about exactly where the extra space is being allocated? I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on Massif, and didn't see any leakage.

(In reply to comment #16)
> OK, so it sounds like we need to diff two heaps here, basically...

We don't need anything that high-tech. For a leak of this magnitude, all we need is to be able to run it on Massif and reproduce the leak. Then a minute or two peering at the data with massif-visualizer should make it clear what the allocation stuff is. This is what we did with various JS engine leaks.
Reporter
Comment 27•14 years ago
What bz saw is that in certain cases mUnusedRuleNodeCount can be kGCInterval + 1, so that GCRuleTrees(), normally used to garbage collect nsRuleNodes, wouldn't get called. Once that's happened, mUnusedRuleNodeCount will continue to grow, so no more GC for nsRuleNodes, at least until document tear-down time. With the change from comment 25 I haven't been able to reproduce the problem so far.
Updated•14 years ago
blocking2.0: --- → ?
Assignee
Comment 28•14 years ago
> Do we have any specific, unambiguous profiler-based evidence about
> exactly where the extra space is being allocated?

"Maybe". We have comment 17, which shows tons of allocations for rulenodes and style data. But in this case that's expected. What's also expected is that they would be freed; I don't know whether the comment 17 data excludes things that have been freed...

What we also have is me running the "simpler testcase" locally while logging the number of live rulenodes and seeing the number stay stable in the 1600-1800 range for a while, then explode into the millions. That's definitely fixed by the proposed patch.

> I ran the original URL on M-C on OSX 10.6 (64-bit) overnight on
> Massif, and didn't see any leakage.

This _could_ randomly happen, I suppose, if you happen to always resolve pseudo style at the right time. Seems odd, though. I assume "leakage" here would include "presshell arena gets really big", right?

> We don't need anything that high-tech.

Yeah, not for this case. I was initially worried that the big leak might not be reproducible, and was hoping that looking at memory-usage changes even without the big leak would give some hints. In retrospect, it wouldn't have, though.
Assignee
Comment 29•14 years ago
Oh, and the other question that arose while debugging this is the issue of rulenodes not releasing their parents. That was added in bug 514773 and we sort of discussed it at the time. Doing that means that we gc whenever we get to 300 unused _leaves_ as opposed to 300 unused rulenodes. That seems fine to me; if we switched to releasing parents when we transition to a zero refcount, we would need to bump the gc threshold, imo. I'll add some comments explaining that, though.
Assignee
Comment 30•14 years ago
Oh, and just for the record the issue was that probing for a pseudo style could increase the number of unused rulenodes (due to the restriction rule) without triggering StyleContextDestroyed. Thus we would not attempt to gc the ruletree at that point. If the probing increased the unused count to precisely kGCInterval, then later, when we did try to gc, the unused rulenode count would already be kGCInterval+1, so we would lose.
Assignee
Updated•14 years ago
Assignee: nobody → bzbarsky
blocking2.0: ? → final+
Priority: -- → P1
Assignee
Comment 31•14 years ago
Attachment #504792 - Flags: review?(dbaron)
Updated•14 years ago
Whiteboard: [softblocker]
Assignee
Updated•14 years ago
Whiteboard: [softblocker] → [need review][softblocker]
Comment on attachment 504792 [details] [diff] [review]
Make sure we don't get into a situation where we're failing to gc the ruletree for a page.

r=dbaron
Attachment #504792 - Flags: review?(dbaron) → review+
Assignee
Updated•14 years ago
Whiteboard: [need review][softblocker] → [need landing][softblocker]
Assignee
Comment 33•14 years ago
Pushed http://hg.mozilla.org/mozilla-central/rev/92ca75d707ae
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [need landing][softblocker] → [softblocker]
Target Milestone: --- → mozilla2.0b10