Closed
Bug 657115
Opened 14 years ago
Closed 7 years ago
Extensive memory usage during ROME demo
Categories
(Core :: Graphics: CanvasWebGL, defect)
Core
Graphics: CanvasWebGL
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: WonderCsabo, Unassigned)
References
()
Details
(Whiteboard: [MemShrink:P2][see comment 54])
Attachments
(4 files)
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110512 Firefox/6.0a1
The memory usage during the interactive film is terribly huge. It's far worse than Google Chrome 12.
Reproducible: Always
Steps to Reproduce:
1. Open the webpage
2. Play the demo
3. Check the memory usage
Reporter | ||
Updated•14 years ago
|
Version: unspecified → Trunk
Here's the link to the demo: http://www.ro.me/
Reporter | ||
Comment 2•14 years ago
|
||
I already provided it in the URL section. ;)
I have also noticed that after I have closed tab with that Demo, Firefox was doing something strange:
Memory usage was rising up to 2GB, then went down to 1,5GB and this behavior started to loop with some "rest time"
Comment 4•14 years ago
|
||
Related to Bug 656761 and thus Bug 652931?
Comment 5•14 years ago
|
||
Ah wait, Bug 652931 landed before your tested Nightly Build ...
Confirmed for Firefox 4.0.1 on OS X 10.6.7.
I ran the demo on my MacBook Pro with 4 GB RAM. I can't give any concrete numbers about the RAM usage though as after some time the whole OS become unresponsive to the point that I had real trouble killing the Firefox process. The demo runs fine with Chrome 11 where it uses about 700-800 RAM according to its internal task manager.
Confirmed on :::: Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110515 Firefox/6.0a1 #1305448436
Comment 10•14 years ago
|
||
I have some observations here (NoScript forums):
"Nightly 6.0a1 + NS 2.1.0.3: memory leak"
http://forums.informaction.com/viewtopic.php?f=10&t=6467
(Note that my methodology was not correct, nevertheless ...)
Other "ROME" Bugs:
Bug 657118 - Several problems with ROME demo's tech page
Bug 657151 - Long GC pause time caused by JSGC_FINALIZE_END callback
Reporter | ||
Comment 11•14 years ago
|
||
I tried with fresh profile when tested the memory usage. It's not a NoScript problem.
Reporter | ||
Comment 12•14 years ago
|
||
bjacob, any thoughts?
Comment 13•14 years ago
|
||
We are leaking here. STR is to run the demo til it finishes (run it in a background tab if it crashes due to OOM otherwise), then see that after closing the tab, even GC+GC does not clear the memory.
Longer version:
After preloading (that is, right before pressing BEGIN for the demo to begin), about:memory shows 650MB in js gc heap and 530MB in heap-unclassified (video stuff?). resident is 1.3GB and vsize is 2GB.
During the demo js gc rises to 750GB or higher. vsize rises to over 2.2GB, at which point the demo usually errors on out of memory on my 4GB laptop.
If I run the demo in another tab, it uses less memory (less frames since we throttle timers, rendering, etc.). In this case my laptop survives until the end of the demo.
*However*, it is left in a very bad state: even after closing the ro.me tab, memory usage only decreases a tiny bit (to 1.4GB resident, 2.2GB virtual). Every few seconds the browser locks up entirely for a few seconds, breaking shows this is in cycle collection. But nothing gets cleaned up, so there is a leak here.
It seems likely that we were leaking all through the demo then, which can explain the excessive memory usage during the demo.
njn, a side issue that I hope you can clarify: Why is virtual memory usage so much higher than resident here? I suspect that running out of virtual memory is why it OOMs on my 4GB laptop (since there is 1GB of physical memory left, + swap).
Blocks: mlk-fx7
Updated•14 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 14•14 years ago
|
||
Sorry I've still not finished the about:memory support for WebGL patch..
Just a naive question, if we were leaking video memory, could that explain the discrepancy between virtual and resident pointed out in comment 13?
Comment 15•14 years ago
|
||
If I just preload, and do not actually start the demo, I mostly OOM during CC (memory rises before it goes down, I guess). However, I succeeded once to run CC to completion, and memory usage went all the way down (note that this doesn't mean there isn't a leak later - this was just preloading, not running the actual demo as before).
I got the attached output during the successful CC.
1. "ASSERTION: Wrong element being closed" appears a lot. No idea if important or not.
2. WebGL reports a lot of 'Outstanding Textures' and other items, in a very lengthy output (I shortened it here, [MANY MANY NUMBERS] etc. were very very long lists of numerical indexes).
Comment 16•14 years ago
|
||
Also, 57 (!) threads exit as we do that CC after preload.
Comment 17•14 years ago
|
||
After the demo preloads, we have 50 threads doing media stuff, half nsBuiltinDecoderStateMachine::DecodeLoop and half nsBuiltinDecoderStateMachine::AdvanceFrame (see stacks below). There are also a few audio threads (nsBuiltinDecoderStateMachine::AudioLoop and nsOggReader::DecodeAudioData). All these threads stay active until the page is closed and CC is done.
kinetik, is this amount of threads expected?
I recall hearing that Linux threads take a lot of virtual memory for their stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then these 50 threads can explain 400MB of virtual memory - about half of the difference between virtual and resident memory.
Stacks for the two types of threads (each appearing 25 times):
=====
#2 0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295)
at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417
#3 0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614
#4 0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295)
at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346
#5 0xb664f2dc in mozilla::ReentrantMonitorAutoEnter::Wait (this=0x734fa580) at ../../dist/include/mozilla/ReentrantMonitor.h:224
#6 nsBuiltinDecoderStateMachine::DecodeLoop (this=0x734fa580)
at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:380
#7 0xb665490e in nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run (this=0xace2d5c0)
at ../../dist/include/nsThreadUtils.h:341
#8 0xb7376398 in nsThread::ProcessNextEvent (this=0x75aff700, mayWait=1, result=0x5e6ff23c)
at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
====
#2 0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295)
at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417
#3 0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614
#4 0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295)
at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346
#5 0xb664f9d8 in nsBuiltinDecoderStateMachine::AdvanceFrame (this=0x734fa580)
at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1544
#6 0xb6653215 in nsBuiltinDecoderStateMachine::Run (this=0x734fa580)
at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1141
#7 0xb7376398 in nsThread::ProcessNextEvent (this=0x75afe880, mayWait=1, result=0x5f0ff23c)
at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
====
Side comment: Using gdb and breaking on NS_NewThread and seeing each thread as it is created will consistently hang X for me. Much fun.
Comment 18•14 years ago
|
||
(In reply to comment #17)
> kinetik, is this amount of threads expected?
For each playing element there will be at least a decode thread (DecodeLoop), a state machine thread (AdvanceFrame), and an audio thread (AudioLoop). Behind the scenes, there's also another audio thread created by ALSA on Linux for every stream we open. OS X and Windows tend to have only a single audio thread for all open streams.
> I recall hearing that Linux threads take a lot of virtual memory for their
> stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then
> these 50 threads can explain 400MB of virtual memory - about half of the
> difference between virtual and resident memory.
The default stack size is 8MB:
~% ulimit -Sa
-s: stack size (kbytes) 8192
Last I checked (mainly looking for calls to pthread_attr_setstacksize), I couldn't see anywhere that we change that at runtime.
> Side comment: Using gdb and breaking on NS_NewThread and seeing each thread
> as it is created will consistently hang X for me. Much fun.
Export GDK_DEBUG=nograbs in your environment before running the browser.
Comment 19•14 years ago
|
||
(In reply to comment #17)
> After the demo preloads, we have 50 threads doing media stuff
That sounds reminiscent of bug 592833. Mark as blocking this one?
Comment 20•14 years ago
|
||
azakai, it's entirely typical for the vsize to be a lot bigger than the resident; take a look at about:memory on some more typical workloads. So don't focus on that, vsize isn't a very useful metric.
This is a good bug report, though, sounds like a big problem.
Comment 21•14 years ago
|
||
> > Side comment: Using gdb and breaking on NS_NewThread and seeing each thread
> > as it is created will consistently hang X for me. Much fun.
>
> Export GDK_DEBUG=nograbs in your environment before running the browser.
Thanks, I did not know that!
Comment 22•14 years ago
|
||
This demo suffers from bug 656120 as well. Closing the tab is not always enough for memory to be reclaimed, but forcing a GC is (except when the leak mentioned earlier in this bug happens).
Depends on: 656120
Comment 23•14 years ago
|
||
The leak I saw, where GC takes a very long time and doesn't succeed, is apparently due to the cycle collector itself OOMing, which prevents it from working properly,
WARNING: out of memory: file /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp, line 210
#0 nsDeque::Push (this=0xb2dfe038, aItem=0x3bdac0f4) at /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp:211
#1 0xb739d176 in GraphWalker<scanVisitor>::DoWalk (this=0xb2dfe06c, aQueue=...)
at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1271
#2 0xb73993ec in GraphWalker<scanVisitor>::WalkFromRoots (this=0xb3f85800)
at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1254
#3 nsCycleCollector::ScanRoots (this=0xb3f85800) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1850
#4 0xb739b584 in nsCycleCollector::BeginCollection (this=0xb3f85800, aForceGC=0, aListener=0x0)
at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:2651
#5 0xb739dd6a in nsCycleCollectorRunner::Run (this=0xb3fe0d80) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:3327
#6 0xb7376398 in nsThread::ProcessNextEvent (this=0xb3f06e00, mayWait=1, result=0xb2dfe23c)
at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618
#7 0xb72e2c8f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1) at /home/alon/Dev/m-c/xpcom/build/nsThreadUtils.cpp:245
#8 0xb7376e50 in nsThread::ThreadFunc (arg=0xb3f06e00) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:273
#9 0xb539567c in _pt_root (arg=0xb3f117f0) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:187
#10 0xb7fb996e in start_thread (arg=0xb2dfeb70) at pthread_create.c:300
#11 0xb519aa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Perhaps we should abort when OOMing in the cycle collector? It might work out in other cases though, so I guess not.
So in summary, the main issues in this bug are
1. Many threads for media elements (bug 592833). A contributing factor to memory issues.
2. GC does not always run, memory which could be reclaimed is not reclaimed (bug 656120).
3. In general, higher JS memory usage than chrome which hits us in some cases, and I suspect here.
Updated•14 years ago
|
Whiteboard: [MemShrink:P1]
Comment 24•14 years ago
|
||
Oh, another thing worth noting about vsize is that you'll see 64MB of address space allocated per open audio stream due to bug 617852.
Updated•14 years ago
|
Assignee: nobody → jmuizelaar
Updated•14 years ago
|
Assignee: jmuizelaar → nobody
Updated•14 years ago
|
OS: Windows 7 → All
Hardware: x86 → All
Updated•14 years ago
|
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
Comment 25•14 years ago
|
||
azakai, now that bug 656120 is fixed can you try the demo again and see if things have improved?
Comment 26•14 years ago
|
||
Overall memory use seems the same, perhaps slightly better - I max out at 2.0 GB instead of 2.2. It still OOMs sometimes. There are more GC pauses which I guess is to be expected. They are now very noticeable in some parts of the demo.
The leak at the end seems better or maybe I was just lucky. But closing the tab led to a several-minute-long GC which eventually did work, and memory went almost all the way back down (to 125MB resident, I guess higher than originally due to fragmentation etc.).
So there seems like there is a change (less memory, more GC pauses), but GC not being done was just one of the issues here.
Comment 27•14 years ago
|
||
Yeah the problems during this demo can't be solved with an "idle" GC :)
Comment 28•14 years ago
|
||
I looked a little bit deeper into the GC performance here.
For the 18 GCs we perform during this demo, 3 are coming from the new GC trigger.
As you mentioned, the working set for this demo is huge. About 370MB remains reachable after each GC. 95% of our pause time we spend in marking.
An average GC takes about 450 - 500ms which is definitely too long. Our max pause time is 600ms.
So until we have incremental and generational GC in place, the parallel marking patch should be a nice improvement here.
It's a chrome experiment and with their current canary version (where they fixed the typed array bug they had for bad GC performance in FOTN) I also see 2 GCs that take about 1.5sec and one GC that takes 700ms almost at the end of the demo. But they keep their heap much smaller and perform many short GCs.
Comment 29•14 years ago
|
||
Gregor, did you see the minutes-long GC that happens after the demo is done? Any idea why that one is so bad?
STR: Run the demo to completion. When it's all finished, close the tab and wait. The browser will start to GC soon after, and it completely locks up and is unresponsive for several minutes until GC finally concludes.
Comment 30•14 years ago
|
||
(In reply to comment #29)
> Gregor, did you see the minutes-long GC that happens after the demo is done?
> Any idea why that one is so bad?
>
> STR: Run the demo to completion. When it's all finished, close the tab and
> wait. The browser will start to GC soon after, and it completely locks up
> and is unresponsive for several minutes until GC finally concludes.
Yes once I close the animation I get a 800ms pause time caused by JSGC_FINALIZE_END callback. I am not an expert about this bottleneck. Maybe Andrew knows more.
Comment 31•14 years ago
|
||
From reading the comments, it looks like that is where a wrapper for a JS object held by XPCOM will be released (and other JS bric-a-brac XPCOM cares about). I guess if it sets up a huge hierarchy of things that hook into XPCOM then when you tear it down there's a lot to collect.
Comment 32•14 years ago
|
||
Could that be the WebGL context, with all its associated WebGL objects such as textures? If you look at the terminal during that pause, on a debug build, do you see stuff about WebGL?
Comment 33•14 years ago
|
||
I don't have a debug build handy now, but I definitely saw a lot of WebGL debug output before if I CC'd after preload (which isn't the same, but maybe close enough). So it makes a lot of sense I think that lots of WebGL objects are a part of that huge CC at the end.
Comment 34•14 years ago
|
||
OK. Note that unreferenced WebGL objects being GC'd all together surprisingly late is bug 617453 but that's 100 miles above my understanding of things.
Upon WebGL context destructoin, I wonder if just "leaking" WebGL texture/buffers, instead of destroying them one by one, would make that faster. They would be destroyed anyway when the context gets destroyed.
Comment 35•14 years ago
|
||
In bug 638549 I've made patches that add WebGL accounting in about:support.
Try builds are ready at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-c2df6455c534/
Can you please try this demo in these builds, and go to about:memory and check the webgl entries there, at the bottom under the 'Other Measurements' category. They only appear if webgl objects are in memory. If you don't see webgl there, that means that no webgl objects are currently in memory.
Comment 36•14 years ago
|
||
Nice, the WebGL info in about:memory looks like it will be very useful!
Here is some data:
Preload
2,070.06 MB -- vsize
1,284.00 MB -- heap-committed
1,240.23 MB -- resident
1,212.10 MB -- heap-used
642.00 MB -- js-gc-heap
96.10 MB -- webgl-buffer-memory
71.90 MB -- heap-unused
52.22 MB -- webgl-texture-memory
2.30 MB -- heap-dirty
0.60 MB -- canvas-2d-pixel-bytes
0.04 MB -- gfx-surface-image
764,691 -- soft-page-faults
6,602 -- webgl-buffer-count
35 -- webgl-texture-count
27 -- hard-page-faults
4 -- webgl-context-count
During
2,082.72 MB -- vsize
1,401.00 MB -- heap-committed
1,356.87 MB -- resident
1,328.33 MB -- heap-used
755.00 MB -- js-gc-heap
97.27 MB -- webgl-buffer-memory
72.67 MB -- heap-unused
68.81 MB -- webgl-texture-memory
2.64 MB -- heap-dirty
0.60 MB -- canvas-2d-pixel-bytes
0.04 MB -- gfx-surface-image
1,833,404 -- soft-page-faults
6,890 -- webgl-buffer-count
55 -- webgl-texture-count
40 -- hard-page-faults
4 -- webgl-context-count
After demo, before closing tab
2,066.62 MB -- vsize
1,441.00 MB -- heap-committed
1,387.55 MB -- resident
1,365.83 MB -- heap-used
780.00 MB -- js-gc-heap
99.49 MB -- webgl-buffer-memory
76.77 MB -- webgl-texture-memory
75.17 MB -- heap-unused
3.16 MB -- heap-dirty
0.60 MB -- canvas-2d-pixel-bytes
0.04 MB -- gfx-surface-image
2,105,187 -- soft-page-faults
7,113 -- webgl-buffer-count
65 -- webgl-texture-count
40 -- hard-page-faults
4 -- webgl-context-count
After closing tab and massive multi-minute GC
619.40 MB -- vsize
297.00 MB -- heap-committed
248.78 MB -- heap-unused
127.62 MB -- resident
48.21 MB -- heap-used
21.00 MB -- js-gc-heap
2.46 MB -- heap-dirty
0.04 MB -- gfx-surface-image
0.00 MB -- canvas-2d-pixel-bytes
2,220,593 -- soft-page-faults
40 -- hard-page-faults
Looks like there is WebGL stuff in memory that is cleaned up by the final GC.
Comment 37•14 years ago
|
||
Thanks. I don't really see something abnormal here. WebGL textures and buffers using 175M of memory in a very large demo like this sounds reasonable and doesn't explain the 2G memory usage so far. I'll update this bug as WebGL accounting in about:memory becomes more complete.
Comment 38•14 years ago
|
||
> 764,691 -- soft-page-faults
> 6,602 -- webgl-buffer-count
> 35 -- webgl-texture-count
> 27 -- hard-page-faults
> 4 -- webgl-context-count
This is unrelated to this bug... sorting the "counts" events by size doesn't seem useful, alphabetical order would be better. In which case, hard-page-faults and soft-page-faults would be better renamed as page-faults-hard and page-faults-soft.
Comment 39•14 years ago
|
||
Filed bug 668893 on the ordering.
Comment 40•14 years ago
|
||
New tryserver builds, with expanded WebGL about:memory coverage, will soon be available at:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-087d304b9757
Reporter | ||
Comment 41•14 years ago
|
||
It seems GC is working, it cleans everything after the page is closed. http://pastebin.com/wDTgN8SX Although, the memory usage is still horrible during the demo.
Comment 42•13 years ago
|
||
Tried again today, it's using 2G of heap out of which only 8M fall in WebGL heap categories. It's using 150M of (non-heap) GL memory for WebGL objects.
May I suggest that the next thing that needs to get instrumented with about:memory reporters, is the media stuff. There's a ton of video playing in this demo.
Comment 43•13 years ago
|
||
Filed bug 674343 about the fact that we spend 10% of our time in the GC during this demo.
Comment 44•13 years ago
|
||
You could try seeing if the builds in this directory have lower heap usage in the ro.me demo:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cpearce@mozilla.com-2e106a76a194/
These builds change the amount of memory reserved for the video decoder's thread stacks to be smaller, in particular drastically smaller on Linux.
Comment 45•13 years ago
|
||
> May I suggest that the next thing that needs to get instrumented with
> about:memory reporters, is the media stuff. There's a ton of video playing
> in this demo.
That's bug 672755.
Comment 46•13 years ago
|
||
I ran this demo just now, with a recentish but (I think) pre-objShrink build. I looked at memory usage at a few random points. The heap-unclassified story at least looks a lot better than in comment 13:
opening video part: 1.2gigs total, 10% heap unclassified, 133mb
city part: 1.9gigs total, 9% heap unclassified (167mb)
desert part: 1.5gigs total, 10.36% heap unclassified (152mb)
Comment 47•13 years ago
|
||
I guess I should have copied the part at the bottom, too, but I forgot. 83% of memory usage of JS, so it would be worth seeing what it looks like post-ObjShrink.
Comment 48•13 years ago
|
||
> I guess I should have copied the part at the bottom, too, but I forgot. 83%
> of memory usage of JS, so it would be worth seeing what it looks like
> post-ObjShrink.
Definitely -- you've got 985MB of non-function objects in the ro.me compartment, so that should more than halve (the exact change depends on whether this is a 32-bit or 64-bit build).
Comment 49•13 years ago
|
||
On nightly, in the desert part, it was about 1.2gigs of memory, with 12% (141mb) heap-unclassified. Memory went down to 80mb after I closed it down.
GC pause times were about 280 to 350ms. They were almost all global GCs, except a few compartmental ones near the start.
Comment 50•13 years ago
|
||
Comment 51•13 years ago
|
||
azakai, do you think we've done enough to close this bug? If the GC/CC pauses are still too big, seems like that should be spun of as a separate bug.
Comment 52•13 years ago
|
||
Note that important changes about WebGL objects referencing each other landed earlier this week, so I recommend using a nightly build no older than 2 days old.
Comment 53•13 years ago
|
||
I updated the Nightly I used before I ran the demo, so it should be pretty new.
Comment 54•13 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #51)
> azakai, do you think we've done enough to close this bug? If the GC/CC
> pauses are still too big, seems like that should be spun of as a separate
> bug.
I don't think we should close this. We still take 2-3X more memory than Chrome. Some numbers from latest nightly:
1.1GB to preload. Chrome takes 614MB.
1.6GB maximum during the city scene, and huge lag/nonresponsiveness. Chrome takes 634MB there and is smooth.
1.3GB in the desert scene. Chrome takes 672MB.
On the plus side, all memory is reclaimed once the tab is closed, after 10-15 seconds. However, during that there is a huge multisecond GC where we reach 1.6GB.
Updated•13 years ago
|
Whiteboard: [MemShrink:P2] → [MemShrink:P2][see comment 54]
Reporter | ||
Comment 55•13 years ago
|
||
I repeated the test for this bug. azakai's numbers are still valid, Fx uses much more memory than Chrome.
Comment 56•12 years ago
|
||
I can confirm those numbers for Firefox Aurora 16.0a2 (2012-08-14) / Windows 7 64 bit / 8GB RAM / NVIDIA 8800GT / Intel Core 2 Quad Q6600 2.40 GHz. Firefox is also choppy with pauses in the city - Chrome runs smoothly. GC pauses?
Comment 57•12 years ago
|
||
Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Aurora 16.0a2 (2012-08-14)the numbers are:
1.18GB to preload. Chrome takes 880MB (3 processes)
1.72GB maximum during the city scene. Chrome takes 934MB
1.5GB in the desert scene. Chrome takes 1007MB
Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause.
Comment 58•12 years ago
|
||
The GC output after the window was closed after comment 57 was:
CC(T+560.6) duration: 7ms, suspected: 313, visited: 1218 RCed and 4535 GCed, collected: 290 RCed and 0 GCed (290 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, removed: 1097
GC(T+563.1) Total Time: 354.6ms, Compartments Collected: 161, Total Compartments: 161, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 316.6ms, Allocated: 685MB, +Chunks: 0, -Chunks: 0
Slice: 0, Pause: 37.5 (When: 0.0ms, Reason: PAGE_HIDE): Mark: 31.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms
Slice: 2, Pause: 316.6 (When: 239.2ms, Reason: INTER_SLICE_GC): Mark: 303.6ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms
Totals: Mark: 335.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms
CC(T+569.3) duration: 224ms, suspected: 334, visited: 10578 RCed and 3896 merged GCed, collected: 9212 RCed and 1008 GCed (10220 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 1 ms, total: 3 ms, removed: 418
GC(T+573.9) Total Time: 372.7ms, Compartments Collected: 160, Total Compartments: 160, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 338.8ms, Allocated: 684MB, +Chunks: 0, -Chunks: 1
Slice: 0, Pause: 33.4 (When: 0.0ms, Reason: CC_WAITING): Mark: 32.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms
Slice: 2, Pause: 338.8 (When: 234.1ms, Reason: INTER_SLICE_GC): Mark: 2.6ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms
Totals: Mark: 35.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms
CC(T+574.0) duration: 7ms, suspected: 264, visited: 1304 RCed and 2888 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 108
Comment 59•12 years ago
|
||
During the city scene in the run from comment 57 the GC debug follows. There's a couple of non incremental GC, and the max pauses are still around 100ms. The experience is still slower and more stuttery than Chrome.
GC(T+159.1) Total Time: 481.0ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 126.3ms, Nonincremental Reason: allocation trigger, Allocated: 828MB, +Chunks: 231, -Chunks: 0
Slice: 0, Pause: 39.8 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.4ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms
Slice: 32, Pause: 126.3 (When: 2521.6ms, Reason: ALLOC_TRIGGER): Mark: 90.4ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms
Totals: Mark: 440.2ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms
CC(T+160.7) duration: 8ms, suspected: 51, visited: 573 RCed and 3463 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 4 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 460
GC(T+165.6) Total Time: 521.6ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 112.9ms, Nonincremental Reason: allocation trigger, Allocated: 946MB, +Chunks: 135, -Chunks: 0
Slice: 0, Pause: 40.5 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.8ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms
Slice: 36, Pause: 12.5 (When: 2869.7ms, Reason: REFRESH_FRAME): Mark: 12.4ms, Mark Delayed: 7.4ms
Slice: 37, Pause: 112.9 (When: 2916.6ms, Reason: ALLOC_TRIGGER): Mark: 58.9ms, Mark Delayed: 18.1ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms
Totals: Mark: 461.7ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms, Mark Delayed: 26.9ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms
GC(T+172.5) Total Time: 534.1ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 64.4ms, Allocated: 968MB, +Chunks: 16, -Chunks: 0
Slice: 0, Pause: 41.2 (When: 0.0ms, Reason: MAYBEGC): Mark: 39.6ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms
Slice: 36, Pause: 12.1 (When: 2494.4ms, Reason: REFRESH_FRAME): Mark: 12.0ms, Mark Delayed: 6.2ms
Slice: 38, Pause: 13.6 (When: 2625.9ms, Reason: REFRESH_FRAME): Mark: 13.4ms, Mark Delayed: 7.7ms
Slice: 43, Pause: 64.4 (When: 2955.6ms, Reason: REFRESH_FRAME): Mark: 2.1ms, Mark Delayed: 0.5ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms
Totals: Mark: 465.3ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms, Mark Delayed: 24.9ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms
CC(T+172.8) duration: 6ms, suspected: 367, visited: 884 RCed and 3471 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 75
GC(T+178.8) Total Time: 475.4ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 74.0ms, Allocated: 958MB, +Chunks: 0, -Chunks: 0
Slice: 0, Pause: 47.4 (When: 0.0ms, Reason: MAYBEGC): Mark: 45.5ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms
Slice: 36, Pause: 74.0 (When: 1325.5ms, Reason: REFRESH_FRAME): Mark: 1.1ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms
Totals: Mark: 396.7ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms, Mark Delayed: 0.4ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms
Comment 60•12 years ago
|
||
Thanks for the information! GC pauses are a slightly separate issue from memory usage, but I guess we can leave it here for now. Some of these problems should be fixed in Nightly.
Comment 58: This is the teardown, and it is clearly dominated by a ton of C++ <-> JS references.
First GC is dominated by a 303ms Mark Gray.
Second GC is dominated by a 277ms End Callback, which should be fixed by bug 743112.
Comment 59:
First GC: Main problem is a 90ms Mark phase, I guess due to falling behind?
Second GC: about 60ms of Mark again, plus 50ms of Sweep Object. The latter should be fixed by bug 729760.
Third GC: Main problem is Sweep Object, which as before should be fixed in Nightly.
Fourth GC: Mark Roots takes 35ms in the first slice. Last slice is Sweep Object again.
Comment 62•12 years ago
|
||
Updated Mac stats for Aurora Firefox 17.0a2 (2012-09-03) :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM the numbers are (from MemChaser now):
1.21GB to preload.
1.61GB maximum during the city scene. That's down about 100MB.
On a clean Firefox start the GC situation (from MemChaser this time) is much nicer, although 58.7ms is still a very long time to add to a frame of graphics:
Resident 1606MB: IGC: 58.7ms (6.0s) CC: 3.5ms (12.1s)
Running with a long-open Firefox was showing some really nasty GC spikes:
Resident 1710MB: IGC: 20291.3ms 99.8s CC: 2969ms (129.8s)
There also seems to be flickering in the rendering, where some frames are black. I'll bug that separately.
Comment 63•12 years ago
|
||
I've reported the flickering: bug 788026
Comment 64•12 years ago
|
||
There is a possibility that bug 795186 might help here. It is worth retrying with tomorrow's nightly build.
Comment 65•12 years ago
|
||
Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Nightly 19.0a1 (2012-10-25) the numbers are:
1.10GB to preload.
1.67GB maximum during the city scene
This is maybe a bit lower than before. Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause, dropping to 298MB.
The memory stats are reported with Mem Chaser. There's also a really big pause when ro.me is not in the Firefox cache after preload when you hit "Begin". The first time I got a black screen for about 10 seconds, with no audio, then the audio/video kicks in that far into the song. The second time (after clearing the last hour of cache) I got the first frame of video, and playing audio, but then the video didn't update for about 10s. Mem Chaser didn't show long/red GC/CC pauses. This don't happen if you load Ro.me a second time without clearing the cache.
Comment 66•7 years ago
|
||
I remeasured with Firefox nightly 58 on Linux vs Chrome beta 62 using atsy's stats module. On the Firefox side I had DMD enabled.
> from atsy import stats
> f = status.ProcessStats(lambda x: 'obj-x86_64-pc-linux-gnu-clang' in x, lambda x: 'firefox -content' not in x)
> c = stats.ProcessStats(lambda x: 'chrome-beta' in x, lambda x: 'chrome --type' not in x)
Firefox + DMD:
> >>> f.print_stats()
> [24936] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
> * RSS - 382935040
> USS - 250073088
> [25071] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
> RSS - 179822592
> * USS - 73367552
> [25132] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox
> RSS - 1799127040
> * USS - 1658810368
>
> Total: 2,115,112,960 bytes
Chrome:
> >>> c.print_stats()
> [26578] - /opt/google/chrome-beta/chrome
> * RSS - 165359616
> USS - 80101376
> [26588] - /opt/google/chrome-beta/chrome
> RSS - 54571008
> * USS - 8699904
> [26589] - /opt/google/chrome-beta/nacl_helper
> * RSS - 7880704
> USS - 2338816
> [26592] - /opt/google/chrome-beta/chrome
> RSS - 12619776
> * USS - 114688
> [26646] - /opt/google/chrome-beta/chrome
> RSS - 414932992
> * USS - 298663936
> [26723] - /opt/google/chrome-beta/chrome
> RSS - 23814144
> * USS - 3039232
> [26923] - /opt/google/chrome-beta/chrome
> RSS - 1087041536
> * USS - 1004920832
>
> Total: 1,488,678,912 bytes
If we subtract the amount of overhead for DMD (458 MiB) and about:memory (10 MiB) that leaves us with:
> Fx - 1,624,379,392 bytes
> Cr - 1,488,678,912 bytes
> ===================
> 135,700,480 bytes
So we're talking a 130 MiB difference on a Chrome-specific experiment, at this point I think we can just close this. If someone feels strongly feel free to reopen but this isn't more than MemShrink:P3.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•