657115 - Extensive memory usage during ROME demo

Reporter

Description

•

14 years ago

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110512 Firefox/6.0a1 The memory usage during the interactive film is terribly huge. It's far worse than Google Chrome 12. Reproducible: Always Steps to Reproduce: 1. Open the webpage 2. Play the demo 3. Check the memory usage

Csaba Kozák [:WonderCsabo]

Reporter

Updated

•

14 years ago

Version: unspecified → Trunk

Erunno

Comment 1

•

14 years ago

Here's the link to the demo: http://www.ro.me/

Csaba Kozák [:WonderCsabo]

Reporter

Comment 2

•

14 years ago

I already provided it in the URL section. ;)

Michał

Comment 3

•

14 years ago

I have also noticed that after I have closed tab with that Demo, Firefox was doing something strange: Memory usage was rising up to 2GB, then went down to 1,5GB and this behavior started to loop with some "rest time"

(mostly gone) XtC4UaLL [:xtc4uall]

Comment 4

•

14 years ago

Related to Bug 656761 and thus Bug 652931?

(mostly gone) XtC4UaLL [:xtc4uall]

Comment 5

•

14 years ago

Ah wait, Bug 652931 landed before your tested Nightly Build ...

Erunno

Comment 6

•

14 years ago

Confirmed for Firefox 4.0.1 on OS X 10.6.7. I ran the demo on my MacBook Pro with 4 GB RAM. I can't give any concrete numbers about the RAM usage though as after some time the whole OS become unresponsive to the point that I had real trouble killing the Firefox process. The demo runs fine with Chrome 11 where it uses about 700-800 RAM according to its internal task manager.

Yakove

Comment 7

•

14 years ago

Confirmed on latest nightly

=Haron=

Comment 8

•

14 years ago

Confirmed on :::: Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110515 Firefox/6.0a1 #1305448436

adminnu

Comment 9

•

14 years ago

Blocking #640452?

therube

Comment 10

•

14 years ago

I have some observations here (NoScript forums): "Nightly 6.0a1 + NS 2.1.0.3: memory leak" http://forums.informaction.com/viewtopic.php?f=10&t=6467 (Note that my methodology was not correct, nevertheless ...) Other "ROME" Bugs: Bug 657118 - Several problems with ROME demo's tech page Bug 657151 - Long GC pause time caused by JSGC_FINALIZE_END callback

Csaba Kozák [:WonderCsabo]

Reporter

Comment 11

•

14 years ago

I tried with fresh profile when tested the memory usage. It's not a NoScript problem.

Benoit Jacob [:bjacob] (mostly away)

Updated

•

14 years ago

Depends on: 638549

Csaba Kozák [:WonderCsabo]

Reporter

Comment 12

•

14 years ago

bjacob, any thoughts?

Alon Zakai (:azakai)

Comment 13

•

14 years ago

We are leaking here. STR is to run the demo til it finishes (run it in a background tab if it crashes due to OOM otherwise), then see that after closing the tab, even GC+GC does not clear the memory. Longer version: After preloading (that is, right before pressing BEGIN for the demo to begin), about:memory shows 650MB in js gc heap and 530MB in heap-unclassified (video stuff?). resident is 1.3GB and vsize is 2GB. During the demo js gc rises to 750GB or higher. vsize rises to over 2.2GB, at which point the demo usually errors on out of memory on my 4GB laptop. If I run the demo in another tab, it uses less memory (less frames since we throttle timers, rendering, etc.). In this case my laptop survives until the end of the demo. *However*, it is left in a very bad state: even after closing the ro.me tab, memory usage only decreases a tiny bit (to 1.4GB resident, 2.2GB virtual). Every few seconds the browser locks up entirely for a few seconds, breaking shows this is in cycle collection. But nothing gets cleaned up, so there is a leak here. It seems likely that we were leaking all through the demo then, which can explain the excessive memory usage during the demo. njn, a side issue that I hope you can clarify: Why is virtual memory usage so much higher than resident here? I suspect that running out of virtual memory is why it OOMs on my 4GB laptop (since there is 1GB of physical memory left, + swap).

Blocks: mlk-fx7

Alon Zakai (:azakai)

Updated

•

14 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Benoit Jacob [:bjacob] (mostly away)

Comment 14

•

14 years ago

Sorry I've still not finished the about:memory support for WebGL patch.. Just a naive question, if we were leaking video memory, could that explain the discrepancy between virtual and resident pointed out in comment 13?

Alon Zakai (:azakai)

Comment 15

•

14 years ago

Attached file output during successful CC after preload — Details

If I just preload, and do not actually start the demo, I mostly OOM during CC (memory rises before it goes down, I guess). However, I succeeded once to run CC to completion, and memory usage went all the way down (note that this doesn't mean there isn't a leak later - this was just preloading, not running the actual demo as before). I got the attached output during the successful CC. 1. "ASSERTION: Wrong element being closed" appears a lot. No idea if important or not. 2. WebGL reports a lot of 'Outstanding Textures' and other items, in a very lengthy output (I shortened it here, [MANY MANY NUMBERS] etc. were very very long lists of numerical indexes).

Alon Zakai (:azakai)

Comment 16

•

14 years ago

Also, 57 (!) threads exit as we do that CC after preload.

Alon Zakai (:azakai)

Comment 17

•

14 years ago

After the demo preloads, we have 50 threads doing media stuff, half nsBuiltinDecoderStateMachine::DecodeLoop and half nsBuiltinDecoderStateMachine::AdvanceFrame (see stacks below). There are also a few audio threads (nsBuiltinDecoderStateMachine::AudioLoop and nsOggReader::DecodeAudioData). All these threads stay active until the page is closed and CC is done. kinetik, is this amount of threads expected? I recall hearing that Linux threads take a lot of virtual memory for their stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then these 50 threads can explain 400MB of virtual memory - about half of the difference between virtual and resident memory. Stacks for the two types of threads (each appearing 25 times): ===== #2 0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417 #3 0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614 #4 0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295) at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346 #5 0xb664f2dc in mozilla::ReentrantMonitorAutoEnter::Wait (this=0x734fa580) at ../../dist/include/mozilla/ReentrantMonitor.h:224 #6 nsBuiltinDecoderStateMachine::DecodeLoop (this=0x734fa580) at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:380 #7 0xb665490e in nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run (this=0xace2d5c0) at ../../dist/include/nsThreadUtils.h:341 #8 0xb7376398 in nsThread::ProcessNextEvent (this=0x75aff700, mayWait=1, result=0x5e6ff23c) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618 ==== #2 0xb538dab1 in PR_WaitCondVar (cvar=0x758ff7c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:417 #3 0xb538dbb8 in PR_Wait (mon=0x74afc5c0, timeout=4294967295) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:614 #4 0xb72e4765 in mozilla::ReentrantMonitor::Wait (this=0xb3f4ce00, interval=4294967295) at /home/alon/Dev/m-c/xpcom/build/BlockingResourceBase.cpp:346 #5 0xb664f9d8 in nsBuiltinDecoderStateMachine::AdvanceFrame (this=0x734fa580) at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1544 #6 0xb6653215 in nsBuiltinDecoderStateMachine::Run (this=0x734fa580) at /home/alon/Dev/mozilla-central/content/media/nsBuiltinDecoderStateMachine.cpp:1141 #7 0xb7376398 in nsThread::ProcessNextEvent (this=0x75afe880, mayWait=1, result=0x5f0ff23c) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618 ==== Side comment: Using gdb and breaking on NS_NewThread and seeing each thread as it is created will consistently hang X for me. Much fun.

Matthew Gregan [:kinetik]

Comment 18

•

14 years ago

(In reply to comment #17) > kinetik, is this amount of threads expected? For each playing element there will be at least a decode thread (DecodeLoop), a state machine thread (AdvanceFrame), and an audio thread (AudioLoop). Behind the scenes, there's also another audio thread created by ALSA on Linux for every stream we open. OS X and Windows tend to have only a single audio thread for all open streams. > I recall hearing that Linux threads take a lot of virtual memory for their > stacks. A little googling gives sites claiming 2MB or 8MB. If 8MB, then > these 50 threads can explain 400MB of virtual memory - about half of the > difference between virtual and resident memory. The default stack size is 8MB: ~% ulimit -Sa -s: stack size (kbytes) 8192 Last I checked (mainly looking for calls to pthread_attr_setstacksize), I couldn't see anywhere that we change that at runtime. > Side comment: Using gdb and breaking on NS_NewThread and seeing each thread > as it is created will consistently hang X for me. Much fun. Export GDK_DEBUG=nograbs in your environment before running the browser.

Benoit Jacob [:bjacob] (mostly away)

Comment 19

•

14 years ago

(In reply to comment #17) > After the demo preloads, we have 50 threads doing media stuff That sounds reminiscent of bug 592833. Mark as blocking this one?

Nicholas Nethercote [inactive]

Comment 20

•

14 years ago

azakai, it's entirely typical for the vsize to be a lot bigger than the resident; take a look at about:memory on some more typical workloads. So don't focus on that, vsize isn't a very useful metric. This is a good bug report, though, sounds like a big problem.

Benoit Jacob [:bjacob] (mostly away)

Updated

•

14 years ago

Depends on: 592833

Alon Zakai (:azakai)

Comment 21

•

14 years ago

> > Side comment: Using gdb and breaking on NS_NewThread and seeing each thread > > as it is created will consistently hang X for me. Much fun. > > Export GDK_DEBUG=nograbs in your environment before running the browser. Thanks, I did not know that!

Alon Zakai (:azakai)

Comment 22

•

14 years ago

This demo suffers from bug 656120 as well. Closing the tab is not always enough for memory to be reclaimed, but forcing a GC is (except when the leak mentioned earlier in this bug happens).

Depends on: 656120

Alon Zakai (:azakai)

Comment 23

•

14 years ago

The leak I saw, where GC takes a very long time and doesn't succeed, is apparently due to the cycle collector itself OOMing, which prevents it from working properly, WARNING: out of memory: file /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp, line 210 #0 nsDeque::Push (this=0xb2dfe038, aItem=0x3bdac0f4) at /home/alon/Dev/m-c/xpcom/build/nsDeque.cpp:211 #1 0xb739d176 in GraphWalker<scanVisitor>::DoWalk (this=0xb2dfe06c, aQueue=...) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1271 #2 0xb73993ec in GraphWalker<scanVisitor>::WalkFromRoots (this=0xb3f85800) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1254 #3 nsCycleCollector::ScanRoots (this=0xb3f85800) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:1850 #4 0xb739b584 in nsCycleCollector::BeginCollection (this=0xb3f85800, aForceGC=0, aListener=0x0) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:2651 #5 0xb739dd6a in nsCycleCollectorRunner::Run (this=0xb3fe0d80) at /home/alon/Dev/mozilla-central/xpcom/base/nsCycleCollector.cpp:3327 #6 0xb7376398 in nsThread::ProcessNextEvent (this=0xb3f06e00, mayWait=1, result=0xb2dfe23c) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:618 #7 0xb72e2c8f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1) at /home/alon/Dev/m-c/xpcom/build/nsThreadUtils.cpp:245 #8 0xb7376e50 in nsThread::ThreadFunc (arg=0xb3f06e00) at /home/alon/Dev/mozilla-central/xpcom/threads/nsThread.cpp:273 #9 0xb539567c in _pt_root (arg=0xb3f117f0) at /home/alon/Dev/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:187 #10 0xb7fb996e in start_thread (arg=0xb2dfeb70) at pthread_create.c:300 #11 0xb519aa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130 Perhaps we should abort when OOMing in the cycle collector? It might work out in other cases though, so I guess not. So in summary, the main issues in this bug are 1. Many threads for media elements (bug 592833). A contributing factor to memory issues. 2. GC does not always run, memory which could be reclaimed is not reclaimed (bug 656120). 3. In general, higher JS memory usage than chrome which hits us in some cases, and I suspect here.

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Whiteboard: [MemShrink:P1]

Matthew Gregan [:kinetik]

Comment 24

•

14 years ago

Oh, another thing worth noting about vsize is that you'll see 64MB of address space allocated per open audio stream due to bug 617852.

Jeff Muizelaar [:jrmuizel]

Updated

•

14 years ago

Assignee: nobody → jmuizelaar

Jeff Muizelaar [:jrmuizel]

Updated

•

14 years ago

Assignee: jmuizelaar → nobody

Mounir Lamouri (:mounir)

Updated

•

14 years ago

OS: Windows 7 → All

Hardware: x86 → All

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Whiteboard: [MemShrink:P1] → [MemShrink:P2]

Nicholas Nethercote [inactive]

Comment 25

•

14 years ago

azakai, now that bug 656120 is fixed can you try the demo again and see if things have improved?

Alon Zakai (:azakai)

Comment 26

•

14 years ago

Overall memory use seems the same, perhaps slightly better - I max out at 2.0 GB instead of 2.2. It still OOMs sometimes. There are more GC pauses which I guess is to be expected. They are now very noticeable in some parts of the demo. The leak at the end seems better or maybe I was just lucky. But closing the tab led to a several-minute-long GC which eventually did work, and memory went almost all the way back down (to 125MB resident, I guess higher than originally due to fragmentation etc.). So there seems like there is a change (less memory, more GC pauses), but GC not being done was just one of the issues here.

Gregor Wagner [:gwagner]

Comment 27

•

14 years ago

Yeah the problems during this demo can't be solved with an "idle" GC :)

Gregor Wagner [:gwagner]

Comment 28

•

14 years ago

I looked a little bit deeper into the GC performance here. For the 18 GCs we perform during this demo, 3 are coming from the new GC trigger. As you mentioned, the working set for this demo is huge. About 370MB remains reachable after each GC. 95% of our pause time we spend in marking. An average GC takes about 450 - 500ms which is definitely too long. Our max pause time is 600ms. So until we have incremental and generational GC in place, the parallel marking patch should be a nice improvement here. It's a chrome experiment and with their current canary version (where they fixed the typed array bug they had for bad GC performance in FOTN) I also see 2 GCs that take about 1.5sec and one GC that takes 700ms almost at the end of the demo. But they keep their heap much smaller and perform many short GCs.

Alon Zakai (:azakai)

Comment 29

•

14 years ago

Gregor, did you see the minutes-long GC that happens after the demo is done? Any idea why that one is so bad? STR: Run the demo to completion. When it's all finished, close the tab and wait. The browser will start to GC soon after, and it completely locks up and is unresponsive for several minutes until GC finally concludes.

Gregor Wagner [:gwagner]

Comment 30

•

14 years ago

(In reply to comment #29) > Gregor, did you see the minutes-long GC that happens after the demo is done? > Any idea why that one is so bad? > > STR: Run the demo to completion. When it's all finished, close the tab and > wait. The browser will start to GC soon after, and it completely locks up > and is unresponsive for several minutes until GC finally concludes. Yes once I close the animation I get a 800ms pause time caused by JSGC_FINALIZE_END callback. I am not an expert about this bottleneck. Maybe Andrew knows more.

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 31

•

14 years ago

From reading the comments, it looks like that is where a wrapper for a JS object held by XPCOM will be released (and other JS bric-a-brac XPCOM cares about). I guess if it sets up a huge hierarchy of things that hook into XPCOM then when you tear it down there's a lot to collect.

Benoit Jacob [:bjacob] (mostly away)

Comment 32

•

14 years ago

Could that be the WebGL context, with all its associated WebGL objects such as textures? If you look at the terminal during that pause, on a debug build, do you see stuff about WebGL?

Alon Zakai (:azakai)

Comment 33

•

14 years ago

I don't have a debug build handy now, but I definitely saw a lot of WebGL debug output before if I CC'd after preload (which isn't the same, but maybe close enough). So it makes a lot of sense I think that lots of WebGL objects are a part of that huge CC at the end.

Benoit Jacob [:bjacob] (mostly away)

Comment 34

•

14 years ago

OK. Note that unreferenced WebGL objects being GC'd all together surprisingly late is bug 617453 but that's 100 miles above my understanding of things. Upon WebGL context destructoin, I wonder if just "leaking" WebGL texture/buffers, instead of destroying them one by one, would make that faster. They would be destroyed anyway when the context gets destroyed.

Benoit Jacob [:bjacob] (mostly away)

Comment 35

•

14 years ago

In bug 638549 I've made patches that add WebGL accounting in about:support. Try builds are ready at: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-c2df6455c534/ Can you please try this demo in these builds, and go to about:memory and check the webgl entries there, at the bottom under the 'Other Measurements' category. They only appear if webgl objects are in memory. If you don't see webgl there, that means that no webgl objects are currently in memory.

Alon Zakai (:azakai)

Comment 36

•

14 years ago

Nice, the WebGL info in about:memory looks like it will be very useful! Here is some data: Preload 2,070.06 MB -- vsize 1,284.00 MB -- heap-committed 1,240.23 MB -- resident 1,212.10 MB -- heap-used 642.00 MB -- js-gc-heap 96.10 MB -- webgl-buffer-memory 71.90 MB -- heap-unused 52.22 MB -- webgl-texture-memory 2.30 MB -- heap-dirty 0.60 MB -- canvas-2d-pixel-bytes 0.04 MB -- gfx-surface-image 764,691 -- soft-page-faults 6,602 -- webgl-buffer-count 35 -- webgl-texture-count 27 -- hard-page-faults 4 -- webgl-context-count During 2,082.72 MB -- vsize 1,401.00 MB -- heap-committed 1,356.87 MB -- resident 1,328.33 MB -- heap-used 755.00 MB -- js-gc-heap 97.27 MB -- webgl-buffer-memory 72.67 MB -- heap-unused 68.81 MB -- webgl-texture-memory 2.64 MB -- heap-dirty 0.60 MB -- canvas-2d-pixel-bytes 0.04 MB -- gfx-surface-image 1,833,404 -- soft-page-faults 6,890 -- webgl-buffer-count 55 -- webgl-texture-count 40 -- hard-page-faults 4 -- webgl-context-count After demo, before closing tab 2,066.62 MB -- vsize 1,441.00 MB -- heap-committed 1,387.55 MB -- resident 1,365.83 MB -- heap-used 780.00 MB -- js-gc-heap 99.49 MB -- webgl-buffer-memory 76.77 MB -- webgl-texture-memory 75.17 MB -- heap-unused 3.16 MB -- heap-dirty 0.60 MB -- canvas-2d-pixel-bytes 0.04 MB -- gfx-surface-image 2,105,187 -- soft-page-faults 7,113 -- webgl-buffer-count 65 -- webgl-texture-count 40 -- hard-page-faults 4 -- webgl-context-count After closing tab and massive multi-minute GC 619.40 MB -- vsize 297.00 MB -- heap-committed 248.78 MB -- heap-unused 127.62 MB -- resident 48.21 MB -- heap-used 21.00 MB -- js-gc-heap 2.46 MB -- heap-dirty 0.04 MB -- gfx-surface-image 0.00 MB -- canvas-2d-pixel-bytes 2,220,593 -- soft-page-faults 40 -- hard-page-faults Looks like there is WebGL stuff in memory that is cleaned up by the final GC.

Benoit Jacob [:bjacob] (mostly away)

Comment 37

•

14 years ago

Thanks. I don't really see something abnormal here. WebGL textures and buffers using 175M of memory in a very large demo like this sounds reasonable and doesn't explain the 2G memory usage so far. I'll update this bug as WebGL accounting in about:memory becomes more complete.

Nicholas Nethercote [inactive]

Comment 38

•

14 years ago

> 764,691 -- soft-page-faults > 6,602 -- webgl-buffer-count > 35 -- webgl-texture-count > 27 -- hard-page-faults > 4 -- webgl-context-count This is unrelated to this bug... sorting the "counts" events by size doesn't seem useful, alphabetical order would be better. In which case, hard-page-faults and soft-page-faults would be better renamed as page-faults-hard and page-faults-soft.

Justin Lebar (not reading bugmail)

Comment 39

•

14 years ago

Filed bug 668893 on the ordering.

Benoit Jacob [:bjacob] (mostly away)

Comment 40

•

14 years ago

New tryserver builds, with expanded WebGL about:memory coverage, will soon be available at: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-087d304b9757

Csaba Kozák [:WonderCsabo]

Reporter

Comment 41

•

14 years ago

It seems GC is working, it cleans everything after the page is closed. http://pastebin.com/wDTgN8SX Although, the memory usage is still horrible during the demo.

Benoit Jacob [:bjacob] (mostly away)

Comment 42

•

14 years ago

Attached file about:memory while running ROME — Details

Tried again today, it's using 2G of heap out of which only 8M fall in WebGL heap categories. It's using 150M of (non-heap) GL memory for WebGL objects. May I suggest that the next thing that needs to get instrumented with about:memory reporters, is the media stuff. There's a ton of video playing in this demo.

Andrew McCreight (out of office until 8/21) [:mccr8]

Updated

•

14 years ago

Depends on: 672755

Benoit Jacob [:bjacob] (mostly away)

Comment 43

•

14 years ago

Filed bug 674343 about the fact that we spend 10% of our time in the GC during this demo.

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 44

•

14 years ago

You could try seeing if the builds in this directory have lower heap usage in the ro.me demo: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cpearce@mozilla.com-2e106a76a194/ These builds change the amount of memory reserved for the video decoder's thread stacks to be smaller, in particular drastically smaller on Linux.

Nicholas Nethercote [inactive]

Comment 45

•

14 years ago

> May I suggest that the next thing that needs to get instrumented with > about:memory reporters, is the media stuff. There's a ton of video playing > in this demo. That's bug 672755.

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 46

•

14 years ago

I ran this demo just now, with a recentish but (I think) pre-objShrink build. I looked at memory usage at a few random points. The heap-unclassified story at least looks a lot better than in comment 13: opening video part: 1.2gigs total, 10% heap unclassified, 133mb city part: 1.9gigs total, 9% heap unclassified (167mb) desert part: 1.5gigs total, 10.36% heap unclassified (152mb)

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 47

•

14 years ago

Attached file about:memory while running ROME — Details

I guess I should have copied the part at the bottom, too, but I forgot. 83% of memory usage of JS, so it would be worth seeing what it looks like post-ObjShrink.

Nicholas Nethercote [inactive]

Comment 48

•

14 years ago

> I guess I should have copied the part at the bottom, too, but I forgot. 83% > of memory usage of JS, so it would be worth seeing what it looks like > post-ObjShrink. Definitely -- you've got 985MB of non-function objects in the ro.me compartment, so that should more than halve (the exact change depends on whether this is a 32-bit or 64-bit build).

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 49

•

14 years ago

On nightly, in the desert part, it was about 1.2gigs of memory, with 12% (141mb) heap-unclassified. Memory went down to 80mb after I closed it down. GC pause times were about 280 to 350ms. They were almost all global GCs, except a few compartmental ones near the start.

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 50

•

14 years ago

Attached file about memory at various points post-ObjShrink — Details

Nicholas Nethercote [inactive]

Comment 51

•

14 years ago

azakai, do you think we've done enough to close this bug? If the GC/CC pauses are still too big, seems like that should be spun of as a separate bug.

Benoit Jacob [:bjacob] (mostly away)

Comment 52

•

14 years ago

Note that important changes about WebGL objects referencing each other landed earlier this week, so I recommend using a nightly build no older than 2 days old.

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 53

•

14 years ago

I updated the Nightly I used before I ran the demo, so it should be pretty new.

Alon Zakai (:azakai)

Comment 54

•

14 years ago

(In reply to Nicholas Nethercote [:njn] from comment #51) > azakai, do you think we've done enough to close this bug? If the GC/CC > pauses are still too big, seems like that should be spun of as a separate > bug. I don't think we should close this. We still take 2-3X more memory than Chrome. Some numbers from latest nightly: 1.1GB to preload. Chrome takes 614MB. 1.6GB maximum during the city scene, and huge lag/nonresponsiveness. Chrome takes 634MB there and is smooth. 1.3GB in the desert scene. Chrome takes 672MB. On the plus side, all memory is reclaimed once the tab is closed, after 10-15 seconds. However, during that there is a huge multisecond GC where we reach 1.6GB.

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Whiteboard: [MemShrink:P2] → [MemShrink:P2][see comment 54]

Csaba Kozák [:WonderCsabo]

Reporter

Comment 55

•

13 years ago

I repeated the test for this bug. azakai's numbers are still valid, Fx uses much more memory than Chrome.

Julian Adams

Comment 56

•

13 years ago

I can confirm those numbers for Firefox Aurora 16.0a2 (2012-08-14) / Windows 7 64 bit / 8GB RAM / NVIDIA 8800GT / Intel Core 2 Quad Q6600 2.40 GHz. Firefox is also choppy with pauses in the city - Chrome runs smoothly. GC pauses?

Julian Adams

Comment 57

•

13 years ago

Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Aurora 16.0a2 (2012-08-14)the numbers are: 1.18GB to preload. Chrome takes 880MB (3 processes) 1.72GB maximum during the city scene. Chrome takes 934MB 1.5GB in the desert scene. Chrome takes 1007MB Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause.

Julian Adams

Comment 58

•

13 years ago

The GC output after the window was closed after comment 57 was: CC(T+560.6) duration: 7ms, suspected: 313, visited: 1218 RCed and 4535 GCed, collected: 290 RCed and 0 GCed (290 waiting for GC) ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, removed: 1097 GC(T+563.1) Total Time: 354.6ms, Compartments Collected: 161, Total Compartments: 161, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 316.6ms, Allocated: 685MB, +Chunks: 0, -Chunks: 0 Slice: 0, Pause: 37.5 (When: 0.0ms, Reason: PAGE_HIDE): Mark: 31.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms Slice: 2, Pause: 316.6 (When: 239.2ms, Reason: INTER_SLICE_GC): Mark: 303.6ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms Totals: Mark: 335.6ms, Mark Discard Code: 5.7ms, Mark Roots: 21.4ms, Mark Gray: 303.6ms, Sweep: 11.2ms, Sweep Atoms: 1.8ms, Sweep Compartments: 5.6ms, Sweep Tables: 1.5ms, Sweep Object: 0.8ms, Sweep Script: 0.2ms, Sweep Shape: 1.2ms, Sweep Discard Code: 0.2ms, Discard Analysis: 3.8ms, Discard TI: 0.1ms, Free TI Arena: 3.2ms, Sweep Types: 0.3ms, Clear Script Analysis: 0.1ms, Finalize End Callback: 1.5ms, End Callback: 1.5ms CC(T+569.3) duration: 224ms, suspected: 334, visited: 10578 RCed and 3896 merged GCed, collected: 9212 RCed and 1008 GCed (10220 waiting for GC) ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 1 ms, total: 3 ms, removed: 418 GC(T+573.9) Total Time: 372.7ms, Compartments Collected: 160, Total Compartments: 160, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 338.8ms, Allocated: 684MB, +Chunks: 0, -Chunks: 1 Slice: 0, Pause: 33.4 (When: 0.0ms, Reason: CC_WAITING): Mark: 32.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms Slice: 2, Pause: 338.8 (When: 234.1ms, Reason: INTER_SLICE_GC): Mark: 2.6ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms Totals: Mark: 35.8ms, Mark Discard Code: 0.4ms, Mark Roots: 22.7ms, Mark Gray: 2.5ms, Sweep: 58.5ms, Sweep Atoms: 2.7ms, Sweep Compartments: 2.7ms, Sweep Tables: 1.8ms, Sweep Object: 48.1ms, Sweep Script: 1.8ms, Sweep Shape: 2.3ms, Sweep Discard Code: 0.3ms, Discard Analysis: 0.6ms, Discard TI: 0.2ms, Sweep Types: 0.2ms, Clear Script Analysis: 0.2ms, Finalize End Callback: 0.6ms, End Callback: 277.3ms CC(T+574.0) duration: 7ms, suspected: 264, visited: 1304 RCed and 2888 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC) ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 108

Julian Adams

Comment 59

•

13 years ago

During the city scene in the run from comment 57 the GC debug follows. There's a couple of non incremental GC, and the max pauses are still around 100ms. The experience is still slower and more stuttery than Chrome. GC(T+159.1) Total Time: 481.0ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 126.3ms, Nonincremental Reason: allocation trigger, Allocated: 828MB, +Chunks: 231, -Chunks: 0 Slice: 0, Pause: 39.8 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.4ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms Slice: 32, Pause: 126.3 (When: 2521.6ms, Reason: ALLOC_TRIGGER): Mark: 90.4ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms Totals: Mark: 440.2ms, Mark Discard Code: 1.3ms, Mark Roots: 28.4ms, Mark Types: 0.6ms, Mark Delayed: 24.2ms, Mark Gray: 0.4ms, Sweep: 34.6ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.1ms, Sweep Tables: 0.2ms, Sweep Object: 30.6ms, Sweep Shape: 0.2ms, Sweep Discard Code: 0.9ms, Finalize End Callback: 1.1ms, End Callback: 0.3ms CC(T+160.7) duration: 8ms, suspected: 51, visited: 573 RCed and 3463 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC) ForgetSkippable 4 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 460 GC(T+165.6) Total Time: 521.6ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 112.9ms, Nonincremental Reason: allocation trigger, Allocated: 946MB, +Chunks: 135, -Chunks: 0 Slice: 0, Pause: 40.5 (When: 0.0ms, Reason: MAYBEGC): Mark: 38.8ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms Slice: 36, Pause: 12.5 (When: 2869.7ms, Reason: REFRESH_FRAME): Mark: 12.4ms, Mark Delayed: 7.4ms Slice: 37, Pause: 112.9 (When: 2916.6ms, Reason: ALLOC_TRIGGER): Mark: 58.9ms, Mark Delayed: 18.1ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms Totals: Mark: 461.7ms, Mark Discard Code: 1.5ms, Mark Roots: 28.8ms, Mark Types: 0.6ms, Mark Delayed: 26.9ms, Mark Gray: 0.3ms, Sweep: 53.2ms, Sweep Atoms: 1.5ms, Sweep Compartments: 1.5ms, Sweep Tables: 0.3ms, Sweep Object: 48.5ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.1ms, Finalize End Callback: 1.4ms GC(T+172.5) Total Time: 534.1ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 64.4ms, Allocated: 968MB, +Chunks: 16, -Chunks: 0 Slice: 0, Pause: 41.2 (When: 0.0ms, Reason: MAYBEGC): Mark: 39.6ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms Slice: 36, Pause: 12.1 (When: 2494.4ms, Reason: REFRESH_FRAME): Mark: 12.0ms, Mark Delayed: 6.2ms Slice: 38, Pause: 13.6 (When: 2625.9ms, Reason: REFRESH_FRAME): Mark: 13.4ms, Mark Delayed: 7.7ms Slice: 43, Pause: 64.4 (When: 2955.6ms, Reason: REFRESH_FRAME): Mark: 2.1ms, Mark Delayed: 0.5ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms Totals: Mark: 465.3ms, Mark Discard Code: 1.5ms, Mark Roots: 29.6ms, Mark Types: 0.6ms, Mark Delayed: 24.9ms, Mark Gray: 0.4ms, Sweep: 61.9ms, Sweep Atoms: 1.7ms, Sweep Compartments: 1.3ms, Sweep Tables: 0.3ms, Sweep Object: 57.4ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.0ms, Finalize End Callback: 1.2ms CC(T+172.8) duration: 6ms, suspected: 367, visited: 884 RCed and 3471 GCed, collected: 0 RCed and 0 GCed (0 waiting for GC) ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, removed: 75 GC(T+178.8) Total Time: 475.4ms, Compartments Collected: 1, Total Compartments: 151, MMU (20ms): 0%, MMU (50ms): 0%, Max Pause: 74.0ms, Allocated: 958MB, +Chunks: 0, -Chunks: 0 Slice: 0, Pause: 47.4 (When: 0.0ms, Reason: MAYBEGC): Mark: 45.5ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms Slice: 36, Pause: 74.0 (When: 1325.5ms, Reason: REFRESH_FRAME): Mark: 1.1ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms Totals: Mark: 396.7ms, Mark Discard Code: 1.7ms, Mark Roots: 35.5ms, Mark Types: 0.6ms, Mark Delayed: 0.4ms, Mark Gray: 1.0ms, Finalize Start Callback: 0.1ms, Sweep: 72.8ms, Sweep Atoms: 2.4ms, Sweep Compartments: 2.0ms, Sweep Tables: 0.4ms, Sweep Object: 66.9ms, Sweep Shape: 0.2ms, Sweep Discard Code: 1.6ms, Finalize End Callback: 1.2ms

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 60

•

13 years ago

Thanks for the information! GC pauses are a slightly separate issue from memory usage, but I guess we can leave it here for now. Some of these problems should be fixed in Nightly. Comment 58: This is the teardown, and it is clearly dominated by a ton of C++ <-> JS references. First GC is dominated by a 303ms Mark Gray. Second GC is dominated by a 277ms End Callback, which should be fixed by bug 743112. Comment 59: First GC: Main problem is a 90ms Mark phase, I guess due to falling behind? Second GC: about 60ms of Mark again, plus 50ms of Sweep Object. The latter should be fixed by bug 729760. Third GC: Main problem is Sweep Object, which as before should be fixed in Nightly. Fourth GC: Mark Roots takes 35ms in the first slice. Last slice is Sweep Object again.

Julian Adams

Comment 62

•

13 years ago

Updated Mac stats for Aurora Firefox 17.0a2 (2012-09-03) :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM the numbers are (from MemChaser now): 1.21GB to preload. 1.61GB maximum during the city scene. That's down about 100MB. On a clean Firefox start the GC situation (from MemChaser this time) is much nicer, although 58.7ms is still a very long time to add to a frame of graphics: Resident 1606MB: IGC: 58.7ms (6.0s) CC: 3.5ms (12.1s) Running with a long-open Firefox was showing some really nasty GC spikes: Resident 1710MB: IGC: 20291.3ms 99.8s CC: 2969ms (129.8s) There also seems to be flickering in the rendering, where some frames are black. I'll bug that separately.

Julian Adams

Comment 63

•

13 years ago

I've reported the flickering: bug 788026

Benoit Jacob [:bjacob] (mostly away)

Comment 64

•

13 years ago

There is a possibility that bug 795186 might help here. It is worth retrying with tomorrow's nightly build.

Julian Adams

Comment 65

•

13 years ago

Mac stats :) On a Sandybridge Macbook Pro 13" OSX 10.7.4 4GB RAM Firefox Nightly 19.0a1 (2012-10-25) the numbers are: 1.10GB to preload. 1.67GB maximum during the city scene This is maybe a bit lower than before. Firefox does reclaim the memory when the tab closes, within 10 - 15 seconds, without too much pause, dropping to 298MB. The memory stats are reported with Mem Chaser. There's also a really big pause when ro.me is not in the Firefox cache after preload when you hit "Begin". The first time I got a black screen for about 10 seconds, with no audio, then the audio/video kicks in that far into the song. The second time (after clearing the last hour of cache) I got the first frame of video, and playing audio, but then the video didn't update for about 10s. Mem Chaser didn't show long/red GC/CC pauses. This don't happen if you load Ro.me a second time without clearing the cache.

Eric Rahm [:erahm]

Comment 66

•

8 years ago

I remeasured with Firefox nightly 58 on Linux vs Chrome beta 62 using atsy's stats module. On the Firefox side I had DMD enabled. > from atsy import stats > f = status.ProcessStats(lambda x: 'obj-x86_64-pc-linux-gnu-clang' in x, lambda x: 'firefox -content' not in x) > c = stats.ProcessStats(lambda x: 'chrome-beta' in x, lambda x: 'chrome --type' not in x) Firefox + DMD: > >>> f.print_stats() > [24936] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox > * RSS - 382935040 > USS - 250073088 > [25071] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox > RSS - 179822592 > * USS - 73367552 > [25132] - /var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang/dist/bin/firefox > RSS - 1799127040 > * USS - 1658810368 > > Total: 2,115,112,960 bytes Chrome: > >>> c.print_stats() > [26578] - /opt/google/chrome-beta/chrome > * RSS - 165359616 > USS - 80101376 > [26588] - /opt/google/chrome-beta/chrome > RSS - 54571008 > * USS - 8699904 > [26589] - /opt/google/chrome-beta/nacl_helper > * RSS - 7880704 > USS - 2338816 > [26592] - /opt/google/chrome-beta/chrome > RSS - 12619776 > * USS - 114688 > [26646] - /opt/google/chrome-beta/chrome > RSS - 414932992 > * USS - 298663936 > [26723] - /opt/google/chrome-beta/chrome > RSS - 23814144 > * USS - 3039232 > [26923] - /opt/google/chrome-beta/chrome > RSS - 1087041536 > * USS - 1004920832 > > Total: 1,488,678,912 bytes If we subtract the amount of overhead for DMD (458 MiB) and about:memory (10 MiB) that leaves us with: > Fx - 1,624,379,392 bytes > Cr - 1,488,678,912 bytes > =================== > 135,700,480 bytes So we're talking a 130 MiB difference on a Chrome-specific experiment, at this point I think we can just close this. If someone feels strongly feel free to reopen but this isn't more than MemShrink:P3.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WORKSFORME

output during successful CC after preload 14 years ago Alon Zakai (:azakai) 3.43 KB, text/plain		Details
about:memory while running ROME 14 years ago Benoit Jacob [:bjacob] (mostly away) 22.05 KB, text/plain		Details
about:memory while running ROME 14 years ago Andrew McCreight (out of office until 8/21) [:mccr8] 2.18 KB, text/plain		Details
about memory at various points post-ObjShrink 14 years ago Andrew McCreight (out of office until 8/21) [:mccr8] 15.06 KB, text/plain		Details