<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 2

•

11 years ago

(In reply to Jon Coppeard (:jonco) from comment #1) > I think that's a GC triggered following a cycle collection - Andrew, any > ideas why this is happening? Not off hand. Luke, does this happen in a browser session with other stuff open, or also with just the game open? If you run with MOZ_CC_LOG_ALL=1 (and probably MOZ_CC_LOG_DIRECTORY set to some directory) then I could look at the CC logs. I can hopefully try it out myself if not.

Flags: needinfo?(continuation)

Comment 3

•

11 years ago

Also, it sounds like you have javascript.options.mem.log set to true. What do the CC entries look like?

Comment 4

•

11 years ago

I would be very excited to have tools directly in the Firefox developer UIs to trace call sites that dynamically allocate new objects, i.e. not just call sites that generate garbage, because the former is also useful for catching running memory leaks that are still referenced to. Of course having both would be most interesting. Emscripten applications should generate very little garbage because they use a fixed typed array as their ram space. We currently are not able to offer good tools to developers so that they could profile their Emscripten applications to identify opportunities where redundant object allocations could be cleaned up. Multiple developers have been asking for those and there is also one bug report inquiring for that in Emscripten: https://github.com/kripken/emscripten/issues/2299 . I was recently pointed to this: https://developer.mozilla.org/en-US/docs/Tools/Debugger-API/Tutorial-Allocation-Log-Tree , which looks good, so adding a tool like that to Firefox itself would be very useful!

Reporter

Comment 5

•

11 years ago

Attached file cc_log.tar.gz — Details

Here ya go. These are the last two CC/GC log files produced playing around the first tutorial level of osmos. You're right I am using mem.log = true to watch GC/CC activity. The CC pretty regularly interleave with the GCs (as expected) and the pause times for CC are all great; they tend to look like: CC(T+85.2) max pause: 4ms, total time: 4ms, slices: 1, suspected: 1130, visited: 1168 RCed and 387 GCed, collected: 113 RCed and 84 GCed (197|822 waiting for GC) ForgetSkippable 13 times before CC, min: 0 ms, max: 4 ms, avg: 0 ms, total: 12 ms, max sync: 0 ms, removed: 5286 A note on trying this yourself: apparently opening the Browser Console enables debug mode which inhibits both Ion and Odin, so the games run in Baseline and super-slow at that. So you have to first start the game (which compiles all the asm.js with Odin) and *then* open the Browser Console.

Comment 6

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #5) > apparently opening the Browser Console > enables debug mode which inhibits both Ion and Odin Is there a bug open for that? (I don't understand why opening browser console should affect to JITs)

Comment 7

•

11 years ago

Odd looking logs. Are those for workers, not for the main thread?

Reporter

Comment 8

•

11 years ago

I assume this is mostly main-thread. Jukka?

Flags: needinfo?(jjylanki)

Comment 9

•

11 years ago

Also, even if we can reduce CC frequency, we really need to get the max GC pause time down. So some profiles showing where we spend time during that 40ms would be good. (4ms CC pauses don't sound too bad)

Comment 10

•

11 years ago

The games all use only main thread, there should be no workers used in any of them as far as I know. I don't understand how to read the logs. Is it possible to associate the gcs to file:lines where they were allocated? The SDL2 library should interop outside asm.js mostly via the Emscripten HTML5 API, which is at https://github.com/kripken/emscripten/blob/master/src/library_html5.js and http://kripken.github.io/emscripten-site/docs/api_reference/html5.h.html

Reporter

Comment 11

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #9) Sure, that'd be great, but I was hoping to keep the focus of this bug on why GC/CC are continually happening. Max-pause times seem like a separate bug.

Comment 12

•

11 years ago

I'm not seeing main thread objects in those CC logs. There is for example DedicatedWorkerGlobalScope which should live only in workers. How did you create the logs?

Reporter

Comment 13

•

11 years ago

Just what comment 2 said. Btw, it's really easy to repro this yourself; just visit humblebundle.com.

Comment 14

•

11 years ago

For a game loop running at 60fps, a hiccup of 4ms is well more than enough to cause visually observable stutter. Microstuttering problems practically killed gamers' interest in SLI rigs, I can imagine the same could occur for browser gaming. I'm not saying that a 4ms CC time is unreasonable, but rather mean that we should identify why they continuously occur and how we can help Emscripten to generate code that avoids them.

Flags: needinfo?(jjylanki)

Emanuel Hoogeveen [:ehoogeveen]

Comment 15

•

11 years ago

(In reply to Jukka Jylänki from comment #14) > For a game loop running at 60fps, a hiccup of 4ms is well more than enough > to cause visually observable stutter. Depends highly on what else the page is doing. If 12ms is enough for game logic + painting per slice, there is no stutter. 40ms on the other hand... > I'm not saying that a 4ms CC time is unreasonable, but rather mean that we > should identify why they continuously occur and how we can help Emscripten > to generate code that avoids them. Sure. Just trying to reproduce this. (I expect that something in the page is causing lots of Events which are cycle collectable objects.)

Comment 16

•

11 years ago

I would imagine that bug 1052793 is causing the GC to be longer than it needs to be, though 40ms for a single slice isn't ideal.

Comment 17

•

11 years ago

At least https://www.humblebundle.com/?sel=zenbound2_asm_demo uses mouseenter/leave event listeners on the canvas object (assuming devtools report it correctly). That is just insane for any performance critical app. mouseenter/leave usage forces browser to dispatch separate event for each element in the ancestor chain. One really should use mouseover/out. (if there are no listeners for mouseenter/leave, Gecko doesn't dispatch those events at all) Still investigating if mouseenter/leave events are causing the CC to triggered so often...

Comment 18

•

11 years ago

... but we can actually optimize some of that out..

Alon Zakai (:azakai)

Comment 19

•

11 years ago

I see that the SDL2-emscripten port, which I think all these games use, uses mouseenter/leave, through emscripten's html5.h (which only has mouseenter/leave and not mouseover/out). Should we add mouseover/out to html5.h, and use that in SDL2 instead?

Comment 20

•

11 years ago

There is only one mouseover and one mouseout event when mouse is moved to be on top of some element. But there are lots of mouseenter/leave events. So even if those wouldn't cause GC or CC, they would use some memory and dispatching events isn't free either. Also, if we're talking about event listeners for <canvas> which doesn't have child elements, it is hard to see any reasons to use mouseenter/leave.

Comment 21

•

11 years ago

Yeah, makes sense. Marked down https://github.com/kripken/emscripten/issues/2894.

Updated

•

11 years ago

Depends on: 1084421

Comment 22

•

11 years ago

We got a pretty good feel for what's going on here at the CC/GC meeting this morning. 1) When the CC runs, it unlinks things which participate in a cycle, but it can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if during the CC it unlinked "lots" of things which are now garbage, it will trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS) GC's. Currently "lots" is like 250 -- way too low. Moreover, the GC's normal heuristics should handle normal growth -- CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab closure). I don't think we have a bug open for this yet. 2) Because the CC does not track what zones it is touching, the GC that happens is an all-zones GC. This will cause the atoms table to get swept -- the atoms table is enormous, so this is probably 5-10ms of that 40ms. Just making this not happen all the time will be a huge win on all workloads. This is bug 1052793, which I think Andrew is actively working on after this morning's meeting. 3) CC's choose when to run based on a pile of heuristics. It seems like 5s is way too quickly to be turning around CC's for these cases, so we should probably find out why and delay if possible. This may be a good bug to use for this problem.

Comment 23

•

11 years ago

(In reply to Terrence Cole [:terrence] from comment #22) > 3) CC's choose when to run based on a pile of heuristics. It seems like 5s > is way too quickly to be turning around CC's for these cases, so we should > probably find out why and delay if possible. This may be a good bug to use > for this problem. There just is enough garbage to trigger CC quite often unfortunately. > 1) When the CC runs, it unlinks things which participate in a cycle, but it > can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if > during the CC it unlinked "lots" of things which are now garbage, it will > trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS) > GC's. Currently "lots" is like 250 -- way too low. Actually we currently trigger GC often because of too many (limit is 2500, not 250 for those) shortliving objects alive. Bug 1084421 will help with that a bit. > CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab > closure). I don't think we have a bug open for this yet. Unfortunately we really need to GC more often. We should still figure out how to prevent GC spending so much time in the last slice (especially when bug 1052793 doesn't kick in.)

Comment 24

•

11 years ago

Something in the game is causing quite a few AudioBufferSourceNode objects to stay alive just for a short time. Is that expected? Couldn't one reuse those?

Reporter

Comment 25

•

11 years ago

Wow, great job digging into this everyone! Re (1): by any chance are the JSObjects that get unlinked in the nursery (you'd hope so, given that these events are so likely to die young)? If so, perhaps the CC could observe this fact and just not count them at all since a minor GC will inevitably happen before long. (In reply to Olli Pettay [:smaug] from comment #23) > We should still figure out how to prevent GC spending so much time in the > last slice (especially when bug 1052793 doesn't kick in.) Also, from what I usually see on the humble bundle games, the first slice.

Comment 26

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #25) > Re (1): by any chance are the JSObjects that get unlinked in the nursery > (you'd hope so, given that these events are so likely to die young)? If so, > perhaps the CC could observe this fact and just not count them at all since > a minor GC will inevitably happen before long. I'm talking about (likely) short living C++ objects. I don't think we put any wrappers to nursery.

Reporter

Comment 27

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #26) > I don't think we put any wrappers to nursery. Could we?

Comment 28

•

11 years ago

(In reply to Terrence Cole [:terrence] from comment #22) > Currently "lots" is like 250 -- way too low. I'm not sure if that is too low, if we're really freeing that many objects (currently we're not). One basic problem is that we don't really know how much memory we're freeing for each object. > CC_WAITING GC's should probably > only happen at all when unlink a coalesced zone (e.g. after tab closure). No, that is too strong of a statement. We need to run them occasionally to clean up wrappery things as we go. But certainly every 5 seconds is too much. > 3) CC's choose when to run based on a pile of heuristics. It seems like 5s > is way too quickly to be turning around CC's for these cases, so we should > probably find out why and delay if possible. This may be a good bug to use > for this problem. Yeah, it might be interesting to bump the delay to 10s and see how bad it is. But we certainly want to do that in a separate bug. ;)

Updated

•

11 years ago

Depends on: 1052793, 727965

Comment 29

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #26) > I'm talking about (likely) short living C++ objects. I don't think we put > any wrappers to nursery. I had the impression that we do nursery allocate some wrappers, but not for, say, window globals.

Comment 30

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #24) > Something in the game is causing quite a few AudioBufferSourceNode objects > to stay alive just for a short time. Is that expected? Couldn't one reuse > those? Unfortunately not, that's how the Web Audio spec worked them out to be. https://developer.mozilla.org/en-US/docs/Web/API/AudioBufferSourceNode illustrates "A AudioBufferSourceNode can only be played once, that is, only one call to AudioBufferSourceNode.start() is allowed. If the sound needs to be played again, another AudioBufferSourceNode has to be created. Those nodes are cheap to create, and AudioBuffers can be reused accross plays. It is often said that AudioBufferSourceNodes have to be used in a "fire and forget" fashion: once it has beed started, all references to the node can be dropped, and it will be garbage-collected automatically." Some of the games (FTL at least) stream in background music themselves by decoding ogg vorbis files, so they continuously create new AudioBufferSourceNodes even if there are no audio sfx clips playing.

Comment 31

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #25) > Wow, great job digging into this everyone! > > Re (1): by any chance are the JSObjects that get unlinked in the nursery > (you'd hope so, given that these events are so likely to die young)? If so, > perhaps the CC could observe this fact and just not count them at all since > a minor GC will inevitably happen before long. We do a minor collection at the start of each CC slice and assert that nothing that gets into the CC graph is in the nursery -- it is live across minor collections and needs to be unbarriered for speed anyway. > (In reply to Olli Pettay [:smaug] from comment #23) > > We should still figure out how to prevent GC spending so much time in the > > last slice (especially when bug 1052793 doesn't kick in.) > > Also, from what I usually see on the humble bundle games, the first slice. Could you do a run with MOZ_GCTIMER=file and post the file here? We moved to an easier to read format when dumping to a file, since space is less precious than in the browser console. (In reply to Olli Pettay [:smaug] from comment #23) > (In reply to Terrence Cole [:terrence] from comment #22) > > 3) CC's choose when to run based on a pile of heuristics. It seems like 5s > > is way too quickly to be turning around CC's for these cases, so we should > > probably find out why and delay if possible. This may be a good bug to use > > for this problem. > There just is enough garbage to trigger CC quite often unfortunately. I saw that Jukka got exact counts for AB creation in bug 936168 so that we can adjust our triggers for the nursery. Could we maybe get exact counts for the objects that are tripping the CC's triggers and increase the triggers so that these games work better? > > 1) When the CC runs, it unlinks things which participate in a cycle, but it > > can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if > > during the CC it unlinked "lots" of things which are now garbage, it will > > trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS) > > GC's. Currently "lots" is like 250 -- way too low. > Actually we currently trigger GC often because of too many (limit is 2500, > not 250 for those) > shortliving objects alive. Bug 1084421 will help with that a bit. Yes, that bug is a great idea! > > CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab > > closure). I don't think we have a bug open for this yet. > Unfortunately we really need to GC more often. > > > We should still figure out how to prevent GC spending so much time in the > last slice (especially when bug 1052793 doesn't kick in.) That is bug 1057530. However, please do read Bill's comment 7 from that bug as it puts the scale of the request in perspective: https://bugzilla.mozilla.org/show_bug.cgi?id=1057530#c7 We're working on it, but it's not going to happen overnight.

Comment 32

•

11 years ago

I should point out that bug 1064578 already appears to be helping somewhat, and we have more in that vein planned.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 33

•

11 years ago

One thing I noticed is that the largest pause, at 40ms, seems to always be in slice 1, and is entirely marking. Because it is marking, we should be able to reduce that. Maybe there's something that could be done to better detect that we want lower pause times and thus reduce the budget? These GCs I am seeing have 4 slices: the mark setup (12ms), a full mark slice (40ms), a partial mark slice (25ms), then a final mark slice that is 20ms (15ms with bug 1052793). If we split those mark slices up better, then we could get things down to more like 15 or 20ms, which is at least better.

Comment 34

•

11 years ago

I haven't been reading this closely, but I wanted to point out that 40ms is the budget we set in some cases for GC slices: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#113 Basically, if an IGC has already started and if we haven't done a slice for 100ms, then we do a 40ms slice. The intention is that, if we're running animation, then we should be getting notifyDidPaint calls for every frame: http://mxr.mozilla.org/mozilla-central/source/js/src/jsgc.cpp#5892 So the 40ms slices should only happen if we're not animating. Maybe something is going wrong though. It would help to see a GC log from mem.log.

Updated

•

11 years ago

Depends on: 1084672

Comment 35

•

11 years ago

Olli raised the point about needlessly generating garbage in Web Audio API. I forwarded the spec question to https://github.com/WebAudio/web-audio-api/issues/373.

Comment 36

•

11 years ago

The 250 collected objects triggering GC after CC is rather ancient, way before any CC optimizations or before check for short living C++ objects (Events and Touch objects). See https://bugzilla.mozilla.org/show_bug.cgi?id=637206 Even if we changed that to count only GCed objects, I think we could probably increase the number a bit.

Depends on: 1084995

Jon Coppeard (:jonco)

Comment 37

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #27) > (In reply to Olli Pettay [:smaug] from comment #26) > > I don't think we put any wrappers to nursery. > > Could we? Wrappers have finalizers and so can't be allocated in the nursery. It might be possible to allow objects with finalizers in the nursery in future, although I think there is browser machinery (e.g. the wrapper cache) which assumes wrappers are always tenured at the moment.

Jon Coppeard (:jonco)

Comment 38

•

11 years ago

(In reply to Jon Coppeard (:jonco) from comment #37) > (e.g. the wrapper cache) which assumes wrappers are always tenured at the moment. Oh, actually the wrapper cache does not assume this so nursery allocated wrappers should be fine from that POV.

Reporter

Comment 39

•

11 years ago

Thanks for the answers re: nursery wrappers! It seems like Emscripten is a pretty good use case for this change since practically all wrappers created are short-lived. When a nursery-allocated Event wrapper was finalized, would the finalizer drop a last ref and destroy the C++ Event object or would a CC still (eventually) be necessary to kill it?

Comment 40

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #39) > When a nursery-allocated Event wrapper was finalized, > would the finalizer drop a last ref Not necessarily the last ref. Wrappers may come and go, yet the C++ object stay alive. > and destroy the C++ Event object No > or > would a CC still (eventually) be necessary to kill it? CC's showwhite killer will call deleteCycleCollectable on things which have ref count 0.

Reporter

Comment 41

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #40) > CC's showwhite killer will call deleteCycleCollectable on things which have > ref count 0. Ah, I see. Sorry to ask probably-naive questions, but are there any GGC-like schemes we could use for CC'd objects which we know after often short-lived and asm.js code can't help but produce? I know a 4ms CC pause time isn't our biggest fish to fry atm (by far) so this is mostly a long-term ultra-low-jank-goal question.

Reporter

Comment 42

•

11 years ago

Concerning the middle-sized fish to fry, the 10ms slices: would it make sense to file a bug for trying to reduce slice time when animation is active? (An obvious heuristic we've talked about forever is having the slice stop as soon as we're ready to fire the next requestAnimationFrame.)

Comment 43

•

11 years ago

SnowWhite Killer tends to delete objects asynchronously, not only during CC. So in case of DOM Events I'd expect SWK to delete the objects soon after GC has run, since usually only JS keeps DOM events alive after event dispatch. So CC's 4ms pause is probably something else, like we're not having short enough slices.

Comment 44

•

11 years ago

Yes, right now the ICC slice budget is 10ms. To make it lower, we need to do something like IGC does and have a request animation frame callback thing to tell when we're animating (though that is apparently broken with OMTC). Only graph building is incremental in ICC, so I'm not sure how much lower we can make the pauses.

Updated

•

11 years ago

Depends on: 1085466

Updated

•

11 years ago

Depends on: 1085597