Open Bug 1083418 Opened 6 years ago Updated 4 years ago

on Nightly, all the humble bundle games seem to GC every 5 seconds or so

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

People

(Reporter: luke, Unassigned)

References

(Depends on 1 open bug, )

Details

Attachments

(1 file)

While most Emscripten apps do continually generate garbage (creating temporary strings or view objects to call web APIs, or perhaps event objects created from DOM), when I've looked at this in the past, after the game started and stabilized, there was only a GC every minute or so (with reasons being, iirc, TOO_MUCH_MALLOC or MAYBEGC).  With all the humble bundle games, though, we seem to do a GC/CC every 5-10 seconds, even once the game finished loading.  The reason for all these GCs is CC_WAITING, which I don't recognize.

While iGC helps, the max pause time is usually ~40ms on my machine (in a clean browser session, only one tab) and there are usually 10 slices taking 10ms.  In particular, on Osmos on Linux (where I think we spend relatively longer per-frame on gfx than Windows), this can produce pretty noticeable jankiness in animation, much worse than chrome.

It's possible something common to all these games (maybe the new SDL2 code?) is causing these games to spew garbage; it'd be nice to get more insight into what is causing these GCs.
I think that's a GC triggered following a cycle collection - Andrew, any ideas why this is happening?
Flags: needinfo?(continuation)
(In reply to Jon Coppeard (:jonco) from comment #1)
> I think that's a GC triggered following a cycle collection - Andrew, any
> ideas why this is happening?

Not off hand.

Luke, does this happen in a browser session with other stuff open, or also with just the game open?

If you run with MOZ_CC_LOG_ALL=1 (and probably MOZ_CC_LOG_DIRECTORY set to some directory) then I could look at the CC logs.  I can hopefully try it out myself if not.
Flags: needinfo?(continuation)
Also, it sounds like you have javascript.options.mem.log set to true.  What do the CC entries look like?
I would be very excited to have tools directly in the Firefox developer UIs to trace call sites that dynamically allocate new objects, i.e. not just call sites that generate garbage, because the former is also useful for catching running memory leaks that are still referenced to. Of course having both would be most interesting.

Emscripten applications should generate very little garbage because they use a fixed typed array as their ram space. We currently are not able to offer good tools to developers so that they could profile their Emscripten applications to identify opportunities where redundant object allocations could be cleaned up. Multiple developers have been asking for those and there is also one bug report inquiring for that in Emscripten: https://github.com/kripken/emscripten/issues/2299 . I was recently pointed to this: https://developer.mozilla.org/en-US/docs/Tools/Debugger-API/Tutorial-Allocation-Log-Tree , which looks good, so adding a tool like that to Firefox itself would be very useful!
Attached file cc_log.tar.gz
Here ya go.  These are the last two CC/GC log files produced playing around the first tutorial level of osmos.

You're right I am using mem.log = true to watch GC/CC activity.  The CC pretty regularly interleave with the GCs (as expected) and the pause times for CC are all great; they tend to look like:

CC(T+85.2) max pause: 4ms, total time: 4ms, slices: 1, suspected: 1130, visited: 1168 RCed and 387 GCed, collected: 113 RCed and 84 GCed (197|822 waiting for GC)
ForgetSkippable 13 times before CC, min: 0 ms, max: 4 ms, avg: 0 ms, total: 12 ms, max sync: 0 ms, removed: 5286

A note on trying this yourself: apparently opening the Browser Console enables debug mode which inhibits both Ion and Odin, so the games run in Baseline and super-slow at that.  So you have to first start the game (which compiles all the asm.js with Odin) and *then* open the Browser Console.
(In reply to Luke Wagner [:luke] from comment #5)
> apparently opening the Browser Console
> enables debug mode which inhibits both Ion and Odin
Is there a bug open for that? (I don't understand why opening browser console should affect to JITs)
Odd looking logs. Are those for workers, not for the main thread?
I assume this is mostly main-thread.  Jukka?
Flags: needinfo?(jjylanki)
Also, even if we can reduce CC frequency, we really need to get the max GC pause time down.
So some profiles showing where we spend time during that 40ms would be good.
(4ms CC pauses don't sound too bad)
The games all use only main thread, there should be no workers used in any of them as far as I know. I don't understand how to read the logs. Is it possible to associate the gcs to file:lines where they were allocated?

The SDL2 library should interop outside asm.js mostly via the Emscripten HTML5 API, which is at https://github.com/kripken/emscripten/blob/master/src/library_html5.js and http://kripken.github.io/emscripten-site/docs/api_reference/html5.h.html
(In reply to Olli Pettay [:smaug] from comment #9)
Sure, that'd be great, but I was hoping to keep the focus of this bug on why GC/CC are continually happening.  Max-pause times seem like a separate bug.
I'm not seeing main thread objects in those CC logs.
There is for example DedicatedWorkerGlobalScope which should live only in workers.

How did you create the logs?
Just what comment 2 said.  Btw, it's really easy to repro this yourself; just visit humblebundle.com.
For a game loop running at 60fps, a hiccup of 4ms is well more than enough to cause visually observable stutter. Microstuttering problems practically killed gamers' interest in SLI rigs, I can imagine the same could occur for browser gaming.

I'm not saying that a 4ms CC time is unreasonable, but rather mean that we should identify why they continuously occur and how we can help Emscripten to generate code that avoids them.
Flags: needinfo?(jjylanki)
(In reply to Jukka Jylänki from comment #14)
> For a game loop running at 60fps, a hiccup of 4ms is well more than enough
> to cause visually observable stutter.
Depends highly on what else the page is doing. If 12ms is enough for game logic + painting per slice, there is no stutter.

40ms on the other hand...

> I'm not saying that a 4ms CC time is unreasonable, but rather mean that we
> should identify why they continuously occur and how we can help Emscripten
> to generate code that avoids them.
Sure. Just trying to reproduce this. (I expect that something in the page is causing lots of Events which are cycle collectable objects.)
I would imagine that bug 1052793 is causing the GC to be longer than it needs to be, though 40ms for a single slice isn't ideal.
At least https://www.humblebundle.com/?sel=zenbound2_asm_demo uses mouseenter/leave event listeners on the canvas object (assuming devtools report it correctly). That is just insane for any
performance critical app. mouseenter/leave usage forces browser to dispatch
separate event for each element in the ancestor chain.
One really should use mouseover/out.
(if there are no listeners for mouseenter/leave, Gecko doesn't dispatch those events at all)

Still investigating if mouseenter/leave events are causing the CC to triggered so often...
... but we can actually optimize some of that out..
I see that the SDL2-emscripten port, which I think all these games use, uses mouseenter/leave, through emscripten's html5.h (which only has mouseenter/leave and not mouseover/out).

Should we add mouseover/out to html5.h, and use that in SDL2 instead?
There is only one mouseover and one mouseout event when mouse is moved to be on top of some element.
But there are lots of mouseenter/leave events. So even if those wouldn't cause GC or CC, they would use some memory and dispatching events isn't free either.
Also, if we're talking about event listeners for <canvas> which doesn't have child elements, 
it is hard to see any reasons to use mouseenter/leave.
Yeah, makes sense. Marked down https://github.com/kripken/emscripten/issues/2894.
Depends on: 1084421
We got a pretty good feel for what's going on here at the CC/GC meeting this morning.

1) When the CC runs, it unlinks things which participate in a cycle, but it can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if during the CC it unlinked "lots" of things which are now garbage, it will trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS) GC's. Currently "lots" is like 250 -- way too low. Moreover, the GC's normal heuristics should handle normal growth -- CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab closure). I don't think we have a bug open for this yet.

2) Because the CC does not track what zones it is touching, the GC that happens is an all-zones GC. This will cause the atoms table to get swept -- the atoms table is enormous, so this is probably 5-10ms of that 40ms. Just making this not happen all the time will be a huge win on all workloads. This is bug 1052793, which I think Andrew is actively working on after this morning's meeting.

3) CC's choose when to run based on a pile of heuristics. It seems like 5s is way too quickly to be turning around CC's for these cases, so we should probably find out why and delay if possible. This may be a good bug to use for this problem.
(In reply to Terrence Cole [:terrence] from comment #22)
> 3) CC's choose when to run based on a pile of heuristics. It seems like 5s
> is way too quickly to be turning around CC's for these cases, so we should
> probably find out why and delay if possible. This may be a good bug to use
> for this problem.
There just is enough garbage to trigger CC quite often unfortunately.

> 1) When the CC runs, it unlinks things which participate in a cycle, but it
> can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if
> during the CC it unlinked "lots" of things which are now garbage, it will
> trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS)
> GC's. Currently "lots" is like 250 -- way too low. 
Actually we currently trigger GC often because of too many (limit is 2500, not 250 for those)
shortliving objects alive. Bug 1084421 will help with that a bit.

> CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab
> closure). I don't think we have a bug open for this yet.
Unfortunately we really need to GC more often.


We should still figure out how to prevent GC spending so much time in the last slice (especially when bug 1052793 doesn't kick in.)
Something in the game is causing quite a few AudioBufferSourceNode objects to stay alive just for a short time. Is that expected? Couldn't one reuse those?
Wow, great job digging into this everyone!

Re (1): by any chance are the JSObjects that get unlinked in the nursery (you'd hope so, given that these events are so likely to die young)?  If so, perhaps the CC could observe this fact and just not count them at all since a minor GC will inevitably happen before long.

(In reply to Olli Pettay [:smaug] from comment #23)
> We should still figure out how to prevent GC spending so much time in the
> last slice (especially when bug 1052793 doesn't kick in.)

Also, from what I usually see on the humble bundle games, the first slice.
(In reply to Luke Wagner [:luke] from comment #25)
> Re (1): by any chance are the JSObjects that get unlinked in the nursery
> (you'd hope so, given that these events are so likely to die young)?  If so,
> perhaps the CC could observe this fact and just not count them at all since
> a minor GC will inevitably happen before long.
I'm talking about (likely) short living C++ objects. I don't think we put any wrappers to nursery.
(In reply to Olli Pettay [:smaug] from comment #26)
> I don't think we put any wrappers to nursery.

Could we?
(In reply to Terrence Cole [:terrence] from comment #22)
> Currently "lots" is like 250 -- way too low.
I'm not sure if that is too low, if we're really freeing that many objects (currently we're not).  One basic problem is that we don't really know how much memory we're freeing for each object.

> CC_WAITING GC's should probably
> only happen at all when unlink a coalesced zone (e.g. after tab closure).
No, that is too strong of a statement. We need to run them occasionally to clean up wrappery things as we go.  But certainly every 5 seconds is too much.

> 3) CC's choose when to run based on a pile of heuristics. It seems like 5s
> is way too quickly to be turning around CC's for these cases, so we should
> probably find out why and delay if possible. This may be a good bug to use
> for this problem.
Yeah, it might be interesting to bump the delay to 10s and see how bad it is. But we certainly want to do that in a separate bug. ;)
Depends on: 1052793, 727965
(In reply to Olli Pettay [:smaug] from comment #26)
> I'm talking about (likely) short living C++ objects. I don't think we put
> any wrappers to nursery.
I had the impression that we do nursery allocate some wrappers, but not for, say, window globals.
(In reply to Olli Pettay [:smaug] from comment #24)
> Something in the game is causing quite a few AudioBufferSourceNode objects
> to stay alive just for a short time. Is that expected? Couldn't one reuse
> those?

Unfortunately not, that's how the Web Audio spec worked them out to be. https://developer.mozilla.org/en-US/docs/Web/API/AudioBufferSourceNode illustrates "A AudioBufferSourceNode can only be played once, that is, only one call to AudioBufferSourceNode.start() is allowed. If the sound needs to be played again, another AudioBufferSourceNode has to be created. Those nodes are cheap to create, and AudioBuffers can be reused accross plays. It is often said that AudioBufferSourceNodes have to be used in a "fire and forget" fashion: once it has beed started, all references to the node can be dropped, and it will be garbage-collected automatically."

Some of the games (FTL at least) stream in background music themselves by decoding ogg vorbis files, so they continuously create new AudioBufferSourceNodes even if there are no audio sfx clips playing.
(In reply to Luke Wagner [:luke] from comment #25)
> Wow, great job digging into this everyone!
> 
> Re (1): by any chance are the JSObjects that get unlinked in the nursery
> (you'd hope so, given that these events are so likely to die young)?  If so,
> perhaps the CC could observe this fact and just not count them at all since
> a minor GC will inevitably happen before long.

We do a minor collection at the start of each CC slice and assert that nothing that gets into the CC graph is in the nursery -- it is live across minor collections and needs to be unbarriered for speed anyway.

> (In reply to Olli Pettay [:smaug] from comment #23)
> > We should still figure out how to prevent GC spending so much time in the
> > last slice (especially when bug 1052793 doesn't kick in.)
> 
> Also, from what I usually see on the humble bundle games, the first slice.

Could you do a run with MOZ_GCTIMER=file and post the file here? We moved to an easier to read format when dumping to a file, since space is less precious than in the browser console.

(In reply to Olli Pettay [:smaug] from comment #23)
> (In reply to Terrence Cole [:terrence] from comment #22)
> > 3) CC's choose when to run based on a pile of heuristics. It seems like 5s
> > is way too quickly to be turning around CC's for these cases, so we should
> > probably find out why and delay if possible. This may be a good bug to use
> > for this problem.
> There just is enough garbage to trigger CC quite often unfortunately.

I saw that Jukka got exact counts for AB creation in bug 936168 so that we can adjust our triggers for the nursery. Could we maybe get exact counts for the objects that are tripping the CC's triggers and increase the triggers so that these games work better?

> > 1) When the CC runs, it unlinks things which participate in a cycle, but it
> > can't free them directly. Thus, CC tracks how many JSObjects it unlinked: if
> > during the CC it unlinked "lots" of things which are now garbage, it will
> > trigger a GC to clean these up. These are the CC_WAITING(_TO_CLEAR_DEAD_JS)
> > GC's. Currently "lots" is like 250 -- way too low. 
> Actually we currently trigger GC often because of too many (limit is 2500,
> not 250 for those)
> shortliving objects alive. Bug 1084421 will help with that a bit.

Yes, that bug is a great idea!

> > CC_WAITING GC's should probably only happen at all when unlink a coalesced zone (e.g. after tab
> > closure). I don't think we have a bug open for this yet.
> Unfortunately we really need to GC more often.
> 
> 
> We should still figure out how to prevent GC spending so much time in the
> last slice (especially when bug 1052793 doesn't kick in.)

That is bug 1057530. However, please do read Bill's comment 7 from that bug as it puts the scale of the request in perspective: https://bugzilla.mozilla.org/show_bug.cgi?id=1057530#c7

We're working on it, but it's not going to happen overnight.
I should point out that bug 1064578 already appears to be helping somewhat, and we have more in that vein planned.
One thing I noticed is that the largest pause, at 40ms, seems to always be in slice 1, and is entirely marking.  Because it is marking, we should be able to reduce that.  Maybe there's something that could be done to better detect that we want lower pause times and thus reduce the budget?  These GCs I am seeing have 4 slices: the mark setup (12ms), a full mark slice (40ms), a partial mark slice (25ms), then a final mark slice that is 20ms (15ms with bug 1052793).  If we split those mark slices up better, then we could get things down to more like 15 or 20ms, which is at least better.
I haven't been reading this closely, but I wanted to point out that 40ms is the budget we set in some cases for GC slices:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#113

Basically, if an IGC has already started and if we haven't done a slice for 100ms, then we do a 40ms slice. The intention is that, if we're running animation, then we should be getting notifyDidPaint calls for every frame:
  http://mxr.mozilla.org/mozilla-central/source/js/src/jsgc.cpp#5892
So the 40ms slices should only happen if we're not animating. Maybe something is going wrong though. It would help to see a GC log from mem.log.
Depends on: 1084672
Olli raised the point about needlessly generating garbage in Web Audio API. I forwarded the spec question to https://github.com/WebAudio/web-audio-api/issues/373.
The 250 collected objects triggering GC after CC is rather ancient, way before any CC optimizations or before check for short living C++ objects (Events and Touch objects).
See https://bugzilla.mozilla.org/show_bug.cgi?id=637206
Even if we changed that to count only GCed objects, I think we could probably increase the number a bit.
Depends on: 1084995
(In reply to Luke Wagner [:luke] from comment #27)
> (In reply to Olli Pettay [:smaug] from comment #26)
> > I don't think we put any wrappers to nursery.
> 
> Could we?

Wrappers have finalizers and so can't be allocated in the nursery.

It might be possible to allow objects with finalizers in the nursery in future, although I think there is browser machinery (e.g. the wrapper cache) which assumes wrappers are always tenured at the moment.
(In reply to Jon Coppeard (:jonco) from comment #37)
> (e.g. the wrapper cache) which assumes wrappers are always tenured at the moment.

Oh, actually the wrapper cache does not assume this so nursery allocated wrappers should be fine from that POV.
Thanks for the answers re: nursery wrappers!  It seems like Emscripten is a pretty good use case for this change since practically all wrappers created are short-lived.  When a nursery-allocated Event wrapper was finalized, would the finalizer drop a last ref and destroy the C++ Event object or would a CC still (eventually) be necessary to kill it?
(In reply to Luke Wagner [:luke] from comment #39)
> When a nursery-allocated Event wrapper was finalized,
> would the finalizer drop a last ref
Not necessarily the last ref. Wrappers may come and go, yet the C++ object stay alive.

> and destroy the C++ Event object
No

> or
> would a CC still (eventually) be necessary to kill it?
CC's showwhite killer will call deleteCycleCollectable on things which have ref count 0.
(In reply to Olli Pettay [:smaug] from comment #40)
> CC's showwhite killer will call deleteCycleCollectable on things which have
> ref count 0.

Ah, I see.  Sorry to ask probably-naive questions, but are there any GGC-like schemes we could use for CC'd objects which we know after often short-lived and asm.js code can't help but produce?  I know a 4ms CC pause time isn't our biggest fish to fry atm (by far) so this is mostly a long-term ultra-low-jank-goal question.
Concerning the middle-sized fish to fry, the 10ms slices: would it make sense to file a bug for trying to reduce slice time when animation is active?  (An obvious heuristic we've talked about forever is having the slice stop as soon as we're ready to fire the next requestAnimationFrame.)
SnowWhite Killer tends to delete objects asynchronously, not only during CC.
So in case of DOM Events I'd expect SWK to delete the objects soon after GC has run, since usually
only JS keeps DOM events alive after event dispatch.

So CC's 4ms pause is probably something else, like we're not having short enough slices.
Yes, right now the ICC slice budget is 10ms.  To make it lower, we need to do something like IGC does and have a request animation frame callback thing to tell when we're animating (though that is apparently broken with OMTC).  Only graph building is incremental in ICC, so I'm not sure how much lower we can make the pauses.
Depends on: 1085466
Depends on: 1085597
This should be fixed since we started allocating event wrappers in the GC nursery.
You need to log in before you can comment on or make changes to this bug.