The JS shell runs GC slices when the amount of allocated memory crosses a threshold. If a slice ends after finishing sweeping but not finalisation (or maybe part-way through sweeping) then the allocated memory will have dropped, the next slice won't run now for some time (until the allocated memory increases again). This means that if other zones want to begin a collection they need to interrupt this unfinished collection or due to bug 1367455 they will be delayed. Also barriers may still be in place (but I don't know) which may slow throughput.
Here are the two regressions that I'm attempting to fix: https://treeherder.mozilla.org/perf.html#/alerts?id=7016
Paul is this something that only shows up with shell or does this pattern happen in the browser as well? From the description it seems like the general browser could be affected by this too.
Created attachment 8885084 [details] Profile timeline This is a screenshot from the gecko profiler. This could be the same behaviour happening in the browser (either from the same or a different cause). I'm also suspicious about the integration between the GC and the gecko profiler, I'm currently trying to work out how all this fits together so I understand _completely_ what it is I'm seeing.
Created attachment 8885085 [details] Profile from Splay Latency Here is the gecko profile file for splay latency.
Here's what I know: + The same behavior with the final delayed slice does occur in the browser. + The GC following the one with the delayed slice becomes non-incremental due to GCBytesTrigger. This is also in Beta (I think my earlier patch is now in Beta). In my tests with the shell this was because the heap filled up during the collection because the start of the collection was delayed. But there are also other possibilities. + My patch, which i discussed with jonco, attempts to avoid this by adding a new monotonic counter for the amount of used memory, as memory is freed the counter is left alone, it is used to trigger the final slice in a reasonable time and complete the zonal GC sooner and therefor not delaying the following GC. + My patch works, it does avoid this problem, however _all_ the GCs seem to switch to non-incremental (rather than only the delayed GCs) due to GCBytesTrigger. This creates more frequent and longer pauses, lowering the benchmark scores further. This shouldn't happen if my original guess that the delayed GC became non-incremental because it was delayed. Now none are being delayed and yet ALL are non-incremental, this could also be a separate cause. + My patched version seems to have slices that "look" further apart but I haven't measured accurately. This should not be caused by no-longer delaying the final slice, that doesn't make sense. However it does suggest my new counter is not counting as fast as the previous one. Which could cause the GC to fail to reclaim memory quick enough and therefore switching to non-incremental mode. + Since lowering the slice budget to 5ms the whole system is possibly more likely to fail to reclaim enough memory soon enough, as the trigger for slices wasn't changed to correspond to this. I propose lowering the heap thresholds to begin a collection earlier and therefore hope to run enough slices before the heap fills up. Ideally we should calculate some of these parameters dynamically, based on a projection of how much work remains during a collection and how much space is left in the heap. There are some things I can test: + Adjust the threshold for my new counter, lower it and see if the behavior continues. + Search for places I'm missing to where I should be incrementing the counter. + Re-increase the GC pause time to 10ms and see if the problem still occurs. + Lower the thresholds that start a collection.
(In reply to Paul Bone [:pbone] from comment #5) > + My patch works, it does avoid this problem, however _all_ the GCs seem to > switch to non-incremental (rather than only the > delayed GCs) due to GCBytesTrigger. This creates more frequent and longer > pauses, lowering the benchmark scores further. > This shouldn't happen if my original guess that the delayed GC became > non-incremental because it was delayed. Now none > are being delayed and yet ALL are non-incremental, this could also be a > separate cause. It seems like it's important to understand what's going on here. If you look at a profile where this is happening, does the information on thresholds make sense? As in, if this is GCBytesTrigger, then timings.trigger_amount and timings.trigger_threshold should be getting set. Are these significantly different than before your patch? (It just strikes me that if your new counter were never incremented at all, due to some bug, then it would have this observed behavior.)
Good thinking. I'll check that when I get to look at this next, Probably not until Monday the 25th.
You need to log in before you can comment on or make changes to this bug.