Closed Bug 1304205 Opened 8 years ago Closed 8 years ago

Increase slice time for long-running cycle collections

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- wontfix
firefox55 --- fixed

People

(Reporter: mccr8, Unassigned)

References

Details

Attachments

(1 file)

An unusual aspect of the cycle collector is that its running time varies wildly, from a few milliseconds, all the way up to multiple seconds. Bug 1301301 would have had less of an effect on people if longer running cycle collections increased the slice time.

It seems like we usually run the CC for 50 slices before giving up and finishing it, but we're trying to just use very short slices (like 5ms). However, if we have more than 250 ms of work to do, this will result in a larger max pause. For instance, if we have 450ms of work, then we might end up doing 250ms in 5ms slices, then finish it up in a single 200ms slice. This makes our max pause 200ms. If we could have magically known that we needed to do 450ms of work, we could have increased our slice time to 9ms and been done, reducing the max pause from 200ms to 9ms. (Ignoring that the CC can't be perfectly incrementalized.)

Of course, we can't known in advance how long it will take, but perhaps we could increase the slice time as the CC progresses.  Maybe the first 10 slices are 5ms, then the next 10 are 10ms, etc. I'm not sure how to analytically determine what the optimal slice time is, but it wouldn't be hard to simulate the effect on max pause time of various slice budget policies, across various total amounts of work.
Bill suggested that we could do something fancier while the refresh driver is going, and try to skip entire frames rather than just messing them up here and there accidentally.

Honestly, I was just thinking of something simpler, where we make slices go on for longer. One problem is that nsJSEnvironment doesn't directly track how many slices we've done so far, so it would have to be time-based.
hmm, not sure what you mean with messing with them here and there. We try to not mess with them, but just run right after tick and short slices enough that UI should stay responsive.
Sure, we could have longer slice for the case when there hasn't been a tick.
Mass wontfix for bugs affecting firefox 52.
I started looking at this again. I have a patch that linearly increases the slice time to 40ms as we approach the halfway point of an ICC (1000ms).

Here's a big CC we get when we free four or five TechCrunch tabs at once:

CC(T+128.5)[content-21737] max pause: 135ms, total time: 949ms, slices: 33, suspected: 444, visited: 24679 RCed and 494510 GCed, collected: 24637 RCed and 494470 GCed (494470|0|0 waiting for GC)
ForgetSkippable 7 times before CC, min: 0 ms, max: 1 ms, avg: 1 ms, total: 7 ms, max sync: 0 ms, removed: 1131

A CC normally takes about 50 slices, so if we were doing ms slices for the first 49, then we'd have done a total of 245ms of work by the time we get to the final slice, leaving 704ms to be done in the last slice. As you see, my patch reduced the max pause to 135ms.

The budgets per slice look along the lines of this: 5, 5, 5, 6, 7.52, 9.2, 10.8, 12.8, 14.68, 16.6, 18.6, 20.6, 23.0, 25.0, 27.2, 29.6, 32, 34.4, 37.4, then the rest are 40. So, a CC that only takes a few slices (the common case) will not be any worse off than we are today. The pause will get increased a little for CCs that could have been completed within 250ms of work. A CC requiring 250ms of work would go from a max pause of 5ms to a max pause of 27ms, according to my estimate (assuming it was allowed to run for 50 slices!).
Attachment #8863058 - Flags: review?(continuation)
Could you please review this at some point next week, Olli? It is a short patch.
Flags: needinfo?(bugs)
Comment on attachment 8863058 [details]
Bug 1304205 - Increase slice time for longer running CCs.

https://reviewboard.mozilla.org/r/134892/#review137842

::: dom/base/nsJSEnvironment.cpp:1486
(Diff revision 1)
> -        float sliceMultiplier = std::max(TimeBetween(gCCStats.mEndSliceTime, now) / (float)kICCIntersliceDelay, 1.0f);
> +      if (runningTime < kMaxICCDuration) {
> +        // Try to make up for a delay in running this slice.
> +        float sliceDelayMultiplier = TimeBetween(gCCStats.mEndSliceTime, now) / (float)kICCIntersliceDelay;
> +        // Increase slice budgets up to 40ms as we approach half way
> +        // through the ICC, to avoid large sync CCs.
> +        float laterMultiplier = std::min((runningTime * 16.0f) / kMaxICCDuration, 8.0f);

this is quite mysterious. What are 16.0 and 8.0 here?
This does seem to do the right thing but some explanation would be good, and perhaps some assertion that budget stays below 40ms.

though, the value of sliceDelayMultiplier can be anything.

So maybe not assertion but 
TimeBudget(std::min(kICCSliceBudget * sliceMultiplier, 40.0f)) could be passed to SliceBudget.
That would self-document this a bit.
Attachment #8863058 - Flags: review+
Flags: needinfo?(bugs)
That's a very good point. I've restructured how this is calculated. I've left alone the way sliceDelayMultiplier is calculated.

For the later slice budget, I now do this:
        float percentToHalfDone = std::min(2.0f * runningTime / kMaxICCDuration, 1.0f);
        const float maxLaterSlice = 40.0f;
        float laterSliceBudget = maxLaterSlice * percentToHalfDone;

In other words, first it calculates how close it is to the halfway point, as a %, capped at 1.0. Then it multiplies that by the target max budget of 40ms.

Finally, it takes the largest of the two special calculations, or the default slice budget (5ms):
        float sliceMultiplier = std::max({delaySliceBudget, laterSliceBudget, kICCSliceBudget});
Well, the fancy bracket thing does not compile, so I'll have to use std::max twice.
Oops "std::max({delaySliceBudget, laterSliceBudget, (float)kICCSliceBudget});" does work. Sorry for the comment spam.
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c3a7d618057b
Increase slice time for longer running CCs. r=smaug
https://hg.mozilla.org/mozilla-central/rev/c3a7d618057b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
There's only one day of reasonable telemetry data from this, but it seems to have had a pretty big impact on the CYCLE_COLLECTION measure, which records how long we go from start to end of a CC, including all of the mutator work that happens between slices. The 95th percentile dropped from about 624ms to 463ms. So as expected we are finishing long CCs sooner. There's no real improvement to the max pause time that I can see, but maybe there will be some visible improvement with more data.
See Also: → 1368972
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: