Open Bug 1260501 Opened 8 years ago Updated 2 years ago

High frequency GC mode slice budget doubling doesn't apply to non-0 budgets

Categories

(Core :: JavaScript: GC, defect)

Tracking

REOPENED
Tracking Status
firefox48 --- affected

People

(Reporter: ehoogeveen, Unassigned)

References

Details

Attachments

(1 file)

If we're GCing frequently (more than once every second), we have logic that's supposed to double our slice budget to increase throughput. However, it currently only works if no slice budget is passed in (and we fall back to the default), which seemed wonky to Terrence and me.

I'm not sure how useful this heuristic is - if we're completing our GCs quickly enough, there should be no need to fall back to 30fps. Conversely, I don't think this helps us avoid full GCs if we're falling behind allocation, since IIUC it doesn't kick in unless the *next* GC happens less than a second later. However, fixing the logic is easy enough, so here's a patch to do that.
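For illustration, the intent of the fix is roughly the following (a sketch only, not the attached patch; the helper name and DefaultSliceBudgetMS are made up):

#include <cstdint>

// Sketch of the intended behavior, not the actual patch. Names like
// DefaultSliceBudgetMS and ComputeSliceBudgetMS are invented for illustration.
static const int64_t DefaultSliceBudgetMS = 10;

int64_t
ComputeSliceBudgetMS(int64_t requestedMS, bool highFrequencyGC)
{
    // Fall back to the default slice length when no budget was passed in.
    int64_t ms = requestedMS > 0 ? requestedMS : DefaultSliceBudgetMS;

    // Before the patch, the doubling below only happened on the fallback
    // path; the fix applies it to explicitly requested budgets as well.
    if (highFrequencyGC)
        ms *= 2;

    return ms;
}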
Summary: High frequency GC mode slice budget doubling don't apply to non-0 budgets → High frequency GC mode slice budget doubling doesn't apply to non-0 budgets
Comment on attachment 8735962 [details] [diff] [review]
Make high frequency GC mode slice budget doubling work with non-0 budgets.

Review of attachment 8735962 [details] [diff] [review]:
-----------------------------------------------------------------

I think the idea is that we want to give more time to the GC (lowering the MMU) if by the time we finish one GC we are already in need of another one.
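Roughly something like this, with invented names and the one-second threshold mentioned in comment 0:

#include <cstdint>

// Made-up sketch of the heuristic being discussed: if a new GC starts
// within about a second of the previous one ending, treat the workload
// as GC-bound and allow longer slices (trading MMU for throughput).
static const int64_t HighFrequencyThresholdMS = 1000;

bool
InHighFrequencyGCMode(int64_t nowMS, int64_t lastGCEndMS)
{
    return nowMS - lastGCEndMS < HighFrequencyThresholdMS;
}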
Attachment #8735962 - Flags: review?(terrence) → review+
I included this in the try run for bug 1260475:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc88ce7e1de4&selectedJob=18761739

I don't think anything in there implicates this change. Setting the dependency just in case there's overlap in the context.
Blocks: 1260475
Status: NEW → ASSIGNED
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6e65cda2a0eb
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
I recorded a regression for splay latency on both x86 and x64.
I retriggered some builds which gave us a better regression range:
http://hg.mozilla.org/integration/mozilla-inbound/log?rev=aeb9cc61c8ad%3A%3A6e65cda2a0eb%20and%20!aeb9cc61c8ad
Given the regression range, only this bug could have caused this regression.

2% on octane:
https://arewefastyet.com/#machine=28&view=single&suite=octane&start=1459427024&end=1459469796

20% on octane-splay-latency:
https://arewefastyet.com/#machine=28&view=single&suite=octane&subtest=SplayLatency&start=1459427024&end=1459469796

@Emanuel can you look into this?
Flags: needinfo?(emanuel.hoogeveen)
Hmm, yeah, this makes us do longer slices when we're getting hammered with GCs, so I suppose it's not surprising that Splay Latency doesn't like the additional source of variance. Should have considered that, sorry.

I haven't looked into it in detail, but I think this is probably inevitable with this logic being active. Another explanation could be that this threw off the splay tuning bhackett added, but that doesn't sound as likely to me. Terrence, what do you think we should do here? Back out, scrap the logic entirely, or improve it somehow?
Flags: needinfo?(emanuel.hoogeveen) → needinfo?(terrence)
Here's an idea: if during an incremental GC, any of the allocation triggers get to within 25% of triggering another GC, increase the slice budgets (at most once per GC), and let this multiplier carry over to the next GC. If during a subsequent GC, the allocation triggers are never closer than 75% from being hit, decrease the slice budgets by the same factor (down to some set minimum). Those percentages are arbitrary, of course, but in theory this should give us a graceful fallback to longer slices regardless of *why* we need them (e.g. it also works if most of each slice's budget is spent on bookkeeping).
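In pseudo-C++ (all names, thresholds, and the factor of two are placeholders for illustration, not real SpiderMonkey code), the feedback would look something like:

#include <algorithm>

struct SliceFeedback
{
    double multiplier = 1.0;

    // closestFraction: how close any allocation trigger came to firing
    // during this GC, as a fraction of its threshold (1.0 == triggered).
    void update(double closestFraction) {
        if (closestFraction >= 0.75) {
            // Came within 25% of triggering another GC mid-collection:
            // give future slices more time (bump at most once per GC).
            multiplier = std::min(multiplier * 2.0, 8.0);
        } else if (closestFraction <= 0.25) {
            // Triggers never got closer than 75% from being hit: back
            // off by the same factor, down to the normal budget.
            multiplier = std::max(multiplier / 2.0, 1.0);
        }
    }
};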

One thing I don't know is how this would work with per-zone triggers. We could just take whichever zone gets closest to its allocation triggers (perhaps not unreasonable, since any zone going over is bad), or we could use per-zone multipliers (but this might affect where the budget should be set). It would also be nice to factor the refresh rate into this multiplier somehow, at least in the browser - maybe round slice budgets to always leave most of a frame for the mutator.

Experimenting with this might not be the right call for this bug, but if you (Terrence) think it's a good idea, I'd like to try it in a new bug.
Yes. I guess the existing logic was complex for a reason. We should back out and try again with something more sophisticated. Unfortunately, I think that making the slices feed back like you're suggesting is just going to inevitably lead to the same behavior. Let's step back a bit and take a slightly broader view. We (basically) trigger three kinds of GCs.

1) Browser idle: 40ms slices every 100ms. This is driven by nsJSRuntime as part of the CC.
2) Browser active: 10ms slices every 16ms. This is driven by the frame refresh driver. It was broken by the landing of hardware vsync.
3) JS busyloop: short enough to appease Octane's latency benchmarks. This is driven by ALLOC_TRIGGER when the right stars align.

This patch breaks 3, but we really only care about fixing 2. I think a better and more general solution for this would be to just go and fix bug 1106964. The refresh driver is assuming that a 10ms requested slice will take <16ms and when this fails, things fall over. Instead, the refresh driver should be aware that its request might take too long and when it does, it needs to do something more sane. Perhaps it would make even more sense in this case to disable the animation-driven GC altogether and rely on the browser idle GC to take care of garbage.
Flags: needinfo?(terrence)
https://hg.mozilla.org/integration/mozilla-inbound/rev/503ac89cb92b

This'll merge to m-c either tonight or tomorrow.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla48 → ---
Never actually had to *request* a backout before :)

> Unfortunately, I think that
> making the slices feed back like you're suggesting is just going to
> inevitably lead to the same behavior.

Maybe, though in the specific case of Splay-Latency it would depend on how close we get to our allocation triggers before the end of each GC (even if GCs are frequent, they might still be much shorter than the time between them). I wouldn't be surprised if we could tweak things to keep Splay-Latency happy. Also, even if we don't need the logic for S-L, where all we need to do is avoid non-incremental GCs, I think this kind of graceful degradation would be useful in real-world situations. Falling back to 30, 20, 15fps is almost always going to be preferable to a massive pause, I think, especially if the frame rate stays at all consistent.

> 2) Browser active: 10ms slices every 16ms. This is driven by the frame
> refresh driver. It was broken by the landing of hardware vsync.

Was it broken? Don't we just trigger them every (1001/60)ms now? I suppose it's faster for displays with higher refresh rates.

> 3) JS busyloop: short enough to appease Octane's latency benchmarks. This is
> driven by ALLOC_TRIGGER when the right stars align.

Yeah.. I need to look at how the logic bhackett added for this works (for instance, once we've started this special kind of incremental GC, how often do slices run?).

> I think a better and more general solution for
> this would be to just go and fix bug 1106964.

I think that's fixing a different problem, albeit an important one. I need to read up on how this works, too - do we get paint notifications when the refresh driver gets notified to *start* painting, or when it *finishes* painting? Does other work happen after we finish our slice (if not painting, something else)?

> The refresh driver is assuming that a 10ms requested slice will
> take <16ms and when this fails, things fall over.

Fall over in what sense? If we've finished painting by the time we start our slice, the refresh driver should have already pushed the frame into the backbuffer, and it should get composited regardless of what we do. If we start painting after we finish our slice.. well, it should just be delayed by however long we took. I think it's mostly a problem if we keep alternating between finishing just *before* composition and just *after* it, since that will presumably cause excessive stutter.

> ... should be aware that its request might take too long and if it does
> so, it needs to do something more sane. Perhaps it would make even more
> sense in this case to disable the animation driven GC altogether and rely on
> the browser idle GC to take care of garbage.

I'm probably missing some part of this, but if we need long slices to avoid falling back to non-incremental GCs I'm not really sure what we could do to play nicer with the refresh driver - on some level we just have to stop the world so we can make enough progress, right? (concurrent GC would be nicer here, of course, but even that could fall behind allocation and need to slow it down)
I'm guessing this caused a telemetry alert:

http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/1348/alerts/?from=2016-04-01&to=2016-04-01

I don't know what that means, though.

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: emanuel.hoogeveen → nobody
Severity: normal → S3