Closed Bug 1011137 Opened 9 years ago Closed 9 years ago
Reduce GC delay rather than CC delay with ICC
With ICC disabled, we wait 6 seconds to actually trigger a CC. With it enabled, we wait 4 seconds. The idea is that it takes some period of time to run a CC (a max of 2 seconds), so we should reduce the time in between by the max length of time the CC can take to ensure that we run the CC enough. The drawback of this is that it makes us run the CC more often, because ICCs that actually take 2 seconds (in total duration) are fairly rare. (According to telemetry, the 95th percentile of CC duration is 250ms.) I tried looking at telemetry for CYCLE_COLLECTOR_TIME_BETWEEN, but it looks really weird (I filed bug 1011132) about that. If we only choose to look at the numbers that make sense, then we can see that the values dropped by a decent amount when ICC landed. Anyways, I'm going to change it so that we don't speed up the delay until we trigger a CC, but instead reduce the delay until we GC after a CC. At that point, we know how long the CC actually took, so we can reduce it appropriately, rather than assuming the worst case. I think it is possible that this is causing the TART regression on Win7, by making us run more GCs and CCs, but I think it is a good change to take in any event, to keep more consistent behavior. We'll have to watch what this does to MAX_PAUSE, as it is possible that some of the gains we are getting from ICC are simply from running the CC more often, so there's less to collect each time.
try run: https://tbpl.mozilla.org/?tree=Try&rev=7953cd2846d3 TART run: https://tbpl.mozilla.org/?tree=Try&rev=78acf493a73d This seems to "fix" most or all of the TART regression.
Attachment #8423591 - Flags: review?(bugs)
Attachment #8423591 - Flags: review?(bugs) → review+
This patch had the misfortune of landing in the middle of a large bustage pileup on inbound that led to me having to revert to a last-good cset in lieu of waiting 3-4 hours for green instead. Unfortunately, that means this was backed out along with the others. If this was confirmed green on Try, it can re-land at your convenience. https://hg.mozilla.org/integration/mozilla-inbound/rev/eb2a6f7785a2
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
I believe this bug is the cause of a ~74% regression on AWFY Win8 octane-Box2D?
This didn't affect the shell at all.
Oh, Win8 is browser. Interesting. Still, that seems fairly unlikely.
That test looks bimodal to me, over the last 5 or so days.
Oh, I see. The zooming on there is weird. I see this change set for that big regression: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9d4726566626&tochange=38e5a78e407f That is odd. I guess because it is lower some times, maybe a change in CC timing could cause a problem?
The problem seems to have calmed down a bit now. It still has the occasional very slow runs that didn't happen before this landed but that may be OK. > maybe a change in CC timing could cause a problem? Sounds likely given its randomness. Maybe this uncovered an underlying bug in browser GC/CC heuristic?
(In reply to Hugh Nougher [:Hughman] from comment #11) > > maybe a change in CC timing could cause a problem? > Sounds likely given its randomness. Maybe this uncovered an underlying bug > in browser GC/CC heuristic? Not interfering with benchmark results is not a design goal of the browser GC and CC. :) It might be possible to avoid these perturbations by modifying the harness that runs this to force a GC+CC before the test runs.
(In reply to Andrew McCreight [:mccr8] from comment #12) > Not interfering with benchmark results is not a design goal of the browser > GC and CC. Agreed. And I would infer that all the noise runs between 30000 and 37000 would be that expected noise. Distinct runs of >70% slower seems out-of-ordinary. If these slow runs were part of the 'norm', wouldn't we expect slow runs appearing to cover all the range between fast and slow? I also note mandreel has similar pattern runs being >50% slower than 'norm'.
Probably the slow runs are just when we run a GC or CC or something during the test, and the other ones aren't. Anyways, if you want to continue discussing this, you should just file a new bug blocking this one and we could maybe figure out how to make it less bad.
(In reply to Andrew McCreight [:mccr8] from comment #0) > I think it is possible that this is causing the TART regression on Win7, by > making us run more GCs and CCs, For the record, TART numbers/regressions are never a goal. The behavior as far as the user is concerned is our goal. TART exercises tab animations more frequently than normal users would (though still with some delays in between to allow some cleanups - such as hopefully gc and cc), so it's possible that GC/CC timing heuristics changes would clash with TART's unusual frequency of tab animations. On such case, we should adapt TART to the new heuristics if it's considered better than before (for users), rather than tweaking the heuristics until the apparent TART regressions are gone. > but I think it is a good change to take in > any event, to keep more consistent behavior. That's good. If it happens again, please keep the code as best as you can as far as the user is concerned, and if it means one of our tests needs to adapt, then that's where we should make the changes, and if we can't make the change, then we'll still understand that the apparent regression is not a real one and therefore will let it be. A bit off topic, and again, regardless of TART and yes with regards to user benefits, on bug 1005253 I suggested a hinting systems which we could utilize on user-facing UI animations - which could become smoother if we hint not to use GC/CC during these short ~250ms animations.
> The behavior as far as the user is concerned is our goal. Of course. > on bug 1005253 I suggested a hinting systems which we could utilize on user-facing UI animations Yeah, that sounds like a reasonable idea, the question is just how much that would improve the user experience in practice. I suppose some kind of telemetry could check how often we run CC during animations, or something.
You need to log in before you can comment on or make changes to this bug.