We get into various situations where the cycle collector starts taking a long time (>500ms, say) without freeing much. In that case, we could try running the cycle collector less often. This would cause memory usage to increase a bit, because it would take longer for things to be freed, but it would probably improve the user experience. This would require increasing the CC timer, and possibly the purple buffer size threshold.
This will require some tuning. One data point to consider is bz's experience with the GC scheduling regression. What was the gap in CCs there?
After every CC, compute how long the CC took. Multiply that by 9 to get the next delay. If that's less than a certain minimum (5 seconds), set it to the minimum. If that's larger than a certain maximum (30 seconds), set it to the maximum. The next time the CC timer is set, use the delay you calculated. The basic idea is that we need a minimum so we don't run the CC more often than we do now when the CC is behaving itself. I put a fairly generous maximum in there to avoid weird scenarios, like somebody turning off their computer in the middle of a CC, coming back a day later, and Firefox deciding that the CC actually took 24 hours to run and then not running the cycle collector again for a week. The numbers are just something vaguely reasonable I threw together; they can easily be changed. I need to come up with a "bz-ifier" that can make the CC sit and spin for an extra 1 second to see how that feels with this patch.
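The rule in this comment amounts to a clamp on a scaled duration. Here is a minimal Python sketch of it, not the patch itself. The function and constant names are hypothetical, and only the numbers (factor 9, 5-second minimum, 30-second maximum) come from the comment.

```python
# Hypothetical names; only the numeric values come from the comment above.
CC_DELAY_FACTOR = 9        # next delay = 9 x the duration of the last CC
CC_MIN_DELAY_MS = 5_000    # never schedule sooner than 5 seconds
CC_MAX_DELAY_MS = 30_000   # never wait longer than 30 seconds


def next_cc_delay_ms(last_cc_duration_ms):
    """Return the delay to use the next time the CC timer is set."""
    scaled = last_cc_duration_ms * CC_DELAY_FACTOR
    # Clamp the scaled value into [minimum, maximum].
    return max(CC_MIN_DELAY_MS, min(CC_MAX_DELAY_MS, scaled))
```

With these numbers a 1-second CC gives a 9-second delay, any CC shorter than roughly 556ms stays at the 5-second minimum, and a bogus 24-hour measurement is capped at 30 seconds.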
One possibility for an additional pressure valve here would be to use the minimum time if the CC actually collected a bunch of stuff (like, say, more than 10,000 things). That would avoid a failure cascade if the browser is actually generating a ton of garbage, wherein the CC is slow, gets delayed, so more garbage is generated between CCs, so the CC gets even slower, etc. Likewise, we could increase the factor if it isn't collecting anything. In bz's case (and most of the cases I've seen with super slow CCs), the CC isn't collecting more than around 500 items.
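The pressure valve could be layered on top of the clamped delay roughly as follows. This is a Python sketch under assumptions: the function name and parameters are hypothetical, the defaults reuse the numbers quoted in this thread, and the "increase the factor if nothing is collected" idea is left out because no concrete numbers were proposed for it.

```python
def next_cc_delay_with_valve_ms(last_cc_duration_ms, objects_collected,
                                factor=9, min_delay_ms=5_000,
                                max_delay_ms=30_000,
                                productive_threshold=10_000):
    """Clamped CC delay, bypassed when the last CC was productive.

    All names are illustrative; the thresholds are the ones from the thread.
    """
    # If the CC freed a lot, the slowness was real work: do not back off,
    # otherwise garbage piles up and the next CC gets even slower.
    if objects_collected > productive_threshold:
        return min_delay_ms
    scaled = last_cc_duration_ms * factor
    return max(min_delay_ms, min(max_delay_ms, scaled))
```

Under this sketch a 2-second CC that frees 20,000 objects is rescheduled at the 5-second minimum, while a 2-second CC that frees only 500 objects is pushed out to 18 seconds.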
Something like this could be OK for FF11. I'm still aiming to get BBP (black-bit-propagation, aka CanSkip) landed in FF12. That changes how the CC is triggered, because CanSkip needs to run occasionally before the CC.
Yeah, that's kind of why I put off working on this. I think it would be nice to have this in our back pocket in case the schedule slips on your CanSkip stuff, but maybe it isn't worth landing. I think it is a little too scary to land in 11 at this point in the cycle, especially without any telemetry about CC scheduling already in place.
I'd also still like a scheduling backstop like this for the CC, even with your stuff, for cases where the CC goes berserk.
I'll try tweaking this and landing it, so we can maybe get it put on Aurora.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> One possibility for an additional pressure valve here would be to use the
> minimum time if the CC actually collected a bunch of stuff (like, say, more
> than 10,000 things).

This definitely sounds important to me. If a significant amount is actually being collected, then collection should arguably be done more frequently in order to reduce the maximum pause time and keep the browser responsive.
That is very much what 3.x did. Collection happened more often when the user was inactive or when plenty of garbage had been collected the previous time.
P2 because this could be helpful in pathological situations.
Whiteboard: [Snappy] → [Snappy:P2]
This alters how often we CC by adjusting how many forget skippables we do before we check the size of the purple buffer. The minimum, 15, keeps the current behavior, which is checking every 6 seconds. The maximum, 50, checks every 20 seconds. I use the somewhat arbitrary multiplier of 20 times the length of the previous CC to get when we'll check the CC the next time. If a CC took 300ms, we get the minimum of 6 seconds. If the CC took 500ms, it will be at least 10 seconds before we run the CC next. If the CC took 1 second or more, we will check again in 20 seconds. The throttling behavior is disabled if we collect at least 1000 objects. This is to try to avoid a death spiral where we delay the CC longer, which causes a longer CC, etc. If you look at CC telemetry for the last week, CCs over 633ms are extremely rare. Rare enough that you can't tell how big the bars are, but they each happen less than 0.107% of the time.
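The arithmetic in this comment can be checked with a short Python sketch. It assumes one forget skippable every 400ms (inferred from 15 of them taking 6 seconds) and assumes rounding up to a whole count. Both assumptions, and all names, are illustrative and not taken from the patch.

```python
# Assumed interval between forget skippables: 6 s / 15 = 400 ms.
FORGET_SKIPPABLE_INTERVAL_MS = 400
MIN_FORGET_SKIPPABLES = 15      # 15 * 400 ms = 6 seconds (current behavior)
MAX_FORGET_SKIPPABLES = 50      # 50 * 400 ms = 20 seconds
DURATION_MULTIPLIER = 20        # wait 20 x the length of the previous CC
PRODUCTIVE_CC_THRESHOLD = 1000  # throttling is disabled at or above this


def forget_skippables_before_check(last_cc_duration_ms, objects_collected):
    """Number of forget skippables to run before checking the purple buffer.

    last_cc_duration_ms is expected to be an integer number of milliseconds.
    """
    if objects_collected >= PRODUCTIVE_CC_THRESHOLD:
        return MIN_FORGET_SKIPPABLES
    target_wait_ms = last_cc_duration_ms * DURATION_MULTIPLIER
    # Integer ceiling division (rounding up is an assumption of this sketch).
    count = -(-target_wait_ms // FORGET_SKIPPABLE_INTERVAL_MS)
    return max(MIN_FORGET_SKIPPABLES, min(MAX_FORGET_SKIPPABLES, count))
```

Under these assumptions a 300ms CC gives 15 forget skippables (6 seconds), a 500ms CC gives 25 (10 seconds), and a CC of 1 second or more gives 50 (20 seconds), matching the figures quoted above.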
Attachment #590080 - Attachment is obsolete: true
Another factor to consider would be adjusting how often we do forget skippable. In theory we could end up spending a lot of our time on that. I think generally that forget skippable is less vulnerable to redoing pointless work over and over again than the CC. If we GC, then the next forget skippable will take longer, is the main problem.
Comment on attachment 600085: CC less often

Could we just increase the sCCTimer value, basically NS_CC_SKIPPABLE_DELAY, when the CC is slow? That would also increase the time between forgetSkippables.
This new patch increases the time between slices of CC work when the CC is slow. The minimum time between them is 400ms, which is the current setting. The maximum is 800ms, which is fairly conservative. That will mean we do a CC every 12 seconds instead of every 6 seconds. To determine the length of time between slices, we take the CC time and multiply it by 0.8. Any CC of 500ms or shorter will end up with the minimum delay. Multiply the slice delay by 15 to get the time between CCs:

500ms or less: 6 seconds
800ms: 9.6 seconds
1000ms or more: 12 seconds

If we collected at least 1000 objects during the CC, we use the minimum time, no matter how long the last CC took. This is not ideal, because there could be half a million things in the CC graph, and maybe in that case freeing 1000 objects isn't really that productive.
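As a sanity check on the numbers in this comment, the slice-delay rule can be sketched in Python. The names are hypothetical and do not come from the patch. The 0.8 multiplier is written as integer arithmetic (times 4, divided by 5) to keep the results exact.

```python
MIN_SLICE_DELAY_MS = 400        # current setting
MAX_SLICE_DELAY_MS = 800        # conservative upper bound
SLICES_PER_CC = 15              # 15 slices between consecutive CCs
PRODUCTIVE_CC_THRESHOLD = 1000  # use the minimum if this many objects freed


def slice_delay_ms(last_cc_duration_ms, objects_collected):
    """Delay between slices of CC work, in whole milliseconds."""
    if objects_collected >= PRODUCTIVE_CC_THRESHOLD:
        return MIN_SLICE_DELAY_MS
    scaled = last_cc_duration_ms * 4 // 5   # 0.8 x the last CC duration
    return max(MIN_SLICE_DELAY_MS, min(MAX_SLICE_DELAY_MS, scaled))


def time_between_ccs_ms(last_cc_duration_ms, objects_collected):
    """Total time between CCs implied by the slice delay."""
    return SLICES_PER_CC * slice_delay_ms(last_cc_duration_ms,
                                          objects_collected)
```

For CCs that free fewer than 1000 objects, this reproduces the figures listed above: 6 seconds for a 500ms CC, 9.6 seconds for an 800ms CC, and 12 seconds for anything of 1000ms or more.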
Attachment #600085 - Attachment is obsolete: true
This doesn't seem worth the time to implement.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX