Closed Bug 1357183 · Opened 7 years ago · Closed 7 years ago

Measure GC slice overhead

Categories

(Core :: JavaScript: GC, enhancement)

Priority: Not set
Severity: normal

Tracking

Status: RESOLVED INVALID
Performance Impact: high

People

(Reporter: djvj, Unassigned)

References

(Blocks 2 open bugs)

Details

We issue GC requests with time slices of 10ms and 40ms.  I've been told that these values were chosen because of the high fixed-cost overhead of GC slices, and that these overheads have come down recently.

We should measure these overheads and investigate how much we can lower the time-slice budgets.  Ideally we would halve both (to 5ms and 20ms).
Blocks: 1357187
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
There's no fixed-cost overhead.  Some parts of GC can happen incrementally and some can't.  If you're in an incremental part (e.g. normal marking), overhead is very low.  If you're in a non-incremental part, a slice can take any amount of time (e.g. it may depend on the size of some table that needs to be swept).

The thing we do need to ensure is that the GC keeps making progress.  Barriers that trigger between slices can cause additional marking to happen in the next slice, so if we make the slices too short then we never get anywhere.
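To make the barrier/marking interaction concrete, here is a minimal C++ sketch of a snapshot-style pre-write barrier feeding a mark stack. All names here (GCRuntime, MarkStack, preWriteBarrier) are illustrative assumptions, not SpiderMonkey's actual API:

    #include <vector>

    // Illustrative sketch only; these types and names are hypothetical,
    // not SpiderMonkey's real implementation.
    struct GCThing;

    struct MarkStack {
        std::vector<GCThing*> items;
        void push(GCThing* thing) { items.push_back(thing); }
        bool empty() const { return items.empty(); }
    };

    struct GCRuntime {
        bool incrementalMarkingActive = false;
        MarkStack markStack;

        // Called before a pointer field is overwritten. While incremental
        // marking is active, the old target must be treated as live, so it
        // is pushed onto the mark stack. Barriers that fire between slices
        // are exactly the extra work the next slice has to drain.
        void preWriteBarrier(GCThing* oldTarget) {
            if (incrementalMarkingActive && oldTarget)
                markStack.push(oldTarget);
        }
    };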
Closing as per previous comment.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
I don't understand how comment 1 contradicts the need for measuring overhead. I'd define "overhead" as the part of a slice that does not contribute towards forward progress. Would it be possible to measure how much of a GC slice isn't really making forward progress?
(In reply to Andrew McCreight [:mccr8] from comment #3)
OK, that's not what I was thinking of when I read 'overhead'.

The 'lack of forward progress' problem is about incremental marking and the fact that barriers triggered between slices can give us more marking work to do.  We are always making forward progress during the slice, it's just that the goal we're working towards can get farther away.

I guess what would be necessary is discriminating marking time based on whether the thing being marked was reached via a path that came from a barrier or from the root set.  I'm not sure that's feasible, though.
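One way to approximate that discrimination, sketched below with made-up names: tag each mark-stack entry with its origin (roots vs. barrier), let children inherit the tag, and tally marked cells per origin. This is a sketch of the idea under those assumptions, not a claim about how SpiderMonkey's marker is structured:

    #include <cstdint>
    #include <vector>

    // Hypothetical instrumentation sketch: attribute marking work to the
    // path that reached each cell (root set vs. barrier).
    enum class MarkOrigin : uint8_t { Roots, Barrier };

    struct GCThing {
        bool marked = false;
        std::vector<GCThing*> children;
    };

    struct TaggedEntry {
        GCThing* thing;
        MarkOrigin origin;  // inherited by everything reached via this entry
    };

    struct MarkStats {
        uint64_t markedFromRoots = 0;
        uint64_t markedFromBarriers = 0;
    };

    void drain(std::vector<TaggedEntry>& stack, MarkStats& stats) {
        while (!stack.empty()) {
            TaggedEntry entry = stack.back();
            stack.pop_back();
            if (entry.thing->marked)
                continue;
            entry.thing->marked = true;
            if (entry.origin == MarkOrigin::Roots)
                stats.markedFromRoots++;
            else
                stats.markedFromBarriers++;
            for (GCThing* child : entry.thing->children)
                stack.push_back({child, entry.origin});
        }
    }

Note the caveat: a cell reachable both ways gets attributed to whichever path happens to mark it first, which is part of why this may not be feasible to do precisely.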
Yeah, I think the discussion is clouded by the collective memory of a time when we *did* have substantial per-slice overhead.

If we wanted to figure out whether there's something we could do here, I think we'd have to measure how often we mark something due to a barrier that we would not have marked in the absence of the barrier. I would expect (1) most things marked by barriers are true forward progress because they're still reachable in other ways, (2) most of the remainder is overhead because it is unreachable by sweep time, and (3) the number of things that actually require the incremental barrier (i.e., they're being moved behind the current marking frontier) is very small.

For this to be worth bothering with, (2) needs to be a significant amount of marking work. And it might be, though the likelihood is not so high as to make me think this is a high-priority avenue of investigation.

If (2) does happen to be fairly common, and we assume that (3) holds, then both measuring and adjusting for this is relatively simple: keep two mark stacks, one for barriers and one for everything else. Do all incremental marking with the non-barrier stack (which gets seeded from the roots), only switching to the barrier stack when the non-barrier stack is empty. In theory, nothing from (2) should ever be pushed onto the barrier stack after that point, because if a barrier is invoked on it, then it must still be reachable from the roots.
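A rough sketch of that two-stack discipline (illustrative types and names, assuming roots and barriers are the only producers of marking work):

    #include <vector>

    // Sketch of the two-stack idea above; names are made up. Roots seed the
    // main stack; barriers push onto a separate stack that is drained only
    // once the main stack is empty.
    struct GCThing {
        bool marked = false;
        std::vector<GCThing*> children;
    };

    struct TwoStackMarker {
        std::vector<GCThing*> mainStack;     // seeded from the root set
        std::vector<GCThing*> barrierStack;  // fed by write barriers

        void pushRoot(GCThing* t)    { mainStack.push_back(t); }
        void pushBarrier(GCThing* t) { barrierStack.push_back(t); }

        // Mark one thing, preferring root-derived work. Children go back on
        // the stack their parent came from, so barrier-derived marking stays
        // attributed to the barrier stack.
        bool markOne() {
            std::vector<GCThing*>& stack =
                !mainStack.empty() ? mainStack : barrierStack;
            if (stack.empty())
                return false;  // marking is complete
            GCThing* t = stack.back();
            stack.pop_back();
            if (!t->marked) {
                t->marked = true;
                for (GCThing* child : t->children)
                    stack.push_back(child);
            }
            return true;
        }
    };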

Assuming I'm not messing up here, we could use the same mechanism to act on this distinction: the scheduling decision for when and how long to run the barrier slices ought to be different from the non-barrier slices. For example, you might do the first few as if they were non-barrier slices, but to handle the less common case where you *still* have work to do because there's an unexpected amount of "overhead", you might do longer slices where you accept that the pause is going to be slightly noticeable (or same-length but more frequent slices, possibly overriding the sorts of "please don't GC now" requests we've been talking about).
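For instance, a budget policy along those lines might look like the following sketch (every name and number here is invented for illustration):

    #include <cstdint>

    // Hypothetical slice-budget policy for barrier-stack slices: the first
    // few behave like normal slices, after which we accept longer pauses to
    // finish off lingering barrier work. All values are made up.
    int64_t barrierSliceBudgetMs(unsigned barrierSlicesSoFar) {
        constexpr int64_t normalBudgetMs = 10;  // the usual short-slice budget
        constexpr int64_t longBudgetMs = 40;    // slightly noticeable pause
        return barrierSlicesSoFar < 3 ? normalBudgetMs : longBudgetMs;
    }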

Anyway, I'm still not sure whether this has a high enough chance to gain enough to be worth looking into. It made a lot more sense when we were thinking there was some fixed, true overhead per slice.
Performance Impact: --- → P1
Whiteboard: [qf:p1]