Closed Bug 697134 Opened 8 years ago Closed 8 years ago
[CC] Additional cycle collector timing telemetry
I thought I'd filed a bug for this already, but it doesn't look like it. There are more things I would like cycle collector telemetry for. - How often does the CC force a GC? This would be a boolean, and record at the start of each CC whether a GC was forced or not. These are triggered by overflow while ungraying. - Finer grained timing telemetry: how long does each phase of the CC take? In local testing, building the graph dominates the CC time. Occasionally freeing white nodes takes a measurable amount of time, too. I could either time every phase (there are 4 or 5 total), or just those phases we expect to take a while. I've also noticed in local testing that the total of the timing for the phases doesn't add up to the supposed total time of the CC, which is odd. - Number of edge blocks. We already compute this for about memory. Number of node blocks isn't that interesting because we already know the total number of nodes. This one is probably not that exciting.
Smaug, I was thinking of adding time for the graph building phase (MarkRoots) and the unlink phase (CollectWhite). In my own profiling, these are the only CC phases that usually take any time at all. Does that line up with what you wanted to measure?
If UnmarkGray overflows, then mark bits are bogus, and the CC may force a GC immediately before it runs. This will be awful for pause times, so we should see how often this happens in practice to decide whether we need to improve the situation.
Assignee: nobody → continuation
In the CC, COLLECT_TIME_DEBUG uses PRTime, but the existing telemetry uses TimeStamp instead. To simplify things a little, this patch mechanically replaces the remaining uses of PRTime with TimeStamp. In addition, it removes a totally pointless NS_TIME_FUNCTION, puts a little more information into COLLECT_TIME_DEBUG logging, and removes two headers files that are no longer needed after the prior changes.
This adds the telemetry for timing how long building the CC graph and freeing things takes. In my local experiments, these two phases constitute the bulk of CC time. I reduced the bucket size here compared to the full CC timing. If a CC is taking more than 2 seconds, it is fine to just let it count against 2 seconds, as that is already really awful. I also reduced the number of buckets in half, because it had separate buckets for 3 and 4 ms, which seemed silly.
I've read that TimeStamp only has 10ms resolution on Windows, so I'm not sure if it is the right thing to use for measuring the various phases of the CC.
I spun off bug 706164 for the forced GC telemetry.
Summary: even more cycle collector telemetry → [CC] Additional cycle collector timing telemetry
Attachment #571828 - Attachment is obsolete: true
Attachment #571830 - Attachment is obsolete: true
Attachment #571839 - Attachment is obsolete: true
In the wake of the landing of Smaug's CC improvements, I thought I'd check out where we are spending out time. There's currently COLLECT_TIME_DEBUG, but it isn't very thorough. This new patch makes COLLECT_TIME_DEBUG cover everything that should take any time, and also splits CollectWhite into separate categories, so we can tell where the time is going there. For now, this is implemented with hacky macros, but I should make it into a nice class, hook it into telemetry or something, etc., but this is interesting as is.
This is what happens when you close that gigantic W3C HTML spec page: cc: mRuntimes[*]->BeginCycleCollection() took 1ms cc: SelectPurple() took 11ms cc: MarkRoots() took 437ms cc: ScanRoots() took 32ms cc: CollectWhite::Root took 31ms cc: CollectWhite::Unlink took 206ms cc: CollectWhite::Unroot took 477ms cc: CollectWhite() took 715ms cc: ClearGraph() took 29ms cc: total cycle collector time was 1241ms This suggests that bug 652781 will help with pause times when we have a lot to free, but will not eliminate it entirely. One thing that might help is to make the Traverse function for a DOM node could add every other node in the DOM that we think is alive. That will avoid the virtual call overhead. Which may or may matter much. I'll do a shark run at some point. Similarly, if we could somehow Unlink an entire DOM at once, using a DOM-side thing, maybe that would be faster. Some other random large CCs I saw: cc: PrepareForCollection() took 62ms cc: mRuntimes[*]->BeginCycleCollection() took 1ms cc: SelectPurple() took 59ms cc: MarkRoots() took 7ms cc: total cycle collector time was 159ms cc: PrepareForCollection() took 29ms cc: mRuntimes[*]->BeginCycleCollection() took 1ms cc: MarkRoots() took 6ms cc: ClearGraph() took 9ms cc: total cycle collector time was 49ms The only thing that PrepareForCollection does that should take any time is notify observers that the CC has begun. I'm going to guess that most of the time in SelectPurple is calling CanSkipInCC. It looks like ScanRoots (that actually colors the graph) is still not an issue.
I should say that the above large CCs are uncommon. Most of the time (in the usual 35ms-50ms), MarkRoots still dominates the time. cc: PrepareForCollection() took 3ms cc: mRuntimes[*]->BeginCycleCollection() took 4ms cc: MarkRoots() took 44ms cc: ScanRoots() took 2ms cc: total cycle collector time was 56ms
Yeah, need to get the graph smaller.
One thing to try is to not add fully disconnected DOM trees to CC at all. They can blast themselves into pieces during forgetSkippable.
Looks like I am reinventing NS_FUNCTION_TIMER here, so I should rework this patch to use that...
Attachment #593173 - Attachment is obsolete: true
Attachment #593563 - Attachment is obsolete: true
I ended up landing TimeLog over in bug 726374. CC forcing GC telemetry has been in for a while. I think that's the important stuff here, so I'll just close this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.