Closed Bug 697134 Opened 8 years ago Closed 8 years ago

[CC] Additional cycle collector timing telemetry

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Snappy:p2])

Attachments

(5 obsolete files)

I thought I'd filed a bug for this already, but it doesn't look like it.  There are more things I would like cycle collector telemetry for.

- How often does the CC force a GC?  This would be a boolean, and record at the start of each CC whether a GC was forced or not.  These are triggered by overflow while ungraying.

- Finer grained timing telemetry: how long does each phase of the CC take?  In local testing, building the graph dominates the CC time.  Occasionally freeing white nodes takes a measurable amount of time, too.  I could either time every phase (there are 4 or 5 total), or just those phases we expect to take a while.  I've also noticed in local testing that the total of the timing for the phases doesn't add up to the supposed total time of the CC, which is odd.

- Number of edge blocks.  We already compute this for about memory.  Number of node blocks isn't that interesting because we already know the total number of nodes.  This one is probably not that exciting.
Duplicate of this bug: 699397
Blocks: 698919
Smaug, I was thinking of adding time for the graph building phase (MarkRoots) and the unlink phase (CollectWhite).  In my own profiling, these are the only CC phases that usually take any time at all.  Does that line up with what you wanted to measure?
yes.
Attached patch add telemetry for CC-force GCs (obsolete) — Splinter Review
If UnmarkGray overflows, then mark bits are bogus, and the CC may force a GC immediately before it runs.  This will be awful for pause times, so we should see how often this happens in practice to decide whether we need to improve the situation.
Assignee: nobody → continuation
In the CC, COLLECT_TIME_DEBUG uses PRTime, but the existing telemetry uses TimeStamp instead.  To simplify things a little, this patch mechanically replaces the remaining uses of PRTime with TimeStamp.  In addition, it removes a totally pointless NS_TIME_FUNCTION, puts a little more information into COLLECT_TIME_DEBUG logging, and removes two headers files that are no longer needed after the prior changes.
This adds the telemetry for timing how long building the CC graph and freeing things takes.  In my local experiments, these two phases constitute the bulk of CC time.  I reduced the bucket size here compared to the full CC timing.  If a CC is taking more than 2 seconds, it is fine to just let it count against 2 seconds, as that is already really awful.  I also reduced the number of buckets in half, because it had separate buckets for 3 and 4 ms, which seemed silly.
Blocks: 697115
I've read that TimeStamp only has 10ms resolution on Windows, so I'm not sure if it is the right thing to use for measuring the various phases of the CC.
I spun off bug 706164 for the forced GC telemetry.
Summary: even more cycle collector telemetry → [CC] Additional cycle collector timing telemetry
Whiteboard: [Snappy]
Whiteboard: [Snappy] → [Snappy:p2]
Attachment #571828 - Attachment is obsolete: true
Attachment #571830 - Attachment is obsolete: true
Attachment #571839 - Attachment is obsolete: true
Attached patch crude CC time logging (obsolete) — Splinter Review
In the wake of the landing of Smaug's CC improvements, I thought I'd check out where we are spending out time.  There's currently COLLECT_TIME_DEBUG, but it isn't very thorough.  This new patch makes COLLECT_TIME_DEBUG cover everything that should take any time, and also splits CollectWhite into separate categories, so we can tell where the time is going there.  For now, this is implemented with hacky macros, but I should make it into a nice class, hook it into telemetry or something, etc., but this is interesting as is.
This is what happens when you close that gigantic W3C HTML spec page:

cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 11ms
cc: MarkRoots() took 437ms
cc: ScanRoots() took 32ms
cc: 	CollectWhite::Root took 31ms
cc: 	CollectWhite::Unlink took 206ms
cc: 	CollectWhite::Unroot took 477ms
cc: CollectWhite() took 715ms
cc: ClearGraph() took 29ms
cc: total cycle collector time was 1241ms

This suggests that bug 652781 will help with pause times when we have a lot to free, but will not eliminate it entirely.

One thing that might help is to make the Traverse function for a DOM node could add every other node in the DOM that we think is alive.  That will avoid the virtual call overhead.  Which may or may matter much.  I'll do a shark run at some point.

Similarly, if we could somehow Unlink an entire DOM at once, using a DOM-side thing, maybe that would be faster.


Some other random large CCs I saw:

cc: PrepareForCollection() took 62ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 59ms
cc: MarkRoots() took 7ms
cc: total cycle collector time was 159ms

cc: PrepareForCollection() took 29ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: MarkRoots() took 6ms
cc: ClearGraph() took 9ms
cc: total cycle collector time was 49ms

The only thing that PrepareForCollection does that should take any time is notify observers that the CC has begun.  I'm going to guess that most of the time in SelectPurple is calling CanSkipInCC.

It looks like ScanRoots (that actually colors the graph) is still not an issue.
I should say that the above large CCs are uncommon.  Most of the time (in the usual 35ms-50ms), MarkRoots still dominates the time.

cc: PrepareForCollection() took 3ms
cc: mRuntimes[*]->BeginCycleCollection() took 4ms
cc: MarkRoots() took 44ms
cc: ScanRoots() took 2ms
cc: total cycle collector time was 56ms
Yeah, need to get the graph smaller.
One thing to try is to not add fully disconnected DOM trees to CC at all.
They can blast themselves into pieces during forgetSkippable.
Looks like I am reinventing NS_FUNCTION_TIMER here, so I should rework this patch to use that...
Attachment #593173 - Attachment is obsolete: true
Depends on: 726374
Attachment #593563 - Attachment is obsolete: true
No longer blocks: 698919
Blocks: 698919
I ended up landing TimeLog over in bug 726374.  CC forcing GC telemetry has been in for a while.  I think that's the important stuff here, so I'll just close this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.