Closed
Bug 697134
Opened 13 years ago
Closed 13 years ago
[CC] Additional cycle collector timing telemetry
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: mccr8, Assigned: mccr8)
References
Details
(Whiteboard: [Snappy:p2])
Attachments
(5 obsolete files)
I thought I'd filed a bug for this already, but it doesn't look like it. There are more things I would like cycle collector telemetry for.
- How often does the CC force a GC? This would be a boolean, and record at the start of each CC whether a GC was forced or not. These are triggered by overflow while ungraying.
- Finer grained timing telemetry: how long does each phase of the CC take? In local testing, building the graph dominates the CC time. Occasionally freeing white nodes takes a measurable amount of time, too. I could either time every phase (there are 4 or 5 total), or just those phases we expect to take a while. I've also noticed in local testing that the total of the timing for the phases doesn't add up to the supposed total time of the CC, which is odd.
- Number of edge blocks. We already compute this for about memory. Number of node blocks isn't that interesting because we already know the total number of nodes. This one is probably not that exciting.
Assignee | ||
Comment 2•13 years ago
|
||
Smaug, I was thinking of adding time for the graph building phase (MarkRoots) and the unlink phase (CollectWhite). In my own profiling, these are the only CC phases that usually take any time at all. Does that line up with what you wanted to measure?
Comment 3•13 years ago
|
||
yes.
Assignee | ||
Comment 4•13 years ago
|
||
If UnmarkGray overflows, then mark bits are bogus, and the CC may force a GC immediately before it runs. This will be awful for pause times, so we should see how often this happens in practice to decide whether we need to improve the situation.
Assignee: nobody → continuation
Assignee | ||
Comment 5•13 years ago
|
||
In the CC, COLLECT_TIME_DEBUG uses PRTime, but the existing telemetry uses TimeStamp instead. To simplify things a little, this patch mechanically replaces the remaining uses of PRTime with TimeStamp. In addition, it removes a totally pointless NS_TIME_FUNCTION, puts a little more information into COLLECT_TIME_DEBUG logging, and removes two headers files that are no longer needed after the prior changes.
Assignee | ||
Comment 6•13 years ago
|
||
This adds the telemetry for timing how long building the CC graph and freeing things takes. In my local experiments, these two phases constitute the bulk of CC time. I reduced the bucket size here compared to the full CC timing. If a CC is taking more than 2 seconds, it is fine to just let it count against 2 seconds, as that is already really awful. I also reduced the number of buckets in half, because it had separate buckets for 3 and 4 ms, which seemed silly.
Assignee | ||
Comment 7•13 years ago
|
||
I've read that TimeStamp only has 10ms resolution on Windows, so I'm not sure if it is the right thing to use for measuring the various phases of the CC.
Assignee | ||
Comment 8•13 years ago
|
||
I spun off bug 706164 for the forced GC telemetry.
Summary: even more cycle collector telemetry → [CC] Additional cycle collector timing telemetry
Updated•13 years ago
|
Whiteboard: [Snappy]
Updated•13 years ago
|
Whiteboard: [Snappy] → [Snappy:p2]
Assignee | ||
Updated•13 years ago
|
Attachment #571828 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #571830 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #571839 -
Attachment is obsolete: true
Assignee | ||
Comment 9•13 years ago
|
||
In the wake of the landing of Smaug's CC improvements, I thought I'd check out where we are spending out time. There's currently COLLECT_TIME_DEBUG, but it isn't very thorough. This new patch makes COLLECT_TIME_DEBUG cover everything that should take any time, and also splits CollectWhite into separate categories, so we can tell where the time is going there. For now, this is implemented with hacky macros, but I should make it into a nice class, hook it into telemetry or something, etc., but this is interesting as is.
Assignee | ||
Comment 10•13 years ago
|
||
This is what happens when you close that gigantic W3C HTML spec page:
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 11ms
cc: MarkRoots() took 437ms
cc: ScanRoots() took 32ms
cc: CollectWhite::Root took 31ms
cc: CollectWhite::Unlink took 206ms
cc: CollectWhite::Unroot took 477ms
cc: CollectWhite() took 715ms
cc: ClearGraph() took 29ms
cc: total cycle collector time was 1241ms
This suggests that bug 652781 will help with pause times when we have a lot to free, but will not eliminate it entirely.
One thing that might help is to make the Traverse function for a DOM node could add every other node in the DOM that we think is alive. That will avoid the virtual call overhead. Which may or may matter much. I'll do a shark run at some point.
Similarly, if we could somehow Unlink an entire DOM at once, using a DOM-side thing, maybe that would be faster.
Some other random large CCs I saw:
cc: PrepareForCollection() took 62ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 59ms
cc: MarkRoots() took 7ms
cc: total cycle collector time was 159ms
cc: PrepareForCollection() took 29ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: MarkRoots() took 6ms
cc: ClearGraph() took 9ms
cc: total cycle collector time was 49ms
The only thing that PrepareForCollection does that should take any time is notify observers that the CC has begun. I'm going to guess that most of the time in SelectPurple is calling CanSkipInCC.
It looks like ScanRoots (that actually colors the graph) is still not an issue.
Assignee | ||
Comment 11•13 years ago
|
||
I should say that the above large CCs are uncommon. Most of the time (in the usual 35ms-50ms), MarkRoots still dominates the time.
cc: PrepareForCollection() took 3ms
cc: mRuntimes[*]->BeginCycleCollection() took 4ms
cc: MarkRoots() took 44ms
cc: ScanRoots() took 2ms
cc: total cycle collector time was 56ms
Comment 12•13 years ago
|
||
Yeah, need to get the graph smaller.
Comment 13•13 years ago
|
||
One thing to try is to not add fully disconnected DOM trees to CC at all.
They can blast themselves into pieces during forgetSkippable.
Assignee | ||
Comment 14•13 years ago
|
||
Looks like I am reinventing NS_FUNCTION_TIMER here, so I should rework this patch to use that...
Attachment #593173 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #593563 -
Attachment is obsolete: true
Assignee | ||
Comment 15•13 years ago
|
||
I ended up landing TimeLog over in bug 726374. CC forcing GC telemetry has been in for a while. I think that's the important stuff here, so I'll just close this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•