[CC] Additional cycle collector timing telemetry

RESOLVED WORKSFORME

Status

()

RESOLVED WORKSFORME
7 years ago
6 years ago

People

(Reporter: mccr8, Assigned: mccr8)

Tracking

(Blocks: 1 bug)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Snappy:p2])

Attachments

(5 obsolete attachments)

(Assignee)

Description

7 years ago
I thought I'd filed a bug for this already, but it doesn't look like it.  There are more things I would like cycle collector telemetry for.

- How often does the CC force a GC?  This would be a boolean, and record at the start of each CC whether a GC was forced or not.  These are triggered by overflow while ungraying.

- Finer grained timing telemetry: how long does each phase of the CC take?  In local testing, building the graph dominates the CC time.  Occasionally freeing white nodes takes a measurable amount of time, too.  I could either time every phase (there are 4 or 5 total), or just those phases we expect to take a while.  I've also noticed in local testing that the total of the timing for the phases doesn't add up to the supposed total time of the CC, which is odd.

- Number of edge blocks.  We already compute this for about memory.  Number of node blocks isn't that interesting because we already know the total number of nodes.  This one is probably not that exciting.
(Assignee)

Updated

7 years ago
Duplicate of this bug: 699397
(Assignee)

Updated

7 years ago
Blocks: 698919
(Assignee)

Comment 2

7 years ago
Smaug, I was thinking of adding time for the graph building phase (MarkRoots) and the unlink phase (CollectWhite).  In my own profiling, these are the only CC phases that usually take any time at all.  Does that line up with what you wanted to measure?
(Assignee)

Comment 4

7 years ago
Created attachment 571828 [details] [diff] [review]
add telemetry for CC-force GCs

If UnmarkGray overflows, then mark bits are bogus, and the CC may force a GC immediately before it runs.  This will be awful for pause times, so we should see how often this happens in practice to decide whether we need to improve the situation.
Assignee: nobody → continuation
(Assignee)

Comment 5

7 years ago
Created attachment 571830 [details] [diff] [review]
clean up timers in cycle collector

In the CC, COLLECT_TIME_DEBUG uses PRTime, but the existing telemetry uses TimeStamp instead.  To simplify things a little, this patch mechanically replaces the remaining uses of PRTime with TimeStamp.  In addition, it removes a totally pointless NS_TIME_FUNCTION, puts a little more information into COLLECT_TIME_DEBUG logging, and removes two headers files that are no longer needed after the prior changes.
(Assignee)

Comment 6

7 years ago
Created attachment 571839 [details] [diff] [review]
part 3: add telemetry for CC graph building and freeing time

This adds the telemetry for timing how long building the CC graph and freeing things takes.  In my local experiments, these two phases constitute the bulk of CC time.  I reduced the bucket size here compared to the full CC timing.  If a CC is taking more than 2 seconds, it is fine to just let it count against 2 seconds, as that is already really awful.  I also reduced the number of buckets in half, because it had separate buckets for 3 and 4 ms, which seemed silly.
(Assignee)

Updated

7 years ago
Blocks: 697115
(Assignee)

Comment 7

7 years ago
I've read that TimeStamp only has 10ms resolution on Windows, so I'm not sure if it is the right thing to use for measuring the various phases of the CC.
(Assignee)

Comment 8

7 years ago
I spun off bug 706164 for the forced GC telemetry.
Summary: even more cycle collector telemetry → [CC] Additional cycle collector timing telemetry
Whiteboard: [Snappy]

Updated

7 years ago
Whiteboard: [Snappy] → [Snappy:p2]
(Assignee)

Updated

7 years ago
Attachment #571828 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #571830 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #571839 - Attachment is obsolete: true
(Assignee)

Comment 9

7 years ago
Created attachment 593173 [details] [diff] [review]
crude CC time logging

In the wake of the landing of Smaug's CC improvements, I thought I'd check out where we are spending out time.  There's currently COLLECT_TIME_DEBUG, but it isn't very thorough.  This new patch makes COLLECT_TIME_DEBUG cover everything that should take any time, and also splits CollectWhite into separate categories, so we can tell where the time is going there.  For now, this is implemented with hacky macros, but I should make it into a nice class, hook it into telemetry or something, etc., but this is interesting as is.
(Assignee)

Comment 10

7 years ago
This is what happens when you close that gigantic W3C HTML spec page:

cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 11ms
cc: MarkRoots() took 437ms
cc: ScanRoots() took 32ms
cc: 	CollectWhite::Root took 31ms
cc: 	CollectWhite::Unlink took 206ms
cc: 	CollectWhite::Unroot took 477ms
cc: CollectWhite() took 715ms
cc: ClearGraph() took 29ms
cc: total cycle collector time was 1241ms

This suggests that bug 652781 will help with pause times when we have a lot to free, but will not eliminate it entirely.

One thing that might help is to make the Traverse function for a DOM node could add every other node in the DOM that we think is alive.  That will avoid the virtual call overhead.  Which may or may matter much.  I'll do a shark run at some point.

Similarly, if we could somehow Unlink an entire DOM at once, using a DOM-side thing, maybe that would be faster.


Some other random large CCs I saw:

cc: PrepareForCollection() took 62ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: SelectPurple() took 59ms
cc: MarkRoots() took 7ms
cc: total cycle collector time was 159ms

cc: PrepareForCollection() took 29ms
cc: mRuntimes[*]->BeginCycleCollection() took 1ms
cc: MarkRoots() took 6ms
cc: ClearGraph() took 9ms
cc: total cycle collector time was 49ms

The only thing that PrepareForCollection does that should take any time is notify observers that the CC has begun.  I'm going to guess that most of the time in SelectPurple is calling CanSkipInCC.

It looks like ScanRoots (that actually colors the graph) is still not an issue.
(Assignee)

Comment 11

7 years ago
I should say that the above large CCs are uncommon.  Most of the time (in the usual 35ms-50ms), MarkRoots still dominates the time.

cc: PrepareForCollection() took 3ms
cc: mRuntimes[*]->BeginCycleCollection() took 4ms
cc: MarkRoots() took 44ms
cc: ScanRoots() took 2ms
cc: total cycle collector time was 56ms
Yeah, need to get the graph smaller.
One thing to try is to not add fully disconnected DOM trees to CC at all.
They can blast themselves into pieces during forgetSkippable.
(Assignee)

Comment 14

7 years ago
Created attachment 593563 [details] [diff] [review]
use a class instead of gross macros

Looks like I am reinventing NS_FUNCTION_TIMER here, so I should rework this patch to use that...
Attachment #593173 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Depends on: 726374
(Assignee)

Updated

7 years ago
Attachment #593563 - Attachment is obsolete: true
No longer blocks: 698919
Blocks: 698919
(Assignee)

Comment 15

6 years ago
I ended up landing TimeLog over in bug 726374.  CC forcing GC telemetry has been in for a while.  I think that's the important stuff here, so I'll just close this.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.