Closed Bug 821371 Opened 12 years ago Closed 12 years ago

Add telemetry for prep work done for cycle collection

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

(Whiteboard: [Snappy])

Attachments

(2 files, 1 obsolete file)

Olli thinks this is causing long CC pauses with small graphs when we CC right after a GC, which is certainly believable. He's fixing it in another bug by trying harder to not run a CC right after a GC, but we should display in the error console when this happens, and add telemetry for it.
I also noticed that the CYCLE_COLLECTOR time does not include any synchronous work we do in nsJSEnvironment, such as finishing a GC or synchronous forget skippables. We should probably start recording that, too.
We should also include the time spent finishing off the current GC.

If either of these things are still a problem after Olli's patch, we can try complicating things more to split out work into a separate slice.
Whiteboard: [Snappy]
I have a patch to record various related data. There are two things other than the actual CC that can occur during a CC pause:
1) Finishing the current incremental GC, if the CC has been locked out for too long.
2) Running ForgetSkippable

I'm wondering what would make sense to record for telemetry. I could record a simple boolean for (1) and (2), if we had to do that kind of work at all, or the time spent in them. I'm planning on adding telemetry for the full CC pause time, so that plus the boolean may hint at what is causing problems. I could also just record the time spent in (1) or (2). For this, I'm inclined to only record the time for FS, as finishing the current IGC isn't a huge problem at the moment. Maybe I could record a boolean for (1) and the time for (2)? Or maybe adding 50 vs 2 buckets isn't a big deal and I should just always record the time?

Any thoughts, Olli?

(Side note: I have a patch to change how (1) is done so that if we are locked out and we CCNow, then we'll always finish the IGC inside of CCNow, so it will be recorded.)
Summary: Record number of synchronous forgetSkippables done before a CC → Add telemetry for prep work done for cycle collection
This is a little ugly, but I think necessary to detect possible regressions that make IGC lock out CC too much.
This adds three new measurements:
- time spent finishing off IGCs when the CC was locked out for too long
- time spent doing forget skippable synchornously with the CC
- the entire time spent doing work in CycleCollectNow

For these three measurements, we make sure they are in telemetry, the error console, and the JSON report.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> I'm wondering what would make sense to record for telemetry. I could record
> a simple boolean for (1) and (2), if we had to do that kind of work at all,
Boolean sounds good

> or the time spent in them. I'm planning on adding telemetry for the full CC
> pause time, so that plus the boolean may hint at what is causing problems.
Sound good.

> I
> could also just record the time spent in (1) or (2). For this, 
Hmm, perhaps not too useful.
So, you think I should do just the boolean for the two prep phases, plus the full pause time? That sounds okay to me.
yeah, I think that is enough.
My attempt to pretty up the formatting strings in the source by splitting them into multiple lines appears to not work on MSVC, so I will have to fix that.
Attachment #695816 - Attachment is obsolete: true
Attachment #695815 - Flags: review?(bugs)
Comment on attachment 695815 [details] [diff] [review]
part 1: don't finish current GC when CCTimerFired calls into CycleCollectNow

This is indeed a bit ugly. I thought first FinishAnyIncrementalGC
was there for some optimization, but it is there to not change the current
behavior. Other option would be to use some global variable which is
set in CCTimerFired and also in CycleCollectNow if not already set, and then
used in CycleCollectNow after CC. Not sure that would be any less-ugly.


But, to make this less error prone, could you call FinishAnyIncrementalGC()
in FireForgetSkippable() (and in CycleCollectNow).
That would change the behavior if there are just a few new suspected objects, 
but I believe the change would be ok (gives more time for iGC).
Attachment #695815 - Flags: review?(bugs) → review+
Attachment #696365 - Flags: review?(bugs) → review+
Followup to remove the !sCCLockedOut assertions, because with FinishAnyIncrementalGC moved to FireForgetSkippable() we don't always finish the current IGC by the time we finish the timer, in the case where we decide we don't need to run the CC anyways.  Oops, my mistake.

https://hg.mozilla.org/integration/mozilla-inbound/rev/1e90b2848512
There are now crashes on windows.
Should %llums be %lums now? ccNowDuration is 32bit
Depends on: 826003
Depends on: 826153
Yikes, thanks for fixing that.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: