Open Bug 1702032 Opened 5 years ago Updated 1 year ago

Add GC telemetry restricted to times when an input event is pending

Categories

(Core :: JavaScript: GC, enhancement, P3)

enhancement

Tracking

()

ASSIGNED

People

(Reporter: sfink, Assigned: sfink, NeedInfo)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

These are the GCs that are most noticeable to the user.

At the very least, it would be good to have a GCReason recorded. Slice times could be good too.

Severity: -- → N/A
Priority: -- → P3

My latest proposal is to add a couple of telemetry probes that will be issued from Gecko (since it can access the state of the system.) On would be something like GECKO_GC_REASON, which would be exactly the same as GC_REASON_2 except it would be keyed by an enum something like { "input event pending", "animating", "other" }. Summing all of those together would produce the GC_REASON_2 histogram, but now we could answer questions like "what are the reasons for GCing when an input event is waiting to be processed, which means there's a good chance the user will be waiting for the effect?"

I'm hesitant to remove GC_REASON_2, for one because the GLAM UI currently doesn't give you the option to sum the data, but also because it's emitted internally from the GC engine and so feels more reliable than depending on the embedding. For now, it would also be useful as a cross-check -- I could easily imagine that GECKO_GC_REASON might miss Worker GCs or something.

There's still an argument for restricting this to slices that overflow their budgets or something. I guess I could fork the enum and have { "overflowed, input event pending", "input event pending", "overflowed, animating", ...} but I don't know if it's worth it.

I'd then want similar GECKO_GC_ probes for a few other metrics (slice time, bytes collected, ...?).

After talking to :chutten on #data-help, I've fallen back to the approach of adding simple probe types so that they can be mirrored to the GLEAN stuff that doesn't support keyed enumerations (yet?).

The naming is very questionable, but right now I'm thinking GC_REASON_INPUT_PENDING and GC_REASON_ANIMATING for the GCReason ones. Though typing them out, they look very sketchy -- it totally sounds like "GCReason why input is pending" and "GCReason why we're animating". So perhaps I'll make it more verbose and do GC_REASON_WHILE_INPUT_PENDING and GC_REASON_WHILE_ANIMATING.

The second majorly questionable thing is that I'm relying on js::NotifyAnimationActivity to tell whether we're animating right now. Which is weird, because the information is going from Gecko -> JS -> back to Gecko for this probe. (Also, it's arguable what definition of "currently animating" is best; js::gc::IsCurrentlyAnimating is checking whether we've animated within the last second; I'm checking against a duration that at the moment I'm setting to 50ms.)

I kind of wanted to land this with "are we animating?" as well as "is an input event pending?" cases. But I'm not sure what a good way to ask "are we animating?"

My first thought was to look at the GC's current GC_ANIMATION_MS probe, which has a histogram of time spent GCing during animations. It has hooks in several places to set a lastAnimationTime timestamp: when doing an rAF, when invalidating a canvas, when seeking to the next frame of a video, and an XRSession refresh. Those sound great. They are used by saying that if the last animation time is within the last second, then we're animating.

I was kind of hoping to get more precise than that, especially with the GC_REASON_WHILE_ANIMATING probe because it looks at GC slices. After we animate, any GC within the next second will get scooped up in the "animating" bucket, which adds some amount of noise to the signal. I naively thought "hey, I'll just drop it to 50ms!" but then realized: if GC janks the main thread for 100ms in the middle of an animation, this will be considered "not animating!" because the last animation is more than 50ms ago.

I guess I could do the animation test according to the beginning of the GC or GC slice (GC for max pause, GC slice for GCReason). Though if we start animating halfway through a 20-slice major GC, I'd kind of like to treat that as animating.

Options:

  • just use 1 second
  • determine animation at the start of the GC or GC slice
  • test whether the most recent animation is from anytime since 50ms before the start of the GC/slice
  • needinfo jonco and Bas to see if they have a better idea of how to define "are we animating?"

(as well as give jonco a pre-review opportunity to tell me that everything should share a single notion of "are we animating?", though that might be hard to maintain for both slice-based and GC-based probes.)

Flags: needinfo?(jcoppeard)
Flags: needinfo?(bas)
Assignee: nobody → sphink
Status: NEW → ASSIGNED

(In reply to Steve Fink [:sfink] [:s:] from comment #3)

I kind of wanted to land this with "are we animating?" as well as "is an input event pending?" cases. But I'm not sure what a good way to ask "are we animating?"

My first thought was to look at the GC's current GC_ANIMATION_MS probe, which has a histogram of time spent GCing during animations. It has hooks in several places to set a lastAnimationTime timestamp: when doing an rAF, when invalidating a canvas, when seeking to the next frame of a video, and an XRSession refresh. Those sound great. They are used by saying that if the last animation time is within the last second, then we're animating.

I was kind of hoping to get more precise than that, especially with the GC_REASON_WHILE_ANIMATING probe because it looks at GC slices. After we animate, any GC within the next second will get scooped up in the "animating" bucket, which adds some amount of noise to the signal. I naively thought "hey, I'll just drop it to 50ms!" but then realized: if GC janks the main thread for 100ms in the middle of an animation, this will be considered "not animating!" because the last animation is more than 50ms ago.

I guess I could do the animation test according to the beginning of the GC or GC slice (GC for max pause, GC slice for GCReason). Though if we start animating halfway through a 20-slice major GC, I'd kind of like to treat that as animating.

Options:

  • just use 1 second
  • determine animation at the start of the GC or GC slice
  • test whether the most recent animation is from anytime since 50ms before the start of the GC/slice
  • needinfo jonco and Bas to see if they have a better idea of how to define "are we animating?"

(as well as give jonco a pre-review opportunity to tell me that everything should share a single notion of "are we animating?", though that might be hard to maintain for both slice-based and GC-based probes.)

Intuitively I would say the refresh driver is the most authoritative source of information we have on whether we are animating. Markus and Olli will have better answers than I do though.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(bugs)
Flags: needinfo?(bas)

Bug 1697585 will make this a bit harder to implement, but even right now it is fine to run a short slice after a tick, since input events should get
handled later during the frame (closer to the next tick). So having pending input tasks may not mean too much.
Or perhaps once Bug 1697585 lands, checking whether ShouldUseHighestPriority (need to be added to the InputTaskManager) returns true could be a way to check if GC is overlapping input handling.

For the "animating" case, I wonder if we could just check if there is pending vsync when GC slice ends. It doesn't necessarily mean anything is animating, but in practice should mean there is something high priority thing pending. (but probe shouldn't be called animating if we use just refreshdriver or vsync)

(This all of course overlaps *_DURING_IDLE probes quite a bit, but I guess having new probes for the cases when there isn't any idle time is good.)

Flags: needinfo?(bugs)
Flags: needinfo?(mstange.moz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: