Closed Bug 1338347 Opened 7 years ago Closed 7 years ago

Telemetry measures for Compositor Frame Throughput (Scrolling/Animation)

Categories

(Core :: Graphics, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: Harald, Assigned: kats)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [gfx-noted])

Attachments

(6 files)

This issue tracks work to implement a simplified version of Frame Throughput (bug 1303313). To reduce scope this measure only covers compositor based animation/scrolling. Combined with checkerboarding telemetry (bug 1238040) this gives a more complete picture of perceived scrolling performance.

For reference, the proposal is somewhat based on the Frame Throughput doc [1] but also a paper on quality metrics for frame freezing in videos [2].

# Compositor Animation/Scrolling Only

In order to measure animation frame rate we need to know when the compositor is animating. Including all frames would include long frames caused by page load, layout, gc, making it impossible to tell how smoothly we animate.

# Aggregation

Separate quality metrics would be aggregated for the duration of each scroll/animation occurrence:

Frame Ratio:
  Actual Number of Frames / Expected Number of Frames
  0-1 where 1 equals no dropped frames (0.5 half of frames dropped, etc)

Long Frame:
  99th percentile frame length (in ms)

Each metric would be collected in its own histogram.

[1]: https://docs.google.com/document/d/1Bot91txCWZUstt32_BBLo-_HUsvOVn_L5yaI5xre2M0/edit#
[2]: https://pdfs.semanticscholar.org/80bc/e8ebe174f82f7fc298dcc9c66c54a53b6153.pdf
:kats, I am trying to slice up Frame Throughput into more actionable work. I am not sure if all my assumptions on the rendering pipeline were correct, so could you please review this proposal?
Flags: needinfo?(bugmail)
Thanks, see comments below.

(In reply to :Harald Kirschner :digitarald from comment #0)
> This issue tracks work to implement a simplified version of Frame Throughput
> (bug 1303313). To reduce scope this measure only covers compositor based
> animation/scrolling.

Just to be clear, my understanding of "compositor based animation/scrolling" is "frames that are composited where there is a visual difference from the previous frame and the compositor did not interact with any other threads". In particular, it means that we are purely measuring composite time (not including any time that the compositor spends processing incoming transactions from the main thread). It also means that if a composite happens to take more than 16ms (or whatever the vsync interval is) then the throughput will drop, because the time between one frame and the next visually-different frame will increase to some multiple of vsync intervals.

The problem with this metric is that in production we can't turn off the "processing of incoming transactions" in the compositor. So we can either redefine the metric to include that time (which is more representative of real-world behaviour) or do the measurements in some test harness.

> Combined with checkerboarding telemetry (bug 1238040)
> this gives a more complete picture of perceived scrolling performance.

Agreed.

> # Compositor Animation/Scrolling Only
> 
> In order to measure animation frame rate we need to know when the compositor
> is animating. Including all frames would include long frames caused by page
> load, layout, gc, making it impossible to tell how smoothly we animate.

So in theory "page load, layout, gc" should all be happening on a different thread than the composition, so they shouldn't affect composite time as I defined it above. If we include the "processing of incoming transactions", then the compositor measurement could be perturbed by how much data the main thread sends over, but it should still be unaffected by things like layout time and gc time. Does that make sense?

> # Aggregation
> 
> Separate quality metrics would be aggregated for the duration of each
> scroll/animation occurrence:
> 
> Frame Ratio:
>   Actual Number of Frames / Expected Number of Frames
>   0-1 where 1 equals no dropped frames (0.5 half of frames dropped, etc)

This seems ok. AIUI the "expected number of frames" will always be (frame rate of display, usually 60) * (wall-clock duration of measurement period, in seconds). The "actual number of frames" is the thing that will be recorded by instrumentation code.

> Long Frame:
>   99th percentile frame length (in ms)

This also seems ok, pending the decision of whether or not we are including the incoming transaction processing time or not.

> Each metric would be collected in its own histogram.

Yup.

The thing that seems missing here (well, it's implied but not explicitly stated) is what exactly is counted as a "compositor scrolling event" or a "compositor animation event". That is, how do we pick the wall-clock duration of the measurement period that I referred to above? It seems to me that a scrolling event would start from the input that triggers the scrolling (could be a touch event, a wheel event, a layer tree update from main-thread that includes a JS-driven smooth scroll request, etc.) to the point where all scrolling triggers have ceased to arrive and all pending scroll animations in AsyncPanZoomController instances have run to completion. I'm less familiar with the OMTA code but we could probably come up with a suitably precise interval for a "compositor animation event".

Thoughts?
Flags: needinfo?(bugmail)
Priority: -- → P3
Whiteboard: [gfx-noted]
> Just to be clear, my understanding of "compositor based animation/scrolling" is "frames that are composited where there is a visual difference from the previous frame and the compositor did not interact with any other threads".

My hope was that just focusing on animations/scrolling that are handled on the compositor would simplify things. The user-centric definition would be "Measure dropped frames when 60fps matters (and when the throughput is mostly defined by the compositor)".

> So we can either redefine the metric to include that time (which is more representative of real-world behaviour) or do the measurements in some test harness.

We should include whatever time needed to make this measure the reality of dropped frames. Best case would be if we could know when a vsync was missed. Are there other components that have more information about FPS and state of scrolling/animation?

> This seems ok. AIUI the "expected number of frames" will always be (frame rate of display, usually 60) * (wall-clock duration of measurement period, in seconds). The "actual number of frames" is the thing that will be recorded by instrumentation code.

Yes, this seems the easiest part.

> I'm less familiar with the OMTA code but we could probably come up with a suitably precise interval for a "compositor animation event".

Might it make sense to keep this issue focused on measuring scrolling only to create a separate issue for animation?
(In reply to :Harald Kirschner :digitarald from comment #3)
> My hope was that just focusing on animations/scrolling that are handled on
> the compositor would simplify things. The user-centric definition would be
> "Measure dropped frames when 60fps matters (and when the throughput is
> mostly defined by the compositor)".

Ok. With the metrics you outlined ("expected number of frames" and "actual number of frames") the frame drops is simply the difference between the two.

> We should include whatever time needed to make this measure the reality of
> dropped frames. Best case would be if we could know when a vsync was missed.

So then yeah, we should include the transaction processing time. I think the missed vsync (aka frame drop) falls out of the expected/actual measurements as mentioned above, so we don't need to explicitly write code to detect if a vsync is missed.

> Are there other components that have more information about FPS and state of
> scrolling/animation?

Not really, the compositor is the best place to do this.

> Might it make sense to keep this issue focused on measuring scrolling only
> to create a separate issue for animation?

I think it's fine to keep them together, at least for now. Most of the code will be shared.
Kats may hand this off to somebody else, but keeping it assigned as somebody is working on it.
Assignee: nobody → bugmail
What are the next steps here? Seems like we have a good start with animation duration.
Flags: needinfo?(milan)
Milan's going on PTO for a bit. But the next steps:
1) I will verify how the animation duration probe behaves with respect to infinite CSS animations. Milan brought that up when we were talking - I hadn't considered that case and I'm not sure what the probe will report
2) We need to figure out if we want to bucket the frame throughput ratio by animation duration, and if so, what the buckets should be (by this I mean have a series of probes, e.g. FRAME_RATIO_FOR_ANIMATIONS_UNDER_100MS, FRAME_RATIO_FOR_ANIMATIONS_BETWEEN_100MS_AND_300MS, and so on)
3) Add the probes
Flags: needinfo?(milan)
:kats, we talked about this before but I wanted to make sure that this probe can report the FT metrics separately for content-scroll/content-animation/main-animation?

> 1) I will verify how the animation duration probe behaves with respect to infinite CSS animations. Milan brought that up when we were talking - I hadn't considered that case and I'm not sure what the probe will report

We could cut animations off at a threshold (10sec?). It would be also OK to just not track those cases if we have a better understanding on how frequent they are.

> 2) We need to figure out if we want to bucket the frame throughput ratio by animation duration, and if so, what the buckets should be (by this I mean have a series of probes, e.g. FRAME_RATIO_FOR_ANIMATIONS_UNDER_100MS, FRAME_RATIO_FOR_ANIMATIONS_BETWEEN_100MS_AND_300MS, and so on)

Duration is probably less interesting for now. Maybe later we look at diagnostic metrics that we track when an animation goes off a cliff and track duration with it.
(In reply to :Harald Kirschner :digitarald from comment #8)
> :kats, we talked about this before but I wanted to make sure that this probe
> can report the FT metrics separately for
> content-scroll/content-animation/main-animation?

It should be able to, yes.

> > 1) I will verify how the animation duration probe behaves with respect to infinite CSS animations. Milan brought that up when we were talking - I hadn't considered that case and I'm not sure what the probe will report
> 
> We could cut animations off at a threshold (10sec?). It would be also OK to
> just not track those cases if we have a better understanding on how frequent
> they are.

From the telemetry data they don't seem too frequent. I checked what the code does and it only submits the probe data once the animation is stopped in the compositor (so for example switching tabs would do it). I agree that for now we probably don't need to worry about these.

> > 2) We need to figure out if we want to bucket the frame throughput ratio by animation duration, and if so, what the buckets should be (by this I mean have a series of probes, e.g. FRAME_RATIO_FOR_ANIMATIONS_UNDER_100MS, FRAME_RATIO_FOR_ANIMATIONS_BETWEEN_100MS_AND_300MS, and so on)
> 
> Duration is probably less interesting for now. Maybe later we look at
> diagnostic metrics that we track when an animation goes off a cliff and
> track duration with it.

That sounds reasonable. I'll get started on these probes.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> (In reply to :Harald Kirschner :digitarald from comment #8)
> > :kats, we talked about this before but I wanted to make sure that this probe
> > can report the FT metrics separately for
> > content-scroll/content-animation/main-animation?
> 
> It should be able to, yes.

Sounds like this bug expands to the scope of bug 1303313, or should we split up the work?
Whiteboard: [gfx-noted] → [gfx-noted][qf:p1]
To be honest I don't understand the full scope of bug 1303313 clearly. The scope of this bug is more clear to me, so I'll write the patches for this bug and then we can take a look at bug 1303313 again and see if there's anything missing.

Current WIPs are at https://github.com/staktrace/gecko-dev/commits/telemetry - I still need to add the "long frame" thing and the throughput probe for scrolling.
(In reply to :Harald Kirschner :digitarald from comment #0)
> Long Frame:
>   99th percentile frame length (in ms)

Actually now that I'm implementing this, it's unclear to me exactly what this means. Do you want me to record:
- the frame length of all frames in a compositor animation, and then compute the 99th percentile? we throw away data after each compositor animation so this would be on a per-animation basis.
- submit all the frame lengths of frames in compositor animations and just use that data to figure out the 99th percentile? this has potential performance penalties because we'd be calling into the telemetry code on every composite which is probably not the best idea

Assuming the intent is to get a metric for the "worst case" behaviour, would you be ok if I just reported the longest frame in each compositor animation to telemetry? Then we can look at the data and see what the 99th percentile (worst of the worst) is.
Flags: needinfo?(hkirschner)
Also: should I skip sending anything if all the frames are within budget? What about when the vsync interval is something other than 16ms? Reports of 18ms frames when the vsync interval is 20ms should not be considered bad, even though it would be bad if the vsync interval was 16ms. So we should take that into account somehow.
> 99th percentile frame length (in ms)
> What about when the vsync interval is something other than 16ms?

ms doesn't seem to make sense here as the other measure is ratio of dropped frames which seems much better for adapting to different vsync values.

Assume we have a histogram of dropped frame counts; we could either report the longest bucket or try to do some more math and report the 95th percentile. Risk is that the longest one could be a outlier; 95th might summarize the skew of the histogram better. Could we try to report both to understand how they relate to each other?

> submit all the frame lengths of frames in compositor animations and just use that data to figure out the 99th percentile?

In the summary I tried to make it vsync independent and just look at the ratio at dropped frames "Actual Number of Frames / Expected Number of Frames" . If that doesn't work or adds overhead, would it make sense median for the histogram of frame lengths, divided by vsync to make it vsync-independent?

> should I skip sending anything if all the frames are within budget?

We need to record both the good and the bad to get the whole picture.
Flags: needinfo?(hkirschner)
(In reply to :Harald Kirschner :digitarald from comment #14)
> Assume we have a histogram of dropped frame counts;

Just to be clear - do you mean dropped frames here, or long frames? I'm assuming you mean long frames (i.e. frames that took longer than one vsync to composite) because dropped frames by definition don't have a length as they were dropped. If you do mean dropped frames, please clarify what's on the x-axis of this histogram (it sounds like the y-axis is a count of frames).

> we could either report
> the longest bucket or try to do some more math and report the 95th
> percentile. Risk is that the longest one could be a outlier; 95th might
> summarize the skew of the histogram better. Could we try to report both to
> understand how they relate to each other?

Assuming you meant "long frames" above, is each data point in the histogram an animation? i.e. for each animation, we look at the time to composite of each frame, and report the 95th percentile and/or largest bucket. In general though, I don't want to have to do computations that require storing all of the frame times for a particular animation in memory, because that's not going to scale well to long animations. Calculating the 95th percentile is one of those things (as is calculating the median). If we have a fixed set of buckets then the "longest bucket" should be ok.

> In the summary I tried to make it vsync independent and just look at the
> ratio at dropped frames "Actual Number of Frames / Expected Number of
> Frames" . If that doesn't work or adds overhead, would it make sense median
> for the histogram of frame lengths, divided by vsync to make it
> vsync-independent?

As mentioned above, calculating the median of frame lengths in an animation at composite time is going to be expensive, so I would prefer to avoid that. Something like the mean or the max of the frame times will be cheap and easy to do, not sure if that would be good enough. We can divide those by vsync to get a sense of scale.
> Just to be clear - do you mean dropped frames here, or long frames?

Long frames measured in "number of contiguously dropped frames" or "number of frames dropped between paints".

The ideal would be 0, where no frames got dropped and full FPS was achieved. When during an animation one frame dropped the histogram would have one value in the 1 bucket and all other values in the 0 bucket.

> Assuming you meant "long frames" above, is each data point in the histogram an animation? i.e. for each animation, we look at the time to composite of each frame, and report the 95th percentile and/or largest bucket.

A temporary histogram per animation collects frame information. The summary of that histogram (so far median and p95) then feeds into telemetry histograms.

> In general though, I don't want to have to do computations that require storing all of the frame times for a particular animation in memory, because that's not going to scale well to long animations. Calculating the 95th percentile is one of those things (as is calculating the median). If we have a fixed set of buckets then the "longest bucket" should be ok.

Reporting the mean and longest frame devided by vsync is an acceptable alternative if all else fails but can be easily skewed by a single long frame.

Not storing all frame times in memory seems unacceptable, I agree. I don't have a good sense of what kind of distribution we have in frame times, but the collection of frame information in a histogram could work. I'd shoot for an exponential histogram to give us high resolution. How does [1] look, should we have less buckets for memory reasons?

[1]: https://telemetry.mozilla.org/histogram-simulator/#low=0&high=300&n_buckets=100&kind=exponential&generate=log-normal with frame drops up to 5sec (300) as upper bound
Flags: needinfo?(bugmail)
(In reply to :Harald Kirschner :digitarald from comment #16)
> Long frames measured in "number of contiguously dropped frames" or "number
> of frames dropped between paints".
> 
> The ideal would be 0, where no frames got dropped and full FPS was achieved.
> When during an animation one frame dropped the histogram would have one
> value in the 1 bucket and all other values in the 0 bucket.

Gotcha. So in the case where no frames are dropped, the count in the "0" bucket would be the same as the number of frames composited. (Going by your previous statement that we want to record both the good and the bad).

> > Assuming you meant "long frames" above, is each data point in the histogram an animation? i.e. for each animation, we look at the time to composite of each frame, and report the 95th percentile and/or largest bucket.
> 
> A temporary histogram per animation collects frame information. The summary
> of that histogram (so far median and p95) then feeds into telemetry
> histograms.

Understood.

> > In general though, I don't want to have to do computations that require storing all of the frame times for a particular animation in memory, because that's not going to scale well to long animations. Calculating the 95th percentile is one of those things (as is calculating the median). If we have a fixed set of buckets then the "longest bucket" should be ok.
> 
> Reporting the mean and longest frame devided by vsync is an acceptable
> alternative if all else fails but can be easily skewed by a single long
> frame.

So now the vsync thing is irrelevant. The histogram as you defined it above contains counts, not millisecond times, so it's independent of vsync. Reporting "mean" would report the mean of the *number* of frames contiguously dropped in any given animation. Same for max. I agree that max can be easily skewed by a single long frame. Mean is less susceptible that, because we'll still have a high count in the "0" bucket of the histogram. But I think mean turns out to be the same information as the frame throughput information (because the mean would be computed as the sum of all the buckets, which is the total number of frames dropped during the animation, divided by the number of composited frames). So that's useless to compute, because it doesn't give us anything new.

Median and p95 are still expensive to compute, even if we're dealing with numbers of contiguously dropped frames.

I think I'm going to put up the patches I have so far and get them reviewed, and we can continue this discussion here and maybe spin off another bug to add this probe if we figure out something that makes sense.
Flags: needinfo?(bugmail)
Comment on attachment 8848247 [details]
Bug 1338347 - Have the animation-sampling code track which process the animations are in.

https://reviewboard.mozilla.org/r/121188/#review123502

On non-e10s setups, this will flag content animations as eChrome. Is that ok?

::: gfx/layers/composite/AnimationMetricsTracker.h:15
(Diff revision 1)
> +#include "mozilla/TypedEnumBits.h"
>  
>  namespace mozilla {
>  namespace layers {
>  
> +enum class AnimationProcess {

s/Process/ProcessType (or ProcessKind)? "Process" sounds like variables of this type represent processes.

Also, since we're using it as a bit-field, should we call it AnimationProcessTypes (plural)? After all, a single variable of this type can represent zero or more process types (the ones for which the bits are "set").

::: gfx/layers/composite/AsyncCompositionManager.cpp:667
(Diff revision 1)
> +  // chrome process, and anything "below" is assumed to be the content process.
> +  AnimationProcess animProcess = AnimationProcess::eNone;
>  
>    ForEachNode<ForwardIterator>(
>        aLayer,
> -      [aStorage, &activeAnimations, &aPoint, &aLayerAreaAnimated] (Layer* layer)
> +      [aStorage, &animProcess, &aPoint, &aLayerAreaAnimated, &ancestorRefLayer] (Layer* layer)

I'll leave this up to you as a style choice, but elsewhere we started just capturing [&] once the list of variables started to get large.
Comment on attachment 8848249 [details]
Bug 1338347 - Record frame throughput ratios for compositor animations

https://reviewboard.mozilla.org/r/121192/#review123508

::: gfx/layers/composite/AnimationMetricsTracker.h:35
(Diff revision 1)
>    ~AnimationMetricsTracker();
>  
>    /**
>     * This function should be called per composite, to inform the metrics
> -   * tracker if any animation is in progress, and if so, what area is
> -   * being animated. The aLayerArea is in Layer pixels squared.
> +   * tracker which processes have active animations. If there is an animation
> +   * in progress, the area of the animation should also be provided, along with

"If there are animations in progress, the sum of their areas should also be provided"

::: gfx/layers/composite/AnimationMetricsTracker.h:46
(Diff revision 1)
>  private:
>    void AnimationStarted();
>    void AnimationEnded();
> -
> +  void UpdateAnimationThroughput(const char* aLabel,
> +                                 bool aInProgress,
> +                                 TimeStamp& aStartTime,

Not sure whether your preference to make "out" parameters pointers applies to "in/out" parameters like this one. Guess not?
Attachment #8848249 - Flags: review?(botond) → review+
Comment on attachment 8848250 [details]
Bug 1338347 - Record frame throughput ratio for APZ animations

https://reviewboard.mozilla.org/r/121194/#review123516
Attachment #8848250 - Flags: review?(botond) → review+
Comment on attachment 8848248 [details]
Bug 1338347 - Add telemetry histograms for frame throughput and frame drops during compositor animations.

https://reviewboard.mozilla.org/r/121190/#review123518

::: toolkit/components/telemetry/Histograms.json:296
(Diff revision 1)
> +    "high": 1500,
> +    "n_buckets": 50,
> +    "alert_emails": ["kgupta@mozilla.com"],
> +    "bug_numbers": [1338347]
> +  },
> +  "COMPOSITOR_ANIMATION_THROUGHPUT_SCROLLING" : {

This may be excessively pedantic, but APZ animations don't _all_ have to do with scrolling. We have the double-tap-to-zoom animation on Fennec as well.
Comment on attachment 8848247 [details]
Bug 1338347 - Have the animation-sampling code track which process the animations are in.

https://reviewboard.mozilla.org/r/121188/#review123502

Yeah, I think that should be ok. I think we don't care very much about non-e10s setups right now, and we can trivially filter e10s vs non-e10s setups in telemetry.m.o.

> s/Process/ProcessType (or ProcessKind)? "Process" sounds like variables of this type represent processes.
> 
> Also, since we're using it as a bit-field, should we call it AnimationProcessTypes (plural)? After all, a single variable of this type can represent zero or more process types (the ones for which the bits are "set").

Good point, I changed it to AnimationProcessTypes as that's more meaningful.

> I'll leave this up to you as a style choice, but elsewhere we started just capturing [&] once the list of variables started to get large.

Done
Comment on attachment 8848249 [details]
Bug 1338347 - Record frame throughput ratios for compositor animations

https://reviewboard.mozilla.org/r/121192/#review123508

> "If there are animations in progress, the sum of their areas should also be provided"

Done

> Not sure whether your preference to make "out" parameters pointers applies to "in/out" parameters like this one. Guess not?

I don't really have a preference in this case, I can change it if you prefer. The function is private to the class and the call sites are nearby so I guess I felt it would be unnecessarily verbose to add InOut decorations here.
Comment on attachment 8848248 [details]
Bug 1338347 - Add telemetry histograms for frame throughput and frame drops during compositor animations.

https://reviewboard.mozilla.org/r/121190/#review123518

> This may be excessively pedantic, but APZ animations don't _all_ have to do with scrolling. We have the double-tap-to-zoom animation on Fennec as well.

Yeah that's fair. And I used "APZ" everywhere else in this codepath, so I'll update this as well.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> I think I'm going to put up the patches I have so far and get them reviewed,
> and we can continue this discussion here and maybe spin off another bug to
> add this probe if we figure out something that makes sense.

I talked with Harald on IRC and we agreed to go with the "max contiguous frames dropped per animation" number for now. I think this is a useful number, and we can account for outliers after the values have been aggregated into telemetry. It's also easy to compute.
Comment on attachment 8848249 [details]
Bug 1338347 - Record frame throughput ratios for compositor animations

https://reviewboard.mozilla.org/r/121192/#review123550

::: gfx/layers/composite/AnimationMetricsTracker.cpp:120
(Diff revision 1)
> +    // We round the expectedFrameCount because it's a count and should be an
> +    // integer. The animationLength might not be an exact vsync multiple because
> +    // it's taken during the composition process and the amount of work done
> +    // between the vsync signal and the Timestamp::Now() call may vary slightly
> +    // from one composite to another.
> +    uint32_t expectedFrameCount = std::round(animationLength.ToMilliseconds() / vsyncIntervalMs);

Also I realized this should be std::lround instead of std::round. I'll update that as well.
^ Updated patches with review comments, as well as the std::lround change, and added a new probe for max contiguous frame drops. The probe is added to Histograms.json in part 2, and the new patch at the end implements it.
Comment on attachment 8848247 [details]
Bug 1338347 - Have the animation-sampling code track which process the animations are in.

https://reviewboard.mozilla.org/r/121188/#review123560
Attachment #8848247 - Flags: review?(botond) → review+
Comment on attachment 8848248 [details]
Bug 1338347 - Add telemetry histograms for frame throughput and frame drops during compositor animations.

Dropping flag for the moment. Harald said it would be better to do the max-frame-drop separated by chrome/compositor/apz so that'll add more probes.
Attachment #8848248 - Flags: review?(benjamin)
Comment on attachment 8848607 [details]
Bug 1338347 - Add code to measure the maximum number of contiguous frame drops in a compositor animation.

Ditto
Attachment #8848607 - Flags: review?(botond)
Comment on attachment 8849140 [details]
Bug 1338347 - Refactor to group fields into a per-animation data structure.

https://reviewboard.mozilla.org/r/121968/#review124062
Attachment #8849140 - Flags: review?(botond) → review+
Comment on attachment 8848607 [details]
Bug 1338347 - Add code to measure the maximum number of contiguous frame drops in a compositor animation.

https://reviewboard.mozilla.org/r/121502/#review124068

::: gfx/layers/composite/AnimationMetricsTracker.cpp:158
(Diff revisions 1 - 2)
> +
> +    // Get the longest frame time (make sure to check the final frame as well)
> +    TimeDuration longestFrame = std::max(aAnimation.mLongestFrame, now - aAnimation.mLastFrameTime);
> +    // As above, we round to get the frame count. Additionally we subtract one
> +    // from the frame count to get the number of dropped frames.
> +    uint32_t framesDropped = std::lround(longestFrame.ToMilliseconds() / vsyncIntervalMs) - 1;

Is rounding appropriate here?

Say the vsync interval is 16 ms, and our longest frame time is 20 ms. Doesn't that mean we still dropped a frame, and should report 1 here?
(In reply to Botond Ballo [:botond] from comment #48)
> Say the vsync interval is 16 ms, and our longest frame time is 20 ms.
> Doesn't that mean we still dropped a frame, and should report 1 here?

Not necessarily. In practice I was seeing a lot of values in the 17-18ms range but those were not actually frame drops. What's happening is this:


  |.........v..........................|..............v.....................|.........v....
  vsync     transformShadowTree        vsync          transformShadowTree   vsync     transformShadowTree

  ^---------------------16ms-----------^--------------------16ms------------^
            ^----------------------20ms---------------^-----------------12ms----------^

The transformShadowTree code (which triggers the calls to the AnimationMetricsTracker) is triggered by vsync, but may fluctuate somewhat. In the example above there is one delta of 20ms and the next delta is 12ms, but there are no frame drops because the code still completes before the next vsync.

> 
> Is rounding appropriate here?

I'm open to suggestions on how to improve this measurement. The rounding was the simplest thing I could come up with that seemed to do the right thing in the cases I tested. It's certainly possible that in some cases it won't do the right thing.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #49)
> What's happening is this:

Thanks for the explanation, that makes sense.

> I'm open to suggestions on how to improve this measurement.

It sounds like we could get a more accurate frame drop metric if we had access to the vsync times. Whether or not that's worth doing depends on if we care about distinguishing the "any frames dropped" vs. "no frames dropped" cases, or if we just care about spotting bad cases (many frames dropped).

Anyways, even if we want to implement a more accurate measurement, I'm happy to leave that to another bug.
Comment on attachment 8848607 [details]
Bug 1338347 - Add code to measure the maximum number of contiguous frame drops in a compositor animation.

https://reviewboard.mozilla.org/r/121502/#review124078
Attachment #8848607 - Flags: review?(botond) → review+
Comment on attachment 8848248 [details]
Bug 1338347 - Add telemetry histograms for frame throughput and frame drops during compositor animations.

https://reviewboard.mozilla.org/r/121190/#review126504

data-r=me

::: toolkit/components/telemetry/Histograms.json:325
(Diff revision 3)
> +    "alert_emails": ["kgupta@mozilla.com"],
> +    "bug_numbers": [1338347]
> +  },
> +  "COMPOSITOR_ANIMATION_THROUGHPUT_CONTENT" : {
> +    "expires_in_version": "60",
> +    "description": "Frame throughput ratio (scaled up to 1000) for compositor animations of content-process layers.",

I don't quite understand why this is scaled to 1000 but the max is 1500. I suppose this can go higher than 100%?
Attachment #8848248 - Flags: review?(benjamin) → review+
(In reply to Benjamin Smedberg [:bsmedberg] from comment #52)
> I don't quite understand why this is scaled to 1000 but the max is 1500. I
> suppose this can go higher than 100%?

Yeah, I did that to catch scenarios where we composite more frames than expected. It really shouldn't go as high as 1500 but if it does we'll want to know about it.
(I also mentioned that in the commit message, so hopefully that's sufficient to address your concern)
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/043fa58a556c
Have the animation-sampling code track which process the animations are in. r=botond
https://hg.mozilla.org/integration/autoland/rev/59f8c2560e6f
Add telemetry histograms for frame throughput and frame drops during compositor animations. r=bsmedberg
https://hg.mozilla.org/integration/autoland/rev/6bb6ad363f65
Record frame throughput ratios for compositor animations r=botond
https://hg.mozilla.org/integration/autoland/rev/02b734210c9c
Record frame throughput ratio for APZ animations r=botond
https://hg.mozilla.org/integration/autoland/rev/5f585d37cdb8
Refactor to group fields into a per-animation data structure. r=botond
https://hg.mozilla.org/integration/autoland/rev/988fd6eb2da9
Add code to measure the maximum number of contiguous frame drops in a compositor animation. r=botond
These probes were removed in bug 1449287.
Performance Impact: --- → P1
Whiteboard: [gfx-noted][qf:p1] → [gfx-noted]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: