Closed Bug 1238040 Opened 4 years ago Closed 4 years ago

Update checkerboarding telemetry measures

Categories

(Core :: Panning and Zooming, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

Attachments

(2 files, 4 obsolete files)

The checkerboard telemetry measure we have in place right now isn't that great. We can do a bit better by recording some more metrics. Specifically instead of measuring checkerboarded csspixel-milliseconds per frame of composition, I want to record per checkerboard event: (1) the peak checkerboard value (2) the severity of the checkerboard and (3) the duration of the checkerboard in milliseconds. We will also implicitly getting frequency of checkerboarding based on the number of data points submitted.

The peak/severity values don't need to have specific units since the units are basically meaningless when aggregated over a large number of users with different environments. However, we will still be able to see how changes affect the overall checkerboarding experience by looking for shifts in the peak and/or severity and/or duration.
Depends on: 1239472
Attached patch Patch (obsolete) — Splinter Review
Attachment #8707639 - Flags: review?(vladan.bugzilla)
Attachment #8707639 - Flags: review?(botond)
Comment on attachment 8707639 [details] [diff] [review]
Patch

Review of attachment 8707639 [details] [diff] [review]:
-----------------------------------------------------------------

This patch has the side effect of making the telemetry gathering conditional on the apz.record_checkerboarding pref. Is that intentional?

::: toolkit/components/telemetry/Histograms.json
@@ +101,5 @@
> +    "alert_emails": ["kgupta@mozilla.com"],
> +    "bug_numbers": [1238040],
> +    "expires_in_version": "55",
> +    "kind": "exponential",
> +    "high": "3840 * 2160 * 8",

Where does the 8 come from?
(In reply to Botond Ballo [:botond] from comment #2)
> This patch has the side effect of making the telemetry gathering conditional
> on the apz.record_checkerboarding pref. Is that intentional?

Oh good catch! It wasn't intentional. I'll change it so that we create the CheckerboardEvent unconditionally but only print it (and eventually report it to about:checkerboard) based on the pref.

> ::: toolkit/components/telemetry/Histograms.json
> @@ +101,5 @@
> > +    "alert_emails": ["kgupta@mozilla.com"],
> > +    "bug_numbers": [1238040],
> > +    "expires_in_version": "55",
> > +    "kind": "exponential",
> > +    "high": "3840 * 2160 * 8",
> 
> Where does the 8 come from?

I realized that the peak value is in CSS pixels but if we're zoomed out on a 4k display we might have 3840 * 2160 * (max zoom) CSS pixels checkerboarding at once. I picked 8 basically at random to account for max zoom and also to give us some room to accommodate larger displays.
(Also it's annoying that I can't put comments in the json file directly)
Attached patch Patch (v2) (obsolete) — Splinter Review
Updated for the pref as per above comments.
Attachment #8707639 - Attachment is obsolete: true
Attachment #8707639 - Flags: review?(vladan.bugzilla)
Attachment #8707639 - Flags: review?(botond)
Attachment #8707963 - Flags: review?(vladan.bugzilla)
Attachment #8707963 - Flags: review?(botond)
Comment on attachment 8707963 [details] [diff] [review]
Patch (v2)

Review of attachment 8707963 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/apz/src/CheckerboardEvent.cpp
@@ +47,4 @@
>  CheckerboardEvent::GetSeverity()
>  {
> +  // Scale the total into a 32-bit value
> +  return sqrt(mTotalPixelMs);

Looking over the patch again, I'm mildly surprised that this didn't give me a compile error. I feel like I should need a cast in here somewhere. I did a try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=73091b757931
Comment on attachment 8707963 [details] [diff] [review]
Patch (v2)

After a discussion with Avi we decided that it would also be useful to add another measure, of the total time that APZ might reasonably be expected to checkerboard. This would be union of "time where APZ was actively transforming content" and "time where APZ was checkerboarding". In practice much of the latter should be subsumed by the former, but there are instances (like if the main thread is blocked, and a fling comes to an end) where you can have checkerboarding while not actively scrolling.

Adding this metric would allow us to make statements of the form "users experienced checkerboarding X% of the time that they reasonably might have been checkerboarding". This is in addition to the per-checkerboard-event metrics in the above patch, which give us more specific data on the characteristics of the checkerboard.

To describe the metrics in this patch, I usually visualize it as a graph that looks like this:

pixels
checkerboarded
   ^
   |
   |
   |
   |
   |       +-+
   |       | +-------+
   |     +-+ |       +-+
   |     | | |       | |
   +-----+-+-+-------+-+------------> time (ms)

where the height of the bars are number of pixels checkerboarded on a given composition. The width of the bars are the time that the composition was on the screen (ideally 16ms if we're scrolling at 60fps). The PEAK probe records the height of the highest bar in this graph. The DURATION probe records the total width of the event, and the SEVERITY records the area covered by the bars, which will be less than or equal to PEAK * DURATION.

I'm unflagging this patch for review while I add the extra probe that we need, and until I can get bug 1226826 landed since it's still failing try and may need some fiddling with.
Attachment #8707963 - Flags: review?(vladan.bugzilla)
Attachment #8707963 - Flags: review?(botond)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> (In reply to Botond Ballo [:botond] from comment #2)
> > This patch has the side effect of making the telemetry gathering conditional
> > on the apz.record_checkerboarding pref. Is that intentional?
> 
> Oh good catch! It wasn't intentional. I'll change it so that we create the
> CheckerboardEvent unconditionally but only print it (and eventually report
> it to about:checkerboard) based on the pref.

If we do this, can we also condition on the pref some things CheckerboardEvent does that aren't necessary for computing the telemetry figures, such as writing to mRendertraceInfo? I'm a bit concerned about the performance overhead if we're going to be doing this unconditionally, particularly of all the string operations we'd be doing on every composite.
Good point, I'll update that as well. Thanks!
Attachment #8707963 - Attachment is obsolete: true
Attachment #8709041 - Flags: review?(vladan.bugzilla)
Attachment #8709041 - Flags: review?(botond)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> > ::: toolkit/components/telemetry/Histograms.json
> > @@ +101,5 @@
> > > +    "alert_emails": ["kgupta@mozilla.com"],
> > > +    "bug_numbers": [1238040],
> > > +    "expires_in_version": "55",
> > > +    "kind": "exponential",
> > > +    "high": "3840 * 2160 * 8",
> > 
> > Where does the 8 come from?
> 
> I realized that the peak value is in CSS pixels but if we're zoomed out on a
> 4k display we might have 3840 * 2160 * (max zoom) CSS pixels checkerboarding
> at once. I picked 8 basically at random to account for max zoom and also to
> give us some room to accommodate larger displays.

OK, thanks. I was just confused because earlier (bug 1221694 and comment 3) we had (3840 * 2160 * 16) as the max value for the product of this quantity and time.
Comment on attachment 8709041 [details] [diff] [review]
Part 1 - Add telemetry for checkerboarding

Review of attachment 8709041 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! I do still have a couple of performance nits I'd like addressed:

  - I would prefer that we still guard the creation of mCheckerboardEvent on 
    Telemetry::CanRecordExtended(), without which the Accumulate() calls don't 
    record the data anyways. I confirmed with Vladan that this is safe to call
    off-main-thread.

  - In NotifyLayersUpdated(), we still do string manipulation (to prepare the
    |aExtraInfo| parameter) in the case where telemetry is enabled but
    checkerboard logging is not. Since that block of code just makes calls to
    UpdateRendertraceProperty(), which is ignored with checkerboard logging
    disabled, could we condition that block on checkerboard logging being
    enabled as well?

r+ with these addressed.
Attachment #8709041 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #13)
> OK, thanks. I was just confused because earlier (bug 1221694 and comment 3)
> we had (3840 * 2160 * 16) as the max value for the product of this quantity
> and time.

"bug 1221694 comment 3"
(In reply to Botond Ballo [:botond] from comment #14)
>   - I would prefer that we still guard the creation of mCheckerboardEvent on 
>     Telemetry::CanRecordExtended(), without which the Accumulate() calls
> don't 
>     record the data anyways. I confirmed with Vladan that this is safe to
> call
>     off-main-thread.

This should be fine.

>   - In NotifyLayersUpdated(), we still do string manipulation (to prepare the
>     |aExtraInfo| parameter) in the case where telemetry is enabled but
>     checkerboard logging is not. Since that block of code just makes calls to
>     UpdateRendertraceProperty(), which is ignored with checkerboard logging
>     disabled, could we condition that block on checkerboard logging being
>     enabled as well?

The reason I did it that way is I didn't want to be reading inconsistent values of the checkerboard pref. For example, say the pref is true on startup, and the mCheckerboardEvent object is created with mRecordTrace = true. Later, the user flips the pref off. At this point mRecordTrace is still true (until the next checkerboard event is created) but if NotifyLayersUpdated is guarded with gfxPrefs::APZRecordCheckerboard(), then it will stop recording the info partway through the event, which can give misleading results. I could expose mRecordTrace from the event object to do this. I just didn't think it was worth it when I wrote the patch but I'm happy to change it.
Comment on attachment 8709042 [details] [diff] [review]
Part 2 - Add telemetry for potential checkerboarding duration

Review of attachment 8709042 [details] [diff] [review]:
-----------------------------------------------------------------

Might it also be useful to know how much checkerboarding happens during transforming compared to during not transforming?

::: gfx/layers/apz/src/PotentialCheckerboardDurationTracker.cpp
@@ +51,5 @@
> +
> +void
> +PotentialCheckerboardDurationTracker::TransformStopped()
> +{
> +  MOZ_ASSERT(Tracking());

I'd make this slightly stronger and assert mInTransform.

::: gfx/layers/moz.build
@@ +99,5 @@
>      'apz/src/APZCTreeManager.h',
>      'apz/src/APZUtils.h',
>      'apz/src/AsyncDragMetrics.h',
>      'apz/src/AsyncPanZoomAnimation.h',
> +    'apz/src/PotentialCheckerboardDurationTracker.h',

Is this necessary? AsyncPanZoomController.h includes other files not exported here, like InputQueue.h.
Attachment #8709042 - Flags: review?(botond) → review+
Updated per review comments, carrying r+ from botond.
Attachment #8709041 - Attachment is obsolete: true
Attachment #8709041 - Flags: review?(vladan.bugzilla)
Attachment #8709486 - Flags: review?(vladan.bugzilla)
Ditto
Attachment #8709042 - Attachment is obsolete: true
Attachment #8709042 - Flags: review?(vladan.bugzilla)
Attachment #8709488 - Flags: review?(vladan.bugzilla)
No longer blocks: 1237791
Comment on attachment 8709488 [details] [diff] [review]
Part 2 - Add telemetry for potential checkerboarding duration (v2)

Review of attachment 8709488 [details] [diff] [review]:
-----------------------------------------------------------------

I'm concerned that there might be less main-thread jank on e10s but it might be qualitatively worse wrt checkerboarding. 

Will these probes be enough to understand how bad checkerboarding is on Beta? 
And do you have a sense of what a "tolerable" amount of checkeboarding might be (wrt these probes)?

r+ for the histogram code & approach, someone else will need to review the APZ code

::: toolkit/components/telemetry/Histograms.json
@@ +112,5 @@
> +    "expires_in_version": "55",
> +    "kind": "exponential",
> +    "high": "1000000",
> +    "n_buckets": 50,
> +    "description": "Duration of a chunk of time that could reasonably have had checkerboarding"

mention the units (ms)
Attachment #8709488 - Flags: review?(vladan.bugzilla) → review+
(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #21)
> I'm concerned that there might be less main-thread jank on e10s but it might
> be qualitatively worse wrt checkerboarding. 

That is a potential problem, yes. But in the end there's no automated way to make that decision, since it's qualitative and depends on the individual experience of jank/checkerboarding.

> Will these probes be enough to understand how bad checkerboarding is on
> Beta? 

These probes are going to land on 46 (maybe even 47) but aren't going to be in 45 beta experiment.

> And do you have a sense of what a "tolerable" amount of checkeboarding might
> be (wrt these probes)?

From an absolute point of view, no. As I discussed with Avi, the probes will give us something measurable and meaningful, so we will be able to say things like "people are seeing checkerboard X% of the time that they could potentially be seeing it", and I would hope that X is a small number (like less than 1%) but in the end whether or not it's "tolerable" depends on the individual user. The probes will at least definitively tell us if our efforts to reduce checkerboarding are helping or hurting.

> r+ for the histogram code & approach, someone else will need to review the
> APZ code

Thanks

> ::: toolkit/components/telemetry/Histograms.json
> 
> mention the units (ms)

Will do.
Comment on attachment 8709486 [details] [diff] [review]
Part 1 - Add telemetry for checkerboarding (v2)

Review of attachment 8709486 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +103,5 @@
> +    "expires_in_version": "55",
> +    "kind": "exponential",
> +    "high": "3840 * 2160 * 8",
> +    "n_buckets": 50,
> +    "description": "Peak number of CSS pixels checkerboarded during a checkerboard event"

put the meaning of the "3840 * 2160 * 8" max in the description
Attachment #8709486 - Flags: review?(vladan.bugzilla) → review+
> "high": "(1 << 30)",

Mark: can the aggregator code handle this?
Flags: needinfo?(mreid)
Please uplift this for the Beta 45 experiment if possible
(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #24)
> > "high": "(1 << 30)",
> 
> Mark: can the aggregator code handle this?

Related to this, see bug 1239472.
(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #24)
> > "high": "(1 << 30)",
> 
> Mark: can the aggregator code handle this?

I'm not sure - Roberto?
Flags: needinfo?(mreid) → needinfo?(rvitillo)
Generally, we'd like less expressions in histogram definitions, not more. See bug 920169.
Flags: needinfo?(rvitillo)
Would the aggregator be ok if I replaced that with a numeric constant? Or is the aggregator concern about the size of the value?
I'll land it with the numeric constant since I want to get this in before the next merge. If there are concerns we can address them as they come up.
https://hg.mozilla.org/mozilla-central/rev/e780ca0bc68d
https://hg.mozilla.org/mozilla-central/rev/7a3b839f4a8e
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Depends on: 1244901
kats, would this be a good description of one way to look at the values?

1. CHECKERBOARD_DURATION / CHECKERBOARD_POTENTIAL_DURATION - represents a fraction of how much checkerboarding (CB) is visible (even if minimally) of the overal duration where APZ is actively doing stuff on screen*.

2. CHECKERBOARD_PEAK corresponds to the biggest number of CB pixels per checkerboard event. We might want to use this histogram to set some threshold which we consider "really visible", and by finding out which percentage of the CB were "really visible" we could maybe apply this factor to (1).

3. CHECKERBOARD_SEVERITY is described as "Opaque measure of the severity of a checkerboard event". How should this be evaluated? what are some useful thresholds or ways to understand it?

* I believe currently is only mouse wheel scroll, with the intent to extend this to other scroll triggers, and maybe potentially other transformations too?


Overall, how do we take the probes histograms and create meaningful and trackable statements from them?
Flags: needinfo?(bugmail.mozilla)
Yes, your descriptions are pretty accurate. Some clarifications:

- In (1), it's the "total CHECKERBOARD_DURATION / total CHECKERBOARD_POTENTIAL_DURATION" that is meaningful. Based on the current telemetry-gathered data that's sitting at close to 25% which seems pretty high to me.

- The way the probe is written it's not limited to mousewheel scroll, it will cover all APZ-driven async transformations. So for example I recently enabled touch support on windows, and that will get included here as well. If javascript does a scrollTo({behaviour:smooth}) which triggers an APZ smooth-scroll animation, that will get included as well.

- With respect to the severity, I think we can use it in two ways. The first is to use the telemetry as a guide to reducing checkerboarding. So for example, right now based on the telemetry data a severity of > 10k would be in the upper half of severities. So what we can do (and what I intend to do) is ask people to go to about:checkerboard and submit the logs for checkerboard events with severities greater than 10k. Since we know these events are in the upper half of badness, diagnosing why these events happened will be useful and will help us target these high-badness events. Obviously we can adjust the 10k threshold up or down as needed.

The second way to use the severity is similar to what you described for the PEAK probe - that is, we try to find a threshold at which we consider it "really visible" and then do something with that. The way to do this, I think, is to figure out what severity checkerboard people actually notice. Yesterday Milan took a look at his about:checkerboard and he had an event there with severity 15k from about an hour ago, and he didn't remember actually "seeing" the checkerboard. Granted, he's probably used to seeing checkerboard now, but it's an indication that even a 15k-severity event isn't so bad that you remember it an hour later. With additional data points of this sort we can find a threshold that determines at what severity the user experience is actually noticeable/impacted.
Flags: needinfo?(bugmail.mozilla)
You need to log in before you can comment on or make changes to this bug.