Closed Bug 1238040 Opened 9 years ago Closed 9 years ago

Update checkerboarding telemetry measures

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla46

Tracking Flags:

Tracking

Status

firefox46

---

fixed

People

(Reporter: kats, Assigned: kats)

References

Details

Attachments

(2 files, 4 obsolete files)

Patch 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 6.02 KB, patch		Details \| Diff \| Splinter Review
Patch (v2) 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 6.35 KB, patch		Details \| Diff \| Splinter Review
Part 1 - Add telemetry for checkerboarding 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 10.05 KB, patch	botond : review+	Details \| Diff \| Splinter Review
Part 2 - Add telemetry for potential checkerboarding duration 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 11.38 KB, patch	botond : review+	Details \| Diff \| Splinter Review
Part 1 - Add telemetry for checkerboarding (v2) 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 12.40 KB, patch	vladan : review+	Details \| Diff \| Splinter Review
Part 2 - Add telemetry for potential checkerboarding duration (v2) 9 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 10.78 KB, patch	vladan : review+	Details \| Diff \| Splinter Review

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Description

•

9 years ago

The checkerboard telemetry measure we have in place right now isn't that great. We can do a bit better by recording some more metrics. Specifically instead of measuring checkerboarded csspixel-milliseconds per frame of composition, I want to record per checkerboard event: (1) the peak checkerboard value (2) the severity of the checkerboard and (3) the duration of the checkerboard in milliseconds. We will also implicitly getting frequency of checkerboarding based on the number of data points submitted. The peak/severity values don't need to have specific units since the units are basically meaningless when aggregated over a large number of users with different environments. However, we will still be able to see how changes affect the overall checkerboarding experience by looking for shifts in the peak and/or severity and/or duration.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Depends on: 1239472

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 1

•

9 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

Attachment #8707639 - Flags: review?(vladan.bugzilla)

Attachment #8707639 - Flags: review?(botond)

Botond Ballo [:botond]

Comment 2

•

9 years ago

Comment on attachment 8707639 [details] [diff] [review] Patch Review of attachment 8707639 [details] [diff] [review]: ----------------------------------------------------------------- This patch has the side effect of making the telemetry gathering conditional on the apz.record_checkerboarding pref. Is that intentional? ::: toolkit/components/telemetry/Histograms.json @@ +101,5 @@ > + "alert_emails": ["kgupta@mozilla.com"], > + "bug_numbers": [1238040], > + "expires_in_version": "55", > + "kind": "exponential", > + "high": "3840 * 2160 * 8", Where does the 8 come from?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1237791

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

9 years ago

(In reply to Botond Ballo [:botond] from comment #2) > This patch has the side effect of making the telemetry gathering conditional > on the apz.record_checkerboarding pref. Is that intentional? Oh good catch! It wasn't intentional. I'll change it so that we create the CheckerboardEvent unconditionally but only print it (and eventually report it to about:checkerboard) based on the pref. > ::: toolkit/components/telemetry/Histograms.json > @@ +101,5 @@ > > + "alert_emails": ["kgupta@mozilla.com"], > > + "bug_numbers": [1238040], > > + "expires_in_version": "55", > > + "kind": "exponential", > > + "high": "3840 * 2160 * 8", > > Where does the 8 come from? I realized that the peak value is in CSS pixels but if we're zoomed out on a 4k display we might have 3840 * 2160 * (max zoom) CSS pixels checkerboarding at once. I picked 8 basically at random to account for max zoom and also to give us some room to accommodate larger displays.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 4

•

9 years ago

(Also it's annoying that I can't put comments in the json file directly)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

9 years ago

Attached patch Patch (v2) (obsolete) — Details — Splinter Review

Updated for the pref as per above comments.

Attachment #8707639 - Attachment is obsolete: true

Attachment #8707639 - Flags: review?(vladan.bugzilla)

Attachment #8707639 - Flags: review?(botond)

Attachment #8707963 - Flags: review?(vladan.bugzilla)

Attachment #8707963 - Flags: review?(botond)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 6

•

9 years ago

Comment on attachment 8707963 [details] [diff] [review] Patch (v2) Review of attachment 8707963 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/apz/src/CheckerboardEvent.cpp @@ +47,4 @@ > CheckerboardEvent::GetSeverity() > { > + // Scale the total into a 32-bit value > + return sqrt(mTotalPixelMs); Looking over the patch again, I'm mildly surprised that this didn't give me a compile error. I feel like I should need a cast in here somewhere. I did a try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=73091b757931

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 7

•

9 years ago

As I suspected. Now with casts: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d35955a7fe2e

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

9 years ago

Comment on attachment 8707963 [details] [diff] [review] Patch (v2) After a discussion with Avi we decided that it would also be useful to add another measure, of the total time that APZ might reasonably be expected to checkerboard. This would be union of "time where APZ was actively transforming content" and "time where APZ was checkerboarding". In practice much of the latter should be subsumed by the former, but there are instances (like if the main thread is blocked, and a fling comes to an end) where you can have checkerboarding while not actively scrolling. Adding this metric would allow us to make statements of the form "users experienced checkerboarding X% of the time that they reasonably might have been checkerboarding". This is in addition to the per-checkerboard-event metrics in the above patch, which give us more specific data on the characteristics of the checkerboard. To describe the metrics in this patch, I usually visualize it as a graph that looks like this: pixels checkerboarded ^ | | | | | +-+ | | +-------+ | +-+ | +-+ | | | | | | +-----+-+-+-------+-+------------> time (ms) where the height of the bars are number of pixels checkerboarded on a given composition. The width of the bars are the time that the composition was on the screen (ideally 16ms if we're scrolling at 60fps). The PEAK probe records the height of the highest bar in this graph. The DURATION probe records the total width of the event, and the SEVERITY records the area covered by the bars, which will be less than or equal to PEAK * DURATION. I'm unflagging this patch for review while I add the extra probe that we need, and until I can get bug 1226826 landed since it's still failing try and may need some fiddling with.

Attachment #8707963 - Flags: review?(vladan.bugzilla)

Attachment #8707963 - Flags: review?(botond)

Botond Ballo [:botond]

Comment 9

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3) > (In reply to Botond Ballo [:botond] from comment #2) > > This patch has the side effect of making the telemetry gathering conditional > > on the apz.record_checkerboarding pref. Is that intentional? > > Oh good catch! It wasn't intentional. I'll change it so that we create the > CheckerboardEvent unconditionally but only print it (and eventually report > it to about:checkerboard) based on the pref. If we do this, can we also condition on the pref some things CheckerboardEvent does that aren't necessary for computing the telemetry figures, such as writing to mRendertraceInfo? I'm a bit concerned about the performance overhead if we're going to be doing this unconditionally, particularly of all the string operations we'd be doing on every composite.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 10

•

9 years ago

Good point, I'll update that as well. Thanks!

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

9 years ago

Attached patch Part 1 - Add telemetry for checkerboarding (obsolete) — Details — Splinter Review

Attachment #8707963 - Attachment is obsolete: true

Attachment #8709041 - Flags: review?(vladan.bugzilla)

Attachment #8709041 - Flags: review?(botond)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 12

•

9 years ago

Attached patch Part 2 - Add telemetry for potential checkerboarding duration (obsolete) — Details — Splinter Review

Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b5a12b6ea10

Attachment #8709042 - Flags: review?(vladan.bugzilla)

Attachment #8709042 - Flags: review?(botond)

Botond Ballo [:botond]

Comment 13

•

9 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3) > > ::: toolkit/components/telemetry/Histograms.json > > @@ +101,5 @@ > > > + "alert_emails": ["kgupta@mozilla.com"], > > > + "bug_numbers": [1238040], > > > + "expires_in_version": "55", > > > + "kind": "exponential", > > > + "high": "3840 * 2160 * 8", > > > > Where does the 8 come from? > > I realized that the peak value is in CSS pixels but if we're zoomed out on a > 4k display we might have 3840 * 2160 * (max zoom) CSS pixels checkerboarding > at once. I picked 8 basically at random to account for max zoom and also to > give us some room to accommodate larger displays. OK, thanks. I was just confused because earlier (bug 1221694 and comment 3) we had (3840 * 2160 * 16) as the max value for the product of this quantity and time.

Botond Ballo [:botond]

Comment 14

•

9 years ago

Comment on attachment 8709041 [details] [diff] [review] Part 1 - Add telemetry for checkerboarding Review of attachment 8709041 [details] [diff] [review]: ----------------------------------------------------------------- Looks good! I do still have a couple of performance nits I'd like addressed: - I would prefer that we still guard the creation of mCheckerboardEvent on Telemetry::CanRecordExtended(), without which the Accumulate() calls don't record the data anyways. I confirmed with Vladan that this is safe to call off-main-thread. - In NotifyLayersUpdated(), we still do string manipulation (to prepare the |aExtraInfo| parameter) in the case where telemetry is enabled but checkerboard logging is not. Since that block of code just makes calls to UpdateRendertraceProperty(), which is ignored with checkerboard logging disabled, could we condition that block on checkerboard logging being enabled as well? r+ with these addressed.

Attachment #8709041 - Flags: review?(botond) → review+

Botond Ballo [:botond]

Comment 15

•

9 years ago

(In reply to Botond Ballo [:botond] from comment #13) > OK, thanks. I was just confused because earlier (bug 1221694 and comment 3) > we had (3840 * 2160 * 16) as the max value for the product of this quantity > and time. "bug 1221694 comment 3"

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 16

•

9 years ago

(In reply to Botond Ballo [:botond] from comment #14) > - I would prefer that we still guard the creation of mCheckerboardEvent on > Telemetry::CanRecordExtended(), without which the Accumulate() calls > don't > record the data anyways. I confirmed with Vladan that this is safe to > call > off-main-thread. This should be fine. > - In NotifyLayersUpdated(), we still do string manipulation (to prepare the > |aExtraInfo| parameter) in the case where telemetry is enabled but > checkerboard logging is not. Since that block of code just makes calls to > UpdateRendertraceProperty(), which is ignored with checkerboard logging > disabled, could we condition that block on checkerboard logging being > enabled as well? The reason I did it that way is I didn't want to be reading inconsistent values of the checkerboard pref. For example, say the pref is true on startup, and the mCheckerboardEvent object is created with mRecordTrace = true. Later, the user flips the pref off. At this point mRecordTrace is still true (until the next checkerboard event is created) but if NotifyLayersUpdated is guarded with gfxPrefs::APZRecordCheckerboard(), then it will stop recording the info partway through the event, which can give misleading results. I could expose mRecordTrace from the event object to do this. I just didn't think it was worth it when I wrote the patch but I'm happy to change it.

Botond Ballo [:botond]

Comment 17

•

9 years ago

Comment on attachment 8709042 [details] [diff] [review] Part 2 - Add telemetry for potential checkerboarding duration Review of attachment 8709042 [details] [diff] [review]: ----------------------------------------------------------------- Might it also be useful to know how much checkerboarding happens during transforming compared to during not transforming? ::: gfx/layers/apz/src/PotentialCheckerboardDurationTracker.cpp @@ +51,5 @@ > + > +void > +PotentialCheckerboardDurationTracker::TransformStopped() > +{ > + MOZ_ASSERT(Tracking()); I'd make this slightly stronger and assert mInTransform. ::: gfx/layers/moz.build @@ +99,5 @@ > 'apz/src/APZCTreeManager.h', > 'apz/src/APZUtils.h', > 'apz/src/AsyncDragMetrics.h', > 'apz/src/AsyncPanZoomAnimation.h', > + 'apz/src/PotentialCheckerboardDurationTracker.h', Is this necessary? AsyncPanZoomController.h includes other files not exported here, like InputQueue.h.

Attachment #8709042 - Flags: review?(botond) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 18

•

9 years ago

Attached patch Part 1 - Add telemetry for checkerboarding (v2) — Details — Splinter Review

Updated per review comments, carrying r+ from botond.

Attachment #8709041 - Attachment is obsolete: true

Attachment #8709041 - Flags: review?(vladan.bugzilla)

Attachment #8709486 - Flags: review?(vladan.bugzilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 19

•

9 years ago

Attached patch Part 2 - Add telemetry for potential checkerboarding duration (v2) — Details — Splinter Review

Ditto

Attachment #8709042 - Attachment is obsolete: true

Attachment #8709042 - Flags: review?(vladan.bugzilla)

Attachment #8709488 - Flags: review?(vladan.bugzilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 20

•

9 years ago

Try push was also green https://treeherder.mozilla.org/#/jobs?repo=try&revision=d227f3fe4d3a&group_state=expanded

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1240133

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1236046

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

No longer blocks: 1237791

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1236035

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Blocks: 1238042

Vladan Djeric (:vladan)

Comment 21

•

9 years ago

Comment on attachment 8709488 [details] [diff] [review] Part 2 - Add telemetry for potential checkerboarding duration (v2) Review of attachment 8709488 [details] [diff] [review]: ----------------------------------------------------------------- I'm concerned that there might be less main-thread jank on e10s but it might be qualitatively worse wrt checkerboarding. Will these probes be enough to understand how bad checkerboarding is on Beta? And do you have a sense of what a "tolerable" amount of checkeboarding might be (wrt these probes)? r+ for the histogram code & approach, someone else will need to review the APZ code ::: toolkit/components/telemetry/Histograms.json @@ +112,5 @@ > + "expires_in_version": "55", > + "kind": "exponential", > + "high": "1000000", > + "n_buckets": 50, > + "description": "Duration of a chunk of time that could reasonably have had checkerboarding" mention the units (ms)

Attachment #8709488 - Flags: review?(vladan.bugzilla) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 22

•

9 years ago

(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #21) > I'm concerned that there might be less main-thread jank on e10s but it might > be qualitatively worse wrt checkerboarding. That is a potential problem, yes. But in the end there's no automated way to make that decision, since it's qualitative and depends on the individual experience of jank/checkerboarding. > Will these probes be enough to understand how bad checkerboarding is on > Beta? These probes are going to land on 46 (maybe even 47) but aren't going to be in 45 beta experiment. > And do you have a sense of what a "tolerable" amount of checkeboarding might > be (wrt these probes)? From an absolute point of view, no. As I discussed with Avi, the probes will give us something measurable and meaningful, so we will be able to say things like "people are seeing checkerboard X% of the time that they could potentially be seeing it", and I would hope that X is a small number (like less than 1%) but in the end whether or not it's "tolerable" depends on the individual user. The probes will at least definitively tell us if our efforts to reduce checkerboarding are helping or hurting. > r+ for the histogram code & approach, someone else will need to review the > APZ code Thanks > ::: toolkit/components/telemetry/Histograms.json > > mention the units (ms) Will do.

Vladan Djeric (:vladan)

Comment 23

•

9 years ago

Comment on attachment 8709486 [details] [diff] [review] Part 1 - Add telemetry for checkerboarding (v2) Review of attachment 8709486 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Histograms.json @@ +103,5 @@ > + "expires_in_version": "55", > + "kind": "exponential", > + "high": "3840 * 2160 * 8", > + "n_buckets": 50, > + "description": "Peak number of CSS pixels checkerboarded during a checkerboard event" put the meaning of the "3840 * 2160 * 8" max in the description

Attachment #8709486 - Flags: review?(vladan.bugzilla) → review+

Vladan Djeric (:vladan)

Comment 24

•

9 years ago

> "high": "(1 << 30)", Mark: can the aggregator code handle this?

Flags: needinfo?(mreid)

Vladan Djeric (:vladan)

Updated

•

9 years ago

Blocks: e10s-measurement

Vladan Djeric (:vladan)

Comment 25

•

9 years ago

Please uplift this for the Beta 45 experiment if possible

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 26

•

9 years ago

(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #24) > > "high": "(1 << 30)", > > Mark: can the aggregator code handle this? Related to this, see bug 1239472.

Mark Reid [:mreid]

Comment 27

•

9 years ago

(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #24) > > "high": "(1 << 30)", > > Mark: can the aggregator code handle this? I'm not sure - Roberto?

Flags: needinfo?(mreid) → needinfo?(rvitillo)

Mark Reid [:mreid]

Comment 28

•

9 years ago

Generally, we'd like less expressions in histogram definitions, not more. See bug 920169.

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Flags: needinfo?(rvitillo)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 29

•

9 years ago

Would the aggregator be ok if I replaced that with a numeric constant? Or is the aggregator concern about the size of the value?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 30

•

9 years ago

I'll land it with the numeric constant since I want to get this in before the next merge. If there are concerns we can address them as they come up.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 31

•

9 years ago

Oh, and here's a recent green try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f9c86332313&selectedJob=15708992

Pulsebot

Comment 32

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e780ca0bc68d https://hg.mozilla.org/integration/mozilla-inbound/rev/7a3b839f4a8e

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 33

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e780ca0bc68d https://hg.mozilla.org/mozilla-central/rev/7a3b839f4a8e

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox46: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla46

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

9 years ago

Depends on: 1244901

Avi Halachmi (:avih)

Comment 34

•

9 years ago

kats, would this be a good description of one way to look at the values? 1. CHECKERBOARD_DURATION / CHECKERBOARD_POTENTIAL_DURATION - represents a fraction of how much checkerboarding (CB) is visible (even if minimally) of the overal duration where APZ is actively doing stuff on screen*. 2. CHECKERBOARD_PEAK corresponds to the biggest number of CB pixels per checkerboard event. We might want to use this histogram to set some threshold which we consider "really visible", and by finding out which percentage of the CB were "really visible" we could maybe apply this factor to (1). 3. CHECKERBOARD_SEVERITY is described as "Opaque measure of the severity of a checkerboard event". How should this be evaluated? what are some useful thresholds or ways to understand it? * I believe currently is only mouse wheel scroll, with the intent to extend this to other scroll triggers, and maybe potentially other transformations too? Overall, how do we take the probes histograms and create meaningful and trackable statements from them?

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 35

•

9 years ago

Yes, your descriptions are pretty accurate. Some clarifications: - In (1), it's the "total CHECKERBOARD_DURATION / total CHECKERBOARD_POTENTIAL_DURATION" that is meaningful. Based on the current telemetry-gathered data that's sitting at close to 25% which seems pretty high to me. - The way the probe is written it's not limited to mousewheel scroll, it will cover all APZ-driven async transformations. So for example I recently enabled touch support on windows, and that will get included here as well. If javascript does a scrollTo({behaviour:smooth}) which triggers an APZ smooth-scroll animation, that will get included as well. - With respect to the severity, I think we can use it in two ways. The first is to use the telemetry as a guide to reducing checkerboarding. So for example, right now based on the telemetry data a severity of > 10k would be in the upper half of severities. So what we can do (and what I intend to do) is ask people to go to about:checkerboard and submit the logs for checkerboard events with severities greater than 10k. Since we know these events are in the upper half of badness, diagnosing why these events happened will be useful and will help us target these high-badness events. Obviously we can adjust the 10k threshold up or down as needed. The second way to use the severity is similar to what you described for the PEAK probe - that is, we try to find a threshold at which we consider it "really visible" and then do something with that. The way to do this, I think, is to figure out what severity checkerboard people actually notice. Yesterday Milan took a look at his about:checkerboard and he had an event there with severity 15k from about an hour ago, and he didn't remember actually "seeing" the checkerboard. Granted, he's probably used to seeing checkerboard now, but it's an indication that even a 15k-severity event isn't so bad that you remember it an hour later. With additional data points of this sort we can find a threshold that determines at what severity the user experience is actually noticeable/impacted.

Flags: needinfo?(bugmail.mozilla)

Tim Smith (inactive) 👨‍🔬 [:tdsmith]

Updated

•

6 years ago

Blocks: 1539309

You need to log in before you can comment on or make changes to this bug.