Closed Bug 1221694 Opened 9 years ago Closed 9 years ago

Add a telemetry probe for checkerboarding

Categories

(Core :: Panning and Zooming, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

Attachments

(1 file, 2 obsolete files)

We want to have a telemetry probe in place for APZ checkerboarding. Even better than just a "checkerboarding" metric would be something that reports "velocity,checkerboarding" data, so that we can visualize the checkerboarding as a function of scroll velocity. We would expect lower checkerboarding values at lower velocities, and if that is not the case then it will need extra attention.
You might be able to reuse some code from the patch in bug 969466.
See Also: → 969466
Assignee: nobody → bugmail.mozilla
Attached patch Patch (obsolete) — Splinter Review
This is a basic probe for checkerboarding. To do it as a function of velocity is trivial in the APZC code, but requires adding a new histogram type (or something) in Telemetry.cpp.
Attached patch Patch (obsolete) — Splinter Review
Slightly updated.

The 3840 * 2160 * 16 comes from the resolution of a 4k display, multiplied by 16ms. I used this as a rough approximation for an upper bound on how many CSS-pixel-ms we might checkerboarding per composite. It's possible we might exceed this but I think it's ok for a starting point and we can adjust it upwards later if needed.

I filed bug 1223030 for getting a two-axis telemetry metric support, since I'd like to have a basic telemetry ping in sooner rather than later and don't want to wait for that.
Attachment #8683834 - Attachment is obsolete: true
Attachment #8684938 - Flags: review?(botond)
Comment on attachment 8684938 [details] [diff] [review]
Patch

Review of attachment 8684938 [details] [diff] [review]:
-----------------------------------------------------------------

According to [1] you need a "data collection review" to land this. r+ assuming that checks out.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ +3052,5 @@
> +  uint32_t time = (aSampleTime - mLastCheckerboardReport).ToMilliseconds();
> +  uint32_t magnitude = GetCheckerboardMagnitude();
> +  // TODO: make this a function of velocity
> +  mozilla::Telemetry::Accumulate(
> +      mozilla::Telemetry::CHECKERBOARDED_CSSPIXELS_MS, magnitude * time);

Is mozilla::Telemetry::CHECKERBOARDED_CSSPIXELS_MS generated from the entry in Histograms.json?
Attachment #8684938 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #4)
> Is mozilla::Telemetry::CHECKERBOARDED_CSSPIXELS_MS generated from the entry
> in Histograms.json?

(Yes. It says so on the page I liked :)).
(In reply to Botond Ballo [:botond] from comment #5)
> (In reply to Botond Ballo [:botond] from comment #4)
> > Is mozilla::Telemetry::CHECKERBOARDED_CSSPIXELS_MS generated from the entry
> > in Histograms.json?
> 
> (Yes. It says so on the page I liked :)).

_linked_. I'm a bit <enter>-happy today...
Comment on attachment 8684938 [details] [diff] [review]
Patch

Thanks! r? to vladan for data collection review.
Attachment #8684938 - Flags: review?(vladan.bugzilla)
Comment on attachment 8684938 [details] [diff] [review]
Patch

Review of attachment 8684938 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ +3048,5 @@
> +    // composite (once for each layer it is attached to). Only report the
> +    // checkerboard once per composite though.
> +    return;
> +  }
> +  uint32_t time = (aSampleTime - mLastCheckerboardReport).ToMilliseconds();

I don't know the APZ code, so this question is likely misguided, but couldn't mLastCheckerboardReport be a timestamp from long ago because the compositor has been idle?

::: toolkit/components/telemetry/Histograms.json
@@ +89,5 @@
>      "kind": "boolean",
>      "description": "blocklist.xml has been loaded synchronously *** No longer needed (bug 1156565). Delete histogram and accumulation code! ***"
>    },
> +  "CHECKERBOARDED_CSSPIXELS_MS": {
> +    "expires_in_version": "never",

- make it expire in 10 versions, we should revisit its usefulness then
- add the bug_numbers and alert_emails fields
- nit: make the description into the last histogram field

@@ +90,5 @@
>      "description": "blocklist.xml has been loaded synchronously *** No longer needed (bug 1156565). Delete histogram and accumulation code! ***"
>    },
> +  "CHECKERBOARDED_CSSPIXELS_MS": {
> +    "expires_in_version": "never",
> +    "description": "Magnitude of checkerboarding in CSSPixel-milliseconds per scrollable frame per composite",

are pixels really a useful measure? why not report on % of page checkerboarded?
wouldn't you want to report checkerboard duration in a separate histogram?

It seems like you're trying record a tuple of values in a histogram, I wonder if it would be easier to interpret if you reported the components separately.
Attachment #8684938 - Flags: review?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #8)
> > +  uint32_t time = (aSampleTime - mLastCheckerboardReport).ToMilliseconds();
> 
> I don't know the APZ code, so this question is likely misguided, but
> couldn't mLastCheckerboardReport be a timestamp from long ago because the
> compositor has been idle?

It isn't misguided, and you're right that mLastCheckerboardReport could be a timestamp from long ago. The case you're thinking of, though, means that the compositor was idle while we were in a checkerboarding state. Therefore we do actually want to report a large value for this, because the user was seeing checkerboarding for a long time.

> ::: toolkit/components/telemetry/Histograms.json
> @@ +89,5 @@
> >      "kind": "boolean",
> >      "description": "blocklist.xml has been loaded synchronously *** No longer needed (bug 1156565). Delete histogram and accumulation code! ***"
> >    },
> > +  "CHECKERBOARDED_CSSPIXELS_MS": {
> > +    "expires_in_version": "never",
> 
> - make it expire in 10 versions, we should revisit its usefulness then
> - add the bug_numbers and alert_emails fields
> - nit: make the description into the last histogram field

Ok, will do.

> @@ +90,5 @@
> >      "description": "blocklist.xml has been loaded synchronously *** No longer needed (bug 1156565). Delete histogram and accumulation code! ***"
> >    },
> > +  "CHECKERBOARDED_CSSPIXELS_MS": {
> > +    "expires_in_version": "never",
> > +    "description": "Magnitude of checkerboarding in CSSPixel-milliseconds per scrollable frame per composite",
> 
> are pixels really a useful measure? why not report on % of page
> checkerboarded?

Reporting on % of page checkerbording is nontrival because the page is split up into layers that overlap each other. We know the "checkerboarded area" for each layer but computing a % of page would mean figuring out how the areas are overlapping and deciding if an area of checkerboarding on one layer is covered up by another layer. This is further complicated by the fact that layers can have transparency to the layer that's "covering up" checkerboarding might not actually be covering up anything because it's transparent.

If you want a %, the best we can do there is to take this pixel value and divide it by the composition bounds, which is bounding box in which the layer is scrolling. Even that doesn't fully reflect user-visible checkerboarding because part of the bounding box might be offscreen (e.g. an iframe partially scrolled off the screen). However I'm not really convinced that % is more meaningful than pixels. If you have a 1x1 tracking-pixel iframe that's fully checkerboarded it's less noticeable to the user than if 50% of a full-screen layer is checkerboarded.

> wouldn't you want to report checkerboard duration in a separate histogram?

I don't see the value in doing that, can you expand on why that would be useful? And technically I'm not sure how to define "checkerboard duration" - what if there's some pixels that are checkerboarded on one composite and some other pixels that are checkerboarded on the next composite. Would you lump those into the same "duration"? Or do you want to track which pixels are checkerboarded and then see how long it takes to get those pixels painted? Those pixels might move offscreen before they get painted so we might not be able to know at all.

> It seems like you're trying record a tuple of values in a histogram, I
> wonder if it would be easier to interpret if you reported the components
> separately.

No, my intention was to produce a value that correlated with user experience of checkerboarding. The more pixels that are checkerboarded, the worse the user experience. The longer the time spent looking at checkerboarded pixels, the worse the user experience.
Attached patch Patch v2Splinter Review
Updated histograms.json as requested; carrying botond's r+ and r? to vladan again. Let me know if my comment above didn't answer your concerns, or if you have other questions.

Also, if there's a way to test the histogram before landing please let me know. I added a printf locally to make sure the numbers being accumulated were sane but I don't know if there's anything beyond that I should test.
Attachment #8684938 - Attachment is obsolete: true
Attachment #8686129 - Flags: review?(vladan.bugzilla)
I think I understand, but a couple of remaining questions..

> > wouldn't you want to report checkerboard duration in a separate histogram?
> 
> I don't see the value in doing that, can you expand on why that would be
> useful? And technically I'm not sure how to define "checkerboard duration" -
> what if there's some pixels that are checkerboarded on one composite and
> some other pixels that are checkerboarded on the next composite. Would you
> lump those into the same "duration"? 

I think yes. Basically a measure of the duration of a period of uninterrupted checkerboarding. I think this would be a pretty relatable measurement even though it wouldn't reflect the size of the checkerboarded area or whether the checkerboarding was visible to the user.

> No, my intention was to produce a value that correlated with user experience
> of checkerboarding. The more pixels that are checkerboarded, the worse the
> user experience. The longer the time spent looking at checkerboarded pixels,
> the worse the user experience.

Yup, I got that, but there is an implicit weighting in your approach (between duration & pixels). e.g. effectively you're stating that checkerboarding a 1920px * 35px region for half a second (1920*35*500=34M) looks as bad as checkboarding a 1920 x 1080 pixel area for 1 frame (1920*1080*16=33M). I'm claiming that such a weighting seems premature and we might be better off reporting the component variables separately.

There's no harm in adding all 3 histograms. Ultimately, it's your call, since you're the first client for these measurements.
Attachment #8686129 - Flags: review?(vladan.bugzilla) → review+
> Also, if there's a way to test the histogram before landing please let me know. I added a printf
> locally to make sure the numbers being accumulated were sane but I don't know if there's anything 
> beyond that I should test.

You can just look in the "Histograms" section of about:telemetry in your local build
Thanks for the tip. The reason I haven't landed this yet is I'm rethinking if this is the best metric to be recording. It'll basically tell us if checkerboarding is regressing or improving, but beyond that it doesn't provide much actionable information, so maybe splitting them up like you suggested would be better. I'll mull it over a bit more.
I discussed this with Vladan today a bit. I'm going to land this patch just to start getting *some* data that may or may not be useful, but now I have a better idea on what we should be collecting. I'll file a follow-up bug with details.
https://hg.mozilla.org/mozilla-central/rev/de4585e9617c
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
So the initial data from this probe doesn't have any surprises, at least. But it's also not really useful or actionable in any way beyond that. Bug 1226826 should give us a lot more information.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: