Spike in checkerboarding telemetry metrics on Feb 05 build

RESOLVED FIXED

Status

Core
Panning and Zooming
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: kats, Assigned: kats)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

The telemetry measures for checkerboarding (in particular the severity and duration metrics) have gone up from the Feb 05 build. See for example [1]. The top graph shows the median going up as well as the 95th percentile. The bottom graph shows that the number of submissions has stayed roughly constant. Note that the spike happened across all desktop platforms. This puts the regression range at [2], and I strongly suspect it was the displayport expiration code (bug 990916) that caused this. The other possibility is bug 1236046 but that was supposed to improve checkerboarding, not make it worse.

I think that to confirm it was the displayport expiry, we should set apz.displayport_expiry_ms to 0 and see if the spike goes away. Botond suggested increasing the timeout to a larger value as well. In fact, we can do both at the same time in the form of an A/B/C test by making the changes on different platforms: e.g. on Windows set the pref to 0, on OS X increase it, and on Linux leave it the same. Future telemetry metrics will give us additional information, and we can use that to figure out the best thing to do here.

[1] https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=mean!95th-percentile&cumulative=0&end_date=2016-02-06&keys=!__none__!__none__&max_channel_version=nightly%252F47&measure=CHECKERBOARD_SEVERITY&min_channel_version=nightly%252F47&os=Linux&processType=false&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-02-05&trim=1&use_submission_date=0
[2] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=03297f8c28a08d2b39a252c7b368524d9e69da69&tochange=1dbe350b57b17ec1ce2887441b79c6f51b429378
Created attachment 8717004 [details]
MozReview Request: Bug 1246676 - Adjust the displayport expiry timeout on different platforms to observe the effect on checkerboarding. r?botond

Review commit: https://reviewboard.mozilla.org/r/34007/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34007/
Attachment #8717004 - Flags: review?(botond)
Oh, the preprocessor doesn't like having comments on the same line as #if. Fixing...
Comment on attachment 8717004 [details]
MozReview Request: Bug 1246676 - Adjust the displayport expiry timeout on different platforms to observe the effect on checkerboarding. r?botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34007/diff/1-2/
Assignee: nobody → bugmail.mozilla

Attachment #8717004 - Flags: review?(botond) → review+
Comment on attachment 8717004 [details]
MozReview Request: Bug 1246676 - Adjust the displayport expiry timeout on different platforms to observe the effect on checkerboarding. r?botond

https://reviewboard.mozilla.org/r/34007/#review30631
Keywords: leave-open
Blocks: 1246997
I don't really understand the results in telemetry. On Windows, the spike disappears in the Feb 10 build, which matches what I did in the patch (i.e. changing the pref to 0 fixed the spike). However, on Linux the spike disappears in the Feb 11 build, even though the pref wasn't touched there.
Ah! In the Feb 11 pushlog [1] there is bug 1245925, which might explain this behavior. That is, the original spike was caused by expiring the displayport on root scrollframes. On Windows, the Feb 10 build had expiry disabled entirely, so the spike went away. On OS X the Feb 10 build had expiry extended to 30s, which reduced the spike but didn't remove it entirely. On Linux the Feb 10 build still had the spike. On Feb 11 the spike disappeared on both OS X and Linux because of the root scrollframe change.

So that means we should be able to restore the timeout of 15s and the spike should remain gone.

[1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ac39fba33c6daf95b2cda71e588ca18e2eb752ab&tochange=d4d72e7b30da251ad3027e234444251adad5e335
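
For reference, the reasoning above can be written down as a small truth table. This is only an illustrative sketch of the hypothesis, not code from the tree: the function names, the "gone"/"reduced"/"present" labels, and the rule that a timeout longer than the default only reduces the spike are assumptions made for the example. The per-platform values (0 on Windows, 30s on OS X, 15s on Linux) come from the comments above.

```python
# Hypothetical model of the explanation above; not code from the Firefox tree.
DEFAULT_EXPIRY_MS = 15_000  # the 15s timeout mentioned above

# Value of apz.displayport_expiry_ms per platform in the Feb 10 build,
# as described in the comment above.
FEB10_EXPIRY_MS = {"Windows": 0, "OS X": 30_000, "Linux": DEFAULT_EXPIRY_MS}


def expected_spike(expiry_ms: int, root_fix_landed: bool) -> str:
    """Predict the checkerboarding spike under the root-scrollframe hypothesis.

    root_fix_landed stands for "the build contains the change from bug 1245925".
    """
    if expiry_ms == 0 or root_fix_landed:
        # Either expiry is disabled entirely, or the root scrollframe
        # change is present, so the spike should not appear.
        return "gone"
    if expiry_ms > DEFAULT_EXPIRY_MS:
        # Assumption: a longer timeout lowers the spike without removing it.
        return "reduced"
    return "present"


def predictions(root_fix_landed: bool) -> dict:
    """Expected outcome per platform, using the Feb 10 pref values."""
    return {
        platform: expected_spike(ms, root_fix_landed)
        for platform, ms in FEB10_EXPIRY_MS.items()
    }
```

Under this model, `predictions(False)` corresponds to the Feb 10 builds and `predictions(True)` to the Feb 11 builds, and `expected_spike(DEFAULT_EXPIRY_MS, True)` reflects the expectation that restoring the 15s timeout keeps the spike gone.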
Attachment #8717004 - Flags: checkin-
I'm going to call this fixed by bug 1245925.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Depends on: 1245925
Keywords: leave-open
Resolution: --- → FIXED
(Also, eyeballing the data before and after, it doesn't look like bug 1236046 made much of a difference in the numbers either way).
Depends on: 1250924