Closed Bug 1246676 Opened 5 years ago Closed 5 years ago

Spike in checkerboarding telemetry metrics on Feb 05 build

Component: Core :: Panning and Zooming
Type: defect, Priority: Not set, Severity: normal
Status: RESOLVED FIXED
Reporter: kats, Assignee: kats
Attachments: 1 file

The telemetry measures for checkerboarding (in particular the severity and duration metrics) have gone up starting with the Feb 05 build; see for example [1]. The top graph shows both the median and the 95th percentile going up. The bottom graph shows that the number of submissions has stayed roughly constant, so the increase is not just a change in sample size. The spike happened across all desktop platforms. This puts the regression range at [2], and I strongly suspect that the displayport expiration code (bug 990916) caused it. The other candidate is bug 1236046, but that was supposed to improve checkerboarding, not make it worse.

To confirm that the displayport expiry is responsible, I think we should set apz.displayport_expiry_ms to 0 and see whether the spike goes away. Botond also suggested increasing the timeout to a larger value. We can do both at the same time as an A/B/C test by using different platforms: for example, set the pref to 0 on Windows, increase it on OS X, and leave it unchanged on Linux. Future telemetry will give us more information, and we can use that to decide on the best setting.
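A minimal sketch of the proposed A/B/C assignment, assuming the expiry check simply compares how long a displayport has gone without scrolling against the pref value. The function `displayport_should_expire` and the idle-time comparison are hypothetical illustrations, not the actual APZ code. The 15000 ms control value and the 30000 ms OS X value are the 15 s and 30 s timeouts mentioned later in this bug, and a value of 0 disables expiry, as on Windows.

```python
# Hypothetical sketch of the A/B/C assignment for apz.displayport_expiry_ms.
# The values are the ones discussed in this bug; the expiry rule below is an
# assumed simplification, not the real APZ implementation.
EXPERIMENT_EXPIRY_MS = {
    "windows": 0,      # A: expiry disabled entirely
    "osx": 30000,      # B: timeout increased to 30 s
    "linux": 15000,    # C: control, existing 15 s timeout
}


def displayport_should_expire(expiry_ms: int, idle_ms: int) -> bool:
    """Return True if a displayport that has not been scrolled for idle_ms
    milliseconds should be dropped under the given expiry setting.

    An expiry_ms of 0 disables expiry, which is what the Windows arm tests.
    """
    if expiry_ms == 0:
        return False
    return idle_ms >= expiry_ms


if __name__ == "__main__":
    for platform, expiry in EXPERIMENT_EXPIRY_MS.items():
        # After 20 s without scrolling, only the Linux control drops its displayport.
        print(platform, displayport_should_expire(expiry, idle_ms=20000))
```

Splitting the arms by platform means each arm still reports through the same telemetry probes, so the three curves can be compared directly in the evolution dashboard.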

[1] https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=mean!95th-percentile&cumulative=0&end_date=2016-02-06&keys=!__none__!__none__&max_channel_version=nightly%252F47&measure=CHECKERBOARD_SEVERITY&min_channel_version=nightly%252F47&os=Linux&processType=false&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-02-05&trim=1&use_submission_date=0
[2] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=03297f8c28a08d2b39a252c7b368524d9e69da69&tochange=1dbe350b57b17ec1ce2887441b79c6f51b429378
Oh, the preprocessor doesn't like having comments on the same line as #if. Fixing...
Comment on attachment 8717004
MozReview Request: Bug 1246676 - Adjust the displayport expiry timeout on different platforms to observe the effect on checkerboarding. r?botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34007/diff/1-2/
Assignee: nobody → bugmail.mozilla
Attachment #8717004 - Flags: review?(botond) → review+
Comment on attachment 8717004
MozReview Request: Bug 1246676 - Adjust the displayport expiry timeout on different platforms to observe the effect on checkerboarding. r?botond

https://reviewboard.mozilla.org/r/34007/#review30631
I don't really understand the telemetry results. On Windows, the spike disappears in the Feb 10 build, which matches what I did in the patch (i.e. setting the pref to 0 fixed the spike). On Linux, however, the spike disappears in the Feb 11 build, even though the pref wasn't changed there.
Ah! The Feb 11 pushlog [1] includes bug 1245925, which might explain this behavior. That is, the original spike was caused by expiring the displayport on root scrollframes. On Windows, the Feb 10 build had expiry disabled entirely, so the spike went away. On OS X, the Feb 10 build had the expiry extended to 30s, which reduced the spike but didn't remove it entirely. On Linux, the Feb 10 build still had the spike. In the Feb 11 build the spike disappeared on both OS X and Linux because of the root scrollframe change in bug 1245925.

So we should be able to restore the original 15s timeout and the spike should stay gone.
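The reasoning above can be written as a small hypothetical model that predicts the spike per platform and build date. The dates come from this bug (expiry in the Feb 05 build, per-platform pref values in the Feb 10 build, bug 1245925 in the Feb 11 build). The "none"/"reduced"/"full" levels and all function names are illustrative assumptions that coarsely summarize the telemetry described above; this is not Firefox code.

```python
from datetime import date

# Build dates taken from this bug; the model is a hypothetical simplification
# of the explanation above, not Firefox code.
EXPIRY_LANDED = date(2016, 2, 5)        # displayport expiry (bug 990916) present
PER_PLATFORM_PREFS = date(2016, 2, 10)  # per-platform expiry values present
ROOT_CHANGE = date(2016, 2, 11)         # root scrollframe change (bug 1245925)

DEFAULT_EXPIRY_MS = 15000
EXPERIMENT_EXPIRY_MS = {"windows": 0, "osx": 30000, "linux": 15000}


def expiry_ms_for(platform: str, build: date) -> int:
    """Effective expiry timeout for a nightly build (0 means disabled)."""
    if build < EXPIRY_LANDED:
        return 0  # expiry code not in the build yet; treat as disabled
    if build < PER_PLATFORM_PREFS:
        return DEFAULT_EXPIRY_MS
    return EXPERIMENT_EXPIRY_MS[platform]


def predicted_spike(expiry_ms: int, root_scrollframes_expire: bool) -> str:
    """Coarse prediction of the checkerboarding spike: 'none', 'reduced' or 'full'."""
    if expiry_ms == 0 or not root_scrollframes_expire:
        return "none"
    if expiry_ms > DEFAULT_EXPIRY_MS:
        return "reduced"  # a longer timeout expires fewer root displayports
    return "full"


def predicted_spike_for_build(platform: str, build: date) -> str:
    """Combine the per-build pref value with the root scrollframe change."""
    root_expires = build < ROOT_CHANGE
    return predicted_spike(expiry_ms_for(platform, build), root_expires)


if __name__ == "__main__":
    for day in (4, 5, 10, 11):
        build = date(2016, 2, day)
        row = {p: predicted_spike_for_build(p, build) for p in EXPERIMENT_EXPIRY_MS}
        print(build, row)
```

Under this model, `predicted_spike(15000, root_scrollframes_expire=False)` returns "none", which is the basis for restoring the 15s timeout once root scrollframes no longer have their displayports expired.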

[1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ac39fba33c6daf95b2cda71e588ca18e2eb752ab&tochange=d4d72e7b30da251ad3027e234444251adad5e335
I'm going to call this fixed by bug 1245925.
Status: NEW → RESOLVED
Closed: 5 years ago
Depends on: 1245925
Keywords: leave-open
Resolution: --- → FIXED
(Also, eyeballing the data before and after, it doesn't look like bug 1236046 made much of a difference in the numbers either way).