Closed Bug 1454668 Opened 2 years ago Closed 2 years ago

CHECKERBOARD_SEVERITY (et al) regression around 2018-03-22

Categories

(Core :: Panning and Zooming, defect)

61 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + fixed

People

(Reporter: chutten, Assigned: florian)

References

Details

(Keywords: regression)

Something happened around 2018-03-22 that made CHECKERBOARD_SEVERITY go from a unimodal graph looking like this: https://mzl.la/2qG50x1 

To a bimodal graph looking like this: https://mzl.la/2qDSsGB

Once a noise-inducing regression (bug 1447193) was fixed on 2018-04-05, it now looks like this: https://mzl.la/2qEBjfO

This happened simultaneously with an increase in the number of pings containing CHECKERBOARD_SEVERITY (submissions) as can be seen on the second plot here: https://mzl.la/2JsBTGh

This was not reported as a regression by the automated histogram regression detector (cerberus). This might have been because the noise-inducing regression made it hard for cerberus to pick it out, or because of cerberus's current unreliability (bug 1450729). I apologize for that.

Next Steps: 
* Get a changelog and start poking around for likely culprits.
See Also: → 1452632
Summary: CHECKERBOARD_SEVERITY (et al) regression around 2018-03 → CHECKERBOARD_SEVERITY (et al) regression around 2018-03-22
Keywords: regression
Version: 49 Branch → 61 Branch
The regression range is anything after 20180321220044 and up to and including 20180322220118, which is this:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=68464218a41a&tochange=8bf380faae74
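
For anyone checking whether a given Nightly falls inside that range, here is a minimal sketch. It assumes build IDs are timestamps in YYYYMMDDHHMMSS form, and the helper names (`parse_build_id`, `in_regression_range`) are made up for illustration. The lower bound is exclusive and the upper bound inclusive, matching "anything after ... and up to and including ...".

```python
from datetime import datetime

# Assumption: a build ID is a timestamp formatted as YYYYMMDDHHMMSS.
BUILD_ID_FORMAT = "%Y%m%d%H%M%S"


def parse_build_id(build_id: str) -> datetime:
    """Turn a build ID string into a datetime (hypothetical helper)."""
    return datetime.strptime(build_id, BUILD_ID_FORMAT)


# The two build IDs quoted in the comment above.
LAST_GOOD = parse_build_id("20180321220044")
FIRST_BAD = parse_build_id("20180322220118")


def in_regression_range(build_id: str) -> bool:
    """True if the build is after the last good build and no later than the first bad one."""
    return LAST_GOOD < parse_build_id(build_id) <= FIRST_BAD


if __name__ == "__main__":
    for candidate in ("20180321220044", "20180322100000", "20180322220118"):
        print(candidate, in_regression_range(candidate))
```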

At first glance the most likely culprit is bug 1447719. Note that it affects Windows and Linux, and sure enough, the second mode in the histograms seems to be restricted to those two platforms.
Blocks: 1447719
Could someone please explain what this is measuring and which changes are likely to affect it? "Opaque measure of the severity of a checkerboard event" in Histograms.json doesn't help me much.
I think the units of severity are "CSS pixels x milliseconds" and it represents the integral under the function of checkerboard size over time.

A less-opaque measure that shows the same effect is CHECKERBOARD_PEAK which is the peak number of CSS pixels affected during a checkerboard event.
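
To make the two measures concrete, here is a rough sketch of how they relate, assuming the "CSS pixels x milliseconds" reading above is right. It is not the APZ implementation: the sample format, the function names, and the trapezoidal approximation are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed sample format: (time in milliseconds, checkerboarded area in CSS pixels).
Sample = Tuple[float, float]


def checkerboard_severity(samples: List[Sample]) -> float:
    """Approximate the area under the size-over-time curve (CSS pixels x ms).

    Uses the trapezoidal rule between consecutive samples; the real code may
    accumulate per-frame values differently.
    """
    total = 0.0
    for (t0, a0), (t1, a1) in zip(samples, samples[1:]):
        total += (a0 + a1) / 2.0 * (t1 - t0)
    return total


def checkerboard_peak(samples: List[Sample]) -> float:
    """Largest checkerboarded area seen during the event (CSS pixels)."""
    return max((area for _, area in samples), default=0.0)


if __name__ == "__main__":
    # Hypothetical event: ramps up to 1000 CSS pixels, holds, then clears.
    event = [(0.0, 0.0), (16.0, 1000.0), (32.0, 1000.0), (48.0, 0.0)]
    print(checkerboard_severity(event))  # 32000.0
    print(checkerboard_peak(event))      # 1000.0
```

A long but small checkerboard and a brief but large one can have the same severity, which is part of why the peak is easier to interpret.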

Before: https://mzl.la/2qEhUvz
After: https://mzl.la/2qEhUvz
The same submissions increase shape in the second plot: https://mzl.la/2JsBTGh
Also interesting are the very specific peaks in CHECKERBOARD_PEAK's "After" graph (https://mzl.la/2qEhUvz), which suggest the checkerboard events are hitting specific sizes. Given the likely culprit, they could correspond to the number of CSS pixels available to a maximized window at display resolutions common among Firefox users[1].

[1]: https://hardware.metrics.mozilla.com/
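
As a rough way to test that guess, the sketch below computes the CSS pixel area of a full screen for a couple of resolutions. The resolutions and scale factor are illustrative assumptions, not figures taken from the hardware report, and `css_pixel_area` is a hypothetical helper. A maximized window's content area is smaller than the full screen (taskbar, browser chrome), so these are upper bounds.

```python
def css_pixel_area(width_px: int, height_px: int, device_pixel_ratio: float = 1.0) -> int:
    """CSS pixel area of a screen, given device pixels and a scale factor."""
    css_width = width_px / device_pixel_ratio
    css_height = height_px / device_pixel_ratio
    return round(css_width * css_height)


if __name__ == "__main__":
    # Illustrative resolutions only; check the hardware report for real shares.
    for width, height in [(1366, 768), (1920, 1080)]:
        print(f"{width}x{height}: {css_pixel_area(width, height)} CSS pixels")
    # Same panel at an assumed 125% scale factor.
    print(css_pixel_area(1920, 1080, device_pixel_ratio=1.25))
```

If the peaks in the histogram sit just below values like these, that would support the whole-window explanation.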
(( mis-copied a url. The "After" graph is https://mzl.la/2qF0AXv ))
What does "checkerboard" mean?
Checkerboarding is when you scroll an area into view that has not been painted yet, and so you just see a blank region that's white or the page's background color. It's called "checkerboarding" for historical reasons (some mobile platforms would display a checkerboard pattern in those areas).

See [1] for some more description.

[1] https://searchfox.org/mozilla-central/source/gfx/doc/AsyncPanZoom.md#checkerboarding
Note that checkerboarding can also happen if we composite really early before content has painted, which I suspect would be the case here.

It's quite possible that in this case the checkerboarding is not really user-perceivable and the regression is something we can just ignore given the other improvements the patch brings.
Also for context https://bugzilla.mozilla.org/show_bug.cgi?id=1243911 is the reason I thought your patch was likely to affect checkerboarding.
See Also: → 1243911
In the range from comment 2, bug 1446264 may also be relevant. IIRC it makes us size the window appropriately earlier during startup, so you may end up with startup checkerboarding for the size of the whole window instead of for the 0x0px initial browser window we used to have.

If it's my patch that affected this, you may see another change from bug 1450293 that I landed yesterday.
I verified that the regression was caused by the browser.startup.blankWindow pref (which was toggled in bug 1447719) and that it introduces a not-really-user-visible checkerboarding instance on browser startup. I verified on Windows Nightly by loading about:checkerboard, restarting the browser, and refreshing the page. With the browser.startup.blankWindow pref on (current m-c) I get a ~30 frame checkerboarding instance listed. With the pref off, the checkerboarding instance doesn't appear. So somehow the blank window is causing the APZ code to detect a half-second (in my case) checkerboard instance.

Given that this regression is not actually user-perceivable I would say we don't need to worry about it and can close the bug.
It will be interesting to see if bug 1450293 makes this go away. It very well might, if we are not using the GPU process and/or APZ for the initial blank window any more.
Indeed, it looks like on April 17th all the checkerboard metrics took a nosedive, presumably from bug 1450293. I'm going to mark this bug fixed although really it's a "don't care about this regression because it's not user-visible".
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Assignee: nobody → florian
Depends on: 1450293
Target Milestone: --- → mozilla61