While investigating bug 1713360, a different CHECKERBOARDING_SEVERITY regression, we've learned some important things about the CHECKERBOARDING_SEVERITY metric itself. From talking with botond and hiro, their understanding of CHECKERBOARDING_SEVERITY is the following. Every time we experience checkerboarding, we calculate the number of ms and the number of pixels affected, multiply those, and take the square root. When the checkerboard event is over, we submit a CHECKERBOARDING_SEVERITY event to telemetry.
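As a rough illustration of that description (a sketch only, not the actual Gecko code), the value could be computed as below. The function name and the sample numbers are hypothetical, and this assumes one (ms, pixels) pair per event.

```python
import math


def checkerboard_severity(duration_ms: float, pixels: float) -> float:
    """Hypothetical helper: sqrt(ms * pixels), as described in the comment above."""
    return math.sqrt(duration_ms * pixels)


# Made-up example: 100 ms of checkerboarding affecting 250,000 pixels.
print(checkerboard_severity(100, 250_000))  # 5000.0
```

Because of the square root, doubling either the duration or the affected area raises the value by a factor of about 1.41, not 2.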
So just looking at the median (or mean, etc.) is misleading. At the very least you also need to look at the number of events submitted, since an increase in low-severity events can bring the median/mean down even though we are actually doing worse because there are more events. (Ideally you would look at the distribution too.)
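A toy example of that effect, with entirely made-up severity values (not real telemetry): adding a batch of low-severity events to an unchanged set of events lowers the median while the event count and the summed severity both go up.

```python
from statistics import median

# Made-up per-event severity values, for illustration only.
before = [5000] * 6 + [3000] * 4   # 10 events
after = before + [800] * 20        # the same 10 events plus 20 new low-severity ones


def summarize(events):
    """Return the count, median and summed severity of a list of events."""
    return {"count": len(events), "median": median(events), "total": sum(events)}


print(summarize(before))
print(summarize(after))
```

Here the median falls from 5000 to 800 even though nothing got better: every original event is still present, and there are 20 additional ones.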
So with this knowledge, let's take another look at the graph in comment 15.
We start May 2020 at our highest position, a median of about 5k. Then bug 1627012 (which caused us to set a full displayport on any scroll frame that APZ knows about) lands on 2020-05-28 and we see a sharp drop in the median to about 3.7k, while the number of CHECKERBOARDING_SEVERITY (CS) events stays stable at around 250k. This makes sense: setting a full displayport should decrease checkerboard severity.
Then on 2020-10-29 bug 1669861 lands (it should just be a cleanup, so we don't know why it had this effect) and we get a sharp drop in the median value to about 800. The number of CS events jumps to around 800k.
Then on 2021-01-24 bug 1682919 lands, which fixes a regression from bug 1669861, and we get a sharp rise in the median value to about 1.8k, partially but not fully reverting the corresponding drop from bug 1669861. The number of CS events goes down to about 500k.
Then on 2021-02-02 bug 1687927 lands, which fixes the full displayport setting regression from bug 1627012, and we see a sharp rise in the median value back to about 5k. The number of CS events stays stable. This makes sense: no longer setting full displayports should increase checkerboard severity.
So bug 1669861 seems to have introduced a large number of low-severity checkerboarding events. Bug 1682919 removed some of these extra CS events, but not all of them.