CHECKERBOARD_DURATION reports times longer than 10 minutes
Categories
(Core :: Panning and Zooming, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox76 | --- | fixed |
People
(Reporter: Dexter, Assigned: kats)
References
Details
Attachments
(6 files)
While looking at our Glean SDK error stream, we found that the CHECKERBOARD_DURATION histogram, being reported through GeckoView->Glean in Fenix as gfx.checkerboard.duration
, is reporting loading times longer than 10 minutes.
This could be either a bug in the GeckoView code measuring that on Android, or we're really measuring > 10 minutes time.
Assignee | ||
Comment 1•5 years ago
|
||
I'm not sure how to operate that SQL query, but questions I have:
- What percentage of values are over the 10 minutes threshold?
- What percentage of values are over the 1 minute threshold? Is it comparable to the numbers we get on desktop which shows a tail of ~1.63% data points where it's greater than ~1 minute?
- Can you provide more details on the distribution of these big values? I'm not sure if they are getting grouped into buckets like they do with desktop telemetry or not, but either way getting a bit more detail on the distribution might give clues as to what's going on.
Reporter | ||
Comment 2•5 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
I'm not sure how to operate that SQL query, but questions I have:
- What percentage of values are over the 10 minutes threshold?
This was reported 546 times and affects 6% of the clients.
- What percentage of values are over the 1 minute threshold? Is it comparable to the numbers we get on desktop which shows a tail of ~1.63% data points where it's greater than ~1 minute?
- Can you provide more details on the distribution of these big values? I'm not sure if they are getting grouped into buckets like they do with desktop telemetry or not, but either way getting a bit more detail on the distribution might give clues as to what's going on.
I'm afraid I don't have that data at hand. I put together a sample query with CDF/PDF to help you with your investigation here.
Please let us know if anything is needed of us!
Assignee | ||
Comment 3•5 years ago
|
||
I spent some time looking at this and couldn't make much sense of it. about:telemetry also doesn't work on Fenix, so I can't try and catch the problem in action. For now this is going into the P3 bucket as it is unactionable.
Reporter | ||
Comment 4•5 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
I spent some time looking at this and couldn't make much sense of it. about:telemetry also doesn't work on Fenix, so I can't try and catch the problem in action. For now this is going into the P3 bucket as it is unactionable.
Fenix supports the Glean SDK facilities for debugging telemetry, including dumping to logcat and the debug view.
Comment 5•5 years ago
|
||
botond: Should this be retriaged in light of the latest comment, that it is actually visible and actionable, using Glean's debug facilities?
Comment 6•5 years ago
|
||
Kats, does comment 4 unblock further investigation of this?
Assignee | ||
Comment 7•5 years ago
|
||
Sure, I can look into it further. Not convinced that it needs upgrading from P3 yet though, unless we can demonstrate that it's actually a user-visible product bug.
Comment 8•5 years ago
|
||
Sounds good, thanks!
Assignee | ||
Comment 9•5 years ago
|
||
So I discovered while investigating bug 1617427 that a scenario similar to the one in that bug gets erroneously detected as a checkerboard. This will happen on pretty much any page where there's a really big nested scrollframe, such as github code listings. The check for whether or not it's checkerboarding doesn't account for ancestor scroll containers clipping the "checkerboarded" portion out of view. The test pages I've constructed so far all produce "infinite" checkerboarding events - they never end, so they never submit to telemetry. But I could probably construct a case where the checkerboard event can terminate based on your scroll position, and in that case I should be able to create "checkerboard" events of arbitrary length. I'll try to do that first, then see what's the best way to fix it.
Note that this may not actually be the thing that's generating the data we're seeing in glean, but it's certainly a possibility. After fixing this one we'll have to see if the data reported in glean changes or not, and if not, look for additional defects in the code that might be causing it.
Assignee | ||
Comment 10•5 years ago
|
||
I was able to construct a page that reproduces the problem (when loaded in GVE in the android emulator). The specific values in the page may vary per platform and depend on screen size and so on, but it at least confirms my theory that such a case is possible and probably not too hard to hit in the code. It would certainly explain abnormally long checkerboard times reported in telemetry.
I should be able to fix it relatively easily using the patches from bug 1617427, but there's a potential impact on performance since it would be doing a relatively expensive check per composite. I'll take a closer look to see if this can be optimized further.
Assignee | ||
Comment 11•5 years ago
|
||
I had written some patches but I ran afoul of the lock ordering in my design. I'll have to rethink it a bit.
Reporter | ||
Comment 12•5 years ago
|
||
I'm a curious one, so sorry about me jumping in :-) I'm trying to understand comment 9 and comment 10 a bit better: do they suggest that there was a bug in some gfx feature, a bug in the way things were measured or both? (again, for my own curiosity)
Assignee | ||
Comment 13•5 years ago
|
||
The case I found was a bug in the gfx measurement code. So it's not a user-visible bug but we are measuring checkerboarding incorrectly in this case.
Assignee | ||
Comment 14•5 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=94694ff80318c71ca780d78dd4d3558d94eee5cc has the new version which should be better.
Assignee | ||
Comment 15•5 years ago
|
||
The operations of recording checkerboard events and advancing the animations
across all APZs are conceptually linked. This patch extracts a helper that
does these operations in a single function, that is then called from both
the WR and non-WR codepaths. This will allow us to expand this code a bit
without having to duplicate it in multiple places.
Assignee | ||
Comment 16•5 years ago
|
||
Depends on D67374
Assignee | ||
Comment 17•5 years ago
|
||
The goal of this patch is to enhance GetCheckerboardMagnitude so that it
correctly accounts for clipping by ancestor APZC instances, the same way
IsCurrentlyCheckerboarding does. However, IsCurrentlyCheckerboarding uses
the mParent parent pointer when computing the clipped composition bounds,
which requires the tree lock. GetCheckerboardMagnitude is not allowed to
acquire the tree lock as it runs on the sampler thread.
This patch therefore takes another approach: the APZCTreeManager code
precomputes the clipped composition bounds for each APZC (which it can do
relatively efficiently and holding just the map lock), and then provides
that rect to the individual APZCs when they invoke GetCheckerboardMagnitude.
The additional changes to GetCheckerboardMagnitude are copied from the
IsCurrentlyCheckerboarding codepath, and the next patch will remove the latter
function entirely since GetCheckerboardMagnitude will do everything that is
needed for IsCurrentlyCheckerboarding.
Depends on D67375
Assignee | ||
Comment 18•5 years ago
|
||
IsCurrentlyCheckerboarding can be removed now, as it does the same underlying
computation as "GetCheckerboardMagnitude(...) > 0". The only difference is that
the caller now needs to compute the clipped composition bounds, which is
easy enough to do at the one call site.
Depends on D67376
Assignee | ||
Comment 19•5 years ago
|
||
Depends on D67377
Assignee | ||
Comment 20•5 years ago
|
||
Updated to address the review comments: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=9f0521fe7f903bcb9b6e7cab72859457874d7761
Updated•5 years ago
|
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c41e90684dc9
https://hg.mozilla.org/mozilla-central/rev/10df53f9927f
https://hg.mozilla.org/mozilla-central/rev/014e0b537c11
https://hg.mozilla.org/mozilla-central/rev/1f0cdf3fd531
https://hg.mozilla.org/mozilla-central/rev/d1990005fc98
Updated•4 years ago
|
Description
•