CONTENT_FULL_PAINT_TIME is missing from the dashboard

RESOLVED FIXED in Firefox 67

Status


defect
RESOLVED FIXED
3 months ago
3 months ago

People

(Reporter: jrmuizel, Assigned: kats)

Tracking

(Regression, {regression})

unspecified
mozilla68
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox66 wontfix, firefox67 fixed, firefox68 fixed)

Details

Attachments

(1 attachment)

Not sure why this happened.

The TMO evolution dashboard also seems to suggest that the probe stopped being sent after the Feb 25 nightly.

Interesting that the effect is only on Windows. The probe submission numbers seem fine on Linux and macOS. I can try and investigate a bit.

Assignee: nobody → kats

I did a local build on Windows 10 and it seems to be hitting the relevant accumulate call. :chutten, any ideas why this probe would have just stopped showing up? The TMO link in comment 1 shows no data for 67 nightly after Feb 25, and then it comes back at super low volume in nightly 68.

Flags: needinfo?(chutten)

If you turn "Sanitize Data" off, it shows a bit more of the picture: https://mzl.la/2Zc19rY

Starting with Feb 26th's nightly, submission volumes go from 80-110k down to sub-1k, which would usually suggest that the probe no longer submits. If it had happened on a version edge, it'd be consistent with the probe expiring.
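For anyone wanting to spot this kind of cliff without eyeballing the dashboard, here is a rough sketch. The function name, the 10% threshold, and the sample counts are all made up for illustration (the counts are only shaped like the numbers quoted above); this is not how TMO itself computes anything.

```python
from typing import List, Optional, Tuple


def first_volume_cliff(daily_counts: List[Tuple[str, int]],
                       ratio: float = 0.1) -> Optional[str]:
    """Return the first day whose count is below `ratio` times the previous day's count.

    `daily_counts` is a list of (label, submission_count) pairs in date order.
    Returns None if no such drop is found.
    """
    for (_, prev), (day, count) in zip(daily_counts, daily_counts[1:]):
        if prev > 0 and count < prev * ratio:
            return day
    return None


# Illustrative numbers only; not real telemetry.
counts = [("Feb 24", 95000), ("Feb 25", 88000), ("Feb 26", 900), ("Feb 27", 850)]
print(first_volume_cliff(counts))  # prints "Feb 26"
```

A drop like this only says the volume fell off; it doesn't distinguish between an expired probe, a process that stopped reporting, or something else.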

Most of the samples come from the gpu process. Might something have happened to turn that off?

Flags: needinfo?(chutten)

(In reply to Chris H-C :chutten from comment #4)

> Most of the samples come from the gpu process. Might something have happened to turn that off?

Nothing that I can think of. I looked at other related probes (e.g. WR_SCENEBUILD_TIME) and the submission rate for those seems fine. Those should also have roughly the same submission distribution as CONTENT_FULL_PAINT_TIME.

I've thrown the post-anomaly Nightly pings into a Databricks notebook to poke around a bit: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/101401/command/101402

About 20% of pings have a parent-process CONTENT_FULL_PAINT_TIME whereas only 96 have one in the gpu process. This is compared to 27% having a gpu-process WR_SCENEBUILD_TIME.

So there's something wrong, but it's not in the aggregator.
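The per-process comparison above can be sketched roughly as follows. This is not the notebook's code: the ping layout here (a plain dict mapping a process name to the histograms it reported) is a simplified, hypothetical stand-in for the real ping format, and the sample data is invented.

```python
from collections import Counter
from typing import Dict, Iterable


def share_with_probe(pings: Iterable[Dict[str, dict]], probe: str) -> Dict[str, float]:
    """For each process, return the fraction of pings in which it reported `probe`."""
    pings = list(pings)
    seen = Counter()
    for ping in pings:
        for process, histograms in ping.items():
            if probe in histograms:
                seen[process] += 1
    total = len(pings)
    return {process: n / total for process, n in seen.items()} if total else {}


# Invented sample: five pings, histogram payloads left empty for brevity.
sample = [
    {"parent": {"CONTENT_FULL_PAINT_TIME": {}}, "gpu": {"WR_SCENEBUILD_TIME": {}}},
    {"parent": {}, "gpu": {"WR_SCENEBUILD_TIME": {}}},
    {"parent": {}, "gpu": {}},
    {"parent": {}, "gpu": {}},
    {"parent": {}, "gpu": {}},
]
print(share_with_probe(sample, "CONTENT_FULL_PAINT_TIME"))
print(share_with_probe(sample, "WR_SCENEBUILD_TIME"))
```

If one probe shows up from the gpu process at a healthy rate and a sibling probe barely shows up at all, the gap is on the recording side rather than in aggregation.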

Oh. Well lookee here: it's not recordable in the gpu process: https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#13097

headdesk
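To make the "not recordable in the gpu process" point concrete, here is a minimal sketch of a per-process gate. The `record_in_processes` field is the one in Histograms.json, but the definition below is a trimmed-down, hypothetical stand-in (not copied from the real entry), and `can_record` is an illustrative helper, not the actual Telemetry implementation.

```python
import copy
from typing import Dict


def can_record(definitions: Dict[str, dict], probe: str, process: str) -> bool:
    """Return True if `probe` lists `process` in its record_in_processes field."""
    allowed = definitions.get(probe, {}).get("record_in_processes", [])
    return process in allowed


# Hypothetical definition without the gpu process listed.
before = {
    "CONTENT_FULL_PAINT_TIME": {"record_in_processes": ["main", "content"]},
}

# Same definition with "gpu" added to the allowed processes.
after = copy.deepcopy(before)
after["CONTENT_FULL_PAINT_TIME"]["record_in_processes"].append("gpu")

print(can_record(before, "CONTENT_FULL_PAINT_TIME", "gpu"))  # False
print(can_record(after, "CONTENT_FULL_PAINT_TIME", "gpu"))   # True
```

With a gate like this in place, accumulations from a process that isn't listed are dropped, however often the accumulate call is hit.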

I can fix that bit. But then why are we getting any at all from the GPU process?

My best guess is that there's some sort of timing weirdness that permits a marginal number of accumulations some of the time... see also bug 1544028 (where I will refer back to this as potentially-relevant info)

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5bf26403f3a9
Record CONTENT_FULL_PAINT_TIME in the GPU process too. r=chutten

Comment on attachment 9058395 [details]
Bug 1544039 - Record CONTENT_FULL_PAINT_TIME in the GPU process too. r?chutten

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1505858
  • User impact if declined: We don't get complete data for this probe, which can impact our decisions
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Trivial non-code change
  • String changes made/needed:
Attachment #9058395 - Flags: approval-mozilla-beta?
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
No longer blocks: 1505858
Has Regression Range: --- → yes
Has STR: --- → irrelevant
Regressed by: 1505858

Comment on attachment 9058395 [details]
Bug 1544039 - Record CONTENT_FULL_PAINT_TIME in the GPU process too. r?chutten

Restore WebRender telemetry probe, low risk, uplift approved for 67 beta 12, thanks.

Attachment #9058395 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

For the record, it looks like it was bug 1530361 that caused the probe to (correctly) stop getting recorded in the GPU process. Prior to that, the code would record it in the GPU process even though, per the probe definition, it wasn't supposed to.

(Edit: the actual pushlog range based on buildids is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=49b2a4c8be018f92d050512f9646cb3004ec1bec&tochange=110ea2a7c3d4f34b5079c195f7ea57966748e6da but that change is the obvious culprit in that range)

Regressed by: 1530361
Flags: qe-verify-