Closed Bug 1758248 Opened 2 years ago Closed 2 years ago

Document that canRecordExtended/canRecordPrereleaseData will always report `true` on non-content child processes

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox100 --- fixed

People

(Reporter: tnikkel, Assigned: perry.mcmanis)

Details

Attachments

(1 file)

Extended telemetry should not be recorded or sent on the release channel, according to

https://searchfox.org/mozilla-central/rev/1e13dfc1bd87c3747d6712807401c590d0211a46/toolkit/components/telemetry/core/nsITelemetry.idl#331

We init gCanRecordExtended in TelemetryHistogram::InitializeGlobalState

https://searchfox.org/mozilla-central/rev/1e13dfc1bd87c3747d6712807401c590d0211a46/toolkit/components/telemetry/core/TelemetryHistogram.cpp#2426

The only caller of that function passes true for the parent, content, gpu, and network process

https://searchfox.org/mozilla-central/rev/1e13dfc1bd87c3747d6712807401c590d0211a46/toolkit/components/telemetry/core/Telemetry.cpp#1140

Then later TelemetryImpl::SetCanRecordExtended is called which can potentially set canRecordExtended to false. Except the only callers of this that I can find are in js, in

https://searchfox.org/mozilla-central/rev/1e13dfc1bd87c3747d6712807401c590d0211a46/toolkit/components/telemetry/app/TelemetryControllerBase.jsm#111

and another caller in that file. But I don't think there is any js executed in the network or gpu process, and indeed I do not see TelemetryImpl::SetCanRecordExtended ever called in the gpu or network process.

Using about:telemetry I checked to see if a release build would submit checkerboarding histogram telemetry and it did indeed appear to show up in previously submitted pings there. Checkerboarding telemetry should be extended only see here

https://searchfox.org/mozilla-central/rev/1e13dfc1bd87c3747d6712807401c590d0211a46/gfx/layers/apz/src/AsyncPanZoomController.cpp#4821

And on the other end we do seem to receive checkerboard telemetry from the gpu process on release

https://glam.telemetry.mozilla.org/firefox/probe/checkerboard_severity/explore?aggregationLevel=version&channel=release&process=gpu&ref=95

I'm not generally familiar with the telemetry code, so someone should definitely verify this.

Can you make sure this gets to the right people/person? Thanks.

Flags: needinfo?(chutten)

That'd be me.

Congratulations (and my condolences) for untangling the mess of how canRecordExtended works. It's not the easiest part of the codebase to play in.

However, its value on non-parent processes shouldn't matter. All child-process actions on Telemetry pieces are sent over to the parent process where they're checked and conditionally applied. Each of the codepaths starting in TelemetryIPC check CanRecordDataset which will use the probe's definition to tell us whether it can be recorded in this build or not and compare it against the parent-process' knowledge of whether we're recording base or extended.

So, how does this corroborate with your observed behaviour of CHECKERBOARD_SEVERITY, then? Well, according to the definition in Histograms.json, CHECKERBOARD_SEVERITY should be collected in release (see bug 1539309). The forTelemetry code in APZ dates from 2016 when Telemetry (and Firefox as a whole) only had at most two processes, and should probably have been adjusted in bug 1539309. (+Cc jrmuizel for visibility: maybe you'll want a bug to fix that up or tear it out.)

So I think we're clear here and we can un-mococonf this bug. The flags are indeed incorrect in any process that isn't running TelemetryControllerBase.jsm's initialization routines, but that's a documentation and API ergonomics issue because they don't actually need to be correct in those processes to be dealt with correctly before storing and reporting data.

:tnikkel, what do you think? If we un-mococonf this bug, I'll use this to quickly get the APIs' docs updated.

Assignee: nobody → chutten
Severity: -- → S2
Status: NEW → ASSIGNED
Flags: needinfo?(chutten) → needinfo?(tnikkel)
Priority: -- → P1

Phew! Thanks for the explanation! That helps explain some of the other weirdness I was seeing.

Yes, you can un-mococonf this bug. I will handle filling a follow up and fixing up the checkerboarding telemetry.

Flags: needinfo?(tnikkel)

(In reply to Timothy Nikkel (:tnikkel) from comment #3)

I will handle filling a follow up and fixing up the checkerboarding telemetry.

That bug is bug 1758670.

This is now a bug to fix API documentation : )

I should be able to get to this promptly.

Group: mozilla-employee-confidential
Summary: gpu and network processes appear to record and send extended telemetry on release channel → Document that canRecordExtended/canRecordPrereleaseData will always report `true` on non-content child processes

Passing to Perry.

You'll be editing nsITelemetry.idl to update the comments on the attributes canRecordExtended and canRecordPrereleaseData to note that they will always return true on non-content child processes (or, in reverse, that their values should only be trusted on the parent process).

Background: nsITelemetry.idl is the backbone of the Telemetry singleton available in C++ and JS in Firefox Desktop. It has the public API for recording data, and some other stuff added over the years.

Assignee: chutten → pmcmanis

(In reply to Chris H-C :chutten from comment #2)

The flags are indeed incorrect in any process that isn't running TelemetryControllerBase.jsm's initialization routines, but that's a documentation and API ergonomics issue because they don't actually need to be correct in those processes to be dealt with correctly before storing and reporting data.

Is it perhaps also a potential performance issue, if those processes do unnecessary work gated on Telemetry::CanRecordExtended() to collect data that then gets discarded?

(In reply to Botond Ballo [:botond] from comment #7)

Is it perhaps also a potential performance issue, if those processes do unnecessary work gated on Telemetry::CanRecordExtended() to collect data that then gets discarded?

True. I'm unclear on the cost/benefit of designing and implementing a generic cross-process solution for canRecord* at this time, so I'll leave it to consumers to reach out when they discover they need it and continue work on Glean (which doesn't have this lever).

Clarifying expected behavior of telemetry's canRecordExtended and canRecordPreleaseData in child processes in the documentation.

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/705b93332795
update documentation of canRecordExtended and canRecordPreleaseData. r=chutten DONTBUILD
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: