Document that canRecordExtended/canRecordPrereleaseData will always report `true` on non-content child processes
Categories
(Toolkit :: Telemetry, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox100 | --- | fixed |
People
(Reporter: tnikkel, Assigned: perry.mcmanis)
Details
Attachments
(1 file)
Extended telemetry should not be recorded or sent on the release channel, according to
We init gCanRecordExtended in TelemetryHistogram::InitializeGlobalState
The only caller of that function passes true for the parent, content, gpu, and network process
Then later TelemetryImpl::SetCanRecordExtended is called which can potentially set canRecordExtended to false. Except the only callers of this that I can find are in js, in
and another caller in that file. But I don't think there is any js executed in the network or gpu process, and indeed I do not see TelemetryImpl::SetCanRecordExtended ever called in the gpu or network process.
Using about:telemetry I checked to see if a release build would submit checkerboarding histogram telemetry and it did indeed appear to show up in previously submitted pings there. Checkerboarding telemetry should be extended only see here
And on the other end we do seem to receive checkerboard telemetry from the gpu process on release
I'm not generally familiar with the telemetry code, so someone should definitely verify this.
Reporter | ||
Comment 1•2 years ago
|
||
Can you make sure this gets to the right people/person? Thanks.
Comment 2•2 years ago
|
||
That'd be me.
Congratulations (and my condolences) for untangling the mess of how canRecordExtended
works. It's not the easiest part of the codebase to play in.
However, its value on non-parent processes shouldn't matter. All child-process actions on Telemetry pieces are sent over to the parent process where they're checked and conditionally applied. Each of the codepaths starting in TelemetryIPC
check CanRecordDataset
which will use the probe's definition to tell us whether it can be recorded in this build or not and compare it against the parent-process' knowledge of whether we're recording base or extended.
So, how does this corroborate with your observed behaviour of CHECKERBOARD_SEVERITY
, then? Well, according to the definition in Histograms.json, CHECKERBOARD_SEVERITY
should be collected in release (see bug 1539309). The forTelemetry
code in APZ dates from 2016 when Telemetry (and Firefox as a whole) only had at most two processes, and should probably have been adjusted in bug 1539309. (+Cc jrmuizel for visibility: maybe you'll want a bug to fix that up or tear it out.)
So I think we're clear here and we can un-mococonf this bug. The flags are indeed incorrect in any process that isn't running TelemetryControllerBase.jsm
's initialization routines, but that's a documentation and API ergonomics issue because they don't actually need to be correct in those processes to be dealt with correctly before storing and reporting data.
:tnikkel, what do you think? If we un-mococonf this bug, I'll use this to quickly get the APIs' docs updated.
Reporter | ||
Comment 3•2 years ago
|
||
Phew! Thanks for the explanation! That helps explain some of the other weirdness I was seeing.
Yes, you can un-mococonf this bug. I will handle filling a follow up and fixing up the checkerboarding telemetry.
Reporter | ||
Comment 4•2 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #3)
I will handle filling a follow up and fixing up the checkerboarding telemetry.
That bug is bug 1758670.
Comment 5•2 years ago
|
||
This is now a bug to fix API documentation : )
I should be able to get to this promptly.
Comment 6•2 years ago
|
||
Passing to Perry.
You'll be editing nsITelemetry.idl
to update the comments on the attributes canRecordExtended
and canRecordPrereleaseData
to note that they will always return true
on non-content child processes (or, in reverse, that their values should only be trusted on the parent process).
Background: nsITelemetry.idl
is the backbone of the Telemetry singleton available in C++ and JS in Firefox Desktop. It has the public API for recording data, and some other stuff added over the years.
Comment 7•2 years ago
|
||
(In reply to Chris H-C :chutten from comment #2)
The flags are indeed incorrect in any process that isn't running
TelemetryControllerBase.jsm
's initialization routines, but that's a documentation and API ergonomics issue because they don't actually need to be correct in those processes to be dealt with correctly before storing and reporting data.
Is it perhaps also a potential performance issue, if those processes do unnecessary work gated on Telemetry::CanRecordExtended()
to collect data that then gets discarded?
Comment 8•2 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #7)
Is it perhaps also a potential performance issue, if those processes do unnecessary work gated on
Telemetry::CanRecordExtended()
to collect data that then gets discarded?
True. I'm unclear on the cost/benefit of designing and implementing a generic cross-process solution for canRecord*
at this time, so I'll leave it to consumers to reach out when they discover they need it and continue work on Glean (which doesn't have this lever).
Assignee | ||
Comment 9•2 years ago
|
||
Clarifying expected behavior of telemetry's canRecordExtended and canRecordPreleaseData in child processes in the documentation.
Comment 10•2 years ago
|
||
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/705b93332795 update documentation of canRecordExtended and canRecordPreleaseData. r=chutten DONTBUILD
Comment 11•2 years ago
|
||
bugherder |
Description
•