Closed Bug 1606898 Opened 4 years ago Closed 4 years ago

No RDD process untrusted module telemetry data

Categories

(External Software Affecting Firefox :: Telemetry, task)

Desktop
Windows
task
Not set
normal

Tracking

(firefox74 fixed)

RESOLVED FIXED
Tracking Status
firefox74 --- fixed

People

(Reporter: jimm, Assigned: toshi)

References

Details

Attachments

(1 file)

I was looking at child process data via redash query -

https://sql.telemetry.mozilla.org/queries/67352/source?p_process_name_undefined=rdd

We have data coming in for 'browser' and 'tab' but not 'rdd' -

https://searchfox.org/mozilla-central/source/xpcom/build/GeckoProcessTypes.h

At first I was thinking this was a side effect of turning CIG on for the RDD, work tritter did back in august. But that mitigation was applied delayed -

https://searchfox.org/mozilla-central/source/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#860

Which has me wondering if something might be broken, as I would expect some injections to get in prior to the sandbox applying this change. Also, CIG only works on Win10 so this is looking more and more like a collection bug.

Lets investigate this and confirm telemetry is working correctly from the rdd. I'd really like to see what impact CIG has on this but right now I don't trust the data.

Flags: needinfo?(tkikuchi)

Upon debugging, the root cause was not CIG. RDD was opted out because of this telemetry check. And telemetry reporting from RDD is off because of this condition. The fix would be either to remove Telemetry::CanRecordReleaseData() or to add XRE_IsRDDProcess().

Assignee: nobody → tkikuchi
Flags: needinfo?(tkikuchi)

:chutten, is there any risk to allow a new process type in TelemetryImpl::CreateTelemetryInstance()? We want to add RDD process there to collect the third party module ping from RDD process.

Flags: needinfo?(chutten)

Do we need to add a new process type to Telemetry in order to perform that collection? The "third-party-modules" ping is already a custom ping so it behaves by its own rules, no?

(( The risk in adding one is that we turn on histograms, scalars, events, and some few other things for that process. Which probably wouldn't be a big deal, but would add more data for every central ping for every profile... which would add up. ))

Maybe I need some more information about how the "third-party-modules" ping chooses what to include and how that interacts with useTelemetry.

Flags: needinfo?(chutten)

I see. Changing TelemetryImpl::CreateTelemetryInstance has far more impacts than what we want.

Bug 1522830 expanded the third party module ping to Tab & RDD process, but we're not getting data from RDD because we call Telemetry::CanRecordReleaseData() here.

The ping is collected in a child process, but it's sent from the browser process anyway, so now I'm leaning toward removing Telemetry::CanRecordReleaseData() from UntrustedModulesProcessor as a fix.

Aaron, can we remove Telemetry::CanRecordReleaseData()? Looking at TelemetryImpl::CreateTelemetryInstance(), I think it just filters a process type. recordreplay::IsRecordingOrReplaying() always returns false on Windows.

Flags: needinfo?(aklotz)

We cannot remove it outright; that's what is used to ensure that we aren't collecting telemetry on users who have disabled it.

I think we need to know more about why this is the problem. Does the RDD process not initialize the code that is required to get true from Telemetry::CanRecordReleaseData?

Or is this a red herring?

We only query for third-party modules information when the appropriate timer fires in the parent process. It occurs to me that we will get RDD data if and only if that timer fires at the same time that an RDD process is live. Could that be the problem?

Flags: needinfo?(aklotz)

(In reply to Aaron Klotz [:aklotz] from comment #5)

I think we need to know more about why this is the problem. Does the RDD process not initialize the code that is required to get true from Telemetry::CanRecordReleaseData?

Yes. I didn't see canRecordBase (= a value returned from CanRecordReleaseData()) is set to true in RDD even though it's set to true in Browser and Tab process.

canRecordBase is initialized based on a process type here, and then in Tab or Browser process, it's updated in enableTelemetryRecording again, but enableTelemetryRecording was never called in RDD.

I found a concept Unified Telemetry. It's enabled by default and when it's enabled, base data is always recorded. Maybe we should check the unified telemetry is on or off instead of using CanRecordReleaseData()?

Oh, please don't use our preferences. They're a mess : (

(( And especially don't use that one. It should be true in all builds on all channels in all distributions, and will be removed whenever we get around to bug 1577863 ))

Does the process doing the recording need to know if Telemetry is enabled, or just the process compiling the reports and sending the ping?

Also: ...is this RDD process likely to, in the near-term, need Histogram/Scalar/Event support anyway? Maybe we want to bring it into the fold anyway and we're dancing around it for no reason?

(In reply to Aaron Klotz [:aklotz] from comment #5)

We only query for third-party modules information when the appropriate timer fires in the parent process. It occurs to me that we will get RDD data if and only if that timer fires at the same time that an RDD process is live. Could that be the problem?

You mean this part in UntrustedModulesPing.jsm, right? This is working good. The problem is Telemetry::CanRecordReleaseData() blocks UntrustedModulesProcessor from being created, so RDD process never processes data.

(In reply to Chris H-C :chutten from comment #7)

Oh, please don't use our preferences. They're a mess : (

(( And especially don't use that one. It should be true in all builds on all channels in all distributions, and will be removed whenever we get around to bug 1577863 ))

CanRecordReleaseData() always returns true in Browser/Tab, but unfortunately it always returns false in RDD...

Does the process doing the recording need to know if Telemetry is enabled, or just the process compiling the reports and sending the ping?

Our code in RDD process collects data and submits via the standard telemetry infrastructure of the browser process. Technically we don't need to know if telemetry is enabled, but we added CanRecordReleaseData() to respect ther user's choice. Aaron, is my understanding correct?

Also: ...is this RDD process likely to, in the near-term, need Histogram/Scalar/Event support anyway? Maybe we want to bring it into the fold anyway and we're dancing around it for no reason?

The third-party-module ping is independent, and we don't have a plan to collect Histogram/Scalar/Event metrics from RDD.

So the question is, should we check some configuration in RDD to collect our ping? With Unified Telemetry, we always collect base data. Does that mean we can collect our ping as well? Or, is there any API we should call before collecting data?

From a Data Stewardship point of view, recording data locally is always permitted. It's the upload of that data to our servers that turns it into collection. Since the user data choice preference prevents data uploads at the Telemetry level, any further checks end up being solely optimizations to avoid performing extra work that won't be reported to mozilla.

So if we don't mind skipping the optimization in this case, we can unconditionally record in that process and rely on Telemetry stopping things from being uploaded.

(In reply to Chris H-C :chutten from comment #9)

From a Data Stewardship point of view, recording data locally is always permitted. It's the upload of that data to our servers that turns it into collection. Since the user data choice preference prevents data uploads at the Telemetry level, any further checks end up being solely optimizations to avoid performing extra work that won't be reported to mozilla.

So if we don't mind skipping the optimization in this case, we can unconditionally record in that process and rely on Telemetry stopping things from being uploaded.

Thank you for making it clear. One last question. If we want to skip collection for performance purpose, what is a recommended way? If there is a simple way to check it, I'd like to use it. If not, I'm fine with removing the current check of CanRecordReleaseData().

The best way is indeed CanRecordReleaseData() (or CanRecordPreReleaseData() if you also want to mix in a check to make sure that you're on Nightly or Beta). But as we've now learned, that only works on processes that have Telemetry of any kind working, so maybe check that in the parent process : )

Telemetry::CanRecordReleaseData() always returns false in the RDD process. This
behavior prevents the RDD process from collecting the third-party-module ping.
This patch removes CanRecordReleaseData() from the RDD process. The telemetry
settings are respected when the browser process submits data.

Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e104e86784c1
Allow RDD process to collect the third-party-module ping.  r=aklotz
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: