Open Bug 1671458 Opened 4 years ago Updated 2 years ago

Consider collecting Telemetry and filtering attribution origins on macOS

Categories

(Firefox :: Installer, enhancement, P2)

Unspecified
macOS
enhancement

People

(Reporter: nalexander, Unassigned)

Details

On Windows, the attribution process is "obvious": a user downloads an installer that includes some attribution data, which is then made available to Firefox at runtime.

On macOS, the attribution process is "not obvious": the user downloads a DMG, and macOS and the user's browser collaborate to keep the URL of the download page which linked to the DMG in the user's quarantine database. This URL is then available to Firefox at runtime, and attribution data is extracted from it.
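For illustration only, here is a rough Python sketch of that extraction step. It is not Firefox's actual implementation, which is not shown in this bug. The function name `extract_attribution` is hypothetical, and the `utm_*` key list is an assumption based on the example URLs later in this report.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed set of attribution keys, taken from the example URLs in this bug.
# This is not a confirmed list of what Firefox actually reads.
ATTRIBUTION_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content")


def extract_attribution(referrer_url: str) -> dict:
    """Return the attribution fields found in the URL's query string.

    Only the query string is inspected; the scheme, host, and path are
    ignored entirely.
    """
    query = parse_qs(urlsplit(referrer_url).query)
    # parse_qs maps each key to a list of values; keep the first value.
    return {key: query[key][0] for key in ATTRIBUTION_KEYS if key in query}


example_url = (
    "https://www.mozilla.org/en-US/firefox/new/"
    "?utm_campaign=non-fx-button&utm_medium=referral"
    "&utm_source=addons.mozilla.org"
)
attribution = extract_attribution(example_url)
```

Because a sketch like this never looks at the host, any page that links to the DMG with the same query string would produce the same attribution data.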

All of the macOS attribution data is extracted from the query parameters in that URL. However, we have an entire URL to analyze! I think we should at least consider:

  1. collecting Telemetry in some form on the URLs we witness. We have an opportunity to understand the "dark funnel" on macOS, i.e., installations that aren't coming from mozilla.org. There are data privacy concerns here.

  2. allow-listing attribution data based on the URL we witness. Right now we consider the valid "return to AMO" attribution URL

https://www.mozilla.org/en-US/firefox/new/?utm_campaign=non-fx-button&utm_content=rta:QGNvbnRhaW4tZmFjZWJvb2s&utm_medium=referral&utm_source=addons.mozilla.org

to be identical to

https://hot.warez/?utm_campaign=non-fx-button&utm_content=rta:QGNvbnRhaW4tZmFjZWJvb2s&utm_medium=referral&utm_source=addons.mozilla.org

when the latter is clearly not a Mozilla download. This will manifest as very messy data.
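A minimal sketch of what the allow-listing in point 2 could look like, in Python for illustration. The name `is_allowed_origin` and the contents of `ALLOWED_DOMAINS` are assumptions; which domains belong on the list is still an open question in this bug.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list. Which domains belong here is an open question.
ALLOWED_DOMAINS = ("mozilla.org",)


def is_allowed_origin(referrer_url: str) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(referrer_url).hostname or "").lower()
    # Match the domain exactly or as a dot-separated suffix, so that
    # "evilmozilla.org" and "mozilla.org.example.com" are both rejected.
    return any(
        host == domain or host.endswith("." + domain)
        for domain in ALLOWED_DOMAINS
    )


QUERY = "?utm_campaign=non-fx-button&utm_medium=referral&utm_source=addons.mozilla.org"
mozilla_ok = is_allowed_origin("https://www.mozilla.org/en-US/firefox/new/" + QUERY)
warez_ok = is_allowed_origin("https://hot.warez/" + QUERY)
```

With a check like this in front of the query-parameter parsing, the two URLs above would no longer be treated as identical: attribution data would only be accepted from the first.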

mkaply: re: point 1), I remember you had a list of places where lots of people get Firefox (that aren't Mozilla). Have you had conversations around the policies we have for collecting information about those places?

Re: point 2), perhaps you Just Know: would the allow-list be mozilla.org? Or are there other landing pages that Mozilla hosts that link to downloads?

Flags: needinfo?(mozilla)

Re: point 1), I remember you had a list of places where lots of people get Firefox (that aren't Mozilla). Have you had conversations around the policies we have for collecting information about those places?

We haven't had conversations around collecting information about those places. We should probably open a bug with the data team to see how they feel about collecting the URL. In my mind, this wouldn't be much different than getting a Referrer?

Re: point 2), perhaps you Just Know: would the allow-list be mozilla.org? Or are there other landing pages that Mozilla hosts that link to downloads?

Right now just mozilla.org. We'll be able to attribute to different download locations on mozilla.org properties.

Flags: needinfo?(mozilla)

(In reply to Mike Kaply [:mkaply] from comment #2)

We probably should open a bug with the data team to see how they feel about collecting the URL. In my mind, this wouldn't be much different than getting a Referrer?

Filed this request for a chat with the data team about this.

Chiming in here:

From reading this ticket, I believe collecting the URL would fall into Category 3 data in our current data collection policy, but I'm not sure if this falls into Firefox Data collection, since it's kinda being collected before Firefox is installed. I'd check in with a data steward (pinging +chutten).

If we were to use a "white list" implementation (i.e., only report the actual website if it is on a list of known distributors, and otherwise censor it as "other"), that might make it fall within policy?

From a personal perspective, I think this data would be useful and should be collected, so I support this as a data scientist.

One thing to point out (it might already be obvious, but I want to state it explicitly): this URL info should be collected in a new field. The old attribution fields (source, campaign, etc.) should continue to be collected the way they were previously, and the URL information should be a new, supplementary field alongside them.

Flags: needinfo?(chutten)

A point of clarification: The Data Collection Policy (including the Categories) is Mozilla-wide. All new or expanded data collections in Mozilla projects are subject to Data Collection Review (which has a categorization step).

Collecting a URL is Category 3, yes. Collecting the URL even when it's from a *.mozilla.{org|com} domain is still Category 3, but that sort of mitigation against the risk of PII leakage means Trust will likely okay its default-on collection in release. Similarly, recording which of a list of known domains the URL came from (or Other if it isn't on the list) would still be Category 3, but it would be a good mitigation for default-on collection in Release.
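To make the "known domain or Other" mitigation concrete, here is a hedged Python sketch. The names `origin_bucket` and `KNOWN_DOMAINS` are hypothetical, and the list contents are placeholders; nothing here reflects an agreed list or an actual Telemetry probe.

```python
from urllib.parse import urlsplit

# Hypothetical list of known domains; the real list would need to be agreed on.
KNOWN_DOMAINS = ("mozilla.org", "mozilla.com")
OTHER = "Other"


def origin_bucket(referrer_url: str) -> str:
    """Map a URL to one of KNOWN_DOMAINS, or to OTHER.

    Only the matched list entry is returned. The path, the query string,
    and any unlisted host name never leave this function.
    """
    try:
        host = (urlsplit(referrer_url).hostname or "").lower()
    except ValueError:
        # urlsplit raises ValueError for some malformed URLs.
        return OTHER
    for domain in KNOWN_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return domain
    return OTHER
```

The point of returning the list entry rather than the observed host is that the recorded value can only ever be one of a small, reviewed set of strings, which is what limits the risk of PII leakage.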

Not sure we could come up with a risk mitigation for collecting the entire URL from non-Mozilla-controlled sites. It's too likely that the URL could contain anything at all.

Flags: needinfo?(chutten)
Priority: -- → P2
No longer depends on: 1619353