Closed Bug 1851624 Opened 1 year ago Closed 1 year ago

Extend windows installer telemetry to include the source website the installer was downloaded from

Categories

(Firefox :: Installer, enhancement)



RESOLVED DUPLICATE of bug 1815023

People

(Reporter: RT, Unassigned)

Details

After discussions with David, it seems that most browsers record the website an installer was downloaded from in an NTFS alternate data stream (the Zone.Identifier stream), commonly known as the "Mark of the Web" (MOTW).

User story: As a product manager I want to understand where users download Firefox from so that I can address attribution sources as well as possible.

Acceptance criteria:

  • The installer telemetry includes ReferrerUrl, HostUrl, the download path location (which indicates the downloading browser), and ZoneId
  • Both stub and full installers are supported

Sample data:
PS C:\Users\tigge> Get-Content -Stream Zone.Identifier 'C:\Users\tigge\OneDrive\Downloads\edge Firefox Installer.exe'
[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://www.mozilla.org/
HostUrl=https://download-installer.cdn.mozilla.net/pub/firefox/releases/117.0/win32/en-US/Firefox%20Installer.exe
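The Zone.Identifier stream shown above is INI-style text. A minimal Python sketch of extracting the fields named in the acceptance criteria might look like the following. The function name `parse_zone_identifier` is hypothetical, and the zone labels are the standard Windows URLZONE constants rather than anything the Firefox installer is known to report. The sketch covers parsing only; it says nothing about whether or how such data should be collected.

```python
import configparser

# Windows URL security zones (URLZONE enumeration values).
ZONE_NAMES = {
    0: "URLZONE_LOCAL_MACHINE",
    1: "URLZONE_INTRANET",
    2: "URLZONE_TRUSTED",
    3: "URLZONE_INTERNET",
    4: "URLZONE_UNTRUSTED",
}


def parse_zone_identifier(text: str) -> dict:
    """Extract ZoneId, ReferrerUrl and HostUrl from Zone.Identifier contents."""
    # interpolation=None: HostUrl values contain "%20", which the default
    # BasicInterpolation would reject as a malformed interpolation on read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (ZoneId, HostUrl, ...)
    parser.read_string(text)
    if not parser.has_section("ZoneTransfer"):
        return {}
    section = parser["ZoneTransfer"]
    result = {k: section[k] for k in ("ReferrerUrl", "HostUrl") if k in section}
    zone_raw = section.get("ZoneId", "").strip()
    if zone_raw.isdigit():
        result["ZoneId"] = int(zone_raw)
        result["ZoneName"] = ZONE_NAMES.get(int(zone_raw), "unknown")
    return result


sample = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://www.mozilla.org/
HostUrl=https://download-installer.cdn.mozilla.net/pub/firefox/releases/117.0/win32/en-US/Firefox%20Installer.exe
"""

print(parse_zone_identifier(sample))
```

Using `configparser` with interpolation disabled avoids hand-rolling an INI parser while keeping URL-encoded values intact.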

So, a couple of things here. We investigated the possibility of doing something like this in Bug 1815023. I didn't really look much at the resulting telemetry, but my understanding is that we discovered that it's fairly unusual that we are actually able to read any useful data this way. From my own testing, it seems like Windows usually removes the Alternate Data Stream from the installer when it runs. My understanding is that this has to do with dismissing the "this file was downloaded from the internet" warning such that it only happens the first time you run the file, not every time.
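To illustrate the availability problem described above, here is a hedged Python sketch (Windows/NTFS assumed; `read_zone_identifier` and `availability_rate` are hypothetical helpers, not Firefox code). Windows exposes an alternate data stream under the name `<file>:<stream>`, so a plain `open()` can read it; a stream that has been removed surfaces as an `OSError`, which per the comment above is the common case once the installer has run.

```python
import os
import tempfile
from typing import Iterable, Optional


def read_zone_identifier(path: str) -> Optional[str]:
    """Return the raw Zone.Identifier text attached to `path`, or None.

    On NTFS, Windows exposes the stream as "<path>:Zone.Identifier".
    A missing stream (or a filesystem without ADS support) raises OSError.
    Off Windows this simply looks for a file with that literal name.
    """
    try:
        with open(path + ":Zone.Identifier", encoding="utf-8", errors="replace") as f:
            return f.read()
    except OSError:
        return None


def availability_rate(results: Iterable[Optional[str]]) -> float:
    """Fraction of observations in which any Zone.Identifier data was readable."""
    observed = list(results)
    if not observed:
        return 0.0
    return sum(r is not None for r in observed) / len(observed)


# Demo: a freshly written local file carries no Mark of the Web.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder installer bytes")
print(read_zone_identifier(tmp.name))  # None
os.remove(tmp.name)
```

Treating "stream absent" as an explicit `None` rather than an error is what makes it possible to measure how often the data is missing at all.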

The other thing is that it was still an open question how we would collect any more information here in a way that wasn't an atrocious invasion of privacy. Just sending back raw URLs that the user visited not only compromises the user's privacy but, depending on what the URL is, could even compromise their safety.
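On the privacy point, one illustrative direction (purely hypothetical: the thread leaves this as an open question for data stewards, and nothing here reflects what was implemented) is never to transmit the URL itself, only a coarse bucket derived from its host and checked against a fixed allowlist. The allowlist contents, the bucket labels, and the name `coarse_source` below are all assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist: only these hosts map to a named bucket.
KNOWN_HOSTS = {
    "www.mozilla.org": "mozilla.org",
    "download-installer.cdn.mozilla.net": "mozilla-cdn",
}


def coarse_source(url: Optional[str]) -> str:
    """Map a raw ReferrerUrl/HostUrl to a coarse, non-identifying bucket."""
    if not url:
        return "missing"
    try:
        host = urlsplit(url).hostname  # lowercased; None if there is no netloc
    except ValueError:
        return "invalid"
    if host is None:
        return "invalid"
    # Path, query string and fragment are dropped entirely; any host not on
    # the allowlist collapses into a single "other" bucket.
    return KNOWN_HOSTS.get(host, "other")


print(coarse_source("https://www.mozilla.org/"))
print(coarse_source("https://example.com/private?token=abc"))
```

Dropping the path and query removes the most sensitive parts of a URL, but even a bare hostname can be identifying, which is why unknown hosts share one bucket in this sketch.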

Thanks for the context, super helpful.
Looking at https://sql.telemetry.mozilla.org/queries/94400 seems to confirm your understanding that most of the time the data is not available. Have we explored options to fix this, or is it a dead end?
Agreed on telemetry collection: if there turn out to be options for collecting valuable insights, this is clearly something to validate with data stewards. It sounds like the steps taken so far aimed at validating whether these insights exist at all.

Yes, we tried to answer the question "would NTFS ADS allow us to answer questions about unattributed provenance" (as distinct from attribution, specific to mozilla.org). The answer is essentially "no, there's not enough data available in the wild". See https://docs.google.com/document/d/1kPiMsbxKDrFafdtJpu8xFHKEmvsvTx_Ne1FWwvmRZro/edit for what we implemented, more or less, and https://mozilla-hub.atlassian.net/browse/FIDEDI-466 for the analysis I did (sorry, both are internal).

I am going to mark this as a duplicate of Bug 1815023 since we ran this down fairly thoroughly.

Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1815023
Resolution: --- → DUPLICATE