Closed Bug 1955332 Opened 1 month ago Closed 27 days ago

Collect fingerprinted fonts present

Categories

(Core :: Privacy: Anti-Tracking, enhancement)


Tracking


RESOLVED FIXED
138 Branch
Tracking Status
firefox138 --- fixed

People

(Reporter: tjr, Assigned: fkilic)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Font fingerprinting is a technique fingerprinters use to uniquely identify people by checking which combination of uncommon or less common fonts they have installed. Presently we know which of our allowlisted fonts users don't have, but we don't know which of the fonts that fingerprinters check users do have. The former is useful for identifying how unique users could be when using the protections we have. The latter is needed to know how unique users look to fingerprinters today when the protections aren't enabled (e.g. in Standard mode).

We have lists of the fonts that fingerprinters check, so we're going to report back which of those fonts users have installed. However, we don't actually need to know which fonts they are; we just need a mapping from font set to unique string, so we'll sort the font names alphabetically and then hash them together. Users with the same hash have the same font fingerprint. I'm aware this isn't a cryptographically sound approach: for a known, finite set of items, a lookup dictionary could be calculated to reverse the hashes. But I hope it's recognized that we're doing this to try to provide some privacy when it would be easier not to provide it, and that you'll take it at face value when I say I have no interest in trying to reverse the hash.
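The sort-then-hash step can be sketched as follows. This is a minimal illustration in Python, not the Firefox implementation: the function name `font_set_hash`, the newline separator, the de-duplication, and the UTF-8 encoding are all assumptions. The bug only says the names are sorted alphabetically and hashed together, and the measurement descriptions in the data review name SHA256.

```python
import hashlib


def font_set_hash(installed_fonts):
    """Return a hex digest that identifies a set of font names.

    Assumption: names are de-duplicated, sorted, joined with a newline,
    and encoded as UTF-8 before hashing. The real separator and encoding
    are not stated in the bug.
    """
    canonical = "\n".join(sorted(set(installed_fonts)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two users with the same fonts, reported in a different order,
# get the same digest; a user with one extra font does not.
user_a = font_set_hash(["Calibri", "Arial", "Futura"])
user_b = font_set_hash(["Futura", "Calibri", "Arial"])
user_c = font_set_hash(["Futura", "Calibri", "Arial", "Gill Sans"])
```

Sorting before hashing is what makes the digest depend only on the set of fonts and not on the order in which they were detected.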

DATA REVIEW REQUEST

  1. What questions will you answer with this data?

How much entropy does font fingerprinting provide?

More generally: What is the most productive use of engineering time to make fingerprinting an ineffective method of tracking users? As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1879151

  1. Why does Mozilla need to answer these questions? Are there benefits for users?
    Do we need this information to address product or business requirements?

We want to improve our fingerprinting defenses. We don't want to guess at what will make an improvement, so we want to make a decision based on data. We also want to know how much of an improvement we have made, so we can state it and know how much further we have to go.

  1. What alternative methods did you consider to answer these questions?
    Why were they not sufficient?

We considered privacy-preserving metric collection (DAP), collecting the data indirectly (e.g. via hashes of the data), using the existing (limited) data we currently collect, not collecting the data at all, and relying on academic literature. These options are detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit

  1. Can current instrumentation answer these questions?

As detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit, some attributes are collected by current instrumentation. However, using only this data (and not the other data we don't collect) will give an incomplete picture that may mislead us into choosing a task that does not make an appreciable change for users. We will also be unable to accurately state the improvement we have made.

  1. List all proposed measurements and indicate the category of data collection for each
    measurement, using the Firefox data collection categories found on the Mozilla wiki.

| Measurement Name | Measurement Description | Data Collection Category | Tracking Bug |
| --- | --- | --- | --- |
| characteristics.fpjs_fonts_allowlisted | SHA256 of allowlisted fonts queried by FPJS | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
| characteristics.fpjs_fonts_nonallowlisted | SHA256 of non-allowlisted fonts queried by FPJS | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
| characteristics.fonts_variant_a_allowlisted | SHA256 of allowlisted fonts queried by variant A | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
| characteristics.fonts_variant_a_nonallowlisted | SHA256 of non-allowlisted fonts queried by variant A | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
| characteristics.fonts_variant_b_allowlisted | SHA256 of allowlisted fonts queried by variant B | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
| characteristics.fonts_variant_b_nonallowlisted | SHA256 of non-allowlisted fonts queried by variant B | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |

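
One possible reading of how the six measurements relate is sketched below in Python. Everything here is an assumption made for illustration: the helper names `hash_names` and `font_metrics`, the font lists, the allowlist contents, and the rule that each hash covers the installed fonts from one query list, split by allowlist membership. Only the six measurement names and the use of SHA256 come from this bug.

```python
import hashlib


def hash_names(names):
    # Assumption: sorted names joined with newlines, UTF-8, SHA256 hex digest.
    return hashlib.sha256("\n".join(sorted(names)).encode("utf-8")).hexdigest()


def font_metrics(installed, allowlist, query_lists):
    """Build one allowlisted and one non-allowlisted hash per query list."""
    installed, allowlist = set(installed), set(allowlist)
    metrics = {}
    for list_name, queried in query_lists.items():
        present = set(queried) & installed  # queried fonts the user has
        prefix = f"characteristics.{list_name}"
        metrics[f"{prefix}_allowlisted"] = hash_names(present & allowlist)
        metrics[f"{prefix}_nonallowlisted"] = hash_names(present - allowlist)
    return metrics


# Hypothetical font lists, for illustration only.
query_lists = {
    "fpjs_fonts": ["Arial", "Futura", "Gill Sans"],
    "fonts_variant_a": ["Arial", "Calibri"],
    "fonts_variant_b": ["Futura", "Helvetica Neue"],
}
metrics = font_metrics(
    installed=["Arial", "Calibri", "Futura"],
    allowlist=["Arial", "Helvetica Neue"],
    query_lists=query_lists,
)
```

Under this reading, a user who has none of the relevant fonts for a measurement would report the hash of the empty string, so that value would itself form a (probably large) cohort.
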
  1. Please provide a link to the documentation for this data collection which
    describes the ultimate data set in a public, complete, and accurate way.

This collection uses Glean, so it is documented in the Glean Dictionary.

  1. How long will this data be collected?

This data will be collected permanently.
tom@mozilla.com will be responsible for the permanent collections.

  1. What populations will you measure?

All channels, countries, and locales. No filters.

  1. If this data collection is default on, what is the opt-out mechanism for users?

These collections use Glean. The opt-out can be found in the product's preferences.

  1. Please provide a general description of how you will analyze this data.

The general question is "What engineering tasks should we do?" To determine that, we will answer sub-questions like:

  • How many users are uniquely identifiable via fingerprinting?
  • For the users who are not, how large a cohort are they bucketed into?
  • What attributes contribute the most to making users unique, or to placing them in small buckets?
  • What attributes correlate with each other, such that we would need to address them in tandem?
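
The first two sub-questions, along with the entropy question from the start of this request, could be answered from the reported hashes roughly as follows. This is a hedged sketch in Python, not the planned analysis pipeline: the function name `cohort_stats` and the sample hashes are made up, and entropy is taken to mean the Shannon entropy of the observed hash distribution.

```python
import math
from collections import Counter


def cohort_stats(hashes):
    """Summarise how identifying a list of per-user hashes is."""
    cohorts = Counter(hashes)  # hash -> number of users sharing it
    total = len(hashes)
    unique_users = sum(1 for n in cohorts.values() if n == 1)
    # Shannon entropy, in bits, of the observed hash distribution.
    entropy = -sum((n / total) * math.log2(n / total) for n in cohorts.values())
    return {
        "unique_users": unique_users,
        "cohort_sizes": sorted(cohorts.values(), reverse=True),
        "entropy_bits": entropy,
    }


# Made-up sample: four users, two of whom share a font hash.
stats = cohort_stats(["h1", "h1", "h2", "h3"])
```

In this sample two users are uniquely identifiable and the other two share a cohort of size two. Running the same summary on a single attribute and on combinations of attributes is one way to compare how much each contributes to uniqueness.
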
  1. Where do you intend to share the results of your analysis?

We hope to publish an academic paper, as this is a significant contribution to the topic of browser fingerprinting. We also expect to write a blog post. The decisions about which engineering tasks we choose to do to decrease the uniqueness of our users will be filed as Bugzilla bugs that describe why each is the engineering task to do.

  1. Is there a third-party tool (i.e. not Glean or Telemetry) that you
    are proposing to use for this data collection?

No.

Duplicate of this bug: 1955687
Assignee: nobody → fkilic
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 27 days ago
Resolution: --- → FIXED
Target Milestone: --- → 138 Branch