Collect fingerprinted fonts present
Categories
(Core :: Privacy: Anti-Tracking, enhancement)
Tracking
Version | Tracking | Status |
---|---|---|
firefox138 | --- | fixed |
People
(Reporter: tjr, Assigned: fkilic)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Font fingerprinting is a technique fingerprinters use to uniquely identify people by checking which combination of uncommon or less common fonts they have installed. Presently we know which of our allowlisted fonts users don't have, but we don't know which of the fonts that fingerprinters check users do have. The former is useful for estimating how unique users could be with our protections enabled. The latter is needed to know how unique users look to fingerprinters today when the protections aren't enabled (e.g. in Standard mode).
We have lists of fonts that fingerprinters collect, so we're going to report back which fonts users have installed. However, we don't actually need to know which fonts; we just need a mapping from font set to unique string, so we'll sort the fonts alphabetically and then hash the names together. Users with the same hash have the same font fingerprint. I'm aware this isn't a cryptographically sound approach: a known, finite set of items could have a lookup dictionary calculated to reverse the hashes. But it's recognized that we're doing this to try to provide some privacy when we could make it easier and not provide it, and take it at face value when I say I have no interest in trying to reverse the hash.
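As a rough illustration of the sort-then-hash idea described above, here is a minimal Python sketch. It is not the Firefox implementation: the function name `font_set_fingerprint`, the newline separator, the deduplication step, and the UTF-8 encoding are all assumptions made for this example. SHA-256 is used because the measurement descriptions in the data review name it.

```python
import hashlib


def font_set_fingerprint(font_names):
    """Return a hex SHA-256 digest that identifies a set of font names.

    Hypothetical helper for illustration only.
    """
    # Deduplicate and sort so the digest does not depend on the order in
    # which the fonts were enumerated.
    ordered = sorted(set(font_names))
    # Assumption: join with a newline so that ["AB", "C"] and ["A", "BC"]
    # do not collapse into the same input string.
    joined = "\n".join(ordered)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two users with the same installed subset get the same digest regardless of enumeration order, which is all the analysis needs. As the description notes, anyone holding the candidate font list could still enumerate subsets and rebuild the mapping, so the hash reduces casual readability but does not provide secrecy.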
Comment 1 (Assignee) • 1 month ago
DATA REVIEW REQUEST
- What questions will you answer with this data?
How much entropy does font fingerprinting provide?
More generally: What is the most productive use of engineering time to make fingerprinting an ineffective method of tracking users? As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1879151
- Why does Mozilla need to answer these questions? Are there benefits for users?
Do we need this information to address product or business requirements?
We want to improve our fingerprinting defenses. We don't want to guess at what will make an improvement, so we want to make a decision based on data. We also want to know how much of an improvement we have made, so we can state it and know how much further we have to go.
- What alternative methods did you consider to answer these questions?
Why were they not sufficient?
We considered privacy-preserving metric collection (DAP), collecting it indirectly (e.g. via hashes of the data), using existing (limited) data we currently collect, not collecting the data at all, and using academic literature. These options are detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit
- Can current instrumentation answer these questions?
As detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit - some attributes are collected by current instrumentation. However, using this data (and not using the other data we don't collect) will give an incomplete picture that may mislead us into choosing a task that does not make an appreciable change for users. We will also be unable to accurately state the improvement we have made.
- List all proposed measurements and indicate the category of data collection for each
measurement, using the Firefox data collection categories found on the Mozilla wiki.
Measurement Name | Measurement Description | Data Collection Category | Tracking Bug |
---|---|---|---|
characteristics.fpjs_fonts_allowlisted | SHA256 of allowlisted fonts queried by FPJS | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
characteristics.fpjs_fonts_nonallowlisted | SHA256 of non-allowlisted fonts queried by FPJS | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
characteristics.fonts_variant_a_allowlisted | SHA256 of allowlisted fonts queried of variant A | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
characteristics.fonts_variant_a_nonallowlisted | SHA256 of non-allowlisted fonts queried of variant A | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
characteristics.fonts_variant_b_allowlisted | SHA256 of allowlisted fonts queried of variant B | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
characteristics.fonts_variant_b_nonallowlisted | SHA256 of non-allowlisted fonts queried of variant B | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1955687 |
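One possible reading of the allowlisted / non-allowlisted pairs in the table is sketched below in Python: for each fingerprinter font list, take the fonts that are actually installed, split them by allowlist membership, and hash each half. This interpretation is an assumption, as are the helper names (`sha256_of_fonts`, `hashes_for_queried_list`), the newline separator, and the exact (case-sensitive) name matching. The real metric code may differ.

```python
import hashlib


def sha256_of_fonts(font_names):
    """Hash a set of font names after sorting (hypothetical helper)."""
    joined = "\n".join(sorted(set(font_names)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hashes_for_queried_list(installed, queried, allowlist):
    """Return the two digests for one fingerprinter font list.

    installed: fonts present on the user's system
    queried:   fonts a given fingerprinter list checks for
    allowlist: fonts the browser's protections would still expose
    """
    # Only fonts that are both queried and installed are observable.
    present = set(installed) & set(queried)
    allowed = set(allowlist)
    return {
        "allowlisted": sha256_of_fonts(present & allowed),
        "nonallowlisted": sha256_of_fonts(present - allowed),
    }
```

Calling this once per list (FPJS, variant A, variant B) would yield the six values in the table.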
- Please provide a link to the documentation for this data collection which
describes the ultimate data set in a public, complete, and accurate way.
This collection uses Glean, so it is documented in the Glean Dictionary.
- How long will this data be collected?
This data will be collected permanently.
tom@mozilla.com will be responsible for the permanent collections.
- What populations will you measure?
All channels, countries, and locales. No filters.
- If this data collection is default on, what is the opt-out mechanism for users?
These collections use Glean. The opt-out can be found in the product's preferences.
- Please provide a general description of how you will analyze this data.
The general question is "What engineering tasks should we do". To determine that, we will answer sub-questions like:
- How many users are uniquely identifiable via fingerprinting?
- For the users who are not, how large a cohort are they bucketed into?
- What attributes contribute the most to making users unique, or placing them in small buckets?
- What attributes correlate with each other, such that we would need to address them in tandem?
- Where do you intend to share the results of your analysis?
We hope to publish an academic paper, actually, as this is a significant contribution to the topic of browser fingerprinting. We can also expect to do a blog post. The decisions about what engineering tasks we choose to do to decrease the uniqueness of our users will be filed as Bugzilla Bugs that will contain descriptions of why this is the engineering task to do.
- Is there a third-party tool (i.e. not Glean or Telemetry) that you
are proposing to use for this data collection?
No.
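The first two analysis sub-questions in the data review (how many users are unique, and how large the cohorts are otherwise), along with the entropy question, can be sketched from a list of per-user hashes. The Python below is a minimal illustration under stated assumptions, not the planned analysis pipeline: the function name `summarize_fingerprints` and the input shape (one hash string per user) are hypothetical, and entropy is computed as plain Shannon entropy over the observed hash distribution.

```python
import math
from collections import Counter


def summarize_fingerprints(hashes):
    """Summarize uniqueness for a list with one fingerprint hash per user."""
    total = len(hashes)
    if total == 0:
        raise ValueError("need at least one fingerprint")
    counts = Counter(hashes)
    # Users whose hash is shared with nobody else are uniquely identifiable.
    unique_users = sum(1 for c in counts.values() if c == 1)
    # Sizes of the buckets that contain more than one user.
    cohort_sizes = sorted(c for c in counts.values() if c > 1)
    # Shannon entropy (in bits) of the observed hash distribution.
    entropy_bits = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return {
        "total_users": total,
        "unique_users": unique_users,
        "unique_fraction": unique_users / total,
        "cohort_sizes": cohort_sizes,
        "entropy_bits": entropy_bits,
    }
```

Running the same summary on combinations of attributes, rather than on a single hash, is one way to approach the sub-questions about which attributes contribute most and which need to be addressed together.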