Collect machine model info
Categories
(Core :: Privacy: Anti-Tracking, task, P3)
Tracking
Release | Tracking | Status |
---|---|---|
firefox138 | --- | fixed |
People
(Reporter: fkilic, Assigned: fkilic)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
DATA REVIEW REQUEST
- What questions will you answer with this data?
We want to collect machine model information to help us interpret the data we already collect, especially anomalies. We are aware that this information is not exposed to web APIs, but we expect it to help us make sense of the rest of the data set.
More generally: What is the most productive use of engineering time to make fingerprinting an ineffective method of tracking users? As detailed in https://bugzilla.mozilla.org/show_bug.cgi?id=1879151
- Why does Mozilla need to answer these questions? Are there benefits for users?
Do we need this information to address product or business requirements?
We want to improve our fingerprinting defenses. We don't want to guess at what will make an improvement, so we want to make a decision based on data. We also want to know how much of an improvement we have made, so we can state it and know how much further we have to go.
- What alternative methods did you consider to answer these questions?
Why were they not sufficient?
We considered privacy-preserving metric collection (DAP), collecting the data indirectly (e.g. via hashes of the data), using the existing (limited) data we currently collect, not collecting the data at all, and relying on academic literature. These options are detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit
- Can current instrumentation answer these questions?
As detailed in https://docs.google.com/document/d/1m_j0BQEprQleRHZ7tVT7mG-krc8UA171GD5Vl6gZbL0/edit - some attributes are collected by current instrumentation. However, using this data (and not using the other data we don't collect) will give an incomplete picture that may mislead us into choosing a task that does not make an appreciable change for users. We will also be unable to accurately state the improvement we have made.
- List all proposed measurements and indicate the category of data collection for each
measurement, using the Firefox data collection categories found on the Mozilla wiki.
Measurement Name | Measurement Description | Data Collection Category | Tracking Bug |
---|---|---|---|
characteristics.machine_model_name | Machine model name | stored_content | https://bugzilla.mozilla.org/show_bug.cgi?id=1952006 |
- Please provide a link to the documentation for this data collection which
describes the ultimate data set in a public, complete, and accurate way.
This collection uses Glean, so it is documented in the Glean Dictionary.
- How long will this data be collected?
This data will be collected permanently.
tom@mozilla.com will be responsible for the permanent collections.
- What populations will you measure?
All channels, countries, and locales. No filters.
- If this data collection is default on, what is the opt-out mechanism for users?
These collections use Glean. The opt-out can be found in the product's preferences.
- Please provide a general description of how you will analyze this data.
The general question is "What engineering tasks should we do?" To determine that, we will answer sub-questions like:
- How many users are uniquely identifiable via fingerprinting?
- For the users who are not, how large a cohort are they bucketed into?
- What attributes contribute the most to making users unique, or to placing them in small buckets?
- What attributes correlate with each other, such that we would need to address them in tandem?
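The first three sub-questions can be illustrated with a minimal sketch. It assumes each report is a flat record of attribute values; the field names other than machine_model_name, the helper names, and the sample records are hypothetical and are not taken from the actual Glean schema or analysis pipeline.

```python
from collections import Counter


def cohort_sizes(records, attributes):
    """Count how many records share each distinct combination of attribute values."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def uniqueness_summary(records, attributes):
    """Summarize how identifying the chosen attributes are, taken together."""
    sizes = cohort_sizes(records, attributes)
    unique = sum(1 for n in sizes.values() if n == 1)
    return {
        "unique_fraction": unique / len(records),
        "cohorts": len(sizes),
        "smallest_cohort": min(sizes.values()),
    }


def attribute_impact(records, attributes):
    """For each attribute, the drop in unique_fraction when that attribute is left out."""
    base = uniqueness_summary(records, attributes)["unique_fraction"]
    impact = {}
    for a in attributes:
        rest = [x for x in attributes if x != a]
        impact[a] = base - uniqueness_summary(records, rest)["unique_fraction"]
    return impact


# Made-up sample data, for illustration only.
records = [
    {"machine_model_name": "ModelA", "screen": "1920x1080", "cores": 8},
    {"machine_model_name": "ModelA", "screen": "1920x1080", "cores": 8},
    {"machine_model_name": "ModelB", "screen": "1920x1080", "cores": 8},
    {"machine_model_name": "ModelC", "screen": "2560x1440", "cores": 16},
]
attrs = ["machine_model_name", "screen", "cores"]

summary = uniqueness_summary(records, attrs)
impact = attribute_impact(records, attrs)
print(summary)
print(impact)
```

In this toy data set, leaving out machine_model_name is the only change that reduces the share of unique records, which is the kind of signal that would point to an attribute worth addressing first. Leave-one-out is only one possible way to rank attributes.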
- Where do you intend to share the results of your analysis?
We hope to publish an academic paper, as this is a significant contribution to the topic of browser fingerprinting. We also expect to publish a blog post. The decisions about which engineering tasks we choose to do to decrease the uniqueness of our users will be filed as Bugzilla bugs, each describing why that task was chosen.
- Is there a third-party tool (i.e. not Glean or Telemetry) that you
are proposing to use for this data collection?
No.
Assignee
Comment 5•15 days ago
Per data-review: we don't think this will contribute more entropy than we are already collecting, but we will monitor the data to ensure it isn't going off the rails. We're making it Cat 3 to ensure it isn't collected in the future without the additional protections we have in place for this collection (OHTTP, no client id, data access, deletion, etc).
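One way to monitor whether the new field adds entropy on top of what is already collected is to compare the joint entropy of the existing attributes with and without it. The sketch below is an assumption about how such a check could look, not the review team's actual method; the attribute names other than machine_model_name, the function names, and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def added_entropy_bits(records, existing, new_attribute):
    """H(existing, new) - H(existing): the bits the new attribute adds beyond the existing ones."""
    before = entropy_bits(tuple(r[a] for a in existing) for r in records)
    combined = list(existing) + [new_attribute]
    after = entropy_bits(tuple(r[a] for a in combined) for r in records)
    return after - before


# Made-up records: here the model name is fully determined by gpu and cores.
redundant = [
    {"gpu": "G1", "cores": 8, "machine_model_name": "ModelA"},
    {"gpu": "G1", "cores": 8, "machine_model_name": "ModelA"},
    {"gpu": "G2", "cores": 4, "machine_model_name": "ModelB"},
    {"gpu": "G2", "cores": 4, "machine_model_name": "ModelB"},
]
# Same records, except the last one reports a model not implied by gpu and cores.
informative = redundant[:3] + [{"gpu": "G2", "cores": 4, "machine_model_name": "ModelC"}]

added_redundant = added_entropy_bits(redundant, ["gpu", "cores"], "machine_model_name")
added_informative = added_entropy_bits(informative, ["gpu", "cores"], "machine_model_name")
print(added_redundant, added_informative)
```

A value near zero means the model name is largely predictable from the attributes already collected. A clearly positive value would be a signal to revisit the assumption stated in the review.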
Comment 7•5 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9ebccffed55b
https://hg.mozilla.org/mozilla-central/rev/5a4dd2b70f8f
https://hg.mozilla.org/mozilla-central/rev/008f182ea730
https://hg.mozilla.org/mozilla-central/rev/f84dd3ebb884