Make Storage Quota part of Fingerprinting Protection
Categories
(Core :: Storage: StorageManager, enhancement)
Tracking
()
People
(Reporter: tjr, Assigned: tjr)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
In our most recent analysis of web fingerprinters, two of the three we modeled in depth used Storage Quota as part of their data collection. About 92% of users report 10 GB, but the remaining 8% are spread across a very long tail. If this API reported a constant 10 GB, a portion of the users who currently appear unique to these fingerprinters would no longer be unique.
Specifically, the relative change in the number of unique users per platform is:
| OS | FPJS v2 | Signifyd |
|---|---|---|
| Overall | −3.52% | −4.89% |
| Android | −2.15% | −2.65% |
| Darwin | −0.20% | −0.23% |
| Linux | −3.87% | −5.34% |
| WINNT | −5.48% | −9.47% |
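For readers unfamiliar with the metric, here is a minimal sketch of how a relative change like the ones in the table can be computed. It assumes the figure is (after − before) / before; the function name and the counts are hypothetical and are not data from the analysis.

```python
def relative_change(before: int, after: int) -> float:
    """Relative change in a count, as a percentage of the starting count."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
unique_before = 2_000  # users seen as unique with the real quota exposed
unique_after = 1_900   # users seen as unique with a constant quota reported

print(f"{relative_change(unique_before, unique_after):+.2f}%")
```

A negative value means fewer users look unique after the change.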
Besides making users no longer appear unique, the change also buckets them into larger cohorts. The math behind this calculation is complicated, but in short, the overall cohort score improves by this much:
| OS | FPJS v2 | Signifyd |
|---|---|---|
| Overall | +5.37% | +3.10% |
| Android | +2.06% | +2.98% |
| Darwin | +0.13% | +0.18% |
| Linux | +8.26% | +10.21% |
| WINNT | +7.04% | +3.12% |
Comment 1•14 days ago
Our typical rollout strategy for changes like this is to enable it in Nightly (in all contexts) to shake out any web compatibility issues. Then we switch it to be enabled only in PBM and ETP Strict. It rides the trains out to release (often we run a release experiment enabling it while it is in Beta) and stays in PBM / ETP Strict indefinitely. After some time (a year, more?) we would potentially consider enabling it by default if it had no noticeable impact. If WebCompat issues arise in Nightly or at any later point, the change gets reconsidered entirely.
Comment 2•14 days ago
Updated•14 days ago
Comment 3•13 days ago
Comment 4•12 days ago
:tjr, it seems like this is duplicative of the mitigation already in place from bug 1781277 and the DiskStorageLimit it introduced. Can you explain why we want this mitigation that seems to clobber the previous mitigation with the same value? That mitigation has the benefit that it actually imposes a quota limit, which is observable by using storage APIs, where this does not.
While quota tracking and reporting is QM-related, this proposed mitigation is a change made directly in StorageManager, so I'm moving the bug there.
Comment 5•12 days ago
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #4)
> :tjr, it seems like this is duplicative of the mitigation already in place from bug 1781277 and the DiskStorageLimit it introduced. Can you explain why we want this mitigation that seems to clobber the previous mitigation with the same value? That mitigation has the benefit that it actually imposes a quota limit, which is observable by using storage APIs, where this does not.
I think it's the last one. IIUC, DiskStorageLimit will cause Firefox to believe it can use 50 GB for storage. Every origin (or site? or 'group'?) will get 10 GB. estimate() will report 10 GB and that's great, but we will also have the problem that (a) Firefox won't use more than 50 GB when it might be desirable to do so and (b) Firefox will use up to 50 GB when it is desirable not to do so.
This mitigation is a lighter touch. We lie to the website, but the operation of the browser does not change. That seems more desirable and would cause fewer problems, which is why I did it. But if that's the wrong understanding, or you're not concerned about that, then I'm fine going with DiskStorageLimit. The intention here is that this will be enabled by default in PBM and ETP Strict. (And sometimes mitigations make their way into Normal Browsing also, but only if they're considered very low risk.)
Updated•1 day ago
Comment 6•23 hours ago
aside: I need to stop composing replies directly in bugzilla, because my WIP replies keep getting wiped from existence through some interaction of session store (possibly purging values that are too large?), having duplicate tabs, and perhaps bugzilla's own attempt to save comments.
(In reply to Tom Ritter [:tjr] from comment #5)
> We lie to the website, but the operation of the browser does not change.
I think this is unfortunately too optimistic, at least with our current quota limits. Most complex sites that use storage depend on the returned values to moderate their usage or to know when they are potentially in a problem state and need to clean up. Structurally, the major issues are that QuotaExceededError is not very useful on its own, and that implementation realities/bugs can make it impossible to clean up without data loss once this state is reached because, for example, deleting can itself require disk space when SQLite and its WAL are in use. (It is still possible to directly delete an entire database or use clear-site-data.)
If our actual limit for the origin is less than 10 GiB, this can cause real problems. Gmail and YouTube are two notable sites that could break quite badly, as they both use navigator.storage.estimate() to moderate their usage. We do have potential mitigations in place if sites with a ServiceWorker experience ServiceWorker breakage on navigation fetches, but we wipe the origin except for cookies and localStorage, so it's not great.
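To make the failure mode concrete, here is a small model of a site that budgets its writes from estimate()-style values. This is only a sketch: the function names, the 10% safety margin, and the 2 GiB enforced limit are hypothetical, and real sites will use their own heuristics.

```python
GIB = 1024 ** 3


def planned_write(reported_usage: int, reported_quota: int, wanted: int,
                  margin_percent: int = 10) -> int:
    """Bytes a cautious site would write, based only on estimate()-style values."""
    # Keep margin_percent of the reported quota free, then subtract what is used.
    budget = reported_quota * (100 - margin_percent) // 100 - reported_usage
    return max(0, min(wanted, budget))


def write_fits(actual_usage: int, actual_limit: int, nbytes: int) -> bool:
    """Whether the write stays within the limit the browser really enforces."""
    return actual_usage + nbytes <= actual_limit


usage = 1 * GIB
actual_limit = 2 * GIB  # hypothetical enforced limit, below 10 GiB
wanted = 3 * GIB

honest = planned_write(usage, actual_limit, wanted)  # quota reported truthfully
spoofed = planned_write(usage, 10 * GIB, wanted)     # quota reported as 10 GiB

print(write_fits(usage, actual_limit, honest))   # site stays within the real limit
print(write_fits(usage, actual_limit, spoofed))  # site overshoots the real limit
```

With the truthful value the site trims its write to fit; with the constant 10 GiB it plans the full 3 GiB and runs into the enforced limit.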
That said, it seems like Chrome has adopted a policy of always reporting that your quota is 10 GiB + what you are already using if there's more than 10 GiB of free disk space or you're in incognito. I assume this mainly works because Chrome's quota grants are extremely generous, so sites basically never get anywhere near their actual quota limit, but I also have not fully processed all of what Chrome is up to. The docs at https://developer.mozilla.org/en-US/docs/Web/API/Storage_API/Storage_quotas_and_eviction_criteria are generally pretty good but these Chromium constants are also quite informative.
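The policy as described in this comment can be sketched as follows. This is a model of the description above, not Chromium's code: the function and parameter names are hypothetical, and the fallback branch is an assumption because the comment does not say what is reported when neither condition holds.

```python
GIB = 1024 ** 3
HEADROOM = 10 * GIB  # the "10 GiB" from the description above


def reported_quota(usage: int, free_disk: int, actual_quota: int,
                   incognito: bool = False) -> int:
    """Quota value shown to the page under the policy described above."""
    if incognito or free_disk > HEADROOM:
        # Report current usage plus a fixed amount of headroom.
        return usage + HEADROOM
    # Assumption: the source does not cover this case, so this sketch
    # simply falls back to the real quota.
    return actual_quota


print(reported_quota(3 * GIB, 200 * GIB, 60 * GIB) // GIB)  # 13
print(reported_quota(0, 5 * GIB, 2 * GIB) // GIB)           # 2
```

Note that the reported value depends only on usage when there is plenty of free disk, so it reveals nothing about the size of the disk.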
> That seems more desirable, and would cause less problems, which is why I did it but if that's the wrong understanding, or you're not concerned about that, then I'm fine going with DiskStorageLimit.
I would prefer to go with the existing DiskStorageLimit approach of performing any anti-fingerprinting steps in the parent process. While I don't know that we need to be completely concerned about someone compromising a content process to get at the true info, I think:
- We should definitely be quantizing to the GiB level like Chrome in general; the current entropy shouldn't be there.
- It definitely is harder to leak the info to the content process if we do the deception in the parent process.
- It will be more understandable to platform engineers if we only do it in the one spot in the parent process.
- I very much like Chrome's concept of saying "you can have what you have plus this heavily quantized bit more", potentially even as a means of doling out quota to origins rather than ever telling them the full quota limit we might actually give them. We very much know that our current limits are too limiting and causing problems (but we have to deal with free space).
- Doing it always in the parent process near QM lets us potentially perform any fix-ups if we are seeing breakages from sites thinking what we're telling them is the truth.
- There is always some potential for an arms race where sites potentially try and see if they can hit the actual quota. I believe this was a problem for incognito modes in general since it was potentially quite easy to call the browser's bluff due to in-memory constraints. And if sites do start doing systematic bluff-calling that we need to mitigate, it's nice to potentially be able to do that all in the same place in the parent process.
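As an illustration of the quantization point in the list above, this sketch rounds reported values down to whole GiB and shows how distinct raw values collapse. Rounding down (rather than to nearest) is an assumption, and the raw values are made up for the example.

```python
GIB = 1024 ** 3


def quantize_to_gib(nbytes: int) -> int:
    """Round a byte count down to a whole number of GiB."""
    return (nbytes // GIB) * GIB


# Made-up raw quota values that differ only in their low-order bytes.
raw = [10 * GIB, 10 * GIB - 4096, 9 * GIB + 123_456_789, 9 * GIB + 5]
quantized = [quantize_to_gib(v) for v in raw]

print(len(set(raw)))        # 4 distinct values before quantizing
print(len(set(quantized)))  # 2 distinct values after quantizing
```

Fewer distinct observable values means less entropy for a fingerprinter to use.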