Closed Bug 1692609 Opened 4 years ago Closed 2 years ago

resist-fingerprinting makes it hard to have smooth (60hz) animations

Categories

(Core :: DOM: Core & HTML, defect)

defect

Tracking

()

RESOLVED FIXED
103 Branch
Tracking Status
firefox102 + fixed
firefox103 --- fixed

People

(Reporter: jgilbert, Assigned: tjr)

References

(Regressed 1 open bug)

Details

Attachments

(5 files)

We know some of why this is (performance.now being truncated), but I'd like to know more about where we want the trade-offs to be, and where we are along that line.

This is a problem because people obviously want to resist fingerprinting, but they definitely don't realize it'll have all the adverse effects they run into.

What we don't want is people turning on RFP and then ditching Firefox because they end up with bad experiences around the web.

I'm thinking that RFP should truncate to the RAF interval, which would still be pretty good, without causing this class of issue.

RFP either already restricts RAF to 60hz, or it should be made to to prevent fingerprinting on RAF_count/second?

Flags: needinfo?(tom)

I think there's two questions here: What should we do for Firefox, and what should Tor do. Typically, we've taken the opinion that RFP exists for Tor, so RFP should do what Tor wants it to do. I'm not strictly opposed to doing something different in Firefox via some mechanism (e.g. through an additional pref Tor sets) but it seems like we should first figure out what Tor wants and ideally we could keep thing uniform.

Bug 1577243 is where we started applying reduced precision to the timestamp in rAF; Bug 1585589 is where we exempted the System Principal context. Bug 1662277 refactored this code to where it lives today.

As background. RFP is the acronym for Reduce Timer Precision and refers to the current-default mode of Firefox were we clamp+jitter to 1ms. RFP refers to Resist Fingerprinting. We have four types of Timer Clamping+Jittering:

  • Absolutely none - this is used for browser internal stuff.
  • Unconditional: this clamps to 20 ns and is in effect for (today) COOP/COEP documents and was always the case (pre-spectre) for performance.now. It is also used in Animation Timelines because it was hard to make RTP work there. No jitter. In the code this can be identified by RFPOnly mode.
  • RTP - the default for Firefox: 1ms + jitter
  • RFP - 100ms + jitter

My understanding from re-reading Bug Bug 1577243 is that the rAF timestamp wasn't originally being affected by any timer clamping not for RFP, not even to do the unconditional 20ns clamping. Animation Timelines however were being affected by Unconditional. To make Animation Timelines match up with rAF and resolve the test failures, we applied (unconditional) Timer Clamping to rAF which also make rAF do RFP clamping when RFP is enabled.

We're focusing on changing behavior for RFP mode here. We would want to keep Animation Timelines and rAF in sync, so if we change how rAF behaves we will also change how Animation Timelines behave.

We could switch Animation Timelines and rAF to Unconditional Clamping in RFP mode. (This would be a 'fifth' type of Timer Clamping+Jittering but what's one more?) I think the argument for this would be: rAF gets called on a regular interval, 16.666666666666667 ms (or 60 times a second) so even thought the timer in the function argument lies to you, you always know the time delta between function calls anyway. Additionally, rAF yields execution to the browser, so you can't set a global variable in an rAF callback and then immediately use it in another JS function and keep the precision.

I think the argument against this is that rAF only gets called on a regular 60 times a second when operating at high speed. It'll never go faster than that. It will go slower. So if rAF is called, hypothetically, 30 times a second because your computer is slow (either by hardware specs or because an attacker is slowing it down) the calls to rAF will get a high-resolution timer that is disclosing useful information. It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?

Given that using timing attacks to glean information from a browser hasn't been seen in practice (I think?) in the wild, I am sympathetic to the argument that Tor's level of timer precision is hurting usability with minimal security gain. Maybe a revisit of the precision for all of RFP is in order, and we could get a big gain of usability by lowering it to 10ms. Or 20ms. or 50ms. (Whatever the highest value is that makes things work well?)

Flags: needinfo?(tom)
Flags: needinfo?(sysrqb)
Flags: needinfo?(jgilbert)
Flags: needinfo?(gk)

Fwiw bug 1666160 is "RFP can surprise users by being painful", which I see as one way to help this problem for most users affected by it. We'll keep the discussion here about "assuming a user wants the RFP tradeoffs, how can they be less painful".

In order for things to go mostly smoothly, I think we need our RAF target interval to be the basis for clamping, or at least related to it. I could see this being 16.67ms clamping, and mandating RAF operate at 60hz maximum under RFP. I think that's a reasonable tradeoff here. Thoughts?

(Note that even the clamping to 1ms for RTP would probably be smoother if it were 1000/60/16 = 1.041667 clamping, rather than one in three consecutive raf frames commonly having 1/16.67 = 6% variance from true presentation time)

It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?

One of the mitigation options here is to only update "is it done" flags at frame boundaries, or some other acceptable/regular timer. (tjr:) Do we handle (via timer-gating?) this case for events to/from a worker?

Flags: needinfo?(jgilbert) → needinfo?(tom)
See Also: → 1666160
See Also: → 1693212

(That said, I love that the truncation to 1.00ms makes all timing code have pretty round numbers usually!)

(In reply to Jeff Gilbert [:jgilbert] from comment #4)

In order for things to go mostly smoothly, I think we need our RAF target interval to be the basis for clamping, or at least related to it. I could see this being 16.67ms clamping, and mandating RAF operate at 60hz maximum under RFP. I think that's a reasonable tradeoff here. Thoughts?

16.67 ms seems like a good starting point for consideration (kind of a PITA to write tests for a non-terminating decimal...)

60hz maximum seems reasonable; is that something easy to implement?

It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?

One of the mitigation options here is to only update "is it done" flags at frame boundaries, or some other acceptable/regular timer. (tjr:) Do we handle (via timer-gating?) this case for events to/from a worker?

If I understand your question, it's roughly "Do we do anything that affects the behavior of <anything> based on how we reduce time precision?" and the answer is not directly. An inability to make assertions about Animations working as before if RTP was applied to them was one example of deciding not to do this.

Fuzzyfox (whose code is still in-tree but will be ripped out at some point soonish) actually delays browser operations - it will sleep the main thread and other threads will await events firing from there; but our timer mitigations today typically munge a time value directly before it is exposed to JS.

In the past we had instances where we were storing munged values and it led to bugs - so I'm not aware of situations where we store a munged value and then base logic on it, but it's not out of the question. But in general, gating browser behavior based on timer clamping/jittering is probably hard, and probably why Fuzzyfox was designed the way it was. I could be wrong about all this, but that's my educated guesswork.

Flags: needinfo?(tom)

RAF rate is controlled via layout.frame_rate, where -1 is 'use vsync' and e.g. 50 means 50Hz. RFP should set that to 60 if it doesn't already. (this will negatively affect e.g. 90hz displays) 30hz is another option that works fine for 90hz, but is a little chunky-feeling.

16.67 is indeed pretty annoying, but we're supposed to avoid magic numbers anyways, eh? :) const MS_60HZ= 1000 / 60;

I think about the best we can do is to use RAF interval as our time-atom. For example, it looks like CSS Animations (for example) only update once a tick, which is good.

Assignee: nobody → jgilbert
Status: NEW → ASSIGNED

FYI, I gave this a r+ so we can move it forward and experience it on Nightly. Tor Browser for Android is based off release (not ESR), but the old behavior can be reverted to in TB by specifying privacy.resistFingerprinting.reduceTimerPrecision.microseconds = 100000. (Well, except for the locking-the-display-to-60Hz aspect of the patch, but that's an improvement to things anyway.)

Flags: needinfo?(gk)

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jgilbert, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jgilbert)
Flags: needinfo?(jmuizelaar)

Bad bot.

Flags: needinfo?(jgilbert)

Jeff - I took a look at the test failures for this patch and think I resolved them. There were some failures on try but they seem unrelated and there's always a host of failures for a really full run.

https://treeherder.mozilla.org/jobs?repo=try&revision=5ed43b930879a7cc1344c35c9619313b0ebe0dde
https://treeherder.mozilla.org/push-health/push?revision=5ed43b930879a7cc1344c35c9619313b0ebe0dde&repo=try&testGroup=pr&selectedTest=dommediatesttestmediarecorderplaybackcanrepeathtml&selectedTaskId=&regressionsGroupBy=path&selectedJobName=dom%2Fmedia%2Ftest%2Ftest_mediarecorder_pause_resume_video.html+test-windows10-32-qr%2Fopt-mochitest-media-fis-gli-e10s-3

The most convincing failure is dom/media/test/test_mediarecorder_playback_can_repeat.html - there are several failures on mediarecorder on Windows so that might possible be something...

I worry that you're missing changes that I had made locally, but hadn't posted yet, but I'm happy to have to chip away at this.
Ping the assignee next time before taking over patches, please!

Assignee: jgilbert → tom

(I also would have preferred that you left the initial changes in my name, so we know who to blame :) )

Attachment #9235229 - Attachment description: Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?jgilbert → Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?tjr
Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d36f5f4ece1e Force 60hz for RFP and use that as the time-atom r=tjr https://hg.mozilla.org/integration/autoland/rev/ce4b4f786c4d Update tests for a higher-precision RFP r=jgilbert

After a bunch of bisections on try; I eventually figured out that the test failures are due to this code:

+  Preferences::RegisterCallback(
+      FrameRatePrefChanged,
+      nsDependentCString(
+          StaticPrefs::GetPrefName_privacy_resistFingerprinting()));

I couldn't reproduce any of these failures locally so I'm not sure why this would be causing a problem. Jeff, would you have a guess? If not, you can pass back the NI and I'll do a bunch of MOZ_LOG debugging...

Flags: needinfo?(tom) → needinfo?(jgilbert)

Hows does this fit in with Bug 1442863 ? Does it solve it?

at the very least animations should be 30fps (33ms) or preferably 60fps (17ms)

FYI, Bug 1589060 is another one that could benefit from 60fps (if you wanted that)

(In reply to Tom Ritter [:tjr] (OOTO until Aug 23rd) from comment #19)

After a bunch of bisections on try; I eventually figured out that the test failures are due to this code:

+  Preferences::RegisterCallback(
+      FrameRatePrefChanged,
+      nsDependentCString(
+          StaticPrefs::GetPrefName_privacy_resistFingerprinting()));

I couldn't reproduce any of these failures locally so I'm not sure why this would be causing a problem. Jeff, would you have a guess? If not, you can pass back the NI and I'll do a bunch of MOZ_LOG debugging...

Oh! We probably just need to dispatch to the main thread for FrameRatePrefChanged, that's probably our threading issue then.

Flags: needinfo?(jgilbert)

Looked at this again rebasing and running your patches. Confirmed that the issue is the listening for the RFP pref change. Remove that and it's green. I wondered if moving back and forth from a framterate of 60 for RFP and -1 from layout.frame_rate was causing an issue, so I set layout.frame_rate to 55 but it didn't make a difference.

At this point I'm wondering what the consequence of not listening for the pref change would be? It wouldn't affect people who have RFP enabled all the time. I don't know how often users toggle RFP on and off in a single session (I do it occasionally myself for sure though...) but would the consequence be significant?

I don't know how often users toggle RFP on and off in a single session (I do it occasionally myself for sure though...) but would the consequence be significant?

Only per eTLD+1, where e.g. workers can retain the original state (even new instances), but the top level document doesn't. My understanding is that when we add site exceptions, it would behave much like Brave's Shields, where toggling the brave mode reloads the page (in our case it might be a message to say reload the page for changes to take affect). With multiple tabs from the same domain open, IDK what happens there (with Brave or what your thoughts are). If/when we give RFP a UI front, toggling it can also show the same message (reload web pages to take affect).

So I guess you really need to decide how RFP should work

Severity: -- → S3

I rebased and sent in a try run which doesn't seem to reproduce the intermittents anymore (possibly after the changes in Bug 1765399 ....)

https://treeherder.mozilla.org/jobs?repo=try&revision=4bfcc119125da296149cdf03ea3141f541cbe124

Flags: needinfo?(jgilbert)
See Also: → 1765399

:D

Flags: needinfo?(jgilbert)

Let's land it then!

The severity field for this bug is relatively low, S3. However, the bug has 3 duplicates.
:tjr, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(tom)

Okay; I've sent in a full try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b61436faa9fd28ac40a138ce2ef3feec185b7cd4 and if that looks good I'll post two new patches for review.

Flags: needinfo?(tom)
Flags: needinfo?(sysrqb)

Push health has a bunch of failures, of course... these ones stood out to me:

gfx/layers/apz/test/mochitest/test_group_keyboard.html fails on all (I think?) Windows platforms - but this test has a bunch of different intermittent bugs: Bug 1761669, Bug 1771835, Bug 1771836, Bug 1659729....

browser/base/content/test/performance/browser_startup_flicker.js seems like a relevant test to this change; and it fails consistently on one flavor of platform: x64 opt devedition-qr swr fis - so it makes me think it might be okay if it's not failing on any other platform?

browser/base/content/test/performance/browser_vsync_accessibility.js also seems relevant - it fails on x64 opt linux qr swr fis only.

I'm going to post the patches and let you take a look at the results - let me know what you think we should do.

Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ca7e6cc96539 Force 60hz for RFP and use that as the time-atom r=jgilbert https://hg.mozilla.org/integration/autoland/rev/25dd05e29a4b Update tests for a higher-precision RFP r=jgilbert
Regressions: 1772418
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch

tested: clearly a lot smoother: https://ciechanow.ski/mechanical-watch/ - I played them side by side in nightly vs beta with RFP

[Tracking Requested - why for this release]: It would be great if we could uplift this to 102 so Tor Browser can get it in the ESR without them needing to track the patch themselves.

Tom, could you make an uplift request? Thanks

Comment on attachment 9279416 [details]
Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?jgilbert

Beta/Release Uplift Approval Request

  • User impact if declined: Tor will have to carry an additional patchset
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Only affects Resist Fingerprinting Mode
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(tom)
Attachment #9279416 - Flags: approval-mozilla-beta?
Attachment #9279417 - Flags: approval-mozilla-beta?

Comment on attachment 9279416 [details]
Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?jgilbert

Approved for 102 beta 6, thanks.

Attachment #9279416 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9279417 [details]
Bug 1692609 - Update tests for a higher-precision RFP r?jgilbert

Approved for 102 beta 6, thanks.

Attachment #9279417 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Regressed by: 1773560
No longer regressed by: 1773560
Regressions: 1773560
Regressions: 1774247
Duplicate of this bug: 1589060
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: