Date.now() is 3.3x slower in Firefox than Chrome (related to ReduceTimerPrecision)
Categories
(Core :: Security, defect, P2)
People
(Reporter: jrmuizel, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
About half of this slowness comes from mozilla::nsRFPService::ReduceTimePrecisionAsUSecsWrapper
Here are profiles:
Chrome: https://share.firefox.dev/3Rre1bA
Firefox: https://share.firefox.dev/44WtcfK
This shows up on the Backbone Speedometer3 test.
Making Firefox as fast as Chrome on this function would reduce its time on that test by 0.6%.
Comment 1•1 year ago
I think I've also seen ReduceTimePrecisionAsUSecsWrapper show up elsewhere.
Comment 2•1 year ago
Inside ReduceTimePrecisionAsUSecsWrapper:
- 23% is the actual precision reduction. Inside of that (numbers should add to 23%):
  - 5.2% is floor() in ucrtbase. I wonder if fdlibm would be faster.
  - 4.1% is logging
  - 6.2% is adding jitter
  - 7.5% is the rest
- 36% (!) is GetRTPCallerType(). Inside of that (numbers add to 36%):
  - 20% is CrossOriginIsolated()
  - 10% is PrincipalOrNull()
  - 6% is the rest
- 31% is xpc::CurrentNativeGlobal()
The lowest-hanging fruit here would be (a rough sketch of the memoization follows below):
- Confirm a document can't become or lose cross-origin isolation after it's been set up.
- Memoize CrossOriginIsolated()
- Memoize GetRTPCallerType()
After that, maybe there's something related to CurrentNativeGlobal() that could be improved? Change the way the callback works, or pass more state through it?
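For illustration, a minimal, self-contained sketch of that memoization, assuming the caller type really is fixed once a global is set up. The class, enum, and member names are invented for the sketch and are not the actual nsIGlobalObject code:

```cpp
#include <optional>

// Hypothetical stand-ins for the real types; only the caching pattern matters.
enum class RTPCallerType { Normal, SystemPrincipal, CrossOriginIsolated };

class GlobalObjectSketch {
 public:
  RTPCallerType GetRTPCallerType() {
    if (!mCachedCallerType) {
      // Expensive path in the real code: CrossOriginIsolated(),
      // PrincipalOrNull(), etc. With the cache we only pay for it once
      // per global instead of on every Date.now() call.
      mCachedCallerType = ComputeRTPCallerType();
    }
    return *mCachedCallerType;
  }

 private:
  RTPCallerType ComputeRTPCallerType() {
    // Placeholder for the existing slow computation.
    return RTPCallerType::Normal;
  }

  std::optional<RTPCallerType> mCachedCallerType;
};
```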
Reporter
Comment 3•1 year ago
What is the CurrentNativeGlobal/GetRTPCallerType stuff for?
Comment 4•1 year ago
We use CurrentNativeGlobal to get the GlobalObject from the JSContext - this wrapper is used from SpiderMonkey.
GetRTPCallerType() is used to determine how much we need to reduce the precision of the timestamp.
Reporter
Comment 5•1 year ago
(In reply to Tom Ritter [:tjr] from comment #4)
> We use CurrentNativeGlobal to get the GlobalObject from the JSContext - this wrapper is used from SpiderMonkey.
> GetRTPCallerType() is used to determine how much we need to reduce the precision of the timestamp.

Can the decision on how much to reduce precision be made per process instead of per global?
Comment 6•1 year ago
I think we should compute GetRTPCallerType() when constructing the realm and set it like all other resist fingerprinting settings, for example setForceUTC. The JS engine can then pass RTPCallerType directly to ReduceTimePrecisionAsUSecsWrapper.
We should probably also ifdef this code block: https://searchfox.org/mozilla-central/rev/7499890dc8f116a9e40f4a689a251a0311a9f461/toolkit/components/resistfingerprinting/nsRFPService.cpp#563-576
Maybe some of the math could also be optimized.
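A self-contained sketch of the shape of this (all names below are hypothetical, not the real SpiderMonkey/Gecko API): the caller type is computed once when the realm is constructed and then handed straight to the precision-reduction call, so the hot path never has to look it up:

```cpp
enum class RTPCallerType { Normal, SystemPrincipal, CrossOriginIsolated };

struct RealmOptionsSketch {
  // Filled in at realm-construction time, next to the other
  // resist-fingerprinting options (the real code sets e.g. setForceUTC
  // at the same point).
  RTPCallerType rtpCallerType = RTPCallerType::Normal;
};

// No xpc::CurrentNativeGlobal()/GetRTPCallerType() lookup needed here;
// the JS engine passes in the value cached on the realm.
double ReduceTimePrecisionSketch(double aTimeUSec, RTPCallerType aCallerType) {
  // ... clamp aTimeUSec to the resolution implied by aCallerType ...
  return aTimeUSec;
}

double DateNowSketch(const RealmOptionsSketch& aRealm, double aRawUSec) {
  return ReduceTimePrecisionSketch(aRawUSec, aRealm.rtpCallerType);
}
```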
Comment 7•1 year ago
Some of these improvements, especially the low-hanging fruit from comment 2 and the suggestion from comment 6, seem doable.
Tom Schuster: Is that something we can do within the 120 cycle (we merge on September 25th)?
Comment 8•1 year ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5)
> (In reply to Tom Ritter [:tjr] from comment #4)
> > We use CurrentNativeGlobal to get the GlobalObject from the JSContext - this wrapper is used from SpiderMonkey.
> > GetRTPCallerType() is used to determine how much we need to reduce the precision of the timestamp.
> Can the decision on how much to reduce precision be made per process instead of per global?

No, a process can contain very differently 'privileged' (for lack of a better word) documents.
Reporter
Comment 9•1 year ago
(In reply to Tom Ritter [:tjr] from comment #8)
> No, a process can contain very differently 'privileged' (for lack of a better word) documents.

Can you give an example?
Comment 10•1 year ago
A page may be exempted from fingerprinting protections (RFP right now, but potentially FPP in the future) in one context, but not in the other. Similarly, a page may have one level of time precision, but a web extension's content script another.
Neither of these is a very common case right now, so we could create a fast path and a slow path; that would shave off about 75% of the call time, I suppose.
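A rough sketch of what such a split could look like (the flag and function names are hypothetical, and the real exemption checks are more involved): consult a cheap per-process flag first and only fall back to the per-global lookup when the process actually mixes precision levels:

```cpp
#include <atomic>
#include <cmath>

// Hypothetical process-wide flag: set to true as soon as any global in this
// process gets a non-default timer precision (exempted page, content script
// with a different level, ...).
std::atomic<bool> gProcessHasMixedTimerPrecision{false};

double ClampToResolution(double aTimeUSec, double aResolutionUSec) {
  return std::floor(aTimeUSec / aResolutionUSec) * aResolutionUSec;
}

double ReduceTimePrecisionSketch(double aTimeUSec,
                                 double aDefaultResolutionUSec,
                                 double (*aSlowPath)(double)) {
  if (!gProcessHasMixedTimerPrecision.load(std::memory_order_relaxed)) {
    // Fast path: every global in this process uses the default resolution,
    // so the per-global/principal lookup can be skipped entirely.
    return ClampToResolution(aTimeUSec, aDefaultResolutionUSec);
  }
  // Slow path: the existing per-global computation.
  return aSlowPath(aTimeUSec);
}
```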
Comment 11•1 year ago
I think implementing what I suggested in comment 6 should be possible to do during the next cycle (120).
Comment 13•1 year ago
I have a WIP that implements enough to test this. Just removing the xpc::CurrentNativeGlobal and GetRTPCallerType calls reduces the wall-clock time from 150ms to 100ms in a micro-benchmark that just calls Date.now in a loop.
Comment 14•1 year ago
Jeff, could you produce an updated profile when you have the time?
Comment 15•1 year ago
I've created my own profile based on a micro-benchmark as well.
We still spend 53% of our time under Date.now in mozilla::nsRFPService::ReduceTimePrecisionImpl:
- 11% mozilla::nsRFPService::RandomMidpoint (most of the time seems to be the long division/modulo?)
- 7.1% mozilla::detail::log_test (I think we should probably just ifdef away all logging from this function; it's probably useless unless we are explicitly trying to debug something - see the sketch below)
I am going to try to optimize the math here a bit, but all those conversions and floating-point operations just seem slow.
Also:
- 5.7% JS::TimeClip (this seems surprisingly slow to me?)
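A self-contained sketch of compiling the logging out of this function (the macro and define names are hypothetical; the real code uses MOZ_LOG with its own log module): in normal builds the call expands to nothing, so neither the log_test check nor the argument formatting shows up on the hot path:

```cpp
#include <cstdio>

#ifdef DEBUG_TIMER_PRECISION  // hypothetical opt-in define for debugging
#  define TIMER_PRECISION_LOG(...) std::printf(__VA_ARGS__)
#else
#  define TIMER_PRECISION_LOG(...) \
    do {                           \
    } while (0)
#endif

// Example call site inside the precision-reduction code:
//   TIMER_PRECISION_LOG("Clamped %f usec to %f usec\n", timeUSec, clamped);
```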
Comment 16•1 year ago
Looking at the profile, it seems like the following line is very slow:
long long clamped = floor(double(timeAsInt) / resolutionAsInt) * resolutionAsInt;
I tried replacing that with various flooring integer divisions, but the time usually got slower... (one such variant is sketched below)
I would be curious to know if someone from the perf team sees some easy wins in the profile.
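For reference, one flooring integer-division variant one might try here looks like the sketch below (an illustration, not the patch that was actually benchmarked); it avoids the int -> double -> int round trip for the quoted line:

```cpp
#include <cstdint>

// floor(aValue / aResolution) for 64-bit integers. Plain integer division
// truncates toward zero, which rounds the wrong way for negative values,
// so step the quotient down once in that case.
inline int64_t FloorDiv(int64_t aValue, int64_t aResolution) {
  int64_t quotient = aValue / aResolution;
  int64_t remainder = aValue % aResolution;
  if (remainder != 0 && ((remainder < 0) != (aResolution < 0))) {
    --quotient;
  }
  return quotient;
}

// Usage corresponding to the quoted line:
//   long long clamped = FloorDiv(timeAsInt, resolutionAsInt) * resolutionAsInt;
```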
Reporter
Comment 17•1 year ago
(In reply to Tom Schuster (MoCo) from comment #16)
> Looking at the profile, it seems like the following line is very slow:
> long long clamped = floor(double(timeAsInt) / resolutionAsInt) * resolutionAsInt;

This code seems to have a lot of conversions to and from integers. Is there a reason all the math can't just be done on doubles?
e.g. Why does the resolution need to be converted to an integer?
Comment 18•1 year ago
The comment above that code has some explanation. As I said, I tried not doing the double conversions (by using various integer floor_div implementations), and somehow it wasn't faster.
Reporter
Comment 19•1 year ago
(In reply to Tom Schuster (MoCo) from comment #18)
> The comment above that code has some explanation. As I said, I tried not doing the double conversions (by using various integer floor_div implementations), and somehow it wasn't faster.

That comment explains why it does the division with doubles, not why it converts aResolutionUSec to an integer and then back to a double.
I'm suggesting doing everything as doubles instead of trying to do the division on integers.
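A minimal sketch of the all-doubles clamping being suggested here (an illustration of the idea, not the actual nsRFPService change):

```cpp
#include <cmath>

// Keep the resolution as a double throughout; never round-trip through
// integers. floor(time / resolution) picks the start of the containing
// bucket, and multiplying back by the resolution yields the clamped value.
inline double ClampToResolution(double aTimeUSec, double aResolutionUSec) {
  return std::floor(aTimeUSec / aResolutionUSec) * aResolutionUSec;
}
```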
Comment 20•1 year ago
Trying to page this back in. I know at one point we use double -> int conversion to truncate sub-microsecond precision:
// Cut off anything less than a microsecond.
long long timeAsInt = timeScaled;
I suppose we could floor for that instead...
I've sent in a try run with an assertion to try and suss out any inconsistencies:
https://treeherder.mozilla.org/jobs?repo=try&revision=0dfb8e964cfdae754b38c6a7f8e8c21611703032
Comment 21•1 year ago
We could just assert that aResolutionUSec is a whole number. I tried changing the parameter to an integer before and it didn't seem to change anything either.
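A tiny sketch of that assertion (plain assert here for illustration; the real code would use MOZ_ASSERT):

```cpp
#include <cassert>
#include <cmath>

inline void AssertWholeResolution(double aResolutionUSec) {
  // The clamping logic assumes the resolution is a whole number of
  // microseconds; catch any caller that passes a fractional value.
  assert(std::floor(aResolutionUSec) == aResolutionUSec &&
         "timer precision resolution should be a whole number of microseconds");
}
```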
Comment 22•3 months ago
The bug assignee is inactive on Bugzilla, so the assignee is being reset.