Open Bug 1887161 Opened 8 months ago Updated 2 months ago

In private windows, canvas getImageData is slow due to GenerateCanvasKeyFromImageData

Categories

(Core :: Privacy: Anti-Tracking, defect, P2)

defect

Tracking

()

Performance Impact high
Tracking Status
firefox-esr115 --- unaffected
firefox124 --- wontfix
firefox125 --- wontfix
firefox126 --- fix-optional

People

(Reporter: mstange, Assigned: timhuang)

References

(Regression)

Details

(Keywords: leave-open, regression)

Attachments

(3 files)

In private windows, the performance of map zooming on airbnb and Google Maps is much reduced, because getImageData spends a lot of time in mozilla::nsRFPService::GenerateCanvasKeyFromImageData. https://share.firefox.dev/3IQ5hpN

This was originally filed as bug 1876532 and bug 1884711.

These are both big sites and their performance matters, even in private windows.

Regressed by: 1858181
No longer regressed by: 1849903

Set release status flags based on info from the regressing bug 1858181

:timhuang, since you are the author of the regressor, bug 1858181, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(tihuang)

Set to S3 given that this is only affected private windows.

Assignee: nobody → tihuang
Severity: -- → S3
Flags: needinfo?(tihuang)

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:timhuang, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit BugBot documentation.

Flags: needinfo?(tihuang)
Flags: needinfo?(tihuang)

I think this bug may need a higher severity than S3. The websites affected are quite popular, and I expect many users do use them in private windows. Having a really bad performance experience on these websites is a push factor which could lead to loss of users who use these sites heavily.

Duplicate of this bug: 1876532
Duplicate of this bug: 1884711
No longer blocks: 1876532, 1884711
Severity: S3 → S2

Does this issue only affect Window? Raul, can you see the same janky behavior on MAC or Linux? Thanks.

Flags: needinfo?(rbucata)

It's the same code on all the platforms - do you have reason to believe it doesn't affect the other platforms?

Can you summarize what the code is doing? Why does it need to compute a hash of the canvas contents?
Do you know which canvases this function is being called on on the two sites? How large is the canvas, and do the contents vary from call to call? You can use gfxUtils::WriteAsPNG to dump the DrawTarget contents to files.

Hi Tim. Should I test the mentioned duplicate bugs, in private windows? Or is there a specific test case I should follow?

Flags: needinfo?(rbucata)

Please test the duplicate bugs in private windows.

(In reply to Markus Stange [:mstange] from comment #8)

It's the same code on all the platforms - do you have reason to believe it doesn't affect the other platforms?

Sorry for the late response.

I don't see the same janky behavior, and the profile that I captured on my MAC doesn't show that GenerateCanvasKeyFromImageData() takes a lot of time. So, I wonder if the speed of calculating SHA256 differs from platform to platform.

Can you summarize what the code is doing? Why does it need to compute a hash of the canvas contents?

The code generates a randomization value per canvas. We need to ensure the random noises differ from canvas to canvas so the fingerprinter cannot easily cancel them. To generate the random value, we take the entire canvas data and the per-site random key.

Do you know which canvases this function is being called on on the two sites? How large is the canvas, and do the contents vary from call to call? You can use gfxUtils::WriteAsPNG to dump the DrawTarget contents to files.

Both GoogleMaps and Airbnb call CanvasRenderingContext2D: getImageData() to render 64 KB canvas and the contents vary from call to call.

So right now, we use an HMAC using SHA-256; and the key is our random key from the cookie jar. We feed in the entire image contents. We take the output of the HMAC and use it to feed an XorShift128+ RNG to tell us which bits of the image we will affect. XorShift128+ isn't cryptographically secure, but we rely on its seed being both cryptographically secure and secret - so even though you can see its output, you can't know or control its input or how many bits we take out of it.

HMAC on the inside is two hashes, one of the long message, the second of a fixed-length much shorter string.

There's a bunch of questions here to make it faster:

  • Do we need to feed the entire image into the random-making process?
  • Do we need to use an HMAC construct, or can we cut corners? (e.g. do we care about protecting against length extension attacks?)
  • Do we need to use SHA-256, or can we use something faster?
  • In the realm of cryptographic hash functions - is there anything faster? SHA-256 is accelerated with the Intel SHA extensions (as is SHA-1)
  • Do we need to use a cryptographic hash function?

Some attacks I can imagine are:

  • An attacker observes the output GenerateCanvasKeyFromImageData, knows the image input data, and uses it to determine its origin's random key. Then they can subtract out the randomness and it's like it was never there.
  • An attacker controls the image input data and uses it to flush out any undesired random state inside the generator, resulting in deterministic randomness being generated.

Ditching HMAC-SHA256 and using XorShift128+ isn't a panacea - the initial state for XorShift128+ is two 64 bit values, whereas HMAC accepts an arbitrary length input - so the API prevents a drop-in replacement.

I don't see any trivial wins here, I think anything we do will need to be tested performance wise to see if we're really making a difference in the benchmark. I could imagine feeding a subset of the image into HMAC, chosen at random (e.g. every nth byte) - this wouldn't reduce the memory read required but the amount of data HMAC is processing would decrease.

If we have BLAKE2b/BLAKE2s that might be faster than SHA-256....

Is there some higher level background information on how this feature works? Is it explicitly trying to make it so that the randomization is deterministic depending on the contents of the canvas?

Flags: needinfo?(tom)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #14)

Is there some higher level background information on how this feature works? Is it explicitly trying to make it so that the randomization is deterministic depending on the contents of the canvas?

Correct. We perturb between 20 and 255 pixels. For each pixel we perturb one color channel, and xor either 0x2 or 0x1 into the value. The pixels chosen, the color channel chosen, and the value xor-ed into, are all deterministic based on the value of the canvas, the first and third-party origin, and a per-session key. We also do not randomize a canvas if it is a constant color.

The reasoning behind this is to frustrate simple techniques to render the same canvas multiple times and either average out the noise added or detect the noise and invalidate the fingerprint produced (which we can see happen in some fingerprinting scripts). We change the randomness based on first and third-party origin to prevent a user's canvas fingerprint being correlated across origins. The per-session key causes a new seed to be used on browser restart (or when PBM resets, or when a site's data is cleared) - these choices make more sense when one has no other identifiers (e.g. cookies) for a site and when one's IP address isn't an identifier itself, but we do what we can.

At the end of the day, naive fingerprinting techniques will not correlate a user across domains successfully using canvas; and more sophisticated fingerprinting techniques (of which we've seen almost none) will be exposed in the client-side behavior of which and how canvases are rendered. (So we can tell they're working to bypass the protections.)

Flags: needinfo?(tom)
Component: Graphics: Canvas2D → Privacy: Anti-Tracking

This seemed interesting so I decided to poke around a bit for fun and wanted to share my findings in case they're helpful for anyone else.

Observations

If I go to airbnb.com in Private Browsing mode select a location, pick dates and then search, I get 109 calls to nsRFPService::GenerateCanvasKeyFromImageData(), via this (partial) call stack:
nsRFPService::GenerateCanvasKeyFromImageData(nsICookieJarSettings* aCookieJarSettings, uint8_t* aImageData, uint32_t aSize, nsTArray<uint8_t>& aCanvasKey)
nsRFPService::RandomizePixels(nsICookieJarSettings* aCookieJarSettings, uint8_t* aData, uint32_t aWidth, uint32_t aHeight, uint32_t aSize, gfx::SurfaceFormat aSurfaceFormat)
CanvasRenderingContext2D::GetImageDataArray(JSContext * aCx, int aX, int aY, unsigned int aWidth, unsigned int aHeight, nsIPrincipal & aSubjectPrincipal, JSObject * * aRetval)
CanvasRenderingContext2D::GetImageData(JSContext * aCx, int aSx, int aSy, int aSw, int aSh, nsIPrincipal & aSubjectPrincipal, mozilla::ErrorResult & aError)
...

There are multiple calls to GetImageData()/GetImageDataArray() on the same CanvasRenderingContext2D object. According to the parameters to GetImageDataArray(), we always have the starting coordinate of <0, 0> but differing sizes. The sizes are generally small (only once is the size > 3500 pixels) but the calculations/modifications in GenerateCanvasKeyFromImageData()/RandomizePixels() are using the entire 262144 byte imageData buffer.

In each call to GenerateCanvasKeyFromImageData(), "imageData" is 262144 bytes in size and is almost all (>99%) zeroes, with some interesting patterns of non-zero entries. The first non-zero entry occurs about 12000 bytes in, and the non-zero bytes occur in bursts of 3-10, and always spaced four bytes apart. The non-zero bytes are always the fourth byte of each four-byte integer. Given that this is image data this is the R or the A channel, but I'm not sure which one.

randomKey is the same in each of these calls (which is as I expected).

Despite the fact that we are using the same CanvasRenderingContext2D object on subsequent calls and randomKey is the same on each call, we compute a different canvasKey each time. Digging a little bit further, it seems that imageData is different in each call. It is still mostly zeroes even though it has changed.

Of the three functions involved in computing the HMAC, the middle one takes the most time in my measurements. In taking a quick look at what is happening in there, it seems that there is some locking involved.

Maybe I'm just not looking in the right place (although I think I am) but, in stepping through the disassembly, I'm not seeing any special Intel SHA extension instructions being used anywhere. (The implementation that is being used is nss3.dll which is also calling into softokn3.dll.)

One additional guess for why this functionality is costly here: we could be getting many cache misses, and possibly cache thrashing, as we're traipsing through this 262144-byte buffer multiple times.

Questions

I was initially thinking that, rather than re-calculating canvasKey each time over and over again, we could cache it in the CanvasRenderingContext2D. However, as stated above, imageData is actually different on each call. Is this expected? If CanvasRenderingContext2D::GetImageData() is called multiple times is the underlying image data supposed to change each time? Or must it be that the JS code from the site that is changing it between calls?

Why is imageData mostly zero? I was expecting to see something that resembled the images on-screen - you know, actual images. Am I misinterpreting what this data represents?

I don't understand why an HMAC calculation needs locking. Am I missing something?

Are we expecting that Intel SHA extension instructions were going to be getting used for this calculation?

Finally, as already asked by Tom already above, do we need to use a cryptographic MAC/hash? Is the concern that using something else would allow an attacker to potentially find out a secret key by comparing the altered image to the "pristine" one?

Thanks for this!

(In reply to Justin Link from comment #16)

I was initially thinking that, rather than re-calculating canvasKey each time over and over again, we could cache it in the CanvasRenderingContext2D. However, as stated above, imageData is actually different on each call. Is this expected? If CanvasRenderingContext2D::GetImageData() is called multiple times is the underlying image data supposed to change each time? Or must it be the JS code from the site that is changing it between calls?

Yeah, as the canvas is updated by JavaScript, the imagedata will change.

Why is imageData mostly zero? I was expecting to see something that resembled the images that on-screen - you know, actual images. Am I misinterpreting what this data represents?

I have no idea. I wouldn't be that surprised if Google Maps is doing complicated things that results in this?

I don't understand why an HMAC calculation needs locking. Am I missing something?

I guess you could share the hmac object across threads, although that seems crazy to me. Is the locking actually slowing anything down? AIUI it's locking the digest context so multiple threads can't try to update it at the same time. But the context is this hmac local variable and is only used on this one thread. So there should be no contention and the locking should be... ~free? Or is locking always perf-heavy even if there's no contention?

Are we expecting that Intel SHA extension instructions were going to be getting used for this calculation?

It's going to depend on your computer, but odds are you have the CPU extension. I'll redirect to Dennis, he might know.

Finally, as already asked by Tom already above, do we need to use a cryptographic MAC/hash? Is the concern that using something else would allow an attacker to potentially find out a secret key by comparing the altered image to the "pristine" one?

I would say a sub-question of that is: how much of the perf cost is the calculations of SHA-1, and how much is the memory traipsing :)

Flags: needinfo?(djackson)

I don't understand why an HMAC calculation needs locking. Am I missing something?

It's a fair question. The short answer is that NSS was originally designed to support a variety of hardware modules and shared state through a stateful PKCS#11 interface. I don't expect any kind of locking contention to be an issue in practice because of how we call NSS, but any hints that it might be would certainly prompt us to investigate.

Are we expecting that Intel SHA extension instructions were going to be getting used for this calculation?

Yes, they ought to be if the platform supports them. We might need to take a look at this.

do we need to use a cryptographic MAC/hash?

HMAC might be a little overkill but I'd be surprised if its a material difference. It's one extra hash invocation. I expect we'd need to use a cryptographic hash in this context. Otherwise crafted images were break the scheme as Tom notes.

There are multiple calls to GetImageData()/GetImageDataArray() on the same CanvasRenderingContext2D object but calculations/modifications in GenerateCanvasKeyFromImageData()/RandomizePixels() are using the entire 262144 byte imageData buffer.

I might be a bit confused about how the scheme is working here, but can't the CanvasKey be cached with this object? So we only do a single HMAC for this imageData buffer and then have it available for all further GetImageData calls?

Flags: needinfo?(djackson)

Are we expecting that Intel SHA extension instructions were going to be getting used for this calculation?

Yes, they ought to be if the platform supports them. We might need to take a look at this.

John and I quickly tested in local Firefox builds on x86-64 and Apple Silicon, in both cases we're dispatching to the correct native instructions (SHA256_Update_Native). We noticed we're not seeing coverage for that in CI - https://coverage.moz.tools/#revision=latest&path=security/nss/lib/freebl/sha512.c&view=file&line=214 - so likely something isn't being set there.

Currently, we calculate everything using HMAC, including the canvas noises and the randomization key based on the first-party site. The randomization key is computed using the top-level site and a session key. We only compute this key once per tab when loading the top-level page; then, it's cached and used for generating canvas noises.

We also use HMAC to generate the canvas noises. It takes the randomization key and the entire canvas data to generate the noises. Then, we apply the noises, as Tom mentioned in comment 15. The performance bottleneck happens here, and we cannot cache the canvas noises because the canvas data can vary for the same canvas element.

We can improve the performance by changing the hash algorithm to calculate the canvas noises. However, I am not sure what the best alternative is that can provide enough protection to prevent attackers from breaking the hash algorithm to get the randomization key. I think there are two candidates.

  • BLAKE2b/BLAKE2s
  • SipHash
    I think BLAKE2b/BLAKE2s might be the better option, but I don't know if there is an implementation we can use in Gecko for BLAKE2b/BLAKE2s. SipHash is a non-cryptographic hash function that is less strong than HMAC or BLAKE2b/BLAKE2s, but maybe it can provide enough security protection for the purpose we need.

Dennis, could you share your insight here? Thanks.

Flags: needinfo?(djackson)

There's some headroom in changing the cryptographic hash algorithm but there's no single winner. If you take a look at the benchmarks here from 2022, you can see that SHA-256 with hardware support is ~2x as fast as blake2b / blake2s. Without hardware support, blake2b is ~2x faster.

SipHash is right on the cusp of cryptographic / non-cryptographic but I think its not unreasonable for this case. The risk would be that an attacker who triggered a high number of evaluations (~2^32 or so I think) would be able to force the SipHash derived key to collide and so remove the noise. If that's a tolerable risk, it might be reasonable to migrate. I would suggest doing a performance measurement first before committing though.

Flags: needinfo?(djackson)

The patch implements using SipHash to generate canvas randomization key.
The implementation is behind a pref
"privacy.resistFingerprinting.randomization.canvas.use_siphash".

Depends on D216277

Priority: -- → P1
Keywords: leave-open
Pushed by tihuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/856d5366ba4b Add a profile marker for nsRFPService::GenerateCanvasKeyFromImageData() r=fkilic https://hg.mozilla.org/integration/autoland/rev/bd63a7c8f39e Implement using SipHash to generate canvas randomization key. r=fkilic
Flags: needinfo?(tihuang)
Priority: P1 → P2
See Also: → 1838856
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: