Closed Bug 1941091 Opened 9 months ago Closed 9 months ago

RFP affects data collection

Categories

(Core :: Privacy: Anti-Tracking, defect, P3)

defect

Tracking

()

RESOLVED FIXED
136 Branch
Tracking Status
firefox136 --- fixed

People

(Reporter: fkilic, Assigned: fkilic)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(2 files, 5 obsolete files)

We only call document's should resist fingerprinting on canvas and since we run our data collection with whatever document we have, we end up applying randomization or outright return placeholder.

See https://searchfox.org/mozilla-central/rev/29e186485fe1b835f05bde01f650e371545de98e/dom/canvas/nsICanvasRenderingContextInternal.cpp#111 for the check. Solution is probably doing a one time exception to toDataURL call and check the principal there.

So that's bad.... How did you figure this out, and where do we see this happening in the data?

I like Variant 2 better, but could we put the double-check in this function? Or does it need to be lower in OffscreenCanvas (and Document??)?

I was comparing two pings, had rfp on and realized some things were different between two pings. Things that are different, on my machine, are frame rate, pointer info, and canvases. I'll look into the code afterwards to fix everything that might be different but are not because my machine matches with the spoofed values.

we put the double-check in this function? Or does it need to be lower in OffscreenCanvas (and Document??)?

I mean we could but to lower the performance impact, I didn't put it there. This change will only affect toDataURL call of Canvas2D (not even webgl webgpu etc.) IFF canvas randomization is on

Attachment #9459185 - Attachment is obsolete: true
See Also: → 1941388
See Also: → 1941389

It seems like canvas, pointer info, window info and frame rate is the only thing that's affected. code executed in usercharacteristics.html is fine, code executed in UserCharacteristicsPageService.sys.mjs is also fine. The issue is with the actors. Should RFP depends on document and our canvas, window, and pointer code runs in a child actor. The patch attached to this bug and the see alsos should fix these issues.

For frame rate, there isn't much we can do. Framerate is initialized once and set globally, we don't spoof it based on document etc.

See Also: → 1941590
See Also: 1941388
See Also: 1941389
Summary: Canvas randomization/placeholder affects fingerprinting data collection → RFP affects data collection
Attachment #9459184 - Attachment is obsolete: true
Severity: -- → S3
Priority: -- → P3
Blocks: fpdata
Depends on: 1941590
See Also: 1941590
Attachment #9459389 - Attachment description: Bug 1941091: Patch some of ShouldResistFingerprinting functions. r?tjr → Bug 1941091: Add system principal check to nsIGlobalObject::ShouldResistFingerprinting functions. Variant 1. r?tjr

To allow actors access non-rfp'ed values, we need to check for principal and don't apply RFP if it is a system principal.

Attachment #9459735 - Attachment is obsolete: true

To allow actors access non-rfp'ed values, we need to check for principal and don't apply RFP if it is a system principal.

Okay, so my understanding is that on the Main Thread of a website there is one JSContext object that lives for the life of the page. That JSContext object can have its principal change - using a stack-based model - when other 'things' run code in the context of the page. The two big examples are an Extension's Content Script and code from an actor. The latter gets the System Principal. Therefore the principal of a page might be a ContentPrincipal of e.g. example.com but the principal of the JSContext might be the System Principal. (Or maybe an ExpandedPrincipal. But definetly it could be the System Principal.)

The problem we're dealing with is that when System-privileged code is running in the context of the page we don't want anti-fingerprinting behaviors to apply to that code. But that page's principal and all the indicators say "Apply fingerprinting protections". But checking the JSContext (at time of the call, you can't do it ahead of time) can be a last-minute savior. And we unfortunately seem to need it.

As an aside, we aren't planning on exempting Extensions' Content Scripts - there is a logical reason to do so, and if someone came and asked we might, but presently it seems like a footgun where an errant extension could leak data to the page. It'd also be a footgun for bugs - if an extension read the timestamps from the performance object (like PerformanceMarks) - those timestamps wouldn't match the ones the page sees!

So getting the JScontext on the main thread isn't a big deal, the thing we've been grappling with is off-main-thread. There's Stylo, but we just exclude all situations where IsInServoTraversal() is true and don't worry about it. The only non-main-thread situations we would care about are where there is running unprivileged javascript that gets temporarily elevated to privileged. And my theory is this doesn't happen.

Elevated javascript (on the main thread) can spawn a worker, but if they do, that worker will be System Privileged for its entire life - it isn't temporarily elevated. (I think. That's what I'm inferring from this code anyway. )

Worklets are much more simple things, I doubt anyone has ever tried to spawn a worklet from elevated javascript. If they did this code looks like it would not inherit any privilege from the elevated JSContext.

As I said - my theory is that one cannot temporarily elevate the privileges of the JSContext attached to a Worker or Worklet, or more precisely - any off-main-thread GlobalObject (which are the objects with a JSContext.)

To test this I put code in IsRFPEnabledFor():

  auto* workerCx = dom::GetCurrentWorkerThreadJSContext();
  if (workerCx) {
    JS::Realm* realm = js::GetContextRealm(workerCx);
    MOZ_ASSERT(realm);

    JSPrincipals* principals = JS::GetRealmPrincipals(realm);
    auto workerCtxPrin = nsJSPrincipals::get(principals);

    auto* wp = dom::GetWorkerPrivateFromContext(workerCx);
    MOZ_ASSERT(wp, "Did not get WorkerPrivate from context");
    MOZ_ASSERT(
        !workerCtxPrin || !wp ||
            wp->UsesSystemPrincipal() == workerCtxPrin->IsSystemPrincipal() ||
            workerCtxPrin->GetIsNullPrincipal(),
        "System Principal mismatch between WorkerPrivate and Context");
  }

It looks for a situation where a Worker's Principal is not System but the JSContext's principal is. Running this code on the test suite produces no crashes.

Note that workerCtxPrin->GetIsNullPrincipal() is needed because a worker (privileged or not) can call SimpleGlobalObject::Create (through at least SecurityPolicyViolationEventInit::Init, WorkerDebuggerGlobalScope::CreateSandbox, and JsonWebKey::Init) - and the Worker will have the SystemPrincipal, but this GlobalObject (living on the same thread as the Worker) will not. Interestingly I found that it won't have a principal at all (because it's a worker thread and we pass nullptr there.) jandem did find a comment on Bug 1246153 "We shouldn't be creating any JS globals on the main thread without a principal. The best thing to do here is probably to check NS_IsMainThread, and mint a new nsNullPrincipal in that case." Which kind of explains how we got here, but not why having the principal being nullptr is okay and it doesn't need to be a NullPrincipal. I'm guessing that it should be a NullPrincipal from a correctness standpoint, but it also just kind of works with a nullptr as far as we know because we are deep deep in situations that are unlikely to occur naturally.

As another aside "deep deep in situations that are unlikely to occur naturally" is the perfect place to find security bugs, and making the wrong decision about a principal is a great one. What could happen is we have an off-the-main-thread JS Sandbox or Global or something that has a null principal, and when we do some sort of security check on it, we pass the check if the principal is null instead of performing some sort of privilege check that a non-SystemPrincipal would fail. I spot-checked this situation, and nothing jumped out at me from this list. Although I did see that this could be a null pointer derefence? In general I'd still be more comfortable if we made this a NullPrincipal. (BONUS ASIDE: nika said we can use the NullPrincipal, so I'm testing that.)

Popping the stack back to the theory is that one cannot temporarily elevate the privileges of the JSContext attached to a Worker or Worklet - I tried to Debug a Worker in DevTools and muck with an object in that Worker from the DevTools console just in case that triggered anything but it did not.

So, given the above investigate and the data that seems to support this theory, we only need to perform this check on the main thread, but a DEBUG check that watches Workers is advisable.

I have a patch and am testing it on try.

The try run here and the two patches 1 2 has some test failures and some assertions.

#01: nsContentUtils::GetCurrentJSContext() [dom/base/nsContentUtils.cpp:6438]
#02: mozilla::nsRFPService::IsRFPEnabledFor(bool, mozilla::RFPTarget, mozilla::Maybe<mozilla::RFPTarget> const&) [toolkit/components/resistfingerprinting/nsRFPService.cpp:302]
#03: nsContentUtils::ShouldResistFingerprinting(nsIChannel*, mozilla::RFPTarget) [dom/base/nsContentUtils.cpp:2443]
#04: mozilla::net::HttpBaseChannel::Init(nsIURI*, unsigned int, mozilla::net::nsProxyInfo*, unsigned int, nsIURI*, unsigned long long, ExtContentPolicyType, nsILoadInfo*) [netwerk/protocol/http/HttpBaseChannel.cpp:411]
#05: mozilla::net::nsHttpHandler::SetupChannelInternal(mozilla::net::HttpBaseChannel*, nsIURI*, nsIProxyInfo*, unsigned int, nsIURI*, nsILoadInfo*, nsIChannel**) [netwerk/protocol/http/nsHttpHandler.cpp:2079]
#06: mozilla::net::nsHttpHandler::CreateTRRServiceChannel(nsIURI*, nsIProxyInfo*, unsigned int, nsIURI*, nsILoadInfo*, nsIChannel**) [netwerk/protocol/http/nsHttpHandler.cpp:2116]
#07: mozilla::net::DNSUtils::CreateChannelHelper(nsIURI*, nsIChannel**) [netwerk/dns/DNSUtils.cpp:72]
#08: mozilla::net::TRR::SendHTTPRequest() [netwerk/dns/TRR.cpp:290]
#09: mozilla::net::TRR::Run() [netwerk/dns/TRR.cpp:130]

It turns out that nsContentUtils is not initialized - I didn't even know it got initialized. I think that's the socket process, we can exempt that process from this new check pretty easily, along with several other types.

I submitted another try run that removes the NullPrincipal patch (in case it was messing anything up) and exempts some process types. Hopefully that addresses all the assertions and we're only left with test failures...

(In reply to Tom Ritter [:tjr] from comment #11)

I submitted another try run that removes the NullPrincipal patch (in case it was messing anything up) and exempts some process types. Hopefully that addresses all the assertions and we're only left with test failures...

This looks like it's taken care of all of the assertions.

There are a bunch of RFP test failures that I am guessing weren't included in the original fix-the-tests patch.

image/test/mochitest/test_discardFramesAnimatedImage.html is a bit of an unusual failure that claims to be new. I retriggered it a few times (but the retriggers may have failed...?.

It also looks like we've caused a leak YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME. FIX THIS! - I tried to reproduce this with pernosco.

There are a bunch of RFP test failures that I am guessing weren't included in the original fix-the-tests patch.

I think it is happening because we moved IsChrome check to IsRFPEnabledFor check. Many parts of test code run with chrome privileges, so we'll have to manually fix all of them. I'll work on it.

edit: i also used try auto while you used debug test which might have covered more stuff.

Hmm this is breaking a lot more things than we expect. All the ChromeUtils.shouldResistFingerprint return false because, well, it is for chrome contexts and we exempt them.

I'm having a real issue with runActualTest function and the tests it references. I tried creating a tab with content principal, playing around with wrappedJSObject but couldn't make it work. I can think of two other options, porting all the tests to simple mochi tests (which I'm not sure if possible, especially the PBM ones) or disabling principal check for every test. The latter option is easy but it is also wrong in some instances (see devtools's color schema simulation test, if we were to disable principal check, we would only make it pass in testing but it won't work actually), so the only option seem to be porting them to simple tests. I tried many things including creating a sandbox but couldn't make it work. So I'm just going to work on converting them to simple mochi tests

edit: (not actually converting each test but testing principal check works in test cases that use runActualTest, like testA-H, defaults and defaultPBM etc.)

:gjis's answer helped me solve the spawn using system principal issue, but now for some reason, test G and H seem to be broken.

Of all the tests that use runActualTest function, G and H is 100% broken. Also almost all of canvas randomization tests are broken (which is quite a lot). I can't tell what's special about G and H. I was thinking maybe if I could understand what's wrong with G and H, I could have some pointers, but not sure what's going on there.

Update: :tjr suggested it might be because of this line here. It turned out that because JS context is using system principal during testing, when we create a new tab, we end up not enabling RFP for pages.

Attachment #9460034 - Attachment is obsolete: true
Attachment #9459389 - Attachment description: Bug 1941091: Add system principal check to nsIGlobalObject::ShouldResistFingerprinting functions. Variant 1. r?tjr → Bug 1941091: Add system principal check to nsRFPService::IsRFPEnabledFor. r?tjr
Attachment #9461077 - Attachment is obsolete: true
Pushed by fkilic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b8f50e4765b8 Add system principal check to nsRFPService::IsRFPEnabledFor. r=tjr https://hg.mozilla.org/integration/autoland/rev/3733b3b7bab3 Fix broken tests. r=tjr
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch
Regressions: 1944419
Regressions: 1944864
See Also: → 1945142
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: