Closed Bug 1141352 Opened 5 years ago Closed 4 years ago

Enable pairwise allowlisting for tracking protection

Categories

(Core :: DOM: Security, defect)


Tracking


RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: mmc, Assigned: francois)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We want to be able to specify (first party, third party) pairs that skip tracking protection enforcement. One way to do this:

1) Create a new error code, NS_ERROR_PAIRWISE_ALLOWLIST.
2) Create a new table named mozpub-trackexceptions-digest256 containing URLs of the form https://thirdparty.com/?firstparty=firstparty.com (or similar), and add them in Safe Browsing format to a digest256 table.
3) Register this table for list updates when tracking protection is enabled, in https://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/SafeBrowsing.jsm.
4) Modify toolkit/components/url-classifier/nsUrlClassifierDBService.cpp::TablesToResponse to return NS_ERROR_PAIRWISE_ALLOWLIST if a match is found in the table named in 2).
5) In nsChannelClassifier::ShouldEnableTrackingProtection, after extracting the top-level URI for the permission manager check, canonicalize the top-level URI to eTLD+1, synthesize a pairwise URL of the form from 2), run ClassifyLocal against this principal, and check the return value for NS_ERROR_PAIRWISE_ALLOWLIST. If it matches, skip tracking protection enforcement (return false).
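To illustrate step 5, here is a rough Python sketch of the lookup-URL synthesis. The function name and hostname extraction are hypothetical illustrations, not the actual Gecko code (which would live in C++ in nsChannelClassifier):

```python
from urllib.parse import urlparse

def synthesize_pairwise_url(toplevel_etld1, third_party_uri):
    """Build the allowlist lookup URL in the
    https://thirdparty.com/?firstparty=firstparty.com form from step 2.
    toplevel_etld1 is the already-canonicalized eTLD+1 of the top-level page."""
    third_host = urlparse(third_party_uri).hostname
    return "https://%s/?firstparty=%s" % (third_host, toplevel_etld1)

# The synthesized URL is what would then be fed to ClassifyLocal against
# the mozpub-trackexceptions-digest256 table.
```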

This may not be a great way to do things because it adds an extra ClassifyLocal check before every call to Classify when tracking protection is enabled. Another place to check might be right before the channel is cancelled.
Blocks: 1029886
No longer blocks: tp-breakage
A better approach might be to move the ClassifyLocal call (the pairwise allowlist check) to right before the channel is cancelled in nsChannelClassifier. That way, if the load might be blocked, we only make 3 calls to ClassifyLocal (1 to stop speculative loads, 1 as part of nsUrlClassifierDBService::Classify, and 1 if the channel should be cancelled), and 2 otherwise (no change).

We have to look up different URIs for the allowlist ClassifyLocal check (its URI represents the pair of URLs being allowlisted, while the other check uses the URI of the channel being loaded), so at least 2 checks are necessary in any case.
WIP patch: https://bitbucket.org/fmarier/mozilla-central-mq-1141352
Assignee: nobody → francois
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
Francois,

I've been thinking about the design here and was considering a different approach.

Instead of having pairs of (parent, child), what about instead stealing the
bottom 32 bits of the hash to represent the "organization"? Then you could
have a separate shavar list that contains non-tracker parents. So, instead
of looking up the whitelist for the domain, you'd look up the parent domain
to get the organization and compare it to the organization in the tracker
domain.

wdyt?
The problem with that alternative design is that we use all 256 bits of the hash. The TP list is not in the shavar format, it's in the digest256 format.
(In reply to François Marier [:francois] from comment #4)
> The problem with that alternative design is that we use all 256 bits of the
> hash. The TP list is not in the shavar format, it's in the digest256 format.

I'm not understanding what "we use all 256 bits of the hash" means.


Design a new hash algorithm SHA-256X that takes two inputs:

1. The URI
2. The "organization ID"


It then outputs a string that is:

SHA256X=SHA-256(URI)[0..27] || organization[0..3]

You store every URI in the TP-flavored list as a SHA-256X hash,
with URI being the URI and organization being an ID from the
organization list.

Then to look up, you proceed as follows, setting P to the parent
URI and C to the child URI.

1. Upon initial request, compute
SHA-256X(C, 0) = H. You then look up H[0..3] in the
local database as usual and if there's a non-match you return
"ACCEPT".

2. If there's a match you retrieve all the blocks matching H[0..3].
Call that B_C. Also retrieve all the blocks for
SHA-256X(P, 0)[0..3]. Call that B_P.

3. Iterate through the list of blocks in B_C looking for an
entry B_C_i such that B_C_i[0..27] == H[0..27]. If there is no match,
return "ACCEPT".

4. Iterate through the list of blocks in B_P looking for an entry
B_P_i such that B_P_i[0..27] == SHA-256X(P, 0)[0..27]. If there is no
match, return "BLOCK" (because there is no acceptable parent).
If there is a match, compare B_C_i[28..31] to B_P_i[28..31]. If
they match, return "ACCEPT". Otherwise, return "BLOCK".

The advantage of this algorithm is that it looks like a list
of SHA-256s, but it's actually not. So you can modify (trivially)
the list formation code and (somewhat less trivially, but without
too much difficulty) the lookup code on the client. And if you decide
you don't want this grouping feature, you can fall back by
not retrieving B_P and replacing Step 4 with "Return BLOCK".

Can you explain to me why this won't work?
I just had a chat with ekr to clarify his proposal:

- two lists of URLs are downloaded by the client: blacklist of sub-resources and list of top-level sites
- the sha256 hash of each URL is truncated to 224 bits and the rest is used to encode the ID of the owner

The lookup looks like this:

1. look for the SHA-224 hash of the sub-resource in the blacklist. Not there -> ACCEPT
2. look for the SHA-224 hash of the parent in the second list. Not there -> BLOCK
3. compare the owner ID of the sub-resource with the one for the parent. Same -> ACCEPT, different -> BLOCK
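As a sketch of this truncated-hash scheme (the function names and the big-endian owner-ID encoding are my assumptions for illustration, not part of the proposal):

```python
import hashlib

def sha256x(url, org_id):
    # Truncate the SHA-256 digest to 28 bytes (224 bits) and append a
    # 4-byte owner/organization ID (big-endian here, by assumption).
    return hashlib.sha256(url.encode()).digest()[:28] + org_id.to_bytes(4, "big")

def lookup(sub_url, parent_url, blacklist, toplevel):
    # blacklist and toplevel are sets of 32-byte sha256x() entries.
    def find(url, entries):
        h = hashlib.sha256(url.encode()).digest()[:28]
        return next((e for e in entries if e[:28] == h), None)

    sub = find(sub_url, blacklist)
    if sub is None:
        return "ACCEPT"   # 1. sub-resource not on the blacklist
    parent = find(parent_url, toplevel)
    if parent is None:
        return "BLOCK"    # 2. no acceptable parent on the top-level list
    # 3. same owner ID (last 4 bytes) -> ACCEPT, different -> BLOCK
    return "ACCEPT" if sub[28:] == parent[28:] else "BLOCK"
```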

The advantages of this scheme are:

- smaller list (no need for the cross product of top-level sites x sub-resources in the entity whitelist)
- use of real URLs (which also means we don't have to do ETLD+1 conversions and can use the regular safe browsing normalization)

The disadvantages are:

- we've actually started on the other approach
- probably requires more involved client changes
(In reply to François Marier [:francois] from comment #6)
> I just had a chat with ekr to clarify his proposal:
> 
> - two lists of URLs are downloaded by the client: blacklist of sub-resources
> and list of top-level sites
> - the sha256 hash of each URL is truncated to 224 bits and the rest is used
> to encode the ID of the owner
> 
> The lookup looks like this:
> 
> 1. look for the SHA-224 hash of the sub-resource in the blacklist. Not there
> -> ACCEPT
> 2. look for the SHA-224 hash of the parent in the second list. Not there ->
> BLOCK
> 3. compare the owner ID of the sub-resource with the one for the parent.
> Same -> ACCEPT, different -> BLOCK
> 
> The advantages of this scheme are:
> 
> - smaller list (no need for the cross product of top-level sites x
> sub-resources in the entity whitelist)
> - use of real URLs (which also means we don't have to do ETLD+1 conversions
> and can use the regular safe browsing normalization)
> 
> The disadvantages are:
> 
> - we've actually started on the other approach
> - probably requires more involved client changes

I think it would be fine to proceed along either line, though it might
be worth looking at the size of the list with your proposal to make
a decision.
QA Contact: mwobensmith
Depends on: 1189087
Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Attachment #8641939 - Flags: review?(gpascutto)
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Blocks: 1182876
Blocks: 1149825
https://reviewboard.mozilla.org/r/14661/#review13289

::: netwerk/base/nsChannelClassifier.cpp:539
(Diff revision 2)
> +    LOG(("nsChannelClassifier[%p]: No window URI\n", this));

nit: inconsistent use of \n in LOG

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1206
(Diff revision 2)
> +      tables.Append(',');

Potentially the same bug.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1073
(Diff revision 2)
> +  if (!tables.IsEmpty()) {

Does this construct not break if https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1045 is not set?

The list would end up looking like ",foo-table-here".

The issue already exists with the existing code but let's not make it worse.

::: toolkit/components/url-classifier/SafeBrowsing.jsm:242
(Diff revision 2)
>                  "a:" + (i + 1) + ":32:" + trackerURL.length + "\n" +

Does this actually work? I'm baffled. You're sending:

ad:1
a:blah
ad:1
a:foo

And expecting blah to survive. If it does I'm afraid that may be a side effect you can't rely on?
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Thanks for the feedback gcp. I've addressed your comments and also added two more tests.
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

https://reviewboard.mozilla.org/r/14663/#review13729

Ship It!
Attachment #8641939 - Flags: review?(gpascutto) → review+
https://hg.mozilla.org/mozilla-central/rev/c58b7d331f5b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
In the end, I went for the original pairwise allowlist approach due to time constraints and the need to coordinate with other teams.

I have, however, filed bug 1196995 so that we look into doing this better when we have more time.