Closed Bug 1141352 Opened 5 years ago Closed 4 years ago

Enable pairwise allowlisting for tracking protection

Categories

(Core :: DOM: Security, defect)


Tracking


RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: mmc, Assigned: francois)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We want to be able to specify (first party, third party) pairs that skip tracking protection enforcement. One way to do this:

1) Create a new error code, NS_ERROR_PAIRWISE_ALLOWLIST.
2) Create a new table named mozpub-trackexceptions-digest256 containing URLs of the form https://thirdparty.com/?firstparty=firstparty.com (or similar), and add them in Safe Browsing format to a digest256 table.
3) Register this table for list updates when tracking protection is enabled, in https://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/SafeBrowsing.jsm.
4) Modify toolkit/components/url-classifier/nsUrlClassifierDBService.cpp::TablesToResponse to return NS_ERROR_PAIRWISE_ALLOWLIST if a match is found in the table named in 2).
5) In nsChannelClassifier::ShouldEnableTrackingProtection, after extracting the top-level URI for the permission manager check, canonicalize the top-level URI to eTLD+1, synthesize a pairwise URL of the form from 2), run ClassifyLocal against this principal, and check the return value for NS_ERROR_PAIRWISE_ALLOWLIST. If it matches, skip tracking protection enforcement (return false).
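To illustrate step 5, here is a rough Python sketch of the lookup-URL synthesis. The function name and hostname extraction are hypothetical illustrations, not the actual Gecko code (which would live in C++ in nsChannelClassifier):

```python
from urllib.parse import urlparse

def synthesize_pairwise_url(toplevel_etld1, third_party_uri):
    """Build the allowlist lookup URL in the
    https://thirdparty.com/?firstparty=firstparty.com form from step 2.
    toplevel_etld1 is the already-canonicalized eTLD+1 of the top-level page."""
    third_host = urlparse(third_party_uri).hostname
    return "https://%s/?firstparty=%s" % (third_host, toplevel_etld1)

# The synthesized URL is what would then be fed to ClassifyLocal against
# the mozpub-trackexceptions-digest256 table.
```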

This may not be a great way to do things because it adds an extra ClassifyLocal check before every call to Classify when tracking protection is enabled. Another place to check might be right before the channel is cancelled.
Blocks: 1029886
No longer blocks: tp-breakage
A better approach might be to move the ClassifyLocal call (the pairwise allowlist check) to right before the channel is cancelled in nsChannelClassifier. That way, if the load might be blocked, we only make 3 calls to ClassifyLocal (1 to stop speculative loads, 1 as part of nsUrlClassifierDBService::Classify, and 1 if the channel should be cancelled), and 2 otherwise (no change).

We have to look up different URIs for the allowlist ClassifyLocal check (its URI represents the pair of URLs being allowlisted, while the other check uses the URI of the channel being loaded), so at least 2 checks are necessary in any case.
WIP patch: https://bitbucket.org/fmarier/mozilla-central-mq-1141352
Assignee: nobody → francois
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
Francois,

I've been thinking about the design here and was considering a different approach.

Instead of having pairs of (parent, child), what about instead stealing the
bottom 32 bits of the hash to represent the "organization"? Then you could
have a separate shavar list that contains non-tracker parents. So, instead
of looking up the whitelist for the domain, you'd look up the parent domain
to get the organization and compare it to the organization in the tracker
domain.

wdyt?
The problem with that alternative design is that we use all 256 bits of the hash. The TP list is not in the shavar format, it's in the digest256 format.
(In reply to François Marier [:francois] from comment #4)
> The problem with that alternative design is that we use all 256 bits of the
> hash. The TP list is not in the shavar format, it's in the digest256 format.

I'm not understanding what "we use all 256 bits of the hash" means.


Design a new hash algorithm SHA-256X that takes two inputs:

1. The URI
2. The "organization ID"


It then outputs a string that is:

SHA256X=SHA-256(URI)[0..27] || organization[0..3]

You store every URI in the TP-flavored list as a SHA-256X hash,
with URI being the URI and organization being an ID from the
organization list.

Then to look up, you proceed as follows, setting P to the parent
URI and C to the child URI.

1. Upon initial request, compute
SHA-256X(C, 0) = H. You then look up H[0..3] in the
local database as usual and if there's a non-match you return
"ACCEPT".

2. If there's a match you retrieve all the blocks matching H[0..3].
Call that B_C. Also retrieve all the blocks for
SHA-256X(P, 0)[0..3]. Call that B_P.

3. Iterate through the list of blocks in B_C looking for an
entry B_C_i such that B_C_i[0..27] == H[0..27]. If there is no match,
return "ACCEPT".

4. Iterate through the list of blocks in B_P looking for an entry
B_P_i such that B_P_i[0..27] == SHA-256X(P, 0)[0..27]. If there is no
match, return "BLOCK" (because there is no acceptable parent).
If there is a match, compare B_C_i[28..31] to B_P_i[28..31]. If
they match, return "ACCEPT". Otherwise, return "BLOCK".

The advantage of this algorithm is that it looks like a list
of SHA-256s, but it's actually not. So you can modify (trivially)
the list formation code and (somewhat less trivially, but without
too much difficulty) the lookup code on the client. And if you decide
you don't want this grouping feature, you can fall back by
not retrieving B_P and replacing Step 4 with "Return BLOCK".

Can you explain to me why this won't work?
I just had a chat with ekr to clarify his proposal:

- two lists of URLs are downloaded by the client: blacklist of sub-resources and list of top-level sites
- the sha256 hash of each URL is truncated to 224 bits and the rest is used to encode the ID of the owner

The lookup looks like this:

1. look for the SHA-224 hash of the sub-resource in the blacklist. Not there -> ACCEPT
2. look for the SHA-224 hash of the parent in the second list. Not there -> BLOCK
3. compare the owner ID of the sub-resource with the one for the parent. Same -> ACCEPT, different -> BLOCK
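As a sketch of this truncated-hash scheme (the function names and the big-endian owner-ID encoding are my assumptions for illustration, not part of the proposal):

```python
import hashlib

def sha256x(url, org_id):
    # Truncate the SHA-256 digest to 28 bytes (224 bits) and append a
    # 4-byte owner/organization ID (big-endian here, by assumption).
    return hashlib.sha256(url.encode()).digest()[:28] + org_id.to_bytes(4, "big")

def lookup(sub_url, parent_url, blacklist, toplevel):
    # blacklist and toplevel are sets of 32-byte sha256x() entries.
    def find(url, entries):
        h = hashlib.sha256(url.encode()).digest()[:28]
        return next((e for e in entries if e[:28] == h), None)

    sub = find(sub_url, blacklist)
    if sub is None:
        return "ACCEPT"   # 1. sub-resource not on the blacklist
    parent = find(parent_url, toplevel)
    if parent is None:
        return "BLOCK"    # 2. no acceptable parent on the top-level list
    # 3. same owner ID (last 4 bytes) -> ACCEPT, different -> BLOCK
    return "ACCEPT" if sub[28:] == parent[28:] else "BLOCK"
```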

The advantages of this scheme are:

- smaller list (no need for the cross product of top-level sites x sub-resources in the entity whitelist)
- use of real URLs (which also means we don't have to do ETLD+1 conversions and can use the regular safe browsing normalization)

The disadvantages are:

- we've actually started on the other approach
- probably requires more involved client changes
(In reply to François Marier [:francois] from comment #6)
> I just had a chat with ekr to clarify his proposal:
> 
> - two lists of URLs are downloaded by the client: blacklist of sub-resources
> and list of top-level sites
> - the sha256 hash of each URL is truncated to 224 bits and the rest is used
> to encode the ID of the owner
> 
> The lookup looks like this:
> 
> 1. look for the SHA-224 hash of the sub-resource in the blacklist. Not there
> -> ACCEPT
> 2. look for the SHA-224 hash of the parent in the second list. Not there ->
> BLOCK
> 3. compare the owner ID of the sub-resource with the one for the parent.
> Same -> ACCEPT, different -> BLOCK
> 
> The advantages of this scheme are:
> 
> - smaller list (no need for the cross product of top-level sites x
> sub-resources in the entity whitelist)
> - use of real URLs (which also means we don't have to do ETLD+1 conversions
> and can use the regular safe browsing normalization)
> 
> The disadvantages are:
> 
> - we've actually started on the other approach
> - probably requires more involved client changes

I think it would be fine to proceed along either line, though it might
be worth looking at the size of the list with your proposal to make
a decision.
QA Contact: mwobensmith
Depends on: 1189087
Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Attachment #8641939 - Flags: review?(gpascutto)
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Blocks: 1182876
Blocks: 1149825
https://reviewboard.mozilla.org/r/14661/#review13289

::: netwerk/base/nsChannelClassifier.cpp:539
(Diff revision 2)
> +    LOG(("nsChannelClassifier[%p]: No window URI\n", this));

nit: inconsistent use of \n in LOG

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1206
(Diff revision 2)
> +      tables.Append(',');

Potentially the same bug.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1073
(Diff revision 2)
> +  if (!tables.IsEmpty()) {

Does this construct not break if https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1045 is not set?

The list would end up looking like ",foo-table-here".

The issue already exists with the existing code but let's not make it worse.

::: toolkit/components/url-classifier/SafeBrowsing.jsm:242
(Diff revision 2)
>                  "a:" + (i + 1) + ":32:" + trackerURL.length + "\n" +

Does this actually work? I'm baffled. You're sending:

ad:1
a:blah
ad:1
a:foo

And expecting blah to survive. If it does I'm afraid that may be a side effect you can't rely on?
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp
Thanks for the feedback gcp. I've addressed your comments and also added two more tests.
Comment on attachment 8641939 [details]
MozReview Request: Bug 1141352 - add a pairwise allowlist to tracking protection. r?gcp

https://reviewboard.mozilla.org/r/14663/#review13729

Ship It!
Attachment #8641939 - Flags: review?(gpascutto) → review+
https://hg.mozilla.org/mozilla-central/rev/c58b7d331f5b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
In the end, I went for the original pairwise allowlist approach due to time constraints and the need to coordinate with other teams.

I have, however, filed bug 1196995 so that we look into doing this better when we have more time.