Bug 1580416 - Opened 5 years ago, Closed 5 years ago

Enhanced Tracking Protection breaks Facebook instantgames

Categories

(Core :: Privacy: Anti-Tracking, defect)

69 Branch
All
Unspecified
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- disabled
firefox69 + verified
firefox70 + verified
firefox71 + fixed

People

(Reporter: dveditz, Assigned: dimi)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [rca - design error])

Attachments

(2 files, 1 obsolete file)

Enhanced Tracking Protection is breaking legitimate Facebook content where the "third" party is an affiliated domain. In at least one case the third-party base domain, fbsbx.com, is on Disconnect's list, but the actual third party is a subdomain of it (and there are a lot of such subdomains: it functions like *.github.io or *.bmoattachments.org).

Examples include https://apps.facebook.com/ppearls/ and any game at https://www.facebook.com/instantgames

Brad Hill writes:

We actually give each game its own subdomain beneath apps.fbsbx.com, which is in the Public Suffix List, because we want to sandbox each game’s content from other developers, as well as from core facebook.com content.

I wonder if this is an edge case where crossing the PSL boundary causes the “same owner” property from the disconnect list (which just includes fbsbx.com) to be lost?

Example error is:

“Request to access cookie or storage on “https://apps-383041995381848.apps.fbsbx.com/instant-bundle/1…on=477&gcgs=1&source=fbinstant-383041995381848&IsMobileWeb=0” was blocked because it came from a tracker and content blocking is enabled.”

Blocks: etp-breakage
No longer blocks: tp-breakage

When I run https://apps.facebook.com/ppearls/, I get the following error on the web console:

Request to access cookie or storage on “https://apps-208295966289297.apps.fbsbx.com/bundle/169676410…vcml0aG0iOiJITUFDLVNIQTI1NiIsImlzc3VlZF9hdCI6MTU2ODIxODU4NH0” was blocked because it came from a tracker and content blocking is enabled.

On about:url-classifier, a whitelist search for https://apps.facebook.com/?resource=fbsbx.com finds the following:

cryptomining-annotation	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
cryptomining-protection	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
fingerprinting-annotation	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
fingerprinting-protection	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
socialtracking-annotation	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
socialtracking-protection	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
tracking-protection	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
tracking-annotation	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: mozstd-trackwhite-digest256
flash-deny	
URI: https://apps.facebook.com/?resource=fbsbx.com
List of tables: except-flash-digest256,except-flash-digest256

A search for https://apps.facebook.com/?resource=apps.fbsbx.com finds the following:

flash-deny	
URI: https://apps.facebook.com/?resource=apps.fbsbx.com
List of tables: except-flash-digest256,except-flash-digest256

Similarly for https://apps.facebook.com/?resource=apps-208295966289297.apps.fbsbx.com:

flash-deny	
URI: https://apps.facebook.com/?resource=apps-208295966289297.apps.fbsbx.com
List of tables: except-flash-digest256,except-flash-digest256

Dimi, is this expected? Should we be collapsing the resource domain to its eTLD+1 when doing a whitelist search? Thanks!

Note here that the eTLD+1 is apps.fbsbx.com, not fbsbx.com, because apps.fbsbx.com is on the Public Suffix List.

Flags: needinfo?(dlee)
Component: Tracking Protection → Privacy: Anti-Tracking
Product: Firefox → Core

I think we could fix this bug by:

a) Create a new nsIUrlClassifierFeature::listType called whitelistSuffix and modify UrlClassifierCommon::CreatePairwiseWhiteListURI() to use the public suffix to create the whitelist URI when whitelistSuffix is passed (see the sketch below).
b) Merge those into mWhitelistTables.
c) Work with Disconnect to make sure apps.fbsbx.com is added to the resources key under the entry for facebook.com.

Since this will take some time, we may want to consider a skip list fix to address this in a more timely manner for users.
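To make option (a) concrete, here is a rough, standalone sketch. The helper names and their return values are hypothetical stand-ins for nsIEffectiveTLDService lookups, and the hosts are placeholders; the real CreatePairwiseWhiteListURI() operates on nsIURI objects.

#include <iostream>
#include <string>

// Toy stand-ins for nsIEffectiveTLDService; the real calls consult the Public
// Suffix List. The hosts and return values here are purely illustrative.
std::string GetETLDPlusOne(const std::string&)  { return "third.party.example"; }
std::string GetPublicSuffix(const std::string&) { return "party.example"; }

// Sketch of a CreatePairwiseWhiteListURI()-style helper: today the whitelist
// key embeds the third party's eTLD+1; a whitelistSuffix-style list type
// would embed the public suffix instead, which is the value option (c) asks
// Disconnect to add to the resources key for facebook.com.
std::string CreatePairwiseWhiteListSpec(const std::string& topLevelHost,
                                        const std::string& thirdPartyHost,
                                        bool useSuffix) {
  const std::string resource = useSuffix ? GetPublicSuffix(thirdPartyHost)
                                         : GetETLDPlusOne(thirdPartyHost);
  return topLevelHost + "/?resource=" + resource;
}

int main() {
  // "toplevel.page/?resource=third.party.example" (current behavior)
  std::cout << CreatePairwiseWhiteListSpec("toplevel.page",
                                           "sub.third.party.example", false)
            << "\n";
  // "toplevel.page/?resource=party.example" (whitelistSuffix variant)
  std::cout << CreatePairwiseWhiteListSpec("toplevel.page",
                                           "sub.third.party.example", true)
            << "\n";
}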

One relevant note here: I was wondering why we don't hit the same corner case when we go to block these resources, and then I remembered Bug 1203635. So Firefox effectively ignores the PSL boundary when checking whether to block a resource but honors it for the whitelist.
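To see why blocking and whitelisting behave differently here, the following is a rough, standalone sketch of Safe Browsing-style fragment generation (simplified; not the actual Gecko LookupCache code) applied to both the blocked game resource and the synthesized whitelist URI discussed above.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Rough sketch of Safe Browsing "suffix/prefix expression" generation as the
// URL classifier uses it (simplified: no canonicalization, and only the root
// path prefix is added instead of every intermediate path prefix).
std::vector<std::string> GetLookupFragments(const std::string& host,
                                            const std::string& path) {
  // Host fragments: the exact host, plus hostnames built from the last five
  // labels with leading labels successively removed (keeping at least two).
  std::vector<std::string> hosts{host};
  std::vector<std::string> labels;
  size_t start = 0, dot;
  while ((dot = host.find('.', start)) != std::string::npos) {
    labels.push_back(host.substr(start, dot - start));
    start = dot + 1;
  }
  labels.push_back(host.substr(start));
  size_t maxLabels = std::min<size_t>(labels.size(), 5);
  for (size_t n = maxLabels; n >= 2; --n) {
    if (n == labels.size()) continue;  // already covered by the exact host
    std::string h;
    for (size_t i = labels.size() - n; i < labels.size(); ++i) {
      h += (h.empty() ? "" : ".") + labels[i];
    }
    hosts.push_back(h);
  }

  // Path fragments: the exact path with the query string, the exact path
  // without it, and the root path.
  std::vector<std::string> paths{path};
  size_t q = path.find('?');
  std::string noQuery = (q == std::string::npos) ? path : path.substr(0, q);
  if (noQuery != path) paths.push_back(noQuery);
  if (noQuery != "/") paths.push_back("/");

  std::vector<std::string> fragments;
  for (const auto& h : hosts) {
    for (const auto& p : paths) {
      fragments.push_back(h + p);
    }
  }
  return fragments;
}

int main() {
  // 1) The blocked third-party resource: "fbsbx.com/" is among the fragments,
  //    so the Disconnect blacklist entry matches even though the PSL boundary
  //    sits at apps.fbsbx.com; blocking effectively ignores the PSL (Bug 1203635).
  for (const auto& f :
       GetLookupFragments("apps-208295966289297.apps.fbsbx.com", "/bundle/")) {
    std::cout << f << "\n";
  }
  std::cout << "--\n";
  // 2) The synthesized whitelist URI: only the first-party host is collapsed;
  //    the domain embedded in "?resource=" never is, so no fragment carries
  //    "resource=fbsbx.com" and the entity whitelist entry is missed.
  for (const auto& f :
       GetLookupFragments("apps.facebook.com", "/?resource=apps.fbsbx.com")) {
    std::cout << f << "\n";
  }
}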

(In reply to :ehsan akhgari from comment #1)

Dimi, is this expected? Should we be collapsing the resource domain to its eTLD+1 when doing a whitelist search? Thanks!

Note here that the eTLD+1 is apps.fbsbx.com, not fbsbx.com, because apps.fbsbx.com is on the Public Suffix List.

Yes, this is expected.
While matching URLs, the URL Classifier only collapses the domain name to find a match. Since the whitelisted domain is put in the query string (resource=xxx), we don't collapse it. But when creating a pairwise whitelisted URL to classify, we already use the eTLD+1 (in UrlClassifierCommon::CreatePairwiseWhiteListURI). I assume the entity list also only includes eTLD+1 domains.

I want to make sure I understand correctly. The proposed solution in Comment 2 is to add an eTLD domain (apps.fbsbx.com) to our entity list, and for every whitelist lookup we would then check both the eTLD and the eTLD+1. If this is correct, it means an additional whitelist lookup for every blacklisted URL (just for this apps.fbsbx.com case), and we should make sure this change doesn't have a performance impact.

Flags: needinfo?(dlee) → needinfo?(ehsan)

(In reply to Dimi Lee [:dimi][:dlee] from comment #4)

(In reply to :ehsan akhgari from comment #1)

Dimi, is this expected? Should we be collapsing the resource domain to its eTLD+1 when doing a whitelist search? Thanks!

Note here that the eTLD+1 is apps.fbsbx.com, not fbsbx.com, because apps.fbsbx.com is on the Public Suffix List.

Yes, this is expected.
While matching URLs, the URL Classifier only collapses the domain name to find a match. Since the whitelisted domain is put in the query string (resource=xxx), we don't collapse it. But when creating a pairwise whitelisted URL to classify, we already use the eTLD+1 (in UrlClassifierCommon::CreatePairwiseWhiteListURI). I assume the entity list also only includes eTLD+1 domains.

Thanks for confirming.

I want to make sure I understand correctly. The proposed solution in Comment 2 is to add an eTLD domain (apps.fbsbx.com) to our entity list, and for every whitelist lookup we would then check both the eTLD and the eTLD+1. If this is correct, it means an additional whitelist lookup for every blacklisted URL (just for this apps.fbsbx.com case), and we should make sure this change doesn't have a performance impact.

Yes, that is correct. Do we have a good way to test the performance impact of this change?

Also, can you think of any alternative solutions here?

Flags: needinfo?(ehsan) → needinfo?(dlee)

(In reply to :ehsan akhgari from comment #5)

Yes, that is correct. Do we have a good way to test the performance impact of this change?

Also, can you think of any alternative solutions here?

I think we could customize how we fragment a whitelisted URL.
Right now, fragment generation follows the Safe Browsing URL-hashing protocol.
But for whitelisted URLs, we don't really have to follow the spec, because the format of a whitelisted URL is defined by us (for example, fragmenting a whitelist URL by removing the query string is useless in our scenario).

So for the whitelisted URL "toplevel.page/?resource=third.party.basedomain", the new fragment function can produce an additional fragment in which the leading component of "third.party.basedomain" is removed.
If this method works, it shouldn't impact performance much, because we won't create an additional lookup; we just create additional fragments within the same whitelist URL lookup (we can also optimize the method by removing fragments we don't need for the whitelist).

What do you think?
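As a rough, standalone illustration of that idea (not the eventual Gecko patch; the helper name is assumed, and the loop here peels labels repeatedly down to two, whereas the real code might strip only one leading component):

#include <iostream>
#include <string>
#include <vector>

// Sketch: for a synthesized whitelist URL "toplevel.page/?resource=a.b.c",
// also emit fragments with leading labels of the resource value removed, so
// an eTLD+1 resource can still match an entity-list entry keyed on a shorter
// base domain.
std::vector<std::string> GetLookupWhitelistFragments(const std::string& spec) {
  std::vector<std::string> fragments{spec};
  const std::string marker = "?resource=";
  size_t pos = spec.find(marker);
  if (pos == std::string::npos) return fragments;

  std::string prefix = spec.substr(0, pos + marker.size());
  std::string domain = spec.substr(pos + marker.size());

  // Peel leading labels while at least two labels remain.
  size_t dot;
  while ((dot = domain.find('.')) != std::string::npos &&
         domain.find('.', dot + 1) != std::string::npos) {
    domain = domain.substr(dot + 1);
    fragments.push_back(prefix + domain);
  }
  return fragments;
}

int main() {
  // Prints the original fragment plus "apps.facebook.com/?resource=fbsbx.com",
  // which is the form the entity whitelist actually contains (see comment 1).
  for (const auto& f : GetLookupWhitelistFragments(
           "apps.facebook.com/?resource=apps.fbsbx.com")) {
    std::cout << f << "\n";
  }
}

With the extra fragments generated inside the same lookup, the existing fbsbx.com entry in the entity whitelist can be hit without issuing a second lookup.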

Flags: needinfo?(dlee) → needinfo?(ehsan)

(In reply to Dimi Lee [:dimi][:dlee] from comment #6)

(In reply to :ehsan akhgari from comment #5)

Yes, that is correct. Do we have a good way to test the performance impact of this change?

Also, can you think of any alternative solutions here?

I think we could customize how we fragment a whitelisted URL.
Right now, fragment generation follows the Safe Browsing URL-hashing protocol.
But for whitelisted URLs, we don't really have to follow the spec, because the format of a whitelisted URL is defined by us (for example, fragmenting a whitelist URL by removing the query string is useless in our scenario).

So for the whitelisted URL "toplevel.page/?resource=third.party.basedomain", the new fragment function can produce an additional fragment in which the leading component of "third.party.basedomain" is removed.
If this method works, it shouldn't impact performance much, because we won't create an additional lookup; we just create additional fragments within the same whitelist URL lookup (we can also optimize the method by removing fragments we don't need for the whitelist).

What do you think?

That sounds good! One question though: couldn't we also encounter URLs in the form "toplevel.page/?resource=foo.bar.baz" if the user goes to a page with a URL like that (that is, with an actual resource GET parameter)? Does that matter here?

If not, then this sounds like a great plan!

Flags: needinfo?(ehsan)

BTW, yesterday we deployed a skip list fix for this bug, which temporarily works around it across all versions of Firefox, so we only need to land a code fix on trunk. I filed bug 1581120 to remove the skip list work-around once the real fix is shipped to the release channel.

(In reply to :ehsan akhgari from comment #7)

That sounds good! One question though: couldn't we also encounter URLs in the form "toplevel.page/?resource=foo.bar.baz" if the user goes to a page with a URL like that (that is, with an actual resource GET parameter)? Does that matter here?

If not, then this sounds like a great plan!

No, the new fragment generation function should only be applied to whitelisted URLs produced by UrlClassifierCommon::CreatePairwiseWhiteListURI, which means there is no "real" query string in them.

Assignee: nobody → dlee
Status: NEW → ASSIGNED

Great, sounds like a plan then!

Depends on: 1581160
Flags: qe-verify+
See Also: → 1582123

Managed to verify on 69.0.1, 70.0b7, and 71.0a1 (2019-09-17), but leaving the QE+ flag up and tracking it for when the changes land.

As discussed with dimi on Slack, due to the nature of this bug I filed bug 1582123 for the remaining issue.
Leaving the 71 flag as it is in case something breaks again. Once the issue is fully fixed, I will come back and re-check everything.

For any change that might occur, feel free to update and revert for RC/beta builds.

Flags: qe-verify+
Hardware: Unspecified → All

Just an update that the patch is ready. If the try results look good, I'll submit the patch for review tomorrow.

Attachment #9095424 - Attachment is obsolete: true

This patch does the following to support matching a whitelisted URI when
its domain is an eTLD+1 (a rough sketch of the resulting flow follows the list):

  1. Add a URIType to indicate whether a URI was generated by
     UrlClassifierCommon::CreatePairwiseWhiteListURI(), which crafts a
     whitelist URL like "toplevel.page/?resource=third.party.domain".

  2. Call LookupCache::GetLookupWhitelistFragments() if the URIType is
     nsIUrlClassifierFeature::pairwiseWhitelistedURI before initiating
     a lookup.

  3. Implement LookupCache::GetLookupWhitelistFragments(), which creates
     an additional fragment by removing the leading component of
     third.party.domain.

Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a1237d4bad3d
P1. Use a different fragment method when the URI is a pairwiseWhitelistedURI. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/843690981959
P2. Add a testcase to test a match can be found when whitelisted URI is eTLD. r=Ehsan
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
Flags: in-qa-testsuite?(bogdan.maris)
Flags: in-qa-testsuite?(bogdan.maris) → in-qa-testsuite+
Blocks: 1587002

Now that Bug 1587002 is fixed, I tested https://apps.facebook.com/ppearls/ by first disabling remote settings and then checking in devtools whether the apps.fbsbx.com resources were classified. As expected, these resources are no longer marked as tracking on Facebook properties (e.g., https://apps.facebook.com/ppearls/), but they are marked as tracking on my test page (https://senglehardt.com/test/trackingprotection/fbsbx.html).

See Also: → 1590944
No longer depends on: 1591200

This bug has been identified as part of a pilot on determining root causes of blocking and dot release drivers.

It needs a root-cause set for it. Please see the list at https://docs.google.com/document/d/1FFEGsmoU8T0N8R9kk-MXWptOPtXXXRRIe4vQo3_HgMw/.

Add the root cause as a whiteboard tag in the form [rca - <cause> ] and remove the rca-needed keyword.

If you have questions, please contact :tmaity.

Keywords: rca-needed
Keywords: rca-needed
Whiteboard: [rca - design error]
