Closed Bug 1461822 Opened 6 years ago Closed 5 years ago

Investigate the impact of blocking cookies from third-party resources on the tracking protection list

(Firefox :: Protections UI, enhancement, P3)

(Reporter: englehardt, Assigned: englehardt)

(Blocks 1 open bug)

(2 files, 1 obsolete file)
We are interested in knowing whether blocking cookies from resources on the tracking protection list causes those resources to load with significantly different content. 

We can measure this impact by diffing across three web crawls: two of which don't block cookies and one which does. As a first pass, we can examine the scripts that serve the same content when cookies are enabled but different content when cookies are blocked.
Blocks: 1461921
No longer blocks: antitracking
This bug should either be closed as WONTFIX or changed to indicate that we are testing the development done for . Steve, please update accordingly. Thanks!
Flags: needinfo?(senglehardt)
It looks like the bug is still a valid issue to look into to me...
Blocks: cookierestrictions
No longer blocks: 1461921
Let's keep this open to test the development work done for Bug 1473978.
Flags: needinfo?(senglehardt)
Summary: Investigate the impact of blocking cookies from third-party resources on the tracking protection list → Investigate the impact of the new cookie restrictions applied to trackers
This testing will use the prototype developed in . Reverting the title to reflect this.
Summary: Investigate the impact of the new cookie restrictions applied to trackers → Investigate the impact of blocking cookies from third-party resources on the tracking protection list
Attached file bug1461822.html
Attached file bug1461822-diffs.ipynb (obsolete) —
Attached file bug1461822-diffs.html
Attachment #9034190 - Attachment is obsolete: true
## Data collection

We crawled the homepages of the top 10k Alexa sites using OpenWPM [0] with the following configuration:
* 2 crawls configured to strip cookie headers from third-party HTTP requests and responses matching the Disconnect tracking protection list. We did not restrict document.cookie or other JS-based storage access due to technical limitations in the instrumentation
* 2 crawls which allowed all cookies

None of the crawls saved state between page visits.
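The cookie-stripping condition can be sketched roughly as follows. This is an illustrative stand-in, not the actual OpenWPM instrumentation: the hostnames, the suffix-matching rule, and the header-filtering hook are all assumptions made for the example (the real crawls matched against the Disconnect list).

```python
from urllib.parse import urlparse

# Hypothetical blocklist stand-in; the real crawls used the Disconnect
# tracking protection list.
TRACKER_HOSTS = {"tracker.example", "ads.example"}

def is_tracker(url):
    """True if the URL's host is a blocklisted host or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == t or host.endswith("." + t) for t in TRACKER_HOSTS)

def strip_cookie_headers(url, headers, is_third_party):
    """Drop Cookie/Set-Cookie headers on third-party requests/responses
    to blocklisted hosts; pass all other headers through unchanged."""
    if is_third_party and is_tracker(url):
        return {k: v for k, v in headers.items()
                if k.lower() not in ("cookie", "set-cookie")}
    return dict(headers)
```

First-party requests and non-tracker third parties keep their cookie headers, which matches the test condition described above.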

## Summary of findings

The attached notebook (bug1461822.html) contains the full analysis, which I summarize here. Overall, I did not find any strong indications of wide-scale breakage, but I did run into a couple of limitations in both the data collection and the analysis (listed below) that, if solved, would give me more confidence in the results.

The majority of the differences in content between the two test conditions appear to be advertising-related. In particular, I see a reduction in the amount of cookie syncing / number of pixels loaded when cookies are blocked (which is expected). I also see that different sets of ads and ad scripts are loaded between the two crawls (where the number of differences is relatively low compared to pixel differences). I don't expect either of these to lead to user-visible breakage.

I do not observe any essential content that is consistently different between the two test conditions across a large number of sites. I also did not observe any breakage when manually examining a small sample of the (relatively noisy) differences that are only present on a small number of sites.

1. Differences in the set of resources loaded

The crawls with cookies blocked loaded resources from ~3 fewer eTLDs than the crawls that did not block cookies. To examine the missing resources, I look at the set of resources loaded in both of the crawls that don't block cookies but in neither of the crawls that do; these are defined as "missing resources". All resources are normalized to hostname + path for comparison.
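The "missing resources" set reduces to simple set operations over normalized URLs; a minimal sketch (function and variable names are mine, not taken from the attached notebook):

```python
from urllib.parse import urlparse

def normalize(url):
    """Normalize a resource URL to hostname + path, dropping the query
    string and fragment, so the same resource compares equal across crawls."""
    parts = urlparse(url)
    return (parts.hostname or "") + parts.path

def missing_resources(control_a, control_b, blocked_a, blocked_b):
    """Resources loaded in BOTH cookies-allowed crawls but in NEITHER
    cookie-blocking crawl. Each argument is an iterable of URLs."""
    in_both_controls = ({normalize(u) for u in control_a}
                        & {normalize(u) for u in control_b})
    in_any_blocked = ({normalize(u) for u in blocked_a}
                      | {normalize(u) for u in blocked_b})
    return in_both_controls - in_any_blocked
```

Requiring the resource in both control crawls (intersection) while excluding anything seen in either blocked crawl (union) helps filter out one-off load failures.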

Missing images:
The majority of the missing resources are images (7k / 10k distinct urls, 53k / 76k instances). I re-requested about 1,600 of the more popular resources and found that half of them were 1x1 pixels. These accounted for ~29k / 53k missed images. Many of the other commonly missed image sizes were standard advertisement sizes (e.g., 300x250). This doesn’t necessarily mean that no advertisement was shown in the cookie block crawls; it could also be that a different ad was shown (possibly by chance). 
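Classifying a re-requested image as a 1x1 pixel or a standard ad unit only needs its dimensions, which for PNGs can be read directly from the IHDR chunk; a rough sketch (the ad-size set is an illustrative subset of IAB sizes, and real crawl data would also need GIF/JPEG handling):

```python
import struct

# Illustrative subset of standard IAB ad-unit sizes.
AD_SIZES = {(300, 250), (728, 90), (160, 600), (320, 50), (970, 250)}

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data):
    """Width and height live at bytes 16..24 of a PNG (inside the IHDR
    chunk), stored as big-endian unsigned 32-bit integers."""
    if data[:8] != PNG_MAGIC:
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])

def classify_image(data):
    """Classify an image as a likely tracking pixel, a standard ad size,
    or other content."""
    width, height = png_dimensions(data)
    if (width, height) == (1, 1):
        return "tracking-pixel"
    if (width, height) in AD_SIZES:
        return "ad-size"
    return "other"
```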

I filtered out the missing images whose URLs contain ad-related keywords (ad, pixel, cookie, banner, px, uid, sync, match, tag, beacon, or a size AAAxBBB) and looked at a sample of the remaining images. They were mostly avatar and thumbnail images (which naturally rotate on the homepages of sites).
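The keyword filter can be approximated with a single regex over the URL; a sketch only (the exact matching rules used in the notebook may differ, e.g., in how word boundaries and the AAAxBBB size pattern are handled):

```python
import re

# Keywords from the analysis above, plus a WIDTHxHEIGHT dimension pattern
# (e.g., 300x250) to catch standard ad-unit sizes embedded in URLs.
AD_PATTERN = re.compile(
    r"\b(?:ads?|pixel|cookie|banner|px|uid|sync|match|tag|beacon)\b"
    r"|\d{2,4}x\d{2,4}",
    re.IGNORECASE,
)

def is_ad_related(url):
    """True if the URL contains an ad-related keyword or ad-size pattern."""
    return bool(AD_PATTERN.search(url))
```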

Other missing content:
The non-image content missing in the blocked crawls accounts for 3k / 10k URLs and 23k / 76k instances. Broken down by content type, the fraction of missing instances whose URLs contain ad-related words is:
image - 0.80 (42099/52844) instances contain ad-related words
subdocument - 0.80 (9776/12258) instances contain ad-related words 
script - 0.72 (6291/8779) instances contain ad-related words
xhr - 0.66 (609/926) instances contain ad-related words
stylesheet - 0.49 (171/348) instances contain ad-related words
media - 0.46 (6/13) instances contain ad-related words 
imageset - 0.37 (65/175) instances contain ad-related words
object - 0.31 (4/13) instances contain ad-related words 
font - 0.21 (40/191) instances contain ad-related words
beacon - 0.15 (11/74) instances contain ad-related words 
fetch - 0.04 (8/220) instances contain ad-related words 
websocket - 0.00 (0/10) instances contain ad-related words

As can be seen, the majority of the missing content was from URLs that contain ad-related keywords. It would be interesting to dig further into these differences, but I did not during this initial pass. Overall, I am making the assumption that missing / different ad scripts and subdocuments are unlikely to lead to user-visible breakage.

Instead, I manually examined samples of the remaining “non-ad” resources to see if there were login / captcha / social buttons present and check if I could reproduce the load failures. These had very low load counts (at most 52 sites), and largely seemed to be false positives (i.e., the two blocked crawls happened to fail to load the resource). I did not observe any breakage in the sites I checked manually, nor was I able to reproduce the load failures. We’d need to work out a way to reduce these false positives before it would make sense to continue manual testing.

2. Differences in the content loaded

We can also examine differences in the content of scripts loaded across the four crawls. Again, we want to look for content that is the same within the pair of crawls that don't block cookies, the same within the pair that do block cookies, but different between the two pairs. This analysis was quite limited due to limitations (1) and (3) below, but I'll summarize what I was able to do despite them.

I don’t see any popular scripts that are consistently different across a large number of sites. The 20 most common scripts that are flagged by the method outlined above are flagged on between 9 and 205 sites (out of ~9k sites that loaded successfully across the 4 crawls). Many of these scripts are much more popular, pointing to minification as a likely reason we’re seeing such a small percentage flagged (i.e., the two measurements within each test condition happened to get the same minified version of the script on a small number of sites).
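The flagging criterion (content identical within each test condition but different between conditions) can be sketched over content hashes; function names here are illustrative, not from the notebook:

```python
from hashlib import sha256

def digest(body):
    """Stable fingerprint of a script body."""
    return sha256(body.encode("utf-8")).hexdigest()

def flag_scripts(control_a, control_b, blocked_a, blocked_b):
    """Each argument maps script URL -> script body for one crawl.
    Flag URLs whose content is identical within each pair of crawls
    but differs between the control and cookie-blocking conditions."""
    flagged = []
    common = set(control_a) & set(control_b) & set(blocked_a) & set(blocked_b)
    for url in sorted(common):
        ctrl = digest(control_a[url]), digest(control_b[url])
        blkd = digest(blocked_a[url]), digest(blocked_b[url])
        if ctrl[0] == ctrl[1] and blkd[0] == blkd[1] and ctrl[0] != blkd[0]:
            flagged.append(url)
    return flagged
```

Note how minification with per-load variable renaming defeats this: the two loads within a condition then hash differently, so the script is silently dropped from consideration, which matches the small flagged percentages described above.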

The attached bug1461822-diffs.ipynb gives a classification of some of the diffs observed in the more common scripts. The majority of these were uninteresting: a difference in some ID in the script, or an entirely noisy diff due to minification. I’ve included snippets and descriptions for the more interesting cases at the bottom of the file.

## Limitations

1. The storage format for JS content is very slow to load, making it prohibitively expensive to load and compute differences for all scripts loaded during the crawl. Fixing the storage format would fix this.
2. AB testing (not cookie blocking) could be the cause of any inconsistent results we see for individual scripts. It may simply be the case that the two cookie block crawls fell into bucket A and the unblocked crawls fell into bucket B. Increasing the number of crawls per test condition would help.
3. Diffing across minified scripts that change variable names with each load is largely ineffective. This practice appears to be pretty common for the popular scripts. Diffing with something like gumtree [1] that uses ASTs is a possible future analysis.
4. The crawls do not block JS storage access. This is a limitation because we’re using an old version of Firefox [0] in which we replicated the cookie blocking logic present in more recent versions. The work being done for [2] will help remove this limitation for future measurements.

Resolving this since I'm not planning to investigate any further with the current dataset.
Closed: 5 years ago
Resolution: --- → FIXED