Closed Bug 1475424 Opened 6 years ago Closed 5 years ago

[Shield] FastBlock Study: Comparison of Tracker Blocking, beta 63

Categories

(Shield :: Shield Study, defect)

defect
Not set
normal

Tracking

(firefox63+ fixed)

RESOLVED FIXED

People

(Reporter: julie, Unassigned)

References

Details

(Whiteboard: [shield-ended])

User Story

PHD: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit

Add-On Repository: https://github.com/mozilla/FastBlockShield

Attachments

(4 files, 15 obsolete files)

   4.83 KB, text/plain (chutten: review+)
   50 bytes, text/x-github-pull-request (nhnt11: review+)
   51 bytes, text/x-github-pull-request (nhnt11: review+)
   98.44 KB, application/x-xpinstall

Basic description of experiment:

The sole focus of the FastBlock feature is to restrict the loading of trackers. It monitors tracker requests that are still waiting for their first byte of data, measured from the start of navigation of the current tab's top-level document. If the first byte is not received within 5s, the request is canceled. If any bytes are received, the 5s timer is stopped. In some of the experimental branches, a few tracker requests are whitelisted and are not monitored. These include resources known to cause breakage when blocked, such as essential audio/video and commenting platforms.
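
The timer rule above can be illustrated with a small sketch. This is not the Gecko implementation; the `TrackerRequest` type, its field names, and the `should_cancel` helper are hypothetical and exist only to make the decision rule concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackerRequest:
    """Hypothetical model of one tracker request on a page."""
    url: str
    # Seconds after navigation start at which the first byte arrived,
    # or None if no data has been received yet.
    first_byte_at: Optional[float]
    whitelisted: bool = False


def should_cancel(request: TrackerRequest, now: float, timeout_s: float = 5.0) -> bool:
    """Return True if the request would be canceled under the rule above.

    `now` is the time in seconds since the start of navigation of the
    top-level document. The 5s default mirrors the description; the FB2
    branches would pass timeout_s=2.0.
    """
    if request.whitelisted:
        return False  # whitelisted trackers are not monitored
    if request.first_byte_at is not None and request.first_byte_at < timeout_s:
        return False  # a byte arrived in time, so the timer was stopped
    return now >= timeout_s


slow = TrackerRequest("https://tracker.example/t.js", first_byte_at=None)
print(should_cancel(slow, now=5.0))
```

The same function covers the 2s and 5s branches by changing only `timeout_s`.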

We will choose a 1.75% random sample of Beta profiles using Firefox 63 on the Windows platform. See the ANOVA power analysis for more information regarding how the sample size was calculated. In addition, these profiles will not have privacy add-ons like:
   uBlock Origin: uBlock0@raymondhill.net
   Adblock Plus: d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d                 
   Adblock: jid1-NIfFY2CA8fy1tg@jetpack
   NoScript: 73a6fe31-595d-460b-a920-fcc0f8843232
   Ghostery: firefox@ghostery.com         
   AdBlocker Ultimate: adblockultimate@adblockultimate.net
   Privacy Badger: jid1-MnnxcxisBPnSXQ@jetpack
   DuckDuckGo Privacy Essentials: 
   uMatrix: https://addons.mozilla.org/firefox/addon/umatrix/ 

This will yield a sample size of approximately 29K profiles. There will be 17 cohorts based upon three parameters: tracker blocking (TB) category, TB list, and tracker tailing (TT). These are described in detail below.

Enrollment lasts one week, and once enrolled, subjects are monitored for three weeks. The profiles will be equally split across all cohorts. For all groups, the first week after installation will be the acquisition of baseline metrics, and the subsequent two weeks will be the period of TB/TT. The tracking protection (TP), FastBlock, and TT preferences for groups 0-16 will not be flipped on until the beginning of the second week.

The TB list used is cohort specific and will remain unchanged throughout the monitoring period. There are four Control cohorts based upon these lists for data collection purposes, despite no TB ever occurring for these groups.

During the baseline collection period, no influence on the trackers will occur. This implies they will behave identically to the 12-15 (TB=Control) groups. The cohort settings will be as follows:
   0-3 (TB=TP):  TP preference is False
   4-11 (TB=FB2/FB5): FastBlock preference is False
   12-15 (TB=Control): No change occurs
   16 (TT): network.http.tailing.enabled=False

At the beginning of the second week, the group settings will be updated to their final TB settings. For cohorts 0-3, the TP preference will be set to true. The FastBlock preference will be set to true for cohorts 4-11 (TB=FB2/FB5). The TT preference will be set to true for cohort 16. No change will occur for the Control cohorts 12-15. Monitoring with these settings will run for the following two weeks.

High level metric we are attempting to influence:

The high level metric we are attempting to influence is Firefox page loading performance with TB. This is achieved by separately enabling the tracking protection or the FastBlock preference. The former influences performance by ignoring tracker requests from the beginning of a page load. FastBlock is more conservative: it allows trackers to attempt to load, but cancels the request of any tracker whose first byte is not received within 5s. This is in contrast to no TB, where trackers can load unimpeded.

Measurements will be collected at the domain level. Before and after TB comparisons will be made to determine the level of influence of TB on page load performance.


Performance Measures:

We will collect data at the page visited level. This will enable direct comparison of the measurements for a given domain with and without TB. To ensure privacy-preserving conditions are met, we assign every domain a profile visits (i.e., its eTLD+1, such as foo.com) a unique ID (e.g., a key in a hash table). All data regarding a domain is collected under this ID. This ID is local to the profile, and will have no relation to the ID for the same domain in any other profile. In addition, the corresponding domain will never be communicated outside the client, including in the telemetry payload.
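
A minimal sketch of such a profile-local mapping follows. It assumes a simple counter-based scheme and an input that has already been reduced to its eTLD+1; the `DomainIdTable` class and its method names are hypothetical and are not taken from the add-on's source.

```python
class DomainIdTable:
    """Profile-local table mapping each visited eTLD+1 to an opaque ID.

    Only the integer ID would ever be placed in a payload; the domain
    string itself stays inside this table on the client.
    """

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def id_for(self, etld_plus_one: str) -> int:
        # Assign the next unused integer the first time a domain is seen,
        # then keep returning the same ID for later visits.
        if etld_plus_one not in self._ids:
            self._ids[etld_plus_one] = len(self._ids)
        return self._ids[etld_plus_one]


table = DomainIdTable()
print(table.id_for("foo.com"), table.id_for("bar.org"), table.id_for("foo.com"))
```

Because every profile starts its own table from scratch, the same ID in two profiles generally refers to different domains.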

The data will be sent with a telemetry Shield ping. 

All values in the payload correspond to the same page load. A ping will be sent for each single page payload, when the number of blockable trackers is greater than zero. Therefore, for any page where blockable trackers were not present, no payload will be sent.

Configuration

This value represents the measurement period for a given cohort. The values during the baseline measurement period are those given in Table 1. For example, for cohort 1 during the baseline collection period, this configuration value will be set to 1. During the TB period, the value 17 is added to the cohort ID. Therefore, during the TB period, configuration = 18 for cohort 1.
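
The arithmetic can be written out as a short sketch. It assumes, based on the cohort 1 example, that the baseline value equals the cohort ID; the function name is hypothetical.

```python
NUM_COHORTS = 17


def configuration_value(cohort_id: int, in_tb_period: bool) -> int:
    """Return the configuration value for a cohort and measurement period.

    Assumption: during baseline the value is the cohort ID itself (0-16);
    during the TB period 17 is added, giving 17-33.
    """
    if not 0 <= cohort_id < NUM_COHORTS:
        raise ValueError(f"cohort_id must be in 0..{NUM_COHORTS - 1}")
    return cohort_id + NUM_COHORTS if in_tb_period else cohort_id


print(configuration_value(1, in_tb_period=False))
print(configuration_value(1, in_tb_period=True))
```

Under this scheme the period and cohort can both be recovered from a single integer: values below 17 are baseline, and subtracting 17 from the others yields the cohort ID.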

Probe Histograms

The probe histograms will be a list of integers, as opposed to the standard telemetry payload. Each value of the list corresponds to the probe value for a single page load. Note that we will not be using the existing histograms directly, but rather implementing new Web Extension experiment APIs to expose similar performance data.

Page Breakages

Four measurement types will be returned. The first is a boolean list indicating whether each page load was due to a reload event. The second is the response to the pop-up survey for a page load event; these are nominal values corresponding to the survey answer. The third is a boolean list indicating whether the user reported the page as broken. The final set of metrics deals with page loading errors, all of which are integer lists of counts.

Breakage Measures

The three measures of page breakage we will investigate are page reloading, user-reported breakage, and errors thrown on page loading. These will be obtained through two modifications of the UI. These measurements will be applicable to every cohort.

Page Reload

We will capture page reloading in a similar fashion to the Page Reload Research study. Most functionality for this study will be identical to that study. A UI button will be displayed over the existing Firefox reload button. This button has two functions:
   To “listen” and record when the user reloaded the page.
   In some reload instances, which are defined below, to present a user with the following pop-up.

Both the reload and survey data will be stored with the other probe data as described in Performance Measures.
In addition, the reload keyboard hotkey will be modified to perform the identical function as the UI button above. 

The user will be presented with a pop-up under similar conditions as those described in Page Reload Research. The general idea is that on a page reload a pop-up will ask the user whether the page is broken. The probability of this pop-up occurring is random, with the chance increasing each time the same domain is reloaded. This probability reaches 100% on the 6th reload. The total number of pop-ups a user can receive during a session is limited to three, and the number of pop-ups for a specific domain is limited to two. Unlike the aforementioned study, the pop-up will contain a single yes/no choice regarding page breakage.
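
The gating logic can be sketched as follows. The description only fixes the end point (100% on the 6th reload) and the two caps, so the linear ramp used here is an assumption, and all names are hypothetical.

```python
import random

MAX_POPUPS_PER_SESSION = 3
MAX_POPUPS_PER_DOMAIN = 2
RELOADS_FOR_CERTAINTY = 6


def popup_probability(reload_count: int) -> float:
    """Assumed linear ramp: 1/6 on the first reload, 1.0 from the sixth on."""
    return min(1.0, max(0, reload_count) / RELOADS_FOR_CERTAINTY)


def should_show_popup(reload_count: int, session_popups: int, domain_popups: int,
                      rng=random.random) -> bool:
    """Decide whether to show the breakage survey on this reload.

    reload_count   -- reloads of the current domain so far, including this one
    session_popups -- pop-ups already shown in this session
    domain_popups  -- pop-ups already shown for this domain
    rng            -- callable returning a float in [0, 1); injectable for tests
    """
    if session_popups >= MAX_POPUPS_PER_SESSION:
        return False
    if domain_popups >= MAX_POPUPS_PER_DOMAIN:
        return False
    return rng() < popup_probability(reload_count)


print(should_show_popup(reload_count=6, session_popups=0, domain_popups=0))
```

Checking the caps before drawing the random number keeps the limits hard: once either cap is reached, no further pop-up appears regardless of the reload count.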

User Reported Breakage

This study will provide a way for users to report page breakage, most likely through a notification bar or a button that asks the user whether or not a given page is broken. This reporting mechanism returns a boolean value. Our study will record the number of times the user selects the affirmative.

Page Load Errors

Two additional breakage metrics will be collected that focus on page load errors. The first is the count of the number of unique script URLs for which an error occurred upon loading. The second is the collection of the number of each error type thrown for the page load. These include:
   EvalError
   InternalError
   RangeError
   ReferenceError
   SyntaxError
   TypeError
   URIError
   SecurityError
   Javascript execution exceptions

All of these exceptions will be counted once the page load has completed.
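
As an illustration of the two page load error metrics, the sketch below tallies a list of observed errors for one page load. The input shape (pairs of script URL and error type), the `OtherException` bucket used for generic JavaScript execution exceptions, and the output keys are all assumptions made for this example.

```python
from collections import Counter

TRACKED_ERROR_TYPES = (
    "EvalError", "InternalError", "RangeError", "ReferenceError",
    "SyntaxError", "TypeError", "URIError", "SecurityError",
)
# Hypothetical bucket for errors that match none of the named types.
OTHER = "OtherException"


def summarize_errors(errors):
    """Summarize errors for one page load.

    errors -- iterable of (script_url, error_type) tuples
    Returns the number of unique script URLs with at least one error and
    a count per error type (zero for types that never occurred).
    """
    counts = Counter({name: 0 for name in (*TRACKED_ERROR_TYPES, OTHER)})
    urls = set()
    for script_url, error_type in errors:
        urls.add(script_url)
        key = error_type if error_type in TRACKED_ERROR_TYPES else OTHER
        counts[key] += 1
    return {"num_script_urls_with_errors": len(urls), "error_counts": dict(counts)}


summary = summarize_errors([
    ("https://a.example/x.js", "TypeError"),
    ("https://a.example/x.js", "TypeError"),
    ("https://b.example/y.js", "DOMException"),
])
print(summary["num_script_urls_with_errors"], summary["error_counts"]["TypeError"])
```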

This isn't a pref flip study.

Branches of the study and the values each branch should be set to:

There will be 17 cohorts based upon the three control variables of TB category, TB list, and TT. TB Categories are defined as follows:
   TP: The TP preference is activated. FastBlock is not activated.
   FB2: The FastBlock preference is activated with timer = 2s. TP is not activated.
   FB5: The FastBlock preference is activated with timer = 5s. TP is not activated.
   Control: FastBlock is not activated. TP is not activated.

For every cohort above, the preference network.http.tailing.enabled=False.

The TB list has four values. The lists L1, L2, and L3 are hosted here.
   L0: Standard list used in TP
   L1: Standard TP list minus whitelists in the Ghostery add-on
   L2: Standard TP list minus whitelist based on exception and shim rules
   L3: Standard TP list minus any breakage we have had reported on Bugzilla related to tracking protection.

The TB lists are varied between the four TB categories yielding 16 cohorts. 

The final cohort only uses the single default list (L0):
   TT: network.http.tailing.enabled=True
   L0 is the single list used
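
The cohort layout can be enumerated with a short sketch. The ID ranges per category (0-3 TP, 4-11 FB2/FB5, 12-15 Control, 16 TT) come from the description; the ordering of FB2 before FB5 and of L0-L3 within each category is an assumption, as are the dictionary keys.

```python
from itertools import product

TB_CATEGORIES = ("TP", "FB2", "FB5", "Control")
TB_LISTS = ("L0", "L1", "L2", "L3")


def build_cohorts():
    """Return the 17 cohorts: 4 TB categories x 4 lists, plus one TT cohort."""
    cohorts = [
        {"id": i, "tb": tb, "list": tb_list, "tailing": False}
        for i, (tb, tb_list) in enumerate(product(TB_CATEGORIES, TB_LISTS))
    ]
    # The final cohort enables tracker tailing and uses only the default list.
    cohorts.append({"id": len(cohorts), "tb": "TT", "list": "L0", "tailing": True})
    return cohorts


cohorts = build_cohorts()
print(len(cohorts), round(100 / len(cohorts), 2))
```

With equal sampling, each of the 17 cohorts receives 100/17, or about 5.88%, of enrolled profiles.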

Focusing on the Test groups (TB≠Control), we can use a profile's baseline and page-specific histograms as a “control” for their measurement phase. This will enable comparison at the domain level of the influences of TP and FastBlock on performance and page breakages. Storing the performance measure results of repeated visits helps to reduce noise at the page level.

However, this data collection scheme introduces a bias for highly visited pages. With the TB=Control branches, we can compare the histograms for page loads where TB would have occurred against those of the Test branches. This will help to mitigate the effects of this high-frequency page bias.

Percentage of users in each branch:
Equal sampling for each of the 17 branches. 5.88% for each cohort.

Channels and locales to ship to:
Beta channel, English locale, version 63, countries: all

Intended go live date and study length:
The go live date is after September 4th, the Beta 63 release date. Enrollment is open for one week, and each profile is then monitored for three weeks: if a profile is enrolled today, it has one week of baseline measurement followed by two weeks of active measurement.

Specific criteria for participants:
Participants must be Windows users running Firefox Beta 63. In addition, profiles should not be running ad-blockers.

Main effect we're looking for and data we'll use to make these decisions:
The main effect we are observing is how TB in general alters the performance of Firefox from an engineering perspective (e.g., metrics). These are described above.

The other main effect is the influence of TB on page breakage from both the user and engineering perspectives. This will be observed in the results of the “page breakage” UI button and the reload pop-up.

In particular, we are looking for a reduction in overall page load time, primarily measured on TIME_TO_LOAD_EVENT_START_MS, of at least 10% relative to control. For the best-performing branch, total page breakage, calculated from user-reported breakage and the reload pop-up, should be no higher than 10% above the corresponding control branch. Ideally, we will also see improvements on earlier events like TIME_TO_DOM_CONTENT_LOADED_START_MS for the most successful branch.
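
These two thresholds can be expressed as a simple check. The sketch assumes both 10% figures are relative to the control value and that a single summary number per branch (for example, a median load time and a breakage rate) is being compared; the function and parameter names are hypothetical, and the actual analysis may use a different statistic.

```python
def meets_success_criteria(control_load_ms: float, branch_load_ms: float,
                           control_breakage: float, branch_breakage: float,
                           min_load_reduction: float = 0.10,
                           max_breakage_increase: float = 0.10) -> bool:
    """Check a branch against the two thresholds described above.

    Load time must drop by at least `min_load_reduction` relative to
    control, and breakage must not exceed control by more than
    `max_breakage_increase` (relative).
    """
    fast_enough = branch_load_ms <= control_load_ms * (1 - min_load_reduction)
    breakage_ok = branch_breakage <= control_breakage * (1 + max_breakage_increase)
    return fast_enough and breakage_ok


# Example with made-up numbers: 15% faster, breakage up 5% relative to control.
print(meets_success_criteria(1000.0, 850.0, 0.020, 0.021))
```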

Owner of the data analysis for this study:
Corey Dow-Hygelund and Saptarshi Guha

Will this experiment require uplift?
No

QA Status of your code
PI Request has been submitted

Do you plan on surveying users at the end of the study?
No.

Link to PHD doc for additional detail: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit#
Blocks: FastBlock
Depends on: 1481252
User Story: (updated)
[Tracking Requested - why for this release]:
Summary: [Shield] FastBlock Study: Comparison of Tracker Blocking → [Shield] FastBlock Study: Comparison of Tracker Blocking, beta 63
Peer Review Assignment
Flags: needinfo?(nhnt11)
Hey Dave, can you look at or assign a peer to review this study? Thanks.
Flags: needinfo?(nhnt11) → needinfo?(dtownsend)
Looks like Nihanth was already assigned here.
Flags: needinfo?(dtownsend) → needinfo?(nhnt11)
Chris, if you are unable to do this data review next week, let me know and I'll find someone else who can.

This is for an opt-out Shield study that will launch on 10 Sep. I believe that all of the data is category 1 or 2.

Everything is documented publicly on the product hypothesis document: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit#heading=h.mn1ovnang5jv
Attachment #9005756 - Flags: review?(chutten)
Attached file GitHub Pull Request
Flags: needinfo?(nhnt11)
Attachment #9005857 - Flags: review?(nhnt11)
Comment on attachment 9005756 [details]
fastblock-data-review-request.txt

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, see TELEMETRY.md

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, standard Shield Study mechanisms apply.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

N/A timeboxed study for two weeks.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

I'm pretty sure hashed ETLD+1 counts as Category 3, even with a per-participant salt. It is equivalent to a monotonically-increasing id per etld+1 unique per profile which would still classify it as Cat3 as you're capturing data per etld+1.

Ultimately, since this study is on a pre-release population with a well-understood opt-out, Cat2/3 doesn't affect the review result, but I think it's important to call out that we are collecting data per etld+1 and that is stronger than just user interaction with the browser.

    Is the data collection request for default-on or default-off?

Default-on for study participants.

    Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

Yes: the aforementioned etld+1 hash. It is designed to not be linkable to anything outside of this study.

    Is the data collection covered by the existing Firefox privacy notice? 

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No, Shield cleans up after itself and renewal involves starting a new study.

---
Result: datareview+
Attachment #9005756 - Flags: review?(chutten) → review+
Attached file mozilla_fastblock_user_study-1.0.0.zip (obsolete) —
attached zip file for signing please
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
Comment on attachment 9005857 [details] [review]
GitHub Pull Request

Review of attachment 9005857 [details] [review]:
-----------------------------------------------------------------

r+ for the Shield study from a code-sanity point of view. I did not look too hard at the overall repository; I focused on the code that's shipping to users. Please let me know if there's further code that's pushed to the repo that you'd like me to look at.
Attachment #9005857 - Attachment is patch: true
Attachment #9005857 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9005857 - Flags: review?(nhnt11) → review+
Comment on attachment 9005857 [details] [review]
GitHub Pull Request

I set r+ from the Splinter review UI, I think it automatically changed the mime-type. Reverting.
Attachment #9005857 - Attachment is patch: false
Attachment #9005857 - Attachment mime type: text/plain → text/x-github-pull-request
UX Review Request
Flags: needinfo?(bbell)
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #15)
> UX Review Request

UX looks good.
Flags: needinfo?(bbell)
Attached file mozilla_fastblock_user_study-1.0.1.zip (obsolete) —
zip file for signing please
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
Hi Nihanth, We've added an additional branch to the experiment, could you kindly re-review and provide a peer review sign-off prior to our next Fastblock check in? (September 13th). Thank you.
Flags: needinfo?(nhnt11)
Please note we are now aiming to launch this shield study on September 17th. Thank you.
Attached file mozilla_fastblock_user_study-1.0.2.zip (obsolete) —
Can we get this one signed as well? :)

Thanks!
Flags: needinfo?(mcooper)
Attached file GitHub Pull Request v2
Nihanth, this is the PR with the diff between the code you last reviewed and current master, it's mostly configuration changes.

(We need a procedure for when the code in the shield study was reviewed by a peer on GitHub already)
Assignee: nobody → jhofmann
Attachment #9008086 - Flags: review?(nhnt11)
Flags: needinfo?(mcooper)
Comment on attachment 9008086 [details] [review]
GitHub Pull Request v2

Review of attachment 9008086 [details] [review]:
-----------------------------------------------------------------

Looks ok, thanks. To be clear I'm not really looking at the config stuff, just the actual extension source code.
Attachment #9008086 - Attachment is patch: true
Attachment #9008086 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9008086 - Flags: review?(nhnt11) → review+
Attachment #9008086 - Attachment is patch: false
Attachment #9008086 - Attachment mime type: text/plain → text/x-github-pull-request
Flags: needinfo?(nhnt11)
Attached file mozilla_fastblock_user_study-1.0.3.zip (obsolete) —
We would need another signature :)

Thanks!
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
For signing please
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
Hi Nihanth, Could we trouble you for a new peer review as we have a new signed build? Thank you for your help!
Flags: needinfo?(nhnt11)
Attached file mozilla_fastblock_user_study-1.0.4.zip (obsolete) —
Ah, sorry, that was the wrong study. Here's the FastBlock study.

Thanks!
Flags: needinfo?(mcooper)
Comment on attachment 9009073 [details]
mozilla_fastblock_user_study-1.0.4.zip

Making another minor change...
Attachment #9009073 - Attachment is obsolete: true
Flags: needinfo?(mcooper)
Science review: R+
Attached file mozilla_fastblock_user_study-1.0.5.zip (obsolete) —
final zip for signing please
Attachment #9006933 - Attachment is obsolete: true
Attachment #9006950 - Attachment is obsolete: true
Attachment #9007797 - Attachment is obsolete: true
Attachment #9007820 - Attachment is obsolete: true
Attachment #9008084 - Attachment is obsolete: true
Attachment #9008090 - Attachment is obsolete: true
Attachment #9008513 - Attachment is obsolete: true
Attachment #9008516 - Attachment is obsolete: true
Attachment #9008880 - Attachment is obsolete: true
Attachment #9008881 - Attachment is obsolete: true
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
Attached file mozilla_fastblock_user_study-1.0.6.zip (obsolete) —
Another study for signing, please...
Attachment #9009258 - Attachment is obsolete: true
Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)
Attachment #9009654 - Attachment is obsolete: true
Attachment #9009261 - Attachment is obsolete: true
Attachment #9009599 - Attachment is obsolete: true
FastBlock
Targeted: Firefox Beta 63

We have finished testing the FastBlock experiment. All the reported issues were fixed and verified.

QA’s recommendation: YELLOW - SHIP IT, CONDITIONALLY

Reasoning:
    The shield study is looking ok overall and the lists are acting better after the latest fixes; however, QA has one major concern.
    Although branches seem to block the trackers according to the lists they are connected to, we were only able to verify a small, random sample of them. As a result, we are not 100% sure that all trackers will be blocked accordingly.

Testing Summary:
- Full Functional test suite: TestRail (https://goo.gl/aT9fjX)

Tested Platforms:
- Windows 7 x64
- Windows 10 x64

Tested Firefox versions:
- Firefox Beta 63.0b4
- Firefox Beta 63.0b5
- Firefox Beta 63.0b6
- Firefox Beta Unbranded 63.0b6
Since Nihanth was a bit confused about what the latest changes are for review, let me clarify: I've kept the branch open that shows the changes between the original review and the latest master here:

https://github.com/mozilla/FastBlockShield/pull/121

Sorry for not mentioning that earlier!

(Note that since these changes were reviewed by me and they also went into Cookie Restrictions, I think we're generally in a good state).

Thanks!
Based on the multiple rounds of testing that we've done and the fact that we've run much more aggressive default-on tracking protection studies in the past, from a product team perspective, I'm comfortable with the level of risk to ship the shield study to beta.

Note: Francois is doing an extra validation of the lists (since he has greater familiarity with them), so we'd like him to give his okay as well before we move forward. (NI'ing him here)
Flags: needinfo?(francois)
Francois signed off on email. With that I can sign off too, on Pascal's (63 release owner) behalf.
Flags: needinfo?(francois)
From François' email: 
Francois Marier, 1:46 PM (1 hour ago)
to Ritu, Marnie, Peter, Carmen, Johann, Erica, Tony, Ehsan, ettseng, Tania, Wennie, Matthew, Gregg, release-signoff, experiments-qa

I have completed the extra validation I wanted to do on version 1.0.6 of
the Shield extension. This is good to go from my point of view.

In terms of testing every tracker, that's not something we are planning
to do as it would be a very large amount of work and likely wouldn't
really tell us more than the random sampling that was done by SV. We
have been comfortable shipping tracking protection without that level of
verification in the past. I believe we can do the same here.

Since FastBlock is a performance feature, as opposed to a privacy
feature, trackers getting through is not a major concern. If we were to
find a tracker on the list that's not blocked as it should be, then all
that would mean is that the feature wouldn't give us as much of a
performance improvement as it could.

Francois
I reviewed the changes that Johann linked, consider the previous r+ carried-forward.
Flags: needinfo?(nhnt11)
Great, thanks!
We're live here :)
Assignee: jhofmann → nobody
Marking the 63 status as fixed as it went live to beta 3 weeks ago.
Recipe disabled - (Delivery console #584).
Whiteboard: [shield-ended]
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED