Closed
Bug 1475424
Opened 5 years ago
Closed 4 years ago
[Shield] FastBlock Study: Comparison of Tracker Blocking, beta 63
Categories
(Shield :: Shield Study, defect)
Tracking
(firefox63+ fixed)
RESOLVED
FIXED
People
(Reporter: julie, Unassigned)
References
Details
(Whiteboard: [shield-ended])
User Story
Attachments
(4 files, 15 obsolete files)
Basic description of experiment:
The sole focus of the FastBlock feature is to restrict the loading of trackers. It monitors trackers that are waiting for their first byte of data, timing from the start of navigation of the current tab’s top-level document. If the first byte is not received within 5s, the request is canceled. If any bytes are received, the 5s timer is stopped. In some of the experimental branches, a few tracker requests are whitelisted and do not have this monitoring. These include resources known to cause breakage, such as essential audio/video and commenting platforms.

We choose a 1.75% random sample of Beta profiles using Firefox 63 on the Windows platform. See the ANOVA power analysis for more information regarding how the sample size was calculated. In addition, these profiles will not have privacy add-ons such as:
- uBlock Origin: uBlock0@raymondhill.net
- Adblock Plus: d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d
- Adblock: jid1-NIfFY2CA8fy1tg@jetpack
- NoScript: 73a6fe31-595d-460b-a920-fcc0f8843232
- Ghostery: firefox@ghostery.com
- AdBlocker Ultimate: adblockultimate@adblockultimate.net
- Privacy Badger: jid1-MnnxcxisBPnSXQ@jetpack
- DuckDuckGo Privacy Essentials:
- uMatrix: https://addons.mozilla.org/firefox/addon/umatrix/
This will yield a sample size of approximately 29K profiles.

There will be 17 cohorts based upon three parameters: tracker blocking (TB) category, TB list, and tracker tailing (TT). These are described in detail below. Enrollment is for a week, and once enrolled, subjects are monitored for three weeks. The profiles will be equally split across all cohorts. For all groups, the first week after installation will be acquisition of baseline metrics, and the subsequent two weeks will be the period of TB/TT. The tracking protection (TP), FastBlock, and TT preferences for groups 0-16 will not be flipped on until the beginning of the second week. The TB list used is cohort specific and will remain unchanged throughout the monitoring period.
There are four Control cohorts based upon these lists for data collection purposes, despite no TB ever occurring for these groups. During the baseline collection period, no influence on the trackers will occur. This implies that every cohort will behave identically to the 12-15 (TB=Control) groups during this period. The cohort settings will be as follows:
- 0-3 (TB=TP): TP preference is False
- 4-11 (TB=FB2/FB5): FastBlock preference is False
- 12-15 (TB=Control): No change occurs
- 16 (TT): network.http.tailing.enabled=False
At the beginning of the second week, the group settings will be updated to their final TB settings. For cohorts 0-3, the TP preference will be set to true. The FastBlock preference will be set to true for cohorts 4-11 (TB=FB2/FB5). The TT preference will be set to true for cohort 16. No change will occur for the Control cohorts 12-15. Monitoring with these settings will run for the following two weeks.

High level metric we are attempting to influence:
The high level metric we are attempting to influence is Firefox page loading performance with TB. This is achieved by separately enabling the tracking protection or the FastBlock preference. The former influences performance by ignoring tracker requests from the beginning of a page load. FastBlock is more conservative: it allows trackers to attempt to load, but cancels the request of any tracker whose first byte is not received within 5s. This is in contrast to no TB, where trackers can load unimpeded. Measurements will be collected at the domain level. Before and after TB comparisons will be made to determine the level of influence of TB on page load performance.

Performance Measures:
We will collect data at the page-visited level. This will enable direct comparison of the measurements for a given domain with and without TB. To ensure privacy-preserving conditions are met, we assign every domain (i.e., eTLD+1, such as foo.com) that a profile visits a unique id (e.g., a key in a hashtable). All data regarding a domain is collected under this id.
This id is local to the profile and has no relation to the same id in any other profile. In addition, the corresponding domain will never be communicated outside the client, including in the telemetry payload. The data will be sent with a telemetry Shield ping. All values in the payload correspond to the same page load. A ping will be sent for each single page load in which the number of blockable trackers is greater than zero. Therefore, for any page where blockable trackers were not present, no payload will be sent.

Configuration
This value represents the measurement period for a given cohort. The values during the baseline measurement period are those given in Table 1. For example, for cohort 1 during the baseline collection period, this configuration value will be set to 1. During the TB period, the value 17 is added to the cohort ID. Therefore, during the TB period, configuration = 18 for cohort 1.

Probe Histograms
The probe histograms will be a list of integers, as opposed to the standard telemetry payload. Each value of the list corresponds to the probe value for a single page load. Note that we will not be using the existing histograms directly, but rather implementing new Web Extension experiment APIs to expose similar performance data.

Page Breakages
Four measurement types will be returned. The first is a boolean list indicating whether the page load was due to a reload event. The second is the response to the pop-up survey for a page load event. These are nominal values corresponding to the survey answer. The third is a boolean list indicating whether the user reported the page as broken. The final set of metrics deals with page loading errors, and all are integer lists of counts.

Breakage Measures
The three measures of page breakage we will investigate are page reloading, user-reported breakage, and errors thrown on page loading. These will be obtained with the two modifications of the UI.
These measurements will be applicable to every cohort.

Page Reload
We will capture page reloading in a similar fashion to the Page Reload Research study. Most functionality for this study will be identical to that study. A UI button will be displayed over the existing Firefox reload button. This button has two functions:
- To “listen” and record when the user reloads the page.
- In some reload instances, which are defined below, to present the user with a pop-up.
Both the reload and survey data will be stored with the other probe data as described in Performance Measures. In addition, the reload keyboard hotkey will be modified to perform the identical function as the UI button above.

The user will be presented with a pop-up under similar conditions to those described in Page Reload Research. The general idea is that on a page reload a pop-up will ask the user whether the page is broken. The probability of this pop-up occurring is random, with the chance increasing each time the same domain is reloaded. This probability reaches 100% on the 6th reload. The total number of pop-ups a user can receive during a session is limited to three, and the number of pop-ups for a specific domain is limited to two. Unlike the aforementioned study, the pop-up will contain a single yes/no choice regarding page breakage.

User Reported Breakage
This study will provide a way for users to report page breakage, most likely through a notification bar or a button which asks the user whether or not a given page is broken. This reporting mechanism returns a boolean value. Our study will record the number of times the user selects the affirmative.

Page Load Errors
Two additional breakage metrics will be collected that focus on page load errors. The first is the count of the number of unique script URLs for which an error occurred upon loading. The second is the count of each error type thrown for the page load.
These include:
- EvalError
- InternalError
- RangeError
- ReferenceError
- SyntaxError
- TypeError
- URIError
- SecurityError
- JavaScript execution exceptions
All of these exceptions will be counted once the page load has completed.

This isn't a pref flip study.

Branches of the study and the values each branch should be set to:
There will be 17 cohorts based upon the three control variables of TB category, TB list, and TT. TB categories are defined as follows:
- TP: The TP preference is activated. FastBlock is not activated.
- FB2: FastBlock preference is activated with timer = 2s. TP is not activated.
- FB5: FastBlock preference is activated with timer = 5s. TP is not activated.
- Control: FastBlock is not activated. TP is not activated.
For every cohort above, the preference network.http.tailing.enabled=False.

The TB list has four values. The lists L1, L2, and L3 are hosted here.
- L0: Standard list used in TP
- L1: Standard TP list minus whitelists in the Ghostery add-on
- L2: Standard TP list minus whitelist based on exception and shim rules
- L3: Standard TP list minus any breakage we have had reported on Bugzilla related to tracking protection.
The TB lists are varied between the four TB categories, yielding 16 cohorts. The final cohort only uses the single default list (L0):
- TT: network.http.tailing.enabled=True. L0 is the single list used.

Focusing on the Test groups (TB≠Control), we can use a profile's baseline and page-specific histograms as a “control” for its measurement phase. This will enable comparison at the domain level of the influences of TP and FastBlock on performance and page breakages. Storing the performance measure results of repeated visits helps to reduce noise at the page level. However, this data collection scheme introduces a bias for highly visited pages. With the TB=Control branches, we can compare the histograms for cases where TB would have occurred against those of the Test groups. This will help to mitigate the effects of this high-frequency page bias.
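The cohort layout and the configuration value described above can be sketched as follows. This is only an illustration, not the study's code: the names `Cohort`, `build_cohorts`, and `configuration_value` are hypothetical, and the ordering of FB2 before FB5 and of the lists within each category is an assumption chosen to agree with the stated ranges (0-3 TP, 4-11 FB2/FB5, 12-15 Control, 16 TT).

```python
from dataclasses import dataclass
from typing import List

# Category and list order are assumptions; the bug only gives the ID ranges.
TB_CATEGORIES = ["TP", "FB2", "FB5", "Control"]
TB_LISTS = ["L0", "L1", "L2", "L3"]
NUM_COHORTS = 17


@dataclass(frozen=True)
class Cohort:
    cohort_id: int
    category: str   # "TP", "FB2", "FB5", "Control", or "TT"
    tb_list: str    # "L0" .. "L3"
    tailing: bool   # value of network.http.tailing.enabled in the TB/TT period


def build_cohorts() -> List[Cohort]:
    """Cross the four TB categories with the four lists, then add the TT cohort."""
    cohorts = []
    for category in TB_CATEGORIES:
        for tb_list in TB_LISTS:
            cohorts.append(Cohort(len(cohorts), category, tb_list, False))
    # The final cohort uses only the default list L0 and enables tailing.
    cohorts.append(Cohort(len(cohorts), "TT", "L0", True))
    return cohorts


def configuration_value(cohort_id: int, in_tb_period: bool) -> int:
    """Baseline period: the cohort ID itself. TB period: cohort ID + 17."""
    if not 0 <= cohort_id < NUM_COHORTS:
        raise ValueError("cohort_id must be in 0..16")
    return cohort_id + NUM_COHORTS if in_tb_period else cohort_id


if __name__ == "__main__":
    for cohort in build_cohorts():
        print(cohort, configuration_value(cohort.cohort_id, True))
```

Under this numbering the configuration value alone identifies both the cohort and the measurement period, since baseline values fall in 0-16 and TB-period values in 17-33.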
Percentage of users in each branch:
Equal sampling for each of the 17 branches. 5.88% for each cohort.

Channels and locales to ship to:
Beta channel, English locale, version 63, countries: all

Intended go-live date and study length:
The go-live date is after September 4th, the Beta 63 release date, with one week for enrollment and two weeks of observation per profile. Therefore, if a profile is enrolled today, it has one week for baseline measurement and then two weeks of active measurement.

Specific criteria for participants:
Participants must be Windows users running Firefox Beta 63. In addition, profiles should not be running ad-blockers.

Main effect we're looking for and data we'll use to make these decisions:
The main effect we are observing is how TB in general alters the performance of Firefox from an engineering perspective (e.g., metrics). These are described above. The other main effect is the influence of TB on page breakage from both the user and the engineering perspective. This will be observed in the results of the “page breakage” UI button and the reload pop-up.

In particular, we are looking for a reduction in overall page load time, primarily measured on TIME_TO_LOAD_EVENT_START_MS, of at least 10% relative to control. For the best-performing branch, total page breakage calculated from user-reported breakage and the reload pop-up should be no higher than 10% above the corresponding control branch. Ideally, we will also see improvements on earlier events like TIME_TO_DOM_CONTENT_LOADED_START_MS for the most successful branch.

Owner of the data analysis for this study: Corey Dow-Hygelund and Saptarshi Guha
Will this experiment require uplift? No
QA Status of your code: PI Request has been submitted
Do you plan on surveying users at the end of the study? No.
Link to PHD doc for additional detail: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit#
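The two success thresholds stated above (at least a 10% reduction in load time, and breakage no more than 10% above control) could be checked along these lines. This is a rough sketch under stated assumptions: the function names are hypothetical, both thresholds are read as relative changes against the control branch, and a single summary value per branch (for example a median of TIME_TO_LOAD_EVENT_START_MS) is assumed as input. The actual analysis may well use a different statistic.

```python
def relative_change(treatment: float, control: float) -> float:
    """Fractional change of treatment relative to control (0.1 means +10%)."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treatment - control) / control


def meets_success_criteria(
    load_ms_treatment: float,
    load_ms_control: float,
    breakage_treatment: float,
    breakage_control: float,
    threshold: float = 0.10,
) -> bool:
    """True if load time drops by >= threshold and breakage rises by <= threshold."""
    load_reduction = -relative_change(load_ms_treatment, load_ms_control)
    breakage_increase = relative_change(breakage_treatment, breakage_control)
    return load_reduction >= threshold and breakage_increase <= threshold


if __name__ == "__main__":
    # Hypothetical numbers: 15% faster load, 5% more breakage than control.
    print(meets_success_criteria(850.0, 1000.0, 0.021, 0.020))
```

The numbers in the usage line are made up for illustration and are not study results.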
Reporter
Comment 1•5 years ago
FastBlock PHD: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit
Comment 2•5 years ago
Repo is here: https://github.com/mozilla/FastBlockShield
status-firefox63:
--- → affected
Updated•5 years ago
User Story: (updated)
Comment 3•5 years ago
[Tracking Requested - why for this release]:
tracking-firefox63:
--- → ?
Summary: [Shield] FastBlock Study: Comparison of Tracker Blocking → [Shield] FastBlock Study: Comparison of Tracker Blocking, beta 63
Planned study for 63, tracked.
Comment 6•5 years ago
Hey Dave, can you look at or assign a peer to review this study? Thanks.
Flags: needinfo?(nhnt11) → needinfo?(dtownsend)
Comment 7•5 years ago
Looks like Nihanth was already assigned here.
Flags: needinfo?(dtownsend) → needinfo?(nhnt11)
Comment 8•5 years ago
Chris, if you are unable to do this data review next week, let me know and I'll find someone else who can. This is for an opt-out Shield study that will launch on 10 Sep. I believe that all of the data is category 1 or 2. Everything is documented publicly on the product hypothesis document: https://docs.google.com/document/d/1LQdOFIZeoiD38NNMxvpm32bX6Urnv0KbEsGfgAuw5L8/edit#heading=h.mn1ovnang5jv
Attachment #9005756 -
Flags: review?(chutten)
Comment 9•5 years ago
Flags: needinfo?(nhnt11)
Attachment #9005857 -
Flags: review?(nhnt11)
Comment 10•5 years ago
Comment on attachment 9005756 [details]
fastblock-data-review-request.txt
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes, see TELEMETRY.md
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes, standard Shield Study mechanisms apply.
If the request is for permanent data collection, is there someone who will monitor the data over time?
N/A timeboxed study for two weeks.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
I'm pretty sure hashed ETLD+1 counts as Category 3, even with a per-participant salt. It is equivalent to a monotonically-increasing id per etld+1 unique per profile which would still classify it as Cat3 as you're capturing data per etld+1.
Ultimately, since this study is on a pre-release population with a well-understood opt-out Cat2/3 doesn't affect the review result, but I think it's important to call out that we are collecting data per etld+1 and that is stronger than just user interaction with the browser.
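The equivalence noted in this answer, between a per-participant salted hash of the eTLD+1 and a monotonically increasing per-profile id, can be illustrated with a small sketch. Everything here is hypothetical: `LocalDomainIds` and `salted_hash` are illustrative names, SHA-256 is an arbitrary choice, and the study's actual implementation may differ. In both schemes the identifier is stable within a profile and carries no meaning across profiles, yet data is still grouped per eTLD+1.

```python
import hashlib
import secrets


class LocalDomainIds:
    """Assigns each eTLD+1 a counter-based id that only has meaning inside one profile."""

    def __init__(self) -> None:
        self._ids = {}

    def id_for(self, etld_plus_one: str) -> int:
        # First visit to a domain allocates the next integer; later visits reuse it.
        if etld_plus_one not in self._ids:
            self._ids[etld_plus_one] = len(self._ids)
        return self._ids[etld_plus_one]


def salted_hash(etld_plus_one: str, salt: bytes) -> str:
    """Hash of the domain with a per-profile salt (hypothetical scheme)."""
    return hashlib.sha256(salt + etld_plus_one.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    profile_a, profile_b = LocalDomainIds(), LocalDomainIds()
    print(profile_a.id_for("foo.com"), profile_a.id_for("bar.com"))
    # The same domain maps to a different id in another profile,
    # because ids depend on that profile's own visit order.
    print(profile_b.id_for("bar.com"))

    salt_a, salt_b = secrets.token_bytes(16), secrets.token_bytes(16)
    print(salted_hash("foo.com", salt_a) == salted_hash("foo.com", salt_b))
```

Either way, the domain string itself never needs to leave the client; only the opaque id would appear in a payload.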
Is the data collection request for default-on or default-off?
Default-on for study participants.
Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
Yes: the aforementioned etld+1 hash. It is designed to not be linkable to anything outside of this study.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No, Shield cleans up after itself and renewal involves starting a new study.
---
Result: datareview+
Attachment #9005756 -
Flags: review?(chutten) → review+
Comment 12•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 13•5 years ago
Comment on attachment 9005857 [details] [review]
GitHub Pull Request
Review of attachment 9005857 [details] [review]:
-----------------------------------------------------------------
r+ for the Shield study from a code-sanity point of view. I did not look too hard at the overall repository; I focused on the code that's shipping to users. Please let me know if there's further code that's pushed to the repo that you'd like me to look at.
Attachment #9005857 -
Attachment is patch: true
Attachment #9005857 -
Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9005857 -
Flags: review?(nhnt11) → review+
Comment 14•5 years ago
Comment on attachment 9005857 [details] [review]
GitHub Pull Request
I set r+ from the Splinter review UI, which I think automatically changed the MIME type. Reverting.
Attachment #9005857 -
Attachment is patch: false
Attachment #9005857 -
Attachment mime type: text/plain → text/x-github-pull-request
Comment 16•5 years ago
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #15)
> UX Review Request
UX looks good.
Flags: needinfo?(bbell)
Comment 18•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 19•5 years ago
Hi Nihanth,
We've added an additional branch to the experiment. Could you kindly re-review and provide a peer review sign-off prior to our next FastBlock check-in (September 13th)? Thank you.
Flags: needinfo?(nhnt11)
Comment 20•5 years ago
Please note we are now aiming to launch this shield study on September 17th. Thank you.
Comment 22•5 years ago
Nihanth, this is the PR with the diff between the code you last reviewed and current master; it's mostly configuration changes. (We need a procedure for when the code in the Shield study was already reviewed by a peer on GitHub.)
Assignee: nobody → jhofmann
Attachment #9008086 -
Flags: review?(nhnt11)
Comment 23•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 24•5 years ago
Comment on attachment 9008086 [details] [review]
GitHub Pull Request v2
Review of attachment 9008086 [details] [review]:
-----------------------------------------------------------------
Looks ok, thanks. To be clear, I'm not really looking at the config stuff, just the actual extension source code.
Attachment #9008086 -
Attachment is patch: true
Attachment #9008086 -
Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9008086 -
Flags: review?(nhnt11) → review+
Updated•5 years ago
Attachment #9008086 -
Attachment is patch: false
Attachment #9008086 -
Attachment mime type: text/plain → text/x-github-pull-request
Flags: needinfo?(nhnt11)
Comment 26•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 28•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 29•5 years ago
Hi Nihanth, Could we trouble you for a new peer review as we have a new signed build? Thank you for your help!
Flags: needinfo?(nhnt11)
Comment 30•5 years ago
Ah, sorry, that was the wrong study. Here's the FastBlock study. Thanks!
Flags: needinfo?(mcooper)
Comment 31•5 years ago
Comment on attachment 9009073 [details]
mozilla_fastblock_user_study-1.0.4.zip
Making another minor change...
Attachment #9009073 -
Attachment is obsolete: true
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 32•5 years ago
Science review: R+
Comment 33•5 years ago
final zip for signing please
Attachment #9006933 -
Attachment is obsolete: true
Attachment #9006950 -
Attachment is obsolete: true
Attachment #9007797 -
Attachment is obsolete: true
Attachment #9007820 -
Attachment is obsolete: true
Attachment #9008084 -
Attachment is obsolete: true
Attachment #9008090 -
Attachment is obsolete: true
Attachment #9008513 -
Attachment is obsolete: true
Attachment #9008516 -
Attachment is obsolete: true
Attachment #9008880 -
Attachment is obsolete: true
Attachment #9008881 -
Attachment is obsolete: true
Flags: needinfo?(mcooper)
Comment 34•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 35•5 years ago
Another study for signing, please...
Attachment #9009258 -
Attachment is obsolete: true
Flags: needinfo?(mcooper)
Comment 36•5 years ago
Updated•5 years ago
Flags: needinfo?(mcooper)
Comment 37•5 years ago
Updated•5 years ago
Attachment #9009654 -
Attachment is obsolete: true
Updated•5 years ago
Attachment #9009261 -
Attachment is obsolete: true
Updated•5 years ago
Attachment #9009599 -
Attachment is obsolete: true
Comment 38•5 years ago
FastBlock
Targeted: Firefox Beta 63

We have finished testing the FastBlock experiment. All the reported issues were fixed and verified.

QA’s recommendation: YELLOW - SHIP IT, CONDITIONALLY

Reasoning:
The Shield study is looking ok overall and the lists are behaving better after the latest fixes; however, QA has one major concern. Although the branches seem to block trackers according to the lists they are connected to, we were only able to verify a small, randomly chosen number of them. As a result, we are not 100% sure that all trackers will be blocked accordingly.

Testing Summary:
- Full Functional test suite: TestRail (https://goo.gl/aT9fjX)

Tested Platforms:
- Windows 7 x64
- Windows 10 x64

Tested Firefox versions:
- Firefox Beta 63.0b4
- Firefox Beta 63.0b5
- Firefox Beta 63.0b6
- Firefox Beta Unbranded 63.0b6
Comment 39•5 years ago
Since Nihanth was a bit confused about what the latest changes are for review, let me clarify: I've kept the branch open that shows the changes between the original review and the latest master here: https://github.com/mozilla/FastBlockShield/pull/121 Sorry for not mentioning that earlier! (Note that since these changes were reviewed by me and they also went into Cookie Restrictions, I think we're generally in a good state). Thanks!
Comment 40•5 years ago
Based on the multiple rounds of testing that we've done and the fact that we've run much more aggressive default-on tracking protection studies in the past, from a product team perspective, I'm comfortable with the level of risk to ship the shield study to beta. Note: Francois is doing an extra validation of the lists (since he has greater familiarity with them), so we'd like him to give his okay as well before we move forward. (NI'ing him here)
Flags: needinfo?(francois)
Francois signed off on email. With that I can sign off too, on Pascal's (63 release owner) behalf.
Updated•5 years ago
Flags: needinfo?(francois)
Comment 42•5 years ago
From François' email:

Francois Marier
1:46 PM (1 hour ago)
to Ritu, Marnie, Peter, Carmen, Johann, Erica, Tony, Ehsan, ettseng, Tania, Wennie, Matthew, Gregg, release-signoff, experiments-qa

I have completed the extra validation I wanted to do on version 1.0.6 of the Shield extension. This is good to go from my point of view.

In terms of testing every tracker, that's not something we are planning to do as it would be a very large amount of work and likely wouldn't really tell us more than the random sampling that was done by SV. We have been comfortable shipping tracking protection without that level of verification in the past. I believe we can do the same here.

Since FastBlock is a performance feature, as opposed to a privacy feature, trackers getting through is not a major concern. If we were to find a tracker on the list that's not blocked as it should be, then all that would mean is that the feature wouldn't give us as much of a performance improvement as it could.

Francois
Comment 43•5 years ago
I reviewed the changes that Johann linked; consider the previous r+ carried forward.
Flags: needinfo?(nhnt11)
Comment 44•5 years ago
Great, thanks!
Comment 45•5 years ago
We're live here :)
Updated•5 years ago
Assignee: jhofmann → nobody
Comment 46•5 years ago
Marking the 63 status as fixed as it went live to beta 3 weeks ago.
Recipe disabled - (Delivery console #584).
Whiteboard: [shield-ended]
Updated•4 years ago
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED