Closed Bug 1532217 Opened 6 years ago Closed 6 years ago

[Shield] Add-On Study: Federated Learning v2

Categories

(Shield :: Shield Study, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: experimenter, Assigned: isegall)


Attachments

(3 files, 7 obsolete files)

Federated Learning v2

We seek to replicate the federated learning study performed last year (PHD: https://docs.google.com/document/d/1DuZ1nQ2ve7k-98BKUKCZegtVzAxnJBEgYJYv6IUqZck/edit?ts=5b0885a3) with an updated architecture and additional probes.

More information: https://experimenter.services.mozilla.com/experiments/federated-learning-v2/

Attached v2.1.1 for signing

Flags: needinfo?(mcooper)

Attached v2.1.2 for signing

Attachment #9048112 - Attachment is obsolete: true

Attached v2.1.3 for signing

Attachment #9048117 - Attachment is obsolete: true
Flags: needinfo?(mcooper)

The add-on (version 2.1.3) is now ready for signing.

Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)

Hi Ilana, can you please fill in the following request.md and submit it as an attachment for review?

The TELEMETRY.md itself is fantastic, and I don't see anything concerning, so this should be a straightforward approval once the request is attached for review.

Flags: needinfo?(rrayborn) → needinfo?(isegall)

Thanks Rob - I'm going to have :fwollsen review my draft of the data-review before submitting it.

Flags: needinfo?(isegall) → needinfo?(vng)
Attached file data_review.md
Flags: needinfo?(vng)
Attachment #9049005 - Flags: data-review?
Comment on attachment 9049005
data_review.md

General Notes: Great documentation!

1) Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes, https://github.com/mozilla/federated-learning-v2-study-addon/blob/master/docs/TELEMETRY.md

2) Is there a control mechanism that allows the user to turn the data collection on and off?
Yes, Telemetry settings

3) If the request is for permanent data collection, is there someone who will monitor the data over time?
Non-permanent, but there's already a processing pipeline

4) Using the category system of data types (https://wiki.mozilla.org/Firefox/Data_Collection), what collection type of data do the requested measurements fall under?
Category 2; the pings do a great job of respecting lean data practices

5) Is the data collection request for default-on or default-off?
On, Telemetry defaults

6) Does the instrumentation include the addition of any new identifier (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
No

7) Is the data collection covered by the existing Firefox privacy notice?
Yes, Telemetry policies

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)
No, but likely to iterate
Attachment #9049005 - Flags: data-review? → data-review+

There appears to be an issue with the signing of the add-on. Installing federated-learning-v2@shield.mozilla.org-2.1.3-signed-testing.xpi fails, and the browser log reports "Add-on federated-learning-v2@shield.mozilla.org is not correctly signed". Loaded through about:debugging, the add-on installs, but at runtime it fails with "TypeError: browser.study is undefined". This was tested on Beta and Developer Edition.

Flags: needinfo?(mcooper)

This is expected, since this is a testing signature. To test these add-ons, you must enable the dev-root for add-on signature verification, which requires a Nightly or unsigned build. Instructions for how to do that are maintained by QA here.

Flags: needinfo?(mcooper)

(In reply to Michael Cooper [:mythmon] from comment #13)

Thanks mythmon, that clarifies it. I was afraid that the "Not showing in about:addons" check could not be tested outside of branded builds, but it turns out that it is perfectly testable in unbranded builds.

Attached v2.1.4 for signing

Flags: needinfo?(mcooper)
Flags: needinfo?(mcooper)

Add-On Study: Federated Learning v2
Targeted: Firefox Release 66.0

We have finished testing the Add-On Study: Federated Learning v2 experiment.

QA’s recommendation: GREEN - SHIP IT

Reasoning:

  • All the logged issues were either fixed or labeled as won't fix by the development team. None of the won't-fix issues affect users; most of them relate to telemetry data.
  • Because of the large number of test cases and the complexity of the telemetry probes, we were unable to fully test the experiment on Linux (76% coverage, with no new issues found). However, we achieved 100% coverage on Mac and Windows.

Testing Summary:

Tested Platforms:

  • Windows 10 x64
  • Mac 10.14
  • Ubuntu 16.04 x64

Tested Firefox versions:

  • Firefox 66.0b14 Beta Unbranded build (en-US)
  • Firefox 66.0b14 Dev Edition build (en-US)
Flags: shield-qa+
Experiment Type: Opt-Out Study

What are the branches of the study?
- Treatment Non Dogfood Badly Seeded 20%: Initial coefficients are parameters geometrically far from the existing ones (same as Branch 3, all 1s); different users train and evaluate the model.
- Treatment Non Dogfood 20%: Initial coefficients are the existing parameters (same as Branch 1); different users train and evaluate the model.
- Treatment Dogfood Badly Seeded 20%: Initial coefficients are parameters geometrically far from the existing ones (all 1s); the same users train and receive the updated model.
- Treatment Dogfood 20%: Initial coefficients are the existing parameters; the same users train and receive the updated model.
- Treatment Control 20%: Control

What version and channel do you intend to ship to?
1% of Release Firefox 66.0

Are there specific criteria for participants?
- Version: Please include 66.0+ and not just a 66.0 exact match. Since the study will be enrolling during the Fx66 launch, we expect enrollment to ramp up as clients update from 65 to 66 on March 19th.
- Locales: en-all
- Geographic regions: all
- Prefs:
    browser.urlbar.suggest.searches = True
    browser.urlbar.suggest.history = True
    browser.urlbar.suggest.bookmark = True
    browser.privatebrowsing.autostart = False #remove auto private browsing users
    privacy.sanitize.sanitizeOnShutdown = False #remove clear history on shutdown users
    browser.urlbar.matchBuckets = general:5,suggestion:Infinity #search not forced first
- Studies: NA
- Any additional filters:

What is your intended go live date and how long will the study run?
Mar 18, 2019 - Apr 29, 2019 (42 days)

What is the main effect you are looking for and what data will you use to make these decisions?
In order to evaluate the winning branch for the search parameters, we will be using both interaction data and survey results. The data we will examine is:
- Number of characters typed before choosing a result (used in v1)
- Rank of the result chosen (used in v1)
- How soon the chosen result entered the result set
- Total time elapsed between beginning the query and choosing a result
- Number of abandoned searches
- Total amount of usage of the awesomebar
- Search satisfaction score from the survey

Additionally, we will examine the standard engagement metrics to ensure that no branch sees significant decays (not anticipated).

Search satisfaction score will be the gold standard for determining the winning branch. If there is no clear winner using a 1% margin, we will evaluate which of the above metrics are most strongly correlated with a positive search experience and see if a clear winner emerges from there. If the output is conflicting, as in the previous study, we will consult with the search team to decide whether there is enough evidence to support replacing the current parameters.

If we see the training-test branches clearly outperform the dogfooding branches in both the well-seeded and badly-seeded cases, we will make note of that for upcoming federated learning studies. This result would be counter to the way that federated learning has been used in the past, and would warrant further investigation. We will also examine how the convergence of the badly-seeded branches compares to their well-seeded counterparts and whether they arrive at the same model. This outcome will affect how careful we need to be with start values for similar analyses (adding additional branches, running simulations, etc.).

We have already developed our survey (with Rosanne Scholl of S&I), which is available here: https://qsurvey.mozilla.com/s3/URL-bar-satisfaction-survey. Screenshot to be added as indicated in the text.

Who is the owner of the data analysis for this study?
Ilana Segall

Will this experiment require uplift?
False

QA Status of your code:
https://github.com/mozilla/federated-learning-v2-study-addon/blob/master/docs/TESTPLAN.md

Link to more information about this study:
https://experimenter.services.mozilla.com/experiments/federated-learning-v2/
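The decision rule above (search satisfaction as the gold standard, with a 1% margin deciding whether there is a clear winner) can be sketched as follows. This is a minimal, hypothetical Python model, not the analysis code used for the study: it assumes each branch has already been reduced to a mean satisfaction score on a 0-1 scale and treats the 1% margin as an absolute difference of 0.01.

```python
from typing import Dict, Optional


def pick_winner(satisfaction: Dict[str, float], margin: float = 0.01) -> Optional[str]:
    """Return the branch with the highest mean satisfaction score, or None
    when its lead over the runner-up is smaller than `margin`.

    Hypothetical helper: scores are assumed to be mean satisfaction on a
    0-1 scale, and `margin` is treated as an absolute difference.
    """
    if len(satisfaction) < 2:
        raise ValueError("need at least two branches to compare")
    # Sort branches from highest to lowest satisfaction.
    ranked = sorted(satisfaction.items(), key=lambda kv: kv[1], reverse=True)
    best_branch, best_score = ranked[0]
    runner_up_score = ranked[1][1]
    if best_score - runner_up_score >= margin:
        return best_branch
    # No clear winner: the plan falls back to the secondary interaction
    # metrics, which this sketch does not model.
    return None
```

A `None` result corresponds to the "no clear winner" case, where the plan moves on to correlating the interaction metrics with satisfaction. Whether the margin is meant as an absolute or a relative difference is not stated in the request; swapping to a relative test would only change the comparison line.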
Experiment Type: Opt-Out Study

What are the branches of the study?
- Treatment Non Dogfood Badly Seeded 20%: Initial coefficients are parameters geometrically far from the existing ones (same as Branch 3, all 1s); different users train and evaluate the model.
- Treatment Non Dogfood 20%: Initial coefficients are the existing parameters (same as Branch 1); different users train and evaluate the model.
- Treatment Dogfood Badly Seeded 20%: Initial coefficients are parameters geometrically far from the existing ones (all 1s); the same users train and receive the updated model.
- Treatment Dogfood 20%: Initial coefficients are the existing parameters; the same users train and receive the updated model.
- Treatment Control 20%: Control

What version and channel do you intend to ship to?
1% of Release Firefox 66.0

Are there specific criteria for participants?
- Version: Please include 66.0+ and not just a 66.0 exact match. Since the study will be enrolling during the Fx66 launch, we expect enrollment to ramp up as clients update from 65 to 66 on March 19th.
- Locales: en-all
- Geographic regions: all
- Prefs:
    browser.urlbar.suggest.searches = True
    browser.urlbar.suggest.history = True
    browser.urlbar.suggest.bookmark = True
    browser.privatebrowsing.autostart = False #remove auto private browsing users
    privacy.sanitize.sanitizeOnShutdown = False #remove clear history on shutdown users
    browser.urlbar.matchBuckets: does not exist as a pref! #restricts to search first, which is the default on all clients 57 and newer
- Studies: NA
- Any additional filters:

What is your intended go live date and how long will the study run?
Mar 18, 2019 - Apr 29, 2019 (42 days)

What is the main effect you are looking for and what data will you use to make these decisions?
In order to evaluate the winning branch for the search parameters, we will be using both interaction data and survey results. The data we will examine is:
- Number of characters typed before choosing a result (used in v1)
- Rank of the result chosen (used in v1)
- How soon the chosen result entered the result set
- Total time elapsed between beginning the query and choosing a result
- Number of abandoned searches
- Total amount of usage of the awesomebar
- Search satisfaction score from the survey

Additionally, we will examine the standard engagement metrics to ensure that no branch sees significant decays (not anticipated).

Search satisfaction score will be the gold standard for determining the winning branch. If there is no clear winner using a 1% margin, we will evaluate which of the above metrics are most strongly correlated with a positive search experience and see if a clear winner emerges from there. If the output is conflicting, as in the previous study, we will consult with the search team to decide whether there is enough evidence to support replacing the current parameters.

If we see the training-test branches clearly outperform the dogfooding branches in both the well-seeded and badly-seeded cases, we will make note of that for upcoming federated learning studies. This result would be counter to the way that federated learning has been used in the past, and would warrant further investigation. We will also examine how the convergence of the badly-seeded branches compares to their well-seeded counterparts and whether they arrive at the same model. This outcome will affect how careful we need to be with start values for similar analyses (adding additional branches, running simulations, etc.).

We have already developed our survey (with Rosanne Scholl of S&I), which is available here: https://qsurvey.mozilla.com/s3/URL-bar-satisfaction-survey. Screenshot to be added as indicated in the text.

Who is the owner of the data analysis for this study?
Ilana Segall

Will this experiment require uplift?
False

QA Status of your code:
https://github.com/mozilla/federated-learning-v2-study-addon/blob/master/docs/TESTPLAN.md

Link to more information about this study:
https://experimenter.services.mozilla.com/experiments/federated-learning-v2/
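The participant criteria in this revised request (version 66.0+, English locales, and the five remaining pref requirements, with browser.urlbar.matchBuckets dropped) can be modelled as a single eligibility check. This is a hypothetical Python sketch, not the targeting expression the study actually shipped with: it assumes the client's effective pref values are available as a dict, and it interprets "en-all" as "en" or any locale starting with "en-".

```python
from typing import Mapping

# Pref requirements from the launch request (hypothetical model of the filter).
REQUIRED_PREFS = {
    "browser.urlbar.suggest.searches": True,
    "browser.urlbar.suggest.history": True,
    "browser.urlbar.suggest.bookmark": True,
    "browser.privatebrowsing.autostart": False,      # exclude auto private browsing users
    "privacy.sanitize.sanitizeOnShutdown": False,    # exclude clear-history-on-shutdown users
}


def major_version(version: str) -> int:
    """Extract the major version, e.g. "66.0b14" -> 66."""
    return int(version.split(".")[0])


def is_eligible(version: str, locale: str, prefs: Mapping[str, object]) -> bool:
    """Return True if a client matches the study's participant criteria.

    Assumption: `prefs` holds effective values, so a missing pref is
    treated as not matching.
    """
    # 66.0 and later, not just an exact 66.0 match.
    if major_version(version) < 66:
        return False
    # "en-all": assumed to mean any English locale.
    if not (locale == "en" or locale.startswith("en-")):
        return False
    return all(prefs.get(name) == value for name, value in REQUIRED_PREFS.items())
```

Using a major-version comparison rather than string equality is what keeps 66.0.x, 66.0bN and later releases eligible, which is the point of the "66.0+" request.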

Attached v2.2.0 for signing

Attachment #9048347 - Attachment is obsolete: true
Attachment #9048562 - Attachment is obsolete: true
Attachment #9051660 - Attachment is obsolete: true
Attachment #9051713 - Attachment is obsolete: true
Attachment #9051744 - Attachment is obsolete: true
Flags: needinfo?(mcooper)

Requesting QA sign-off for minor changes reflected in v2.2.0

Flags: needinfo?(carmen.fat)
Flags: needinfo?(mcooper)

Note for QA: The changes in v2.2.0 were necessary for compatibility with the updated remote streaming ETL job and do not affect any user-facing behavior.

Hi Martin, Carmen is returning tomorrow from PTO. Since experiments are launching only on Monday, I assume that we can wait for her return to validate the new build, right?

Flags: needinfo?(mlopatka)

@Paul Oiegas, yes, no problem.
As fwollsen mentions, these are minor changes to improve compatibility with the downstream data processing jobs. It should be straightforward if you/Carmen are comfortable reviewing the diff. We aim to relaunch with the new xpi on Monday April 1st.

Flags: needinfo?(mlopatka)

I've just posted the QA sign off for 2.2.0 build on bug 1539034.

Flags: needinfo?(carmen.fat)

The study ended in May.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED