Closed Bug 1539034 Opened 5 years ago Closed 5 years ago

[Shield] Add-On Study: Federated Learning v2 relaunch

Categories

(Shield :: Shield Study, enhancement)

Hardware: Desktop
OS: All
Type: enhancement
Priority: Not set
Severity: normal

Tracking

(geckoview66 ?, firefox66- affected)

RESOLVED FIXED
Tracking Status
geckoview66 --- ?
firefox66 - affected

People

(Reporter: experimenter, Assigned: isegall)

Details

Add-On Study: Federated Learning v2 relaunch

We seek to replicate the federated learning study performed last year (PHD: https://docs.google.com/document/d/1DuZ1nQ2ve7k-98BKUKCZegtVzAxnJBEgYJYv6IUqZck/edit?ts=5b0885a3) with an updated architecture and additional probes.

This study is a relaunch of the previous Federated Learning v2 study (https://experimenter.services.mozilla.com/experiments/federated-learning-v2/), which was launched with errors.

Public facing description: This study uses federated learning, a privacy-preserving learning methodology, to gain insights into how our users utilize the history and bookmark features in the awesome bar. We'll use these findings both to optimize the experience in the awesome bar and explore our ability to use federated learning in the future.

More information: https://experimenter.services.mozilla.com/experiments/add-on-study-federated-learning-v2-relaunch/

Add-On Study: Federated Learning v2 - 2.2.0 build

Targeted: Firefox Release 66.x
We have finished testing the Add-On Study: Federated Learning v2 - 2.2.0 build experiment.

QA’s recommendation: GREEN - SHIP IT

Reasoning:

  • No new issues were found while testing the 2.2.0 build.

Testing Summary:

  • Verified that each branch can be accessed using its changed name (value).
  • Verified that the "control" and "not-submitting" types of branches generate the "shield-study-addon" telemetry probes.
  • Verified that the "control" and "not-submitting" types of branches don't generate the "frecency-update" telemetry probes.
  • Verified that the rest of the branches (model1, model2, model3-submitting and model4-submitting) generate both "shield-study-addon" and "frecency-update" telemetry probes.
  • Performed regression testing to ensure that the changes made to the add-on’s architecture don’t affect the user-facing behavior or the telemetry probes.
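
The probe expectations in the testing summary can be written down as a small lookup. This is only an illustrative sketch: the branch name strings are taken from the branch list in this bug and from the QA notes, the shipped add-on may use different values (the notes mention changed names), and `expected_probes` is a hypothetical helper that is not part of the add-on.

```python
from typing import Set

# Branch names as they appear in this bug; the real values are an assumption.
NON_SUBMITTING_BRANCHES = {
    "control",
    "model3-not-submitting",
    "model4-not-submitting",
}
SUBMITTING_BRANCHES = {
    "model1",
    "model2",
    "model3-submitting",
    "model4-submitting",
}


def expected_probes(branch: str) -> Set[str]:
    """Return the telemetry probe types QA expects to see for a branch."""
    if branch in SUBMITTING_BRANCHES:
        # These branches report study telemetry and frecency updates.
        return {"shield-study-addon", "frecency-update"}
    if branch in NON_SUBMITTING_BRANCHES:
        # These branches report study telemetry only.
        return {"shield-study-addon"}
    raise ValueError(f"unknown branch: {branch!r}")
```

A check such as `expected_probes("control")` would then list only "shield-study-addon", matching the second and third bullets above.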

Tested Platforms:

  • Windows 10 x64
  • macOS 10.14

Tested Firefox versions:

  • Firefox Release 66.0.1 (en-US)
  • Firefox Release 66.0.2 (en-US)
Flags: shield-qa+

[Tracking Requested - why for this release]: This experiment is targeting the Firefox 66 release population.

Flags: shield-science+
Flags: shield-data-steward+
OS: Unspecified → All
Hardware: Unspecified → Desktop
Flags: needinfo?(lhenry)
    Experiment Type: Opt-Out Study

    What are the branches of the study:

- Treatment model4-not-submitting 10%:

Initial coefficients are parameters geometrically far from existing ones (same as model2, all 1s), testing set
        
- Treatment model4-submitting 10%:

Initial coefficients are parameters geometrically far from existing ones (same as model2, all 1s), training set
        
- Treatment model3-not-submitting 10%:

Initial coefficients are existing parameters (same as model1), testing set
        
- Treatment model3-submitting 10%:

Initial coefficients are existing parameters (same as model1), training set
        
- Treatment model2 20%:

Initial coefficients are parameters geometrically far from existing ones (use all 1s), same users train and receive updated model
        
- Treatment model1 20%:

Initial coefficients are existing parameters, same users train and receive updated model
        
- Treatment Control 20%:

control
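
As a sanity check on the branch table, the percentages above sum to 100. The sketch below records the weights and shows one way a client could be mapped to a branch deterministically. It is a hypothetical illustration only: the hashing scheme, the `assign_branch` helper, and the lowercase "control" name are assumptions, and this is not a description of how Shield/Normandy actually samples clients.

```python
import hashlib
from typing import List, Tuple

# (branch name, percentage of enrolled clients), as listed in this bug.
BRANCH_WEIGHTS: List[Tuple[str, int]] = [
    ("model4-not-submitting", 10),
    ("model4-submitting", 10),
    ("model3-not-submitting", 10),
    ("model3-submitting", 10),
    ("model2", 20),
    ("model1", 20),
    ("control", 20),
]


def assign_branch(client_id: str) -> str:
    """Map a client id to a branch in proportion to the weights (illustrative)."""
    total = sum(weight for _, weight in BRANCH_WEIGHTS)
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to a bucket in [0, total).
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for name, weight in BRANCH_WEIGHTS:
        cumulative += weight
        if bucket < cumulative:
            return name
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Hashing the client id keeps the assignment stable across sessions, which matters here because the submitting and not-submitting branches must keep the same users in the training and testing sets.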
        

    What version and channel do you intend to ship to?

1% of Release Firefox 66.0

    Are there specific criteria for participants?

Version: Please include 66.0+ and not just a 66.0 exact match. Since the study will be enrolling during the Fx66 launch we expect enrollment to ramp up as clients update from 65 to 66 on March 19th.

Locales: en-all
Geographic regions: all

Prefs:
browser.urlbar.suggest.searches = True
browser.urlbar.suggest.history = True
browser.urlbar.suggest.bookmark = True
browser.privatebrowsing.autostart = False # excludes users who always start in private browsing
privacy.sanitize.sanitizeOnShutdown = False # excludes users who clear history on shutdown
browser.urlbar.matchBuckets must not exist as a pref # restricts to search first, which is the default on all clients 57 and newer
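
The pref criteria above amount to a simple predicate. The following is a minimal sketch under stated assumptions: `prefs` is a hypothetical dictionary of a client's effective pref values (defaults included), a missing `browser.urlbar.matchBuckets` key stands for "pref does not exist", and `is_eligible` is not an actual Shield or Normandy filter expression.

```python
from typing import Any, Dict

# Prefs that must hold exactly these boolean values.
REQUIRED_PREFS: Dict[str, bool] = {
    "browser.urlbar.suggest.searches": True,
    "browser.urlbar.suggest.history": True,
    "browser.urlbar.suggest.bookmark": True,
    "browser.privatebrowsing.autostart": False,
    "privacy.sanitize.sanitizeOnShutdown": False,
}

# Prefs that must not be set at all.
ABSENT_PREFS = ("browser.urlbar.matchBuckets",)


def is_eligible(prefs: Dict[str, Any]) -> bool:
    """Return True if the pref snapshot satisfies the study's pref criteria."""
    if any(name in prefs for name in ABSENT_PREFS):
        return False
    # Identity comparison so that 1/0 or strings are not accepted as booleans.
    return all(prefs.get(name) is value for name, value in REQUIRED_PREFS.items())
```

The version, locale, and region criteria are left out of this sketch on purpose; it covers only the pref list.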

Studies: NA

Any additional filters:

    What is your intended go live date and how long will the study run?

Apr 01, 2019 - May 13, 2019 (42 days)

    What is the main effect you are looking for and what data will you use to
    make these decisions?

In order to evaluate the winning branch for the search parameters, we will be using both interaction data and survey results. The data we will examine is:

- Number of characters typed before choosing result (used in v1)
- Rank of result chosen (used in v1)
- How soon the chosen result entered the result set
- Total time elapsed between beginning query and choosing result
- Number of abandoned searches
- Total amount of usage of awesomebar
- Search satisfaction score from survey

Additionally, we will examine the standard engagement metrics to ensure that no branch sees a significant decline (none is anticipated).

Search satisfaction score will be the gold standard for determining the winning branch. In the case that there is no clear winner using a 1% margin, we will evaluate which of the above metrics are most strongly correlated with positive search experience and see if a clear winner emerges from there. In the case that there is conflicting output, as in the previous study, we will consult with the search team to decide if there is enough evidence to support replacing the current parameters.
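
The first step of this decision rule can be sketched as follows. The bug does not say how the 1% margin is measured, so this example assumes it is relative to the runner-up's mean satisfaction score; `pick_winner` and the input format are hypothetical, and a real analysis would also need to account for statistical uncertainty.

```python
from typing import Dict, Optional


def pick_winner(mean_scores: Dict[str, float], margin: float = 0.01) -> Optional[str]:
    """Return the branch that leads by more than `margin`, or None if no clear winner.

    Assumption: the margin is relative to the runner-up's mean score.
    """
    ranked = sorted(mean_scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    best_name, best_score = ranked[0]
    _, runner_up_score = ranked[1]
    if best_score - runner_up_score > margin * abs(runner_up_score):
        return best_name
    # No clear winner: fall back to the secondary metrics described above.
    return None
```

A `None` result corresponds to the fallback path in the paragraph above, where the interaction metrics most strongly correlated with a positive search experience are examined instead.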

If we see the training-test branches clearly outperform the dogfooding branches in both the well-seeded and badly-seeded cases, we will make note of that for upcoming federated learning studies. This result would be counter to the way that federated learning has been used in the past, and would warrant further investigation.

We will also examine how the convergence of the badly-seeded branches compares to that of their well-seeded counterparts, and whether they arrive at the same model. This outcome will affect how careful we need to be with start values for similar analyses (adding additional branches, running simulations, etc.).

We have already developed our survey (with Rosanne Scholl of S&I), which is available here: https://qsurvey.mozilla.com/s3/URL-bar-satisfaction-survey. Screenshot to be added as indicated in the text.

Based on Shell's comments in the previous experimenter link, we should launch the survey on April 22 and April 29. If those launches don't happen for any reason, the survey will launch upon add-on expiration on May 6th. May 13th is the new end date, which leaves a buffer for users to respond and for us to correlate their responses with telemetry data.
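
The dates in this timeline can be checked with a few lines of date arithmetic. This is a small sketch, assuming all dates fall in 2019 as implied by the go-live date; the milestone labels and the `days_from_start` helper are made up for illustration.

```python
from datetime import date

STUDY_START = date(2019, 4, 1)

# Milestones mentioned in this bug, keyed by an illustrative label.
MILESTONES = {
    "first survey launch": date(2019, 4, 22),
    "second survey launch": date(2019, 4, 29),
    "add-on expiration": date(2019, 5, 6),
    "study end": date(2019, 5, 13),
}


def days_from_start(milestone: date) -> int:
    """Number of days between the study start and a milestone."""
    return (milestone - STUDY_START).days


if __name__ == "__main__":
    for label, when in MILESTONES.items():
        print(f"{label}: day {days_from_start(when)}")
```

The milestones land on days 21, 28, 35, and 42, so the stated 42-day duration is consistent and the final week after add-on expiration is the response buffer.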

    Who is the owner of the data analysis for this study?

Ilana Segall

    Will this experiment require uplift?

False

    QA Status of your code:

https://github.com/mozilla/federated-learning-v2-study-addon/blob/master/docs/TESTPLAN.md

    Link to more information about this study:

https://experimenter.services.mozilla.com/experiments/add-on-study-federated-learning-v2-relaunch/

Untracking since we have this information easily findable now in Experimenter.

experiment complete

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED