Closed Bug 1501004 (Opened 6 years ago, Closed 5 years ago)

[Shield] Pref Flip Study: Updated certificate error pages impact on retention and engagement, release 64, 65

Categories

(Shield :: Shield Study, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox64+ fixed, firefox65+ fixed)

RESOLVED FIXED

People

(Reporter: RT, Assigned: RT)


Details

User Story

Basic description of experiment: 
We want to enable the new certificate error page design for the test cohort and keep the control cohort on the current certificate error page design.
Here are the certificate error pages affected by the new design, ordered by share of overall certificate error displays:
70.48% – SEC_ERROR_UNKNOWN_ISSUER
18.51% – SSL_ERROR_BAD_CERT_DOMAIN
4.58% – SEC_ERROR_EXPIRED_CERTIFICATE
1.96% – SEC_ERROR_OCSP_INVALID_SIGNING_CERT
1.81% – SEC_ERROR_OCSP_FUTURE_RESPONSE
We want to measure the impact that the new error page design may have on retention and engagement, as well as understand the impact the new design has on error page bypass rates.
What is the preference we will be changing? 
browser.security.newcerterrorpage.enabled (to true)
Is your feature compatible with default branch or is it a user branch study? 
Unsure?
What independent variable(s) (IVs) are you manipulating to affect measurements of your DV(s)? What different levels (values) can each IV assume?
Test branch 1 - browser.security.newcerterrorpage.enabled set to true - new certificate error page designs enabled
Control branch - browser.security.newcerterrorpage.enabled set to false
What percentage of users do you want in each branch? 
50% / 50%
What Channels and locales do you intend to ship to? 
Beta, en-US
What is your intended go live date and how long will the study run? 
Go live TBD and duration is 1 week acquisition followed by 3 weeks data collection
Are there specific criteria for participants? 
No

What is the main effect you are looking for and what data will you use to make these decisions? 
-Unique page views
-Usage hours 
-3 week retention
-Page displayed vs. page bypassed ratio for pages that are bypassable (see the sketch below)
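For illustration, a minimal sketch of how that displayed-vs-bypassed metric could be computed from event data (Python; the event names and data shape are hypothetical placeholders, not the study's actual telemetry schema):

    from collections import Counter

    def bypass_rates(events):
        """Per-error-code bypass rate: bypassed impressions / displayed impressions.

        `events` is an iterable of (error_code, action) pairs, where action is
        "displayed" or "bypassed"; both names are illustrative placeholders.
        """
        displayed = Counter()
        bypassed = Counter()
        for code, action in events:
            if action == "displayed":
                displayed[code] += 1
            elif action == "bypassed":
                bypassed[code] += 1
        return {code: bypassed[code] / n for code, n in displayed.items()}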

Who is the owner of the data analysis for this study?
Su 

Will this experiment require uplift? 
No

QA Status of your code: 
Pre-Beta sign-off gained on 64

Do you plan on surveying users at the end of the study? 
Yes, we want to collect overall user sentiment per error page type 

Link to any relevant google docs / Drive files that describe the project. Links to prior art if it exists:
Project overview: https://docs.google.com/document/d/1ENmkMcgyrdobP__aoCAqddjvawOm_AZxpyhKg5jE9wA/edit
Cert error page user testing: https://docs.google.com/presentation/d/1uZ6VqsUkrh2iAEQMv8Ak_LPY6EXYuSkVBLGuFDI85e0/edit#slide=id.p

Will this experiment graduate into a rollout? 
Yes

Post study plan of record: 
Go/no-go decision based on the results of this study.
Hi Ilana, can you please help review and clarify what the timelines could be? We'd ideally like to run this during 64 Beta.
Flags: needinfo?(isegall)
For timeline discussion I defer to Marnie.
Flags: needinfo?(isegall) → needinfo?(mpasciutowood)
User Story: (updated)
To avoid confusion I updated the user story to specify this is for Beta 64, not release as originally mentioned in my first post.
Depends on: 1501002
Marnie, can you please help us understand when the PHD could be reviewed and whether this is something we can consider for Beta 64?
(In reply to Romain Testard [:RT] from comment #4)
> Marnie, can you please help us understand when the PHD could be reviewed and
> whether this is something we can consider for Beta 64?

I think Marnie is out this week for Mozfest (returning next week), but Matt, is this something we could maybe review in next week's session? Thank you!
Flags: needinfo?(mgrimes)
To circle back around to this after the PHD review today, it was decided that this study will run on release 64, instead of beta 64. Additionally, the team would like to run at the early part of the release cycle so there's adequate time for the data analysis. 

The Shield Pipeline Freeze that was discussed in the PHD review meeting (and over Slack) has been reduced from the entire block of December, meaning that the week of December 10 is available for launches. Romain, should we plan for a December 11 launch for this study? Per the PHD you'd have 1 week of enrollment then 3 weeks of data collection, putting your end date at 1/8/2019. How does that sound?
Flags: needinfo?(rtestard)
Flags: needinfo?(mpasciutowood)
Flags: needinfo?(mgrimes)
Summary: [Shield] Pref Flip Study: Updated certificate error pages impact on retention and engagement → [Shield] Pref Flip Study: Updated certificate error pages impact on retention and engagement, release 64
(In reply to Marnie Pasciuto-Wood [:marnie] from comment #6)
> To circle back around to this after the PHD review today, it was decided
> that this study will run on release 64, instead of beta 64. Additionally,
> the team would like to run at the early part of the release cycle so there's
> adequate time for the data analysis. 
> 
> The Shield Pipeline Freeze that was discussed in the PHD review meeting (and
> over Slack) has been reduced from the entire block of December, meaning that
> the week of December 10 is available for launches. Romain, should we plan
> for a December 11 launch for this study? Per the PHD you'd have 1 week of
> enrollment then 3 weeks of data collection, putting your end date at
> 1/8/2019. How does that sound?

Release 64 happens on Dec 11th and I think Ritu recommended we don't ship the study within the first 48 hours after the release so it sounds like Dec 14th is probably best as a start date. 1 week of enrollment then 3 weeks of data collection would then take us to an end date of Jan 11th.
Flags: needinfo?(rtestard)
Depends on: 1503572
(In reply to Romain Testard [:RT] from comment #7)
> (In reply to Marnie Pasciuto-Wood [:marnie] from comment #6)
> > To circle back around to this after the PHD review today, it was decided
> > that this study will run on release 64, instead of beta 64. Additionally,
> > the team would like to run at the early part of the release cycle so there's
> > adequate time for the data analysis. 
> > 
> > The Shield Pipeline Freeze that was discussed in the PHD review meeting (and
> > over Slack) has been reduced from the entire block of December, meaning that
> > the week of December 10 is available for launches. Romain, should we plan
> > for a December 11 launch for this study? Per the PHD you'd have 1 week of
> > enrollment then 3 weeks of data collection, putting your end date at
> > 1/8/2019. How does that sound?
> 
> Release 64 happens on Dec 11th and I think Ritu recommended we don't ship
> the study within the first 48 hours after the release so it sounds like Dec
> 14th is probably best as a start date. 1 week of enrollment then 3 weeks of
> data collection would then take us to an end date of Jan 11th.

Marnie: Are we allowed to start a Shield study on a Friday or would it need to be on the Monday after?

If we do start on December 14th and conclude on January 11th, we would then need time for data analysis and go/no-go...the issue is that 65 Pre-Release Sign-off is on January 18th which doesn't seem feasible to hit with those additional milestones. Do we have any options like reducing the time for study or enrollment?
Flags: needinfo?(rtestard)
Flags: needinfo?(mpasciutowood)
I generally recommend keeping enrollment in 1 week segments (1 week, 2 weeks, etc.) due to weekly seasonality issues (i.e. the mix of profiles who appear on Monday is different than the mix on Saturday). 

for the enrollment period, if we want to measure 3 week retention, we need 4 weeks (if subject enrolls on last day of the enrollment week, we need 21 more days to get their 3 week retention). 

this study is further complicated by the fact that the experiment subjects won't be the profiles who shield enrolls in the experiment but the subset of that group that runs into one of the modified error messages (which is the treatment we are testing), so there will need to be an 'observation period' after a subject is enrolled where we wait for the subject to hit one of the error messages, and then we can anchor them for retention. 

so to get 3 week retention, we would need to have a 1 week enrollment, 1 week observation, and then an additional 3 weeks of collection. 

if we wanted to speed things up, we could use 1 or 2 week retention, but that would come with tradeoffs. 

i'll also look into alternative metrics to use as a proxy for "this user will keep using firefox in the future".
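
For concreteness, a minimal sketch of the anchoring logic described above (Python; the field names and the exact retention definition are assumptions, not the study's actual analysis code). A profile is anchored on its first modified-error-page impression within the observation window, and 3-week retention asks whether the profile is active during the third week after that anchor:

    from datetime import date, timedelta

    def three_week_retention(enroll_date, first_error_date, active_dates,
                             observation_days=7):
        """Return True/False for 3-week retention, or None if not anchorable."""
        if first_error_date is None:
            return None  # never saw one of the modified error pages
        if first_error_date > enroll_date + timedelta(days=observation_days):
            return None  # hit the page too late to be anchored
        window_start = first_error_date + timedelta(days=21)
        window_end = window_start + timedelta(days=7)
        # Retained if active at any point during week 3 after the anchor.
        return any(window_start <= d < window_end for d in active_dates)

    # Example: enrolled Dec 17, first error page Dec 20, active again Jan 12
    # (23 days after the anchor), so the profile counts as retained.
    three_week_retention(date(2018, 12, 17), date(2018, 12, 20),
                         [date(2019, 1, 12)])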
(In reply to Su-Young Hong from comment #9)
> I generally recommend keeping enrollment in 1 week segments (1 week, 2
> weeks, etc.) due to weekly seasonality issues (i.e. the mix of profiles who
> appear on Monday is different than the mix on Saturday). 
> 
> for the enrollment period, if we want to measure 3 week retention, we need 4
> weeks (if subject enrolls on last day of the enrollment week, we need 21
> more days to get their 3 week retention). 
> 
> this study is further complicated by the fact that the experiment subjects
> won't be the profiles who shield enrolls in the experiment but the subset of
> that group that runs into one of the modified error messages (which is the
> treatment we are testing), so there will need to be an 'observation period'
> after a subject is enrolled where we wait for the subject to hit one of the
> error messages, and then we can anchor them for retention. 
> 
> so to get 3 week retention, we would need to have a 1 week enrollment, 1
> week observation, and then an additional 3 weeks of collection. 
> 
> if we wanted to speed things up, we could use 1 or 2 week retention, but
> that would come with tradeoffs. 
> 
> i'll also look into alternative metrics to use as a proxy for "this
> user will keep using firefox in the future".



Thank you for the information here. So if we ship the study on Dec 14 then it will be 1 week enrollment, 1 week observation, and then an additional 3 weeks of collection. Then I assume another week for data analysis; is that correct? If this is the case then I think we would need to accept that this project will not launch in 64 or 65, but would then be a candidate for 66 (pending shield study results).
In its current state, that is correct.
(In reply to Su-Young Hong from comment #11)
> In its current state, that is correct.

Thank you very much for the confirmation and help in calendaring this out :) 

@Romain can you confirm this is okay from a business perspective?
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #8)
> (In reply to Romain Testard [:RT] from comment #7)
> > (In reply to Marnie Pasciuto-Wood [:marnie] from comment #6)
> > > To circle back around to this after the PHD review today, it was decided
> > > that this study will run on release 64, instead of beta 64. Additionally,
> > > the team would like to run at the early part of the release cycle so there's
> > > adequate time for the data analysis. 
> > > 
> > > The Shield Pipeline Freeze that was discussed in the PHD review meeting (and
> > > over Slack) has been reduced from the entire block of December, meaning that
> > > the week of December 10 is available for launches. Romain, should we plan
> > > for a December 11 launch for this study? Per the PHD you'd have 1 week of
> > > enrollment then 3 weeks of data collection, putting your end date at
> > > 1/8/2019. How does that sound?
> > 
> > Release 64 happens on Dec 11th and I think Ritu recommended we don't ship
> > the study within the first 48 hours after the release so it sounds like Dec
> > 14th is probably best as a start date. 1 week of enrollment then 3 weeks of
> > data collection would then take us to an end date of Jan 11th.
> 
> Marnie: Are we allowed to start a Shield study on a Friday or would it need
> to be on the Monday after?
> 
> If we do start on December 14th and conclude on January 11th, we would then
> need time for data analysis and go/no-go...the issue is that 65 Pre-Release
> Sign-off is on January 18th which doesn't seem feasible to hit with those
> additional milestones. Do we have any options like reducing the time for
> study or enrollment?

I'll leave your question on study and enrollment time up to Su; looks like you folks hashed it out. That said, launching on Fridays (or even Thursdays) is a non-starter. If something goes wrong with the study after launching late in the week, we'd run into the weekend and not have folks available to mitigate. So, we only launch on Mondays or Tuesdays if we can help it.
Flags: needinfo?(mpasciutowood)
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #12)
> (In reply to Su-Young Hong from comment #11)
> > In it's current state, that is correct.
> 
> Thank you very much for the confirmation and help in calendaring this out :) 
> 
> @Romain can you confirm this is okay from a business perspective?

Indeed, 66 sounds most likely now. There is no reason to push the cert error page update through without the full results from the Shield study, so it's a yes from the product side.
Flags: needinfo?(rtestard)
Hi Su, can you please confirm if this is fine from the data science standpoint?
Also thanks for confirming the cohort sizes so we have it all on the bug.
Flags: needinfo?(shong)
Hi Johann, who peer reviewed the cert error work? Can we request them to provide peer review sign-off for the study in this bug?
Flags: needinfo?(jhofmann)
Awesome. It sounds like there isn't a big rush, so why don't we start the study on Dec 17th (a Monday)? That'll be enough time from the release of FF 64 for the version to be uplifted. Then, the study will end on Jan 21st (1 week enrollment, 1 week observation, 3 weeks for retention). 

Enrollment Start Date: 
    * Dec 17th

Duration: 
    * 1 week enrollment period
    * 4 weeks study period

Targeting: 
    * Release
    * FF 64
    * all locales / countries

Sampling: 
    * 0.5% treatment branch
    * 0.5% control branch 
    (1% total) 

Does this sound good to everyone?
Flags: needinfo?(shong)
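As a reference for how the 0.5% / 0.5% split above typically works, branch assignment is deterministic: a stable hash of the profile id decides both sampling and branch, so a given profile always gets the same assignment. A minimal sketch (Python; the hashing scheme is illustrative, not Normandy's actual implementation):

    import hashlib

    def assign_branch(client_id,
                      slug="pref-flip-cert-error-pages-1501004",
                      sample_rate=0.01):
        """Deterministically map a profile to a branch, or None if not sampled."""
        digest = hashlib.sha256(f"{client_id}:{slug}".encode()).hexdigest()
        bucket = int(digest, 16) / 16 ** len(digest)  # uniform in [0, 1)
        if bucket >= sample_rate:
            return None  # the ~99% of profiles not enrolled
        # Split the sampled slice 50/50 between the two branches.
        return "treatment" if bucket < sample_rate / 2 else "control"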
(In reply to Su-Young Hong from comment #18)
> Awesome. It sounds like there isn't a big rush, so why don't we start the
> study on Dec 17th (a Monday)? That'll be enough time from the release of FF
> 64 for the version to be uplifted. Then, the study will end on Jan 21st (1
> week enrollment, 1 week observation, 3 weeks for retention). 
> 
> Enrollment Start Date: 
>     * Dec 17th
> 
> Duration: 
>     * 1 week enrollment period
>     * 4 weeks study period
> 
> Targeting: 
>     * Release
>     * FF 64
>     * all locales / countries
> 
> Sampling: 
>     * 0.5% treatment branch
>     * 0.5% control branch 
>     (1% total) 
> 
> Does this sound good to everyone?


Hi Su, the timing looks good to me! Thank you. I mapped out the dates in the following calendar and verified with Marnie that it's feasible: https://docs.google.com/document/d/13iahoA_0SgCLxAfLd-oqgah-wP7mUglXB0vE5RCBNUI/edit?usp=sharing
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #17)
> Hi Johann, who peer reviewed the cert error work? Can we request them to
> provide peer review sign-off for the study in this bug?

This is a pref flip study; all code that this enables has already landed with Firefox peer approval, so we don't need another review or sign-off here. :)
Flags: needinfo?(jhofmann)
Hi Su, would you be able to provide science review approval in this bug please? thank you!
Flags: needinfo?(shong)
Data review was signed off in https://bugzilla.mozilla.org/show_bug.cgi?id=1503572. Is this okay, Marnie?
Flags: needinfo?(mpasciutowood)
(In reply to Tony Cinotto [:tcinotto] (UTC-8) from comment #21)
> Hi Su, would you be able to provide science review approval in this bug
> please? thank you!

no problem

+r
Flags: needinfo?(shong)
All good, thanks Tony.
Flags: needinfo?(mpasciutowood)
I think this experiment is scheduled to start on Monday. 

Seeing as there's a significant bug (above) with the certificate errors we want to test, let's pause on plans to start the experiment. 

Some background from the certificate-errors channel: it turns out that not all the translations have been done, so some locales still need translation. It's unclear which ones have/haven't been translated. 

I think we can proceed in the following ways:

* restrict experiment to en-US 
* find out all the untranslated locales and filter them out

:marnie, can we put this experiment on pause (so NOT start enrollment on Monday) until we have a plan in place? 

:rtestard, which choice do you think makes sense? I think either is fine, but the first will be faster. 

:carmenf, just to confirm, did you test all locales and es-ES was the only one that was not translated yet? Or did you test a subset of locales and es-ES just happened to be untranslated (but there could be others)? 

- Su
Flags: needinfo?(rtestard)
Flags: needinfo?(mpasciutowood)
Flags: needinfo?(carmen.fat)
We only tested the following subset of top locales:
- en-US
- es-ES
- de-de
- fr-fr
- ru-ru

The es-ES build is the only one from this subset that still has a few strings not translated into Spanish. 
Please let us know the next steps we need to take regarding this experiment.  

Just a reminder, besides the bug above (1514236), we have also filed bug 1513868. Please take a look at it and fill us in if there's anything we can do about it.
Flags: needinfo?(carmen.fat)
Su and I talked about this late Friday afternoon. To summarize, this study will not launch today, Dec. 17, due to the blocking bugs. Assuming the bugs are fixed, the next opportunity to ship this will be Jan. 7, due to the Shield pipeline freeze.
Flags: needinfo?(mpasciutowood)
Intent to ship sent on December 10th. A follow-up intent to ship with the new January 7th date was emailed out on December 17th.
Updated cert error pages pref flip study
Targeted: Firefox Release 64
          Firefox Release 65

We have finished testing the “Updated cert error pages pref flip study” experiment.

QA’s recommendation: YELLOW - SHIP IT CONDITIONALLY

Reasoning:
1. There are still 2 open issues that concern us, but we don’t consider them blockers for releasing the study: 
- bug 1513868 - The old UI of SEC_ERROR_OCSP_FUTURE_RESPONSE error is displayed even though the new certificate error pages are enabled
- bug 1515943 - No telemetry events are generated if the "Learn more..." link is opened from context menu or using middle click

2. Initially, the study targeted all the Firefox locales and we started testing the top 5 most popular ones. After finding that one of the locales wasn’t 100% translated, we continued testing 8 more locales. In doing so, we found that 2 of the tested locales are not completely translated (bug 1514236 and bug 1514771). Therefore, the team decided to only target the en-X, fr, it, and zh-CN locales.


Testing Summary:
- Full Functional test suite: TestRail (https://testrail.stage.mozaws.net/index.php?/plans/view/14245)


Tested Platforms:
- Windows 10 x64
- Ubuntu 16.04 x64
- Mac 10.13.6

Tested Firefox versions:
- Firefox Release 64.0 (en-US, es-ES, fr, de, ru, pt-BR, zh-CN, pl, en-GB, it, es-MX, ja, nl)
- Firefox Beta 65.0b5 (en-US, es-ES, fr, de, ru, pt-BR, zh-CN, pl, en-GB, it, es-MX, ja, nl)

Regards,
Carmen
As per our meeting earlier today, here are the new targeting criteria for this experiment (disregard previous information): 

channel: release
version(s): FF64 and above
countries: all
locales: fr, it, zh-CN, en-CA, en-GB, en-US, en-ZA

branch prefs: 
    Test branch 1 - browser.security.newcerterrorpage.enabled set to true 
    Control branch - browser.security.newcerterrorpage.enabled set to false (default)  

sampling: 1% of eligible profiles
branch split: 50/50 to each branch

enrollment start date: Jan 7th
study duration(s): 
    [1] acquisition period: 1 week 
    [2] collection period: 5 weeks 

[1]: actively enroll profiles over this period
[2]: keep already enrolled profiles in the experiment for this duration (starting with their individual enrollment date) 
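
For clarity, the date arithmetic implied by these durations (a quick Python sketch; only the Jan 7th start and the two durations come from this comment):

    from datetime import date, timedelta

    enrollment_start = date(2019, 1, 7)   # Jan 7th
    acquisition = timedelta(weeks=1)      # [1] active enrollment window
    collection = timedelta(weeks=5)       # [2] per-profile collection

    enrollment_end = enrollment_start + acquisition  # Jan 14th
    # A profile enrolled on the last day is kept for 5 more weeks, so the
    # final data points arrive around:
    last_data_date = enrollment_end + collection     # Feb 18th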

I've also updated the PHD document (https://docs.google.com/document/d/1wfopCoPzE1DRqffyTzcOGxzdRI5cwybOnyWTkirvmOg/edit?ts=5bc9bf1d) 

:tcinotto, please feel free to send the intent to ship email
Flags: needinfo?(tcinotto)
Ryan and Julien, we'd like to ship this study to Release on Monday, January 7th. It will span release 64 and 65. Tony is sending an updated Intent to Ship email shortly.
Flags: needinfo?(ryanvm)
Flags: needinfo?(jcristau)
Summary: [Shield] Pref Flip Study: Updated certificate error pages impact on retention and engagement, release 64 → [Shield] Pref Flip Study: Updated certificate error pages impact on retention and engagement, release 64, 65
(In reply to Su-Young Hong from comment #30)
> As per our meeting earlier today, here are the new targeting criteria for
> this experiment (disregard previous information): 
> 
> channel: release
> version(s): FF64 and above
> countries: all
> locales: fr, it, zh-CN, en-CA, en-GB, en-US, en-ZA
> 
> branch prefs: 
>     Test branch 1 - browser.security.newcerterrorpage.enabled set to true 
>     Control branch - browser.security.newcerterrorpage.enabled set to false
> (default)  
> 
> sampling: 1% of eligible profiles
> branch split: 50/50 to each branch
> 
> enrollment start date: Jan 7th
> study duration(s): 
>     [1] acquisition period: 1 week 
>     [2] collection period: 5 weeks 
> 
> [1]: actively enroll profiles over this period
> [2]: keep already enrolled profiles in the experiment for this duration
> (starting with their individual enrollment date) 
> 
> I've also updated the PHD document
> (https://docs.google.com/document/d/
> 1wfopCoPzE1DRqffyTzcOGxzdRI5cwybOnyWTkirvmOg/edit?ts=5bc9bf1d) 
> 
> :tcinotto, please feel free to send the intent to ship email

Thank you! Intent to ship filed.
Flags: needinfo?(tcinotto)
Testing new cert error pages at 1% of select locales; QA concerns seem addressed (and the event from bug 1515943 is not part of the study criteria AIUI).  Approving for relman.
Flags: needinfo?(jcristau) → shield-relman+
Flags: needinfo?(ryanvm)

Recipe 663 has been deployed.

pref-flip-cert-error-pages-1501004 is now live.

Flags: needinfo?(rtestard)

Recipe 663 has been disabled.

pref-flip-cert-error-pages-1501004 has ended.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED