Closed Bug 1506648 Opened 6 years ago Closed 5 years ago

Data science support for funnelcake experiment "Pinning to Win10 taskbar A/B testing"

Categories

(Data Science :: Experiment Collaboration, task)

x86_64
macOS
task
Not set
normal
Points:
3

Tracking

(data-science-status Data Acquisition)

RESOLVED FIXED

People

(Reporter: RT, Assigned: joy)

References

Details

Context:
Bug 1493597 is about building a funnelcake that pins a Firefox shortcut to the Win10 taskbar. This is part of the effort to grow aDAU.

Request:
We need data science support in order to:
- Review experiment details
- Size test and control cohorts (able to measure a 1% change in retention or a 5% change in engagement; a rough sizing sketch follows this list)
- Perform comparative retention (3 week retention) and engagement analysis (number of searches, unique URIs browsed, number of sessions) to help reach a decision to ship in release
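A rough per-branch sizing sketch for those detection targets; the significance level, power, baseline retention, and engagement variability below are all assumptions for illustration, not the recorded recommendation:

from statsmodels.stats.power import NormalIndPower, TTestIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumptions: alpha = 0.05, power = 0.8, hypothetical 35% retention baseline,
# hypothetical coefficient of variation of 1.5 for the engagement counts.
ALPHA, POWER = 0.05, 0.8

# Retention: detect a 1 percentage point absolute change (0.35 -> 0.36).
es_retention = proportion_effectsize(0.36, 0.35)
n_retention = NormalIndPower().solve_power(effect_size=es_retention, alpha=ALPHA, power=POWER)

# Engagement: detect a 5% relative change in a mean count; with CV ~ 1.5 the
# standardized effect size is roughly 0.05 / 1.5.
n_engagement = TTestIndPower().solve_power(effect_size=0.05 / 1.5, alpha=ALPHA, power=POWER)

print(f"clients per branch (retention):  {n_retention:,.0f}")
print(f"clients per branch (engagement): {n_engagement:,.0f}")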
Blocks: 1493597
Hi Romain, 

I reviewed the experiment details and I put an experiment size recommendation in the notes. 

I have a few open questions I'd like to dig into with you (below). 

Let's schedule a kickoff call sometime this week so we can talk in person about this experiment. 

----------

Questions: 
	- what's the distribution method? 
		* will it be randomly sampled from users who come to our website to download browser? 
		* how does that work? usually they get a stubinstaller but instead we give them the experiment funnelcake (full installer)?
		* why I want to know: 
			- I just want to understand the population we're drawing from (explicitly) and any potential bias that introduces
	- can we measure downloads of each branch? 
	- can we measure installs of each branch? 
		* why I want to know: 
			- because the sampling and treatment are applied before profile creation/appearance, any effect that happens very early in the funnel might be lost (i.e. users might churn before their profiles ever report to us), so it would be good to have the downloads and installs of each branch 
	- what happens with funnelcakes / distribution id's during pave-over installs? 
		* why I care: 
			- make sure my assumptions above make sense
Assignee: nobody → shong
Status: NEW → ASSIGNED
(In reply to Su-Young Hong from comment #1)
> Hi Romain, 
> 
> I reviewed the experiment details and I put a experiment size recommendation
> in the notes. 
> 
> I have a few open questions I'd like to dig into with you (below). 
> 
> Lets schedule a kickoff call sometime this week so we can talk in person
> about this experiment. 
> 
> ----------
> 
> Questions: 
> 	- what's the distribution method? 
> 		* will it be randomly sampled from users who come to our website to
> download browser? 
I asked Jon on bug 1493595 (that bug addresses the traffic cop configuration); 7 day randomly sampled acquisition is indeed what we should ask for.
> 		* how does that work? usually they get a stubinstaller but instead we give
> them the experiment funnelcake (full installer)?
That's right, test and control will both get full installers; this introduces some bias but we have no way to work around it.
> 		* why I want to know: 
> 			- I just want to understand the population we're drawing from
> (explicitly) and any potential bias that introduces
> 	- can we measure downloads of each branch? 
Yes, this should come from analytics once bug 1493595 is implemented.
> 	- can we measure installs of each branch? 
> 		* why I want to know: 
> 			- because the sampling and treatment is being applied pre-profile
> creation/appearance, so any effect that happens very early in the funnel
> might be lost (i.e. users might churn before reporting us their profiles),
> it would be good to have the downloads and installs of each branch 
Installs are harder. We have the full installer ping now available in 65, but we want to run the funnelcake test on 63. Are you concerned that the install/download ratio would be impacted by the change? This should be reflected in the new profile/download ratio anyway if that is the concern? 
> 	- what happens with funnelcakes / distribution id's during pave-over
> installs? 
> 		* why I care: 
> 			- make sure my assumptions above above make sense
Great question - given the high share of pave-overs (about 25%) I'd hope we set the distribution IDs. I'll add this to the QA list to check.
Hi Romain, 

Awesome. Thanks for the info. 

If we have the download numbers, that should suffice. Getting the install numbers would be nice for extra visibility, but downloads alone should be sufficient, I think.
Hi Romain, let me know when we start the study so I can start monitoring the enrollees

- Su
Flags: needinfo?(rtestard)
Engineering is now done. We're awaiting QA approval and roll-out through the bouncer, hoping this starts rolling out next week.
Flags: needinfo?(rtestard)
personal update: 

the previous attempt at deployment for this funnelcake ran into some issues. 

We met to discuss re-deploying this experiment in the near future (possibly next week), but there is an unresolved issue of: 

* how to deal with 32 bit vs 64 bit computers. 

It was suggested to just have a 32 bit build of the funnelcake, so if a user gets mis-identified as 32 bit (when they're 64 bit), they'll be served a working installer (32 bit FF works on 64 bit systems). 

I think our options are: 

1: build a 32 bit funnelcake to serve to 32 bit users and serve the 64 bit build to 64 bit users
2: only keep the current 64 bit builds and serve 32 bit users the regular stubinstaller
3: serve the 64 bit funnelcake to everyone (which leaves an estimated 4% of users who are on 32 bit systems stranded) 

---

Looking back, I realized I'm not sure if we ARE able to target 32 vs 64 bit OS on the website, or how accurate that targeting is... need to follow up.

note to self:

new customized stubinstallers seem to be the solution

it was communicated to me that these stubinstallers will report the same data in the same location, with the dist_id filled appropriately.

confirm that dist_id is available in install ping data

data-science-status: --- → Data Acquisition
Points: --- → 3
Depends on: 1521210

We've finished the enrollment period for the stubinstallers. Now we are collecting usage data.

Taskbar Pin Funnelcake Experiment (Win10 64bit)

Summary: currently, Firefox does not automatically pin the browser icon to the taskbar upon installation. This experiment is meant to test out the theory that doing so (pinning the browser to taskbar upon installation) will result in higher retention[1] and usage metrics[2].

Population Sampled: Since this needs to be new users, and the treatment applies upon installation, we cannot use a Normandy pref/addon study and instead used a funnelcake study. Bug for funnelcake development here. The funnelcake was served to users downloading the browser from our website (excludes FTP downloads) who were:

  • en-US
  • Windows 10

Note: there was also a requirement that users be on a 64 bit OS. However, it was easier to serve our funnelcake stubinstaller to everyone who was en-US and Win10; upon installation, if the OS was detected to be 64 bit, the funnelcake was installed, otherwise the regular version of Firefox was installed.

Traffic cop bug for sampling users here.

Users were enrolled over a 2 week period with a 50/50 split between funnelcake 138 and 139. Note, the total sampling rate was increased in week 2 due to time issues, and there were some other hiccups (false starts) with the enrollment process prior to that 2 week period. See bug for details.

Branches:

  • Control
    • identified in Telemetry by distribution_id='mozilla138'
    • no changes from the existing install experience, other than the distribution_id tag.
  • Branch 1 (pinned taskbar)
    • identified in Telemetry by distribution_id='mozilla139'
    • upon installation, the browser is automatically pinned to the OS taskbar.

Monitoring: dashboard is here

Next steps:

  • 1: Confirm that downloads of these funnelcakes happened at a 50/50 split as expected (since telemetry appearance happens further down the funnel)
  • 2: Check the installation rates (specifically new, successful installs) for each branch.
  • 3: Do normal experiment analysis.

NOTE: since this is a funnelcake study, any user who downloads from the mozilla.org website (and fits the eligibility criteria listed above) could be enrolled in this study. This does not necessarily mean they're new users/profiles. If a user with an existing installation and profile of Firefox downloads one of the funnelcakes, then upon re-installation (a pave-over install) their existing profile[3] will now be tagged with that funnelcake ('mozilla138' or 'mozilla139') in their distribution_id; see QA reference for behavior of distribution_id here.

So of the profiles tagged as belonging to our experiment, some are new profiles and some are existing profiles. I suggest splitting up the analysis into the following groups:

  • existing profiles (confirm that automatically pinning to taskbar happens during paveover installs, both in the case that a pinned icon doesn't yet exist and a pinned icon already exists)
  • new profiles

I.e., do not assume that all test subjects in telemetry are new profiles.

[1] specifically 3 week retention, detectable to at least a 1% absolute difference from control

[2] specifically the group average of:

  • total days used
  • total hours used
  • total active hours
  • total uris
  • total searches

per client, from enrollment date until 27 days after enrollment. We would like to detect a 5% difference relative to control for each of these averages.

Resources:

Per discussion with Su, he won't be able to focus on this for a while.
Jessica, this study is likely to result in release changes that we hope will significantly impact retention. Given it sounds like Su won't be able to progress this for 4 weeks, could we get someone assigned so we get the analysis to a conclusion within 2 weeks (this could then allow us to ship this change in 67, assuming successful results)?

Flags: needinfo?(jmokrzecki)

(In reply to Romain Testard [:RT] from comment #12)

> Per discussion with Su, he won't be able to focus on this for a while.
> Jessica, this study is likely to result in release changes that we hope will significantly impact retention. Given it sounds like Su won't be able to progress this for 4 weeks, could we get someone assigned so we get the analysis to a conclusion within 2 weeks (this could then allow us to ship this change in 67, assuming successful results)?

I will add it to our list to triage on Monday to see if anyone has the bandwidth to take on this ticket. I've added your note on priority as well.

Flags: needinfo?(jmokrzecki)
Assignee: shong → sguha

Some additional clarifying info from our convo earlier @saptarshi

Q: When were users served the funnelcakes:
A:
started serving:
15 January 2019
stopped serving:
29 January 2019

Q: How to get installation and download numbers

A (installation):
data source:
DSMO-RS
table:
download_stats_funnelcake

filters (written out below as a full query sketch for counting new, successful funnelcake installs):

SELECT count(*) AS new_successful_installs
FROM download_stats_funnelcake
WHERE ping_version = 'v8-139'    -- this gets us the funnelcake version
  AND amd64_bit_build IS TRUE    -- the stubinstaller served checks 32 vs 64 bit;
                                 -- 32 bit users were installed vanilla FF, 64 bit
                                 -- users were installed the funnelcake
  AND had_old_install IS FALSE   -- false for new installs, otherwise it is
                                 -- a pave-over installation
  AND succeeded = TRUE           -- the installation attempt was successful

see queries in the monitoring dashboard for reference

A (downloads):
We'll have to get the download numbers from GA. We served the funnelcake stubinstallers on the website. Honestly, I'm not sure of the correct way to get download numbers from GA, but:

Contact Edward Cho on the Marketing Analytics team for access to GA.

For the "right" query to get the download numbers... I'm not sure, I was planning on researching it, but haven't gotten around to it. Maybe reach out to Cmore or Hoosteno?

Craig Cook was the website engineer who deployed the funnelcake distribution, so it might be a good idea to find out from him what URLs or events correspond to one of our funnelcake downloads.

Q: How to separate new vs existing profiles

A: In my enrollment dashboard I think I used profile_creation_date within 7 days of first appearance by submission_date_s3. query here. I think just filtering for profiles that have PCD after the enrollment start date is a safe bet.
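A minimal sketch of that split, assuming a hypothetical per-client extract with main_summary-style column names (dates shown as plain strings for simplicity):

import pandas as pd

# Hypothetical per-client extract; column names follow the main_summary style.
clients = pd.DataFrame({
    "client_id": ["a", "b", "c"],
    "profile_creation_date": ["2019-01-16", "2018-06-01", "2019-01-20"],
    "submission_date_s3": ["2019-01-18", "2019-01-20", "2019-01-21"],
})

ENROLL_START = pd.Timestamp("2019-01-15")  # first day the funnelcakes were served
pcd = pd.to_datetime(clients["profile_creation_date"])
first_seen = pd.to_datetime(clients["submission_date_s3"])

# "New" = profile created within 7 days of its first appearance and after the
# enrollment start date; everything else is treated as an existing profile.
clients["is_new_profile"] = ((first_seen - pcd).dt.days.between(0, 7)) & (pcd >= ENROLL_START)
print(clients)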

Hi Saptarshi, did you get a chance to look at this?

Flags: needinfo?(sguha)

Yes, it was being reviewed. I'll post the link here today.

Flags: needinfo?(sguha)

Romain, here is the report: https://metrics.mozilla.com/~sguha/mz/icon_bug_1506648/report.1.allprofiles.html

tl;dr

Based on the first 28 days of new profiles (created between 01/15 and 01/29) we have:

Proportion of profiles active in the 28 days after profile creation date (pcd) increased from 0.80 to 0.835, a relative difference of 3.5% (between 3% and 4.03%).
3rd week retention increased from 0.35 to 0.37 (relative increase of 7.1%, between 5.7% and 8.9%).
Days used (out of 28) went up from 8 to 8.47 (relative increase of 5.8%, between 4.8% and 7.2%).
Total hours used/profile went up from 80.5 to 84.4 (relative increase of 4.8%, between 2.8% and 6.7%).

No evidence of change in: active hours/profile, windows+tabs opened/profile, or searches/profile.

That none of the internal usage metrics changed (active hours, searches, or URIs) while the metrics tied to opening the browser did (days used, hours kept open, third week retention) indicates that pinning the browser improves the likelihood of opening it, though what one does with the browser doesn't change. This in itself makes the experiment a success.
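For reference, here is a minimal sketch of how a relative difference with a percentile interval like the ones quoted above could be computed from per-client values; the data below is synthetic and the report's actual methodology may differ:

import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-client "days used" values for each branch (not the report's data).
control = rng.poisson(8.0, size=50_000)
test = rng.poisson(8.5, size=50_000)

def rel_diff(t, c):
    return (t.mean() - c.mean()) / c.mean()

point = rel_diff(test, control)
# Percentile bootstrap for an interval around the relative difference.
boot = [rel_diff(rng.choice(test, test.size), rng.choice(control, control.size))
        for _ in range(1000)]
low, high = np.percentile(boot, [2.5, 97.5])
print(f"relative difference: {point:.1%}  (95% interval: {low:.1%} to {high:.1%})")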

Thanks Saptarshi
Can you please help clarify some follow-up questions:

  • How do you detect total active hours (VS total hours)?
  • Since "Ever Coming Back After 1st Use" increases on test, it may be that "Total Active Hours/Profile" and "URIs" drop just because the profiles that would not have opened Firefox without the shortcut are just lower usage profiles making the average drop?
  • I'm surprised that "Windows and Tabs Opened" drop - I was anticipating that Windows would increase to reflect users opening Firefox more but maybe this is explained by the fact that profiles who would not have opened Firefox in the first place end-up doing it less than the average of the other users.
  • How can there be more new profile pings than successful installs for new profiles? Also not sure why this is different from Total New Profiles found in main_summary? I was hoping for some evidence that the ratio new profiles/downloads was higher on test, is that true?

(In reply to Romain Testard [:RT] from comment #18)
Sorry for the late reply

> Thanks Saptarshi
> Can you please help clarify some follow-up questions:

>   • How do you detect total active hours (VS total hours)?

Active hours is based on active ticks; every active tick corresponds to 5 seconds of activity.
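A tiny illustration of that conversion (assuming the active tick count is already summed per client):

def active_hours(active_ticks: int) -> float:
    """Each active tick represents 5 seconds of activity."""
    return active_ticks * 5 / 3600.0

print(active_hours(7200))  # 7200 ticks * 5 s = 36,000 s = 10.0 hours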

  • Since "Ever Coming Back After 1st Use" increases on test, it may be that "Total Active Hours/Profile" and "URIs" drop just because the profiles that would not have opened Firefox without the shortcut are just lower usage profiles making the average drop?

This could be true. To further your curiosity, consider this:

https://metrics.mozilla.com/~sguha/mz/icon_bug_1506648/report.1.pcdpostprofiles.html#32_usage_and_retention

Please ignore the summary on this page.

This analysis ignores the profile creation date of the profiles and only considers those profiles who came back after their profile creation date.

>   • I'm surprised that "Windows and Tabs Opened" drop - I was anticipating that Windows would increase to reflect users opening Firefox more but maybe this is explained by the fact that profiles who would not have opened Firefox in the first place end-up doing it less than the average of the other users.

It could be, but also look at the above link: it seems to increase per active hour overall (though active hours went down, possibly because people use it in short bursts?)

>   • How can there be more new profile pings than successful installs for new profiles? Also not sure why this is different from Total New Profiles found in main_summary? I was hoping for some evidence that the ratio new profiles/downloads was higher on test, is that true?

I tried capturing the funnel, and yes, not all of it makes sense, but

a) they are close
b) test and control are the same, which is what I expected. There is no difference in new profiles to downloads, at least not supported by the data.

But there is evidence we are capturing a different type of user and increased our 3rd week retention.

Thanks!
So overall would you say that shipping this change would help contribute to a MAU increase without regressing overall search engagement? I think this is really the key decision point to ship this more generally on release.

Flags: needinfo?(sguha)

I would like to amend something I said in comment 19:

a) there are more new profiles in main_summary for the test group (Table 3 in 3.2) in [1], despite new profile pings remaining the same. If you look at this divided by downloads (which are the same for test and control), then the answer to your question about "evidence for new profiles/downloads" is yes.

Sorry for the confusion.

[1] https://metrics.mozilla.com/~sguha/mz/icon_bug_1506648/report.1.allprofiles.html#32_usage_and_retention

b) I haven't checked MAU; third week retention is up and total days used is up, but given MAU is "just having to use it once in 28 days" I can't say if that changed. That said, ever coming back is up for test profiles vs control profiles that ever opened it again, so I expect it to have a positive effect on MAU, but the report doesn't explicitly say so.

"So overall would you say that shipping this change would help contribute to a MAU increase without regressing overall search engagement? "

So, if you look at the report again:

https://metrics.mozilla.com/~sguha/mz/icon_bug_1506648/report.1.allprofiles.html#323_total_days_used

I've added another table to each metric which compares the totals (days used, hours used, etc.), scaling the control group to be the same size as the test group (we know the test group has more profiles; see (a) above). From this view we see:

ci) days used and total hours used increase
cii) nothing else changes (not even windows)

Possibly the windows counters increment only after Firefox startup (i.e. the initial window opening doesn't count toward this), and ci) makes sense: more people click on the browser icon but their behavior inside doesn't change much.
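A toy illustration of that size-adjusted comparison of totals (all numbers are made up, not the report's):

# Scale the control group's total up to the test group's population size before
# comparing totals, since the test group has more profiles.
control_profiles, test_profiles = 100_000, 104_000       # hypothetical profile counts
control_total_days, test_total_days = 800_000, 880_000   # hypothetical total days used

scaled_control = control_total_days * test_profiles / control_profiles
print(f"control total days (scaled to test size): {scaled_control:,.0f}")
print(f"test total days:                          {test_total_days:,}")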

So ultimately I think this is a good thing. We get more users using our browser!

Hope this helps

Flags: needinfo?(sguha)

Thanks Saptarshi, this helps a lot. I feel we're good on the analysis, I'll now share this more widely with a recommendation to ship the feature by default on Win10 in order to reach a final decision.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED