Closed Bug 1434379 Opened 6 years ago Closed 6 years ago

[Shield] Pref Flip Study: Tracker request tailing

Categories

(Shield :: Shield Study, defect)

defect
Not set
normal

Tracking

(firefox59+ fixed)

RESOLVED FIXED

People

(Reporter: mayhemer, Assigned: mayhemer)

Details

Basic description of experiment: 
Compare telemetry results for a selected set of related probes between two groups of users: one with the tailing feature fully enabled and its other preferences at their defaults (as shipped), and one with the feature fully disabled.

What is the preference we will be changing? 
network.http.tailing.enabled

What are the branches of the study and what values should each branch be set to? 
Control: the pref at false. Treatment branch (only one): the pref at true.

What percentage of users do you want in each branch? 
I think 50% of the Nightly and 25% of the Beta population, each split 50/50 between Control and Treatment. (We already have results for the feature being turned on, since it has already shipped, but we don’t know how many users may have manually switched the pref to false.)
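
To make the proposed sampling concrete, here is a minimal sketch of deterministic enrollment and branch assignment. It is not how Shield actually enrolls clients; the hashing scheme, the salts, and the names `SAMPLE_RATE`, `BRANCHES` and `assign_branch` are assumptions made up for illustration. Only the pref name, the per-channel percentages and the 50/50 split come from the proposal above.

```python
import hashlib

# Hypothetical constants mirroring the proposal; not Shield's real configuration.
PREF_NAME = "network.http.tailing.enabled"
SAMPLE_RATE = {"nightly": 0.50, "beta": 0.25}
BRANCHES = {"control": False, "treatment": True}


def _unit_interval(client_id: str, salt: str) -> float:
    """Map a client ID to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{client_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def assign_branch(client_id: str, channel: str):
    """Return (branch, pref_value), or None if the client is not enrolled."""
    rate = SAMPLE_RATE.get(channel, 0.0)
    if _unit_interval(client_id, "enroll") >= rate:
        return None
    # A separate salt keeps the 50/50 split independent of the enrollment draw.
    in_control = _unit_interval(client_id, "branch") < 0.5
    branch = "control" if in_control else "treatment"
    return branch, BRANCHES[branch]


if __name__ == "__main__":
    result = assign_branch("example-client-id", "beta")
    if result is None:
        print("not enrolled")
    else:
        branch, value = result
        print(f"{branch}: set {PREF_NAME} = {str(value).lower()}")
```

Hashing the client ID rather than drawing a fresh random number means a client always lands in the same branch, which matters for a study that runs a full week.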

What Channels and locales do you intend to ship to? 
Nightly and Beta, all locales. 

What is your intended go live date and how long will the study run? 
Start (approx.): soon
Run time: 1 week (to also get weekend coverage)

Are there specific criteria for participants? 
Can’t think of anything.

What is the main effect you are looking for and what data will you use to make these decisions? 
The main effect is a lower mean time for the following telemetry probes:
TIME_TO_NON_BLANK_PAINT_MS
TIME_TO_DOM_CONTENT_LOADED_START_MS
TIME_TO_DOM_INTERACTIVE_MS
TIME_TO_LOAD_EVENT_START_MS
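
As an illustration of how "a lower mean time" could be checked for one probe, the sketch below computes the relative change of the treatment mean against the control mean and a bootstrap confidence interval around it. This is an assumed analysis approach, not the one used for this study; the function names are hypothetical and the sample values in the usage section are invented placeholders, not telemetry data.

```python
import random
import statistics


def relative_change(control, treatment):
    """Relative difference of the treatment mean vs. the control mean.

    A negative value means the treatment branch was faster on average.
    """
    mean_c = statistics.fmean(control)
    mean_t = statistics.fmean(treatment)
    return (mean_t - mean_c) / mean_c


def bootstrap_ci(control, treatment, rounds=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for relative_change()."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        # Resample each branch with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        estimates.append(relative_change(c, t))
    estimates.sort()
    low = estimates[int(rounds * alpha / 2)]
    high = estimates[int(rounds * (1 - alpha / 2)) - 1]
    return low, high


if __name__ == "__main__":
    # Invented placeholder timings in milliseconds, for demonstration only.
    control_ms = [1200, 950, 1800, 1100, 1400, 1000, 2100, 1300]
    treatment_ms = [1150, 900, 1750, 1050, 1450, 980, 2000, 1250]
    print("relative change:", relative_change(control_ms, treatment_ms))
    print("95% interval:", bootstrap_ci(control_ms, treatment_ms))
```

If the interval comfortably includes zero, the data does not show a clear effect in either direction for that probe.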

Who is the owner of the data analysis for this study? 
Me, Honza Bambas, or anyone more appropriate to do this.

Will this experiment require uplift? 
No

QA Status of your code: 
We decided that a push to try is enough to check that the pref-flip doesn't cause issues (Nightly and Beta):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d06afc2eba9eaf32a57b15b1e0591316d86dcb54
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d925ef63d12e2122443dc6df303a7f08f47103f9

Do you plan on surveying users at the end of the study? 
Yes

Link to any relevant google docs / Drive files that describe the project. Links to prior art if it exists:

I can only provide a reference to the bug where this was designed and implemented:
1. https://bugzilla.mozilla.org/show_bug.cgi?id=1358060
Flags: needinfo?(isegall)
science review: R+
Flags: needinfo?(isegall)
Ryan, can you please check the two try runs from comment 0 and sign off on them?  I starred the jobs that apparently hit a known orange.
Flags: needinfo?(ryanvm)
Nothing scary there that I can see!
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #3)
> Nothing scary there that I can see!

Thanks, Ryan!
Is that your explicit QA sign off? CC Krupa as well.
Flags: needinfo?(ryanvm)
(In reply to Matt Grimes [:Matt_G] from comment #5)
> Is that your explicit QA sign off?

Definitely not. I just reviewed the Try results as asked.
Flags: needinfo?(ryanvm) → needinfo?(kraj)
Thanks for clarifying. Much appreciated.
Update: run this study only on the Beta channel (59).  Reasons:
- we would otherwise have to re-enable some of the probes needed for collecting results
- it is simpler to do QA on only one channel prior to shipping the study
- results from Beta alone are valuable enough (the Beta population usually reflects the Release population more closely)

thanks
This has been QA-signed off by Emil Pasca.
(In reply to Honza Bambas (:mayhemer) from comment #9)
> This has been QA-signed off by Emil Pasca.

Yep, this is ready to ship.
Flags: needinfo?(kraj)
Tracking for 59. Honza, does this mean we may want to disable the feature on 59 release based on the results we see from the experiment?
Flags: needinfo?(honzab.moz)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #11)
> Tracking for 59. Honza, does this mean we may want to disable the feature on
> 59 release based on the results we see from the experiment?

If it goes bad, we definitely can.  Note that we should not remove it completely, but only disable it for tracking scripts and requests (needs a small patch), since tailing is already used for icons and may find more use in the future too.
Flags: needinfo?(honzab.moz)
This study has been disabled - analysis pending...
Thanks. Please let me or RyanVM know via needinfo if anything needs to land for the 59 release.
Flags: needinfo?(jgaunt)
Flags: needinfo?(honzab.moz)
@mreid: here's a gist for a notebook that (hopefully) concisely illustrates how Spark can hit a wall when using much more than 10% of the available data for a pref-flip experiment... leading up to Austin we had briefly discussed whether I could do anything to get that % any higher?

https://gist.github.com/jtg567/a514aae354df4da9080ea3d72a5f8b8a

Compare cells [4] and [5], which are almost identical except for the 2nd arg passed to exptPings() being .1 and .15 respectively. As you can see .1 runs to completion whereas .15 leads to an exception.
Flags: needinfo?(jgaunt) → needinfo?(mreid)
Blocks: 1420885
I had a couple of specific suggestions via IRC, but the crux of it is to avoid calling collect() or cache() on large datasets - keep calculations in DataFrame / RDD land where possible, only pulling data into python when it has been aggregated down to a manageable size.
Flags: needinfo?(mreid)
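
The "aggregate first, then pull into python" advice can be sketched as follows. This is plain Python standing in for the pattern, not Spark code and not the notebook from the gist: each partition is reduced to a tiny per-branch (count, total) summary, and only those summaries are merged. In PySpark the analogous move would be a grouped aggregation on the DataFrame before any collect(). The function names and the row layout (branch, value_ms) are assumptions for illustration.

```python
from collections import defaultdict


def summarize_partition(rows):
    """Reduce one partition of (branch, value_ms) rows to {branch: [count, total]}."""
    summary = defaultdict(lambda: [0, 0.0])
    for branch, value_ms in rows:
        summary[branch][0] += 1
        summary[branch][1] += value_ms
    return dict(summary)


def merge_summaries(summaries):
    """Combine per-partition summaries and return the mean per branch."""
    merged = defaultdict(lambda: [0, 0.0])
    for summary in summaries:
        for branch, (count, total) in summary.items():
            merged[branch][0] += count
            merged[branch][1] += total
    return {branch: total / count for branch, (count, total) in merged.items()}


def mean_per_branch(partitions):
    # The generator expression builds one small summary at a time, so the
    # full set of rows is never held in memory at once.
    return merge_summaries(summarize_partition(p) for p in partitions)


if __name__ == "__main__":
    # Invented placeholder rows, for demonstration only.
    partitions = [
        [("control", 100), ("treatment", 80)],
        [("control", 200)],
    ]
    print(mean_per_branch(partitions))
```

The memory needed on the Python side then scales with the number of branches rather than the number of pings, which is the property that lets a larger sample fraction run to completion.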
Thank you, Mark - I believe it will take some time and critical thought to determine if it's possible to make some of this work less computationally intensive on the python side.

In the meantime I'm sharing results from this study here: https://gist.github.com/jtg567/a514aae354df4da9080ea3d72a5f8b8a. There is no strong effect of the pref-flip on the probes between branches.

I'm glad to answer any followup questions here.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #14)
> Thanks, please let me or RyanVM know with needinfo, if anything needs to
> land for 59 release.

In my humble opinion, I think not.  No regression has been found.
Flags: needinfo?(honzab.moz)
Great! Can we mark this fixed now?
Flags: needinfo?(honzab.moz)
I think yes, but I still want to look more closely at the data, as I don't believe the feature has absolutely zero effect.  I suspect something has been done.

Anyway, this study is done now.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(honzab.moz)
Resolution: --- → FIXED