Closed Bug 1367951 Opened 3 years ago Closed 2 years ago

Pref study: Race Cache with Network iteration #1

Categories

(Shield :: Shield Study, enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jduell.mcbugs, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active])

The Race Cache with Network (RCWN) project is at a point where we want to turn the code on for our nightly users, so we can get telemetry to see how well the code works.

The main effect we're looking for is that the long tail for time to load resources from the disk cache (HTTP_SUB_COMPLETE_LOAD_CACHED_V2 in telemetry) should get shorter, i.e. things that used to sit waiting for disk for a long time will sometimes now get satisfied faster by the network (and won't show up in the cache telemetry metric).  After talking to rweiss about this, we agreed the best metric for success is to look at the 95th percentile--it should show a lower timing when the pref is flipped on.
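A minimal sketch of the 95th-percentile comparison described above, assuming each histogram is available as a mapping from bucket lower bound (in ms) to sample count. The function name and the sample numbers are illustrative only; they are not taken from telemetry or from any Mozilla tooling.

```python
def histogram_percentile(buckets, pct):
    """Return the lower bound of the bucket that contains the pct-th percentile.

    `buckets` maps each bucket's lower bound (ms) to its sample count.
    This is a coarse estimate: it does not interpolate inside a bucket.
    """
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * pct / 100.0
    running = 0
    for lower_bound in sorted(buckets):
        running += buckets[lower_bound]
        if running >= threshold:
            return lower_bound
    return max(buckets)


# Made-up histograms for the two branches of the study (not real data).
control = {0: 50, 10: 30, 100: 12, 1000: 8}
treatment = {0: 50, 10: 35, 100: 14, 1000: 1}

p95_control = histogram_percentile(control, 95)
p95_treatment = histogram_percentile(treatment, 95)
print(p95_control, p95_treatment)
```

If the pref helps, the treatment branch's 95th percentile should land in a lower bucket than the control branch's.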

RCWN will primarily benefit users who have slow disks--other users are unlikely to see any effect.  Given that we're testing on nightly I think we will need a large sample size--I suggest all of our nightly users--for that to work, since most of our nightly users probably have better hardware than average. (OTOH I see in the Shield docs that "We can include/exclude based on anything available in telemetry": does that mean we could limit this to users whose 75th or 95th percentile is above some value?  That would work.)

Valentin and Michal are the owners of the data analysis here.  We're planning to run only on nightly (the code is not uplifted, and IIRC it would be a fairly large patch to uplift).

I'm guessing that 2 or 3 days of having the pref flipped on ought to give us enough data.  But the count of interesting data points will rely on how many nightly users have slow drives, so it's hard to predict exactly.  

We aren't planning to survey users--this is a subtle performance fix that most users won't be able to detect.
Asking for signoff from Shield studies owner.
Flags: needinfo?(glind)
And also from Data steward.
Flags: needinfo?(rweiss)
:jduell

+1 on provisional signoff here, with a few recommendations / amendments.  These are NON-BLOCKING.

1.  What is your pref name, btw, and what are the listed values?  There needs to be a SETTING that is the "don't do anything" setting, fwiw.  We can discuss.
2.  +1 on the 95th-percentile statistic as a measure.  That should respond to the overall 'bad experience' getting better.  Seconding rweiss.
3.  Re: 'shield can find anything in telemetry'... Typically we use channel, addons, locales.  Looking for specific histogram values is... challenging [1]
4.  Sampling:  I have no personal issue with 100% of nightly on this.  We would roll out over several stages.  I do hope that if we go wide, there is some marker for this pref in the crash report / etc.  Rel-man will make the final call on 100%.


Worries / concerns / chances for growth....
1.  As you say, Nightly users might already all have fast disks.  IF you have a way of knowing whether they have an SSD or not (or some other better measure of DISK AWESOME) then let's use it :).  Blue sky: perhaps on first run / turn-on, your feature could (mark something | set a pref | send a ping) after detecting this.


Timeline:
1.  If you have the pref all sorted, and it respects the shield pref-study assumptions [2], then it's "< 1 business day" to initial deploy AFTER rel-drivers approves [3].





Notes:
[1]:  http://normandy.readthedocs.io/en/latest/user/filter_expressions.html#normandy.telemetry
[2]:  Shield Pref flip assumptions:
   a.  users who HAND-SET their pref are OUT
   b.  the pref takes one of several values (true / false).
[3]:  We send them an email describing the project.  It can take 2-3 days.  Currently they must POSITIVELY APPROVE the change.  They will want to know potential bad effects, how they show up, and any uplift needed.
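To make assumptions [2]a and [2]b concrete, here is a rough sketch of how a client could be enrolled. Everything in it is hypothetical: the function name, the branch names, and the hash-based split are illustrations only and are not how Shield/Normandy actually assigns branches.

```python
import hashlib

# Hypothetical branch table: the control branch is the "don't do anything"
# setting, and the treatment branch flips the boolean pref on.
BRANCHES = (("control", False), ("treatment", True))


def assign_branch(client_id, user_set_pref):
    """Return (branch_name, pref_value), or None if the client is excluded.

    Assumption [2]a: clients who hand-set the pref are left out.
    Assumption [2]b: the pref takes one of a fixed set of values (true / false).
    The deterministic hash split is an assumption made for this sketch.
    """
    if user_set_pref:
        return None
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return BRANCHES[int(digest, 16) % len(BRANCHES)]


print(assign_branch("example-client", user_set_pref=True))
print(assign_branch("example-client", user_set_pref=False))
```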
Flags: needinfo?(glind)
We've reconsidered the role of data stewards for pref flipping, and they are no longer required for review.  Signing off.
Flags: needinfo?(rweiss)
Jason, any updates on this RCWN pref study? Now that 55 is riding from Nightly to Beta this week, should the pref study be run in Nightly 56 and/or Beta 55?
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [necko-active]
We should be ready to run the pref test very soon.  My understanding is that when we're ready, the next step is to email release drivers.  They should expect an email from Valentin soon.
Flags: needinfo?(jduell.mcbugs)
Depends on: 1367810, 1367742
Depends on: 1378714
Valentin, any updates on this RCWN experiment? I see you filed some bugs blocking this one. Are those bugs found by this experiment or bugs that must be fixed before it can proceed?
Flags: needinfo?(valentin.gosu)
Hi Chris, the bugs were to be fixed before the RCWN experiment. This first experiment has completed, and we're now waiting on info on how to evaluate the telemetry results.
Flags: needinfo?(valentin.gosu)
Hey Chris,
Is there any tool to compare telemetry probes between the active and control groups of a shield pref study?
How soon can we get the data? I'm asking mostly for the next shield study: bug 1381816 - which we intend to run next week.
Thanks!
Flags: needinfo?(cpeterson)
If the data you are looking for is not in experiments viewer then one of the folks from my team can help you out. @jgaunt or isegall, can one of you help Valentin out?
Flags: needinfo?(jgaunt)
Flags: needinfo?(isegall)
Matt_G will know. The Experiment Viewer is all I know of.

https://moz-experiments-viewer.herokuapp.com/
Flags: needinfo?(cpeterson)
(In reply to Chris Peterson [:cpeterson] from comment #11)
> Matt_G will know. The Experiment Viewer is all I know of.
> 
> https://moz-experiments-viewer.herokuapp.com/

From what I can tell there is no way in the experiments viewer to analyze the probes we are interested in:
 NETWORK_RACE_CACHE_WITH_NETWORK_*
 HTTP_PAGE_COMPLETE_LOAD_V2
 HTTP_PAGE_COMPLETE_LOAD_NET_V2
 HTTP_PAGE_COMPLETE_LOAD_CACHED_V2
 HTTP_SUB_COMPLETE_LOAD_V2
 HTTP_SUB_COMPLETE_LOAD_CACHED_V2
 HTTP_SUB_COMPLETE_LOAD_NET_V2
 TOTAL_CONTENT_PAGE_LOAD_TIME

It would be nice to be able to specify any arbitrary probes to compare in the experiments viewer.
That's the goal for experiments viewer eventually. That will probably come in a couple of months. In the meantime, I've got folks lined up to pull that data for you when needed. I think Ilana said she was taking this one? Please confirm.
(In reply to Matt Grimes [:Matt_G] from comment #13)
> In the meantime, I've got folks lined up to pull that data for you when needed.

That would be great! Thanks!
(In reply to Valentin Gosu [:valentin] from comment #14)
> (In reply to Matt Grimes [:Matt_G] from comment #13)
> > In the meantime, I've got folks lined up to pull that data for you when needed.
> 
> That would be great! Thanks!

In what form do you expect the data for analysis? I've isolated the telemetry pings from this experiment with columns for the probes you listed above plus clientId, channel, Fx version, OS, OS version, and experiment branch. The probes are all nested as histograms, whereas the fields I added are strings.

There are 3,127,874 total pings from 48,514 unique clientIds.

If this satisfies your requirements I can save to S3. If not, please clarify the specification.
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jgaunt)
Flags: needinfo?(isegall)
That sounds good. Apart from the raw data, could you also help us generate graphs similar to those in moz-experiments-viewer?
For bug 1381816 do we need to take the same manual approach to analyzing the data?

Thanks!
Flags: needinfo?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] from comment #16)
> That sounds good. Apart from the raw data, could you also help us generate
> graphs similar to moz-experiments-viewer ?
> For bug 1381816 do we need to take the same manual approach to analyzing the
> data?
> 
> Thanks!

Actually, Ilana has been crafting a modular solution for extra probes in all of these pref flip studies that would include graphics.

You're correct that you'll need to anticipate some manual processing until the feature has been added to experiment viewer.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Valentin, what additional probes are you interested in?
Flags: needinfo?(valentin.gosu)
I think these are all the ones we're interested in:

(In reply to Valentin Gosu [:valentin] from comment #12)
>  NETWORK_RACE_CACHE* <- should be 7 probes
>  HTTP_PAGE_COMPLETE_LOAD_V2
>  HTTP_PAGE_COMPLETE_LOAD_NET_V2
>  HTTP_PAGE_COMPLETE_LOAD_CACHED_V2
>  HTTP_SUB_COMPLETE_LOAD_V2
>  HTTP_SUB_COMPLETE_LOAD_CACHED_V2
>  HTTP_SUB_COMPLETE_LOAD_NET_V2
>  TOTAL_CONTENT_PAGE_LOAD_TIME
Flags: needinfo?(valentin.gosu)
https://gist.github.com/jtg567/179d65bdcb3b4ed60a6c0eb0cc7502b8

Valentin - there are interim results at the end of the notebook linked above. The current approach is to collapse all histograms across pings over clients, but this leads to more active clients being over-represented in the data. Please bear that caveat in mind when interpreting. The method of analysis is still evolving while we determine what's optimal for these data.
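The over-representation caveat can be illustrated with a small sketch. It assumes each ping is a (clientId, histogram) pair, with the histogram a mapping from bucket to count. `pooled` mirrors the "collapse everything" approach described above. `per_client_normalized` is one possible alternative, shown only for contrast; it is not the method used in the linked notebook.

```python
from collections import Counter, defaultdict


def pooled(pings):
    """Sum every ping's histogram; clients with many samples dominate."""
    total = Counter()
    for _client_id, hist in pings:
        total.update(hist)
    return dict(total)


def per_client_normalized(pings):
    """Sum per client, convert each client to proportions, then average,
    so every client with data carries equal weight."""
    by_client = defaultdict(Counter)
    for client_id, hist in pings:
        by_client[client_id].update(hist)
    sums = defaultdict(float)
    clients = 0
    for hist in by_client.values():
        n = sum(hist.values())
        if n == 0:
            continue  # skip clients with no samples
        clients += 1
        for bucket, count in hist.items():
            sums[bucket] += count / n
    return {b: v / clients for b, v in sums.items()} if clients else {}


# Made-up pings: client "a" is very active and mostly fast, "b" is slow.
pings = [("a", {10: 9}), ("a", {10: 9}), ("a", {1000: 2}), ("b", {1000: 1})]
print(pooled(pings))
print(per_client_normalized(pings))
```

In the pooled result the slow bucket holds 3 of 21 samples, whereas with equal per-client weighting it accounts for more than half of the distribution, which is the kind of skew the caveat warns about.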