Derived dataset for e10s experiments

RESOLVED FIXED

Status

Cloud Services
Metrics: Pipeline
2 years ago
a year ago

People

(Reporter: rvitillo, Assigned: rvitillo)

Tracking

unspecified
Points: ---

Firefox Tracking Flags

(e10s+)

Details

We need derived datasets for the current and future e10s experiments. 

To avoid biasing our analyses, we have to use a representative set of the clients participating in the experiment. Because some clients submit their data with severe lag, we would otherwise either have to ignore their submissions or spend considerable resources in every analysis scanning for experiment submissions on days well beyond the experiment's end date. Let's do this work just once, when creating the derived stream.
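A minimal sketch of the "filter once, including lagged clients" step, assuming each ping is a dict with hypothetical `experiment_id`, `session_date` and `submission_date` fields and a hypothetical `max_lag_days` grace period. The real derived stream is written in Scala and may filter differently; this only illustrates the idea.

```python
from datetime import date, timedelta


def experiment_submissions(pings, experiment_id, start, end, max_lag_days):
    """Yield the experiment's pings, including ones that arrive late.

    Hypothetical field names: experiment_id, session_date, submission_date.
    Filtering happens once here, so downstream analyses never have to
    rescan submission days after the experiment's end date.
    """
    # Accept submissions that arrive up to max_lag_days after the end date.
    cutoff = end + timedelta(days=max_lag_days)
    for ping in pings:
        if ping["experiment_id"] != experiment_id:
            continue
        # The session itself must fall inside the experiment window...
        if not (start <= ping["session_date"] <= end):
            continue
        # ...but it may be submitted later because of client lag.
        if ping["submission_date"] <= cutoff:
            yield ping


# Illustrative usage with made-up pings: "b" is lagged but within the
# grace period, "d" arrives too late, "c" belongs to another experiment.
pings = [
    {"client_id": "a", "experiment_id": "exp", "session_date": date(2015, 10, 25), "submission_date": date(2015, 10, 25)},
    {"client_id": "b", "experiment_id": "exp", "session_date": date(2015, 11, 16), "submission_date": date(2015, 11, 30)},
    {"client_id": "c", "experiment_id": "other", "session_date": date(2015, 11, 1), "submission_date": date(2015, 11, 1)},
    {"client_id": "d", "experiment_id": "exp", "session_date": date(2015, 11, 16), "submission_date": date(2016, 1, 10)},
]
kept = list(experiment_submissions(pings, "exp", date(2015, 10, 22), date(2015, 11, 17), 14))
print([p["client_id"] for p in kept])
```

The grace period is the knob that trades completeness against how long the derived dataset has to wait before it can be finalized.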

Furthermore, the derived stream should group all submissions by client and compute a representative measure for each metric considered. Currently we randomly select a single session per client, which is not good enough for low signal-to-noise metrics like plugin crash and slow script notice counts: with only one session per client we don't have enough statistical power to detect differences.
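A minimal sketch of collapsing all of a client's sessions into one value per metric. It assumes the representative measure is an events-per-hour rate (total count divided by total session hours); the field names `client_id` and `session_hours` are hypothetical, and the actual derived stream may use a different aggregate.

```python
from collections import defaultdict
from typing import Dict, Iterable


def per_client_rates(pings: Iterable[dict], metric: str) -> Dict[str, float]:
    """Return one rate per client for a count metric.

    Assumption: representative measure = sum of the metric over all of the
    client's sessions divided by the sum of their session hours.
    """
    counts: Dict[str, float] = defaultdict(float)
    hours: Dict[str, float] = defaultdict(float)
    for ping in pings:
        cid = ping["client_id"]
        counts[cid] += ping[metric]
        hours[cid] += ping["session_hours"]
    # Clients with no recorded usage are skipped to avoid dividing by zero.
    return {cid: counts[cid] / hours[cid] for cid in counts if hours[cid] > 0}


# Illustrative usage with made-up sessions.
sessions = [
    {"client_id": "a", "plugin_crashes": 1, "session_hours": 2.0},
    {"client_id": "a", "plugin_crashes": 0, "session_hours": 2.0},
    {"client_id": "b", "plugin_crashes": 3, "session_hours": 1.0},
    {"client_id": "c", "plugin_crashes": 0, "session_hours": 0.0},
]
rates = per_client_rates(sessions, "plugin_crashes")
print(rates)  # {'a': 0.25, 'b': 3.0}
```

Pooling every session of a client uses all of the available data, while still producing exactly one observation per client, so clients remain the independent units when comparing experiment branches.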
(Assignee)

Updated

2 years ago
Depends on: 1225076
(Assignee)

Updated

2 years ago
Depends on: 1223026
(Assignee)

Updated

2 years ago
Depends on: 1225080
(Assignee)

Updated

2 years ago
Depends on: 1225083
(Assignee)

Updated

2 years ago
Depends on: 1223045
(Assignee)

Updated

2 years ago
Blocks: 1222849

Updated

2 years ago
tracking-e10s: --- → +
(Assignee)

Comment 1

2 years ago
The code for the derived stream lives at [1]. Rerunning the "all histogram comparison analysis" [2] on the derived stream, for the data collected from 22/10 to 17/11, took less than 10 minutes on a single machine (about 100K users).

In comparison, the same analysis on the raw data collected from 22/10 to 27/10 took more than an hour on a 10-machine cluster (about 30K users).

We should rerun all our current e10s experiment analyses on the derived dataset and check for changes.

It should be easy to re-use the code not only for future e10s experiments, but for any experiment in general.

[1] https://github.com/vitillo/telemetry-batch-view/blob/master/src/main/scala/streams/E10sExperiment.scala
[2] http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/e10s_all_histograms_experiment.ipynb
(Assignee)

Updated

2 years ago
Blocks: 1222894
Blocks: 1229104
(Assignee)

Updated

a year ago
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED