Closed Bug 1225074 Opened 10 years ago Closed 9 years ago

Derived dataset for e10s experiments

Tracking

(e10s+)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

e10s

---

People

(Reporter: rvitillo, Assigned: rvitillo)

References

Details

Roberto Agostino Vitillo (:rvitillo)

Assignee

Description

•

10 years ago

We need derived datasets for the current and future e10s experiments. To avoid biasing our analyses, we have to use a representative set of the clients participating in the experiment. Because some clients submit with severe lag, we currently either ignore their late submissions or waste considerable resources in each analysis filtering for experiment submissions that arrive well beyond the experiment's end date. Let's do this work just once, when creating the derived stream. Furthermore, the derived stream should group all submissions by client and compute a representative measure for every metric considered. Currently we randomly select a single session per client, which is not good enough for low signal-to-noise metrics like plugin crashes and slow script notice counts, as it doesn't give us enough statistical power to detect differences.
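The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code of the derived stream: the field names (`client_id`, `branch`, `activity_date`, `metrics`) and the function name are hypothetical, and summing count metrics per client is only one assumed choice of "representative measure".

```python
from collections import defaultdict
from datetime import date


def derive_client_dataset(submissions, start, end):
    """Collapse raw submissions into one record per client.

    Each submission is a dict with hypothetical fields: client_id,
    branch, activity_date (a datetime.date) and metrics (a dict
    mapping count-metric names to integers).
    """
    per_client = {}
    for sub in submissions:
        # Keep only activity recorded inside the experiment window,
        # regardless of how late the submission itself arrived.
        if not (start <= sub["activity_date"] <= end):
            continue
        record = per_client.setdefault(
            sub["client_id"],
            {"branch": sub["branch"], "sessions": 0, "metrics": defaultdict(int)},
        )
        record["sessions"] += 1
        # Assumption: count metrics are summed over all of a client's
        # sessions instead of sampling a single random session.
        for name, value in sub["metrics"].items():
            record["metrics"][name] += value
    # Return plain dicts so the result is easy to serialize.
    return {
        cid: {**rec, "metrics": dict(rec["metrics"])}
        for cid, rec in per_client.items()
    }
```

Because the date filter and the per-client aggregation run once when the dataset is built, each downstream analysis can read one row per client and skip both steps.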

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Depends on: 1225076

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Depends on: 1223026

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Depends on: 1225080

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Depends on: 1225083

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Depends on: 1223045

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Blocks: e10s-measurement

Jim Mathies [:jimm]

Updated

•

10 years ago

tracking-e10s: --- → +

Roberto Agostino Vitillo (:rvitillo)

Assignee

Comment 1

•

10 years ago

The code for the derived stream lives at [1]. Rerunning the "all histogram comparison analysis" [2] on the derived stream, for the data collected from 22/10 to 17/11, took less than 10 minutes on a single machine (about 100K users). In comparison, the same analysis on the raw data collected from 22/10 to 27/10 took more than an hour on a 10-machine cluster (about 30K users). We should rerun all our current e10s experiment analyses on the derived dataset and check for changes. It should be easy to reuse the code not only for future e10s experiments, but more generally for any experiment.

[1] https://github.com/vitillo/telemetry-batch-view/blob/master/src/main/scala/streams/E10sExperiment.scala
[2] http://nbviewer.ipython.org/github/vitillo/e10s_analyses/blob/master/aurora/e10s_all_histograms_experiment.ipynb

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

10 years ago

Blocks: 1222894

Vladan Djeric (:vladan)

Updated

•

10 years ago

Blocks: 1229104

Roberto Agostino Vitillo (:rvitillo)

Assignee

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

7 years ago

Product: Cloud Services → Cloud Services Graveyard


Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)
