1250941 - Create a derived dataset for the unified-urlbar experiment

Assignee

Description

9 years ago

The unified-urlbar experiment just finished on the Beta channel, and we'd like to be able to analyze the collected data. Could you please make us a dataset limited to only users who ran the experiment?

We would like to query the following telemetry values:
- environment data
- UITelemetry (search, search-oneoff, click-builtin-item, environment, toolbars). This is part of the Simple Measurements.
- FX_URLBAR_SELECTED_RESULT_TYPE and SEARCH_COUNTS histograms
- wherever the experiment branch is stored (no idea)

Marco Bonardo [:mak]

Assignee

Comment 1

9 years ago

If the data is too large, we'd be fine with retaining 15k random users per branch (the experiment had 3 branches).
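For illustration only, the per-branch cap described above could be sketched in plain Python as follows. The record layout (a flat dict with `clientId` and `branch` keys) and the function name `sample_clients_per_branch` are assumptions made for this sketch, not something taken from the notebooks; on a real cluster the same idea would be expressed with Spark operations.

```python
import random
from collections import defaultdict


def sample_clients_per_branch(pings, per_branch=15000, seed=0):
    """Keep every ping of at most `per_branch` randomly chosen clients per branch.

    `pings` is a list of dicts with hypothetical "clientId" and "branch" keys.
    """
    # Collect the distinct client IDs seen in each branch.
    clients = defaultdict(set)
    for ping in pings:
        clients[ping["branch"]].add(ping["clientId"])

    rng = random.Random(seed)
    kept = set()
    for branch, ids in clients.items():
        ordered = sorted(ids)  # sorted so the draw is reproducible for a given seed
        chosen = rng.sample(ordered, min(per_branch, len(ordered)))
        kept.update((branch, client_id) for client_id in chosen)

    # Sampling is done per client, so a kept client retains all of its pings.
    return [p for p in pings if (p["branch"], p["clientId"]) in kept]
```

Sampling clients rather than individual pings keeps each retained user's history complete, which matters if the analysis counts events per user.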

Roberto Agostino Vitillo (:rvitillo)

Updated

9 years ago

Depends on: 1252553

Benjamin Smedberg

Comment 2

9 years ago

Roberto, are you assuming that this needs to be a longitudinal dataset? I don't see that in the requirements here. We're clearly not going to get a strict schema out of UITelemetry in the short term. Can you just teach mak how to create a subset of get_pings data for this experiment and save it to S3?
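As a rough sketch of what such a subset step might look like, the plain-Python code below filters pings down to experiment participants and keeps only the fields requested in the description. The ping layout is an assumption: it reads the branch from `environment.addons.activeExperiment` and the measurements from `payload.simpleMeasurements.UITelemetry`, `payload.histograms` and `payload.keyedHistograms`, and the experiment ID is passed in because this bug does not state it. Fetching the pings and uploading the result to S3 are left out.

```python
import json


def experiment_branch(ping, experiment_id):
    """Return the ping's branch if it belongs to `experiment_id`, else None."""
    # Assumed location of the active experiment inside the environment block.
    active = ping.get("environment", {}).get("addons", {}).get("activeExperiment") or {}
    if active.get("id") == experiment_id:
        return active.get("branch")
    return None


def subset(ping, branch):
    """Project a ping onto the fields requested for the analysis."""
    payload = ping.get("payload", {})
    return {
        "clientId": ping.get("clientId"),
        "branch": branch,
        "environment": ping.get("environment", {}),
        "UITelemetry": payload.get("simpleMeasurements", {}).get("UITelemetry", {}),
        "FX_URLBAR_SELECTED_RESULT_TYPE": payload.get("histograms", {}).get(
            "FX_URLBAR_SELECTED_RESULT_TYPE"
        ),
        "SEARCH_COUNTS": payload.get("keyedHistograms", {}).get("SEARCH_COUNTS"),
    }


def extract(pings, experiment_id):
    """Yield the reduced record for every ping that ran the experiment."""
    for ping in pings:
        branch = experiment_branch(ping, experiment_id)
        if branch is not None:
            yield subset(ping, branch)


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, ready to be written out."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)
```

Because UITelemetry has no strict schema, the sketch stores it as an opaque nested value instead of flattening it into columns.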

Benjamin Smedberg

Updated

9 years ago

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 3

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Roberto, are you assuming that this needs to be a longitudinal dataset? I
> don't see that in the requirements here. We're clearly not going to get a
> strict schema out of UITelemetry in the short term.

It depends on what kind of questions Marco intends to answer with the experimental data. Creating a longitudinal Parquet dataset per experiment is something I would like to deal with at some point instead of writing individual ETL jobs for each experiment.

> Can you just teach mak how to create a subset of get_pings data for this
> experiment and save it to S3?

Certainly, this is what I proposed to Marco in our e-mail thread before updating the bug.

Flags: needinfo?(rvitillo)

Thomas Huelbert

Updated

9 years ago

Component: Metrics: Pipeline → Metrics: Product Metrics

Thomas Huelbert

Updated

9 years ago

Priority: -- → P4

Marco Bonardo [:mak]

Assignee

Comment 4

9 years ago

Since I plan to do the analysis on Spark myself, I'm assigning the bug to myself for now.

Assignee: nobody → mak77

Marco Bonardo [:mak]

Assignee

Comment 5

9 years ago

The first version is at https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb

This is pretty much restricted, so I could run it on a single node. Roberto, could you please take a look at it and tell me if I'm doing something very dumb?

For the broader version, I will split it into extraction and analysis and store the extracted data on S3, as you suggested. I'm not sure which values I should aim for in the extraction, though. I was thinking of fetching from 20160112 to 20160212 and sampling 10% of the data. Is that too much? How long might it take using more clusters?

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 6

9 years ago

(In reply to Marco Bonardo [:mak] from comment #5)
> first version is at
> https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb
>
> This is pretty much restricted, so I could run it on a single node.
> Roberto, could you please take a look at it and tell me if I'm doing
> something very dumb?

Your analysis is well written.

> For the broader version, I will split it into extraction and analysis and
> store on S3, as you suggested. I'm not sure which values I should aim at for
> the extraction though, I was thinking to fetch from 20160112 to 20160212,
> and sample 10% of the data... is that too much? how much may it take using
> more clusters?

10% might be OK. You could compute the confidence intervals for your results, keep increasing the percentage of pings considered until the results stabilize, or both. Feel free to spawn a larger cluster once you have the final version of your analysis.
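To make the two suggestions concrete, here is a minimal sketch in plain Python. It assumes the result of interest is a proportion (for example, the share of urlbar selections of a given type) and uses the Wilson score interval, which is one common choice rather than anything prescribed in this bug. The helper `first_stable_fraction` and its tolerance are hypothetical names and values introduced for the example.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def first_stable_fraction(estimates, tolerance=0.01):
    """Return the first sampling fraction whose estimate moved by at most
    `tolerance` (absolute) relative to the previous, smaller fraction.

    `estimates` is a list of (fraction, value) pairs in increasing fraction
    order; returns None if the estimate never settles.
    """
    for (_, prev_value), (fraction, value) in zip(estimates, estimates[1:]):
        if abs(value - prev_value) <= tolerance:
            return fraction
    return None
```

In practice one would rerun the analysis at, say, 1%, 5% and 10% of the pings, feed the resulting estimates to `first_stable_fraction`, and report the interval from `wilson_interval` at the chosen fraction.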

Flags: needinfo?(rvitillo)

Marco Bonardo [:mak]

Assignee

Comment 7

9 years ago

https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-store.ipynb
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-analysis.ipynb

I'm done here, thank you for the help.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Categories

(Cloud Services :: Metrics: Product Metrics, defect, P4)

Tracking

(Not tracked)

People

(Reporter: mak, Assigned: mak)

Security

(public)
