Bug 1250941
Opened 9 years ago
Closed 9 years ago
Create a derived dataset for the unified-urlbar experiment
Categories
(Cloud Services :: Metrics: Product Metrics, defect, P4)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mak, Assigned: mak)
References
Details
The unified-urlbar experiment just finished on the Beta channel, and we'd like to be able to analyze the collected data.
Could you please make us a dataset limited to only users who ran the experiment?
We would like to query the following telemetry values:
- environment data
- UITelemetry (search, search-oneoff, click-builtin-item, environment, toolbars). This is part of the Simple Measurements.
- FX_URLBAR_SELECTED_RESULT_TYPE and SEARCH_COUNTS histograms
- the experiment branch, wherever it is stored (I don't know where)
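As a rough illustration of the list above, here is a minimal sketch of pulling those values out of one ping represented as a nested dict. The slash-separated paths are assumptions about the main ping layout (in particular, that the experiment branch lives under environment/addons/activeExperiment), and `WANTED_PATHS`, `get_path` and `extract_fields` are hypothetical names, not part of any library.

```python
from typing import Any, Dict, Optional

# Assumed locations of the requested values inside a main ping.
# These paths are an assumption and should be checked against real pings.
WANTED_PATHS = {
    "environment": "environment",
    "ui_telemetry": "payload/simpleMeasurements/UITelemetry",
    "selected_result_type": "payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE",
    "search_counts": "payload/keyedHistograms/SEARCH_COUNTS",
    "branch": "environment/addons/activeExperiment/branch",
}


def get_path(ping: Dict[str, Any], path: str) -> Optional[Any]:
    """Walk a slash-separated path through nested dicts; None if missing."""
    node: Any = ping
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_fields(ping: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the requested values, keyed by a short name."""
    return {name: get_path(ping, path) for name, path in WANTED_PATHS.items()}
```

Missing values come back as None rather than raising, since not every ping is expected to carry every measurement.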
Comment 1•9 years ago (Assignee)
If the data is too large, we'd be fine with retaining 15k random users per branch (the experiment had 3 branches).
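One way to retain a fixed number of random users per branch could look like the sketch below. It assumes the (client id, branch) pairs fit in memory; `sample_clients_per_branch` is a hypothetical helper and the fixed seed is only there to make the sample reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sample_clients_per_branch(
    client_branches: Iterable[Tuple[str, str]],
    per_branch: int = 15000,
    seed: int = 0,
) -> Dict[str, Set[str]]:
    """Pick up to `per_branch` distinct client ids at random for each branch."""
    by_branch = defaultdict(set)
    for client_id, branch in client_branches:
        by_branch[branch].add(client_id)

    rng = random.Random(seed)  # seeded so that reruns select the same users
    sampled = {}
    for branch in sorted(by_branch):
        clients = sorted(by_branch[branch])  # stable order before sampling
        k = min(per_branch, len(clients))
        sampled[branch] = set(rng.sample(clients, k))
    return sampled
```

Pings would then be kept only when their client id is in the set selected for their branch, so every retained user keeps all of their pings.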
Comment 2•9 years ago
Roberto, are you assuming that this needs to be a longitudinal dataset? I don't see that in the requirements here. We're clearly not going to get a strict schema out of UITelemetry in the short term.
Can you just teach mak how to create a subset of get_pings data for this experiment and save it to S3?
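A minimal sketch of the subsetting step, assuming the pings are already available as plain dicts and that the experiment id is recorded under environment/addons/activeExperiment/id (an assumption). The helper names are hypothetical, the experiment id is passed in as a parameter rather than guessed here, and the upload to S3 is left out: the sketch only writes JSON lines to a file-like object.

```python
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def active_experiment(ping: Dict[str, Any]) -> Dict[str, Any]:
    """Return the activeExperiment dict of a ping, or {} if it is absent."""
    addons = (ping.get("environment") or {}).get("addons") or {}
    return addons.get("activeExperiment") or {}


def experiment_subset(
    pings: Iterable[Dict[str, Any]], experiment_id: str
) -> Iterator[Dict[str, Any]]:
    """Yield only the pings submitted while `experiment_id` was active."""
    for ping in pings:
        if active_experiment(ping).get("id") == experiment_id:
            yield ping


def write_json_lines(pings: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Serialize one ping per line and return how many were written."""
    count = 0
    for ping in pings:
        out.write(json.dumps(ping, sort_keys=True) + "\n")
        count += 1
    return count
```

On a Spark cluster the same predicate could be passed to the RDD's filter method instead of looping in the driver.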
Updated•9 years ago
Flags: needinfo?(rvitillo)
Comment 3•9 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Roberto, are you assuming that this needs to be a longitudinal dataset? I
> don't see that in the requirements here. We're clearly not going to get a
> strict schema out of UITelemetry in the short term.
It depends on what kind of questions Marco intends to answer with the experimental data. Creating a longitudinal Parquet dataset per experiment is something I would like to deal with at some point, instead of writing individual ETL jobs for each experiment.
> Can you just teach mak how to create a subset of get_pings data for this
> experiment and save it to S3?
Certainly, this is what I proposed to Marco in our e-mail thread before updating the bug.
Flags: needinfo?(rvitillo)
Updated•9 years ago
Component: Metrics: Pipeline → Metrics: Product Metrics
Updated•9 years ago
Priority: -- → P4
Comment 4•9 years ago (Assignee)
Since I plan to do the analysis on Spark myself, I'm assigning the bug to myself for now.
Assignee: nobody → mak77
Comment 5•9 years ago (Assignee)
The first version is at
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb
This is pretty much restricted, so I could run it on a single node.
Roberto, could you please take a look at it and tell me if I'm doing something very dumb?
For the broader version, I will split it into extraction and analysis and store the extracted data on S3, as you suggested. I'm not sure which values I should aim for in the extraction, though. I was thinking of fetching from 20160112 to 20160212 and sampling 10% of the data. Is that too much? How long might it take using more clusters?
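For illustration, the extraction window and the 10% sample could be expressed as in the sketch below. It assumes submission dates are YYYYMMDD strings and samples by hashing the client id, so a given user is either always in or always out. Both function names are hypothetical, and hash-based sampling is just one option, not necessarily what the telemetry tooling does.

```python
import zlib
from datetime import datetime, timedelta
from typing import List


def submission_dates(start: str, end: str) -> List[str]:
    """List every YYYYMMDD date from `start` to `end`, both inclusive."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    days = (last - first).days
    return [(first + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def in_sample(client_id: str, fraction: float = 0.10) -> bool:
    """Deterministically keep roughly `fraction` of clients, by hashed id."""
    bucket = zlib.crc32(client_id.encode("utf-8")) % 10000
    return bucket < fraction * 10000
```

Sampling by client rather than by ping keeps each selected user's full history, which matters when per-user counts are compared across branches.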
Flags: needinfo?(rvitillo)
Comment 6•9 years ago
(In reply to Marco Bonardo [::mak] from comment #5)
> first version is at
> https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb
>
> This is pretty much restricted, so I could run it on a single node.
> Roberto, could you please take a look at it and tell me if I'm doing
> something very dumb?
Your analysis is well written.
> For the broader version, I will split it into extraction and analysis and
> store on S3, as you suggested. I'm not sure which values I should aim at for
> the extraction though, I was thinking to fetch from 20160112 to 20160212,
> and sample 10% of the data... is that too much? how much may it take using
> more clusters?
10% might be OK. You could compute confidence intervals for your results, keep increasing the percentage of pings considered until the results stabilize, or both. Feel free to spawn a larger cluster once you have the final version of your analysis.
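Both suggestions can be sketched in a few lines. The example assumes the result of interest is a proportion (for instance, the share of urlbar selections of a given type) and uses the normal approximation for the interval, which is only reasonable for fairly large samples. `proportion_ci` and `has_stabilized` are hypothetical helpers, and the tolerance is an arbitrary choice.

```python
import math
from typing import Sequence, Tuple


def proportion_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval (95% by default) for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def has_stabilized(estimates: Sequence[float], tolerance: float = 0.01) -> bool:
    """True when the two most recent estimates differ by at most `tolerance`."""
    if len(estimates) < 2:
        return False
    return abs(estimates[-1] - estimates[-2]) <= tolerance
```

In practice one would rerun the analysis at, say, 5%, 10% and 20% of pings, append each estimate to a list, and stop growing the sample once `has_stabilized` returns True or the interval is narrow enough for the question at hand.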
Flags: needinfo?(rvitillo)
Comment 7•9 years ago (Assignee)
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-store.ipynb
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-analysis.ipynb
I'm done here, thank you for the help.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED