Create a derived dataset for the unified-urlbar experiment

RESOLVED FIXED

Status

Product: Cloud Services
Component: Metrics: Product Metrics
Priority: P4
Severity: normal
Status: RESOLVED FIXED
a year ago
a year ago

People

(Reporter: mak, Assigned: mak)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

a year ago
The unified-urlbar experiment has just finished on the Beta channel, and we'd like to analyze the collected data.

Could you please make us a dataset limited to users who ran the experiment?

We would like to query the following telemetry values:
- environment data
- UITelemetry (search, search-oneoff, click-builtin-item, environment, toolbars). This is part of the Simple Measurements.
- FX_URLBAR_SELECTED_RESULT_TYPE and SEARCH_COUNTS histograms
- the experiment branch, wherever that is stored (I don't know where)
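
The request above can be sketched as a filter plus a field extraction over raw pings. This is only an illustration on plain Python dicts: the experiment id, the `environment/addons/activeExperiment` location of the branch, and the payload paths for UITelemetry and the histograms are all assumptions, not something this bug confirms.

```python
# Hypothetical experiment id; replace with the real one.
EXPERIMENT_ID = "unified-urlbar@experiments.mozilla.org"


def get_path(obj, path, default=None):
    """Walk a nested dict along a '/'-separated path; return default if a key is missing."""
    for key in path.split("/"):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def extract(ping):
    """Return the requested fields for an experiment ping, or None for any other ping."""
    # Assumed location of the active experiment and its branch.
    experiment = get_path(ping, "environment/addons/activeExperiment")
    if not isinstance(experiment, dict) or experiment.get("id") != EXPERIMENT_ID:
        return None
    return {
        "clientId": ping.get("clientId"),
        "branch": experiment.get("branch"),
        "environment": ping.get("environment"),
        # Assumed payload paths for the requested measurements.
        "uiTelemetry": get_path(ping, "payload/simpleMeasurements/UITelemetry"),
        "selectedResultType": get_path(
            ping, "payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE"),
        "searchCounts": get_path(ping, "payload/keyedHistograms/SEARCH_COUNTS"),
    }


def experiment_subset(pings):
    """Keep only the pings that belong to the experiment, reduced to the requested fields."""
    return [row for row in map(extract, pings) if row is not None]
```

Missing measurements come back as `None` rather than raising, since UITelemetry has no strict schema.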
(Assignee)

Comment 1

a year ago
If the data is too large, we'd be fine with retaining 15k random users per branch (the experiment had 3 branches).
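
A minimal sketch of the "15k random users per branch" idea, assuming each row is a dict with hypothetical `branch` and `clientId` keys. It samples clients, not pings, so every retained user keeps all of their pings.

```python
import random
from collections import defaultdict


def sample_clients_per_branch(rows, max_clients=15000, seed=0):
    """Keep all rows from at most max_clients randomly chosen clients in each branch."""
    clients = defaultdict(set)
    for row in rows:
        clients[row["branch"]].add(row["clientId"])

    rng = random.Random(seed)
    kept = set()
    for branch, ids in clients.items():
        ordered = sorted(ids)  # sorted so the sample is reproducible for a given seed
        if len(ordered) > max_clients:
            ordered = rng.sample(ordered, max_clients)
        kept.update((branch, client_id) for client_id in ordered)

    return [row for row in rows if (row["branch"], row["clientId"]) in kept]
```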
Depends on: 1252553
Comment 2

Roberto, are you assuming that this needs to be a longitudinal dataset? I don't see that in the requirements here. We're clearly not going to get a strict schema out of UITelemetry in the short term.

Can you just teach mak how to create a subset of get_pings data for this experiment and save it to S3?
Flags: needinfo?(rvitillo)
Comment 3

(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Roberto, are you assuming that this needs to be a longitudinal dataset? I
> don't see that in the requirements here. We're clearly not going to get a
> strict schema out of UITelemetry in the short term.

It depends on what kind of questions Marco intends to answer with the experimental data. Creating a longitudinal Parquet dataset per experiment is something I would like to deal with at some point instead of writing individual ETL jobs for each experiment.

> Can you just teach mak how to create a subset of get_pings data for this
> experiment and save it to S3?

Certainly, this is what I proposed to Marco in our e-mail thread before updating the bug.
Flags: needinfo?(rvitillo)
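
The thread does not show how the subset was stored, so the following is only a sketch of one schema-free option: write the extracted rows as gzipped JSON lines, which suits data like UITelemetry that has no strict schema. A local path stands in for the S3 location here; the upload step and the function names are assumptions.

```python
import gzip
import json


def save_subset(rows, path):
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row, sort_keys=True) + "\n")


def load_subset(path):
    """Read the rows back, skipping any blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]
```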

Updated

a year ago
Component: Metrics: Pipeline → Metrics: Product Metrics

Updated

a year ago
Priority: -- → P4
(Assignee)

Comment 4

a year ago
Since I plan to do the analysis on Spark by myself, for now I'm assigning the bug to myself.
Assignee: nobody → mak77
(Assignee)

Comment 5

a year ago
The first version is at
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb

This version is heavily restricted in scope, so that I could run it on a single node.
Roberto, could you please take a look at it and tell me if I'm doing something very dumb?
For the broader version, I will split it into extraction and analysis and store the extracted data on S3, as you suggested. I'm not sure which values I should aim at for the extraction, though. I was thinking of fetching from 20160112 to 20160212 and sampling 10% of the data. Is that too much? How long might it take using more clusters?
Flags: needinfo?(rvitillo)
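
The proposed extraction window and 10% sample can be sketched as two predicates. This is an assumption about how one might sample, not what the notebooks do: it picks clients by hashing a hypothetical client id, so the same users are selected on every run and all of a user's pings stay together.

```python
import hashlib


def in_date_range(submission_date, start="20160112", end="20160212"):
    """YYYYMMDD strings sort chronologically, so plain string comparison is enough."""
    return start <= submission_date <= end


def in_sample(client_id, percent=10):
    """Deterministically place a client in one of 100 buckets and keep the first `percent`."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent
```

Raising `percent` later only adds clients to the sample; those already selected at 10% remain selected.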
Comment 6

(In reply to Marco Bonardo [::mak] from comment #5)
> first version is at
> https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar.ipynb
> 
> This is pretty much restricted, so I could run it on a single node.
> Roberto, could you please take a look at it and tell me if I'm doing
> something very dumb?

Your analysis is well written.

> For the broader version, I will split it into extraction and analysis and
> store on S3, as you suggested. I'm not sure which values I should aim at for
> the extraction though, I was thinking to fetch from 20160112 to 20160212,
> and sample 10% of the data... is that too much? how much may it take using
> more clusters?

10% might be OK. You could compute confidence intervals for your results, keep increasing the percentage of pings considered until the results stabilize, or both. Feel free to spawn a larger cluster once you have the final version of your analysis.
Flags: needinfo?(rvitillo)
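
Both suggestions can be illustrated for the simple case of a proportion (for example, the share of urlbar selections of one result type). The Wilson score interval and the list of sample fractions are choices made for this sketch, not something prescribed in the thread; `values` is assumed to be a list of 0/1 flags.

```python
import math
import random


def proportion_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def grow_until_stable(values, fractions=(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0),
                      tolerance=0.01, seed=0):
    """Estimate a proportion on growing random samples.

    Stops at the first fraction whose estimate differs from the previous
    one by less than `tolerance`; otherwise uses the largest fraction.
    """
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    previous = None
    for fraction in fractions:
        n = max(1, int(len(shuffled) * fraction))
        estimate = sum(shuffled[:n]) / n
        if previous is not None and abs(estimate - previous) < tolerance:
            return fraction, estimate
        previous = estimate
    return fractions[-1], previous
```

If the interval at 10% is already narrow enough for the comparison between branches, there is little to gain from processing more pings.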
(Assignee)

Comment 7

a year ago
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-store.ipynb
https://github.com/mak77/telemetry_analysis/blob/master/unified-urlbar-analysis.ipynb

I'm done here, thank you for the help.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED