Request for specialized longitudinal dataset (one time job)

RESOLVED WONTFIX

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P1
Severity: normal
Status: RESOLVED WONTFIX
Opened: a year ago
Closed: 10 months ago

People

(Reporter: bmiroglio, Unassigned, Mentored)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

a year ago
dzeber and I have been using the longitudinal dataset for a modeling project. I'd like to work with someone on building a longitudinal dataset across all profiles that meet the following criteria:

* release channel
* US geo_country and en-US locale
* profile created between 2016-01-01 and 2016-03-31

Profiles that meet the above criteria make up 0.8% of profiles in the current longitudinal dataset, which implies the output of this job likely won't be bigger than a traditional longitudinal dataset. We've seen positive results with our current model, but we haven't been able to segment fully because we've had to keep our data reasonably sized.
Mentor: rvitillo@mozilla.com
(Reporter)

Updated

a year ago
Points: --- → 2
Priority: -- → P1
(Reporter)

Updated

a year ago
Points: 2 → 1

Comment 1

10 months ago
Is this still valid, Ben?
Flags: needinfo?(bmiroglio)
(Reporter)

Comment 2

10 months ago
Yes, this is still valid; however, I'd like to make some adjustments. With our v1 results in and recent requests arriving, it'd be nice for us to have, say, a script that lets the user pass these constraints as arguments.

For example, we could run something like

spark-submit -- [...] \
             --channel release \
             --from 20160101 \
             --to 20160630 \
             --geo US \
             --locale en-US \
             --profile_creation_min 20160101 \
             --profile_creation_max 20160331

outputting a longitudinal dataset that meets these constraints.
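
To make the proposal a bit more concrete, here is a minimal sketch of how such a script might parse the flags above and test a single profile against them. Nothing here exists yet: `build_parser`, `profile_matches`, and the record field names (`channel`, `geo_country`, `locale`, `profile_creation_date`) are assumptions for illustration, a profile is modeled as a plain dict rather than a Spark row, and `--from`/`--to` are only parsed because the window they apply to is not specified here.

```python
import argparse
from datetime import date, datetime


def yyyymmdd(value):
    """Parse a YYYYMMDD string into a date (used as an argparse type)."""
    return datetime.strptime(value, "%Y%m%d").date()


def build_parser():
    """Hypothetical CLI mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(
        description="Filtered longitudinal dataset (sketch)"
    )
    parser.add_argument("--channel", required=True)
    # "from" is a Python keyword, so store both window bounds under explicit names.
    parser.add_argument("--from", dest="from_date", type=yyyymmdd, required=True)
    parser.add_argument("--to", dest="to_date", type=yyyymmdd, required=True)
    parser.add_argument("--geo")
    parser.add_argument("--locale")
    parser.add_argument("--profile_creation_min", type=yyyymmdd)
    parser.add_argument("--profile_creation_max", type=yyyymmdd)
    return parser


def profile_matches(profile, args):
    """Return True if a profile record (a dict with assumed field names)
    satisfies every constraint that was passed; omitted flags are ignored."""
    if profile["channel"] != args.channel:
        return False
    if args.geo is not None and profile["geo_country"] != args.geo:
        return False
    if args.locale is not None and profile["locale"] != args.locale:
        return False
    created = profile["profile_creation_date"]
    if args.profile_creation_min is not None and created < args.profile_creation_min:
        return False
    if args.profile_creation_max is not None and created > args.profile_creation_max:
        return False
    return True
```

In a real job the same predicate would presumably be expressed as a filter on the underlying data rather than applied record by record in Python, which is where the cost of reading everything comes in.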

This functionality (in addition to a larger n) would allow us to iterate more quickly as new requests come in. I'm unsure how feasible this type of job is, since it would have to read 100% of the data, but this is the best-case scenario.
Flags: needinfo?(bmiroglio)

Comment 3

10 months ago
Thanks Ben, is this something you are up for doing? I see the mentor field is filled out, but Mark Reid or Harter could help as well.
Flags: needinfo?(bmiroglio)
(Reporter)

Comment 4

10 months ago
In a meeting last week with mreid and rvitillo, it was decided that something like this isn't feasible in the current state of things, since such a job would be very expensive.
Flags: needinfo?(bmiroglio) → needinfo?(mreid)
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → WONTFIX

Updated

10 months ago
Flags: needinfo?(mreid)