Closed
Bug 1317775
Opened 8 years ago
Closed 7 years ago
Request for specialized longitudinal dataset (one time job)
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bmiroglio, Unassigned, Mentored)
Details
dzeber and I have been using the longitudinal dataset for a modeling project. I'd like to work with someone on getting a longitudinal dataset across all profiles with the following features:
* release channel
* US geo_country and en-US locale
* profile created between 2016-01-01 and 2016-03-31
Profiles that meet the above make up 0.8% of profiles in the current longitudinal dataset, implying that the data resulting from this job likely won't be bigger than any traditional longitudinal dataset. We've seen positive results in our current model; however, we haven't been able to segment fully, since we have been trying to keep our data reasonably sized.
Updated•7 years ago
Mentor: rvitillo
Reporter | ||
Updated•7 years ago
Points: --- → 2
Priority: -- → P1
Reporter | ||
Updated•7 years ago
Points: 2 → 1
Reporter | ||
Comment 2•7 years ago
Yes, this is still valid; however, I'd like to make some adjustments. With our v1 results in and recent requests, it'd be nice for us to have, say, a script that allows these constraints to be passed by the user. For example, we could run something like

    spark-submit -- [...] \
      --channel release \
      --from 20160101 \
      --to 20160630 \
      --geo US \
      --locale en-US \
      --profile_creation_min 20160101 \
      --profile_creation_max 20160331

outputting a longitudinal dataset that meets these constraints. This functionality (in addition to larger n) would allow us to iterate more quickly as new requests come in. I'm unsure how feasible this type of job is, since it would have to read 100% of the data; this is, however, the best-case scenario.
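To make the request concrete, here is a minimal sketch of how such a script might parse the proposed flags and decide whether a profile qualifies. Everything beyond the flag names is an assumption made for illustration: the helper names (`yyyymmdd`, `build_parser`, `profile_matches`) and the record keys (`channel`, `geo_country`, `locale`, `profile_creation_date`) are hypothetical, and the sketch filters plain Python dicts rather than running as a Spark job. It also assumes `--from` and `--to` bound the range of data to read, so they are parsed but not used in the per-profile check.

```python
import argparse
from datetime import datetime


def yyyymmdd(value):
    """Parse a YYYYMMDD string into a date so argparse can reject bad input."""
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError("expected YYYYMMDD, got %r" % value)


def build_parser():
    """Build a parser for the flags proposed in the comment above."""
    parser = argparse.ArgumentParser(
        description="Sketch: constraints for a custom longitudinal dataset")
    parser.add_argument("--channel", default="release")
    # "from" is a Python keyword, so store it under a different attribute name.
    parser.add_argument("--from", dest="from_date", type=yyyymmdd, required=True)
    parser.add_argument("--to", dest="to_date", type=yyyymmdd, required=True)
    parser.add_argument("--geo")
    parser.add_argument("--locale")
    parser.add_argument("--profile_creation_min", type=yyyymmdd)
    parser.add_argument("--profile_creation_max", type=yyyymmdd)
    return parser


def profile_matches(profile, args):
    """Return True if a profile dict (hypothetical keys) meets the constraints.

    Optional constraints that were not passed on the command line are skipped.
    """
    if profile["channel"] != args.channel:
        return False
    if args.geo is not None and profile["geo_country"] != args.geo:
        return False
    if args.locale is not None and profile["locale"] != args.locale:
        return False
    created = profile["profile_creation_date"]
    if args.profile_creation_min is not None and created < args.profile_creation_min:
        return False
    if args.profile_creation_max is not None and created > args.profile_creation_max:
        return False
    return True
```

In a real job the same predicate would presumably be expressed as a filter over the full dataset, which is where the cost concern about reading 100% of the data comes from.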
Flags: needinfo?(bmiroglio)
Comment 3•7 years ago
Thanks, Ben. Is this something you are up for doing? I see the mentor field is filled out, but Mark Reid or Harter could help as well.
Flags: needinfo?(bmiroglio)
Reporter | ||
Comment 4•7 years ago
In a meeting last week with mreid and rvitillo, it was decided that something like this isn't feasible in the current state of things, since such a job would be very expensive.
Flags: needinfo?(bmiroglio) → needinfo?(mreid)
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
Flags: needinfo?(mreid)
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard