Closed Bug 1336617 Opened 7 years ago Closed 7 years ago

Investigate configuration-only solution to simple testpilot pipelines

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: harter, Assigned: harter)

Details

We'll be doing a lot more experimentation in 2017. 

Currently, analyzing testpilot data requires the analyst to filter and transform their experiment data using a scheduled ATMO job. It would be nice if this could be done without custom code and clusters.
Assignee: nobody → rharter
Is this addressed by Bug 1333206?
Flags: needinfo?(rharter)
We discussed this today. Documenting for posterity.

My goal is to be able to analyze experimental data from testpilot and testpilottest. Since many of the important testpilottest fields are experiment-specific, we would need a new config for each experiment. It sounds like the deploy time for the solution described in Bug 1333206 would be prohibitive for this task.

I have an example implementation in this notebook[0]. The config structure makes it clear how we're mapping input fields to output columns.

[0] https://gist.github.com/harterrt/2a052f653c50df10920cfdb19c362438#file-cliqz-testpilot-pipeline-py-L79
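
For readers without access to the notebook, here is a minimal sketch of what a config-driven mapping from input fields to output columns could look like. It is not the code from the gist: the config layout, the field paths (`clientId`, `payload.test`, `payload.payload.event`), and the helpers `get_path` and `transform` are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable, List

Ping = Dict[str, Any]


def get_path(ping: Ping, path: str, default: Any = None) -> Any:
    """Walk a dotted path through nested dicts; return default if any key is missing."""
    node: Any = ping
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical per-experiment config (all names are assumptions):
#   "filter"  maps an input path to the value a ping must have to be kept.
#   "columns" maps an output column name to the input path it is read from.
CONFIG = {
    "filter": {"payload.test": "example-experiment"},
    "columns": {
        "client_id": "clientId",
        "event": "payload.payload.event",
        "object": "payload.payload.object",
    },
}


def transform(pings: Iterable[Ping], config: Dict[str, Dict[str, str]]) -> List[Dict[str, Any]]:
    """Keep pings that match every filter, then project them onto the output columns."""
    rows = []
    for ping in pings:
        matches = all(
            get_path(ping, path) == expected
            for path, expected in config["filter"].items()
        )
        if matches:
            rows.append(
                {col: get_path(ping, path) for col, path in config["columns"].items()}
            )
    return rows
```

With this shape, supporting a new experiment would mean writing a new config dict rather than new transformation code. Missing input fields come through as None instead of raising an error.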
Flags: needinfo?(rharter)
See Also: → 1340595
I refactored the code from the cliqz_pipeline and started a small helper library here:
https://github.com/harterrt/betl

I'll let this grow as needed. Closing this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard