We'll be doing a lot more experimentation in 2017. Currently, analyzing testpilot data requires the analyst to filter and transform their experiment data using a scheduled ATMO job. It would be nice if this could be done without custom code and clusters.
Is this addressed by Bug 1333206?
We discussed this today; documenting for posterity. My goal is to be able to analyze experimental data from testpilot and testpilottest. Since many of the important testpilottest fields are experiment-specific, we would need a new config for each experiment. It sounds like the deploy time for the solution described in Bug 1333206 would be prohibitive for this task. I have an example implementation in this notebook: https://gist.github.com/harterrt/2a052f653c50df10920cfdb19c362438#file-cliqz-testpilot-pipeline-py-L79 The config structure makes it clear how we're mapping input fields to output columns.
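For readers without access to the notebook, here is a minimal sketch of what a config that maps input fields to output columns could look like. This is not the code from the gist: the field names, the `COLUMNS` list, and the `get_path` and `to_row` helpers are all hypothetical, and plain dicts stand in for real pings.

```python
# Hypothetical config: each entry maps a path in the raw ping to an output column.
# The field names below are illustrative only, not the real testpilottest schema.
COLUMNS = [
    # (output column, path into the nested ping, converter)
    ("client_id", ("clientId",), str),
    ("test", ("payload", "test"), str),
    ("event", ("payload", "payload", "event"), str),
    ("count", ("payload", "payload", "count"), int),
]


def get_path(ping, path):
    """Walk a nested dict along `path`; return None if any key is missing."""
    node = ping
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_row(ping, columns=COLUMNS):
    """Build one flat output row from a nested ping using the config."""
    row = {}
    for name, path, convert in columns:
        value = get_path(ping, path)
        # Missing fields become None instead of raising, so sparse pings survive.
        row[name] = convert(value) if value is not None else None
    return row
```

With a layout like this, supporting a new experiment would mostly mean writing a new `COLUMNS` list, which is why a per-experiment config seems workable without a redeploy.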
I refactored the code from the cliqz_pipeline and started a small helper library here: https://github.com/harterrt/betl. I'll let this grow as needed. Closing this bug.