Closed Bug 1264049 Opened 8 years ago Closed 8 years ago

Make Test Pilot data available in Re:dash

Categories

(Cloud Services :: Metrics: Product Metrics, defect, P1)


Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: clouserw, Assigned: rweiss)


Details

I'd like to use https://sql.telemetry.mozilla.org/ to analyze the Test Pilot data (bug 1255182 and bug 1255184).

Thanks!
Hey Will, my limited understanding is that you can use Spark to answer questions right now. What's the timeline for needing re:dash?
Flags: needinfo?(wclouser)
We're launching on May 10th and I'd like to make dashboards for the projects we launch and make sure we're recording stuff in the right shape for them.  So.... April 22nd?

I'm flexible, but I haven't done this before, so "sooner the better, but don't stress out."  I also don't know how much work this is for you or what your backlog is like.  Let me know if April 22nd is unreasonable.
Flags: needinfo?(wclouser)
I know we discussed that April 22nd would be out of scope, is there a more reasonable timeline?

Regarding Spark: we'll need to confirm that Spark is going to be enough for our dashboard requirement for launch.
Flags: needinfo?(thuelbert)
Reassigning to me; I'm working on the Spark notebook KPI check right now.

If I can demonstrate that we can compute basic DAU and MAU within the notebook, we can then look into making this data available through re:dash.
Assignee: nobody → rweiss
A notebook that creates a CSV of DAU and MAU for Test Pilot is available here: 
https://gist.github.com/rjweiss/fabee4d22b6d272c3758aeca75b9728a
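For readers unfamiliar with the metrics, here is a minimal sketch of what "DAU and MAU written out as a CSV" means. It is not the notebook linked above: it assumes the pings have already been reduced to hypothetical (client_id, activity_date) pairs in plain Python instead of Spark, and it assumes a trailing 28-day window for MAU, which may differ from the window the Test Pilot notebook uses. The function names and the `day,dau,mau` column names are illustrative only.

```python
import csv
import io
from datetime import date, timedelta


def dau_mau(events, window_days=28):
    """Return (day, dau, mau) rows for every day that has activity.

    events: iterable of hypothetical (client_id, activity_date) pairs.
    DAU = distinct clients active on that day.
    MAU = distinct clients active in the trailing `window_days` days,
    including the day itself (28 is an assumed window).
    """
    clients_by_day = {}
    for client_id, day in events:
        clients_by_day.setdefault(day, set()).add(client_id)

    rows = []
    for day in sorted(clients_by_day):
        window_start = day - timedelta(days=window_days - 1)
        active = set()
        for d, clients in clients_by_day.items():
            if window_start <= d <= day:
                active |= clients
        rows.append((day.isoformat(), len(clients_by_day[day]), len(active)))
    return rows


def to_csv(rows):
    """Serialize rows with a header line (column names are assumptions)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["day", "dau", "mau"])
    writer.writerows(rows)
    return buf.getvalue()


# Example: two clients on May 10, one returning on May 11,
# and a new client well outside the 28-day window.
events = [
    ("a", date(2016, 5, 10)),
    ("b", date(2016, 5, 10)),
    ("a", date(2016, 5, 11)),
    ("c", date(2016, 6, 20)),
]
print(to_csv(dau_mau(events)))
```

The quadratic window scan keeps the example short; a real job over Test Pilot pings would do this aggregation in Spark.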

We will need to schedule this notebook to run regularly and also verify that this data is available from within re:dash (and can be used to populate some simple dashboard widgets therein).  I will consult with :mreid offline about this.
Thank you!
Flags: needinfo?(thuelbert)
Spoke to mreid, he said he will be checking to see if it is easy to migrate the CSV I uploaded in that notebook to Presto.  Once in Presto, it will be accessible within re:dash (and I have created a placeholder dashboard in re:dash that is awaiting data).

I also need to schedule the notebook to run on a daily basis to push that data to the CSV bucket, so it's important that the migration from the CSV S3 bucket to Presto is also performed on a schedule.

:mreid, can you let me know what the status is on migrating to Presto?
Flags: needinfo?(mreid)
Priority: -- → P1
It's relatively easy to make CSV data queryable by Presto. There's a small quirk where the header line with field names actually appears as data, which can be fixed by bug 1268896 (and is easy to work around in queries).
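To make the header quirk concrete, the sketch below mimics in Python what a table defined over the CSV sees when the header line is not skipped: the column names come back as an ordinary row. The query-side workaround is to filter that row out, analogous to a Presto predicate such as `WHERE day <> 'day'`. The CSV contents, column names, and helper functions are hypothetical; the actual fix is tracked in bug 1268896.

```python
import csv
import io

# Hypothetical CSV output shaped like the DAU/MAU file (column names assumed).
RAW = "day,dau,mau\n2016-05-10,2,2\n2016-05-11,1,2\n"
HEADER = ("day", "dau", "mau")


def read_like_presto(text):
    """Parse every line as data, as a table without header skipping would:
    the header line becomes just another row."""
    return list(csv.reader(io.StringIO(text)))


def drop_header_rows(rows, header=HEADER):
    """Query-side workaround: discard rows whose first field equals the
    first column name (like WHERE day <> 'day')."""
    return [row for row in rows if row[0] != header[0]]


rows = read_like_presto(RAW)
clean = drop_header_rows(rows)
# Without the filter, int("dau") would raise ValueError when summing.
print(sum(int(r[1]) for r in clean))
```

Filtering on a column value is a workaround only; it relies on no real data row ever containing the literal column name.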

I forked Rebecca's notebook from Comment 5 to make a few minor tweaks[1] and tested it out via Presto[2] and all seems well.

Once a finalized notebook is scheduled and the output location of the CSV data is set, I can easily update the table definition in presto and we should be good to go. Nothing further needs to happen on the presto/redash side upon scheduled CSV updates.

[1] https://gist.github.com/mreid-moz/dac9c5b67f01ea3734a207821b120668
[2] https://sql.telemetry.mozilla.org/queries/263
Flags: needinfo?(mreid)
I will take this back to Javaun to make sure that our computation of DAU and MAU for test pilot suits their needs. Additionally, we will likely need to break this out further to compute DAU for each individual test separately.
Blocks: 1270961
Component: Metrics: Pipeline → Metrics: Product Metrics
I scheduled a job with the version of the notebook in comment 8 above using a.t.m.o's job scheduler.  Output below:

Your code has been uploaded to s3://telemetry-analysis-code-2/jobs/TxP DAU MAU v1/Telemetry - Test Pilot KPI Validity Check.ipynb.
Any output files found relative to where the notebook will be executed will be published at s3://telemetry-private-analysis-2/TxP DAU MAU v1/data/. The output files will overwrite anything already in that location in S3.
The job will be run daily at 4:00 UTC.
The job will be allowed to run for a max of 120 minutes, after which it will be killed. 
Cron spec will be 0 4 * * *
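The cron spec `0 4 * * *` reads as "minute 0, hour 4, every day of month, every month, every day of week", i.e. daily at 04:00 UTC as the scheduler output states. The sketch below is one reading of that schedule plus the 120-minute kill limit; it is not a.t.m.o's scheduler code, and `next_run` and `kill_deadline` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

RUN_HOUR_UTC = 4                       # "0 4 * * *": minute 0, hour 4, daily
MAX_RUNTIME = timedelta(minutes=120)   # job is killed after this long


def next_run(now):
    """Return the next 04:00 UTC at or after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=RUN_HOUR_UTC, minute=0, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def kill_deadline(start):
    """Latest time a run started at `start` may still be executing."""
    return start + MAX_RUNTIME


# At 05:30 UTC on May 10, today's run has passed, so the next one is May 11.
start = next_run(datetime(2016, 5, 10, 5, 30, tzinfo=timezone.utc))
print(start, kill_deadline(start))
```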
I created another notebook, which constructs a separate CSV with DAU and MAU for each individual Test Pilot test.

This is available here: https://gist.github.com/rjweiss/1193b079c3bfaa7038c41ca4c2ceadff
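The per-test breakdown amounts to the same DAU/MAU computation keyed by test as well as by day. The sketch below is not the linked notebook; it assumes hypothetical (client_id, test_name, day) records, placeholder test names, and the same assumed 28-day MAU window.

```python
from collections import defaultdict
from datetime import date, timedelta


def per_test_dau_mau(events, window_days=28):
    """Return (test, day, dau, mau) rows sorted by test, then day.

    events: iterable of hypothetical (client_id, test_name, day) records.
    MAU uses an assumed trailing window of `window_days` days.
    """
    by_test = defaultdict(lambda: defaultdict(set))
    for client_id, test, day in events:
        by_test[test][day].add(client_id)

    rows = []
    for test in sorted(by_test):
        days = by_test[test]
        for day in sorted(days):
            start = day - timedelta(days=window_days - 1)
            # Union of all clients of this test seen inside the window.
            active = set().union(*(ids for d, ids in days.items() if start <= d <= day))
            rows.append((test, day.isoformat(), len(days[day]), len(active)))
    return rows


# Placeholder test names; client "a" is counted separately in each test.
events = [
    ("a", "test-a", date(2016, 5, 10)),
    ("b", "test-a", date(2016, 5, 10)),
    ("a", "test-a", date(2016, 5, 11)),
    ("a", "test-b", date(2016, 5, 11)),
]
for row in per_test_dau_mau(events):
    print(row)
```

Because clients are counted per test, a client enrolled in several tests contributes to each test's numbers, so per-test totals need not sum to the overall Test Pilot DAU.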

This notebook is currently NOT scheduled as it is waiting for review.

:mreid, can you review the notebook for badness?  If you sign off, I will file another bug to create a new table in Presto using the CSV created by this notebook, as well as to schedule the job to keep uploading on a daily basis.
Flags: needinfo?(mreid)
Per IRC discussion with :rweiss, this dataset has sort of become obsolete - a new bug will be filed when the mechanics of the testpilot / testpilottest pings are finalized.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mreid)
Resolution: --- → INCOMPLETE