Closed
Bug 1311796
Opened 8 years ago
Closed 8 years ago
Create TxP Pings Dataset
Categories
(Data Platform and Tools :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bugzilla, Assigned: bugzilla)
Details
- Investigate feasibility of creating a dataset that would be sufficient to run all the TxP dashboard queries
- Create said dataset if possible
Updated•8 years ago
Creating a dataset for 'testpilot' type pings is fairly straightforward using the ping format documented here: https://github.com/mozilla/testpilot/blob/master/docs/metrics/telemetry.md#testpilot-summary-ping
Creating a dataset for 'testpilottest' pings is a bit hairier, since every Test Pilot test has its own format, and specifying the schema for each new test and each format change would be a maintenance headache (envelope format documented here: https://github.com/mozilla/testpilot/blob/master/docs/metrics/telemetry.md#per-experiment-testpilottest-ping). I'd like to experiment with generating a schema programmatically: group the testpilottest pings by test, read the distinct fields in all the 'payload/payload' objects, and add those to the standard fields we'll want in the dataset, such as client_id and submission_date.
A couple concerns here would be:
- Schema changes (might be solved by adding a version field to the 'testpilottest' envelope?)
- Nested objects within the inner payload (doesn't seem to be an issue yet with the testpilottest formats I've seen, so perhaps this is a concern we can defer until it presents an issue.)
- Types: the Python version of createDataFrame will infer types for columns, but it looks like the Spark version will take a little more work.
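A minimal sketch of the schema-generation idea described above, in plain Python rather than Spark so it can run anywhere. It groups pings by test, collects the distinct fields found in the inner 'payload/payload' objects, and prepends the standard fields. The location of the test identifier (`ping["payload"]["test"]`), the helper names, the coarse type names, and the fallback rules are all assumptions made for illustration; they are not taken from the Test Pilot documentation or from any existing pipeline code.

```python
from collections import defaultdict

# Standard columns every row would carry. The names come from the comment
# above; the "string" types are an assumption.
STANDARD_FIELDS = {"client_id": "string", "submission_date": "string"}


def type_name(value):
    """Map a decoded JSON value to a coarse column type name (hypothetical)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "json"  # nested objects, arrays, and nulls are not flattened here


def infer_schemas(pings):
    """Return {test_name: {field: type}} for an iterable of testpilottest pings."""
    seen = defaultdict(lambda: defaultdict(set))
    for ping in pings:
        envelope = ping["payload"]
        test = envelope["test"]  # assumed location of the test identifier
        for field, value in envelope["payload"].items():
            seen[test][field].add(type_name(value))

    schemas = {}
    for test, fields in seen.items():
        schema = dict(STANDARD_FIELDS)
        for field, types in sorted(fields.items()):
            # Fall back to string when a field appears with conflicting types.
            schema[field] = next(iter(types)) if len(types) == 1 else "string"
        schemas[test] = schema
    return schemas


# Made-up sample pings for a hypothetical test.
sample_pings = [
    {"payload": {"test": "example-test", "payload": {"clicks": 3, "label": "a"}}},
    {"payload": {"test": "example-test", "payload": {"clicks": 5, "enabled": True}}},
]
schemas = infer_schemas(sample_pings)
```

Fields that appear in only some pings of a test still end up in that test's schema, so the corresponding columns would need to be nullable. Falling back to a string column on a type conflict is only one possible policy; a version field on the envelope, as suggested in the first concern, would let conflicting formats be split into separate schemas instead.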
I do think that, if this works, the pattern will be useful for other ping types in the future as well.
Any objections or other concerns before I dive into implementation?
Comment 2•8 years ago
As discussed on Monday, you should make sure that this is still worth doing, considering that we are adding a Parquet sink for Hindsight.
Given that Trink expects the sink to be ready on Monday, I'm going to hold off on this and see whether it would be doable with what he's building.
Updated•8 years ago
Component: Metrics: Pipeline → Datasets: General
Product: Cloud Services → Data Platform and Tools
Ancient bug; this isn't worth doing at this point, given the TxP team's reduced usage of telemetry.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Updated•3 years ago
Component: Datasets: General → General