Closed
Bug 1384861
Opened 7 years ago
Closed 7 years ago
Add direct-to-parquet for new "update" ping
Categories
(Data Platform and Tools :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Dexter, Assigned: Dexter)
References
Details
(Whiteboard: [measurement:client])
Attachments
(1 file)
Once bug 1120372 lands, we need to make the "update" ping data available for analysis in re:dash and other data tools.
Comment 1•7 years ago
Before unblocking this item, we need to understand the minimum set of fields from the environment section of the ping that we'll need to save in parquet. Please note that although it's possible to change the schema and backfill the data in the dataset (see bug 1360256 comment 20), we won't be able to perform schema evolution on this (not until Bug 1352521 is fixed).

Here's the minimum set I would go for:

build.applicationName
build.architecture
build.version
build.buildId
build.vendor
build.hotfixVersion
settings.isDefaultBrowser
settings.defaultSearchEngine
settings.defaultSearchEngineData.*
settings.telemetryEnabled
settings.locale
settings.attribution.*
settings.update.*
profile.creationDate
partner.*
os.name
os.version
os.locale

@Ben & @Benjamin, can you think of any other valuable field in the environment section [1] that should make it to the parquet table (in addition to the other ping data outside of the environment)?

[1] - https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/environment.html
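
For illustration only, the field selection proposed above could be sketched in Python as follows. This is not the pipeline's actual code: the helper names (`select_environment_fields`, `_lookup`) are hypothetical, the environment is assumed to be an already-parsed JSON object, and a trailing `.*` is assumed to mean "keep the whole sub-object".

```python
from typing import Any, Dict, List

# Field paths copied from the comment above. A trailing ".*" is assumed
# to mean "keep the whole sub-object" (an interpretation, not a spec).
MINIMUM_FIELDS: List[str] = [
    "build.applicationName", "build.architecture", "build.version",
    "build.buildId", "build.vendor", "build.hotfixVersion",
    "settings.isDefaultBrowser", "settings.defaultSearchEngine",
    "settings.defaultSearchEngineData.*", "settings.telemetryEnabled",
    "settings.locale", "settings.attribution.*", "settings.update.*",
    "profile.creationDate", "partner.*",
    "os.name", "os.version", "os.locale",
]

_MISSING = object()  # sentinel: distinguishes "absent" from an explicit null


def _lookup(env: Dict[str, Any], parts: List[str]) -> Any:
    """Walk nested dicts along `parts`; return _MISSING if the path is absent."""
    node: Any = env
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def select_environment_fields(
    env: Dict[str, Any], fields: List[str] = MINIMUM_FIELDS
) -> Dict[str, Any]:
    """Return a copy of `env` reduced to the listed field paths (hypothetical helper)."""
    selected: Dict[str, Any] = {}
    for path in fields:
        parts = path.split(".")
        if parts[-1] == "*":
            parts = parts[:-1]  # wildcard: keep the parent object as a whole
        value = _lookup(env, parts)
        if value is _MISSING:
            continue  # absent fields are simply skipped
        target = selected
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return selected
```

Note that a field present with a null value (for example `profile.creationDate`) is kept as null rather than dropped, so whether the target schema accepts nulls still matters downstream.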
Flags: needinfo?(bhearsum)
Flags: needinfo?(benjamin)
Comment 2•7 years ago
Product, channel, and the version of the update that's just been applied should be all I need. It looks like the channel is in settings.update.channel. If build.* is information about the new build (not the currently running one), I think I'm good.
Flags: needinfo?(bhearsum)
Comment 3•7 years ago
build.* is about the existing build, not the new build. The data about the new build is in the payload section described at https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/update-ping.html

    payload: {
      reason: <string>, // "ready"
      targetChannel: <string>, // "nightly"
      targetVersion: <string>, // "56.01a"
      targetBuildId: <string>, // "20080811053724"
    }

I have no particular opinion about what needs to be in this dataset, given that I won't be a primary user.
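
As a rough sketch of how a consumer might read the new-build information from the payload described above, assuming the ping has already been parsed into a Python dict. The `UpdateTarget` class and `parse_update_payload` function are hypothetical names introduced for this example; only the four payload keys come from the comment.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class UpdateTarget:
    """Hypothetical container for the four payload fields listed above."""
    reason: str
    channel: str
    version: str
    build_id: str


def parse_update_payload(ping: Dict[str, Any]) -> UpdateTarget:
    """Extract the target (new) build details from an "update" ping.

    Raises KeyError if the payload or any of the four fields is missing.
    """
    payload = ping["payload"]
    return UpdateTarget(
        reason=payload["reason"],
        channel=payload["targetChannel"],
        version=payload["targetVersion"],
        build_id=payload["targetBuildId"],
    )
```

The currently running build would still come from build.* in the environment, so a consumer comparing "from" and "to" versions needs both sections.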
Flags: needinfo?(benjamin)
Comment 4•7 years ago
Assignee: nobody → alessio.placitelli
Status: NEW → ASSIGNED
Comment 5•7 years ago
With both the parquet and the update ping schema landed [1], what else do I need to enable direct to parquet here? Any other PR I should file against some repo? [1] - https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/dev/templates/telemetry/update/update.4.parquetmr.txt
Flags: needinfo?(whd)
Comment 6•7 years ago
No other PRs are currently needed. I'm going to add a monitor with a 1% ingestion error threshold that emails minimally myself and :Dexter. At some point including a monitor config will become a requirement when adding a new ping type to the schemas repo.

This will be packaged and deployed some time next week (Monday is a US holiday). As there are substantial unrelated changes in this sprint it may take longer than usual before this is deployed.
Flags: needinfo?(whd)
Comment 7•7 years ago
(In reply to Wesley Dawson [:whd] from comment #6)
> No other PRs are currently needed. I'm going to add a monitor with a 1%
> ingestion error threshold that emails minimally myself and :Dexter. At some
> point including a monitor config will become a requirement when adding a new
> ping type to the schemas repo.
>
> This will be packaged and deployed some time next week (Monday is a US
> holiday). As there are substantial unrelated changes in this sprint it may
> take longer than usual before this is deployed.

Awesome, thanks!
Comment 8•7 years ago
This data should now be available in the presto data source as "telemetry_update_parquet". The only schema errors I'm seeing in production are that profile.creationDate is sometimes null, not required by the json schema, and required in parquet.
Comment 9•7 years ago
(In reply to Wesley Dawson [:whd] from comment #8)
> This data should now be available in the presto data source as
> "telemetry_update_parquet". The only schema errors I'm seeing in production
> are that profile.creationDate is sometimes null, not required by the json
> schema, and required in parquet.

Thanks Wesley! Do I need to tweak the parquet schema to make that field optional and make the error go away?
Comment 10•7 years ago
(In reply to Alessio Placitelli [:Dexter] from comment #9)
>
> Thanks Wesley! Do I need to tweak the parquet schema to make that field
> optional and make the error go away?

If you want the data where creationDate is null to be available in parquet, then yes, you should make it optional. Otherwise you'll only be able to access it via ATMO and the dataset APIs.
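
The effect of marking a field required versus optional can be illustrated with a small Python model. This is a simplified sketch and not how the ingestion pipeline or parquet itself is implemented: `partition_by_schema`, the flat record layout, and the `creationDate` key used as a stand-in for profile.creationDate are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List, Tuple

REQUIRED = "required"
OPTIONAL = "optional"


def partition_by_schema(
    records: Iterable[Dict[str, Any]], schema: Dict[str, str]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split records into (accepted, rejected) under a toy schema.

    `schema` maps a field name to REQUIRED or OPTIONAL. A record is rejected
    when any REQUIRED field is missing or null, which mirrors the schema
    error described in this thread.
    """
    accepted: List[Dict[str, Any]] = []
    rejected: List[Dict[str, Any]] = []
    for record in records:
        fits = all(
            repetition == OPTIONAL or record.get(field) is not None
            for field, repetition in schema.items()
        )
        (accepted if fits else rejected).append(record)
    return accepted, rejected
```

Under this model, a record with a null creationDate is rejected while the field is required and accepted once the field is optional, which is the change discussed in the following comments.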
Comment 11•7 years ago
(In reply to Wesley Dawson [:whd] from comment #10)
> (In reply to Alessio Placitelli [:Dexter] from comment #9)
> >
> > Thanks Wesley! Do I need to tweak the parquet schema to make that field
> > optional and make the error go away?
>
> If you want the data where creationDate is null to be available in parquet,
> then yes, you should make it optional. Otherwise you'll only be able to
> access it via ATMO and the dataset APIs.

Thanks. Created this PR to address the problem: https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/82
Comment 12•7 years ago
Closing this as the parquet dataset was created.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 13•7 years ago
I've done an out-of-band deploy for https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/82 in case that improves the situation for bug #1400921.
Updated•2 years ago
Component: Datasets: General → General