Closed Bug 1347609 Opened 8 years ago Closed 8 years ago

Update core ping parquet schema

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: whd)

Attachments

(1 file)

The core ping parquet schema should probably live in a repo somewhere. The other option is to convert it to a JSON schema and use the new schema service. Bug 1036764 added a new field.
whd, how do we handle schema updates? Any specific process you'd like me to follow?
Points: --- → 1
Flags: needinfo?(whd)
Priority: -- → P1
For now, put the updated schema in this bug and I'll update the configs. If historical data is required, I will need to look into the best way of doing a backfill (possibly a hindsight report); otherwise it will just be data from whenever the deploy happens onward. We should have a better story for this process, one that involves the schema service, sometime in Q2.
Flags: needinfo?(whd)
Quasi-related, I updated the core schema to add the UTF8 annotation for core pings. :robotblake and I confirmed that this retroactively applies the annotation to older parquet files so telemetry_core_parquet is now accessible from re:dash (but without the new field). Also it appears adding a field does not require a version bump, so I won't need to do any backfill with this schema change.
Attached file new_core_config
Assignee: nobody → fbertsch
Handing this off to whd to deploy.
Assignee: fbertsch → whd
While we're doing this, can we also backfill the data to the beginning of 2017? The earliest submission_date right now is 20170126, so that is 25 days to backfill. I'd like to be able to use this for all 2017 Mobile KPIs.
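As a quick check of the 25-day figure, here is a minimal sketch that enumerates the submission dates such a backfill would cover. The helper name `backfill_dates` is hypothetical and not part of any pipeline tooling; it only assumes the YYYYMMDD date format used in this bug.

```python
from datetime import date, timedelta


def backfill_dates(start, earliest_existing):
    """Return YYYYMMDD strings for each day from `start` up to,
    but not including, `earliest_existing`."""
    n_days = (earliest_existing - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(n_days)]


# Data already exists from 20170126 onward, so the gap is 20170101..20170125.
dates = backfill_dates(date(2017, 1, 1), date(2017, 1, 26))
print(len(dates), dates[0], dates[-1])  # prints: 25 20170101 20170125
```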
Flags: needinfo?(whd)
(In reply to Frank Bertsch [:frank] from comment #6)
> While we're doing this, can we also backfill the data to the beginning of
> 2017? The earliest submission_date right now is 20170126, so that is 25 days
> to backfill. I'd like to be able to use this for all 2017 Mobile KPIs.

Yes.
Flags: needinfo?(whd)
I deployed this but it appears flashUsage is actually an integer: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/core-ping.html. I'm updating the schema accordingly.
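A type mismatch like this can be caught before deploying by checking a sample ping against the expected type. The sketch below is only illustrative: the sample payload is made up (only `v` and `flashUsage` are field names taken from this bug), and `is_integer_field` is a hypothetical helper, not part of the pipeline.

```python
import json

# Hypothetical, heavily trimmed core ping; the values are invented for illustration.
sample_ping = json.loads('{"v": 8, "flashUsage": 3}')


def is_integer_field(ping, name):
    """Return True if `name` is present in `ping` and holds an integer."""
    value = ping.get(name)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


print(is_integer_field(sample_ping, "flashUsage"))  # prints: True
```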
After fixing the schema and redeploying, things worked as expected. I've also backfilled this data to the beginning of the year, and ran a few queries from re:dash like

select * from telemetry_core_parquet WHERE v = 8 and submission = '20170324' and flash_usage is not NULL LIMIT 1

select * from telemetry_core_parquet WHERE submission = '20170101' LIMIT 1

which worked.
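The two sanity checks can be reproduced locally against a stand-in table. This sketch uses an in-memory SQLite database rather than the real re:dash data source, and the two rows are invented for illustration (including the `v = 7` value on the older row); only the table name, column names, and query shapes come from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry_core_parquet "
    "(v INTEGER, submission TEXT, flash_usage INTEGER)"
)
# Made-up rows: one backfilled row without the new field, one newer row with it.
conn.executemany(
    "INSERT INTO telemetry_core_parquet VALUES (?, ?, ?)",
    [(7, "20170101", None), (8, "20170324", 2)],
)

# Check 1: the new field is populated for recent v8 pings.
new_field_row = conn.execute(
    "SELECT * FROM telemetry_core_parquet "
    "WHERE v = 8 AND submission = '20170324' AND flash_usage IS NOT NULL LIMIT 1"
).fetchone()

# Check 2: the backfill reaches the first day of the year.
backfill_row = conn.execute(
    "SELECT * FROM telemetry_core_parquet WHERE submission = '20170101' LIMIT 1"
).fetchone()

print(new_field_row is not None, backfill_row is not None)  # prints: True True
```

A row coming back from each query is the whole check: the first confirms the new column is readable, the second that the earliest backfilled partition exists.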
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard