Support pipeline metadata fields in JSON schemas
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: klukas, Assigned: klukas)
Attachments
(1 file)
In order to support AET for Glean (see https://bugzilla.mozilla.org/show_bug.cgi?id=1634468), we will need to be able to transmit mappings of encrypted field names to their decrypted counterparts to the pipeline.
We have discussed in the past that it would be desirable to encode metadata such as retention policies directly in schemas in the mozilla-pipeline-schemas repo.
We need to follow roughly these steps:
- Write and distribute a proposal for pipeline metadata in JSON schemas, including naming conventions and concrete plans for retention metadata and decrypted field name metadata
- Update mozilla-schema-generator to inject decrypted field name mappings into glean schemas based on scraped probe information
- Update gcp-ingestion to recognize field name mappings when loading JSON schemas
- Update gcp-ingestion's DecryptAetIdentifiers class to use the field mappings
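As a rough illustration of the third step, the sketch below reads field name mappings out of a metadata block embedded in a JSON schema. It is written in Python for brevity and is not the gcp-ingestion code. The `pipelineMetadata` key, the layout of each mapping entry (`source_field_path`, `decrypted_field_path`), and the `ecosystem_user_id` destination name are all assumptions made for this example; the real naming conventions are whatever the proposal settles on.

```python
import json

# Hypothetical schema: the metadata key name and the mapping layout are
# assumptions for illustration, not the agreed naming convention.
SCHEMA_TEXT = """
{
  "type": "object",
  "pipelineMetadata": {
    "jwe_field_mappings": [
      {
        "source_field_path": "/payload/ecosystem_anon_id",
        "decrypted_field_path": "/payload/ecosystem_user_id"
      }
    ]
  },
  "properties": {"payload": {"type": "object"}}
}
"""


def load_field_mappings(schema_text):
    """Return (source, destination) path pairs declared in the schema metadata."""
    schema = json.loads(schema_text)
    # A schema without the metadata block simply yields no mappings.
    metadata = schema.get("pipelineMetadata", {})
    return [
        (entry["source_field_path"], entry["decrypted_field_path"])
        for entry in metadata.get("jwe_field_mappings", [])
    ]


mappings = load_field_mappings(SCHEMA_TEXT)
```

Keeping the mappings next to the schema means a loader like this needs no per-ping configuration beyond the schema itself.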
Assignee
Updated • 5 years ago
Comment 1 • 5 years ago
Assignee
Comment 2 • 5 years ago
The proposal is now created: https://docs.google.com/document/d/1oNi-eX_IPIZa8C0h5GZnWTM0WFivZ1hPMgMMX3c3HHs/edit#
I'm currently seeking feedback on an early draft before circulating more widely.
Assignee
Comment 3 • 5 years ago
Pipeline metadata fields now exist, including decryption field name mappings for AET pings. So we are unblocked in terms of updating the pipeline to use these mappings.
Assignee
Comment 4 • 5 years ago
The pipeline is now using the jwe_field_mappings metadata to determine the location of ecosystem_anon_id in source pings and where to put the decrypted value.
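A minimal sketch of how a mapping of this kind could be applied to a ping follows, in Python rather than the Java used by gcp-ingestion. Everything here is hypothetical: the slash-separated path format, the `ecosystem_user_id` destination name, the choice to remove the encrypted source value, and the `stand_in_decrypt` function, which only reverses a string and performs no real JWE decryption.

```python
import copy


def _keys(path):
    """Split a slash-separated path such as "/payload/field" into keys."""
    return path.strip("/").split("/")


def apply_field_mappings(ping, mappings, decrypt):
    """Return a copy of ping with each mapped field decrypted and relocated."""
    result = copy.deepcopy(ping)
    for source_path, destination_path in mappings:
        # Walk to the parent of the source field and take the value out.
        parent = result
        *parents, leaf = _keys(source_path)
        for key in parents:
            parent = parent[key]
        encrypted = parent.pop(leaf)  # assumption: the source value is dropped

        # Walk (creating objects as needed) to the destination parent.
        target = result
        *parents, leaf = _keys(destination_path)
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = decrypt(encrypted)
    return result


def stand_in_decrypt(token):
    """Placeholder for real decryption; it only reverses the string."""
    return token[::-1]


ping = {"payload": {"ecosystem_anon_id": "cba"}}
mappings = [("/payload/ecosystem_anon_id", "/payload/ecosystem_user_id")]
decrypted_ping = apply_field_mappings(ping, mappings, stand_in_decrypt)
```

Passing the decryption function as a parameter keeps the path handling separate from key management, which is not covered here.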
The only missing piece here is mozilla-schema-generator support for detecting glean jwe probes, which may be difficult until we have a glean application that has actually declared a JweMetric. I'm closing this as fixed for now and will split out the schema generation work to a separate bug.