`telemetry.deletion_request` does not appear to contain `impression_id`
Categories
(Data Platform and Tools :: General, defect, P1)
Tracking
(Not tracked)
People
(Reporter: chutten, Assigned: ascholtz)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
With bug 1604312 we added scalars to the `deletion-request` ping in Firefox Desktop. BigQuery console does not appear to have a column for the included scalars: https://console.cloud.google.com/bigquery?project=moz-fx-data-shared-prod&folder=&organizationId=&p=moz-fx-data-shared-prod&d=telemetry&t=deletion_request&page=table
Schema change was merged in February via https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/481
Comment 1•4 years ago
The schema change allowed a `payload.scalars` struct, but I would only expect it to appear if mozilla-schema-generator is pulling in probes and populating them in the generated schema. Perhaps we never added that support?
Comment 2•4 years ago (Assignee)
Scalar processing for the deletion-request ping is implemented here: https://github.com/mozilla/mozilla-schema-generator/pull/120.
I can merge this now.
Comment 3•4 years ago (Assignee)
The `deletion_request` schema got updated and, based on the probe info service, now has a field `payload.processes.parent.scalars.deletion_request_impression_id` to capture impression IDs.
However, the impression ID is sent in the following format: `{"deletion.request.impression_id":"{86<some UUID>ed}"}`. Since BigQuery schemas do not allow `.` in field names, those are replaced by `_`. Because of that, the sent `deletion.request.impression_id` does not match `deletion_request_impression_id` as defined in the schema. So data still gets written to `additional_properties`. I'm not sure if there is an elegant way to translate this field into a valid BigQuery schema field name? Or if the field name/payload format in deletion_request pings should be changed?
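The mismatch described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual pipeline code: `SCHEMA_FIELDS` and `split_scalars` are made-up names, and the only behavior assumed is that scalars whose names are not in the schema end up in `additional_properties`.

```python
# Hypothetical illustration of the mismatch; not the actual pipeline code.
# Field name as it appears in the generated BigQuery schema ('.' replaced by '_').
SCHEMA_FIELDS = {"deletion_request_impression_id"}


def split_scalars(scalars, schema_fields):
    """Separate scalars whose names match the schema from those that do not."""
    known, additional = {}, {}
    for name, value in scalars.items():
        target = known if name in schema_fields else additional
        target[name] = value
    return known, additional


# Scalar as sent by the client, with dots in the name.
sent = {"deletion.request.impression_id": "{86<some UUID>ed}"}
known, additional = split_scalars(sent, SCHEMA_FIELDS)

print(known)       # {}
print(additional)  # {'deletion.request.impression_id': '{86<some UUID>ed}'}
```

Because the lookup is an exact string match, the dotted name never lands in the schema column.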
Another thing I noticed, `impression_id`s are wrapped in curly braces: `"{86<some UUID>ed}"` vs `"86<some UUID>ed"`. This seems like a bug to me.
Comment 4•4 years ago
(In reply to Anna Scholtz from comment #3)
> However, the impression ID is sent in the following format: `{"deletion.request.impression_id":"{86<some UUID>ed}"}`. Since BigQuery schemas do not allow `.` in field names those are replaced by `_`. Because of that the sent `deletion.request.impression_id` does not match `deletion_request_impression_id` as defined in the schema. So data still gets written to `additional_properties`. I'm not sure if there is an elegant way to translate this field into a valid BigQuery schema field name? Or if the field name/payload format in deletion_request pings should be changed?
It looks like this format did get called out and discussed in https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/481. We already have logic in the pipeline to normalize field names, and I expect it would be relatively straightforward to apply that same normalization to these field names (so '.' would become '_'). I suppose we would need to apply that normalization in the schema generator in this case, and I'm less certain what normalization happens there.
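As a rough sketch of what such a normalization could look like, assuming only the '.' to '_' rule mentioned in this thread (the real pipeline normalization may do more, and the function names here are hypothetical):

```python
# Hypothetical sketch; only the '.' -> '_' rule from this thread is assumed.
def normalize_field_name(name):
    """Replace '.' with '_' so the name can be used as a BigQuery column name."""
    return name.replace(".", "_")


def normalize_scalars(scalars):
    """Apply the field-name normalization to every scalar key, keeping values as-is."""
    return {normalize_field_name(key): value for key, value in scalars.items()}


normalized = normalize_scalars({"deletion.request.impression_id": "{86<some UUID>ed}"})
print(normalized)  # {'deletion_request_impression_id': '{86<some UUID>ed}'}
```

With the keys rewritten this way, the sent scalar name lines up with `deletion_request_impression_id` in the schema, and the value itself is left untouched.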
> Another thing I noticed, `impression_id`s are wrapped in curly braces: `"{86<some UUID>ed}"` vs `"86<some UUID>ed"`. This seems like a bug to me.
The curly braces are a documented variant UUID representation and impression IDs are consistently sent in this format. It's actually probably a good thing that they appear in this form here so that they exactly match with the values present in the tables we'll need to delete from.
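To illustrate why the exact representation matters, here is a small sketch using Python's standard `uuid` module, which accepts the braced form. The UUID value and the `stored_ids` set are made-up examples, not real impression IDs or table contents.

```python
import uuid

# Made-up example values; not real impression IDs.
braced = "{12345678-1234-5678-1234-567812345678}"
plain = "12345678-1234-5678-1234-567812345678"

# Both strings parse to the same UUID...
print(uuid.UUID(braced) == uuid.UUID(plain))  # True

# ...but a plain string comparison, as a delete keyed on the stored string
# would effectively do, only matches the identical representation.
stored_ids = {braced}  # hypothetical: IDs stored in braced form
print(braced in stored_ids)  # True
print(plain in stored_ids)   # False
```

So keeping the braces in the deletion-request ping means the value can be compared directly against stored values without any extra conversion step.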
Comment 5•4 years ago (Reporter)
I can confirm that the string values are as they should be. As for normalization, I hope that can still be applied on the pipeline side (if not, let me know what client changes are required).
Comment 6•4 years ago
Comment 7•4 years ago (Assignee)
The schema changes have been deployed and `impression_id`s are now available in `payload.scalars.parent.deletion_request_impression_id`.