Closed Bug 1626706 Opened 4 years ago Closed 4 years ago

`telemetry.deletion_request` does not appear to contain `impression_id`

Categories

(Data Platform and Tools :: General, defect, P1)

Points: 2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chutten, Assigned: ascholtz)

References

(Blocks 1 open bug)

Attachments

(1 file)

With bug 1604312 we added scalars to the deletion-request ping in Firefox Desktop. The BigQuery console does not appear to show a column for those scalars: https://console.cloud.google.com/bigquery?project=moz-fx-data-shared-prod&folder=&organizationId=&p=moz-fx-data-shared-prod&d=telemetry&t=deletion_request&page=table

Schema change was merged in February via https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/481

The schema change allowed a payload.scalars struct, but I would only expect it to appear if mozilla-schema-generator is pulling in probes and populating them in the generated schema. Perhaps we never added that support?

Scalar processing for the deletion-request ping is implemented here: https://github.com/mozilla/mozilla-schema-generator/pull/120.
I can merge this now.

Assignee: nobody → ascholtz
Points: --- → 2
Priority: -- → P1

The deletion_request schema got updated and, based on the probe info service, now has a field payload.processes.parent.scalars.deletion_request_impression_id to capture impression IDs.

However, the impression ID is sent in the following format: {"deletion.request.impression_id":"{86<some UUID>ed}"}. Since BigQuery schemas do not allow . in field names those are replaced by _. Because of that the sent deletion.request.impression_id does not match deletion_request_impression_id as defined in the schema. So data still gets written to additional_properties. I'm not sure if there is an elegant way to translate this field into a valid BigQuery schema field name? Or if the field name/payload format in deletion_request pings should be changed?

Another thing I noticed, impression_ids are wrapped in curly braces: "{86<some UUID>ed}" vs "86<some UUID>ed". This seems like a bug to me.

Flags: needinfo?(chutten)

(In reply to Anna Scholtz from comment #3)

> However, the impression ID is sent in the following format: {"deletion.request.impression_id":"{86<some UUID>ed}"}. Since BigQuery schemas do not allow . in field names those are replaced by _. Because of that the sent deletion.request.impression_id does not match deletion_request_impression_id as defined in the schema. So data still gets written to additional_properties. I'm not sure if there is an elegant way to translate this field into a valid BigQuery schema field name? Or if the field name/payload format in deletion_request pings should be changed?

It looks like this format did get called out and discussed in https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/481. We already have logic in the pipeline to normalize field names, and I expect it would be relatively straightforward to apply that same normalization to these field names (so '.' would become '_'). I suppose we would need to apply that normalization in the schema generator in this case, and I'm less certain what normalization happens there.
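To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual pipeline or schema generator code: the function names `normalize_field_name` and `route_scalars` are hypothetical, and the only normalization shown is the '.' to '_' replacement mentioned in this thread.

```python
# Column name as it appears in the generated BigQuery schema.
SCHEMA_FIELDS = {"deletion_request_impression_id"}


def normalize_field_name(name):
    """Replace '.' (not allowed in BigQuery field names) with '_'."""
    return name.replace(".", "_")


def route_scalars(scalars, schema_fields, normalize):
    """Split sent scalars into schema columns and additional_properties.

    Hypothetical helper: keys that do not match a schema field are kept
    under their original name, mimicking data landing in
    additional_properties.
    """
    matched, additional_properties = {}, {}
    for key, value in scalars.items():
        column = normalize_field_name(key) if normalize else key
        if column in schema_fields:
            matched[column] = value
        else:
            additional_properties[key] = value
    return matched, additional_properties


# Payload shape quoted in comment #3 (UUID elided as in the original).
sent = {"deletion.request.impression_id": "{86<some UUID>ed}"}

# Without normalization the dotted key matches nothing in the schema.
print(route_scalars(sent, SCHEMA_FIELDS, normalize=False))

# With normalization the value lands in the expected column.
print(route_scalars(sent, SCHEMA_FIELDS, normalize=True))
```

Applying the replacement before the schema lookup is the only difference between the two calls, which is why doing it on the pipeline side would not require a client change.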

> Another thing I noticed, impression_ids are wrapped in curly braces: "{86<some UUID>ed}" vs "86<some UUID>ed". This seems like a bug to me.

The curly braces are a documented variant UUID representation and impression IDs are consistently sent in this format. It's actually probably a good thing that they appear in this form here so that they exactly match with the values present in the tables we'll need to delete from.
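As a small illustration of why the exact form matters, the following Python sketch uses the standard library `uuid` module, which accepts the braced representation. The UUID value is an arbitrary example, not a real impression ID, and `same_uuid` is a hypothetical helper.

```python
import uuid


def same_uuid(a: str, b: str) -> bool:
    """Compare two UUID strings by value, ignoring surrounding curly braces."""
    return uuid.UUID(a) == uuid.UUID(b)


# Arbitrary example value, not a real impression ID.
braced = "{12345678-1234-5678-1234-567812345678}"
bare = "12345678-1234-5678-1234-567812345678"

# Both strings parse to the same UUID value...
print(same_uuid(braced, bare))

# ...but a plain string comparison, as an equality filter on a string
# column would do, treats them as different.
print(braced == bare)
```

Since a deletion would match on the stored string, keeping the braces in the deletion-request ping avoids any extra conversion step.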

I can confirm that the string values are as they should be. As for normalization, I hope that can still be applied on the pipeline side (if not, let me know what client changes are required).

Flags: needinfo?(chutten)

The schema changes have been deployed, and impression_ids are now available in payload.scalars.parent.deletion_request_impression_id.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Blocks: 1598720
Component: Datasets: General → General