VPN: Exclude data collected with telemetry_sdk_version < 0.17.0
Categories
(Data Platform and Tools :: General, enhancement)
Tracking
(Not tracked)
People
(Reporter: mdroettboom, Assigned: relud)
Details
Attachments
(1 file)
Prior to 0.17.0, glean.js had no working persistent storage, so client_id identifies a session rather than a client.
Data collected with those earlier versions is effectively useless (or at least not comparable to data collected after the fix).
Can we create a view that would ignore all data collected with the older Glean SDK? Eventually we should do a mass deletion, but for some period of time we would expect to still collect data from both before and after this fix.
| Assignee | ||
Comment 1•4 years ago
we can mark this data unwanted and throw it out in the pipeline.
Comment 2•4 years ago
| Assignee | ||
Comment 3•4 years ago
This data is being thrown out as of mid-day on 2021-11-30. I will need ops to run this SQL to remove the old data:
DELETE FROM
`moz-fx-data-shared-prod`.mozillavpn_stable.main_v1
WHERE
REGEXP_CONTAINS(client_info.telemetry_sdk_build, "^0[.]([0-9]|1[0-6])[.].*$")
AND DATE(submission_timestamp) < DATE "2021-12-01"
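As a sanity check on the filter above, here is a minimal sketch that exercises the same regular expression outside BigQuery. It assumes telemetry_sdk_build values look like dotted version strings (for example "0.16.9"), which is not confirmed in this bug; the sample strings and the helper names is_old_sdk and is_before_0_17 are hypothetical and exist only for illustration. BigQuery evaluates the pattern with RE2, but this particular pattern uses only syntax that Python's re module handles the same way.

```python
import re

# Pattern copied verbatim from the DELETE statement above.
OLD_SDK_PATTERN = re.compile(r"^0[.]([0-9]|1[0-6])[.].*$")


def is_old_sdk(build: str) -> bool:
    """Return True if the build string would be matched by the DELETE filter."""
    return OLD_SDK_PATTERN.search(build) is not None


def is_before_0_17(build: str) -> bool:
    """Independent check: compare the numeric (major, minor) prefix to (0, 17)."""
    major, minor = build.split(".")[:2]
    return (int(major), int(minor)) < (0, 17)


# Hypothetical build strings chosen for illustration, not taken from the table.
SAMPLES = ["0.1.0", "0.9.3", "0.10.0", "0.16.9", "0.17.0", "0.170.0", "0.29.0", "1.0.0"]

for build in SAMPLES:
    # The regex and the numeric comparison should agree on every sample.
    assert is_old_sdk(build) == is_before_0_17(build), build
    print(build, "delete" if is_old_sdk(build) else "keep")
```

The numeric comparison is only there as a cross-check; if real build strings carry a prefix or suffix that breaks the dotted format, the sample list would need to be extended before trusting either check.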
Comment 4•4 years ago
Mike, can you confirm that the "Eventually" (from the bug description regarding deletion) is now, at least with respect to VPN data?
There's a rigorous process we have for deleting source data from stable tables that involves additional communication but I think your approval is likely sufficient in this case.
| Reporter | ||
Comment 5•4 years ago
Yes. Eventually is "now". I can confirm we will never want the < 0.17 data for analysis (and we already used it for basic validation).
Comment 6•4 years ago
BQ job ID: moz-fx-data-shared-prod:US.bquxjob_15472f20_17d8160b58d
This statement removed 7,515,346 rows from moz-fx-data-shared-prod:mozillavpn_stable.main_v1.
We have 7 days to recover this data if for some reason the delete statement was overbroad.