Exclude pings with "automation" tag from stable tables
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: klukas, Assigned: klukas)
Attachments
(1 file)
The proposal for supporting X-Source-Tags specifies:
Pings that are tagged with “automation” (regardless of the other present tags) will not make it to the stable tables and will only remain in the live tables for 30 days.
We need to update the copy_deduplicate query to exclude pings that contain "automation" in the metadata.header.x_source_tags field.
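The exclusion rule can be illustrated with a minimal Python sketch. It is not the actual copy_deduplicate query; it assumes the header value is a comma-separated string (for example "automation, perf"), and the helper name has_automation_tag and the dict-shaped pings are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def has_automation_tag(x_source_tags: Optional[str]) -> bool:
    """Return True if the raw header string lists "automation" as a tag.

    Assumption: tags are comma-separated, possibly with surrounding spaces.
    """
    if not x_source_tags:
        return False
    return "automation" in {tag.strip() for tag in x_source_tags.split(",")}


def copy_deduplicate(pings: Iterable[dict]) -> Iterator[dict]:
    """Yield the first ping per document_id, skipping automation-tagged pings."""
    seen = set()
    for ping in pings:
        header = ping.get("metadata", {}).get("header", {})
        # Automation pings are dropped regardless of any other tags present.
        if has_automation_tag(header.get("x_source_tags")):
            continue
        if ping["document_id"] in seen:
            continue
        seen.add(ping["document_id"])
        yield ping
```

Matching on whole tags rather than on a substring avoids excluding a ping whose tag merely contains the word, such as a hypothetical "automations" tag.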
This will also require changes to some monitoring queries where we compare unique document_id counts between decoded, live, and stable tables. We'll likely need to filter on x_source_tags for all of those stages.
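To show why the filter has to be applied at every stage, here is a rough Python sketch of such a count comparison. The flat row shape (document_id and x_source_tags keys), the function names, and the comma-separated tag format are all assumptions made for illustration; the real monitoring queries are SQL and may look quite different.

```python
from typing import Dict, Iterable, Optional


def is_automation(x_source_tags: Optional[str]) -> bool:
    """Assumes a comma-separated header value such as "automation, perf"."""
    if not x_source_tags:
        return False
    return "automation" in {tag.strip() for tag in x_source_tags.split(",")}


def unique_document_ids(rows: Iterable[dict], exclude_automation: bool = True) -> int:
    """Count distinct document_id values, optionally ignoring automation pings."""
    ids = set()
    for row in rows:
        if exclude_automation and is_automation(row.get("x_source_tags")):
            continue
        ids.add(row["document_id"])
    return len(ids)


def compare_stages(stages: Dict[str, list], exclude_automation: bool = True) -> Dict[str, int]:
    """Return the unique document_id count for each named stage."""
    return {
        name: unique_document_ids(rows, exclude_automation)
        for name, rows in stages.items()
    }
```

Without the filter, decoded and live would count automation pings that never reach stable, so the stages would appear to disagree even when nothing was lost.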
Comment 1•4 years ago (Assignee)
Also, I realize that I implemented the pipeline support for x_source_tags slightly differently from what was spec'd in the final proposal; I don't think I had previously seen the final "pipeline changes" section with that specification.
The metadata.header.x_source_tags field is a string rather than an ARRAY<STRING> as specified in the proposal. I do believe that's the most "correct" thing to do as headers are sent as strings and we so far have maintained all metadata.header fields in their raw format as sent by the client.
I am, however, planning to add a metadata.header.parsed_source_tags field at the view level that presents this data as an array for better ease of use. That field won't be available, however, when querying live tables.
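As a sketch of what the view-level field could look like, the following Python derives an array from the raw string while leaving the stored value untouched. It assumes comma-separated tags and maps a missing header to an empty list; both choices, and the function names, are hypothetical rather than taken from the actual view definition.

```python
import copy
from typing import List, Optional


def parse_source_tags(raw: Optional[str]) -> List[str]:
    """Split the raw header string into trimmed, non-empty tags."""
    if raw is None:
        return []
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def with_parsed_source_tags(row: dict) -> dict:
    """Return a copy of the row with metadata.header.parsed_source_tags added.

    The raw x_source_tags string is kept as sent by the client; only the
    returned copy (standing in for the view) gains the parsed field.
    """
    view_row = copy.deepcopy(row)
    header = view_row.setdefault("metadata", {}).setdefault("header", {})
    header["parsed_source_tags"] = parse_source_tags(header.get("x_source_tags"))
    return view_row
```

Because the parsed field exists only on the derived copy, code reading the underlying rows directly (the analogue of querying live tables) still has to work with the raw string.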
Comment 2•4 years ago
(In reply to Jeff Klukas [:klukas] (UTC-4) from comment #1)
> I am, however, planning to add a metadata.header.parsed_source_tags field at the view level that presents this data as an array for better ease of use. That field won't be available, however, when querying live tables.
That sounds good enough for me! Thanks for doing this!
Comment 3•4 years ago
Comment 4•4 years ago (Assignee)
PR for modifying user-facing views is https://github.com/mozilla/bigquery-etl/pull/1214