Bug 1657360 (Closed): opened 4 years ago, closed 4 years ago

Exclude pings with "automation" tag from stable tables

Categories: Data Platform and Tools :: General, enhancement, P1
Points: 2
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: klukas; Assignee: klukas
Attachments: 1 file

The proposal for supporting X-Source-Tags specifies:

Pings that are tagged with “automation” (regardless of the other present tags) will not make it to the stable tables and will only remain in the live tables for 30 days.

We need to update the copy_deduplicate query to exclude pings that contain "automation" in the metadata.header.x_source_tags field.
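The following Python sketch is a simplified model of the intended stable-table filter, not the actual copy_deduplicate query (which is SQL in bigquery-etl). It assumes the raw header is a comma-separated list such as "automation, perf". It matches whole tags rather than substrings, and it keeps the first row seen for each document_id; the real query may use different matching and ordering rules. The names `has_automation_tag` and `copy_deduplicate` here are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def has_automation_tag(x_source_tags: Optional[str]) -> bool:
    """Return True if the raw X-Source-Tags value contains the "automation" tag.

    Assumption: tags are comma-separated, e.g. "automation, perf".
    Whole-tag matching, so "automation-test" does not count.
    """
    if not x_source_tags:
        return False
    return any(tag.strip() == "automation" for tag in x_source_tags.split(","))


def copy_deduplicate(live_rows: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical model of copying live rows to the stable table.

    Skips pings tagged "automation" and keeps only the first row seen
    for each document_id.
    """
    seen: set = set()
    for row in live_rows:
        tags = row.get("metadata", {}).get("header", {}).get("x_source_tags")
        if has_automation_tag(tags):
            continue  # automation pings remain only in the live table
        doc_id = row["document_id"]
        if doc_id in seen:
            continue  # duplicate document_id: keep the first occurrence
        seen.add(doc_id)
        yield row
```

Filtering before deduplication means an automation-tagged copy of a document never blocks an untagged copy of the same document_id from reaching the stable table.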

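This will also require changes to some monitoring queries where we compare unique document_id counts between decoded, live, and stable tables. We'll likely need to filter on x_source_tags for all of those stages; otherwise automation pings would be counted in decoded and live but never in stable, which would show up as false discrepancies.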
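A minimal Python sketch of the monitoring comparison, assuming each stage is available as a list of rows with hypothetical flattened `document_id` and `x_source_tags` keys. It applies the same automation filter to every stage before counting distinct document_ids. The function names are illustrative only and do not correspond to the actual monitoring queries.

```python
from typing import Dict, Iterable, Optional


def has_automation_tag(x_source_tags: Optional[str]) -> bool:
    # Assumption: comma-separated header value, e.g. "automation, perf".
    if not x_source_tags:
        return False
    return any(tag.strip() == "automation" for tag in x_source_tags.split(","))


def distinct_non_automation_ids(rows: Iterable[dict]) -> int:
    """Count unique document_ids, ignoring pings tagged "automation"."""
    return len({
        row["document_id"]
        for row in rows
        if not has_automation_tag(row.get("x_source_tags"))
    })


def compare_stages(stages: Dict[str, Iterable[dict]]) -> Dict[str, int]:
    """Return the filtered distinct document_id count for each stage."""
    return {name: distinct_non_automation_ids(rows) for name, rows in stages.items()}


def counts_match(counts: Dict[str, int]) -> bool:
    """True when every stage reports the same filtered count."""
    return len(set(counts.values())) <= 1
```

For example, if decoded and live each hold one ordinary ping and one automation ping while stable holds only the ordinary one, all three filtered counts are 1. Without the filter, decoded and live would report 2 against stable's 1.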

Also, I realize that I implemented the pipeline support for x_source_tags slightly differently from what was spec'd in the final proposal; I don't think I had previously seen the final "pipeline changes" section with that specification.

The metadata.header.x_source_tags field is a string rather than an ARRAY<STRING> as specified in the proposal. I do believe that's the most "correct" choice: headers are sent as strings, and so far we have kept all metadata.header fields in the raw format sent by the client.

I am, however, planning to add a metadata.header.parsed_source_tags field at the view level that presents this data as an array for better ease of use. That field won't be available, however, when querying live tables.
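The view-level field would be defined in SQL; the Python sketch below only illustrates the string-to-array conversion that a parsed_source_tags field implies. It assumes comma-separated tags, trims whitespace, drops empty entries, and maps a missing header to an empty list. These are illustrative choices, and the actual view (see the bigquery-etl PR linked in this bug) may handle edge cases differently. `parse_source_tags` is a hypothetical name.

```python
from typing import List, Optional


def parse_source_tags(x_source_tags: Optional[str]) -> List[str]:
    """Split a raw X-Source-Tags header string into a list of tags.

    Hypothetical analogue of a view-level parsed_source_tags field.
    Assumes comma-separated tags; whitespace and empty entries are dropped.
    A missing header yields an empty list.
    """
    if x_source_tags is None:
        return []
    return [tag.strip() for tag in x_source_tags.split(",") if tag.strip()]
```

With an array in hand, checking for a tag becomes a simple membership test, e.g. `"automation" in parse_source_tags(header)`, instead of string matching on the raw header.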

(In reply to Jeff Klukas [:klukas] (UTC-4) from comment #1)

> I am, however, planning to add a metadata.header.parsed_source_tags field at the view level that presents this data as an array for better ease of use. That field won't be available, however, when querying live tables.

That sounds good enough for me! Thanks for doing this!

PR for modifying user-facing views is https://github.com/mozilla/bigquery-etl/pull/1214

Assignee: nobody → jklukas
Status: NEW → RESOLVED
Points: --- → 2
Closed: 4 years ago
Priority: -- → P1
Resolution: --- → FIXED
Component: Pipeline Ingestion → General
