Default-browser-agent with “1” in document_type space should be dropped in MessageScrubber
Categories
(Data Platform and Tools :: General, task)
Tracking
(Not tracked)
People
(Reporter: akomar, Assigned: wstuckey)
References
Details
(Whiteboard: [dataquality])
This came up in the missing doctypes/versions section of the Platform Health Check: https://mozilla.cloud.looker.com/dashboards/387?Submission+Date=90+day
Updated•3 years ago
Comment 1•3 years ago
This would involve a change in https://github.com/mozilla/gcp-ingestion/blob/main/ingestion-beam/src/main/java/com/mozilla/telemetry/decoder/MessageScrubber.java and adding an associated test.
We would throw UnwantedDataException in the case that document_type = "1" and document_namespace = "default-browser-agent". See https://github.com/mozilla-services/mozilla-pipeline-schemas for more context on these concepts.
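For illustration, here is a minimal sketch of the check described above. It is not the actual MessageScrubber code: the class ScrubSketch, its scrub method, and the local UnwantedDataException are hypothetical stand-ins, and the sketch assumes the namespace and doctype are available as string attributes under the names used in this bug.

```java
import java.util.Map;

public class ScrubSketch {

  /** Hypothetical stand-in for the pipeline's UnwantedDataException. */
  public static class UnwantedDataException extends RuntimeException {
    public UnwantedDataException(String reason) {
      super(reason);
    }
  }

  /**
   * Throws when the message carries the namespace/doctype pair named in this bug.
   * Attribute keys are assumed to match the names used in the bug text.
   */
  public static void scrub(Map<String, String> attributes) {
    String namespace = attributes.get("document_namespace");
    String docType = attributes.get("document_type");
    if ("default-browser-agent".equals(namespace) && "1".equals(docType)) {
      throw new UnwantedDataException("default-browser-agent with document_type 1");
    }
  }

  public static void main(String[] args) {
    // A message with any other doctype passes through untouched.
    scrub(Map.of("document_namespace", "default-browser-agent",
        "document_type", "default-browser"));

    // The pair from this bug is rejected.
    try {
      scrub(Map.of("document_namespace", "default-browser-agent", "document_type", "1"));
      System.out.println("not dropped");
    } catch (UnwantedDataException e) {
      System.out.println("dropped: " + e.getMessage());
    }
  }
}
```

The associated test would then assert both directions: the matching pair throws, and other doctypes in the same namespace do not.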
Comment 2•3 years ago
I'm reassigning this to :wil as a good first bug to gain some familiarity with the platform.
Comment 3•3 years ago
Wil found that we already handle this case. See https://bugzilla.mozilla.org/show_bug.cgi?id=1626020 and this block in MessageScrubber. So these should already be getting filtered out and there's something we're missing.
Comment 4 (Reporter)•3 years ago
We discard this by throwing AffectedByBugException. There's a comment a few lines above mentioning that these exceptions should appear in monitoring. This is consistent with the query that "fired" this alarm.
I think since this bug has been fixed in the client we can switch the exception to UnwantedDataException. This will make it disappear from the monitoring dashboard.
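To make the difference concrete, here is a rough sketch of the idea: both exception types cause the message to be dropped, but only one is surfaced to monitoring. Everything in it is hypothetical (the ExceptionRoutingSketch class, the process method, the in-memory monitored list, and the local exception classes). It assumes the pipeline distinguishes the two cases by exception type and does not reproduce how gcp-ingestion actually routes them.

```java
import java.util.ArrayList;
import java.util.List;

public class ExceptionRoutingSketch {

  /** Hypothetical stand-in: data we expect to see and silently discard. */
  public static class UnwantedDataException extends RuntimeException {
    public UnwantedDataException(String bug) {
      super(bug);
    }
  }

  /** Hypothetical stand-in: data hit by a known bug that should stay visible. */
  public static class AffectedByBugException extends RuntimeException {
    public AffectedByBugException(String bug) {
      super(bug);
    }
  }

  /** Stand-in for whatever feeds the monitoring dashboard. */
  public static final List<String> monitored = new ArrayList<>();

  /** Returns true if the message is kept, false if it is dropped. */
  public static boolean process(Runnable scrubStep) {
    try {
      scrubStep.run();
      return true;
    } catch (AffectedByBugException e) {
      monitored.add(e.getMessage()); // dropped and reported
      return false;
    } catch (UnwantedDataException e) {
      return false; // dropped without showing up in monitoring
    }
  }

  public static void main(String[] args) {
    process(() -> { throw new AffectedByBugException("1626020"); });
    process(() -> { throw new UnwantedDataException("1626020"); });
    System.out.println("monitored entries: " + monitored.size());
  }
}
```

Under these assumptions, switching the thrown type changes only whether the drop is reported, not whether the message is dropped.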
Comment 5 (Assignee)•3 years ago
Exception class has been switched to UnwantedDataException in https://github.com/mozilla/gcp-ingestion/pull/1998
Updated•2 years ago