schema error in `firefox_desktop.events_v1` for #/metrics/boolean/urlbar.pref_quicksuggest_data_collection
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: Leli, Unassigned)
References
Details
(Whiteboard: [dataquality])
Attachments
(2 files)
seems to be a new error since the previous week there was 0 but this week we have 300k errors
string [urlbar.pref_quicksuggest_data_collection] does not match pattern ^[a-z_][a-z0-9_]{0,29}(\.[a-z_][a-z0-9_]{0,29})+$
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Probably started after https://github.com/mozilla/gecko-dev/commit/3b311a8bd7b99983a13ab719cbc5b1d6f44d1d4a I guess?
We extended the limit for metric name length, but didn't update the schema it seems.
perry, can you take care of that? See also https://bugzilla.mozilla.org/show_bug.cgi?id=1844388#c5
Comment 2•2 years ago
|
||
Comment 3•2 years ago
|
||
Adding a bit more context here:
- bug 1844388 extended the allowed metric names from 30 to 70 characters in the glean_parser schema used to validate
metrics.yaml
files. dot_separated_short_id.1
is used to validatecategory.name
in the actually received ping data.- the prior pattern accounted for the category, a dot and the name
- however in the parser schema the category can be 40 characters, so allowing 61 in the pipeline was always wrong?
labeled_metric_id
used the more correct 71 characters (and only this year dropped the more strict pattern)- that pattern (
dotted_snake_case
) also enforces up to 29 characters without a dot, not more. - however that is only used for the category, metric names are validated against
short_id
, which requiressnake_case
and a maximum of 70 now (and 30 before).
So really our pipeline schema and the glean_parser schema disagree about what patterns and length is allowed. Perry's change will now allow any snake_case value for category.name
with an upper limit of 111 (40 characters category, a dot and 70 characters for the name).
IMO it's the right thing to drop the whole up to 29 characters before a dot needs to appear
part. We don't enforce it client side for names, the regex is hard to read and I still don't have a full understanding on why we went with it.
In the meantime bug 1849726 was filed, discussed and a patch was landed to shorten the metric name, which means we don't need to backfill it I guess.
I bet a similar issue will pop up in the future though (plus we will keep getting this "wrong" data for a while), so I argue relaxing the schema is still the right choice.
Tagging Anna for visibility, you're also tagged for review on the PR and I don't have merge permissions there.
Updated•2 years ago
|
Comment 4•2 years ago
|
||
merged
Description
•