Closed Bug 1849615 Opened 2 years ago Closed 2 years ago

schema error in `firefox_desktop.events_v1` for #/metrics/boolean/urlbar.pref_quicksuggest_data_collection

Categories

(Data Platform and Tools :: General, defect)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Leli, Unassigned)

References

Details

(Whiteboard: [dataquality])

Attachments

(2 files)

Attached image image.png

seems to be a new error since the previous week there was 0 but this week we have 300k errors

string [urlbar.pref_quicksuggest_data_collection] does not match pattern ^[a-z_][a-z0-9_]{0,29}(\.[a-z_][a-z0-9_]{0,29})+$

Probably started after https://github.com/mozilla/gecko-dev/commit/3b311a8bd7b99983a13ab719cbc5b1d6f44d1d4a I guess?
We extended the limit for metric name length, but didn't update the schema it seems.

perry, can you take care of that? See also https://bugzilla.mozilla.org/show_bug.cgi?id=1844388#c5

Flags: needinfo?(pmcmanis)
See Also: → 1844388

Adding a bit more context here:

  • bug 1844388 extended the allowed metric names from 30 to 70 characters in the glean_parser schema used to validate metrics.yaml files.
  • dot_separated_short_id.1 is used to validate category.name in the actually received ping data.
    • the prior pattern accounted for the category, a dot and the name
  • however in the parser schema the category can be 40 characters, so allowing 61 in the pipeline was always wrong?
    • labeled_metric_id used the more correct 71 characters (and only this year dropped the more strict pattern)
    • that pattern (dotted_snake_case) also enforces up to 29 characters without a dot, not more.
    • however that is only used for the category, metric names are validated against short_id, which requires snake_case and a maximum of 70 now (and 30 before).

So really our pipeline schema and the glean_parser schema disagree about what patterns and length is allowed. Perry's change will now allow any snake_case value for category.name with an upper limit of 111 (40 characters category, a dot and 70 characters for the name).
IMO it's the right thing to drop the whole up to 29 characters before a dot needs to appear part. We don't enforce it client side for names, the regex is hard to read and I still don't have a full understanding on why we went with it.

In the meantime bug 1849726 was filed, discussed and a patch was landed to shorten the metric name, which means we don't need to backfill it I guess.
I bet a similar issue will pop up in the future though (plus we will keep getting this "wrong" data for a while), so I argue relaxing the schema is still the right choice.

Tagging Anna for visibility, you're also tagged for review on the PR and I don't have merge permissions there.

Flags: needinfo?(ascholtz)
See Also: → 1849726
Flags: needinfo?(pmcmanis)

merged

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(ascholtz)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: