Open Bug 1675773 Opened 4 years ago Updated 2 years ago

Investigate supporting integer fields as event_properties in events_daily

Categories

(Data Platform and Tools :: General, task, P4)

Tracking

(Not tracked)

People

(Reporter: frank, Unassigned)

References

Details

The current state of events_daily is to stringify event_properties and assign each value a UTF-8 character based on how recently we saw it. This approach doesn't work well for event_properties that are clearly ordered, like integers. We would like to support numeric fields as top-level types, up to 1M values.

If we enable this, 0 would be \U0000, 1 would be \U0001, and so on (except using our specialized encoding, which skips `,` and `"`). Then in our analysis UDFs, we could support these fields as ranges: `r"[\U0007-\U0100]"` matches 7-100.
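To make the idea concrete, here is a minimal Python sketch of an order-preserving integer encoding and a range pattern built on it. It is not the actual events_daily encoding: the function names `encode_int` and `range_pattern` are hypothetical, skipping `,` and `"` by shifting code points upward is an assumption, and skipping the surrogate block (which cannot be encoded as UTF-8) is a further assumption needed to reach 1M values.

```python
import re

# Code points that must never appear in the encoded string (assumed to be
# handled by shifting later values up by one), in ascending order.
SKIPPED = [ord('"'), ord(",")]  # 34, 44

# UTF-8 cannot encode surrogates, so this sketch also jumps over them.
SURROGATE_START = 0xD800
SURROGATE_COUNT = 0x800


def encode_int(n: int) -> str:
    """Map a non-negative integer to one character, preserving order."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    cp = n
    for skipped in SKIPPED:
        if cp >= skipped:
            cp += 1
    if cp >= SURROGATE_START:
        cp += SURROGATE_COUNT
    return chr(cp)


def range_pattern(lo: int, hi: int) -> str:
    """Build a character class matching any encoded integer in [lo, hi]."""
    return "[" + re.escape(encode_int(lo)) + "-" + re.escape(encode_int(hi)) + "]"


if __name__ == "__main__":
    pattern = re.compile(range_pattern(7, 100))
    print(bool(pattern.fullmatch(encode_int(50))))
    print(bool(pattern.fullmatch(encode_int(101))))
```

Because the mapping is strictly increasing, a character class over the encoded endpoints selects exactly the integers between them, which is what the range support in the analysis UDFs would rely on.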

The difficulty here is knowing that the type is numeric, and disallowing other values. There are some options here:

  • We specify types in metrics.yaml and flow that through to the ETL
  • We set aside a range (e.g. 500k - 1M) as solely numeric, thus allowing any event_property to have numbers, but limiting the number of allowed non-numeric fields (to 500k)
  • We specify in the ETL (but this would usually require backfill, which is not optimal, since we'd probably see instances of the events before we add the definition)
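As a rough illustration of the second option (a reserved numeric range), the Python sketch below assigns an index to each stringified value. Everything here is hypothetical: the name `property_index`, the `recency_rank` mapping standing in for the existing recency-based assignment, and the exact 500k split are assumptions for illustration only.

```python
from typing import Mapping

NUMERIC_BASE = 500_000   # first index reserved for integers (assumed split)
INDEX_LIMIT = 1_000_000  # total index space of 1M values


def property_index(value: str, recency_rank: Mapping[str, int]) -> int:
    """Return the index for one stringified event_property value.

    Integers land in [NUMERIC_BASE, INDEX_LIMIT) in numeric order;
    everything else keeps its recency-based rank below NUMERIC_BASE.
    """
    if value.isascii() and value.isdigit():
        n = int(value)
        if NUMERIC_BASE + n >= INDEX_LIMIT:
            raise ValueError(f"integer {n} does not fit in the numeric range")
        return NUMERIC_BASE + n
    rank = recency_rank[value]
    if rank >= NUMERIC_BASE:
        raise ValueError("too many non-numeric values for the reserved split")
    return rank


if __name__ == "__main__":
    ranks = {"click": 0, "scroll": 1}
    print(property_index("7", ranks))
    print(property_index("click", ranks))
```

The trade-off matches the one described above: any event_property can carry integers without a type declaration, but both integers and non-numeric values are capped at 500k each.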
See Also: → 1675782
Component: Datasets: General → General