Open
Bug 1675773
Opened 4 years ago
Updated 2 years ago
Investigate supporting integers fields as event_properties in events_daily
Categories
(Data Platform and Tools :: General, task, P4)
Tracking
(Not tracked)
NEW
People
(Reporter: frank, Unassigned)
The current state of events_daily is to stringify event_properties and assign each value a UTF-8 character based on how recently we saw it. This approach doesn't work well for event_properties that are clearly ordered, like integers. We would like to support numeric fields as top-level types, up to 1M values.
If we enable this, 0 would be `\U0000`, 1 would be `\U0001`, and so on (except using our specialized encoding, which skips `,` and `"`). Then in our analysis UDFs, we could support these fields as ranges: `r"[\U0007-\U0100]"` matches 7-100.
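A minimal sketch of this idea in Python, under stated assumptions: `encode_int`, `decode_char`, and `range_pattern` are hypothetical names, not the actual events_daily UDFs, and the real specialized encoding may differ. The description names only `,` and `"` as skipped; the sketch also skips the surrogate block (U+D800 to U+DFFF), which is an added assumption because those code points cannot be encoded as UTF-8. Values are treated as decimal integers.

```python
import bisect
import re

# Code points the encoding never emits. Only "," and '"' come from the bug
# description; the surrogate block is an assumption of this sketch.
SKIPPED = sorted([ord('"'), ord(",")] + list(range(0xD800, 0xE000)))


def encode_int(n):
    """Map a non-negative integer to one character, stepping over SKIPPED."""
    cp = n
    for s in SKIPPED:
        if s <= cp:
            cp += 1  # shift past each skipped code point at or below cp
        else:
            break
    return chr(cp)


def decode_char(ch):
    """Inverse of encode_int: subtract the skipped code points below ch."""
    cp = ord(ch)
    return cp - bisect.bisect_left(SKIPPED, cp)


def range_pattern(lo, hi):
    """Build a character class matching encoded integers in [lo, hi]."""
    return "[" + re.escape(encode_int(lo)) + "-" + re.escape(encode_int(hi)) + "]"


pattern = re.compile(range_pattern(7, 100))
print(bool(pattern.fullmatch(encode_int(50))))
print(bool(pattern.fullmatch(encode_int(101))))
```

Because the encoding preserves order, a single character class covers a contiguous integer range. Note that such a class also spans the skipped code points themselves, so it would match a literal `,` or `"` if the pattern were applied to text that contains them as delimiters.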
The difficulty here is knowing that the type is a number, and disallowing other values. There are some options here:
- We specify types in `metrics.yaml` and flow that through to the ETL
- We set aside a range (e.g. 500k - 1M) as solely numeric, thus allowing any event_property to have numbers, but limiting the number of allowed non-numeric fields (to 500k)
- We specify in the ETL (but this would usually require backfill, which is not optimal, since we'd probably see instances of the events before we add the definition)
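The second option (a reserved numeric range) could look roughly like the sketch below. Everything here is hypothetical: `slot_for`, `is_numeric_slot`, the exact 500k boundary, and the first-seen ordering of non-numeric values are illustrative assumptions, not the existing ETL logic.

```python
NUMERIC_START = 500_000   # assumed start of the numeric-only range
TOTAL_SLOTS = 1_000_000   # "up to 1M values" from the description


def slot_for(value, seen):
    """Return a slot index for a raw property value.

    Canonical integer strings that fit the reserved range map directly to
    NUMERIC_START + n. Everything else gets the next free non-numeric slot,
    tracked in the `seen` dict (value -> slot).
    """
    if value.isascii() and value.isdigit() and str(int(value)) == value:
        n = int(value)
        if n < TOTAL_SLOTS - NUMERIC_START:
            return NUMERIC_START + n
    if value not in seen:
        if len(seen) >= NUMERIC_START:
            raise ValueError("non-numeric slots exhausted")
        seen[value] = len(seen)
    return seen[value]


def is_numeric_slot(slot):
    """True if the slot lies in the reserved numeric range."""
    return NUMERIC_START <= slot < TOTAL_SLOTS


seen = {}
print(slot_for("7", seen), slot_for("checkout", seen))
```

With this layout no type declaration is needed up front, at the cost of capping non-numeric values at 500k and numeric values at 0 through 499,999.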
Updated 2 years ago
Component: Datasets: General → General