subsession_start_date not compatible with presto's `from_iso8601_timestamp`



Data Platform and Tools
Datasets: Main Summary
a month ago
11 days ago


(Reporter: amiyaguchi, Unassigned)





a month ago
The subsession start date (and creation date) will break the from_iso8601_timestamp function. The subsession start date contains a time zone designator "[+|-]hh" outside of the [-12, 12] range.[1]

Timezones can be outside of the [-12, 12] range AFAIK:

Do you suspect a client issue here? Are there actual invalid timezones submitted?
Or do we need a presto fix?
Or dataset normalization as a workaround?
Flags: needinfo?(amiyaguchi)

Comment 2

a month ago
There are timezones out of those UTC offsets (+17, -19, etc). The workaround during query time is to use `date_parse(substr(___))`, but something in main_summary might be worthwhile to avoid the hassle involved with different date/time formats.
Flags: needinfo?(amiyaguchi)
These are probably timezones clients are legitly reporting (from a Telemetry perspective), so i would suggest that we fix them up for derived datasets if needed.
This would be good to document though for people that are interested in the real offsets clients see.
Component: Telemetry → Datasets: General
Product: Toolkit → Data Platform and Tools


11 days ago
Points: --- → 2
Priority: -- → P1
Looks like the plan is to clean the data flowing into main_summary, so moving this bug to that component.
Component: Datasets: General → Datasets: Main Summary
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.