subsession_start_date not compatible with presto's `from_iso8601_timestamp`

NEW
Unassigned

Status

Data Platform and Tools
Datasets: Main Summary
P3
normal
a month ago
11 days ago

People

(Reporter: amiyaguchi, Unassigned)

Tracking

Details

(Reporter)

Description

a month ago
The subsession start date (and creation date) will break the from_iso8601_timestamp function. The subsession start date contains a time zone designator "[+|-]hh" outside of the [-12, 12] range.[1]

[1] https://sql.telemetry.mozilla.org/queries/49895/source
Timezones can be outside of the [-12, 12] range AFAIK:
https://en.wikipedia.org/wiki/Time_zone#List_of_UTC_offsets

Do you suspect a client issue here? Are there actual invalid timezones submitted?
Or do we need a presto fix?
Or dataset normalization as a workaround?
Flags: needinfo?(amiyaguchi)
(Reporter)

Comment 2

a month ago
There are timezones out of those UTC offsets (+17, -19, etc). The workaround during query time is to use `date_parse(substr(___))`, but something in main_summary might be worthwhile to avoid the hassle involved with different date/time formats.
Flags: needinfo?(amiyaguchi)
These are probably timezones clients are legitly reporting (from a Telemetry perspective), so i would suggest that we fix them up for derived datasets if needed.
This would be good to document though for people that are interested in the real offsets clients see.
Component: Telemetry → Datasets: General
Product: Toolkit → Data Platform and Tools

Updated

11 days ago
Points: --- → 2
Priority: -- → P1
Looks like the plan is to clean the data flowing into main_summary, so moving this bug to that component.
Component: Datasets: General → Datasets: Main Summary
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.