Glean: Implement timezone metric for the baseline ping
Categories
(Toolkit :: Telemetry, enhancement, P1)
People
(Reporter: mdroettboom, Assigned: Dexter)
References
Details
(Whiteboard: [telemetry:mobilesdk:m4] )
Attachments
(1 file)
Assignee
Comment 1•6 years ago
(In reply to Michael Droettboom [:mdroettboom] from comment #0)
The current draft of the SDK defines the timezone metric as a number of
hours offset from UTC. However, there may be better ways to handle this,
including using the timezone field in an ISO8601 date field in the core
metrics.
There are at least two fields that contain some way to infer the timezone:
- start_time, the start of the data collection window for this ping, in ISO 8601 format (e.g. "2018-12-19T12:36-06:00")
- end_time, the end of the data collection window for this ping, in the same format as above. Both are generated with this code
- the Date header, added right before submission, generated with this code
@all - Are the above data points sufficient to infer the timezone for a specific client, or do we still need to add a timezone field exclusively for the baseline ping?
@travis - given your previous experiences with date/times, what do you think?
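As a minimal sketch of what inferring the timezone from these fields could look like: the snippet below reads the UTC offset straight out of an ISO 8601 timestamp such as the start_time example above. Python is used purely for illustration, and the helper name utc_offset_hours is hypothetical, not part of Glean.

```python
from datetime import datetime


def utc_offset_hours(iso_timestamp: str) -> float:
    """Return the UTC offset, in hours, carried by an ISO 8601 timestamp.

    Hypothetical helper for illustration only; not part of the Glean SDK.
    """
    parsed = datetime.fromisoformat(iso_timestamp)
    offset = parsed.utcoffset()
    if offset is None:
        # The timestamp had no offset suffix, so nothing can be inferred.
        raise ValueError(f"timestamp has no UTC offset: {iso_timestamp!r}")
    return offset.total_seconds() / 3600


# The start_time example from this comment.
print(utc_offset_hours("2018-12-19T12:36-06:00"))  # -6.0
```

Note that this recovers only the numeric offset, not a timezone name: several named timezones can share the same offset at a given moment.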
Comment 2•6 years ago
I think that the ISO format gives us enough information to infer the correct timezone, from my viewpoint.
The one thing we might want to settle now: if we have a start_time and end_time that fall into different timezones, how do we want to handle this? Do we want one or the other to be the go-to field for determining the client timezone, or does this need to be more sophisticated in some way?
Assignee
Comment 3•6 years ago
(In reply to Travis Long from comment #2)
The one thing we might want to settle now: if we have a start_time and end_time that fall into different timezones, how do we want to handle this? Do we want one or the other to be the go-to field for determining the client timezone, or does this need to be more sophisticated in some way?
In this document data scientists mention that knowing if the timezone changed is a good thing. Moreover, we don't know yet how frequent timezone changes are.
I'm not sure there's any evidence for preferring the timezone of one field over the other: maybe we can suggest picking start_time by default (in the docs) and then leave it to the scientists?
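To make the suggestion concrete, here is a rough sketch of the "default to start_time, but surface changes" idea. It assumes both fields are ISO 8601 strings with an offset, as described in comment 1; the function name ping_utc_offset and the sample end_time value are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def ping_utc_offset(start_time: str, end_time: str) -> Tuple[timedelta, bool]:
    """Return (offset taken from start_time, whether the offset changed).

    Hypothetical helper: it prefers start_time by default and only
    reports whether end_time carries a different offset.
    """
    start_offset = datetime.fromisoformat(start_time).utcoffset()
    end_offset = datetime.fromisoformat(end_time).utcoffset()
    if start_offset is None or end_offset is None:
        raise ValueError("both timestamps must carry a UTC offset")
    return start_offset, start_offset != end_offset


# Sample values: the client moved from UTC-06:00 to UTC-05:00 mid-window.
offset, changed = ping_utc_offset(
    "2018-12-19T12:36-06:00",
    "2018-12-19T15:10-05:00",
)
print(offset == timedelta(hours=-6), changed)  # True True
```

Returning the "changed" flag alongside the default keeps the choice of which offset to trust in the hands of whoever runs the analysis.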
Comment 4•6 years ago
One issue to keep in mind is that timezones could (in the extreme) change multiple times during a ping's "measurement window".
This is unlikely for the baseline ping specifically though, where zero or one timezone changes are the most likely cases.
If we, as a general Glean guideline, add timezone information to all date fields, that should as a consequence solve the question of start_time & date_time.
This would also leave it up to the data engineers or analysts to choose which timezone information (start or end) is most meaningful for their use-case. We would also be able to change how we use it later as we learn more.
I see we also add the timezone to the date header, so this should cover every analysis question we might have.
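For illustration, the offset could be read from such a header roughly as follows. This is a sketch under an assumption: that the header value is an RFC 5322 style date with a numeric offset. The exact format Glean emits is not shown in this bug, and the sample value and the helper name date_header_utc_offset are invented.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def date_header_utc_offset(header_value: str) -> Optional[timedelta]:
    """Return the UTC offset carried by a Date header value, if any.

    Hypothetical helper; assumes an RFC 5322 style date string.
    """
    parsed = parsedate_to_datetime(header_value)
    # utcoffset() is None when the parsed datetime is naive.
    return parsed.utcoffset()


# Assumed sample value; not taken from a real ping.
sample = "Wed, 19 Dec 2018 12:36:00 -0600"
print(date_header_utc_offset(sample) == timedelta(hours=-6))  # True
```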
Comment 5•6 years ago
To avoid blocking the implementation for too long, I'd propose:
- We settle on a way forward (like ISO format with TZ everywhere).
- We move forward with the implementation.
- Later, in the response draft/documentation to the reporting schedule recommendations (bug 1520838), we clearly call these fields and implications out.
- If there are change decisions based on that, we adapt Glean to them later, in a follow-up bug.
Comment 6•6 years ago
For what it's worth, I've been using the offset in my queries, and it's better for Presto because it doesn't support TZ explicitly. BigQuery does; however, the offset is sufficient. I've never seen a need to infer the name of the timezone, if that's part of the question.
Comment 7•6 years ago
If the goal is to record the local time, +1 for using ISO format with timezone.
If it is desirable to have the timezone as a standalone piece of information (i.e. the name of the tz), then I think it should be a separate field.
Assignee
Comment 8•6 years ago
OK, it looks like we settled on not having the timezone field in the baseline ping, but rather on using one of the data points already available from comment 1.
I'll use this bug to update the docs and get rid of the commented-out definition in Glean's metrics.yaml file.