Closed Bug 1518165 Opened 6 years ago Closed 6 years ago

Glean: Implement timezone metric for the baseline ping

Categories

(Toolkit :: Telemetry, enhancement, P1)

Points: 2

RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: Dexter)

Details

(Whiteboard: [telemetry:mobilesdk:m4] )

Attachments

(1 file)

This is a follow-on to bug 1497894 to add another metric specified for the baseline ping in the SDK. The current draft of the SDK defines the timezone metric as a number of hours offset from UTC. However, there may be better ways to handle this, including using the timezone field in an ISO8601 date field in the core metrics. We should discuss how best to handle this, and how important it is to be consistent with existing behavior. See also the related bug 1514211, which is about using sane times for metrics.
Blocks: 1491345
Whiteboard: [telemetry:mobilesdk:m4]
Points: --- → 2
Priority: -- → P1
Priority: P1 → P3
Assignee: nobody → alessio.placitelli
Priority: P3 → P1

(In reply to Michael Droettboom [:mdroettboom] from comment #0)

The current draft of the SDK defines the timezone metric as a number of
hours offset from UTC. However, there may be better ways to handle this,
including using the timezone field in an ISO8601 date field in the core
metrics.

There are at least three data points from which the timezone can be inferred:

  • start_time, the start of the data collection window for this ping, ISO 8601 format (e.g. "2018-12-19T12:36-06:00")
  • end_time, the end of the data collection window for this ping, in the same format as above. Both are generated with this code
  • Date header, added right before submission, generated with this code

@all - Are the above data points sufficient to infer the timezone for a specific client, or do we still need to add a timezone field exclusively for the baseline ping?
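To make the question concrete, here is a minimal sketch of reading the UTC offset out of a timestamp shaped like the start_time example above. The helper name utc_offset_hours is hypothetical and not part of Glean; it only assumes the field is an ISO 8601 string that carries an offset.

```python
from datetime import datetime, timedelta


def utc_offset_hours(timestamp: str) -> float:
    """Return the UTC offset, in hours, carried by an ISO 8601 timestamp.

    Hypothetical helper for illustration only; not part of Glean.
    """
    parsed = datetime.fromisoformat(timestamp)
    offset = parsed.utcoffset()
    if offset is None:
        # A timestamp without an offset tells us nothing about the timezone.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    # Dividing two timedeltas yields a float, so half-hour offsets survive.
    return offset / timedelta(hours=1)


print(utc_offset_hours("2018-12-19T12:36-06:00"))  # -6.0
```

This recovers the same "hours offset from UTC" value that the draft timezone metric would have recorded, though not the name of the timezone.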

@travis - given your previous experiences with date/times, what do you think?

Flags: needinfo?(tlong)
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
Flags: needinfo?(fbertsch)

I think that the ISO format gives us enough information to infer the correct timezone, from my viewpoint.

The one thing we might want to settle now is how to handle a start_time and end_time that fall into different timezones. Do we want one or the other to be the go-to field for determining the client timezone, or does this need to be more sophisticated in some way?

Flags: needinfo?(tlong)

(In reply to Travis Long from comment #2)

The one thing we might want to settle now is how to handle a start_time and end_time that fall into different timezones. Do we want one or the other to be the go-to field for determining the client timezone, or does this need to be more sophisticated in some way?

In this document data scientists mention that knowing if the timezone changed is a good thing. Moreover, we don't know yet how frequent timezone changes are.

I'm not sure there's any evidence for favoring the timezone of one field over the other: maybe we can suggest picking start_time by default (in the docs) and then leave it to the data scientists?
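A rough sketch of what that suggestion could look like on the analysis side, assuming both fields are ISO 8601 strings with an offset. The names PingOffset and ping_offset are made up for this example and are not part of Glean or the pipeline.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class PingOffset(NamedTuple):
    offset: timedelta  # offset taken from start_time (the suggested default)
    changed: bool      # True if end_time carries a different offset


def ping_offset(start_time: str, end_time: str) -> PingOffset:
    """Pick the start_time offset by default and flag any change.

    Hypothetical helper for illustration only.
    """
    start = datetime.fromisoformat(start_time).utcoffset()
    end = datetime.fromisoformat(end_time).utcoffset()
    if start is None or end is None:
        raise ValueError("both timestamps must carry a UTC offset")
    return PingOffset(offset=start, changed=start != end)


result = ping_offset("2018-12-19T12:36-06:00", "2018-12-19T14:10-05:00")
print(result.offset, result.changed)
```

Keeping the "changed" flag next to the default offset leaves the choice between start and end to whoever runs the analysis. It only compares the two endpoints, so it cannot see changes that happened and reverted inside the window.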

One issue to keep in mind is that timezones could (in the extreme) change multiple times during a ping's "measurement window".
This is unlikely for the baseline ping specifically though, where zero or one timezone changes are the most likely case.

If we, as a general Glean guideline, add timezone information to all date fields, that should as a consequence solve the question of start_time & date_time.
This would also leave it up to the data engineers or analysts to choose which timezone information (start or end) is most meaningful for their use-case. We would also be able to change how we use it later as we learn more.

I see we also add the timezone to the date header, so this should cover every analysis question we might have.
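For illustration, a sketch of reading the offset from a Date header value. The exact format Glean sends is not stated in this bug, so the RFC 2822 style value with a numeric offset used below is an assumption, and date_header_offset is a hypothetical helper name.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def date_header_offset(header_value: str) -> Optional[timedelta]:
    """Return the UTC offset carried by an RFC 2822 style date string.

    Hypothetical helper; the header format is an assumption, not taken
    from the Glean source. Returns None when no usable offset is present.
    """
    parsed = parsedate_to_datetime(header_value)
    return parsed.utcoffset()


# Assumed example value, mirroring the start_time example from comment 1.
print(date_header_offset("Wed, 19 Dec 2018 12:36:00 -0600"))
```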

To avoid blocking the implementation for too long, I'd propose:

  • We settle on a way forward (like ISO format with TZ everywhere).
  • We move forward with the implementation.
  • Later, in the response draft/documentation to the reporting schedule recommendations (bug 1520838), we clearly call these fields and implications out.
  • If there are change decisions based on that, we adapt Glean to them later, in a follow-up bug.
Flags: needinfo?(gfritzsche)

For what it's worth, I've been using the offset in my queries, and it's better for Presto because it doesn't support TZ explicitly. BigQuery does; however, the offset is sufficient. I've never seen a need to infer the name of the timezone, if that's part of the question.

Flags: needinfo?(fbertsch)

If the goal is to record the local time, +1 for using ISO format with timezone.

If it is desirable to have the timezone as a standalone piece of information (i.e. the name of the tz), then I think it should be a separate field.

Flags: needinfo?(mreid)

Ok, looks like we settled on not having the timezone field in the baseline ping, but rather keep using one of the data points already available from comment 1.

I'll use this bug to update the docs and get rid of the commented-out definition in Glean's metrics.yaml file.

Attached file GitHub Pull Request
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED