structured missing columns in `firefox_desktop.metrics_v1` for `events`.[...].`timestamp`
Categories
(Data Platform and Tools :: Glean: SDK, defect, P2)
Tracking
(Not tracked)
People
(Reporter: kik, Assigned: janerik)
References
Details
(Whiteboard: [dataquality])
Attachments
(2 files)
42 bytes, text/x-github-pull-request
2.86 KB, text/plain (chutten: data-review+)
structured missing columns in firefox_desktop.metrics_v1 for events.[...].timestamp
Last 7 day count: 5,935
2 weeks ago count: 3,710
A 59.97% increase.
Updated•2 years ago
|
Updated•3 months ago
|
Comment 1•3 months ago
|
||
The moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1 table does have an events.timestamp column.
It turns out the cause of these "missing column" errors is that Glean's event timestamp field is an unsigned 64-bit integer, while BigQuery's integer type is a signed 64-bit integer. When an event timestamp has a value larger than BigQuery's maximum supported integer value of 9,223,372,036,854,775,807, it gets shunted into the additional_properties column instead, and thus gets reported as a missing column.
I have reported the underlying issue to the Glean team in Slack, and since this is only happening very rarely (8.4k errors in the last week, or only 0.002% of pings) I'm closing it as won't-fix.
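For illustration, the signed/unsigned mismatch described above can be reproduced in a few lines of Rust. This is only a sketch: the helper name `to_bigquery_int64` is hypothetical and is not part of Glean or the ingestion pipeline.

```rust
/// Hypothetical helper (not part of Glean or the ingestion pipeline).
/// Returns the timestamp as a signed 64-bit integer if it fits, or `None`
/// when the unsigned value is larger than i64::MAX
/// (9,223,372,036,854,775,807), the largest signed 64-bit integer.
fn to_bigquery_int64(timestamp: u64) -> Option<i64> {
    // The checked conversion fails for any value above i64::MAX.
    i64::try_from(timestamp).ok()
}
```

Any timestamp for which this returns `None` is one that cannot be stored in a signed 64-bit integer column.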
| Assignee | ||
Comment 2•2 months ago
|
||
Those seem like invalid values regardless. We should handle these more gracefully.
Re-opening and moving to the SDK; at the very least we should not send out these large values.
| Assignee | ||
Updated•2 months ago
|
| Assignee | ||
Updated•2 months ago
|
| Assignee | ||
Comment 3•2 months ago
|
||
Comment 4•2 months ago
|
||
FYI, here's a Redash query I made to make it easier to check how often this is still happening: https://sql.telemetry.mozilla.org/queries/112212?p_date_range=d_last_30_days
| Assignee | ||
Comment 5•9 days ago
|
||
Comment 6•7 days ago
|
||
Comment on attachment 9536134 [details]
1873482-data-review.txt
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection can be controlled through the product's preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
No. This collection will expire on 2026-06-31.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
| Assignee | ||
Comment 7•3 days ago
|
||
badboy merged PR [mozilla/glean]: Bug 1873482 - Clamp event timestamps to i64::MAX (#3308) in 74fb2e2.
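A minimal sketch of what clamping a timestamp to i64::MAX can look like, assuming the timestamp is held as a `u64`. The function name `clamp_event_timestamp` is hypothetical, and this is not the code from the merged PR.

```rust
/// Hypothetical illustration of clamping (not the code from the merged PR).
/// Values above i64::MAX are replaced by i64::MAX so that the result always
/// fits into a signed 64-bit integer; smaller values pass through unchanged.
fn clamp_event_timestamp(timestamp: u64) -> u64 {
    // i64::MAX is non-negative, so the cast to u64 is lossless.
    timestamp.min(i64::MAX as u64)
}
```

With a clamp like this, an out-of-range timestamp is still not a meaningful value, but it no longer exceeds the range of a signed 64-bit integer.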