Validate fenix event stream table
Categories: Data Platform and Tools :: Glean Platform, task, P1
Tracking: Not tracked
People: Reporter: janerik, Assigned: janerik
References: Blocks 1 open bug
Attachments: 1 file
For completeness, we should validate the current table:
- Does it have all the data we expect it to have?
- Are there any weird outliers?
- Do event counts match the unnested table?
- Is it correctly updated daily?
Comment 1 • 1 year ago (Assignee)
Checking whether event and client counts match between the source table and the event stream: https://sql.telemetry.mozilla.org/queries/97109/source?p_app=org_mozilla_fenix&p_channel=nightly&p_start%20date=2023-06-01#239768
(The nightly query covers a longer time range. Release works too, but it takes a lot longer to run, even for a small timeframe.)
Tiny outliers on nightly:
- 2023-12-01: one more client in the source table than in the event stream.
Outliers on release:
- From 2023-12-01 to now: between 1 and 12 more clients in the source table than in the event stream.
That's a bit weird, given that we don't filter anything.
Relative to the total client numbers that's of course a tiny amount and shouldn't break any other analysis, but I'm going to look into it.
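As a rough illustration of the check above, here is a minimal Python sketch that compares per-day distinct client counts between two in-memory row sets. The row shape (dicts with hypothetical submission_date and client_id keys) is an assumption for illustration only; the real comparison is the SQL query linked above.

```python
from collections import defaultdict


def distinct_clients_per_day(rows):
    """Map each submission_date to the set of client_ids seen that day.

    Rows are assumed (hypothetically) to be dicts with
    "submission_date" and "client_id" keys.
    """
    clients = defaultdict(set)
    for row in rows:
        clients[row["submission_date"]].add(row["client_id"])
    return clients


def client_count_diffs(source_rows, stream_rows):
    """Return {date: source_count - stream_count} for days where counts differ."""
    source = distinct_clients_per_day(source_rows)
    stream = distinct_clients_per_day(stream_rows)
    diffs = {}
    for day in sorted(set(source) | set(stream)):
        # .get avoids inserting new keys into the defaultdicts.
        delta = len(source.get(day, set())) - len(stream.get(day, set()))
        if delta != 0:
            diffs[day] = delta
    return diffs


# Hypothetical example: one client present in the source but not in the stream.
source_rows = [
    {"submission_date": "2023-12-01", "client_id": "a"},
    {"submission_date": "2023-12-01", "client_id": "b"},
]
stream_rows = [
    {"submission_date": "2023-12-01", "client_id": "a"},
]
print(client_count_diffs(source_rows, stream_rows))  # {'2023-12-01': 1}
```

A positive value means the source table has more distinct clients than the stream on that day, which is the direction of the outliers seen on nightly and release.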
Comment 2 • 1 year ago (Assignee)
Anna, do you have an idea why we would see a small discrepancy in the distinct client IDs in the two tables?
Are new pings added to the events table later, which events_stream then misses because it is not re-run for that day?
Comment 3 • 1 year ago (Assignee)
Or could it be Shredder?
Comment 4 • 1 year ago
The issue here is that these clients sent event pings with the events field set to null.
So the events table contains the pings of these clients (that's why they get counted there), but in the events_stream table the events field gets unnested and joined onto some of the ping data: https://github.com/mozilla/bigquery-etl/blob/10d7002b95278395abd93d21a1dad95f34f0af01/sql_generators/glean_usage/templates/events_stream_v1.query.sql#L43-L44
Since events is empty for these pings, there is nothing to unnest, so they essentially get filtered out there.
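A minimal Python sketch of why these pings disappear: unnesting an array and joining it back onto the ping behaves like a cross join, so a ping whose events field is null (or an empty list) produces zero output rows. In BigQuery, a CROSS JOIN with UNNEST of a NULL or empty array likewise yields no rows. The ping structure and field names below are simplified assumptions, not the actual events_stream schema.

```python
def unnest_events(pings):
    """Emit one row per event, copying ping-level fields onto each row.

    This models (as an assumption about the query's shape) a cross join
    between each ping and its unnested events array: a ping with no
    events (None or an empty list) contributes no rows at all.
    """
    rows = []
    for ping in pings:
        for event in ping.get("events") or []:
            rows.append({"client_id": ping["client_id"], **event})
    return rows


# Hypothetical pings: client "b" sent an event ping with events = null.
pings = [
    {"client_id": "a", "events": [{"category": "nav", "name": "tap"}]},
    {"client_id": "b", "events": None},
]
rows = unnest_events(pings)

source_clients = {p["client_id"] for p in pings}  # counted in the events table
stream_clients = {r["client_id"] for r in rows}   # counted in events_stream
print(len(source_clients), len(stream_clients))  # 2 1
```

Client "b" is counted in the source table but never reaches the unnested output, which matches the small per-day discrepancy in distinct clients.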
Comment 5 • 1 year ago (Assignee)
Ah, thanks! I should have checked more closely. That answers it, and it's not a problem then: no events means nothing in the events stream.
Comment 6 • 1 year ago
(In reply to Jan-Erik Rediger [:janerik] from comment #5)
> ah thanks! I should have checked more closely. That answers it and is not a problem then. No events -> nothing in the events stream.
I think this should be noted somewhere in the table documentation (the table's metadata.yaml?), as this discrepancy will surely come up when working with the data.
Comment 7 • 1 year ago
Yup, metadata.yaml would be a good place to leave a note.
Updated • 1 year ago
Comment 8 • 1 year ago
Updated • 1 year ago (Assignee)
Comment 9 • 1 year ago (Assignee)
badboy merged PR [mozilla/bigquery-etl]: Bug 1876010 - Call out the tiny gotcha our analysis made clear (#4885) in a22221e.
With that tiny gotcha out of the way I call this done.