Closed Bug 1876010 Opened 1 year ago Closed 1 year ago

Validate fenix event stream table

Categories

(Data Platform and Tools :: Glean Platform, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: janerik, Assigned: janerik)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

For completeness we should validate the current table:

  • Does it have all the data we expect it to have?
  • Any weird outliers?
  • Event counts matching the unnested table?
  • Correctly updated daily?
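For the "updated daily" question, a minimal sketch of a gap check, assuming we have already pulled the list of distinct submission dates from the table. The function name and the sample dates are made up for illustration, not taken from the actual table.

```python
from datetime import date, timedelta


def missing_days(seen_dates, start, end):
    """Return every date in [start, end] that does not appear in seen_dates."""
    seen = set(seen_dates)
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in all_days if day not in seen]


# Hypothetical submission dates found in the table (illustrative only).
seen = [date(2023, 12, 1), date(2023, 12, 2), date(2023, 12, 4)]
gaps = missing_days(seen, date(2023, 12, 1), date(2023, 12, 5))
print(gaps)
```

An empty result would mean every day in the range has at least one row; it says nothing about whether each day is complete.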

Checking in on whether event/client counts match between source table and event stream: https://sql.telemetry.mozilla.org/queries/97109/source?p_app=org_mozilla_fenix&p_channel=nightly&p_start%20date=2023-06-01#239768
(I ran the query on nightly over a longer period. Release works too, it just takes a lot longer to run, even for a small timeframe.)
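The comparison behind that query can be sketched roughly like this. This is not the linked query, just the same idea in plain Python: count distinct clients per day on both sides and report the days where they differ. The `(day, client_id)` row shape, the function names and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def distinct_clients_per_day(rows):
    """Map each day to its number of distinct client IDs."""
    clients = defaultdict(set)
    for day, client_id in rows:
        clients[day].add(client_id)
    return {day: len(ids) for day, ids in clients.items()}


def count_differences(source_rows, stream_rows):
    """Return {day: source_count - stream_count} for days where the counts differ."""
    source = distinct_clients_per_day(source_rows)
    stream = distinct_clients_per_day(stream_rows)
    diffs = {}
    for day in sorted(set(source) | set(stream)):
        delta = source.get(day, 0) - stream.get(day, 0)
        if delta != 0:
            diffs[day] = delta
    return diffs


# Hypothetical rows: one (day, client_id) pair per ping / per event.
source_rows = [
    ("2023-11-30", "a"), ("2023-11-30", "b"),
    ("2023-12-01", "a"), ("2023-12-01", "b"), ("2023-12-01", "c"),
]
stream_rows = [
    ("2023-11-30", "a"), ("2023-11-30", "b"), ("2023-11-30", "a"),
    ("2023-12-01", "a"), ("2023-12-01", "b"),
]
differences = count_differences(source_rows, stream_rows)
print(differences)
```

A positive value means the source table has more distinct clients than the event stream on that day.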

tiny outliers on nightly:

  • 2023-12-01: one more client in the source table than in the event stream.

outliers on release:

  • from 2023-12-01 to now: between 1 and 12 more clients in the source table than in the event stream

That's a bit weird, given that we don't filter anything.
Compared to the total client numbers that's of course a tiny amount and shouldn't break any other analysis, but I'm going to look into it.

Anna, do you have an idea why we would see a small discrepancy in the distinct client IDs in the two tables?
Are any pings added to the events table later, which events_stream then misses because it is not re-run for that day?

Flags: needinfo?(ascholtz)

or could it be shredder?

The issue here is that these clients sent event pings with the events field set to null.
So the events table contains the pings of these clients (that's why they get counted there), but in the events_stream table the events field gets unnested and joined onto some of the ping data: https://github.com/mozilla/bigquery-etl/blob/10d7002b95278395abd93d21a1dad95f34f0af01/sql_generators/glean_usage/templates/events_stream_v1.query.sql#L43-L44
Since events is null for these pings, the unnest produces no rows for them, so they are effectively filtered out of events_stream.
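To make the effect concrete, here is a small sketch that mimics the unnest in plain Python. It assumes the template's join behaves like a cross join over the events array, so a null or empty array yields no output rows. The ping structure, client IDs and event names are hypothetical.

```python
def unnest_events(pings):
    """Emit one row per event; pings with null or empty events emit nothing."""
    rows = []
    for ping in pings:
        # `or []` treats a null events field like an empty array.
        for event in ping["events"] or []:
            rows.append({"client_id": ping["client_id"], "event_name": event})
    return rows


# Hypothetical pings: client "c" sent an events ping with events set to null.
pings = [
    {"client_id": "a", "events": ["app_opened", "tab_closed"]},
    {"client_id": "b", "events": ["app_opened"]},
    {"client_id": "c", "events": None},
]

stream = unnest_events(pings)
source_clients = {ping["client_id"] for ping in pings}
stream_clients = {row["client_id"] for row in stream}
missing_clients = source_clients - stream_clients
print(missing_clients)
```

Client "c" is counted in the source table because its ping exists, but never shows up in the stream because it contributes no event rows.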

Flags: needinfo?(ascholtz)

ah thanks! I should have checked more closely. That answers it and is not a problem then. No events -> nothing in the events stream.

(In reply to Jan-Erik Rediger [:janerik] from comment #5)

> ah thanks! I should have checked more closely. That answers it and is not a problem then. No events -> nothing in the events stream.

I think this should be noted somewhere in the table documentation (the table's metadata.yaml?), as this discrepancy will surely pop up when working with the data.

yup metadata.yaml would be a good place to leave a note

Flags: needinfo?(jrediger)

badboy merged PR [mozilla/bigquery-etl]: Bug 1876010 - Call out the tiny gotcha our analysis made clear (#4885) in a22221e.

With that tiny gotcha out of the way I call this done.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED