[Telemetry Latency] Determine how long it takes main pings to get to us

Status: NEW
Product: Toolkit
Component: Telemetry
Priority: P1
Severity: normal
Version: Trunk
Opened: 14 days ago
Updated: 41 minutes ago
Reporter: chutten
Assignee: chutten
Firefox Tracking Flags: Not tracked

Description (chutten, 14 days ago)
The Summary is willfully inaccurate, so let's get some specifics:

"main" pings carry most of the information that our analyses rely on. (The other pings we care about, such as crash and new-profile, arrive more quickly (i.e., immediately), so their latencies aren't as relevant anyway.) "main" pings arrive at various speeds depending on channel, day of week, presence of pingsender... and no doubt other variables.

The goal here is to measure how quickly we receive some "critical mass" (mean? 95th percentile? 99th percentile?) of "main" pings, and to provide a dashboard that tracks it.

(I fully expect this to be consumed by Mission Control once its training wheels come off.)

This will require exploratory work to see what ranges of values we "typically" receive and which variables are the most likely predictors (channel, day of week, and presence of pingsender are big ones, but if we still can't get stable numbers we may need to go deeper).

The result here is a collection of times: how long it takes for given proportions of given populations of "main" pings to be received.

For instance, one thing we should be able to say with this is "Yesterday we received 95% of release-channel 'main' pings within 23.7 hours."
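
To make the shape of that result concrete, here is a minimal sketch of computing "the time within which a given proportion of pings arrived" per channel. It assumes each ping has already been reduced to a channel and a delay in hours; the field names `channel` and `delay_hours` and both function names are hypothetical, and the nearest-rank percentile is just one reasonable choice of definition.

```python
import math
from collections import defaultdict


def latency_percentile(delays_hours, proportion):
    """Smallest delay within which at least `proportion` of pings arrived.

    Uses the nearest-rank definition of a percentile.
    """
    if not delays_hours:
        raise ValueError("need at least one delay")
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    ordered = sorted(delays_hours)
    rank = math.ceil(proportion * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def latency_by_population(pings, proportion=0.95):
    """Group pings by channel and compute the latency percentile for each.

    `pings` is an iterable of dicts with hypothetical keys
    "channel" and "delay_hours".
    """
    groups = defaultdict(list)
    for ping in pings:
        groups[ping["channel"]].append(ping["delay_hours"])
    return {
        channel: latency_percentile(delays, proportion)
        for channel, delays in groups.items()
    }
```

Grouping by more than channel (day of week, presence of pingsender) would only change the key used for `groups`.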

This gives people writing or using recurring analyses a concrete resource that tells them how far back our "incomplete information" window stretches for their population.

Comment 1 (chutten, 41 minutes ago)
Current progress: https://sql.telemetry.mozilla.org/queries/5522

I've been trying different x axes: submitted date, received date, created date, and session_start_date.

The problem with created date and session start date is that the most recent data will keep changing (getting worse) over time as late pings arrive.
The problem with submitted date is that it comes from the client's clock.
The problem with received date (i.e., submission_date_s3) is that it reflects when we received the data rather than the period the data is actually about (i.e., we lose information about when the ping was created).

Of course, with pingsender (and after adjusting for clock skew) you'd think all of these would be fuzzy-close enough that it wouldn't matter. My fiddling suggests that there are still differences (the nice client submission delay cliff doesn't look so obvious when plotted vs. submission date, for instance).
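
For reference, one common way to adjust for clock skew can be sketched as below. This is an assumption about the approach, not necessarily what the query does: it treats the gap between the client's submission timestamp and the server's receipt timestamp as pure clock skew (network transit time is ignored) and shifts the client's creation timestamp by that amount. All three parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def skew_adjusted_delay(created_client, submitted_client, received_server):
    """Delay between ping creation and receipt, corrected for clock skew.

    Assumes submission and receipt are effectively simultaneous, so any
    difference between them is attributed to the client clock being wrong.
    """
    skew = received_server - submitted_client
    created_adjusted = created_client + skew  # creation time on the server's clock
    return received_server - created_adjusted
```

Under that assumption the adjusted delay reduces to `submitted_client - created_client`, i.e. it is measured entirely on the client's clock, so a constant offset in that clock cancels out.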

I think I will go for submission date (the server-side received date, submission_date_s3) to align with Mission Control and to encourage the view that this information is immutable over time. Today we already have all of yesterday's data, so the graph shouldn't budge.

Now that I have an idea of how to work with this data, it's time to narrow down to just the latest versions. (This is problematic around merge days, but I think I can make it work with some thresholds. If all else fails, display everything we have appreciable data for and let the viewer sort 'em out.)
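
A rough sketch of what "latest version, with a threshold" could look like: pick the highest version that accounts for at least some minimum share of the pings, so that a newly merged version with only a trickle of data doesn't take over the graph. The function names and the 10% default are hypothetical, and the version parsing assumes purely numeric dotted versions (a beta string like "57.0b3" would need real version parsing).

```python
from collections import Counter


def version_key(version):
    """Turn "56.0.2" into (56, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version_with_data(versions, min_share=0.1):
    """Highest version making up at least `min_share` of the pings.

    `versions` is an iterable with one version string per ping.
    Returns None if no version reaches the threshold.
    """
    counts = Counter(versions)
    total = sum(counts.values())
    if total == 0:
        return None
    eligible = [v for v, n in counts.items() if n / total >= min_share]
    if not eligible:
        return None
    return max(eligible, key=version_key)
```

Returning every eligible version instead of only the maximum would give the "display all we have appreciable data for" fallback.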