(Wait, this isn't tomorrow. Whoops.)
Event Timestamps (part 2)
Originally I tried looking at the distribution of event timestamps as minutes, but that didn't go very far. Even if you exclude the first event of 0, you still get swamped by the sheer number of events that happen within the first minute. That makes it hard to see the long tail of "longer than it should be... right?" times, even if you cap at a day's number of minutes (1440).
So then I looked at something that I could definitely say was wrong: how many "events" pings contain events that cover a longer span than the ping itself covers? The answer is 1.4% of pings across 12.6% of clients. If I add an extra minute of wriggle room it only drops to 1.0% of pings across 9.1% of clients, so it's not just close calls.
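A minimal sketch of that check in Python, under stated assumptions: event timestamps are milliseconds relative to the ping's first event, and the ping's own span is the difference between its start and end times. The function and parameter names are hypothetical, and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def event_span_overage(event_timestamps_ms, ping_start, ping_end,
                       slack=timedelta(0)):
    """Return how much longer the events span than the ping does.

    A positive result means the events cover more time than the ping
    itself covers, even after allowing `slack` of wriggle room.
    """
    event_span = timedelta(
        milliseconds=max(event_timestamps_ms) - min(event_timestamps_ms)
    )
    ping_span = ping_end - ping_start
    return event_span - ping_span - slack


# Made-up example: the ping covers 10 minutes, its events span 13 minutes.
start = datetime(2019, 5, 1, 12, 0)
end = datetime(2019, 5, 1, 12, 10)
timestamps = [0, 30_000, 13 * 60 * 1000]

overage = event_span_overage(timestamps, start, end)
print(overage > timedelta(0))  # True: events outlast the ping by 3 minutes
```

Passing `slack=timedelta(minutes=1)` corresponds to the extra minute of wriggle room mentioned above.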
In fact, if we graph the distribution of the differences between "event time" and "ping time" for those pings where "event time" is longer than "ping time", we can see that half of the pings show times in their events that are over 3 minutes longer than the times reported by the pings.
I'll reiterate that this is a small proportion of pings (1.4%), but it might be worth thinking up theories about the mechanism by which events might be coming in over a wider period than the ping covers.
Event Category Exploration
Fenix sends events from few categories: only custom_tab, and (the imaginatively-named) events are represented in the sample. Since there are so few, I threw the names in as well (though plotly in re:dash didn't handle it terribly well): https://sql.telemetry.mozilla.org/queries/62960/source#161501
One thing that strikes me is that the only event that has more events than clients is quick_action_sheet.closed. I guess those who want it closed close it often?
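The events-versus-clients comparison can be sketched in Python as follows. This is an illustration, not the re:dash query itself: the function name, the input shape (one `(client_id, event_name)` pair per event), and the sample rows (including `example.event`) are all made up.

```python
from collections import defaultdict


def events_and_clients_per_name(records):
    """Map each event name to (number of events, number of distinct clients).

    `records` is an iterable of (client_id, event_name) pairs, one per event.
    """
    event_counts = defaultdict(int)
    clients = defaultdict(set)
    for client_id, name in records:
        event_counts[name] += 1
        clients[name].add(client_id)
    return {name: (event_counts[name], len(clients[name]))
            for name in event_counts}


# Made-up sample rows, not real telemetry.
sample = [
    ("client-a", "quick_action_sheet.closed"),
    ("client-a", "quick_action_sheet.closed"),
    ("client-b", "quick_action_sheet.closed"),
    ("client-a", "example.event"),
    ("client-b", "example.event"),
]

result = events_and_clients_per_name(sample)
for name, (n_events, n_clients) in result.items():
    flag = "more events than clients" if n_events > n_clients else ""
    print(name, n_events, n_clients, flag)
```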
Delay and Clock Skew
For this analysis I played with BigQuery. So it took a while. Sorry for the delay.
For clock skew I subtracted the submission_timestamp from the Date header metadata.header.date in milliseconds and then used APPROX_QUANTILES to get the deciles.
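For readers without BigQuery to hand, here is a rough Python equivalent of that calculation. It is a sketch under assumptions: the Date header is taken to be an RFC 2822-style string, the helper name `clock_skew_ms` is hypothetical, and the sample skews are invented. Note that `statistics.quantiles` returns only the 9 interior cut points, whereas BigQuery's APPROX_QUANTILES with 10 buckets also returns the minimum and maximum.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import quantiles


def clock_skew_ms(date_header, submission_timestamp):
    """Client Date header minus server submission time, in milliseconds.

    Positive values mean the client's clock is ahead of the server's.
    """
    client_time = parsedate_to_datetime(date_header)
    return (client_time - submission_timestamp).total_seconds() * 1000


skew = clock_skew_ms(
    "Wed, 01 May 2019 12:00:03 GMT",
    datetime(2019, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(skew)  # 3000.0

# Invented skews (ms) for illustration only.
skews = [-7000, -3000, -2000, -1000, -1000, 0, 0, 0, 1000, 4000, 60000]
deciles = quantiles(skews, n=10)  # 9 cut points between the 10 buckets
print(deciles)
```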
Clock skew seems to mostly be on the order of 0-3 seconds (which might just be transmission delay, not skew) though there are, as always, the weird ones on the outside.
A quick expansion to 100 quantiles shows that ~6% of pings are time travelling more than one second ahead (only 2% more than 4s), and on the other side of things only 2% are skewing longer than 7s late.
We might actually be able to (sometimes, broadly) trust these clients' clocks? Less than 4% of pings being more than 7s out in either direction is pretty nice. (Future analysis should explore whether this is predominantly certain clients or not.)
On the delay front, the 95th percentile is under 2 hours. Once again, these pings get to us quickly. No need to wait days.
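A 95th percentile like this one can be computed with the nearest-rank method, sketched below. The delays are invented, the helper name is hypothetical, and how delay is defined (for example, submission time minus the time the ping was assembled on the client) is an assumption rather than something stated here.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Invented delays in hours: most pings arrive quickly, one takes days.
delays_hours = [0.1] * 10 + [0.5] * 5 + [1.0] * 3 + [1.8, 72.0]

p95 = percentile_nearest_rank(delays_hours, 95)
print(p95)  # 1.8
```

The single 72-hour straggler does not move the 95th percentile, which is why a quantile is a more useful summary than the mean for this kind of long-tailed delay.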
"events" pings seem to be operating within reasonable parameters and can be confidently used for analysis, with the usual caveat of watching for duplicates and a unique caveat of watching for pings that report events over a longer span than the ping itself covers.
- Dupes, still. Already being looked into for "baseline" pings in bug 1547234 and bug 1554729.
- Investigate what's going on with those events reporting outside of the reporting window of the ping. It's possible this is something that is perfectly allowed, but if so it's a little weird and should at least be documented.
Alessio, please take a look and let me know if you have any questions/concerns/corrections/etc.