I took a preliminary look into the available "baseline" pings. Here's my report.

### Scope

I was able to look at over 21 thousand pings from clients running builds at least as new as 10281206, with pings received after 2019-01-28.

### Ping and Client Counts

#### Aggregate

https://sql.telemetry.mozilla.org/queries/61238#157776

In the cohort we've not yet hit 1k DAU, but we can receive nearly 5k pings in a given day. The curve is consistent with an adoption curve of builds >= 10281206 intersecting with a weekend slump.

Also of note is [Frank's query of *AU](https://sql.telemetry.mozilla.org/queries/61138/source#157551). Originally I thought it was alarming how many WAU and MAU it's showing, but it makes sense given that &browser's engagement rate is lower than Firefox's.

The raw numbers (WAU flattening at 2.5k) are a little higher than expected, I'm told. Apparently we have many fewer than that enrolled in the Beta. It's not that `client_id`s are cycling on build updates, as hundreds of clients are popping up with the same `client_id` [across multiple builds](https://sql.telemetry.mozilla.org/queries/61281/source#157886).

#### Per-client, Per-day

There are some outliers when looking at the number of pings sent [per-client](https://sql.telemetry.mozilla.org/queries/61240/source#157786) and [per-day](https://sql.telemetry.mozilla.org/queries/61257/source#157829), but for the most part both follow the expected exponential decay curves.

### Sequence Numbers

#### Distribution

The aggregate distribution of sequence numbers is exactly the exponential decay we'd expect: https://sql.telemetry.mozilla.org/queries/61239/source#157777

We expect each client to report exactly one ping with a given `seq`, and that's what we see. There are only two extra pings, one with `seq` 0 and another with `seq` 17, accounting for less than 0.01% of received pings. Nice.

The highest `seq` is over 400. I originally thought this was Alessio, but the pings have `start_time` timezones of -05:00, which points to an Eastern Standard Time culprit. So maybe it's Mike.

It was tempting to try to draw a connection between the number of pings/clients with `seq` of 0 and WAU/DAU across builds, but it doesn't make sense to do so until modern-enough builds have hit saturation in the population.

#### Holes

[Here's some bad news](https://sql.telemetry.mozilla.org/queries/61259/source#157832): at least 16.8% of clients have one or more holes in their `seq` sequence.

The query undercounts: it doesn't attempt to detect holes at the beginning of the sequence (i.e., a sequence that doesn't start at 0) because I'm not sure the Scope is clean enough to have caught the beginning of everyone's `seq` record, and it *can't* detect holes at the end of the sequence.

"Holey" clients are most likely to have a sequence of length 4. This doesn't mean anything; it's just the intersection of lower `seq` values being more frequent and longer sequences having more opportunities for holes. (Though it might be fun to try to determine what distribution holes follow (uniformly random, for example?) by taking them as a proportion of the number of pings we expect to see with `seq` values that high.)

The sum of the lengths of all holes in a client's record is most likely to be 1. This is unsurprising given the short lengths of sequences overall and the relative rarity of `seq` holes. There *is* a hole of length -1 from the client who sent two pings with `seq` 0.
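For concreteness, here's roughly the per-client logic I mean (a minimal Python sketch, not the actual re:dash query; the function name is mine). Like the query, it can't see holes before the first or after the last observed `seq`, and a duplicate shows up as a "hole" of length -1 when consecutive differences are summed:

```python
def seq_hole_stats(seqs):
    """Summarize holes in one client's observed `seq` values.

    Sketch of the approach only (the real work is a re:dash SQL query).
    Consecutive differences are summed as (b - a - 1), so a duplicate
    seq contributes -1, matching the length -1 "hole" noted above.
    """
    s = sorted(seqs)
    gaps = [b - a - 1 for a, b in zip(s, s[1:])]
    return {
        "hole_count": sum(1 for g in gaps if g > 0),
        "hole_length_sum": sum(gaps),
    }

# seq_hole_stats([0, 0, 1, 2]) -> {'hole_count': 0, 'hole_length_sum': -1}
# seq_hole_stats([0, 1, 3, 4]) -> {'hole_count': 1, 'hole_length_sum': 1}
```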
Since the duplicate with `seq` 17 didn't show up this way, either that client has two holes whose sum is undercounted, or they had one hole nullified precisely by the single dupe and were thus missed by the query altogether. This is of curiosity-level value only.

### Field Compositions

**Note**: I initially found it difficult to find the "baseline" ping's metrics. The docs identify, for example, the `duration` metric, but it needs to be found at `metrics.timespan['glean.baseline.duration']`. Both the `timespan` and the `glean.baseline.` namespacing were unclear from the docs.

The [distribution of durations](https://sql.telemetry.mozilla.org/queries/61278/source#157883) is a lightly-sloping exponential. There's a bump around the one-minute (60s) mark, suggesting maybe there's some automation at play already? Or do apps get sent to the background when the screen dims, and 60s is just a common length for that setting on Android?

Most of the ["baseline" metrics](https://github.com/mozilla-mobile/android-components/blob/master/components/service/glean/docs/baseline.md) check out. I worked [the query](https://sql.telemetry.mozilla.org/queries/61273/source) into a form that might be adaptable to regular alerting using re:dash's tooling.

* `duration`: 4 pings have `NULL` duration. All the rest have unit 'second' as expected. There were also 299 pings with 0 seconds of duration.
* `os`: All of the pings have 'Android' for their os. (**Note**: The docs use 'android' without the capital A. We may wish to update that.)
* `os_version`: All pings have an `os_version` of some value or another. Moreover, they're all >= 16 (in fact they're all >= 21). I couldn't find the minimum system requirements of &browser (even the Google Play Store won't tell me), so I went with "at least newer than the API version we check for Fennec".
* `device`, `device_manufacturer`, and `device_model`: bug 1522552 was merged after build 10281206, so I expected to see many `NULL` values. But no, all pings have valid device information. The top manufacturers are Xiaomi, Samsung, Google, and OnePlus. The models are scattered to the winds and have no clear winners, really. A quick scan showed nothing too strange (though I thought `TP-Link` was a router manufacturer...).
* `architecture`: All pings have architecture information. Moreover, they all start with `arm` or `x86`, with `arm64-v8a` the overwhelming favourite (though there are a couple of token `x86` and `x86_64` entries).
* `locale`: No pings contain `locale`, which is weird for a field we're including in the "baseline" ping.

### Delay

I didn't study ping delays, as that requires the `metadata` fields, which I can't reach using available tooling. Alas.

## Conclusion

I conclude that these pings are mostly complete but should not yet be used for any decision-making analyses.

## Recommendations

It appears as though there's a widespread problem affecting the ability of hundreds of clients to send their "baseline" pings (or our ability to receive them). I did check the `telemetry-errors` stream for "baseline" pings and found none, suggesting the problem is on the transmit side. This is the primary problem holding up validation.

`duration` isn't as reliable as we'd probably like. 4 pings with `NULL` duration is odd, and 299 pings with 0s of duration may throw off analyses. I recommend looking a little into the `NULL` `duration` values, especially if they increase in number.
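A per-ping check along these lines could feed that kind of monitoring (a Python sketch; I'm assuming the ping has already been parsed into nested dicts and that the timespan entry keeps its seconds under a `value` key, which may not match the exact schema, and both function names are mine). It also shows the `metrics.timespan['glean.baseline.duration']` namespacing mentioned above:

```python
def baseline_duration(ping):
    """Return the "baseline" duration in seconds, or None if it's absent.

    Assumes the ping is a parsed dict and that the timespan entry exposes
    its seconds under a `value` key (an assumption on my part, not a
    documented schema). Note the `timespan` / `glean.baseline.` nesting.
    """
    entry = ping.get("metrics", {}).get("timespan", {}).get("glean.baseline.duration")
    return None if entry is None else entry.get("value")


def count_suspect_durations(pings):
    """Tally the NULL and zero-second durations we'd want a regular alert on."""
    durations = [baseline_duration(p) for p in pings]
    return {
        "null": sum(d is None for d in durations),
        "zero": sum(d == 0 for d in durations),
    }
```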
I also recommend that we consider rounding fractional time units up to the next whole second so that adding pings to an analysis always increases the time over which the measurements were taken. (I presume `duration` will be used as a denominator for many decision-making metrics.)

`locale` should have a value set, even if it's `und-ZZ` ("unknown language" from ISO 639 + "Unknown or Invalid Territory" from the Common Locale Data Repository). Otherwise it should be omitted from the docs like other not-yet-implemented fields (I'm looking at you, "field that should not be called 'profile_age'").

---

Please let me know if you have any questions/concerns/corrections.
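P.S. To be concrete about the rounding I have in mind above (a purely illustrative Python sketch; the function name is mine and the real change would live wherever `duration` gets recorded), any nonzero fraction of a second would count as a full second:

```python
import math

def duration_whole_seconds(duration: float) -> int:
    """Round a fractional duration *up* to the next whole second so that
    each ping contributes that full second to a summed `duration`
    denominator (e.g. a 0.4s session counts as 1s rather than 0s)."""
    return math.ceil(duration)
```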