Closed Bug 1596168 Opened 6 years ago Closed 6 years ago

Validate incoming 'baseline' pings for Lockwise Android

Categories

(Data Platform and Tools :: Glean: SDK, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: travis_)

References

Details

(Whiteboard: [telemetry:glean-rs:m14])

Once Glean is integrated, we'll need to make sure the incoming data from the 'baseline' ping matches the data from the app's core ping.

Priority: -- → P3
Whiteboard: [telemetry:glean-rs:m14]
Assignee: nobody → tlong
Priority: P3 → P1

Scope

The queries are limited to the pings received between Feb. 2 and 8, 2020.

This includes 142,466 pings from 33,844 clients.

Ping and Client Counts

Aggregate

With approximately 8k DAU we're seeing around 20k pings per day. (query)

Per-client: https://sql.telemetry.mozilla.org/queries/68126/source#172232
This looks like a nice curve and is what I would expect to see: most users are sending fewer than 10 pings over the sample period, and we see some "heavy" users sending as many as 75 pings over the sample period (a cohort of 17 users at this level).

Sequence Numbers

Holes and Dupes

https://sql.telemetry.mozilla.org/queries/68141/source#172261

105 clients (0.31%) have holes or dupes in their sequence record. This is low, which is good, but it might bear a little more investigation to see if there is room for improvement. On the whole, dupes appear to be slightly more prevalent than holes, but both seem to be evenly distributed among the sequence numbers of clients, so I don't think either dupes or holes correlate with whether a client is new or old.
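To make the terms concrete, here is a minimal Python sketch of how holes and dupes could be detected for one client's sequence numbers. It is an illustration only, not the linked query: the function name `find_holes_and_dupes` is hypothetical, and it assumes a hole is a number missing between the lowest and highest sequence number seen in the sample, and a dupe is a number seen more than once.

```python
from collections import Counter


def find_holes_and_dupes(seqs):
    """Return (holes, dupes) for one client's observed sequence numbers.

    Assumption: only the range between the smallest and largest observed
    number is checked, because pings outside the sample window are unknown.
    """
    counts = Counter(seqs)
    if not counts:
        return [], []
    lo, hi = min(counts), max(counts)
    # A hole is a number inside the observed range that never arrived.
    holes = [n for n in range(lo, hi + 1) if n not in counts]
    # A dupe is a number that arrived more than once.
    dupes = sorted(n for n, c in counts.items() if c > 1)
    return holes, dupes
```

A client would count toward the 105 above if either returned list is non-empty.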

Field Compositions

durations: https://sql.telemetry.mozilla.org/queries/68163/source#172373
I struggled to get a histogram out of BigQuery on this one; my lack of knowledge was the limiting factor. Other than that, we see a nice logarithmic curve of baseline ping durations, which looks like what I would expect.

client_info stuff and locale: https://sql.telemetry.mozilla.org/queries/68197/source
There are a few null durations (64 out of 142k) and a few zero durations (2345 out of 142k) that we might want to investigate, but overall the rates are low, which is good. The rest of the fields look to be as expected.

Delay

https://sql.telemetry.mozilla.org/queries/68127/source#172233
This is measured using the header date and the submission_timestamp. The vast majority of clients' pings arrive in under 1 minute, but we do have a few (4364) that arrived after between 1 and 2 minutes, and then a smattering of time travellers. Pings delayed longer than 3 hours are a cohort of only 66 out of the 142k pings, which seems pretty good. All in all, I think we may want to investigate a little more about why there are more time-travellers than I would have expected, but I suspect it has more to do with timezones or other oddities than with anything buggy.
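As a rough illustration of the measurement, the following Python sketch computes the delay for one ping. It assumes the header date is an RFC 2822 style date string with an explicit offset and that submission_timestamp is a timezone-aware datetime; the function name `submission_delay_seconds` is hypothetical and this is not the linked query.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def submission_delay_seconds(date_header, submission_timestamp):
    """Seconds between the client's header date and server receipt.

    A negative result means the ping was received "before" it was sent
    according to the client's clock (a time traveller).
    """
    sent = parsedate_to_datetime(date_header)
    return (submission_timestamp - sent).total_seconds()


received = datetime(2020, 2, 2, 10, 0, 45, tzinfo=timezone.utc)
delay = submission_delay_seconds("Sun, 02 Feb 2020 10:00:00 +0000", received)
```

Comparing both values as timezone-aware datetimes avoids treating a timezone offset as a delay.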

Only 0.04% of pings have submission delays of over 3 hours, which is fantastic. Pings are getting to us in a reasonable time.

1.5% are received before they're recorded.
95% of pings are received within a minute of their recording.

I struggled with a good visualization for this data but would be happy to entertain suggestions.

Conclusion

I conclude that Lockwise-Android baseline pings look good, with the exception of a few more time-travellers than I would have expected and a hole/dupe problem similar to the one seen on other apps.

:chutten, do you mind giving my analysis a quick over-the-shoulder look to make sure I didn't overlook or misinterpret anything?

Flags: needinfo?(chutten)

Hey Travis, please note that this should be about checking baseline vs core ping from legacy telemetry, in addition to the overall sanity check of the baseline ping.

Flags: needinfo?(tlong)

(In reply to Travis Long [:travis_] from comment #1)

Scope

The queries are limited to the pings received between Feb. 2 and 8, 2020.

Specifically "baseline" pings.

Ping and Client Counts

Aggregate

With approximately 8k DAU we're seeing around 20k pings per day. (query)

Per-client: https://sql.telemetry.mozilla.org/queries/68126/source#172232
This looks like a nice curve and is what I would expect to see as most users are sending less than 10 pings per day and we see some "heavy" users sending as many as 75 pings per day (a cohort of 17 users at this level)

That is fewer than 10 pings over the sample interval, not per day. If we want it per day, we'd have to group by {client_id, DATE(submission_timestamp)} tuples instead.
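The difference between the two groupings can be sketched in Python as follows. This is only an illustration of grouping by (client_id, date) pairs, assuming each ping is available as a (client_id, submission_timestamp) tuple; the function name `pings_per_client_day` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def pings_per_client_day(pings):
    """Count pings per (client_id, submission date) pair.

    `pings` is an iterable of (client_id, submission_timestamp) tuples,
    where submission_timestamp is a datetime.
    """
    return Counter((client_id, ts.date()) for client_id, ts in pings)
```

Grouping on client_id alone would instead give the per-sample-interval counts that the current chart shows.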

Sequence Numbers

Holes and Dupes

https://sql.telemetry.mozilla.org/queries/68141/source#172261

105 clients (0.31%) have any holes/dupes in their sequence record, this is low and that is good, but it might bear a little more investigation to see if there is room for improvement. On the whole, it appears that dupes are slightly more prevalent than holes but these seem to be evenly distributed among the sequence numbers of clients so I don't think there is any correlation to new or old clients with regards to either dupes or holes.

Dupes do appear to happen to ~3x as many clients as holes do. You're right, we could do better. Maybe we need a bug.

Field Compositions

durations: https://sql.telemetry.mozilla.org/queries/68163/source#172373
I struggled to get a histogram out of BigQuery on this one, my lack of knowledge being the limiting factor. Other than that, we see a nice logarithmic curve of client durations which looks like what I would expect.

The linked query seems to be incorrect. It's not using the same sample date range, and it's counting pings instead of clients (which is fine if that's what you wanted). You may wish to choose a logarithmic scale for the Y-axis of the visualization, too.

I think the distribution per-ping is the most interesting one. I think there are questions that should be asked about why so many Lockwise foreground sessions are shorter than 10s. It might be neat to do a CDF of the durations so we can see proportions more easily... but I've forgotten how to do that (Alessio knows how, I think).
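For reference, an empirical CDF can be sketched in a few lines of Python, assuming the per-ping durations have already been pulled out as a list of numbers. The function name `empirical_cdf` is hypothetical, and this is not how the linked queries compute anything.

```python
def empirical_cdf(values):
    """Return sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = []
    for i, v in enumerate(ordered, start=1):
        if cdf and cdf[-1][0] == v:
            # Repeated value: keep one point with the highest fraction.
            cdf[-1] = (v, i / n)
        else:
            cdf.append((v, i / n))
    return cdf
```

Reading the fraction at a duration of 10 would then give the proportion of foreground sessions lasting 10 seconds or less.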

client_info stuff and locale: https://sql.telemetry.mozilla.org/queries/68197/source
There are a few null durations (64 out of 142k) and a few zero durations (2345 out of 142k) that we might want to investigate, but overall the rates are low, which is good. The rest of the fields look to be as expected.

Zero-value durations are expected for foreground sessions shorter than half a second (IIRC). Null durations should be impossible, I thought.

Delay

https://sql.telemetry.mozilla.org/queries/68127/source#172233
This is measured using the header date and the submission_timestamp, we see the vast majority of clients' pings arrive < 1 minute, but we do have a few (4364) that arrived between 1-2 minutes, and then a smattering of time travellers. Longer than 3 hours is a cohort of only 66 out of the 142k pings, which seems pretty good. All in all, I think we may want to investigate a little more about why there are more time-travellers than I would have expected, but I suspect it has more to do with timezones or other oddities rather than anything buggy.

We see only 0.04% of over-3-hour submission delays which is fantastic. Pings are getting to us in a reasonable time.

1.5% are received before they're recorded.
95% of pings are received within a minute of their recording.

I struggled with a good visualization for this data but would be happy to entertain suggestions.

A histogram like the ping counts one would do. You'll need to clamp the numeric value instead of using strings if you want it to be pretty.
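A minimal Python sketch of the clamping idea, assuming delays are expressed in minutes: values are floored to whole minutes and anything outside the chosen range is folded into the end buckets, so the buckets stay numeric and sort naturally. The function name `clamped_histogram` and the default bounds are assumptions for illustration.

```python
import math
from collections import Counter


def clamped_histogram(delays_minutes, lo=-1, hi=180):
    """Bucket delays by whole minute, clamping outliers into the end buckets.

    Everything below `lo` (time travellers) lands in the `lo` bucket and
    everything above `hi` lands in the `hi` bucket.
    """
    hist = Counter()
    for d in delays_minutes:
        bucket = max(lo, min(hi, math.floor(d)))
        hist[bucket] += 1
    return dict(sorted(hist.items()))
```

With the defaults, the `-1` bucket collects all pings received before they were recorded and the `180` bucket collects all delays of 3 hours or more.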

Also, you might wish to prefer TIMESTAMP_DIFF to avoid some of the casts (though it probably comes out the same in the end)

Conclusion

I conclude that Lockwise-Android baseline pings look good with the exception of a few more time-travellers than I would have expected, and a similar hole/dupe problem as seen on other apps.

I concur, though you probably need to take a look at durations again.

Flags: needinfo?(chutten)

Because there doesn't appear to be a way to filter out the core ping data associated with the beta build that the Glean data came from, I will have to wait until Glean makes it into the release app. That release should happen today, according to the Lockwise team. Once we collect a new week of data, I'll come back and do a side-by-side comparison between the baseline and the core pings.

Flags: needinfo?(tlong)

After doing some initial comparison between core pings and baseline pings, it looks like Lockwise may be suffering from some of the baseline ping issues, and the side-by-side may need to be looked at again once Lockwise updates to a version of A-C which contains the baseline ping fixes. In two days, after the Lockwise team discusses it in their planning meeting, I should get an update on when this will happen.

After further investigation, it appears that the comparison between the baseline ping and the core ping was being affected more by the issues listed in Bug 1617243 than by the baseline ping 'force-close' issues.

Okay, here is my side-by-side comparison. I didn't do anything as fancy as an Iodide notebook, so I just created a gdoc with some links and charts. I'm open to any ideas as to what the document might be lacking, or any other feedback.

Blocks: 1617926

Closing this as resolved. There is a follow-up bug to come back and re-validate the counts once Lockwise-Android updates their version of A-C and we determine how to proceed with the issue of not seeing the normal lifecycle events when the OS invokes the app for Autofill purposes.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

Travis, could you kindly add your analysis as a markdown comment here instead of a google doc?

Flags: needinfo?(tlong)

(In reply to Alessio Placitelli [:Dexter] from comment #10)

Travis, could you kindly add your analysis as a markdown comment here instead of a google doc?

Actually nvm :)

Flags: needinfo?(tlong)