Add a "reason" field to command-received telemetry events
Categories
(Firefox :: Sync, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox80 | --- | fixed |
People
(Reporter: rfkelly, Assigned: lougenia)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 3 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
3.55 KB,
text/plain
|
chutten
:
data-review+
|
Details |
The sync ping contains "command-received" events designed to measure the success rate of send-tab. I don't believe there is currently a way to tell whether the command was received as a result of a push notification (which is the intended and best behaviour) or as a result of periodic background polling (which is fallback with poorer user experience).
We have quite a few reports of push-based receipt of tabs not working well. We could add a "reason" field to these events to distinguish between the two cases, and get a sense of how well the push-based delivery of tabs is actually working in practice.
Reporter | ||
Comment 1•4 years ago
|
||
(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)
Assignee | ||
Comment 2•4 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #1)
(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)
Do you have any references to that fallback polling work that I could review?
Comment 3•4 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #1)
(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)
Those metrics were never actually removed - they just expired. They are here and I re-enabled them as part of bug 1644598. Note however that this polling function is only called when the user hits "sync now", so I'm not convinced the metrics will be useful (eg, if a user never hits "sync now", then they will never record they had missed commands even though they do.)
Reporter | ||
Comment 4•4 years ago
|
||
Adding notes/context.
The telemetry event is emitted here, and I expect the reason
field could be added alongside flowID
and streamID
in the event properties. As a new data collection, it will require data review.
The entrypoint for receiving a command is pollDeviceCommands, which already has the metrics Mark links to above. The receivedIndex
argument, if non-zero, tells you which command we got a push message about, so I expect that if you plumb this down into the call to _handleCommands
, you can use this to determine whether each individual received command was fetched thanks to a push (its index == the received index) or not.
The API that you're talkkng to here (and the shape of the returned commands) is documented here.
Reporter | ||
Comment 5•4 years ago
|
||
As a mental check that this is worth doing: with this extra data in place, I expect we can look at all the command-received
telemetry events on a given day, and determine what percentage were retrieved as a result of a push notification and what percentage weren't (e.g. they were retreived via "sync now", or discovered in the process of fetching some other command from a push notification). We want to aim for most tabs being received via push notification.
Note however that this polling function is only called when the user hits "sync now", so I'm not convinced the metrics will be useful
(eg, if a user never hits "sync now", then they will never record they had missed commands even though they do.)
Noting for completeness that the approach in Comment 4 would help with this usefulness problem, because it will log cases when the user discovers missed commands as a side-effect of handling a push message for a more recent command. But it won't help for commands that are never received at all.
Comment 6•4 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #5)
Determine what percentage were retrieved as a result of a push notification and what percentage weren't (e.g. they were retreived via "sync now", or discovered in the process of fetching some other command from a push notification). We want to aim for most tabs being received via push notification.
Right - however, as you imply, it's subtle - it's more about "did we get a push notification for this tab". IOW, if I sent a tab last week but didn't get a push, then this week I sent a tab and did get a push, I'll see both tabs based on getting that push. We only want that second tab to count as "via push" and that first tab should count as "via {something else}".
I was a bit worried that was going to be really tricky - but I think it's only going to be a little tricky :) For whoever takes this (IIUC, Lougenia?) I think this means:
The entry-point here is here
- This will be called with 0 when we are polling for missed commands via "sync now", or it will be called with an "index" when called via push.
- However, at line 116 we adjust this to be the index we last saw - which will almost certainly be non-zero, but less than
receivedIndex
if specified. - Each message we get back in the payload has an
index
property. - If the original
receivedIndex
is zero (ie, we are polling), then every command is considered missed. - If the original
receivedIndex
is non-zero, every message with an index less than that value is considered missed - even though it was a push notification that caused us to see it.
The slightly tricky part here is that the code which sends the telemetry event doesn't know these indexes and so can't know if it should be "missed" or not - but it shouldn't be difficult to pass the 'reason' down to where it's recorded.
Assuming that's correct: I actually only came here to ask what values we should record, and whether it's worth trying to capture the 3 possible cases:
- explicit poll for no good reason (ie, "sync now" mashed)
- push for a different command and we noticed the old one?
- push for this command (which is what we want to see!)
I can't see a good reason not to record all 3 - so I propose reason be one of poll
, push-missed
, push
?
(push-missed
kinda sucks, but it doesn't matter as it's just for us, but bike-sheds welcome ;)
Assignee | ||
Comment 7•4 years ago
|
||
Assignee | ||
Comment 8•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
Assignee | ||
Comment 10•4 years ago
|
||
Comment 11•4 years ago
|
||
Comment on attachment 9162025 [details] data-review-request.md There is a [new version](https://github.com/mozilla/data-review/blob/master/request.md) of the data review request form with a question pertaining to the location of the collection's documentation that should be used instead of the older version.
Assignee | ||
Comment 12•4 years ago
|
||
Replacing the old data review request form using the new template.
Comment 13•4 years ago
|
||
We probably want to add this to https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/docs/data/sync-ping.rst and then you can add a link to it to the data review.
Assignee | ||
Comment 14•4 years ago
|
||
Assignee | ||
Comment 15•4 years ago
|
||
(In reply to Mark Hammond [:markh] [:mhammond] from comment #13)
We probably want to add this to https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/docs/data/sync-ping.rst and then you can add a link to it to the data review.
Just updated and reloaded the data form--thanks.
Assignee | ||
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Comment on attachment 9162831 [details]
data-review-request2.md
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Lougenia Bailey is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 2, Interaction.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. This collection is permanent.
---
Result: datareview+
Comment 17•4 years ago
|
||
Pushed by lobailey@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9c735f9393c9 Added reason field to command-received telemetry events r=markh,rfkelly
Comment 18•4 years ago
|
||
bugherder |
Description
•