Closed Bug 1639843 Opened 4 months ago Closed 3 months ago

Add a "reason" field to command-received telemetry events

Categories

(Firefox :: Sync, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 80
Tracking Status
firefox80 --- fixed

People

(Reporter: rfkelly, Assigned: lougenia)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 3 obsolete files)

The sync ping contains "command-received" events designed to measure the success rate of send-tab. I don't believe there is currently a way to tell whether the command was received as a result of a push notification (which is the intended and best behaviour) or as a result of periodic background polling (which is fallback with poorer user experience).

We have quite a few reports of push-based receipt of tabs not working well. We could add a "reason" field to these events to distinguish between the two cases, and get a sense of how well the push-based delivery of tabs is actually working in practice.

(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)

(In reply to Ryan Kelly [:rfkelly] from comment #1)

(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)

Do you have any references to that fallback polling work that I could review?

(In reply to Ryan Kelly [:rfkelly] from comment #1)

(The situation is probably not quite as simple as I make out in the description above, because a single push can trigger us to fetch many sent tabs in a single operation. I recall us having some metrics around the number of tabs discovered via fallback polling in the past, but removing them.)

Those metrics were never actually removed - they just expired. They are here and I re-enabled them as part of bug 1644598. Note however that this polling function is only called when the user hits "sync now", so I'm not convinced the metrics will be useful (eg, if a user never hits "sync now", then they will never record they had missed commands even though they do.)

Adding notes/context.

The telemetry event is emitted here, and I expect the reason field could be added alongside flowID and streamID in the event properties. As a new data collection, it will require data review.

The entrypoint for receiving a command is pollDeviceCommands, which already has the metrics Mark links to above. The receivedIndex argument, if non-zero, tells you which command we got a push message about, so I expect that if you plumb this down into the call to _handleCommands, you can use this to determine whether each individual received command was fetched thanks to a push (its index == the received index) or not.

The API that you're talkkng to here (and the shape of the returned commands) is documented here.

As a mental check that this is worth doing: with this extra data in place, I expect we can look at all the command-received telemetry events on a given day, and determine what percentage were retrieved as a result of a push notification and what percentage weren't (e.g. they were retreived via "sync now", or discovered in the process of fetching some other command from a push notification). We want to aim for most tabs being received via push notification.

Note however that this polling function is only called when the user hits "sync now", so I'm not convinced the metrics will be useful
(eg, if a user never hits "sync now", then they will never record they had missed commands even though they do.)

Noting for completeness that the approach in Comment 4 would help with this usefulness problem, because it will log cases when the user discovers missed commands as a side-effect of handling a push message for a more recent command. But it won't help for commands that are never received at all.

(In reply to Ryan Kelly [:rfkelly] from comment #5)

Determine what percentage were retrieved as a result of a push notification and what percentage weren't (e.g. they were retreived via "sync now", or discovered in the process of fetching some other command from a push notification). We want to aim for most tabs being received via push notification.

Right - however, as you imply, it's subtle - it's more about "did we get a push notification for this tab". IOW, if I sent a tab last week but didn't get a push, then this week I sent a tab and did get a push, I'll see both tabs based on getting that push. We only want that second tab to count as "via push" and that first tab should count as "via {something else}".

I was a bit worried that was going to be really tricky - but I think it's only going to be a little tricky :) For whoever takes this (IIUC, Lougenia?) I think this means:

The entry-point here is here

  • This will be called with 0 when we are polling for missed commands via "sync now", or it will be called with an "index" when called via push.
  • However, at line 116 we adjust this to be the index we last saw - which will almost certainly be non-zero, but less than receivedIndex if specified.
  • Each message we get back in the payload has an index property.
  • If the original receivedIndex is zero (ie, we are polling), then every command is considered missed.
  • If the original receivedIndex is non-zero, every message with an index less than that value is considered missed - even though it was a push notification that caused us to see it.

The slightly tricky part here is that the code which sends the telemetry event doesn't know these indexes and so can't know if it should be "missed" or not - but it shouldn't be difficult to pass the 'reason' down to where it's recorded.

Assuming that's correct: I actually only came here to ask what values we should record, and whether it's worth trying to capture the 3 possible cases:

  • explicit poll for no good reason (ie, "sync now" mashed)
  • push for a different command and we noticed the old one?
  • push for this command (which is what we want to see!)

I can't see a good reason not to record all 3 - so I propose reason be one of poll, push-missed, push?

(push-missed kinda sucks, but it doesn't matter as it's just for us, but bike-sheds welcome ;)

Assignee: nobody → lougenia
Status: NEW → ASSIGNED
Attached file data-review-request.md (obsolete) —
Attached file data-review-request.md (obsolete) —
Attachment #9162015 - Attachment is obsolete: true
Attachment #9162025 - Flags: data-review?(chutten)
Comment on attachment 9162025 [details]
data-review-request.md

There is a [new version](https://github.com/mozilla/data-review/blob/master/request.md) of the data review request form with a question pertaining to the location of the collection's documentation that should be used instead of the older version.
Attachment #9162025 - Flags: data-review?(chutten)
Attached file data-review-request2.md (obsolete) —

Replacing the old data review request form using the new template.

Attachment #9162025 - Attachment is obsolete: true
Attachment #9162301 - Flags: data-review?(chutten)

We probably want to add this to https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/docs/data/sync-ping.rst and then you can add a link to it to the data review.

Attachment #9162301 - Attachment is obsolete: true
Attachment #9162301 - Flags: data-review?(chutten)

(In reply to Mark Hammond [:markh] [:mhammond] from comment #13)

We probably want to add this to https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/docs/data/sync-ping.rst and then you can add a link to it to the data review.

Just updated and reloaded the data form--thanks.

Attachment #9162831 - Flags: data-review?(chutten)
Comment on attachment 9162831 [details]
data-review-request2.md

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Lougenia Bailey is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+
Attachment #9162831 - Flags: data-review?(chutten) → data-review+
Pushed by lobailey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9c735f9393c9
Added reason field to command-received telemetry events r=markh,rfkelly
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 80
You need to log in before you can comment on or make changes to this bug.