Closed Bug 1639781 Opened 1 year ago Closed 11 months ago

Send-tab telemetry should record a new `streamID` GUID to assist analysis when sending to multiple devices


(Firefox :: Sync, enhancement)




Firefox 79
Tracking Status
firefox79 --- fixed


(Reporter: rfkelly, Assigned: markh)


(Blocks 1 open bug)



(2 files)

Send-tab includes a randomly-generated "flow id" in the metrics it emits via telemetry, with the intention that we can use it to match any given "command-sent" event to the corresponding "command-received" event emitted by another device. The code is here:

And it generates a fresh flowID on each call to the send() method.

However, the send() method supports sending a tab to multiple devices in a single operation, to power e.g. the "send to all devices" feature. IIUC the code will currently send the same flowID to all N devices, then emit N "command-sent" telemetry events with the same flowID. This complicates analysis when matching events via flowID, because we have to try to disambiguate up to N "command-received" events by matching on device ID.
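
To make the ambiguity concrete, here is a minimal Python model of the behaviour described above. It is not the Firefox implementation: the `send` function, the event dictionaries, and the device names are hypothetical stand-ins chosen only for illustration.

```python
import uuid


def send(devices):
    """Model of the described behaviour: one flowID per send() call.

    Hypothetical stand-in, not the real Firefox code.
    """
    flow_id = uuid.uuid4().hex  # generated once per call, shared by all targets
    return [
        {"event": "command-sent", "flowID": flow_id, "targetDeviceID": device}
        for device in devices
    ]


sent_events = send(["phone", "tablet", "laptop"])

# All N events carry the same flowID, so flowID alone cannot tell them apart.
distinct_flow_ids = {e["flowID"] for e in sent_events}
```

With three target devices, `sent_events` holds three events while `distinct_flow_ids` holds a single value, so the device ID is the only remaining way to separate them.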

Could we simplify analysis by using a different flowID for each unique send to each unique device? Or is the ability to tie together sends to multiple devices important for other reasons?

Flags: needinfo?(markh)

I seem to recall this bug being opened in the past? :) I don't think there's a good reason for the current situation. A theoretical benefit of how things are done now is slightly better tracking when sending to multiple devices, but I don't believe that was the reason it's done this way (and that's probably not strictly true anyway - maybe we could just ensure the exact same timestamp is used in that scenario?)

Flags: needinfo?(markh)

I made a redash query to explore how often this happens in practice:

The good news is that the vast majority of command-sent and command-received events correspond to a single flowID, so optimistically I expect the effect of this on any aggregate analysis to be small.

One interesting tidbit: it's vastly more common (like, an order of magnitude more common) for there to be multiple command-sent events with the same flowID than for there to be multiple command-received events with the same flowID. It makes me wonder if some users go to send themselves a tab, see that their device list contains many stale duplicates, and select "send to all devices" as an easy way to ensure it gets delivered to their one still-active device.

> This complicates analysis when matching events via flowID, because we have to try to
> disambiguate up to N "command-received" events by matching in device ID.

In theory, dealing with the current state of affairs should be a simple matter of joining "command-sent" to "command-received" using the (flowID, targetDeviceID) tuple rather than just the flowID. In practice we seem to be missing a surprisingly high number of device ids (Bug 1639831) so that risks some false negatives.
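
As a rough sketch of that join, the Python below pairs events on the (flowID, device ID) tuple and shows how a missing device ID turns into a false negative. The event shape is an assumption: `targetDeviceID` follows the name used above, while `deviceID` on the receiving side and the `join_events` helper are hypothetical.

```python
def join_events(sent, received):
    """Pair each command-sent event with its command-received event.

    Matches on (flowID, device ID). Sent events with no matching key,
    including those whose device ID is missing, are returned separately.
    """
    received_by_key = {}
    for ev in received:
        if ev.get("deviceID") is not None:
            received_by_key[(ev["flowID"], ev["deviceID"])] = ev

    matched, unmatched = [], []
    for ev in sent:
        key = (ev["flowID"], ev.get("targetDeviceID"))
        if key in received_by_key:
            matched.append((ev, received_by_key[key]))
        else:
            unmatched.append(ev)
    return matched, unmatched


sent = [
    {"flowID": "f1", "targetDeviceID": "phone"},
    {"flowID": "f1", "targetDeviceID": "tablet"},
    {"flowID": "f2", "targetDeviceID": None},  # device ID missing from telemetry
]
received = [
    {"flowID": "f1", "deviceID": "phone"},
    {"flowID": "f1", "deviceID": "tablet"},
    {"flowID": "f2", "deviceID": "laptop"},
]

matched, unmatched = join_events(sent, received)
```

Here the two `f1` sends are matched correctly despite sharing a flowID, but the `f2` send is reported as unmatched even though it was received, because its device ID was never recorded.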

From slack conversation with :markh, I think the current implementation gives us the worst of both worlds in the tradeoff between information-content and ease-of-use:

  • If the flowID was unique to the largest single UX event when sending a tab (multiple tabs sent to multiple devices via one UX operation) then it would provide the most information about the user experience.
  • If the flowID was unique to the smallest atomic unit of sending a tab (one tab sent to one device) then it would simplify analysis.

But it currently occupies this middle ground (one tab sent to multiple devices) that doesn't optimize for either outcome.
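
One way to avoid choosing between the two is to keep the shared flowID and record a second identifier alongside it. The Python sketch below assumes a `streamID` that is generated once per target device; that scoping, the `send_with_stream_ids` helper, and the event fields are assumptions for illustration, not a description of the landed patch.

```python
import uuid


def send_with_stream_ids(devices):
    """Hypothetical model: one flowID per operation, one streamID per device."""
    flow_id = uuid.uuid4().hex  # ties together everything sent in one operation
    events = []
    for device in devices:
        events.append(
            {
                "event": "command-sent",
                "flowID": flow_id,
                "streamID": uuid.uuid4().hex,  # assumed unique per target device
                "targetDeviceID": device,
            }
        )
    return events


events = send_with_stream_ids(["phone", "tablet"])
flow_ids = {e["flowID"] for e in events}
stream_ids = {e["streamID"] for e in events}
```

Under these assumptions the flowID still groups the sends from one operation, while the streamID alone is enough to match a single "command-sent" to its "command-received" without relying on device IDs.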

Assignee: nobody → markh
Summary: Send-tab telemetry should use a different flowId for each device, when sending to multiple devices → Send-tab telemetry should record a new `streamID` GUID to assist analysis when sending to multiple devices
Attached file data-review.txt
Attachment #9157561 - Flags: data-review?(chutten)
Comment on attachment 9157561 [details]

For future requests, please provide the link to the publicly-hosted version of the documentation instead of the relative path in the source tree.


    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?


    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :markh is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

Yes. This collection adds an identifier for an internal implementation detail (a "stream" of a "flow").

    Is the data collection covered by the existing Firefox privacy notice?


    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

Result: datareview+
Attachment #9157561 - Flags: data-review?(chutten) → data-review+
Pushed by
record a new 'streamID' guid in sync-tab telemetry. r=rfkelly
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 79