Desktop: Add event telemetry for send tab

RESOLVED FIXED in Firefox 53

Status

()

defect
P1
normal
Rank:
1
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: eoger, Assigned: markh, NeedInfo)

Tracking

unspecified
Firefox 53
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)

Details

(Whiteboard: [send-tab][send-tab-funnel])

Attachments

(2 attachments, 3 obsolete attachments)

Reporter

Description

3 years ago
We may want to measure:

- How often do people use "all devices" instead of a particular device target
- Are they using send tab or send page or even send link?
- Are they sending tabs to a mobile device or a desktop one?
- Number of tabs sent on average (or per session)
Whiteboard: [send-tab]
I'm nominating this for a P1
Priority: -- → P1
alex to define what 'questions we have about the send tab UI'
Assignee: nobody → adavis
Whiteboard: [send-tab] → [sync-sp1]
Whiteboard: [sync-sp1] → [send-tab]
Duplicate of this bug: 1288126
I wrote down some questions I think we would like to answer in the below document.

I also included what we would have to track to answer these questions and finally, the calculation we would do with these to generate valuable insights.

Please comment on it if you believe something is missing or is unclear.

https://docs.google.com/document/d/1_zT9s6QnbsxRAGPt8GydrOUYDYa_NPO3FNAUSFhMN58/edit
Status: NEW → ASSIGNED
Duplicate of this bug: 1290976
I marked the bug around tracking the notifications of send tab as a dupe to this one because it did not feel clear why they should be separate. Ultimately we need to integrate telemetry to measure the success of send tab. How users interact with the feature and engage with the notification go hand in hand. 

The document linked above covers both.
Assignee: adavis → nobody
Status: ASSIGNED → NEW
Priority: P1 → --
Priority: -- → P2
NI: talk to karlof about review of doc.
Flags: needinfo?(adavis)
Priority: P2 → --
NI: ckarlof to review doc.
Flags: needinfo?(adavis) → needinfo?(ckarlof)
I reviewed the doc and had a meeting with adavis to discuss. Adavis is going to revise the above linked doc and reach out to rweiss or one of her folks.
Flags: needinfo?(ckarlof)
Alex is working on clarifying this. Let's resurface once it becomes actionable.
Assignee: nobody → adavis
Priority: -- → P3
@Rebecca, as discussed please review and sign-off on the following Analysis Plan, list of measurements and data review: https://docs.google.com/document/d/1_zT9s6QnbsxRAGPt8GydrOUYDYa_NPO3FNAUSFhMN58/edit#heading=h.ns2dtoyth425

@Katie, I am flagging you here to put this on your radar because once Rebecca signs off, the Sync team will work on integrating the telemetry on the client side. We will then need the Metrics team to help us get the data into the pipeline and into Re:Dash.

Thanks in advance. Let me know if I am missing anything.
Flags: needinfo?(rweiss)
Flags: needinfo?(kparlante)

Comment 12

3 years ago
:adavis, it's not the metrics team to get the data into the pipeline, it's the pipeline team.  I added the comment to the PRD.

I will sign off on the data review, though :kparlante should acknowledge your request in the pipeline.  To that end, even though I am signing off on the data review, please reply to this bug with your expectations about how quickly you want the data to be available in re:dash.  More specifically, do you need this data in re:dash immediately from launch day?  If so, please coordinate a schedule with :kparlante and your engineering team for timelines.
Flags: needinfo?(rweiss)
Assignee

Updated

3 years ago
Priority: P3 → P2
Assignee

Updated

3 years ago
Flags: needinfo?(markh)
Assignee

Comment 13

3 years ago
There are a number of pieces that we need to work through here. For background:

* There is a "main" ping that records most telemetry in the client. While this identifies how many devices the user has in their Sync account, it does not carry a "sync user id". For example, this means that it's basically impossible to determine that 2 different pings from 2 different desktop machines are actually the same Sync user. However, IIUC, there is a "client id" that does allow you to determine whether 2 different pings came from the same desktop installation. UITelemetry is carried in this ping.

* While our mobile platforms do record a "main" ping (I'm not sure about iOS TBH), I'm not sure if they carry how many devices the user has connected, and I've on idea what UITelemetry they record (ie, whether they record a "send tab" event)

* There is a sync ping that landed recently. This does *not* carry the "client id" stored in the main ping, but does include a (hashed) "sync user id" and a (hashed) "sync device id". This currently records nothing related to the UI.

* The Sync ping is not recorded for either mobile platform.

Because we are attempting to measure a "life-cycle" that crosses devices, the above poses some challenges for determining what "a sync user" actually means - for example, using just the "main" ping's UITelemetry capability, we can tell you that "a tab was sent XXXX times by multi-device sync users", but what we *can't* tell you reliably is whether that is a few Sync users using the feature many times, or many Sync users using the feature a few times. Similarly, using just the "main" ping, it seems impossible to track whether a single "send to device" on desktop-A was actually received by mobile-B - even assuming mobile-B recorded that event, which I'm fairly sure it doesn't.

Some of this analysis could be simplified if we avoided attempting to track a "sync user" - for example, if we just count the number of tabs sent and the number of tabs received, we can probably get a rough idea of how many sent tabs are being received - but similarly to the above, we can't tell if the failures are limited to a small number of Sync users, or evenly distributed across all users. Without a better statistical background I can't really comment on how important this is.

So at a minimum, we will need to:
* change desktop telemetry to record send and receive events.
* change mobile telemetry to record send and receive events.

If we really do need to track individual Sync users, we will also need to:
* ensure the telemetry for send and receive events is in the "sync" ping, rather than in the UITelemetry portion of the main ping where it most obviously would live - OR - change the main ping to record the Sync user-id.
* have mobile devices submit a "sync ping" using the same basic schema as desktop - possibly with a subset of the data we record, but with the user-id and device-id
* ensure the mobile devices include the send and receive events in this "sync" ping.

The work for getting the sync ping into re:dash is well underway (bug 1298763) and changing that to accommodate new UI specific data in these pings would probably be fairly easy once that's done. Similarly, having that same code process sync pings from mobile devices is likely to be far easier than getting those mobile devices to submit the ping in the first place.

I hope that helps clarify things a little. I suspect we might end up wanting a couple of phases - a first phase so we can get *some* data out without waiting the mobile teams, and subsequent phases done after we get better data from mobile to support this.
Flags: needinfo?(markh)
(In reply to Mark Hammond [:markh] from comment #13)
> The work for getting the sync ping into re:dash is well underway (bug
> 1298763) and changing that to accommodate new UI specific data in these
> pings would probably be fairly easy once that's done. Similarly, having that
> same code process sync pings from mobile devices is likely to be far easier
> than getting those mobile devices to submit the ping in the first place.
> 
> I hope that helps clarify things a little. I suspect we might end up wanting
> a couple of phases - a first phase so we can get *some* data out without
> waiting the mobile teams, and subsequent phases done after we get better
> data from mobile to support this.

I agree that it would be best to get it in our Sync ping. I've already mentioned this to Katie during an informal discussion. You seem to think it's better too so we're all aligned. I also think that the user-id will benefit us once we get mobile to integrate the Sync ping too.

Which leads me to mobile. Shall we create separate bugs to land Sync Ping in iOS and Android?

I'm ni? Barbara so she knows we will need to land telemetry on Android (and hopefully iOS).
Flags: needinfo?(markh)
Flags: needinfo?(bbermes)
Assignee: adavis → markh
Priority: P2 → P1
Assignee

Comment 15

3 years ago
(In reply to Alex Davis [:adavis] from comment #14)
> I agree that it would be best to get it in our Sync ping. I've already
> mentioned this to Katie during an informal discussion. You seem to think
> it's better too so we're all aligned.

TBH I'm not sure it is better in all cases. For example, moving it into the sync ping means that we will not be able to leverage any UITelemetry work being planned, or will mean we need to record some things in both pings. For example, you have a plan to use "send tab to device" to drive account signups - that obviously isn't going to be measurable in the sync ping as those users aren't syncing. IOW, you will not be able to get full engagement rates from either of the pings.

What I do think is that if we really want to track this per-user, then having it in the sync ping is the only way to achieve that. However, I'm not sure that we do really want to track this per-user, nor if that is actually practical given the volume of data - for example, doing it this way will mean it is impossible to take a random 10% sample of sync pings. A 10% sample would probably give us a fair estimation of "how often was it clicked", but not "how many users clicked it". The former seems fine to track engagement (and as mentioned above, it's impossible to use the sync ping to track engagement for non-sync users). I also don't really see why we need to track users to measure if the feature is actually working - raw numbers of "number sent" and "number received" without regard for userIDs can probably tell us that.

I'd really like to see someone with a data science background chime in here. I'm happy to charge ahead and work out how to add this to the sync ping, but I'm not actually convinced that's the best thing to do as yet.

> Which leads me to mobile. Shall we create separate bugs to land Sync Ping in
> iOS and Android?

I'm fairly sure that a bug will be necessary for this to happen.
Flags: needinfo?(markh)
Markh: 
You bring up great points about sampling. I have no idea how to get around that. At what level does sampling occur? Within the browser or in the pipeline?

I'd be disappointed to not know the engagement rate of non-sync users but I think we would capture the most important part... registration. We can track account sign-up via FxA flow events.

I'm not familiar enough with the pings to know if it will complicate anything else for us. I also thought UITelemetry was deprecated.

My worry about only measuring raw counts is that we won't really know much about behavior. It'll be a "best guess". 

*Example 1: If we see 100 sent and 100 received. Are we sure everything is ok? Did someone get their content twice? Did someone not get it?

*Example 2: What if we see 100 sent and 70 received. Do we have a 30% failure rate? Or is it broken for a few users that keep trying to resend?

Could we measure the discovery rate of the feature for Sync users with just UI Telemetry? To my knowledge, no.
Could we measure how often users are sending content? To my knowledge, no.

One thing that is for sure, we need to get someone in metrics to help us out. I'll reach out to Katie to find out more.
Flags: needinfo?(adavis)

Comment 17

3 years ago
markh and adavis:

We're working on a project to add events to Telemetry: a common format for the main ping and for custom pings. Whether you end up adding events to the main ping or to the sync ping (or both), I think we should coordinate. FWIW, "UI telemetry" will migrate to event telemetry, I'm guessing ~Q1 2017. We'd be delighted to work with you with Sync events as one of the first projects (ahead of UI telemetry more generally). Also, we really appreciate the documentation you've put together so far :).

Here are the project docs: https://wiki.mozilla.org/Event_Telemetry

I think a conversation with gfritzsche is a good next step.
Flags: needinfo?(kparlante) → needinfo?(gfritzsche)
Assignee

Comment 18

3 years ago
(In reply to Alex Davis [:adavis] from comment #16)
> Markh: 
> You bring up great points about sampling. I have no idea how to get around
> that. At what level does sampling occur? Within the browser or in the
> pipeline?

IIUC, it is in the pipeline. For example, the longitudinal dataset is generated by the pipeline from 1% of all pings. Currently the pipeline for the sync pings are taking 100% of pings, but I suspect that as our requirements are refined, we will end up creating something similar to the longitudinal set for Sync, and I expect that considering 100% of them will be problematic.

It is important to clarify how this sampling is done. For example, if we just select 1% of all pings, it will be very difficult to say anything about a "sync user" - we can't know how many pings for a user were considered, so a "missing" entry could be that it never existed, or that it simply wasn't in the sample. If we consider all pings for 1% of sync users, we can probably extrapolate better.

TBH, I'm not clear on how the sampling for longitudinal works - I was under the impression it was something like "all pings from a 1% sample of client_ids", but I'm really not sure.

> I'd be disappointed to not know the engagement rate of non-sync users but I
> think we would capture the most important part... registration. We can track
> account sign-up via FxA flow events.

I'm not sure we can capture this from the main ping - maybe you are talking about back-end telemetry?

> My worry about only measuring raw counts is that we won't really know much
> about behavior. It'll be a "best guess". 

I think that someone with a data background would be able to state this with a little more certainty. For example, data using the longitudinal set is (hopefully) considered to be slightly stronger than a "best guess" even though it uses a 1% sample. IMO it should be possible to get data with a high degree of confidence without considering 100% of pings. It's not at all clear we have the resources to make 100% of sync pings viable over a long period of time (and as soon as we limit the time scales we run the risk of important events being snipped anyway).

> *Example 1: If we see 100 sent and 100 received. Are we sure everything is
> ok? Did someone get their content twice? Did someone not get it?
> 
> *Example 2: What if we see 100 sent and 70 received. Do we have a 30%
> failure rate? Or is it broken for a few users that keep trying to resend?

I'm not clear on what we would do differently based on this. It would be somewhat interesting to know, but how would that change the decisions we make?

> Could we measure the discovery rate of the feature for Sync users with just
> UI Telemetry? To my knowledge, no.

Once we start recording these events, then yes, I believe we can - in exactly the same way we have already measured the discovery rate the "synced tabs" from sync and non-sync users.

> Could we measure how often users are sending content? To my knowledge, no.

As above, I believe we could. What we can't measure this way is how many sync users send content. ie, we can say that we recorded XXXX "send tab to device" actions. What we can't record is how many Sync users this XXXX represents.

Comment 19

3 years ago
> IIUC, it is in the pipeline. For example, the longitudinal dataset is
> generated by the pipeline from 1% of all pings. Currently the pipeline for
> the sync pings are taking 100% of pings, but I suspect that as our
> requirements are refined, we will end up creating something similar to the
> longitudinal set for Sync, and I expect that considering 100% of them will
> be problematic.

Correct, the longitudinal dataset is a sample of the pings from 1% of client_ids.

If needed, you could create a sync data set that was a sample of sync user ids (1% or 10% or whatever).

> > *Example 1: If we see 100 sent and 100 received. Are we sure everything is
> > ok? Did someone get their content twice? Did someone not get it?
> > 
> > *Example 2: What if we see 100 sent and 70 received. Do we have a 30%
> > failure rate? Or is it broken for a few users that keep trying to resend?
> 
> I'm not clear on what we would do differently based on this. It would be
> somewhat interesting to know, but how would that change the decisions we
> make?

I suspect you're going to want both: summary stats about events as well as the ability to analyze by user. You should be able to take a "longitudinal sample" whether the events are on the main ping or the sync ping or both.

Seems like these are considerations for putting events on the sync ping vs the main ping:
- the main ping doesn't have the sync user id. (Perhaps this can be revisited?)
- you may be able to roll sync ping changes into production faster than adding events to the main ping
- the main ping can collect events for non sync users

I wouldn't worry about what exists in UI telemetry right now, I'd look ahead to what events you want/need on the main ping, and then define and schedule that work.
Just an update here:
Katie, Mark, Georg and I met yesterday and discussed integration.

We talked about:
- how sampling would happen server side on syncId and would work across devices
- how a flow id would help to follow the user experience across devices
- how the sync ping makes the most sense due to user id
- how we can use event telemetry with the sync id
- if we use event telemetry, we don't need to write scala because we'll be pushing to an API
- that although event telemetry would make our lives a bit easier, it will only be available EOY or early 2017
- that event telemetry replaces UI telemetry and that UI telemetry will be shut off in 2017

Next step is to integrate our event tracking with the same syntax as event telemetry. This will give us flexibility (do it ourselves or use event telemetry). Although, we seem to be leaning towards being patient and working with metrics team to get their support and also to help them by being among the first teams to integrate this new API. This should free up a bit of Mark's time so he can focus on other Q4 objectives instead of writing scala.
Flags: needinfo?(adavis)
Flags: needinfo?(gfritzsche)
Assignee

Updated

3 years ago
Depends on: 1314879
Assignee

Comment 21

3 years ago
(In reply to Alex Davis [:adavis] from comment #20)
> - that although event telemetry would make our lives a bit easier, it will
> only be available EOY or early 2017
> - that event telemetry replaces UI telemetry and that UI telemetry will be
> shut off in 2017
> 
> Next step is to integrate our event tracking with the same syntax as event
> telemetry.

This patch is an early WIP which I think does what's described above - it arranges for the sync ping to have an "events" array in the top-level of the payload. The format is this is as described in Georg's document on event telemetry, and records 2 discrete events - when a tab is sent, and when it is received.

Georg, is this roughly what you had in mind? I *think* it will allow is later to reuse your event mechanism once it lands, and I don't see much value in pre-defining the "category" etc params as you have in your other patches at this time (but obviously we would need to define them once we switch to "real" events).
Attachment #8807438 - Flags: feedback?(gfritzsche)
Comment on attachment 8807438 [details] [diff] [review]
0002-Bug-1289536-add-events-to-Sync-ping-and-record-an-ev.patch

Review of attachment 8807438 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like it matches the event telemetry format.
An output example would be nice to look at.

::: services/sync/modules/telemetry.js
@@ +125,5 @@
>  
> +// An "event" record. This class is hopefully temporary until the telemetry
> +// modules grow the ability to add events to our ping directly.
> +class EventRecord {
> +  constructor(category, method, object, value, extra = null) {

Do you actually need to keep those instances around?
Or could you just directly record these events in the array format ([ts, category, method, ...]) and avoid the serialization step later?

@@ +127,5 @@
> +// modules grow the ability to add events to our ping directly.
> +class EventRecord {
> +  constructor(category, method, object, value, extra = null) {
> +    // XXX - should do basic validation of the params.
> +    this.ts = tryGetMonotonicTimestamp();

You should `Math.floor()` this, sub-millisecond resolution is probably not needed (Telemetry will only submit positive integers).

@@ +451,4 @@
>      this.maxPayloadCount = Svc.Prefs.get("telemetry.maxPayloadCount");
>      this.submissionInterval = Svc.Prefs.get("telemetry.submissionInterval") * 1000;
>      this.lastSubmissionTime = Telemetry.msSinceProcessStart();
> +    this.maxEventsCount = Svc.Prefs.get("telemetry.maxEventsCount", 1000);

I've started to look into storage estimation for events in bug 1313592.
Realizing there that events can become rather expensive, we should estimate storage impact before putting too much work in recording events.
We can talk about this together.

@@ +461,5 @@
>        version: PING_FORMAT_VERSION,
>        syncs: this.payloads.slice(),
> +      events: this.events.length == 0 ? undefined : this.events.map(ev => ev.toJSON())
> +      // XXX should we also record the count of discarded events? I don't think
> +      // telemetry itself will record this, so we probably shouldn't either.

This is up to you and depends on if you need it.
Its probably enough to know that the limit was hit.
Attachment #8807438 - Flags: feedback?(gfritzsche) → feedback+
Assignee

Comment 23

3 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #22)

> This looks like it matches the event telemetry format.
> An output example would be nice to look at.

I tested this by having a profile receive a tab from a different device, then sending a tab to a different device - the only 2 actions which record an event in this patch. about:telemetry shows:

"events": [
      [
        13794,
        "sync",
        "processcommand",
        "displayURI",
        "cwpK5pB9QScW"
      ],
      [
        26259,
        "sync",
        "sendcommand",
        "displayURI",
        "TVG6K64xThch"
      ]
    ]

> Do you actually need to keep those instances around?
> Or could you just directly record these events in the array format ([ts,
> category, method, ...]) and avoid the serialization step later?

Yeah, done.

> You should `Math.floor()` this, sub-millisecond resolution is probably not
> needed (Telemetry will only submit positive integers).

Done.

> I've started to look into storage estimation for events in bug 1313592.
> Realizing there that events can become rather expensive, we should estimate
> storage impact before putting too much work in recording events.
> We can talk about this together.

For this particular use-case, I think the number of events will be very small - 2 events (1 each on 2 different devices) will be recorded when the user selects "Send Tab to Device" from the context menu of a page. I expect this will be relatively rare - I expect the vast majority of syncs to report zero events, and heavy users may record a few per ping.

> This is up to you and depends on if you need it.
> Its probably enough to know that the limit was hit.

TBH, I think that knowing the number of events is exactly equal to the maximum is a good indicator we probably dropped some events.

Thom, as the defacto owner of the sync telemetry module, what do you think of this patch?

Georg, please let me know when you think the time is right to formally request review of this and get it landed?
Attachment #8807438 - Attachment is obsolete: true
Flags: needinfo?(gfritzsche)
Attachment #8808063 - Flags: feedback?(tchiovoloni)
Comment on attachment 8808063 [details] [diff] [review]
0002-Bug-1289536-add-events-to-Sync-ping-and-record-an-ev.patch

Review of attachment 8808063 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good for me, but the ping docs will need updating and IMO the schema should be a bit more descriptive.

::: services/sync/tests/unit/sync_ping_schema.json
@@ +13,5 @@
>        "minItems": 1,
>        "items": { "$ref": "#/definitions/payload" }
> +    },
> +    "events": {
> +      "type": "array"

Seems worth describing the type here. E.g. `{ "events": { "type": "array", "items": { "type": "array", "minItems": 4, "maxItems": 6 } } }` is probably enough, although if it's possible in JSON-schema to describe the types of the entries that might be good too.

Aside from documentation, it would also help catch errors in generating the ping once bug 1313573 lands.

Also we will want to update https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/docs/data/sync-ping.rst with these changes as well.
Attachment #8808063 - Flags: feedback?(tchiovoloni) → feedback+
Comment on attachment 8808063 [details] [diff] [review]
0002-Bug-1289536-add-events-to-Sync-ping-and-record-an-ev.patch

Review of attachment 8808063 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/modules/telemetry.js
@@ +523,5 @@
> +    // extra can be falsey, or an object with string names and values.
> +    if (extra) {
> +      for (let name in extra) {
> +        if (typeof name != "string" || typeof extra[name] != "string") {
> +          log.warn(`Invalid event parameters: extra item "${name} is invalid`, eventDetails);

One other thought: It's hard for me to think this level of checking is a *bad* thing, but it's more thorough than the rest of the code in this file.

If we were stricter in the schema, or if we threw instead of warning (here and probably above), it might make it more likely that we'd catch type errors here. (Personally, I'd prefer being more thorough in the schema over throwing here, since I don't think we should give up on the sync because of an error here...).

Up to you though, I'm fine with this, just thought I should mention it.
Assignee

Comment 26

3 years ago
(In reply to Thom Chiovoloni [:tcsc] from comment #24)
> Seems worth describing the type here. E.g. `{ "events": { "type": "array",
> "items": { "type": "array", "minItems": 4, "maxItems": 6 } } }` is probably
> enough

Yes, that does make sense.

> although if it's possible in JSON-schema to describe the types of
> the entries that might be good too.

Note that ultimately it is expected that Georg's work will allow the telemetry modules themselves to capture the events (ie, we'd call telemetry directly instead of building and validating the array ourselves) and then at submission time we'd ask those same telemetry modules for the "sync" events array to include in our ping. The telemetry modules will have strict validation (including a grammar described in YAML IIRC) and should be ready in Q1. For this reason I'm not sure it's worthwhile for us to include too much in the way of a schema - I see this as a temporary solution so we can record events before the underlying framework is ready.

So unless you or Georg object, I think I'll use the minimal description you mention above and leave the more formal schema for Georg's work.
Yep, sounds fine to me.
Assignee

Comment 28

3 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #22)
> > +// modules grow the ability to add events to our ping directly.
> > +class EventRecord {
> > +  constructor(category, method, object, value, extra = null) {
> > +    // XXX - should do basic validation of the params.
> > +    this.ts = tryGetMonotonicTimestamp();
> 
> You should `Math.floor()` this, sub-millisecond resolution is probably not
> needed (Telemetry will only submit positive integers).

I notice the google doc with the spec mentions "A resolution of seconds should be sufficient and lessen privacy concerns." - should I be reducing the precision of this accordingly, or is milli-second resolution OK? Relatedly, the spec doesn't seem to mention what unit and origin the timestamp is.
(In reply to Mark Hammond [:markh] from comment #28)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #22)
> > > +// modules grow the ability to add events to our ping directly.
> > > +class EventRecord {
> > > +  constructor(category, method, object, value, extra = null) {
> > > +    // XXX - should do basic validation of the params.
> > > +    this.ts = tryGetMonotonicTimestamp();
> > 
> > You should `Math.floor()` this, sub-millisecond resolution is probably not
> > needed (Telemetry will only submit positive integers).
> 
> I notice the google doc with the spec mentions "A resolution of seconds
> should be sufficient and lessen privacy concerns." - should I be reducing
> the precision of this accordingly, or is milli-second resolution OK?
> Relatedly, the spec doesn't seem to mention what unit and origin the
> timestamp is.

I increased the resolution to milliseconds because of platform team needs.
Floored milliseconds are what we will use and the spec needs more updating.
Flags: needinfo?(gfritzsche)
(In reply to Mark Hammond [:markh] from comment #23)
> Georg, please let me know when you think the time is right to formally
> request review of this and get it landed?

We will have discussions about the storage impact this week, which is the last point that could result in any change in the format.
Lets go for review of this on monday.
Assignee

Comment 31

3 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #30)
> We will have discussions about the storage impact this week, which is the
> last point that could result in any change in the format.
> Lets go for review of this on monday.

Sounds great! In case it is relevant, I think we will end up using null for |value| and an object (2 keys, both string GUID values) as |extra|) - see bug 1314879.
Whiteboard: [send-tab] → [send-tab][send-tab-funnel]
Assignee

Comment 32

3 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #30)
> We will have discussions about the storage impact this week, which is the
> last point that could result in any change in the format.
> Lets go for review of this on monday.

Georg, what was the outcome of that discussion? I'm also wondering if, now that bug 1302662 has landed, if there are any other changes you would like to see to this patch, or if I can put it up for final review (I've locally swapped the order of 2 params, but other than that, the patch is unchanged from the earlier version I posted before)
Flags: needinfo?(gfritzsche)
(In reply to Mark Hammond [:markh] from comment #32)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #30)
> > We will have discussions about the storage impact this week, which is the
> > last point that could result in any change in the format.
> > Lets go for review of this on monday.
> 
> Georg, what was the outcome of that discussion? I'm also wondering if, now
> that bug 1302662 has landed, if there are any other changes you would like
> to see to this patch, or if I can put it up for final review

This should be good to go.
The only last thing to consider is that i tuned the limits a little bit (see below), but that shouldn't really affect things here.
It would be good to apply sanity-limits to at least the value & extra string lengths though.

* category: maximum string length: 30
* method: maximum string length: 20
* object: maximum string length: 20
* value: maximum string length: 80
* extra:
  - maximum key count: 10
  - maximum key string length: 15
  - maximum value string length: 80
Flags: needinfo?(gfritzsche)
Comment hidden (mozreview-request)
Assignee

Comment 36

3 years ago
mozreview-review
Comment on attachment 8817237 [details]
Bug 1289536 (part 1) - Add events to Sync ping.

https://reviewboard.mozilla.org/r/97606/#review97930

::: services/sync/modules/telemetry.js:132
(Diff revision 1)
>    return -1;
>  }
>  
> +// This function validates the payload of a telemetry "event" - this can be
> +// removed once there are APIs available for the telemetry modules to collect
> +// these events (bug XXXXXXX) - but for now we simulate that planned API as best

Georg, is there an existing bug that describes this, or should I open a new one?

Comment 37

3 years ago
mozreview-review
Comment on attachment 8817238 [details]
Bug 1289536 (part 2) - Add a unique flowID GUID to each command sent via the clients collection.

https://reviewboard.mozilla.org/r/97608/#review98204

::: services/sync/modules/engines/clients.js:494
(Diff revision 1)
>      }
>  
>      let action = {
>        command: command,
>        args: args,
> +      flowID: flowId || Utils.makeGUID(), // used for telemetry.

Can we s/flowId/flowID throughout? This distinction is likely to cause human error.

::: services/sync/modules/engines/clients.js:498
(Diff revision 1)
>        args: args,
> +      flowID: flowId || Utils.makeGUID(), // used for telemetry.
>      };
>  
>      this._log.trace("Client " + clientId + " got a new action: " + [command, args]);
>      this._addClientCommand(clientId, action);

`_addClientCommand` can be a no-op, but you're still recording telemetry as if a new command was sent. Sending the same tab twice will make it look like success rate is 50% lower.

You should add a return value to `_addClientCommand` and check it, bailing out before recording telemetry if the command wasn't queued.

::: services/sync/modules/engines/clients.js:500
(Diff revision 1)
>      };
>  
>      this._log.trace("Client " + clientId + " got a new action: " + [command, args]);
>      this._addClientCommand(clientId, action);
>      this._tracker.addChangedID(clientId);
> +    let deviceId = this.service.identity.hashedDeviceID(clientId);

`hashedDeviceID` can throw if we don't have a token for the user, which will cause an error here. Don't let telemetry cause a failure!

::: services/sync/modules/engines/clients.js:525
(Diff revision 1)
>        // Process each command in order.
>        for (let rawCommand of commands) {
> -        let {command, args} = rawCommand;
> +        let {command, args, flowID} = rawCommand;
>          this._log.debug("Processing command: " + command + "(" + args + ")");
>  
> +        this.service.recordTelemetryEvent("processcommand", command, undefined,

I can see the arguments to `recordTelemetryEvent` getting confused. Is there a reason not to use an object as the argument instead of positional arguments?

::: services/sync/modules/engines/clients.js:579
(Diff revision 1)
>     *        Command to invoke on remote clients
>     * @param args
>     *        Array of arguments to give to the command
>     * @param clientId
>     *        Client ID to send command to. If undefined, send to all remote
>     *        clients.

```
@param flowId
       A unique identifier used to track success for this operation across devices.
```

::: services/sync/modules/engines/clients.js:600
(Diff revision 1)
>      if (clientId) {
> -      this._sendCommandToClient(command, args, clientId);
> +      this._sendCommandToClient(command, args, clientId, flowId);
>      } else {
>        for (let [id, record] of Object.entries(this._store._remoteClients)) {
>          if (!record.stale) {
> -          this._sendCommandToClient(command, args, id);
> +          this._sendCommandToClient(command, args, id, flowId);

You're using the same flowId for multiple clients. When each displays, you'll record a `processcommand` telemetry event for the same flow. Is that intended?
Attachment #8817238 - Flags: review?(rnewman) → review-
Assignee

Comment 38

3 years ago
(In reply to Richard Newman [:rnewman] from comment #37)
> Can we s/flowId/flowID throughout? This distinction is likely to cause human
> error.

heh - I actually used flowId here as I guessed you would have asked me to change it to "flowId" had I used "flowID", as this code tends to use Id locally (eg, clientId), even though our telemetry code already uses ID :)

So done - flowID is now used exclusively here.

> >      this._log.trace("Client " + clientId + " got a new action: " + [command, args]);
> >      this._addClientCommand(clientId, action);
> 
> `_addClientCommand` can be a no-op, but you're still recording telemetry as
> if a new command was sent. Sending the same tab twice will make it look like
> success rate is 50% lower.
> 
> You should add a return value to `_addClientCommand` and check it, bailing
> out before recording telemetry if the command wasn't queued.

Good point - if it returns false it also seems unnecessary to add the clientId to the tracker, so I also did that.

> `hashedDeviceID` can throw if we don't have a token for the user, which will
> cause an error here. Don't let telemetry cause a failure!

Fair enough - I now catch an exception and just don't record the deviceID in that case (although I don't believe it can actually throw in practice - it will only throw if we never once had a token, and I don't believe we could get here in that case)

> I can see the arguments to `recordTelemetryEvent` getting confused. Is there
> a reason not to use an object as the argument instead of positional
> arguments?

I wrote it this way to mirror the equivalent nsITelemetry method (https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/nsITelemetry.idl#488) that we will eventually end up using once it is ready and capable of recording an event for a ping other than the main ping - this method is really just temporary until that time. I didn't change this, but can if you or Georg feel strongly enough about it.

> @param flowId
>        A unique identifier used to track success for this operation across
> devices.

Doh - yeah, thanks.

> You're using the same flowId for multiple clients. When each displays,
> you'll record a `processcommand` telemetry event for the same flow. Is that
> intended?

Yes - each event will record a different device ID, so I think this makes sense (eg, it will allow us to track that the command did indeed make it to each device).

Thanks for the review - new patch forthcoming.
Comment hidden (mozreview-request)
Assignee

Updated

3 years ago
Attachment #8817237 - Attachment is obsolete: true
Attachment #8817237 - Flags: review?(gfritzsche)
Comment hidden (mozreview-request)
Assignee

Updated

3 years ago
Attachment #8808063 - Attachment is obsolete: true

Comment 41

3 years ago
mozreview-review
Comment on attachment 8817238 [details]
Bug 1289536 (part 2) - Add a unique flowID GUID to each command sent via the clients collection.

https://reviewboard.mozilla.org/r/97608/#review99120
Attachment #8817238 - Flags: review?(rnewman) → review+
Summary: Add UI telemetry for send tab → Desktop: Add event telemetry for send tab

Comment 42

3 years ago
mozreview-review
Comment on attachment 8817714 [details]
Bug 1289536 (part 1) - Add events to Sync ping.

https://reviewboard.mozilla.org/r/97934/#review103310

Sorry for the long delay, this slipped over the holiday and travel season.

The event recording and serialization looks good to me!

This seems to be still missing a documentation update for `toolkit/components/telemetry/docs/data/sync-ping.rst`, including the recorded event data.
Is this coming in a separate patch? The doc update will be required for data collection review anyway.
The doc should include the recorded event method, object, etc. In the future this should be easier to do using a standard format like Events.yaml.

::: services/sync/modules/telemetry.js:546
(Diff revision 1)
>      }
>    }
>  
> +  _recordEvent(eventDetails) {
> +    if (this.events.length >= this.maxEventsCount) {
> +      log.warn("discarding event - already queued our maximum", topic);

`topic` seems undefined here?

::: services/sync/modules/telemetry.js:558
(Diff revision 1)
> +    }
> +    log.debug("recording event", eventDetails);
> +
> +    let { object, method, value, extra } = eventDetails;
> +    let category = "sync";
> +    let ts = Math.floor(tryGetMonotonicTimestamp()); // 1-second resolution for privacy

The comment is outdated? This is recording ms now.
Attachment #8817714 - Flags: review?(gfritzsche) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

3 years ago
Depends on: 1329530
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 47

3 years ago
(In reply to Mark Hammond [:markh] from comment #36)
> Comment on attachment 8817237 [details]
> Bug 1289536 (part 1) - Add events to Sync ping.
> > +// This function validates the payload of a telemetry "event" - this can be
> > +// removed once there are APIs available for the telemetry modules to collect
> > +// these events (bug XXXXXXX) - but for now we simulate that planned API as best
> 
> Georg, is there an existing bug that describes this, or should I open a new
> one?

I opened bug 1329530 and updated the comment.

(In reply to Georg Fritzsche [:gfritzsche] from comment #42)
> This seems to be still missing a documentation update for
> `toolkit/components/telemetry/docs/data/sync-ping.rst`, including the
> recorded event data.
> Is this coming in a separate patch?

I updated this patch with those docs.

> The doc update will be required for data
> collection review anyway.

Comment 12 gave us data collection review.

> `topic` seems undefined here?

Fixed.

> The comment is outdated? This is recording ms now.

I removed the comment.

Thanks.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 50

3 years ago
Pushed by mhammond@skippinet.com.au:
https://hg.mozilla.org/integration/autoland/rev/30375e031e09
(part 1) - Add events to Sync ping. r=gfritzsche
https://hg.mozilla.org/integration/autoland/rev/3cb6640378ea
(part 2) - Add a unique flowID GUID to each command sent via the clients collection. r=rnewman
No longer depends on: 1329530

Comment 51

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/30375e031e09
https://hg.mozilla.org/mozilla-central/rev/3cb6640378ea
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
Assignee

Updated

2 years ago
Depends on: 1338439
You need to log in before you can comment on or make changes to this bug.