Closed Bug 1571584 Opened 1 year ago Closed 9 months ago

Ensure appropriate metrics are emitted when signed in to Firefox, but not using Sync

Categories

(Firefox :: Firefox Accounts, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: rfkelly, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

As noted in Bug 1570565 Comment 2, we currently depend on the cert_signed metrics event for measuring signed-in user activity, and that event is more reliably triggered by the browser when it syncs. In a world where you can be signed in to Firefox without necessarily syncing, we need to ensure that we don't lose visibility into account-related activity.

I expect that Firefox will still generate cert_signed metrics events on a regular basis for signed-in users, due to periodic background refresh of profile information. But this could also be an opportunity to take a fresh look at how we plan to measure things in this new world, and ensure that the metrics are in place to support it.

For example, how will we differentiate between users who are and are not syncing? Should we include new information in the "sync ping" for other services being used in the browser? Should it stop being called the "sync ping" and start being called the "accounts ping"?

Alex, Ana, Leif, please take a moment to think about the metrics events we would want to trigger here, so that we can build them in as part of the initial engineering work.

Flags: needinfo?(amedinac)
Flags: needinfo?(adavis)

In the long term, we'd like to measure user engagement on a per-service basis. However, as a first step we'd like to:

  1. [Edited] Add measurements to the "main" ping that record users connecting to or disconnecting from each service:
     • Proxy
     • Send
     • Monitor
     • Sync
     • Maybe VPN
  2. Record when (the timestamp) the user signs in or out.

Here are the meeting notes: https://docs.google.com/document/d/1JXmUgimZ7XaEVX5IkNS2fUlbS1GTDdo5EKzejmaFoGA/edit

Flags: needinfo?(amedinac)

Add measurements to the "main" ping that record users connecting to or disconnecting from each service
[...]

  • Send

I don't think it's meaningful to speak of "connecting" or "disconnecting" Send in the browser as something different to signing in/out of Send on the web. What do we want to learn here that we don't already know from the existing Send OAuth metrics? Is it specifically about whether users are using Send in a signed-in Firefox, and being able to tie that back to other things they're doing in that browser?

  • Monitor

There isn't any UI for the user to disconnect Monitor from within the browser; if you view about:protections and you're enabled in Monitor-the-service, it will show your Monitor report.

Flags: needinfo?(adavis)

Sorry for the delay.

But this could also be an opportunity to take a fresh look at how we plan to measure things in this new world, and ensure that the metrics are in place to support it.

Yes, I think it's the right opportunity. The choice may be to not do a single thing now and to postpone but I think we should reflect on that. Why postpone? I'll elaborate below. (think ecosystem telemetry)

how will we differentiate between users who are and are not syncing?

I have no idea and I'd like us to be able to differentiate. In the short term, however, there won't be more services so it might buy us some time for the ideal solution.

Should we include new information in the "sync ping" for other services being used in the browser? Should it stop being called the "sync ping" and start being called the "accounts ping"?

I think Mark proposes in bug 1580342 to pass the sync ping for all account users. I think that this would certainly allow us to continue to get Send Tab telemetry even once it's decoupled from Sync but it's unclear to me if there are other short term benefits that we wouldn't get from the main ping or account telemetry. Are there?

Longer-term, it's a valid question. Should Sync ping become the accounts ping where we pass other service telemetry? This is where I think we start to unpack all of the complexity of the projects happening in parallel. Leif and I are working on the ecosystem-telemetry which many have proposed to call "account telemetry" (but let's forget that name for now to avoid confusion). In Q4, we plan to pass our first events to the ecosystem telemetry starting with Lockwise.

How does ecosystem telemetry impact sync telemetry? Should it deprecate it? Are they complementary? If so what are their differences?

There's one last thing we didn't cover. Main telemetry. There's currently a flag to indicate if a user has sync enabled or not. How does that change?

To recap, we need to revise:

  • Main ping sync flag with the decoupling
  • Sync ping (now vs later with ecosystem telemetry)
  • FxA telemetry with cert_signed and telling users apart

Leif, thoughts? This scope is getting pretty big and it seems like the time to lay out the long-term strategy for all of this.

Flags: needinfo?(loines)

(In reply to Alex Davis [:adavis] [PM FxA+Sync] from comment #3)

I think Mark proposes in bug 1580342 to pass the sync ping for all account users. I think that this would certainly allow us to continue to get Send Tab telemetry even once it's decoupled from Sync but it's unclear to me if there are other short term benefits that we wouldn't get from the main ping or account telemetry. Are there?

Beyond "send tab" I can't think of anything.

Longer-term, it's a valid question. Should Sync ping become the accounts ping where we pass other service telemetry?

That was roughly my idea, yes.

How does ecosystem telemetry impact sync telemetry? Should it deprecate it? Are they complementary? If so what are their differences?

That's a great question and really is the crux - are there more details on what the "ecosystem telemetry" is in practice? I've only heard of it being mentioned as an aspiration rather than a concrete proposal.

If it really is still at the "aspirational" phase, it probably makes sense to abuse the "sync ping" in the meantime; otherwise we run the risk of ecosystem telemetry not landing until 2020-Q2 and having a large gap in our understanding until then.

There's one last thing we didn't cover. Main telemetry. There's currently a flag to indicate if a user has sync enabled or not. How does that change?

It doesn't need to change, although it probably should - bug 1238810 exists because it's not currently useful :(

Adding :m_and_m since he has a deep and abiding interest in "ecosystem telemetry".

Late to the party, but here's my 2c.

I'll start with main telemetry. Ideally:

  1. We should ensure the current histogram FXA_CONFIGURED continues to reflect whether a user is authenticated with FxA in the browser, regardless of whether they have any services connected.
  2. We should consider adding a couple probes to main telemetry. The first should be something like a keyed histogram of services a user is currently authenticated with e.g. FXA_SERVICES_AUTHENTICATED: { sync: TRUE, monitor: FALSE, ... }. I get that this might not be possible, see below.
  3. The second should be an event that fires when a user disconnects or connects a service. E.g. event action = connect/disconnect, event_object = {service_name}.
  4. If we can't have 2 or 3 we should fix SYNC_CONFIGURED so it accurately reports TRUE or FALSE for all users. The fill rate (rate that it is non-null) on that probe makes it next to useless currently. Maybe we should just fix this anyway - I know Mark tried at one point but I think he was blocked by the telemetry folks for some arcane reason.
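As a rough illustration of items 2 and 3, the sketch below models the two proposed probes in plain Python. It is not Firefox telemetry code: the class ServiceTelemetry, its methods, the service list and the record layout are all hypothetical. Only the probe name FXA_SERVICES_AUTHENTICATED and the connect/disconnect actions come from the proposal above.

```python
import time
from dataclasses import dataclass, field

# Hypothetical service list; the thread mentions these services by name.
KNOWN_SERVICES = ("sync", "monitor", "send", "proxy")


@dataclass
class ServiceTelemetry:
    """Toy model of the two proposed probes (not a real Firefox API)."""

    # Item 2: keyed boolean probe, one key per service.
    services_authenticated: dict = field(
        default_factory=lambda: {name: False for name in KNOWN_SERVICES}
    )
    # Item 3: timestamped connect/disconnect events.
    events: list = field(default_factory=list)

    def _record(self, action, service, timestamp):
        if service not in self.services_authenticated:
            raise ValueError(f"unknown service: {service}")
        ts = time.time() if timestamp is None else timestamp
        # Keep the keyed probe in step with the latest event.
        self.services_authenticated[service] = action == "connect"
        self.events.append({"timestamp": ts, "action": action, "object": service})

    def connect(self, service, timestamp=None):
        self._record("connect", service, timestamp)

    def disconnect(self, service, timestamp=None):
        self._record("disconnect", service, timestamp)

    def snapshot(self):
        """Payload fragment a ping could carry; it contains no FxA uid."""
        return {
            "FXA_SERVICES_AUTHENTICATED": dict(self.services_authenticated),
            "events": list(self.events),
        }
```

The keyed probe answers "which services is this profile connected to right now?", while the event list keeps the ordering needed for questions like "did the sign-in happen before or after a doorhanger was shown?".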

Why would we want 2 & 3? Because there have been a lot of questions from the desktop side about what types of users sign into these services. E.g. are users who save a lot of passwords in the browser more likely to sign into monitor? This is part of what ecosystem telemetry is supposed to get us, but if we had these probes in main telemetry (without logging the FxA uid anywhere) then answering these questions starts to become possible (and from an analysis perspective, perhaps even easier). People will progressively begin to ask more and more questions of the nature "is measure X associated with more users signing into Y service?", I am quite sure of this. Finally, I suggest (3) because events have timestamps that allow us to know, relative to other events, when a user signs in (e.g. did they sign in before or after a doorhanger promoting monitor?).

However, based on what Ryan says above, is it true that, even after decoupling, if a user signs into send then nothing will change in the browser? If that's true, then maybe we can't do 2 or 3, and we will have to wait until ecosystem telemetry to answer the previous questions. But to the extent we can do 2 and/or 3, I think we should.

On FxA telemetry:

We want to know for each user, separately, if they are connected to the browser and/or sync. We do this currently with the service property, which for all non-sync services is a mapping from oauth client id. In the ideal world I think we would continue to have sync as a value for service, reflecting only users who have sync configured. Perhaps then we add an additional firefox service. All users who have sync as a service should also have firefox, but not vice-versa. I'm hazy on the technical hurdles here, but this would be the ideal state.
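To make the intended relationship between the two service values concrete, here is a minimal sketch. The function names and the two boolean inputs are hypothetical; only the values sync and firefox, and the rule that sync implies firefox but not the reverse, come from the paragraph above.

```python
def services_for(signed_in_to_browser, sync_configured):
    """Return the set of `service` values to attribute to one browser user."""
    if sync_configured and not signed_in_to_browser:
        # Sync requires an account, so this combination should never occur.
        raise ValueError("sync cannot be configured without being signed in")
    services = set()
    if signed_in_to_browser:
        services.add("firefox")
    if sync_configured:
        services.add("sync")
    return services


def satisfies_invariant(services):
    """Every user with `sync` must also have `firefox`, but not vice versa."""
    return "sync" not in services or "firefox" in services
```

Under this scheme a signed-in, non-syncing user is counted under firefox only, which is the group the current sync-driven metrics risk losing.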

Finally, we should continue to track service for all other reliers via FxA server-side measurements in the same way we do today to avoid any data discontinuities.

On sync telemetry:

I would be ok with sending the sync ping for all FxA users, though if we instrument the main telemetry probes above then the only real reason to do this would be to track send tab telemetry for non-sync users (as noted above), which I believe we definitely want to do in any case. Edit: elaborating on this point a bit more, we want accurate telemetry on send tab success and failure rates, including:

  1. a tab sent event with the device_id of initiating device, timestamp, random id identifying the sending event, list of target device_ids, and number of tabs sent
  2. a tab received event logged by each receiving device including its device_id, the same random id from 1, number of tabs received, and if possible the device id of the sending device.
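The two events above could look roughly like the following. This is a sketch only: the field names (flow_id, targets, tab_count and so on) and the helper functions are hypothetical and do not describe the existing send tab telemetry. The point it illustrates is that the shared random id is what lets an analyst compute success and failure rates per send.

```python
import time
import uuid


def make_tab_sent_event(sender_device_id, target_device_ids, tab_count, now=None):
    """Event 1: logged once by the initiating device."""
    return {
        "type": "tab_sent",
        "flow_id": uuid.uuid4().hex,  # random id identifying this send
        "device_id": sender_device_id,
        "timestamp": time.time() if now is None else now,
        "targets": list(target_device_ids),
        "tab_count": tab_count,
    }


def make_tab_received_event(receiver_device_id, flow_id, tab_count,
                            sender_device_id=None, now=None):
    """Event 2: logged by each receiving device, reusing the sender's flow_id."""
    return {
        "type": "tab_received",
        "flow_id": flow_id,
        "device_id": receiver_device_id,
        "timestamp": time.time() if now is None else now,
        "sender_device_id": sender_device_id,  # optional, "if possible"
        "tab_count": tab_count,
    }


def delivery_report(sent, received_events):
    """Join received events to one sent event and list unacknowledged targets."""
    acked = {
        e["device_id"] for e in received_events if e["flow_id"] == sent["flow_id"]
    }
    targets = set(sent["targets"])
    return {
        "delivered": sorted(targets & acked),
        "missing": sorted(targets - acked),
    }
```

For example, a send to two devices where only one logs a matching tab_received event would show the other device under "missing".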

We in theory have that information in today's send tab telemetry (except maybe number of tabs sent?). If it makes sense to continue using the sync ping to log these events, even for users who are now send-tab only (no formal sync), then we should continue to do that. However, depending on the technicals, we could also move this telemetry to "main" event telemetry, and somehow obfuscate the device_ids with a client-side secret so that they cannot be joined on FxA telemetry data.
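One way the "obfuscate the device_ids with a client-side secret" idea could work is a keyed hash (HMAC). The sketch below assumes HMAC-SHA256 and a randomly generated 32-byte secret that never leaves the client; both choices and both function names are assumptions for illustration, not something decided in this bug.

```python
import hashlib
import hmac
import secrets


def new_client_secret():
    """Generate a random secret that is stored locally and never uploaded."""
    return secrets.token_bytes(32)


def obfuscate_device_id(device_id, client_secret):
    """Return a stable pseudonym for device_id under this secret.

    Without the secret, the pseudonym cannot be recomputed from the raw
    device_id, so it cannot be joined against FxA server-side data.
    """
    return hmac.new(
        client_secret, device_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

One open design question with this approach: if each client has its own secret, the sender and the receiver will produce different pseudonyms for the same device, so matching target device ids across devices would need either a secret shared between a user's devices or reliance on the random per-send id alone.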

On ecosystem telemetry

We have an ambitious goal for lockwise mobile & desktop to be the first consumer of this by EOY. So it is very "real", but the first iterations of it will not allow us to answer questions that involve correlating services used to desktop measurements (similar to those posed above). Matt is the one taking the charge on that now. For the immediate future, my gut is telling me that if we have the additional desktop measurements, then we don't have to worry too much about this ATM. Note that having those measurements will allow us to answer some, but not ALL of the questions that ecosystem telemetry should in theory allow us to answer. It won't allow us to answer things about cross-device usage, for example.

I typed all that out very quickly - feel free to ask for clarification.

Flags: needinfo?(loines) → needinfo?(rfkelly)
Depends on: 1238810
Depends on: 1582253
Depends on: 1582263

However, based on what Ryan says above, is it true that, even after decoupling, if a user signs into send then nothing will change in the browser?

For now, yes.

We could cheat here though. We do have a privileged communication channel between accounts.firefox.com and the browser, so we could arrange for FxA to notify the browser whenever a user signs into an FxA RP on the web, and we could have the browser report that in main telemetry. I'm not sure how we'd feel about that from a security or data-principles perspective, but if it sounds worth pursuing, I'm happy to propose something more concrete.

Perhaps then we add an additional firefox service

Ref Bug 1582256 Comment 3 for my take on handling this; broadly I agree this makes sense, but I think we should OAuth-ify it while we're here.

I think the tricky bit actually will be arranging to associate users with the sync service in the case where they enabled it after logging in. My ramblings in Bug 1577690 Comment 7 are in support of that goal. Leaving ni? myself here to come back to that in more detail later today.

In the ideal world I think we would continue to have sync as a value for service, reflecting only users who have sync configured.

So one thing that's not clear to me is what, if anything, we need to do about users who turn sync off. Those users will still have sync in their list of fxa_services_used, because that's a historical record of everything they've ever used. But we currently won't do anything to tell the FxA servers that the user has disabled sync; they'll just stop seeing account.signed events with service=sync. Does that seem OK?
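If the servers simply stop seeing these events, "currently syncing" would have to be inferred from recency instead of from fxa_services_used. A minimal sketch of that inference follows; the event record layout, the function name and the 28-day window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def active_sync_users(events, now, window=timedelta(days=28)):
    """Return uids with an account.signed event for service=sync in the window.

    fxa_services_used is a historical record, so it cannot tell us who has
    since turned sync off; recent events are used as a proxy instead.
    """
    cutoff = now - window
    return {
        e["uid"]
        for e in events
        if e["event"] == "account.signed"
        and e["service"] == "sync"
        and e["timestamp"] >= cutoff
    }
```

A user who disabled sync would drop out of this set once their last service=sync event ages past the window, without any explicit "sync disabled" signal reaching the server.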

Leaving ni? myself here to come back to that in more detail later today.

More detailed commentary in https://bugzilla.mozilla.org/show_bug.cgi?id=1582256#c4

Flags: needinfo?(rfkelly)

If we can't have 2 or 3 we should fix SYNC_CONFIGURED so it accurately reports TRUE or FALSE for all users. The fill rate (rate that it is non-null) on that probe makes it next to useless currently. Maybe we should just fix this anyway - I know Mark tried at one point but I think he was blocked by the telemetry folks for some arcane reason.

Mark, is this covered by the work in Bug 1238810, or is it something we should be tracking separately?

Flags: needinfo?(markh)

(In reply to Ryan Kelly [:rfkelly] from comment #10)

Mark, is this covered by the work in Bug 1238810, or is it something we should be tracking separately?

Yes, that bug fixes SYNC_CONFIGURED and FXA_CONFIGURED to be both meaningful and reliable.

Flags: needinfo?(markh)

The only remaining work here is to monitor the probes as the release rolls out, so I'm going to leave that smaller bug open then close this one out.

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED