Open Bug 1582253 Opened 11 months ago Updated 8 months ago

Monitor "fxa_configured" probe with sync decouple work

Categories

(Firefox :: Firefox Accounts, task)

task
Not set
normal


REOPENED

People

(Reporter: amedinac, Assigned: loines)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Currently, the "fxa_configured" probe is working correctly. We should monitor the probe once we implement the sync decouple telemetry to make sure it reports the same data as today and to avoid regression.

And specifically, this probe should always be TRUE if a user is signed into the browser (even if sync isn't configured). We will try to fix sync/weave_configured so that it accurately reports whether a user is syncing. That way the two probes in combination will allow us to know how many accounts are authenticated, and among those how many have sync on.
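As a rough illustration of how the two probes combine, here is a minimal Python sketch that counts authenticated clients and, among them, clients with sync on. It assumes every client reports two booleans, `(fxa_configured, sync_configured)`. The function name and input shape are hypothetical and are not part of Firefox's telemetry tooling.

```python
from typing import Iterable, Tuple


def count_accounts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Return (authenticated clients, authenticated clients with sync on).

    Each pair is (fxa_configured, sync_configured) for one client. A client
    counts as authenticated when fxa_configured is True; it counts as syncing
    only if it is also authenticated and sync_configured is True.
    """
    authenticated = 0
    syncing = 0
    for fxa_configured, sync_configured in pairs:
        if fxa_configured:
            authenticated += 1
            if sync_configured:
                syncing += 1
    return authenticated, syncing
```

For example, `count_accounts([(True, True), (True, False), (False, False)])` returns `(2, 1)`: two signed-in clients, one of which is syncing.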

Is this a change in the code or a change in how we are processing metrics? I think I'm just missing the context here.

If the decoupling affects when this probe is triggered or how reliable it is, then we should change the code so that it continues to reflect whether a user is signed into their account (regardless of sync). I have no idea whether the decoupling work will affect this probe, so we filed the issue just in case.

If we are confident that the probe will continue reporting its data accurately and that we don't have to make any changes, then we can close this.

Thom, can you please update the title of this bug to more accurately reflect the work you are doing, and add some description of what "done" means.

Flags: needinfo?(tchiovoloni)

I'm going to move this to the telemetry environment in https://bugzilla.mozilla.org/show_bug.cgi?id=1238810 so it's a moot point.

Status: NEW → RESOLVED
Closed: 10 months ago
Flags: needinfo?(tchiovoloni)
Resolution: --- → INVALID

Actually, it's probably still worth monitoring, but that would be on :loines.

Status: RESOLVED → REOPENED
Resolution: INVALID → ---

So this can't be sanely done until bug 1238810 lands, but as Thom says, this falls on Leif, so assigning to him.

Assignee: nobody → loines
Depends on: 1238810
Depends on: 1593773

:loines update?

Flags: needinfo?(loines)

Waiting on this to be merged and the data backfilled (we will have data starting ~2019-10-25). Then I will make a few charts for us to reference and get a baseline before 71 goes out.

Flags: needinfo?(loines)

I put together this small dashboard, https://sql.telemetry.mozilla.org/dashboard/new-fxa-sync-probe-location, to show what values are being returned for the new locations of fxa_configured and sync_configured within the telemetry environment (Firefox 71 beta only).

Thom, would you mind taking a look at that and confirming (or disconfirming) the following:

  1. if environment.services.(fxa|sync)_configured is null (e.g. not set by the client) can we generally assume that the client does not have fxa or sync setup? do you know of any cases where a client might not set the probe to true but might actually have one or the other turned on?

  2. related, will a client only explicitly set the sync probe to false in the case where they have an account configured but sync turned off (in other words, if both fxa and sync are turned off then the value of environment.services.sync_configured will be null, but if a client has an account turned on but sync turned off then value will be false)?

  3. related, the proportion of clients with environment.services.sync_configured = false where environment.services.fxa_configured = true seems lower than I expected, though not so low as to be inconceivable. Are there any edge cases, e.g. where maybe the client has the browser open for less than the time it takes to setup sync, where a client should have the sync probe set to true but it might get reported as false?

lmk if that is hard to parse.

Flags: needinfo?(tchiovoloni)

I also looked at some random clients who reported true for these probes for 1 subsession and found that most of them also reported true for the previous subsession, suggesting that we no longer have the problem of the probe only being set to true on the subsession that fxa was first turned on. https://sql.telemetry.mozilla.org/queries/66667/source
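The subsession check above can be expressed roughly as follows. This is a hypothetical Python sketch over an in-memory mapping, not the linked SQL query; it assumes each client's `fxa_configured` values are already ordered from oldest to newest subsession.

```python
from typing import Dict, List


def previous_subsession_agreement(history: Dict[str, List[bool]]) -> float:
    """Fraction of 'true' subsessions whose preceding subsession was also true.

    `history` maps a client id to that client's fxa_configured values,
    ordered from oldest to newest subsession. A client's first subsession
    has no predecessor, so it is never counted.
    """
    matches = 0
    total = 0
    for values in history.values():
        for prev, curr in zip(values, values[1:]):
            if curr:
                total += 1
                if prev:
                    matches += 1
    return matches / total if total else 0.0
```

A value close to 1.0 is consistent with the probe no longer being true only on the subsession where FxA was first turned on; genuine new logins (false followed by true) pull it slightly below 1.0.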

(In reply to Leif Oines [:loines] from comment #10)

  1. if environment.services.(fxa|sync)_configured is null (e.g. not set by the client) can we generally assume that the client does not have fxa or sync setup?

IIUC, it should be "impossible" for these to be null; they should always be true or false. null probably means 71 or 72 builds from before the new probes landed. Now that 71 is in release, I believe it should be impossible for 71 builds on the release channel to have null. If that's not true, it's worth looking into.

do you know of any cases where a client might not set the probe to true but might actually have one or the other turned on?

I can't think of any cases except pathological ones (eg, badly damaged profiles, but not damaged enough to prevent some telemetry reporting).

  2. related, will a client only explicitly set the sync probe to false in the case where they have an account configured but sync turned off (in other words, if both fxa and sync are turned off then the value of environment.services.sync_configured will be null, but if a client has an account turned on but sync turned off then value will be false)?

The patch actually takes a shortcut - it first checks if sync is configured and if that's true, it doesn't check FxA, it just assumes it must be true. FxA is only explicitly checked if sync doesn't appear to be configured.
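The decision order described here can be modelled in a few lines. This Python sketch only illustrates the shortcut and is not the actual Firefox patch; `is_sync_configured` and `is_fxa_configured` are hypothetical stand-ins for whatever checks the patch performs.

```python
from typing import Callable, Tuple


def report_services(is_sync_configured: Callable[[], bool],
                    is_fxa_configured: Callable[[], bool]) -> Tuple[bool, bool]:
    """Return (fxa_configured, sync_configured) using the shortcut.

    If sync appears configured, FxA is assumed to be configured too and is
    never checked; FxA is only queried explicitly when sync is not configured.
    """
    if is_sync_configured():
        return True, True
    return is_fxa_configured(), False
```

Because FxA is not consulted when sync looks configured, anything that makes sync merely appear configured also makes FxA report true, which is why the about:config edge case affects both values.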

Again, there might be edge cases: eg, a user editing prefs in about:config could cause both sync and fxa to be reported as configured when neither actually is, but that should be rare. There might also be obscure "damaged profile" type scenarios here too, but they should be rare (and with the decoupled work, the UI should reflect the strangeness and allow the user to fix things up).

There should be no case where one is null and the other is a bool - they should either both be missing (ie, the 71/72 scenarios above) or both be bools.
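That invariant is straightforward to check in the data. A minimal Python sketch, assuming each record is a hypothetical `(client_id, fxa_configured, sync_configured)` tuple with `None` standing in for null:

```python
from typing import Iterable, List, Optional, Tuple

# (client_id, fxa_configured, sync_configured); None represents null.
Record = Tuple[str, Optional[bool], Optional[bool]]


def mismatched_nulls(records: Iterable[Record]) -> List[str]:
    """Return client ids where exactly one of the two probes is null."""
    return [client_id for client_id, fxa, sync in records
            if (fxa is None) != (sync is None)]
```

Any client id this returns would be worth investigating, since both values are expected to be null together or bools together.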

  3. related, the proportion of clients with environment.services.sync_configured = false where environment.services.fxa_configured = true seems lower than I expected, though not so low as to be inconceivable. Are there any edge cases, e.g. where maybe the client has the browser open for less than the time it takes to setup sync, where a client should have the sync probe set to true but it might get reported as false?

That should be impossible IIUC. However, I'm not at all surprised to find the number of users with FxA but without Sync to be extremely low, because the only sane way of getting into that state is via about:welcome and choosing to opt out of sync even though there's no other obvious thing of value offered by doing so.

lmk if I didn't answer what you asked!

(I'll clear thom's ni? - he can chime in if necessary)

Flags: needinfo?(tchiovoloni)

Thanks for the detailed response, Mark; this should be good documentation for this going forward. A few responses inline:

(In reply to Mark Hammond [:markh] from comment #12)

(In reply to Leif Oines [:loines] from comment #10)
If IIUC, it should be "impossible" for these to be null, they should always be true or false. null probably means 71 or 72 builds before the new probes landed. Now 71 is in release, I believe it should be impossible for 71 builds on the release channel to have null. If that's not true, it's worth looking in to.

So we only have 1 day of data so far for the release channel, but I'm still seeing null as the dominant value https://sql.telemetry.mozilla.org/queries/66709#169114
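Tracking the null share per channel and version makes this kind of check repeatable. The real analysis runs as SQL on sql.telemetry.mozilla.org; the Python below is only a sketch of the same aggregation over hypothetical `(channel, major_version, fxa_configured)` tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (channel, major_version, fxa_configured); None represents null.
Ping = Tuple[str, int, Optional[bool]]


def null_share(pings: Iterable[Ping]) -> Dict[Tuple[str, int], float]:
    """Share of pings per (channel, major version) whose fxa_configured is null."""
    nulls: Dict[Tuple[str, int], int] = defaultdict(int)
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for channel, version, fxa in pings:
        key = (channel, version)
        totals[key] += 1
        if fxa is None:
            nulls[key] += 1
    return {key: nulls[key] / totals[key] for key in totals}
```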

That should be impossible IIUC. However, I'm not at all surprised to find the number of users with FxA but without Sync to be extremely low, because the only sane way of getting into that state is via about:welcome and choosing to opt out of sync even though there's no other obvious thing of value offered by doing so.

Sorry, I should have been clearer about this. What I was getting at is that the number of users with FxA = true but Sync = False seems high, and that the proportion of FxA clients with Sync configured seems low. I'm just basing this on intuition: the ability of users to turn sync off but still be signed in is relatively new, so for 10+% of FxA users to be in this situation seems high. All that said, looking at our single day of release data, it's closer to 5% of FxA users not having sync, so maybe that's a more reasonable proportion (although we should figure out the null issue before we read into these numbers at all).

PS: I added the charts for the release channel to the dashboard:
https://sql.telemetry.mozilla.org/dashboard/new-fxa-sync-probe-location (you need to hover on the chart to see the values since there's only one day of data so far)

(In reply to Leif Oines [:loines] from comment #13)

So we only have 1 day of data so far for the release channel, but I'm still seeing null as the dominant value https://sql.telemetry.mozilla.org/queries/66709#169114

Yeah, sorry, I lied :) Re-looking at the patch, null is expected when both would be false - ie, if there's no FxA configured, both values will be null. Sorry for the confusion.

It should correctly detect a login though, so the values should become true on the very next ping after a login.
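With that correction, a reported pair can be decoded roughly as below. This Python sketch is hypothetical: it treats a `(None, None)` pair as "no account", per the re-reading of the patch, and treats an explicit `False` for FxA the same way, which is an assumption rather than something the patch confirms.

```python
from typing import Optional


def describe(fxa: Optional[bool], sync: Optional[bool]) -> str:
    """Interpret one client's reported (fxa_configured, sync_configured) pair.

    None stands for null. Both null means no FxA account is configured.
    """
    if (fxa is None) != (sync is None):
        return "unexpected"  # exactly one value null should not happen
    if not fxa:
        # Covers None and (by assumption) an explicit False for FxA.
        return "unexpected" if sync else "no account"
    return "account + sync" if sync else "account, no sync"
```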


Sorry I should have been more clear about this, what I was getting at is that the number of users with FxA = true but Sync = False seems high and that the proportion of FxA clients with Sync configured seems low. I'm just basing this on intuition: that the ability of users to turn sync off but still be signed in is relatively new, so for 10+% of FxA be in this situation seems high. All that said, looking at our single day of release data, its closer to 5% of FxA not having sync, so maybe that's a more reasonable proportion (although, we should figure out the null issue before we read into these numbers at all)

Huh - I'm inclined to think that 5% is high (and 10%+ seems clearly wrong) - but I think what I said about this is correct - once FxA reports true, I can't see how Sync could report the wrong value.

Could we correlate this with sync ping counts? ie, I'd expect the number of unique devices with sync=true in the main ping to be the same as the number of unique devices submitting sync pings per day.
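A rough sketch of that comparison for a single day, in Python. It assumes hypothetical inputs: main-ping `(client_id, sync_configured)` pairs and a list of device identifiers seen in sync pings. Since the two pings may not share an identifier, it only compares unique counts rather than joining the sets.

```python
from typing import Iterable, Optional, Tuple


def sync_device_counts(main_pings: Iterable[Tuple[str, Optional[bool]]],
                       sync_ping_devices: Iterable[str]) -> Tuple[int, int, float]:
    """Compare one day's unique sync=true devices (main ping) with sync-ping senders.

    Returns (main-ping devices with sync_configured True, unique sync-ping
    devices, ratio of the two). A ratio close to 1.0 is what we'd expect if
    the probe is accurate; NaN is returned when there are no sync pings.
    """
    main_count = len({cid for cid, sync in main_pings if sync is True})
    sync_count = len(set(sync_ping_devices))
    ratio = main_count / sync_count if sync_count else float("nan")
    return main_count, sync_count, ratio
```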

Another possible way to correlate might be via entry points in the server metrics - IIUC, the only way to currently end up with FxA=true and Sync=false is via about:welcome (or by explicitly disconnecting just sync after signing in to both, but that seems unlikely?), so if we can isolate that in server metrics it might help.

(In reply to Mark Hammond [:markh] from comment #14)

Yeah, sorry, I lied :) Re-looking at the patch, this will be true when both are false - ie, if there's no FxA configured, these values will both be null. Sorry for the confusion.

Oops - I also meant to add that a session lasting less than "a few" seconds may report null. Specifically, we should start initializing FxA within 2 seconds of browser start (and usually sooner), but a very underpowered or overloaded machine might take another second or two to actually initialize and report the state.

There's some more discussion of this in this GitHub issue.
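Before reading nulls as "no account", it may help to set aside very short sessions. The 5-second cut-off below is a hypothetical choice based on the timing described above (FxA initialization starting within about 2 seconds, plus a second or two on slow machines), and the `(session_seconds, fxa_configured)` input shape is an assumption.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical cut-off: ~2 s before FxA starts initializing, plus headroom
# for slow or overloaded machines to finish and report the state.
MIN_SESSION_SECONDS = 5.0

# (session length in seconds, fxa_configured); None represents null.
Session = Tuple[float, Optional[bool]]


def split_nulls(sessions: Iterable[Session],
                min_seconds: float = MIN_SESSION_SECONDS) -> Tuple[int, int]:
    """Split null fxa_configured sessions into (too short to trust, long enough)."""
    too_short = 0
    long_enough = 0
    for length, fxa in sessions:
        if fxa is None:
            if length < min_seconds:
                too_short += 1
            else:
                long_enough += 1
    return too_short, long_enough
```

Only the second count is a reasonable candidate for "no FxA account"; the first may simply reflect sessions that ended before FxA finished initializing.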
