Closed Bug 1097406 Opened 10 years ago Closed 9 years ago

Add telemetry for sync migration progress.

Categories

(Firefox :: Sync, defect)

defect
Not set
normal
Points:
5

Tracking

()

RESOLVED FIXED
Firefox 38
Iteration:
38.2 - 9 Feb
Tracking Status
firefox37 --- fixed
firefox38 --- fixed

People

(Reporter: markh, Assigned: markh)

References

Details

Attachments

(2 files, 4 obsolete files)

As part of migrating users to Fxa sync we want to capture some metrics.  Bug 1019395 is the meta bug for all such metrics (ie, including those that can be captured by the various servers).  For this bug I want to focus just on telemetry in the client.

For background: The migration will be driven by a state machine.  The states are, eg:
* waiting for the server to advertise a migration offer (ie, server sends "EOL" notifications)
* waiting for the user to create an FxA account.
* waiting for the user to click on the verification email for their account.
* waiting for an existing sync to finish
* waiting for us to write a migration sentinel
* aaand, we are done.

The UI will help the user through these states (eg, open about:accounts at the right time, etc), and in edge-cases the states may move backwards (eg, user logs out of their FxA account before they verify it).

I'd like to capture the progress of these states via telemetry.  Eg, "how many users saw the offer to upgrade but never made an account", or "how many didn't get past the 'verify your account' state, etc.

gps, benjamin: I'm looking for advice on how to best measure this in telemetry.  Eg, should I have one new probe name per state and count how many times we enter it?  Or maybe assign a number to each state and record the max number we reach?  Or something else entirely?

Note we have existing probes for "is this using legacy sync", and "are they using the default servers" - so we'd also like to be able to determine how many users with legacy sync saw that offer to upgrade (which once this is unthrottled, should be everyone using legacy sync against the default server) - I assume we can make these correlations - is that correct?
Flags: needinfo?(gps)
Flags: needinfo?(benjamin)
Are you going to want this for release users (FHR) or just prerelease (telemetry)? And what release is this scheduled for?

What kind of dashboard/reporting system are you going to use to monitor this? That might inform the design of the data: knowing "point-in-time" what states people are in is probably pretty different from knowing when people transition between states.
Flags: needinfo?(benjamin)
Flags: needinfo?(gps)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> Are you going to want this for release users (FHR) or just prerelease
> (telemetry)?

Release users are our focus, although pre-release will be important too.

> And what release is this scheduled for?

If all goes well, me may make 36 (with a few uplifts)

> What kind of dashboard/reporting system are you going to use to monitor
> this? That might inform the design of the data: knowing "point-in-time" what
> states people are in is probably pretty different from knowing when people
> transition between states.

TBH, I'm not sure and hope to get your advice on this too.
Flags: needinfo?(benjamin)
For that population and timeframe, you'll have to use the current FHR as your measurement system. Probably you can just one measurement to org.mozilla.sync.sync indicating the state-in-flow for the transition. See http://mxr.mozilla.org/mozilla-central/source/services/healthreport/docs/dataformat.rst#1573

You or some PM will need to decide on the product requirements for reporting; my main concern is to figure out who is going to be creating that dashboard and then who is responsible for monitoring it.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> You or some PM will need to decide on the product requirements for
> reporting; my main concern is to figure out who is going to be creating that
> dashboard and then who is responsible for monitoring it.

Ed, are you able to help here?
Flags: needinfo?(edwong)
Flags: qe-verify-
Flags: firefox-backlog+
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> For that population and timeframe, you'll have to use the current FHR as
> your measurement system. Probably you can just one measurement to
> org.mozilla.sync.sync indicating the state-in-flow for the transition. 

That sounds like a completely reasonable recommendation.

Timing information is a nice to have: how long has the user been in this state?

:markh, can we add these states?
- stopped syncing ("unlink old sync")
- user upgraded but device not attached ("join the party")
 
> You or some PM will need to decide on the product requirements for
> reporting; my main concern is to figure out who is going to be creating that
> dashboard and then who is responsible for monitoring it.

Edwin's driving the product requirements; I'm responsible for the querying/reporting. Our plan is to spot check (a few times) how many people are stuck. (We'll also have some server side data from FxA.)
:markh, there's one more state we might be missing... if the user is in the pre-verified state and chooses "forget" from the settings panel, what state are they in and are we counting it? If the user is then essentially in the "stop syncing" state, we should be covering that with the additions from https://bugzilla.mozilla.org/show_bug.cgi?id=1097406#c5

Does that sound right?
Flags: needinfo?(mhammond)
(In reply to Katie Parlante from comment #6)
> :markh, there's one more state we might be missing... if the user is in the
> pre-verified state and chooses "forget" from the settings panel, what state
> are they in and are we counting it?

I'm not sure I understand the question, but let me try and clarify:

During this entire migration process the "sync preferences" pane will continue to reflect legacy sync.  So while the infobar is up asking them to create an FxA account, and after they have done that but we are waiting for verification, sync is going to continue to reflect the legacy account.

During this time while the legacy account is shown in prefs, the user will always have the ability to "unlink this device" - which basically resets sync completely.  They could do that at any point during the migration. If they do that, migration is going to stop completely - we will notice that no user is currently signed into sync, so there is nothing to migrate.  The hamburger etc will then be in exactly the same state for any user without sync configured - they will be offered to "Sign in to sync" etc - the fact they previously had sync configured is lost.

We don't currently have plans to capture this happening.  We could if necessary, but it seems an unlikely scenario.  OTOH though, I guess if the user feels "trapped" at one state and can't work out how to complete the migration, they might try this as a last resort.

Does that answer the question?
Flags: needinfo?(mhammond)
My question is motivated by rfeeley's flow chart, in particular: https://www.dropbox.com/s/0l8aj6sjgjz0xbw/Sync%20Preverified.pdf and https://www.dropbox.com/s/mdiupkg15inwlzy/Sync%20Delete%20Old.pdf

I think Ryan and John are interested in knowing if people unlink/reset -- hopefully its unlikely, but good to know if we're wrong about that.
(In reply to Katie Parlante from comment #8)
> My question is motivated by rfeeley's flow chart, in particular:
> https://www.dropbox.com/s/0l8aj6sjgjz0xbw/Sync%20Preverified.pdf

So IIUC, that "Forget" button will sign the user out of FxA, and the migration process will just "step back" to "you need a FxA to migrate" - IOW, they should be then able to create a *different* FxA account.

I *hope* our telemetry will capture that to some degree - we should see the user entered the "needs fxa account" state multiple times (but yeah, more thought is needed to make sure we do)

> https://www.dropbox.com/s/mdiupkg15inwlzy/Sync%20Delete%20Old.pdf

That one is the "unlink" I mentioned before - the user will end up with no sync configured - and yeah, I suppose we do need to capture this.
I'll meet with Katie this week to write down what we need to resolve this.
Flags: needinfo?(edwong)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> For that population and timeframe, you'll have to use the current FHR as
> your measurement system. Probably you can just one measurement to
> org.mozilla.sync.sync indicating the state-in-flow for the transition. See
> http://mxr.mozilla.org/mozilla-central/source/services/healthreport/docs/
> dataformat.rst#1573

Thanks for the pointer - I think this patch covers most of our requirements here.  Most of the data is populated in this patch, but some other parts which come from the UI still needs to be done (and will probably happen in another patch).  This patch *creates* those fields, just doesn't populate them.

The patch captures the migration state in a FIELD_DAILY_LAST_TEXT, as we only care about the last reported state value for a day, not every value we transitioned through for that day.  But I'm only guessing that is the appropriate field type for this case (I couldn't find any docs on these FHR fields)

Benjamin, it would be great if you can sanity-check this patch, or delegate it to someone who can help us going forward.

> You or some PM will need to decide on the product requirements for
> reporting; my main concern is to figure out who is going to be creating that
> dashboard and then who is responsible for monitoring it.

I've started a conversation with John Jensen/Hamilton Ulmer about getting access to the existing dashboard repos so I can see what is involved here.
Assignee: nobody → mhammond
Status: NEW → ASSIGNED
Attachment #8532191 - Flags: feedback?(benjamin)
Comment on attachment 8532191 [details] [diff] [review]
0008-Bug-1097406-FHR-data-for-sync-migration.-r.patch

marking feedback+ based on reviewing the data design in dataformat.rst

delegating actual review to Georg
Attachment #8532191 - Flags: review?(georg.fritzsche)
Attachment #8532191 - Flags: feedback?(benjamin)
Attachment #8532191 - Flags: feedback+
Comment on attachment 8532191 [details] [diff] [review]
0008-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8532191 [details] [diff] [review]:
-----------------------------------------------------------------

f+ for now for open questions.
Not being familiar with the sync code, you may want to have someone else look over the FxaMigrator.jsm changes for sanity/completeness :)

Is it enough to have an xpcshell test of the provider here?
Do we need a mochitest-browser for a whole integration test or do we have other measures to ensure that things get triggered right here?

::: services/healthreport/docs/dataformat.rst
@@ +1645,5 @@
> +semi-automated process of migrating a legacy sync account to an FxA account.)
> +
> +Measurements will start being recorded after a migration is offered by the
> +sync server and stop after migration is complete or the user elects to "unlink"
> +their sync account.  IOW, it is expected that users with Sync setup for FxA or

Can we not use abbrevations like IOW please? (i had to look it up)

@@ +1652,5 @@
> +
> +Version 1
> +^^^^^^^^^
> +
> +Version 1 was introduced with Firefox 38 and includes the following properties:

We are currently on 37 - do you intend to hold the patch back until 38?

@@ +1662,5 @@
> +
> +internalState
> +   Corresponds to a STATE_INTERNAL_* string in FxaMigration.jsm.  This reflects
> +   a state where we are no longer waiting for the user, but waiting for internal
> +   processes to complete before the migration can complete.

So, are those two independent states or not?
Should they be two separate measurements or maybe one?

::: services/sync/modules/healthreport.jsm
@@ +62,5 @@
>      return Metrics.Storage.FIELD_DAILY_COUNTER;
>    },
>  });
>  
> +function SyncMigrationMeasurement1() {

Note that we don't necessarily need to integrate this here, see e.g.:
http://dxr.mozilla.org/mozilla-central/search?q=ExperimentsProvider&case=true

That would save us from introducing more notifications... but it looks like its most reasonable to intregrate with the existing sync provider here.

@@ +166,5 @@
> +        field = "userState";
> +        break;
> +      case "fxa-migration:internal-state-changed":
> +        field = "internalState"
> +        break;

Add default handling?

@@ +170,5 @@
> +        break;
> +    }
> +    let m = this.getMeasurement(SyncMigrationMeasurement1.prototype.name,
> +                                SyncMigrationMeasurement1.prototype.version);
> +    return this.enqueueStorageOperation(function recordMigration() {

We don't need explicit naming anymore with the js debugging improvements.
Attachment #8532191 - Flags: review?(georg.fritzsche) → feedback+
Hi Mark, should this bug be added to IT 37.2?  If yes, please provide a point value.  Thanks.
Flags: needinfo?(mhammond)
Comment on attachment 8532191 [details] [diff] [review]
0008-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8532191 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Georg Fritzsche [:gfritzsche] from comment #13)
> Not being familiar with the sync code, you may want to have someone else
> look over the FxaMigrator.jsm changes for sanity/completeness :)

Here's a preemptive r+ on the migrator and test parts of this patch, just to keep things moving...

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +98,5 @@
> +
> +    // in reality, the internal state will only be set when we also have a
> +    // user state, so notify one of them too.
> +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> +                                 fxaMigrator.STATE_USER_FXA_VERIFIED);

To avoid confusion for someone trying to understand how this all works, I think you should use the user state (i.e., non-internal) topic here.  The effect is the same, but the internal topic is only ever used for internal states in the migrator code.
Attachment #8532191 - Flags: review+
(In reply to Marco Mucci [:MarcoM] from comment #14)
> Hi Mark, should this bug be added to IT 37.2?  If yes, please provide a
> point value.  Thanks.

Done.  Making a conservative points estimate since there are still unknowns.
Iteration: --- → 37.2
Points: --- → 5
Flags: needinfo?(mhammond)
(In reply to Drew Willcoxon :adw from comment #15)
> ::: services/sync/tests/unit/test_healthreport_migration.js
> @@ +98,5 @@
> > +
> > +    // in reality, the internal state will only be set when we also have a
> > +    // user state, so notify one of them too.
> > +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> > +                                 fxaMigrator.STATE_USER_FXA_VERIFIED);
> 
> To avoid confusion for someone trying to understand how this all works, I
> think you should use the user state (i.e., non-internal) topic here.  The
> effect is the same, but the internal topic is only ever used for internal
> states in the migrator code.

Thanks Drew, but I don't understand what you mean here.  This test is just simulating the notifications the FxAMigrator module uses.  It sends both "public" and "internal" notifications to ensure that both are reflected in the data.  So only using the public user state will not cover those internal states.
Flags: needinfo?(adw)
Thanks Georg!

(In reply to Georg Fritzsche [:gfritzsche] from comment #13)

> Is it enough to have an xpcshell test of the provider here?
> Do we need a mochitest-browser for a whole integration test or do we have
> other measures to ensure that things get triggered right here?

That's a good question and one that I don't like the answer to :(  We are having alot of trouble with browser tests that use Sync, so it is likely we are going to settle on a "best effort" approach.  That said though, I do think we can change some of the tests Drew is working on to ensure additional coverage.  I'll update this patch once they land.

> We are currently on 37 - do you intend to hold the patch back until 38?

Oops, no - changed to 37.

> > +internalState
> > +   Corresponds to a STATE_INTERNAL_* string in FxaMigration.jsm.  This reflects
> > +   a state where we are no longer waiting for the user, but waiting for internal
> > +   processes to complete before the migration can complete.
> 
> So, are those two independent states or not?
> Should they be two separate measurements or maybe one?

Hrm - yeah, they can be a single field as the "user state" is always null when internal state is set.  I guess there's no good reason to keep redundant data and we can always change things should requirements change in the future.  This also makes the patch quite a bit simpler, so I changed it.

> That would save us from introducing more notifications... but it looks like 
> its most reasonable to intregrate with the existing sync provider here.

Thanks for the pointer, but I left this as it was.

All other comments addressed.

Drew: my needinfo is somewhat redundant with this new patch, so I'm replacing it with a feedback request :)
Attachment #8532191 - Attachment is obsolete: true
Flags: needinfo?(adw)
Attachment #8535399 - Flags: feedback?(georg.fritzsche)
Attachment #8535399 - Flags: feedback?(adw)
Comment on attachment 8535399 [details] [diff] [review]
0003-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8535399 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +52,5 @@
> +
> +    // We record both a "user" and "internal" state in the same field.
> +    // So simulate a "user" state first.
> +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> +                                 fxaMigrator.STATE_USER_FXA_VERIFIED);

Shouldn't this fire a "fxa-migration:state-changed"?
Attachment #8535399 - Flags: feedback?(georg.fritzsche) → feedback+
(In reply to Mark Hammond [:markh] from comment #18)
> Created attachment 8535399 [details] [diff] [review]
> 0003-Bug-1097406-FHR-data-for-sync-migration.-r.patch
> 
> Thanks Georg!
> 
> (In reply to Georg Fritzsche [:gfritzsche] from comment #13)
> 
> > Is it enough to have an xpcshell test of the provider here?
> > Do we need a mochitest-browser for a whole integration test or do we have
> > other measures to ensure that things get triggered right here?
> 
> That's a good question and one that I don't like the answer to :(  We are
> having alot of trouble with browser tests that use Sync, so it is likely we
> are going to settle on a "best effort" approach.  That said though, I do
> think we can change some of the tests Drew is working on to ensure
> additional coverage.  I'll update this patch once they land.

Thanks, that sounds good. While we could do manual coverage around landing, without integration tests i would be worried about these metrics breaking.

> > That would save us from introducing more notifications... but it looks like 
> > its most reasonable to intregrate with the existing sync provider here.
> 
> Thanks for the pointer, but I left this as it was.

Right, that was mostly me answering my own question.
Mark:

(In reply to Georg Fritzsche [:gfritzsche] from comment #19)
> ::: services/sync/tests/unit/test_healthreport_migration.js
> @@ +52,5 @@
> > +
> > +    // We record both a "user" and "internal" state in the same field.
> > +    // So simulate a "user" state first.
> > +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> > +                                 fxaMigrator.STATE_USER_FXA_VERIFIED);
> 
> Shouldn't this fire a "fxa-migration:state-changed"?

That's what I mean.  You're passing a user state but using the internal state topic on this particular line.  It shouldn't matter in practice, but it would be better to match the user state with the user state topic.
Attachment #8535399 - Flags: feedback?(adw) → feedback+
Iteration: 37.2 → 37.3
This is a new version of the patch that includes some other measurements.

Along with the "state" described in comment 0, this has additional fields:

* "accepted" - a count of how often the user accepted an offer to upgrade via our UI - or to put it another way, how often they clicked on the various migration-specific buttons.

* "declined" - a count of how often the user closed the migration infobar by clicking the "x" - that is, how often the user said "go away" to our migration UI.

* "unlinked" - conceptually a boolean flag that indicates the user opted do disable sync rather than upgrade.

Georg and Saptarshi, I'd like some feedback on the "types" here.  Specifically:

* The counters (accepted/declined) are DAILY_COUNTER_FIELD, but in practice we'd like to aggregate these counter fields across multiple sessions and days - in other words, we'd really like to know an absolute total for the counters - but there's no concept of a counter that's *not* a "daily counter".  Is this something our dashboards can do?

* Similarly, "unlinked" is LAST_NUMERIC_FIELD as this is a simple "flag" and we'd like to know that a profile set the flag regardless of what day.  If it's possible to aggregate the DAILY_COUNTER_FIELD then it might also make sense to make this a DAILY_LAST_NUMERIC field to simplify the reporting, as this would make all fields "daily".

* The "state" field is DAILY_LAST_TEXT_FIELD, and while daily values are valuable, we'd also like to make sure we can find the "most recent" value for this.

Saptarshi, John Jensen volunteered you to help us with this, particularly with a potential future dashboard.  We'd like some feedback on whether this patch can give us the data we desire, and later, to help us build a dashboard to report on this data.

Everyone: general feedback is welcome in addition to the above, although note that most of the "browser/" changes are more in Drew's domain, so feel free to leave those parts to him.

Thanks!
Attachment #8535399 - Attachment is obsolete: true
Attachment #8544373 - Flags: feedback?(sguha)
Attachment #8544373 - Flags: feedback?(gfritzsche)
Attachment #8544373 - Flags: feedback?(adw)
> * "accepted" - a count of how often the user accepted an offer to upgrade
> 
> * The counters (accepted/declined) are DAILY_COUNTER_FIELD, but in practice
> we'd like to aggregate these counter fields across multiple sessions and
> days - in other words, we'd really like to know an absolute total for the
> counters - but there's no concept of a counter that's *not* a "daily
> counter".  Is this something our dashboards can do?

It is possible(easy in fact) to aggregate these daily counters across a profiles history to get a grand sum of "declined" for a profile (and subsequently all profiles). The dashboard would use these computed numbers.


> 
> * Similarly, "unlinked" is LAST_NUMERIC_FIELD as this is a simple "flag" and
> we'd like to know that a profile set the flag regardless of what day.  If
> it's possible to aggregate the DAILY_COUNTER_FIELD then it might also make
> sense to make this a DAILY_LAST_NUMERIC field to simplify the reporting, as
> this would make all fields "daily".

I'm not sure of the programming logic of the fields, but  if this account is unlinked then the "unlinked" daily field ought to remain the same from that day onwards (unless the profile decides to link again in some future date -is this possible?). Hence for reporting purposes the last value would be considered.

> 
> * The "state" field is DAILY_LAST_TEXT_FIELD, and while daily values are
> valuable, we'd also like to make sure we can find the "most recent" value
> for this.
> 

My understanding is that if some modification is made to "org.mozilla.sync.migration" it will be stored in the data#days field of the FHR payload. Thus the most recent value would be the most recent day it was modified. This again would be need to be computed from the FHR field and fed to any dashboard.



> Saptarshi, John Jensen volunteered you to help us with this, particularly
> with a potential future dashboard.  We'd like some feedback on whether this
> patch can give us the data we desire, and later, to help us build a
> dashboard to report on this data.


My understanding of this this data will provide at a daily level
- the state
- total declines, accepts and unlinked (ideally all should be in {0,1})

Note, 
a) if the state changes multiple times in the session, I _think_ the last state is stored.
b) If no modification is made to sync.migration, will this field be present in the FHR payload?

Georg can confirm (a) and (b)

From this we can compute
- % of profiles who were ever in a state and moved to other states etc
- and the usual counts of accept/declines etc

it would be helpful to provide a diagrammatic state flow diagram of the states (USER* and INTERNAL*) described in FxaMigrator.jsm. Does such a flow already exist?
Flags: needinfo?(gfritzsche)
(In reply to "Saptarshi Guha[:joy]" from comment #23)
> Note, 
> a) if the state changes multiple times in the session, I _think_ the last
> state is stored.

Correct, i think we had previously decided in the bug here that thats fine.

> b) If no modification is made to sync.migration, will this field be present
> in the FHR payload?

I'm fairly certain that the answer is "no". Should be easy to check via about:healthreport though or by pinging gps.
Flags: needinfo?(gfritzsche)
Comment on attachment 8544373 [details] [diff] [review]
0005-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8544373 [details] [diff] [review]:
-----------------------------------------------------------------

Reading up more on the storage types, i wonder in which cases we really need "daily" values.
This is probably between you and Joy (as i don't really know what data you want to get out of there and i'm not used to analysis), but we should probably avoid "daily" where its not needed.

Otherwise this looks ok to me, although we should really get at least proper manual testing on this to check this is working as intended.

::: services/sync/modules/FxaMigrator.jsm
@@ +100,5 @@
> +  // Flags for the telemetry we record.  The UI will call a helper to record
> +  // the fact some UI was interacted with.
> +  TELEMETRY_ACCEPTED: "accepted",
> +  TELEMETRY_DECLINED: "declined",
> +  TELEMETRY_UNLINKED: "declined",

This should be "unlinked"?

::: services/sync/modules/healthreport.jsm
@@ +74,5 @@
> +  name: "migration",
> +  version: 1,
> +
> +  fields: {
> +    state: DAILY_LAST_TEXT_FIELD, // last "user" or "internal" state we saw for the day

Would Storage.FIELD_LAST_TEXT not be sufficient?

@@ +76,5 @@
> +
> +  fields: {
> +    state: DAILY_LAST_TEXT_FIELD, // last "user" or "internal" state we saw for the day
> +    accepted: DAILY_COUNTER_FIELD, // number of times user tried to start migration
> +    declined: DAILY_COUNTER_FIELD, // number of times user closed nagging infobar

Is LAST_NUMERIC_FIELD not sufficient for accepted & declined?

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +57,5 @@
> +
> +    // Wait for storage to complete.
> +    yield m.storage.enqueueOperation(() => {
> +      return Promise.resolve();
> +    });

Not important here, but you could just do: yield m.storage.enqueueOperation(() => Promise.resolve());
Attachment #8544373 - Flags: feedback?(gfritzsche) → feedback+
Comment on attachment 8544373 [details] [diff] [review]
0005-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8544373 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser-fxaccounts.js
@@ +119,5 @@
> +        if (notif.title == this.SYNC_MIGRATION_NOTIFICATION_TITLE &&
> +            this._expectingNotifyClose) {
> +          return; // it's not an [x]
> +        }
> +        // record telemetry.

This conditional isn't quite right, is it?  It means we'd log any Sync notification that happened to be removed, even ones unrelated to migration.  (I don't care if that's unlikely to happen in practice.)  I think we want:

if (notif.title == this.SYNC_MIGRATION_NOTIFICATION_TITLE &&
   !this._expectingNotifyClose) {
  // Record telemetry.
}

Although I'd prefer fixing whatever fires weave:notification:removed to include a bool that indicates whether the notification was canceled, assuming that's straightforward.  Having to maintain a _expectingNotifyClose sucks.

::: services/sync/modules/FxaMigrator.jsm
@@ +476,5 @@
>    }),
>  
> +  recordTelemetry(flag) {
> +    // Note the value is the telemetry field name - but this is an
> +    // implementation detail which could be changed laster.

laster -> later

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +52,5 @@
> +
> +    // We record both a "user" and "internal" state in the same field.
> +    // So simulate a "user" state first.
> +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> +                                 fxaMigrator.STATE_USER_FXA_VERIFIED);

Georg and I asked you to use fxa-migration:state-changed here to match the STATE_USER value.  Is there a reason you're still not?
Attachment #8544373 - Flags: feedback?(adw) → feedback+
> it would be helpful to provide a diagrammatic state flow diagram of the states
> (USER* and INTERNAL*) described in FxaMigrator.jsm. Does such a flow already
> exist?

Not really - the entire UX flow is at https://www.lucidchart.com/publicSegments/view/53960efc-4b5c-484a-bcd5-13930a00de31/image.pdf, but that's probably not useful in this context.  But I hope these words help:

When the Sync server advises a migration should proceed, we initially enter STATE_USER_FXA.  This means we are blocked on the user taking the action of creating a Firefox account (and the UI guides the user through this).

After account creation, we enter a state of STATE_USER_FXA_VERIFIED.  This means we are blocked on the user receiving an email and clicking on the verification link in that mail (and again, the UI reminds the user and offers help to do this)

After this point we are no longer waiting for the user - so will not see another STATE_USER_* state (nor any migration-specific UI).  However, we *are* blocked on, for want of a better term, the "environment".  The STATE_INTERNAL_* flags reflect these parts:

STATE_INTERNAL_WAITING_SYNC_COMPLETE reflects we are waiting for a currently running Sync to complete. Most users will not be syncing (so will skip this state) and the ones that are should generally wait only for a few seconds. A small minority of users may hit cases where Sync *never* completes in this session, so for these users they will need to restart to get past this state (we don't tell the user about this; we automatically pick up after a restart)

STATE_INTERNAL_WAITING_WRITE_SENTINEL reflects we are waiting to write some data to the sync servers.  This probably will not truly block in practice as we ignore errors.  There's a chance we will block here in the case of network errors etc, but in practice this should be rare as the network must have been recently working for us to determine the user is verified.

STATE_INTERNAL_WAITING_START_OVER reflects we are waiting for Sync to reset itself - again, in practice this should never actually fail and should only take a few seconds.

And that's it.  Note that the path through these states is almost always linear - about the only time this is not true is if a user creates an FxAccount but logs out instead of verifying it - this will cause us to "reset" back to STATE_USER_FXA.  There's no other condition I can think of that would cause us to move backwards.
Thanks Mark. In response to comment 25, 
- state should ve daily since we would like to see state changes across days
- accept,declined and unlinked should be 
--- Daily if they can change across time and we want to see how this changes
--+ Latest value only if we only care about the profiled latest value and nothing about the past.

I prefer daily because
A) if no sync migration happens no data is added to data#days field ( no space waste)
B) we can see the date when changes happen

Mark, your thoughts?
Flags: needinfo?(mhammond)
(In reply to Georg Fritzsche [:gfritzsche] from comment #25)

> Reading up more on the storage types, i wonder in which cases we really need
> "daily" values.
> This is probably between you and Joy (as i don't really know what data you
> want to get out of there and i'm not used to analysis), but we should
> probably avoid "daily" where its not needed.

Comment 28 came in as I was writing this - it sounds to me like "daily" is the way to go. In addition to what :joy said there, it looks to me like "daily" is the only way to get actual counts too (other than maintaining the counter via prefs or something and writing the value as a "discrete" value, which sounds like it sucks)

> Otherwise this looks ok to me, although we should really get at least proper
> manual testing on this to check this is working as intended.

Agreed.

> ::: services/sync/modules/FxaMigrator.jsm
> @@ +100,5 @@
> > +  // Flags for the telemetry we record.  The UI will call a helper to record
> > +  // the fact some UI was interacted with.
> > +  TELEMETRY_ACCEPTED: "accepted",
> > +  TELEMETRY_DECLINED: "declined",
> > +  TELEMETRY_UNLINKED: "declined",
> 
> This should be "unlinked"?

Oops - that's a little embarrassing :)  I changed the test to use these constants instead of literals, which would have picked this up.

> ::: services/sync/modules/healthreport.jsm
> @@ +74,5 @@
> > +  name: "migration",
> > +  version: 1,
> > +
> > +  fields: {
> > +    state: DAILY_LAST_TEXT_FIELD, // last "user" or "internal" state we saw for the day
> 
> Would Storage.FIELD_LAST_TEXT not be sufficient?

Again, this might just be my lack of understanding, but I think it would be useful to know how this transitioned - eg, "how many users took multiple days to transition" or "how many users spent more than 1 day at {some state}".  Given comment 28, it sounds like we really do want daily here too.

> @@ +76,5 @@
> > +
> > +  fields: {
> > +    state: DAILY_LAST_TEXT_FIELD, // last "user" or "internal" state we saw for the day
> > +    accepted: DAILY_COUNTER_FIELD, // number of times user tried to start migration
> > +    declined: DAILY_COUNTER_FIELD, // number of times user closed nagging infobar
> 
> Is LAST_NUMERIC_FIELD not sufficient for accepted & declined?

As above, we want counters here so we can ask "how often did the user decline before they finally unlinked/accepted?", and "how many times did they accept before completion?"  If we can do that with LAST_NUMERIC_FIELD, then yeah, we should use it - but it sounds like that's not the case.

On the plus side, I'm certainly getting an education in FHR :)

> ::: services/sync/tests/unit/test_healthreport_migration.js
> @@ +57,5 @@
> > +
> > +    // Wait for storage to complete.
> > +    yield m.storage.enqueueOperation(() => {
> > +      return Promise.resolve();
> > +    });
> 
> Not important here, but you could just do: yield
> m.storage.enqueueOperation(() => Promise.resolve());

Or even m.storage.enqueueOperation(Promise.resolve); :)

(In reply to "Saptarshi Guha[:joy]" from comment #28)
> I prefer daily because
> A) if no sync migration happens no data is added to data#days field ( no
> space waste)
> B) we can see the date when changes happen
> 
> Mark, your thoughts?

I defer to you guys here - I'm happy to use whatever the consensus is that gets us the reports we need! At this point, that sounds like daily and I'm happy to go along with that.

In that vein, I've made the "unlinked" flag daily too, as this will give us some insight into how many days of nagging it took before the user finally unlinked.
Flags: needinfo?(mhammond)
(In reply to Drew Willcoxon :adw from comment #26)
> 
> This conditional isn't quite right, is it?  It means we'd log any Sync
> notification that happened to be removed, even ones unrelated to migration. 

Oops - that's embarrassing too - fixed.

> Although I'd prefer fixing whatever fires weave:notification:removed to
> include a bool that indicates whether the notification was canceled,
> assuming that's straightforward.  Having to maintain a _expectingNotifyClose
> sucks.

I'd prefer it too, but I can't see a good way to do that.  The "core" of the notifications implementation is the toolkit version (toolkit/content/widgets/notification.xml), but Sync has it's own implementation that builds on the core (browser/base/content/sync/notification.xml).  It overrides the ".close()" method to call into the sync notifications module (services/sync/modules/notifications.js) and that's what sends the observer notification.

However, the "core" toolkit implementation doesn't provide a way to determine how the item was closed - the .close() method is called for both the [x] and a regular button.  I guess we could tweak that (I guess we'd add a param to the close() method and pass that down these layers, which would mean touching every file listed above) which is doable if you feel strongly about it.  I *almost* do, but not quite :)  Maybe we could open a followup to fix this and reference the bug#?

> Georg and I asked you to use fxa-migration:state-changed here to match the
> STATE_USER value.  Is there a reason you're still not?

My apologies - I could have sworn I already fixed that!  Now I'm *sure* I have :)

I think this version has fixed all review comments from everyone, and reflects the discussion re types (ie, everything is a "daily" now).  So pending further comments/objections, I think I'm getting close to requesting review.
Attachment #8544373 - Attachment is obsolete: true
Attachment #8544373 - Flags: feedback?(sguha)
Attachment #8545056 - Flags: feedback?(sguha)
Attachment #8545056 - Flags: feedback?(gfritzsche)
Attachment #8545056 - Flags: feedback?(adw)
Comment on attachment 8545056 [details] [diff] [review]
0005-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8545056 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. I was just mainly asking about the types to make sure there is a reason to specifically use daily.
Things look fine for me from a client-side data collection perspective.

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +106,5 @@
> +      yield m.storage.enqueueOperation(Promise.resolve);
> +      let values = yield m.getValues();
> +      Assert.equal(values.days.size, 1);
> +      return values.days.getDay(now);
> +    }

Nit: missing semicolon.
Attachment #8545056 - Flags: feedback?(gfritzsche) → feedback+
Comment on attachment 8545056 [details] [diff] [review]
0005-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8545056 [details] [diff] [review]:
-----------------------------------------------------------------

It looks like notificationbox._removeNotificationElement() calls button.eventCallback("removed") when the close button is pressed but not when the notification is closed otherwise: http://mxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/notification.xml#183  _removeNotificationElement ends up getting called in both if-branches of removeNotification().  It doesn't look like the Sync "subclasses" do anything to mess that up.

Maybe you've already tried that, but could you see if that works?
Attachment #8545056 - Flags: feedback?(adw) → feedback+
(In reply to Drew Willcoxon :adw from comment #32)
> It looks like notificationbox._removeNotificationElement() calls
> button.eventCallback("removed")

Oops, notification.eventCallback().
(In reply to Drew Willcoxon :adw from comment #32)

> It looks like notificationbox._removeNotificationElement() calls
> button.eventCallback("removed") when the close button is pressed but not
> when the notification is closed otherwise:

FTR, I just verified manually that unfortunately, that *is* called on all removals, not just the one we care about.  I chatted with Drew in IRC, and he said he was fine with a followup to fix this.  So I opened bug 1119020 and added the following comment:

  // _expectingNotifyClose is a hack that helps us determine if the
  // migration notification was closed due to being "dismissed" vs closed
  // due to one of the migration buttons being clicked.  It's ugly and somewhat
  // fragile, so bug 1119020 exists to help us do this better.
  _expectingNotifyClose: false,
(In reply to Mark Hammond [:markh] from comment #27)
> > it would be helpful to provide a diagrammatic state flow diagram of the states
> > (USER* and INTERNAL*) described in FxaMigrator.jsm. Does such a flow already
> > exist?
> 
> Not really - the entire UX flow is at
> https://www.lucidchart.com/publicSegments/view/53960efc-4b5c-484a-bcd5-
> 13930a00de31/image.pdf, but that's probably not useful in this context.  But
> I hope these words help:
> 
> When the Sync server advises a migration should proceed, we initially enter
> STATE_USER_FXA.  This means we are blocked on the user taking the action of
> creating a Firefox account (and the UI guides the user through this).
> 


1. So there is no state before STATE_USER_FXA?
2. Assuming the user decides to migrate, the time between states will likely be small. Since FHR records the last state for the session what state would you expect to see stored?
3. given (2) how will you use this?
Flags: needinfo?(mhammond)
(In reply to "Saptarshi Guha[:joy]" from comment #36)
> 1. So there is no state before STATE_USER_FXA?

Correct - that is the first state related to migration.

> 2. Assuming the user decides to migrate, the time between states will likely
> be small. Since FHR records the last state for the session what state would
> you expect to see stored?

That's correct too - we'd still expect to see a null state there (although we will also see existing SyncMeasurement1.activeProtocol move from "1.1" to "1.5" - that tells us a migration was completed)

> 3. given (2) how will you use this?

The main thing is to see where this does *not* happen quickly.  Something like:

* On a day where we have an entry for "accepted" (ie, the user elected to migrate) and migration is complete (ie, where SyncMeasurement1.activeProtocol=="1.5") then the plan has worked.  Ideally all users would see this.

* On a day where we have an entry for "accepted" but the state gets stuck at any value for subsequent days (and thus SyncMeasurement1.activeProtocol remains at "1.1") then we have a problem - the user has tried to migrate but has stalled.  The exact value for the state field will help us understand exactly where they are stuck and allow us to formulate a plan for either preventing the technical issue causing it, or for tweaking the UX such that the user is encouraged to complete the necessary steps.

As mentioned, we *do* expect some users will be "stuck" at one of the states until the next session - eg, we do expect to see single days with migration started but not complete.

Does this answer your question?
Flags: needinfo?(sguha)
Got it. So this state field is really for debugging and not to understand how people move from state to state. Makes sense.
Thanks
Flags: needinfo?(sguha)
Iteration: 37.3 - 12 Jan → 38.1 - 26 Jan
Flags: needinfo?(mhammond)
I think this patch is what we've agreed in other comments in this bug.  All counters are daily (as per comment 28), it has tests and I've manually verified the fields are what I expect using about:healthreport.
Attachment #8545056 - Attachment is obsolete: true
Attachment #8545056 - Flags: feedback?(sguha)
Attachment #8555026 - Flags: review?(gfritzsche)
Attachment #8555026 - Flags: review?(adw)
Iteration: 38.1 - 26 Jan → 38.2 - 9 Feb
Attachment #8555026 - Flags: review?(adw) → review+
Comment on attachment 8555026 [details] [diff] [review]
0001-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Gentle review-ping for Georg!
Comment on attachment 8555026 [details] [diff] [review]
0001-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Review of attachment 8555026 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me on the data collection bits.
Only two minor things below.

::: services/healthreport/docs/dataformat.rst
@@ +1653,5 @@
> +
> +Version 1
> +^^^^^^^^^
> +
> +Version 1 was introduced with Firefox 37 and includes the following properties:

Firefox 38 now.

::: services/sync/tests/unit/test_healthreport_migration.js
@@ +69,5 @@
> +    Assert.equal(day.get("state"), fxaMigrator.STATE_USER_FXA_VERIFIED);
> +
> +    // And an  internal state.
> +    Services.obs.notifyObservers(null, "fxa-migration:internal-state-changed",
> +                                 fxaMigrator.STATE_INTERNAL_WAITING_SYNC_COMPLETE);

Except the topic and the state, the lines for STATE_USER_FXA_VERIFIED and this are identical.
Might be nice to reduce repetition here.
Attachment #8555026 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #41)
> ::: services/healthreport/docs/dataformat.rst
> @@ +1653,5 @@
> > +
> > +Version 1
> > +^^^^^^^^^
> > +
> > +Version 1 was introduced with Firefox 37 and includes the following properties:
> 
> Firefox 38 now.

Or is this targeting uplift?
(In reply to Georg Fritzsche [:gfritzsche] from comment #42)
> > Firefox 38 now.
> 
> Or is this targeting uplift?

Yep, I'll be requesting uplift.  Pushed with your other note addressed:

https://hg.mozilla.org/integration/fx-team/rev/e834bf83217e
https://hg.mozilla.org/mozilla-central/rev/e834bf83217e
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 38
Comment on attachment 8555026 [details] [diff] [review]
0001-Bug-1097406-FHR-data-for-sync-migration.-r.patch

This enables FHR reporting for the Sync migration.

Approval Request Comment
[Feature/regressing bug #]: Sync Migration
[User impact if declined]: None, but we will have limited visibility into the success or otherwise of the migration.
[Describe test coverage new/current, TreeHerder]: New tests, all current tests pass.
[Risks and why]: Low risk, limited to Sync and Sync migration
[String/UUID change made/needed]: None
Attachment #8555026 - Flags: approval-mozilla-aurora?
Comment on attachment 8555026 [details] [diff] [review]
0001-Bug-1097406-FHR-data-for-sync-migration.-r.patch

I think that it is highly preferable that we're able to measure the success of sync migration. The changes here look safe. Aurora+
Attachment #8555026 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
That patch actually doesn't apply cleanly to Aurora.  Only reason I know is that I need to uplift bug 1120716 too, and it depends on this one.  So I'm requesting approval again with this patch that does apply cleanly.

Approval Request Comment: Please see comment 46 and 45.
Flags: needinfo?(lmandel)
Attachment #8559935 - Flags: approval-mozilla-aurora?
Comment on attachment 8559935 [details] [diff] [review]
Aurora (37) patch

Thanks for catching that the patch doesn't apply cleanly and producing a new patch. Aurora+
Flags: needinfo?(lmandel)
Attachment #8559935 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8555026 [details] [diff] [review]
0001-Bug-1097406-FHR-data-for-sync-migration.-r.patch

Clearing Aurora approval. Use the Aurora specific patch instead.
Attachment #8555026 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: