Store recency of browser data in telemetry when importing to see how good a predictor default browser is

RESOLVED FIXED in Firefox 50

Status

()

defect
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: Gijs, Assigned: Gijs)

Tracking

(Blocks 1 bug)

Trunk
Firefox 50
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50 fixed)

Details

Attachments

(1 attachment)

+++ This bug was initially created as a clone of Bug #1275114 +++

Bug 1248077 seeks to improve the experience of the first-run "import data from another browser" process by doing it automatically. We'd like to gather some additional data to help understand if our selection of what browser to import from is doing a good job, by looking more closely at the choices users are currently making. As with the original probes added in bug 731025, these will need to be opt-out because they need to record things before the browser has fully started for the first time.

<snip #1 and #2 which are in bug 1275114>

3) How fresh/relevant is the data available from other browsers? In some cases, the user's default browser may not be the browser they actually use the most (and thus want to import from). In other cases, the most-recently-used browser may not have been used for anything more than downloading Firefox, so importing that minimal history and default bookmarks is not really useful. Again, with telemetry analysis, we'd like to see how often the browser users actually import from is actually what we think is the freshest.


#3 is a little tricky to define precisely, as difficulty of implementation may guide what we actually do. I'd suggest a erring on the side of simplicity to see what the initial signal looks like. One option would be to pick a particular data source (bookmarks? history? passwords? all?), and see how old the most recent entry is. A file timestamp might be enough. Another option would be to see how many entries there are for a type -- is the user importing from the browser with lots of bookmarks/history/passwords? Are browsers with < X bookmarks/history/passwords generally ignored?
I've tested the dates produced for all the migrators except 360se. For both IE and Safari, I'm not sure to what degree they are influenced by other system uses of their core components. Certainly on my Mac, Safari was apparently used earlier today, but I have no recollection of doing so, and checking the safari.app file in Finder.app, it reports it was last opened May 9th... I don't think that's really something we can address, keeping in mind that not everyone will use bookmarks or more distinctive "browser type" data, as opposed to history/cookies.
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56348/diff/1-2/
(In reply to :Gijs Kruitbosch from comment #2)
> I've tested the dates produced for all the migrators except 360se.

Hi Gijs, if you need someone to test the 360se migrator, I think Yanfang will be able to help.
Summary: Add telemetry probes to understand what browser users are initially importing data from → Store recency of browser data in telemetry when importing to see how good a predictor default browser is
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

https://reviewboard.mozilla.org/r/56348/#review55236

::: browser/components/migration/EdgeProfileMigrator.js:440
(Diff revision 2)
> +    return OS.File.stat(path).catch(_ => null).then(info => {
> +      return info ? info.lastModificationDate : new Date(0);
> +    });
> +  });
> +  return Promise.all(datePromises).then(dates => {
> +    return new Date(Math.max.apply(Math, dates));

I think all this code would be simpler if you dropped using Date objects, and just used the timestamp integers directly (ala Unix time_t). Less runtime garbage, too. :)

Fair to have getLastUsedDate itself return a Date object.

::: browser/components/migration/IEProfileMigrator.js:511
(Diff revision 2)
> +  timePromises.push(new Promise(resolve => {
> +    let typedURLs = new Map();
> +    try {
> +      typedURLs = MSMigrationUtils.getTypedURLs("Software\\Microsoft\\Internet Explorer");
> +    } catch (ex) {}
> +    let times = [0, ... typedURLs.values()];

Do we not get these in any particular order? Seems like there could be a lot of these -- do we know how long this call may take?

::: browser/components/migration/content/migration.js:542
(Diff revision 2)
>      this._listItems("doneItems");
> +  },
> +
> +  reportDataRecencyTelemetry() {
> +    let histogram = Services.telemetry.getKeyedHistogramById("FX_STARTUP_MIGRATION_DATA_RECENCY");
> +    for (let [key, migrator] of this._availableMigrators) {

Just to confirm -- the key here is the same as used for the migrator names in the other probes? So that it should simple (hopefully!) to generate an analysis saying if the migrator the user chose was the most recent?

This is important to verify, because just the probe in isolation or looking at telemetry.mo isn't going to tell us anything useful.

Hmm, how about also adding a simple probe to answer that directly, without need for analysis? We still might want to dig into the full recency data (e.g. do we find a minimum threshold for cases like "I just now used IE to download Firefox on my new computer, but don't want to import data from anything"?), but adding a probe here is probably a lot less work.

::: browser/components/migration/content/migration.js:546
(Diff revision 2)
> +    let histogram = Services.telemetry.getKeyedHistogramById("FX_STARTUP_MIGRATION_DATA_RECENCY");
> +    for (let [key, migrator] of this._availableMigrators) {
> +      // No block-scoped let in for...of loop conditions, so get the source:
> +      let localKey = key;
> +      migrator.getLastUsedDate().then(date => {
> +        let diffInHours = Math.min(8760, Math.round((Date.now() - date) / (60 * 60 * 1000)));

Why 8760? Ah, that's 24 * 365.

TBH I'd find this clearer as:

const ONE_YEAR = 24 * 365;
let diffInHours = Math.round(...);
if (diffInHours > ONE_YEAR)
  diffInHours = ONE_YEAR;
Attachment #8757996 - Flags: review?(dolske) → review+
https://reviewboard.mozilla.org/r/56348/#review55670

::: browser/components/migration/IEProfileMigrator.js:511
(Diff revision 2)
> +  timePromises.push(new Promise(resolve => {
> +    let typedURLs = new Map();
> +    try {
> +      typedURLs = MSMigrationUtils.getTypedURLs("Software\\Microsoft\\Internet Explorer");
> +    } catch (ex) {}
> +    let times = [0, ... typedURLs.values()];

There are only normally less than 50 of these. The call is synchronous, and there's no current API to get these individually, although I *think* the list is sorted. I'm not convinced this is worth optimizing for now, though, unless you feel strongly?

::: browser/components/migration/content/migration.js:542
(Diff revision 2)
>      this._listItems("doneItems");
> +  },
> +
> +  reportDataRecencyTelemetry() {
> +    let histogram = Services.telemetry.getKeyedHistogramById("FX_STARTUP_MIGRATION_DATA_RECENCY");
> +    for (let [key, migrator] of this._availableMigrators) {

So... this is a bit tricky. We accumulate on an enumerated histogram which uses numbers to identify browsers for a number of the other ones (and counts 1 in the bucket corresponding to the browser that gets used). We can't do that here because we're not accumulating, we actually want the values that get submitted here.

I tried to use a keyed histogram with numeric keys, but that doesn't work - the contents need to be a string. So continuing to use a string seems like the most sensible option here.
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56348/diff/2-3/
Attachment #8757996 - Attachment description: MozReview Request: Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is, r?dolske → Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Manually re-requesting review...

I added the boolean telemetry you asked for, dolske, and I fixed and replied to some of your other comments. I also realized I'd forgotten to implement this for Firefox - I'm just always returning unix epoch so we never think we're the best browser to import from. Little lies, hopefully acceptable? :-)

I also consolidated some naming in the different methods to make them a bit more uniform.

Finally, I nixed the cookies stuff from the edge implementation and reused the typedURLs stuff from the IE one in it... though now that I think about it, maybe I should leave the cookies check in? Up to you... I think I was confused by the relation with "download this browser with IE, then never use IE again", and how mozilla.org would probably set cookies. But then again, we check the timestamp on the cookie dir in IE, too. The flip side of that stuff is that the only other measures we have for both browsers is effectively when people last bookmarked something or wrote a fully-fledged web address in the URL bar (searches don't trigger it), which might be 'never' for some users...
Attachment #8757996 - Flags: review+ → review?(dolske)
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56348/diff/3-4/
(put the cookies stuff back in the edge migrator. Let me know and I can take it out both there and for IE.)
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Benjamin, can you privacy-review the telemetry addition here?
Attachment #8757996 - Flags: review?(benjamin)
Attachment #8757996 - Flags: review?(benjamin) → review-
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

https://reviewboard.mozilla.org/r/56348/#review56956

::: toolkit/components/telemetry/Histograms.json:4464
(Diff revision 4)
> +    "keyed": true,
> +    "kind": "exponential",
> +    "n_buckets": 50,
> +    "high": 8760,
> +    "releaseChannelCollection": "opt-out",
> +    "description": "The 'last modified' time of the data we imported on the initial profile migration (time delta with 'now' at the time of migration, in hours)."

Why is this keyed? What are the key values?

::: toolkit/components/telemetry/Histograms.json:4473
(Diff revision 4)
> +    "alert_emails": ["gijs@mozilla.com"],
> +    "expires_in_version": "53",
> +    "keyed": true,
> +    "kind": "boolean",
> +    "releaseChannelCollection": "opt-out",
> +    "description": "Whether the browser we migrated from was the browser with the most recent data."

Again, key formats and values are mysterious. Is the key the previous default browser? Or the browser that wasn't the former default browser but is now? Or does this record a value for every "found" browser?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> Comment on attachment 8757996 [details]
> Bug 1276694 - store recency of browser data in telemetry when importing to
> see how good a predictor default browser is,
> 
> https://reviewboard.mozilla.org/r/56348/#review56956
> 
> ::: toolkit/components/telemetry/Histograms.json:4464
> (Diff revision 4)
> > +    "keyed": true,
> > +    "kind": "exponential",
> > +    "n_buckets": 50,
> > +    "high": 8760,
> > +    "releaseChannelCollection": "opt-out",
> > +    "description": "The 'last modified' time of the data we imported on the initial profile migration (time delta with 'now' at the time of migration, in hours)."
> 
> Why is this keyed? What are the key values?

Individual browsers from which we could potentially (have) import(ed).

> ::: toolkit/components/telemetry/Histograms.json:4473
> (Diff revision 4)
> > +    "alert_emails": ["gijs@mozilla.com"],
> > +    "expires_in_version": "53",
> > +    "keyed": true,
> > +    "kind": "boolean",
> > +    "releaseChannelCollection": "opt-out",
> > +    "description": "Whether the browser we migrated from was the browser with the most recent data."
> 
> Again, key formats and values are mysterious. Is the key the previous
> default browser? Or the browser that wasn't the former default browser but
> is now? Or does this record a value for every "found" browser?

This has nothing to do with the default browser. It records a boolean for the browser from which we migrated (and that browser only) that indicates whether or not that browser had the most recent data. That browser may or may not be the default browser here. The other histogram records the recency of data for every browser for which data was available.

How am I supposed to document the keys? It doesn't seem like the histograms.json file supports comments. The values are somewhat self-documenting when you actually see them ("ie", "edge", "safari", "chrome", "chromium", etc.).
Flags: needinfo?(benjamin)
The documentation should happen in the histogram description. This will show up on telemetry.mozilla.org and we're also going to (soon I hope) include all the histogram data into the auto-generated telemetry docs. An interested onlooker should be able to look at just the description/docs and know what will be collected, without having to look at the actual data or test.
Flags: needinfo?(benjamin)
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56348/diff/4-5/
Attachment #8757996 - Flags: review- → review?(benjamin)
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

https://reviewboard.mozilla.org/r/56348/#review57448
Attachment #8757996 - Flags: review?(benjamin) → review+
Comment on attachment 8757996 [details]
Bug 1276694 - store recency of browser data in telemetry when importing to see how good a predictor default browser is,

https://reviewboard.mozilla.org/r/56348/#review57898
Attachment #8757996 - Flags: review?(dolske) → review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/fx-team/rev/3ee696c36f57
store recency of browser data in telemetry when importing to see how good a predictor default browser is, r=dolske,bsmedberg
https://hg.mozilla.org/mozilla-central/rev/3ee696c36f57
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 50
Whiteboard: [migration-needs-uplift]
Whiteboard: [migration-needs-uplift]
Depends on: 1298244
Depends on: 1310596
You need to log in before you can comment on or make changes to this bug.