Closed Bug 1358907 Opened 7 years ago Closed 7 years ago

Telemetry shouldn't query the add-ons DB at startup

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla56
Performance Impact high
Tracking Status
firefox56 --- fixed

People

(Reporter: kmag, Assigned: aswan)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [measurement:client])

Attachments

(3 files, 2 obsolete files)

Telemetry currently makes several calls to getAddonsByTypes during early startup, which causes the add-ons database to be loaded and initialized very early.

We always load basic information about active add-ons synchronously in early startup, which includes information like IDs and versions. Telemetry can access information for free as early as it likes.

Information like add-on names is stored in a separate DB, and loading it can add tens to hundreds of milliseconds of delay to the startup sequence. We should wait until delayed startup for the first browser window before querying this information and adding it to telemetry.
I can think of 2 use cases that might require the addon data relatively early:
1) crash annotations per [1]
2) snippets / UI tour, but they currently don't need addons data AFAIK

SHIELD recipes run later in the session, so i don't think they would be affected.

@Benjamin, how fundamental is the addon data to startup crash diagnostics?
Would it be tolerable to have the addon data only annotated at delayed startup?
Please redirect if this now falls to ddursts team. 

1: https://dxr.mozilla.org/mozilla-central/rev/8e969cc9aff49f845678cba5b35d9dd8aa340f16/toolkit/components/telemetry/TelemetryStartup.js#34
Flags: needinfo?(benjamin)
Priority: -- → P2
Whiteboard: [qf] → [measurement:client] [qf]
It is fairly important that we know which addons are running as they start.

Presumably the addon manager knows, at it is starting an addon, the data for that addon? If so can we have the addon manager inform telemetry of this data instead of using the getAddonsByTypes API surface?
Flags: needinfo?(benjamin) → needinfo?(kmaglione+bmo)
See comment 0. We can get basic information about active add-ons at startup without querying the add-ons DB. In fact, the add-on manager already adds a list of active add-on IDs and their versions to telemetry and the crash reporter at startup:

http://searchfox.org/mozilla-central/rev/313e5199bf58200f158c6fcbe193e41b88ed58a6/toolkit/mozapps/extensions/internal/XPIProvider.jsm#3322-3328

For details like names, we currently need to query the DB. For those, we can either add them to the information we load at early startup, or wait until delayed startup to add that information to telemetry. I'm happy with either option, but I think I'd prefer the second.
Flags: needinfo?(kmaglione+bmo)
So yes, we should use the cached version at startup.

The question about whether we can afford to delay-load the rest is not as much a question for me as for analysts: would it be ok for the environment block to have some data missing near startup.

Which data could we have at startup without loading the addon DB today?:
* ID (yes)
* version (yes)
* scope
* type
* foreignInstall
* hasBinaryComponents
* installDay
* updateDay
* signedState
* isSystem
* isWebExtension
* MPC - bug 1358620
Flags: needinfo?(kmaglione+bmo)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> Which data could we have at startup without loading the addon DB today?:

These are available:

> * ID (yes)
> * version (yes)
> * scope
> * type
> * updateDay
> * isSystem
> * isWebExtension
> * MPC - bug 1358620

These are not:

> * foreignInstall
> * hasBinaryComponents
> * installDay
> * signedState

hasBinaryComponents should not be possible anymore. I was planning to remove
it.
Flags: needinfo?(kmaglione+bmo)
Whiteboard: [measurement:client] [qf] → [measurement:client] [qf:p1]
Priority: P2 → P1
Assignee: nobody → aswan
Alessio (:dexter) can help with the Telemetry bits here.
The relevant test coverage for the TelemetryEnvironment is here:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js
We should add a test that covers the "early startup" scenario.
Before I dive in, can I get confirmation that we're okay with losing the 4 fields Kris mentions at the bottom of comment 5?
Flags: needinfo?(benjamin)
Missing that data temporarily near startup is fine.

If you're proposing losing those fields permanently, we'd have to have a discussion with some stakeholders.
Flags: needinfo?(benjamin)
Missing temporarily near startup is what we want. We should be relatively OK loading the DB after content paint.
Attachment #8868392 - Flags: review?(kmaglione+bmo)
Attachment #8868392 - Flags: review?(gfritzsche)
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143832

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:40
(Diff revision 5)
> +XPCOMUtils.defineLazyServiceGetter(this, "aomStartup",
> +                                   "@mozilla.org/addons/addon-manager-startup;1",
> +                                   "amIAddonManagerStartup");

This is unused.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:640
(Diff revision 5)
>  
> -        if (addon.signedState !== undefined)
> -          activeAddons[addon.id].signedState = addon.signedState;
> -
> +        // getActiveAddons() gives limited data during startup and full
> +        // data after the addons database is loaded.  Use the presence of
> +        // the "appDisabled" property as the indicator for which we have.
> +        if (addon.appDisabled !== undefined) {
> +          let installDate = new Date(Math.max(0, addon.installDate));

I know you didn't write this, but using Math.max here is weird... We should always have a valid date value here, but it should *definitely* never be a negative number (which is the only way this would be valid).

Maybe make it `addon.installDate || 0` instead?

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:3360
(Diff revision 5)
>            Services.obs.removeObserver(this, "final-ui-startup");
>          }
>        }, "final-ui-startup");
>  
> +      // Try to load the XPI database after other important startup
> +      // work is done and we have some time.

Please also mention that we're doing this for the sake of telemetry.

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:4640
(Diff revision 5)
>     * @param  aCallback
>     *         A callback to pass an array of Addons to
>     */
>    getAddonsByTypes(aTypes, aCallback) {
>      let typesToGet = getAllAliasesForTypes(aTypes);
> +    if (typesToGet && typesToGet.every(type => !ALL_TYPES.has(type))) {

Nit: `!typesToGet.some(type => ALL_TYPES.has(type))`

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:4678
(Diff revision 5)
> +    for (let [locName, data] of XPIStates.db.entries()) {
> +      let location = this.installLocationsByName[locName];
> +      if (!location) {
> +        logger.error(`Found non-existent install location ${locName}`);
> +        continue;
> +      }

for (let addon of XPIStates.enabledAddons()) {
      let {location} = addon;
      ...

::: toolkit/mozapps/extensions/test/xpcshell/test_shutdown.js:83
(Diff revision 5)
> -function run_test() {
> +async function run_test() {
> +  do_test_pending();

add_task(async () => {
      ...

::: toolkit/mozapps/extensions/test/xpcshell/test_webextension.js:326
(Diff revision 5)
>      name: "Meh API",
>    });
>  
>    await promiseInstallAllFiles([addonFile]);
>  
> -  let addons = await AddonManager.getAddonsByTypes(["apiextension"]);
> +  let addons = await AddonManager.getAddonsByTypes(["extension"]);

Hm. Well, I guess it's technically no worse than it was before...
Attachment #8868392 - Flags: review?(kmaglione+bmo) → review+
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #15)
> ::: toolkit/components/telemetry/TelemetryEnvironment.jsm:640
> (Diff revision 5)
> >  
> > -        if (addon.signedState !== undefined)
> > -          activeAddons[addon.id].signedState = addon.signedState;
> > -
> > +        // getActiveAddons() gives limited data during startup and full
> > +        // data after the addons database is loaded.  Use the presence of
> > +        // the "appDisabled" property as the indicator for which we have.
> > +        if (addon.appDisabled !== undefined) {
> > +          let installDate = new Date(Math.max(0, addon.installDate));
> 
> I know you didn't write this, but using Math.max here is weird... We should
> always have a valid date value here, but it should *definitely* never be a
> negative number (which is the only way this would be valid).
> 
> Maybe make it `addon.installDate || 0` instead?

We ran into various weird issues around addon properties misbehaving (probably addon authors doing weird things).
Unless this is a problem i'd rather error on the side of caution here, to protect our data collection from breakage.
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143906

Cheers for getting this to work!
I took a first pass here and have some concerns on the Telemetry side.
:Dexter can take the next review once we settled the open questions & issues:

This is a potentially risky change for our Telemetry data.
Is there a plan for manual QA yet?

For documentation:
We need to update the documentation in environment.rst to make clear when which data is available.
We should also add a note to crash-ping.rst (which is probably the most affected).

Before landing this, we need to:
- make sure the pipeline schemas are up-to-date (:Dexter can check).
- make sure the main consumers of the data are prepared for the change (:Dexter, can you loop in :mdoglio, :ddurst?)

::: browser/experiments/Experiments.jsm
(Diff revision 5)
>      AddonManager.addAddonListener(listener);
>  
>      for (let addon of addons) {
> -      // Disabling the add-on before uninstalling is necessary to cause tests to
> -      // pass. This might be indicative of a bug in XPIProvider.
> -      // TODO follow up in bug 992396.

Should we close that bug?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:485
(Diff revision 5)
>        return Promise.reject(err);
>      }
>  
> +    // If the addon database has not been loaded, listen for when it gets
> +    // loaded so we can immediately gather full data at that time.
> +    if (!AddonManagerPrivate.isDBLoaded()) {

For very short session, how can we make sure that the environment addons data is fully loaded?
We should do this before generating our "main" Telemetry ping.

We could e.g. wait/block in TelemetryEnvironments `shutdown()`?

We need test coverage for that.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:486
(Diff revision 5)
>      }
>  
> +    // If the addon database has not been loaded, listen for when it gets
> +    // loaded so we can immediately gather full data at that time.
> +    if (!AddonManagerPrivate.isDBLoaded()) {
> +      const ADDON_LOAD_NOTIFICATION = "xpi-database-loaded";

When does that notification fire?
Can you document this?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:636
(Diff revision 5)
> -        if (addon.signedState !== undefined)
> -          activeAddons[addon.id].signedState = addon.signedState;
> -
> +        // getActiveAddons() gives limited data during startup and full
> +        // data after the addons database is loaded.  Use the presence of
> +        // the "appDisabled" property as the indicator for which we have.
> +        if (addon.appDisabled !== undefined) {

Why not use something more clear like `AddonManagerPrivate.isDBLoaded()`?
Attachment #8868392 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #16)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment
> > I know you didn't write this, but using Math.max here is weird... We should
> > always have a valid date value here, but it should *definitely* never be a
> > negative number (which is the only way this would be valid).
> >
> > Maybe make it `addon.installDate || 0` instead?
>
> We ran into various weird issues around addon properties misbehaving
> (probably addon authors doing weird things).
> Unless this is a problem i'd rather error on the side of caution here, to
> protect our data collection from breakage.

Math.max shouldn't give us any protection here. The only values it would
return usable results different from the input value for are null and negative
numbers. Negative numbers are valid date stamps. They're unexpected, but no
more unexpected than 0. For null, we'd get the same results as we'd get with
`installDate || 0`. For `undefined`, Math.max will give us NaN (which is not a
valid datestamp), where `installDate || 0` will give us 0, which is.
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143906

> For very short session, how can we make sure that the environment addons data is fully loaded?
> We should do this before generating our "main" Telemetry ping.
> 
> We could e.g. wait/block in TelemetryEnvironments `shutdown()`?
> 
> We need test coverage for that.

I think we should probably OK receiving only partial data for a subset of sessions. We should still get enough data overall to get a good picture of the state of things.

If we really do need the complete data for all sessions, I'd be OK with adding a shutdown blocker.

> Why not use something more clear like `AddonManagerPrivate.isDBLoaded()`?

Yeah, this made me a bit uncomfortable too. I think I'd prefer adding another property, like `addon.partialData`, that we can check instead, though.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #19)
> > For very short session, how can we make sure that the environment addons data is fully loaded?
> > We should do this before generating our "main" Telemetry ping.
> > 
> > We could e.g. wait/block in TelemetryEnvironments `shutdown()`?
> > 
> > We need test coverage for that.
> 
> I think we should probably OK receiving only partial data for a subset of
> sessions. We should still get enough data overall to get a good picture of
> the state of things.
> 
> If we really do need the complete data for all sessions, I'd be OK with
> adding a shutdown blocker.

The problem is that this changes the shape of the submitted data for potentially all pings that are using the environment data, including the important "main" ping.
This would bring up a bigger set of dependencies: we'd have to make sure various data jobs and analysis are not negatively impacted.

I would rather limit the impact here to the one ping that we need to (the crash ping).
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #18)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #16)
> > (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment
> > > I know you didn't write this, but using Math.max here is weird... We should
> > > always have a valid date value here, but it should *definitely* never be a
> > > negative number (which is the only way this would be valid).
> > >
> > > Maybe make it `addon.installDate || 0` instead?
> >
> > We ran into various weird issues around addon properties misbehaving
> > (probably addon authors doing weird things).
> > Unless this is a problem i'd rather error on the side of caution here, to
> > protect our data collection from breakage.
> 
> Math.max shouldn't give us any protection here. The only values it would
> return usable results different from the input value for are null and
> negative
> numbers. Negative numbers are valid date stamps. They're unexpected, but no
> more unexpected than 0. For null, we'd get the same results as we'd get with
> `installDate || 0`. For `undefined`, Math.max will give us NaN (which is not
> a
> valid datestamp), where `installDate || 0` will give us 0, which is.

Changing this will potentially change the submitted data to include negative values, which has impact on analysis etc.
Unless we have a good reason to change this (e.g. we need to see the negative values etc.) we should avoid this.

If there is a good reason to do this we should take it to a separate bug.
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143832

> Hm. Well, I guess it's technically no worse than it was before...

In fact it is marginally better (considering just the work done inside XPIProvider) since the previous version actually fetched all types.
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143906

> Should we close that bug?

I'll do it once this lands...

> I think we should probably OK receiving only partial data for a subset of sessions. We should still get enough data overall to get a good picture of the state of things.
> 
> If we really do need the complete data for all sessions, I'd be OK with adding a shutdown blocker.

Are you worried about something like startup crashes?  Or just a user manually opening then closing the browser very quickly?  We load the full database from sessionstore-windows-restored and empirically, the full data is loaded very quickly at startup in a local run.  So I'm skeptical that this is a scenario worth spending much energy on.

> Yeah, this made me a bit uncomfortable too. I think I'd prefer adding another property, like `addon.partialData`, that we can check instead, though.

good idea
(In reply to Georg Fritzsche [:gfritzsche] from comment #21)
> Changing this will potentially change the submitted data to include negative
> values, which has impact on analysis etc.
> Unless we have a good reason to change this (e.g. we need to see the
> negative values etc.) we should avoid this.
> 
> If there is a good reason to do this we should take it to a separate bug.

Do we have a reason to think that negative values happen in practice? That would mean the add-on has an install date before 01/01/1970. In any case, if this is something we care about, we should probably handle it in millisecondsToDays rather than here.
Attachment #8868392 - Flags: review?(alessio.placitelli)
:dexter can you estimate how long the things from the top of comment 17 are likely to take?  If its going to be more than a day or two, I'd like to land the add-ons manager bits for this big separately since there's a ton of other work going on there concurrently.
Flags: needinfo?(alessio.placitelli)
Comment on attachment 8868392 [details]
Bug 1358907 Part 1 Addon Manager hooks for startup telemetry

https://reviewboard.mozilla.org/r/139984/#review143906

> Are you worried about something like startup crashes?  Or just a user manually opening then closing the browser very quickly?  We load the full database from sessionstore-windows-restored and empirically, the full data is loaded very quickly at startup in a local run.  So I'm skeptical that this is a scenario worth spending much energy on.

I've split this into 2 patches, the shutdown handling is in part 2, it still needs a test though.
Attachment #8868392 - Flags: review?(alessio.placitelli)
Attachment #8869173 - Flags: review?(alessio.placitelli)
Keywords: leave-open
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #24)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #21)
> > Changing this will potentially change the submitted data to include negative
> > values, which has impact on analysis etc.
> > Unless we have a good reason to change this (e.g. we need to see the
> > negative values etc.) we should avoid this.
> > 
> > If there is a good reason to do this we should take it to a separate bug.
> 
> Do we have a reason to think that negative values happen in practice? That
> would mean the add-on has an install date before 01/01/1970. In any case, if
> this is something we care about, we should probably handle it in
> millisecondsToDays rather than here.

There is no specific reason for this.
In general though, with data collection at the Firefox population scale we tend to hit all edge cases eventually.
This has risk to change the incoming data. What is the upside for doing this?
(In reply to Andrew Swan [:aswan] from comment #23)
> > If we really do need the complete data for all sessions, I'd be OK with adding a shutdown blocker.
> 
> Are you worried about something like startup crashes?  Or just a user
> manually opening then closing the browser very quickly?  We load the full
> database from sessionstore-windows-restored and empirically, the full data
> is loaded very quickly at startup in a local run.  So I'm skeptical that
> this is a scenario worth spending much energy on.

I'm just worried about short sessions where users close the browser quickly.
As long as the data is guaranteed to be loaded before normal shutdown we should be fine and can ask QA to confirm.
(In reply to Andrew Swan [:aswan] from comment #27)
> :dexter can you estimate how long the things from the top of comment 17 are
> likely to take?  If its going to be more than a day or two, I'd like to land
> the add-ons manager bits for this big separately since there's a ton of
> other work going on there concurrently.

Feel free to land the first patch if that makes things simpler, just don't forget to mark this bug as "leave-open" for now.
Flags: needinfo?(alessio.placitelli)
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review144560

Thanks for working on this! Good work so far. I've got a couple of comments:

- looks like these changes mistakenly notify of an "addon change" during startup, when the addon database is loaded;
- these patches seem to lock up a couple of telemetry (e.g. test_TelemetrySession.js).

The comments in the review are mainly for the first point, with a few nits here and there.

::: commit-message-c7df5:5
(Diff revision 1)
> +Bug 1358907 Part 2 Avoid reading XPI database at startup
> +
> +Switch telemetry and experiments from AddonManager.getAddonsByTypes()
> +to AddonManager.getActiveAddons() to avoid unnecessarily reading the
> +extensions database.

Could you add to the commit message summary that "This could result in partial addon data early during startup"?

::: browser/experiments/Experiments.jsm:168
(Diff revision 1)
>  }
>  
>  // Returns a promise that is resolved with an Array<Addon> of the installed
>  // experiment addons.
>  function installedExperimentAddons() {
> -  return AddonManager.getAddonsByTypes(["experiment"]).then(addons => {
> +  return AddonManager.getActiveAddons(["experiment"]).then(addons => {

nit: could you please add a comment here mentioning that |getActiveAddons| can return partial data for the addons depending on when it's called?

It's not a real problem for the current callers, as far as I can tell.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:479
(Diff revision 1)
>          () => this._shutdownBlocker());
>      } catch (err) {
>        return Promise.reject(err);
>      }
>  
> +    this.fullReportPromise = Promise.resolve();

What about something like:

```
this.fullReportPromise = new Promise(resolve =>    {
      if (AddonManagerPrivate.isDBLoaded()) {
        resolve();
        return;
      }
      
      ... 
}
```

so we don't risk of accessing different values for fullReportPromise?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:490
(Diff revision 1)
> +        const ADDON_LOAD_NOTIFICATION = "xpi-database-loaded";
> +        let self = this;
> +        Services.obs.addObserver({
> +          observe(subject, topic, data) {
> +            Services.obs.removeObserver(this, ADDON_LOAD_NOTIFICATION);
> +            self.watchForChanges();

Why do we need to watchForChanges here? Can't we simply inconditionally do that at the other callsite?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:491
(Diff revision 1)
> +        let self = this;
> +        Services.obs.addObserver({
> +          observe(subject, topic, data) {
> +            Services.obs.removeObserver(this, ADDON_LOAD_NOTIFICATION);
> +            self.watchForChanges();
> +            self._checkForChanges(ADDON_LOAD_NOTIFICATION);

I don't think we should use _checkForChanges_ here. This will end up notifying the rest of the Telemetry modules telling "hey, the user changed something", generating an [environment-changed subsession split](https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/concepts/sessions.html).

We should, instead, load the addon data that's already available and then refresh the current copy using _updateAddons

Some [consumers](http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/components/telemetry/TelemetryStartup.js#34) require the data as soon we're starting up, even if partial. But we need to make sure they get notified when the full data is available too.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:571
(Diff revision 1)
>      }
> -    return this._pendingTask;
> +
> +    // If we shutdown before the extensions database has been loaded,
> +    // wait for it to be loaded (fullReportPromise) and for the
> +    // corresponding details to get into the environment (_pendingTask).
> +    return this.fullReportPromise.then(() => this._pendingTask);

This is called on |profileBeforeChange|, which happens before |profileBeforeChangeTelemetry| so we should be good here.

Would you mind adding a comment about this anyway?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:838
(Diff revision 1)
>    }
>  
>    let setup = () => {
>      this._initTask = null;
>      this._startWatchingPrefs();
> +    if (AddonManagerPrivate.isDBLoaded()) {

Why is this needed? Does loading the addon database trigger addon-related notifications (enabled, disabled, installed, etc.?)?

::: toolkit/components/telemetry/docs/data/environment.rst:203
(Diff revision 1)
>            },
>        },
>        addons: {
>          activeAddons: { // the currently enabled addons
>            <addon id>: {
> -            blocklisted: <bool>,
> +            blocklisted: <bool>,         // not present in "early" pings

Instead of marking what's not available, what about just mentioning what's always available in the paragraph below?

::: toolkit/components/telemetry/docs/data/environment.rst:204
(Diff revision 1)
>        },
>        addons: {
>          activeAddons: { // the currently enabled addons
>            <addon id>: {
> -            blocklisted: <bool>,
> -            description: <string>, // null if not available
> +            blocklisted: <bool>,         // not present in "early" pings
> +            description: <string>,       // null if not available

Is description available at early startup?

::: toolkit/components/telemetry/docs/data/environment.rst:386
(Diff revision 1)
>  activeAddons
>  ~~~~~~~~~~~~
>  
>  Starting from Firefox 44, the length of the following string fields: ``name``, ``description`` and ``version`` is limited to 100 characters. The same limitation applies to the same fields in ``theme`` and ``activePlugins``.
>  
> +Some of the fields in the record for each addon are not available during startup so a ping that is sent before the browser is fully started may be missing some fields.  These fields are noted in the Structure section above.

nit: can we change this to list the fields that will definitely be in each ping?

"The addon data will always contains at least the following fields: ``description``, ``version``, ... "

::: toolkit/components/telemetry/docs/data/environment.rst:386
(Diff revision 1)
>  activeAddons
>  ~~~~~~~~~~~~
>  
>  Starting from Firefox 44, the length of the following string fields: ``name``, ``description`` and ``version`` is limited to 100 characters. The same limitation applies to the same fields in ``theme`` and ``activePlugins``.
>  
> +Some of the fields in the record for each addon are not available during startup so a ping that is sent before the browser is fully started may be missing some fields.  These fields are noted in the Structure section above.

Can we also document _when_ does the full data becomes available? e.g. after which notification

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:916
(Diff revision 1)
> +               "addons database is not loaded");
> +
> +  // Now simulate startup finishing and make sure telemetry has
> +  // had a chance to react.
> +  let changePromise = new Promise(resolve => {
> +    TelemetryEnvironment.registerChangeListener("blabbityblah", () => {

Unfortunately, you can't test it this way, is it implies using a notification of "change by the user".

I'm not sure if there's a better way other than waiting for the field to appear in the currentEnvironment though.
Attachment #8869173 - Flags: review?(alessio.placitelli)
Basically, there are 3 cases we need to consider:

1) Firefox is started and quickly closed.

For this case, we still want the full environment with all the addons when the 'shutdown' ping is sent.

2) Firefox is started and crashes early, before the addon database is completely loaded.

For this, given the previous comments, we're ok to have partial addon data.

3) Firefox is started and crashes after the addon database is loaded.

For this we want all the data, as with case 1.
Attachment #8868392 - Attachment is obsolete: true
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review144560

> nit: could you please add a comment here mentioning that |getActiveAddons| can return partial data for the addons depending on when it's called?
> 
> It's not a real problem for the current callers, as far as I can tell.

Are you talking about the list of addons or the contents of the returned objects?  This code only cares about active addons so the list of addons should be complete.  This code only appears to care about the `id` property, documenting the behavior of `getActiveAddons()` at every caller seems excessive.

> I don't think we should use _checkForChanges_ here. This will end up notifying the rest of the Telemetry modules telling "hey, the user changed something", generating an [environment-changed subsession split](https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/concepts/sessions.html).
> 
> We should, instead, load the addon data that's already available and then refresh the current copy using _updateAddons
> 
> Some [consumers](http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/components/telemetry/TelemetryStartup.js#34) require the data as soon we're starting up, even if partial. But we need to make sure they get notified when the full data is available too.

I've switched to `_updateAddons()` but how do you want the last part of the comment addressed?  A new event in TelemetryEnvironment and corresponding listener in TelemetryStartup?

> This is called on |profileBeforeChange|, which happens before |profileBeforeChangeTelemetry| so we should be good here.
> 
> Would you mind adding a comment about this anyway?

I'm sorry, I didn't follow this comment?

> Why is this needed? Does loading the addon database trigger addon-related notifications (enabled, disabled, installed, etc.?)?

No, this is left over from an earlier draft of the patch, I've removed it.
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

Sorry reviewboard automatically set r? again even though I haven't addressed all the feedback yet.  There are a few questions above that I could use answers to but no need for another review pass yet.
Attachment #8869173 - Flags: review?(alessio.placitelli)
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review144560

> I've switched to `_updateAddons()` but how do you want the last part of the comment addressed?  A new event in TelemetryEnvironment and corresponding listener in TelemetryStartup?

[Here](http://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/toolkit/components/telemetry/TelemetryStartup.js#32) we already have the partial environment available, so we can simply make |onInitialized| complete when the full data is available. That should be as straightforward as changing [here](http://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/toolkit/components/telemetry/TelemetryEnvironment.jsm#478) to wait for this.fullReportPromise to complete before updating the addons.

The other consumer is the experiments module, which I think would require the full addon info anyway.

> I'm sorry, I didn't follow this comment?

Whoops, let me be clearer :) This shutdown blocker is fired on |profileBeforeChange|, AFAICT. The global shutdown of telemetry happens on |profileBeforeChangeTelemetry|, which is fired after |profileBeforeChange|.

The rest of the Telemetry shutdown may require the "full" environment (all addon info) and this is guarateed by this shutdown being called before the "global" shutdown.
Attachment #8871094 - Attachment is obsolete: true
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review144560

> Whoops, let me be clearer :) This shutdown blocker is fired on |profileBeforeChange|, AFAICT. The global shutdown of telemetry happens on |profileBeforeChangeTelemetry|, which is fired after |profileBeforeChange|.
> 
> The rest of the Telemetry shutdown may require the "full" environment (all addon info) and this is guarateed by this shutdown being called before the "global" shutdown.

Sorry if I'm being dense but I'm still not following and am not sure what you're looking for here...
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review144560

> Sorry if I'm being dense but I'm still not following and am not sure what you're looking for here...

Nothing to be sorry about, no worries! The Telemetry shutdown process begins when |profileBeforeChangeTelemetry| is fired, because of [this blocker](http://searchfox.org/mozilla-central/rev/35b37316149108d53a02fb8498e250ea8a22ab5d/toolkit/components/telemetry/TelemetryController.jsm#752) in TelemetryController.jsm. The calls from there end up calling the shutdown sequence [here](http://searchfox.org/mozilla-central/rev/35b37316149108d53a02fb8498e250ea8a22ab5d/toolkit/components/telemetry/TelemetryController.jsm#793).

The important parts are the calls to _TelemetryEnvironment.shutdown()_ and _TelemetrySession.shutdown()_. We want to make sure that, before the latter starts, the TelemetryEnvironment is complete. This is needed because during the *TelemetrySession* shutdown, we generate two very important pings. We want the environemnt to be complete [there](http://searchfox.org/mozilla-central/rev/35b37316149108d53a02fb8498e250ea8a22ab5d/toolkit/components/telemetry/TelemetrySession.jsm#2077).

As far as I can tell |profileBeforeChange| gets fired before the telemetry one, so we should be save and have a complete environment when the shutdown ping gets generated.
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review146410

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:493
(Diff revision 4)
> +      const ADDON_LOAD_NOTIFICATION = "xpi-database-loaded";
> +      let self = this;
> +      Services.obs.addObserver({
> +        observe(subject, topic, data) {
> +          Services.obs.removeObserver(this, ADDON_LOAD_NOTIFICATION);
> +          resolve(self._updateAddons());

I think that would race with pendingTask to update the data in the environment.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:498
(Diff revision 4)
> +          resolve(self._updateAddons());
> +        },
> +      }, ADDON_LOAD_NOTIFICATION);
> +    });
> +
>      this._pendingTask = this._updateAddons().then(

What about doing something here like this:

```
let fullReportPromise = ...

this._pendingTask = async() => {
  try {
    // Perform an initial update of the addons in the environment.
    await this._updateAddons();
    // Wait for the full data to be available 
    await fullReportPromise;
    await this._updateAddons();
  } catch(e) {
    this._environment._log.error("init - Exception in the pending task", e);
  } finally {
    this._pendingTask = null;
  }
};
```

If we remove the updateAddons call from |resolve|, above, this should help insuring that we're on the safe side with |updateAddons| racyness.

This way you would simply return the pendingTask below and just block on it when shutting down, without other changes.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:571
(Diff revision 4)
>      }
> -    return this._pendingTask;
> +
> +    // If we shutdown before the extensions database has been loaded,
> +    // wait for it to be loaded (fullReportPromise) and for the
> +    // corresponding details to get into the environment (_pendingTask).
> +    return this.fullReportPromise.then(() => this._pendingTask);

How does this behave when this.\_pendingTask is null?

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:896
(Diff revision 4)
>    await spoofProfileReset();
>    TelemetryEnvironment.delayedInit();
>  });
>  
>  add_task(async function test_checkEnvironment() {
> +  // During startup we have partial addon records.

Could you please add a test to make sure that the environment is complete when shutting down?

::: toolkit/mozapps/extensions/AddonManager.jsm:2526
(Diff revision 4)
>      let addons = [];
>  
>      for (let provider of this.providers) {
>        let providerAddons;
>        if ("getActiveAddons" in provider) {
> -        providerAddons = await callProvider(provider, "getActiveAddons", aTypes);
> +        providerAddons = await callProvider(provider, "getActiveAddons", null, aTypes);

Is this a fix for the code that landed in the other patch or a leftover from some test?
Attachment #8869173 - Flags: review?(alessio.placitelli)
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

This isn't actually ready for review yet, I'm not sure how to make reviewboard not automatically set that flag.

Anyhow, Kris, with this patch all xpcshell tests and mochitests are passing but talos is broken.  It looks like the talos extensions are not actually getting installed.  Or, more specifically, they don't end up in addonStartup.json after the first run.  Can you take a glance at the patch and see if anything jumps out at you?
Flags: needinfo?(kmaglione+bmo)
Attachment #8869173 - Flags: review?(alessio.placitelli)
(In reply to Andrew Swan [:aswan] from comment #49)
> Comment on attachment 8869173 [details]
> Bug 1358907 Part 2 Avoid reading XPI database at startup
> 
> This isn't actually ready for review yet, I'm not sure how to make
> reviewboard not automatically set that flag.

Andrew, any progress on this? Is there something that I can do to help moving forward with this?
Flags: needinfo?(aswan)
I'm still waiting for a response from Kris to comment 49.  I'll try to contact him directly today.
Flags: needinfo?(aswan)
Bug 1371888 _might_ be a dupe of this, but I'm not 100% sure.
See Also: → 1371888
No they are separate.
I pushed a rebased version of the patch, the talos issue from a few weeks ago seems to have been cleared up by something else that landed in the mean time :/

Unfortunately, test_TelemetryEnvrionment.js is now broken with this patch since bug 1359031 landed.  Briefly, the problem is that the bits that test the addons data in the environment separately check the data gathered during startup and data gathered after startup, where sessionstore-windows-restored is triggered to get us to "after startup".  But the bits that test the search service want to do basically the same thing, however that same event now causes the initialization of the search service (?!).  So, when the addons tests run, the search service gets initialized which causes the "during startup" part of the search engine test code here to fail:
http://searchfox.org/mozilla-central/rev/3291398f10dcbe192fb52e74974b172616c018aa/toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js#1432-1437

Off the top of my head, a couple options are:
1. Re-organize a bunch of test_TelemetryEnvironment.js so that all the "during startup" stuff happens first, then we trigger sessionstore-windows-restored once, then move on to all the "after startup" stuff.  That's unattractive since currently all the related bits (eg addons, search, etc.) are grouped together and we would lose that.
2. Split test_TelemetryEnvironment.js into multiple tests for things like addons, search, etc.  That sounds appealing at first glance since that test is already extremely long.  I haven't looked at it closely enough to assess how practical it is though.

Dexter, any thoughts?
Flags: needinfo?(kmaglione+bmo) → needinfo?(alessio.placitelli)
(In reply to Andrew Swan [:aswan] from comment #55)
> I pushed a rebased version of the patch, the talos issue from a few weeks
> ago seems to have been cleared up by something else that landed in the mean
> time :/
> ...
> Dexter, any thoughts?

Gah, bad timing :( What about adding the following at the beginning of the test that's failing?

> // Make sure we don't have anything already cached for this test.
> await TelemetryEnvironment.testCleanRestart().onInitialized();

It should reset the state of the environment. Does this fix the fallout?
Flags: needinfo?(alessio.placitelli) → needinfo?(aswan)
Also, sorry for the time it took for me to reply to your ni? :-\
(In reply to Alessio Placitelli [:Dexter] from comment #57)
> (In reply to Andrew Swan [:aswan] from comment #55)
> > I pushed a rebased version of the patch, the talos issue from a few weeks
> > ago seems to have been cleared up by something else that landed in the mean
> > time :/
> > ...
> > Dexter, any thoughts?
> 
> Gah, bad timing :( What about adding the following at the beginning of the
> test that's failing?
> 
> > // Make sure we don't have anything already cached for this test.
> > await TelemetryEnvironment.testCleanRestart().onInitialized();
> 
> It should reset the state of the environment. Does this fix the fallout?

Not quite.  With that change alone, this test fails:
http://searchfox.org/mozilla-central/rev/fdb811340ac4e6b93703d72ee99217f3b1d250ac/toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js#1454
since the search service itself is already initialized so `Services.search.init()` a few lines above is essentially a no-op.  That means that the search settings in the telemetry environment are never updated.  I tried to work around this by adding a call to
Services.obs.notifyObservers(null, "browser-search-service", "init-complete");

But now at the test line cited above, `defaultSearchEngine` is set to "google" instead of "telemetrySearchIdentifier".
I'm not familiar enough with how this test works to understand what's going wrong.
Flags: needinfo?(aswan) → needinfo?(alessio.placitelli)
https://hg.mozilla.org/integration/mozilla-inbound/rev/34e563f2316bee8e241bb11c752c21ba19a08d3f
Bug 1358907 Part 1a Use callProvider() properly in AddonManager.getActiveAddons r=me
Attachment #8882706 - Flags: review?(rhelmer)
Attachment #8882707 - Flags: review?(rhelmer)
Attachment #8869173 - Flags: review?(alessio.placitelli)
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review159102

::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js:908
(Diff revision 9)
> +
> +  checkAddonsSection(TelemetryEnvironment.currentEnvironment, false, true);
> +
> +  // Now continue with startup.
> +  finishAddonManagerStartup();
>    let environmentData = await TelemetryEnvironment.onInitialized();

Can this lead to intermittents on try? Maybe we should do something like this to make sure we don't miss the notification:

```
 let promiseInitEnvironment = TelemetryEnvironment.onInitialized();
 finishAddonManagerStartup();
 let environmentData = await promiseInitEnvironment;
```
Attachment #8869173 - Flags: review?(alessio.placitelli) → review+
Flags: needinfo?(alessio.placitelli)
Attachment #8882706 - Flags: review?(rhelmer) → review?(dtownsend)
Attachment #8882707 - Flags: review?(rhelmer) → review?(dtownsend)
Comment on attachment 8882706 [details]
Bug 1358907 Part 1b Handle types argument to getActiveAddons()

https://reviewboard.mozilla.org/r/153786/#review159134
Attachment #8882706 - Flags: review?(dtownsend) → review+
Comment on attachment 8882707 [details]
Bug 1358907 Part 2 Wait for database load before shutting down

https://reviewboard.mozilla.org/r/153788/#review159138

I think there's a slight change in the timing of anything waiting on the promise returned from this function in that it will now definitely be after the references are cleared. That's probably what we want though.
Attachment #8882707 - Flags: review?(dtownsend) → review+
Pushed by aswan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d112bc46842f
Part 1b Handle types argument to getActiveAddons() r=mossop
https://hg.mozilla.org/integration/autoland/rev/5368cc2c4f38
Part 2 Wait for database load before shutting down r=mossop
https://hg.mozilla.org/integration/autoland/rev/529eccc2bc69
Part 3 Avoid reading XPI database at startup r=Dexter
Comment on attachment 8869173 [details]
Bug 1358907 Part 3 Avoid reading XPI database at startup

https://reviewboard.mozilla.org/r/140784/#review160432

r=me for the XPIProvider changes.

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:2287
(Diff revision 11)
> +      // and a visible toolbox window.
> +      // Finally, we have a test-only event called test-load-xpi-database
> +      // as a temporary workaround for bug 1372845.  The latter can be
> +      // cleaned up when that bug is resolved.
>        if (!this.isDBLoaded) {
> -        Services.obs.addObserver({
> +        const EVENTS = [ "sessionstore-windows-restored", "xul-window-visible", "test-load-xpi-database" ];

Nit: No whitespace inside []

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:2288
(Diff revision 11)
> +        let observer = {
>            observe(subject, topic, data) {

Please make this a plain function rather than an object with an `observe` method.
Attachment #8869173 - Flags: review+
devtools failures have been greened up:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=448f4efc7905818c75cead9fa2ce9d8cc75e922c

Lets try this again
Flags: needinfo?(aswan)
Pushed by aswan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51a1bde393d2
Part 1b Handle types argument to getActiveAddons() r=mossop
https://hg.mozilla.org/integration/autoland/rev/3baed119e198
Part 2 Wait for database load before shutting down r=mossop
https://hg.mozilla.org/integration/autoland/rev/12cfb7c5610f
Part 3 Avoid reading XPI database at startup r=Dexter,kmag
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1379831
This also landed some improvements:

== Change summary for alert #7776 (as of July 08 2017 01:24 UTC) ==

Improvements:

  3%  sessionrestore linux64 opt e10s     764.83 -> 744.27
  3%  ts_paint linux64 pgo e10s           1,045.79 -> 1,019.50
  2%  ts_paint linux64 opt e10s           1,185.58 -> 1,157.17
  2%  ts_paint_webext linux64 pgo e10s    1,130.42 -> 1,104.25
  2%  sessionrestore_no_auto_restore linux64 opt e10s813.21 -> 794.67
  2%  ts_paint_webext linux64 opt e10s    1,283.54 -> 1,254.33

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7776
I also noticed an Autophone improvement:

== Change summary for alert #7756 (as of July 08 2017 01:24 UTC) ==

Improvements:

  2%  remote-blank summary android-4-4-armv7-api15 opt      984.89 -> 961.15

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7756
Depends on: 1381432
Depends on: 1398287
Regressions: 1601678
Performance Impact: --- → P1
Whiteboard: [measurement:client] [qf:p1] → [measurement:client]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: