Closed Bug 1439519 Opened 6 years ago Closed 6 years ago

Use eager-push/lazy-pull to get active plugin information for TelemetryEnvironment

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Performance Impact high
Tracking Status
firefox60 --- fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [measurement:client] [fxperf:p1])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1371888 +++

Here's a profile:

https://perfht.ml/2sdsDQN

Here's the code in question: https://dxr.mozilla.org/mozilla-central/rev/f60cfdc81e946cf052cc29b8beae7c50613a3039/toolkit/components/telemetry/TelemetryEnvironment.jsm#737-772

We should be able to make the blocklist indicate when it's been loaded (which will happen if/when things try to use plugins) and not force it to load from telemetry.

We can add some dummy data for plugins in telemetryenvironment prior to that as a 'canary' so that we can tell if this approach misbehaves - though we'd probably still be able to correlate plugin data from other pings anyway.
What is the relationship between this bug and bug 1371888?

The current requirement is to have plugin data available in the environment data for startup crash diagnosis.
This is annotated early in startup here:
https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryStartup.js

Do we need bug 1371888 first to achieve this?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> What is the relationship between this bug and bug 1371888?
> 
> The current requirement is to have plugin data available in the environment
> data for startup crash diagnosis.
> This is annotated early in startup here:
> https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryStartup.js

I don't understand this. We don't load the plugins at that point, right? So then we also don't need them for startup crash diagnosis. Plus the environment information explicitly gets updated by other add-on and experiment changes. Do those never make it into the crash information?

> Do we need bug 1371888 first to achieve this?

In theory, no. In practice, I might just end up doing all the work here because it seems some things implicitly depend on telemetry forcing the load of the blocklist at this point.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gfritzsche)
(In reply to :Gijs from comment #2)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> > What is the relationship between this bug and bug 1371888?
> > 
> > The current requirement is to have plugin data available in the environment
> > data for startup crash diagnosis.
> > This is annotated early in startup here:
> > https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/
> > TelemetryStartup.js
> 
> I don't understand this. We don't load the plugins at that point, right? So
> then we also don't need them for startup crash diagnosis. Plus the
> environment information explicitly gets updated by other add-on and
> experiment changes. Do those never make it into the crash information?

This is the requirement i'm currently aware of.
It might not be critical at this point, but we should make sure with whoever uses the data first.

ddurst, did something come out of this conversation? https://mozilla.logbot.info/telemetry/20180131
Or do you know who to flag for more information?
Flags: needinfo?(gfritzsche) → needinfo?(ddurst)
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review227606

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:761
(Diff revision 1)
> +        this._blocklistObserverAdded = true;
> +      }
> +      return [{
> +        name: "dummy", version: "0.1", description: "Blocklist unavailable",
> +        blocklisted: false, disabled: true, clicktoplay: false,
> +        mimeTypes: "text/there.is.only.blocklist",

Self-nit: This should be an array.
The general take was that there have been no identified startup crashes stemming from plugins. But rather than telephone game this, I'll NI jimm.
Flags: needinfo?(ddurst) → needinfo?(jmathies)
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review228222

I'm taking this review off Georg :-) This looks good, it's only lacking the documentation updates (that's why I'm holding back the r+). Let's also wait for Jimm to clear up the ni? before landing any of this.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:753
(Diff revision 1)
>     * @return Object containing the plugins data.
>     */
>    _getActivePlugins() {
> +    // If we haven't yet loaded the blocklist, pass back dummy data for now,
> +    // and add an observer to update this data as soon as we get it.
> +    if (!Services.blocklist.isLoaded) {

Can you please document this new behaviour by adding a new sub-section [here](https://dxr.mozilla.org/mozilla-central/rev/3904c3f9314fd040828c5f1cf1fcc86fd8adfe3e/toolkit/components/telemetry/docs/data/environment.rst#398)? We shouls mention that:

- blocklist data might be missing early at startup;
- what is expected in the environment in that case;
- what happens when the blocklist is loaded (and when that's expected to happen).

Same thing for the GMPlugins.
Attachment #8952440 - Flags: review?(alessio.placitelli)
(In reply to David Durst [:ddurst] from comment #6)
> The general take was that there have been no identified startup crashes
> stemming from plugins. But rather than telephone game this, I'll NI jimm.

I think we wanted this information so that when session restore restored tabs with plugins that crashed, we'd have that data available in crash reporting. We've since changed the behavior of tab restore, and we've also added click-to-play for plugins, so the startup + plugin crash risk is greatly reduced. If we can grab a little startup perf by avoid this I'd say go for it.

cc'ing Chris Peterson here in case he has an opinion.
Flags: needinfo?(jmathies)
Attachment #8952440 - Flags: review?(gfritzsche)
Attachment #8952440 - Flags: review?(alessio.placitelli)
This looks like it's being actively worked on by Gijs, so I'm going to mark this fxperf:p1.
Whiteboard: [measurement:client][qf:f61][qf:p1] [fxperf] → [measurement:client][qf:f61][qf:p1] [fxperf:p1]
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review230310

Looks good. The following comments are just suggestions.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:577
(Diff revision 2)
>    observe(aSubject, aTopic, aData) {
>      this._environment._log.trace("observe - Topic " + aTopic);
> +    if (aTopic == "experiment-changed") {
> -    this._checkForChanges("experiment-changed");
> +      this._checkForChanges("experiment-changed");
> +    } else if (aTopic == BLOCKLIST_LOADED_TOPIC) {
> +      Services.obs.removeObserver(this, BLOCKLIST_LOADED_TOPIC);

Should this reset this._blocklistObserverAdded to false?

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:605
(Diff revision 2)
>  
>    _shutdownBlocker() {
>      if (this._loaded) {
>        AddonManager.removeAddonListener(this);
>        Services.obs.removeObserver(this, EXPERIMENTS_CHANGED_TOPIC);
> +      try {

And this check for this._blocklistObserverAdded instead of a try/catch?

::: toolkit/mozapps/extensions/nsBlocklistService.js:810
(Diff revision 2)
>    _isBlocklistLoaded() {
>      return this._addonEntries != null && this._gfxEntries != null && this._pluginEntries != null;
>    },
>  
> +  get isLoaded() {
> +    return this._isBlocklistLoaded();

Should we just rename _isBlocklistLoaded to isLoaded everywhere? (there are only 11 occurrences on 2 files)
Attachment #8952440 - Flags: review?(florian) → review+
Attachment #8952440 - Flags: review?(gfritzsche) → review?(chutten)
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review230728

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:578
(Diff revision 2)
>      this._environment._log.trace("observe - Topic " + aTopic);
> +    if (aTopic == "experiment-changed") {
> -    this._checkForChanges("experiment-changed");
> +      this._checkForChanges("experiment-changed");
> +    } else if (aTopic == BLOCKLIST_LOADED_TOPIC) {
> +      Services.obs.removeObserver(this, BLOCKLIST_LOADED_TOPIC);
> +      this._checkForChanges("blocklist-changed");

I don't think we actually want to trigger an environment change at this point. That could trigger a new "main" ping (reason: "environment-change"), which will just add noise to every session.

I think instead we want to silently update the environment once the blocklist has loaded.

::: toolkit/components/telemetry/TelemetryEnvironment.jsm:809
(Diff revision 2)
>     */
>    async _getActiveGMPlugins() {
> +    // If we haven't yet loaded the blocklist, pass back dummy data for now,
> +    // and add an observer to update this data as soon as we get it.
> +    if (!Services.blocklist.isLoaded) {
> +      if (!this._blocklistObserverAdded) {

There is no _blocklistObserverAdded defined on EnvironmentAddonBuilder
Attachment #8952440 - Flags: review?(chutten)
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review231388

Clearing review for the moment due to circular services dependencies.
Attachment #8952440 - Flags: review?(chutten)
The re-entrant JS stack is:

 0:01.59 pid:19012 0 _getActivePlugins() ["resource://gre/modules/TelemetryEnvironment.jsm":780]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 1 observe(aSubject = null, aTopic = "blocklist-loaded", aData = null) ["resource://gre/modules/TelemetryEnvironment.jsm":584]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 2 _loadBlocklistFromXML(doc = [object XMLDocument]) ["NightlyDebug.app/Contents/Resources/components/nsBlocklistService.js":908]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 3 _loadBlocklistFromString(text = "<?xml version="1.0" encoding="UTF-8"?>
 0:01.59 pid:19012 <blocklist xmlns="http://www.mozilla.org/2006/addons-blocklist">
 0:01.59 pid:19012 </blocklist>
 0:01.59 pid:19012 ") ["NightlyDebug.app/Contents/Resources/components/nsBlocklistService.js":875]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 4 _loadBlocklistFromFile(file = [xpconnect wrapped nsIFile @ 0x10e802760 (native @ 0x12a1b0a00)]) ["NightlyDebug.app/Contents/Resources/components/nsBlocklistService.js":802]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 5 _loadBlocklist() ["NightlyDebug.app/Contents/Resources/components/nsBlocklistService.js":689]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 6 getPluginBlocklistState(plugin = [xpconnect wrapped nsIPluginTag @ 0x10e802580 (native @ 0x1211fb7a0)], appVersion = "", toolkitVersion = "") ["NightlyDebug.app/Contents/Resources/components/nsBlocklistService.js":1094]
 0:01.59 pid:19012     this = [object Object]

`getFlashVersion` (frame 7) does this:

    let host = Cc["@mozilla.org/plugin/host;1"].getService(Ci.nsIPluginHost);

which getservice's the pluginhost service, which in its C++ internals gets blocklist states (until bug 1371888, anyway, which is probably why I didn't notice this issue before...), so that forces the blocklist to load at that point instead...


 0:01.59 pid:19012 7 getFlashVersion() ["resource://gre/modules/TelemetrySession.jsm":1575]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 8 getMetadata(reason = "shutdown") ["resource://gre/modules/TelemetrySession.jsm":1037]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 9 getSessionPayload(reason = "shutdown", clearSubsession = false) ["resource://gre/modules/TelemetrySession.jsm":1365]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 10 saveShutdownPings() ["resource://gre/modules/TelemetrySession.jsm":1710]
 0:01.59 pid:19012     this = [object Object]
 0:01.59 pid:19012 11 shutdownChromeProcess/cleanup/<() ["resource://gre/modules/TelemetrySession.jsm":1947]


So what seems to be happening is that shutting down the chrome process triggers creating a telemetry session which starts up the plugin host service, which starts up the blocklist, which fires the new 'blocklist-loaded' observer topic, which triggers telemetry trying to talk to the plugin host again, which makes everything sad.

I expect this is part of why there's the _pendingTask re-entrancy check in TelemetryEnvironment.jsm's update checks. (but it can't be the only reason...)

Unfortunately that doesn't help us here because per discussion with chutten on IRC yesterday I'm not reusing that mechanism. We could revisit that decision, but really there are a lot of issues with re-entrancy with that code already and I'd like to not make the problem worse. So instead, I'm going to make the `blocklist-loaded` notification dispatch to mainthread, which will ensure that the current stack resolves first, which should avoid any re-entrancy for either the blocklist code or any of its dependents etc., be it telemetry or anything else.
(I've verified locally that this fixes the issue. Additionally, I expect I'll remove the workaround again in bug 1371888 where I remove all the direct calls from the plugin host into the blocklist service until after this notification fires anyway - but I'd like to be able to land this change separately if/when it has review (but after 60 splits off from trunk).)
Comment on attachment 8952440 [details]
Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,

https://reviewboard.mozilla.org/r/221662/#review231638

One nit.

::: toolkit/mozapps/extensions/nsBlocklistServiceContent.js:88
(Diff revision 4)
>    getAddonBlocklistState(aAddon, aAppVersion, aToolkitVersion) {
>      return Ci.nsIBlocklistService.STATE_BLOCKED;
>    },
>  
> +  get isLoaded() {
> +    // Lie for now.

Probably need a bug number in here for "Lie until we fix bug #####"
Attachment #8952440 - Flags: review?(chutten) → review+
Depends on: 1443870
(In reply to Chris H-C :chutten from comment #18)
> Comment on attachment 8952440 [details]
> Bug 1439519 - fetch plugin information lazily in TelemetryEnvironment.jsm,
> 
> https://reviewboard.mozilla.org/r/221662/#review231638
> 
> One nit.
> 
> ::: toolkit/mozapps/extensions/nsBlocklistServiceContent.js:88
> (Diff revision 4)
> >    getAddonBlocklistState(aAddon, aAppVersion, aToolkitVersion) {
> >      return Ci.nsIBlocklistService.STATE_BLOCKED;
> >    },
> >  
> > +  get isLoaded() {
> > +    // Lie for now.
> 
> Probably need a bug number in here for "Lie until we fix bug #####"

Filed bug 1443870 and am including that.
Per discussion with Florian, going to land this on nightly now, because I believe it's relatively low-risk. Hopefully it'll be possible to uplift bug 1371888 once that lands and take this stuff out of the startup path entirely.
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b4f223c8fe46
fetch plugin information lazily in TelemetryEnvironment.jsm, r=chutten,florian
https://hg.mozilla.org/mozilla-central/rev/b4f223c8fe46
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Performance Impact: --- → P1
Whiteboard: [measurement:client][qf:f61][qf:p1] [fxperf:p1] → [measurement:client] [fxperf:p1]
You need to log in before you can comment on or make changes to this bug.