Intermittent "FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown"..."

RESOLVED FIXED in Firefox 33

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: ttaubert, Assigned: Irving)

Tracking

({intermittent-failure})

Trunk
mozilla35
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox33 fixed, firefox34 fixed, firefox35 fixed, firefox-esr31 unaffected)

Details

Attachments

(2 attachments)

https://tbpl.mozilla.org/php/getParsedLog.php?id=49096796&tree=Mozilla-Central

FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown","state":{"isShutdown":true,"isEnabled":true,"isRefresh":false,"isDirty":false,"isFirstEvaluate":false,"hasLoadTask":false,"hasMainTask":true,"hasTimer":false,"hasAddonProvider":false,"latestLogs":["10: onUninstalled() - addon id: addon1@tests.mozilla.org","10: onUninstalled() - addon id: 4@personas.mozilla.org","10: updateManifest()","10: _run","10: _main iterat... [exceeded max length]
[Parent 1845] ###!!! ABORT: file resource://app/modules/experiments/Experiments.jsm, line 435
[Parent 1845] ###!!! ABORT: file resource://app/modules/experiments/Experiments.jsm, line 435
255 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_sorting_plugins.js | application terminated with exit code 11
PROCESS-CRASH | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_sorting_plugins.js | application crashed [@ mozalloc_abort(char const*)]
Return code: 1
That's a failing m-e10s-bc run btw.
I suspect this is a similar problem to bug 1057312. The full AsyncShutdown failure status, with the JSON pretty-printed, is:

03:51:05     INFO -  FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: 
[
  {
    "name": "Experiments.jsm shutdown",
    "state": {
      "isShutdown": true,
      "isEnabled": true,
      "isRefresh": false,
      "isDirty": false,
      "isFirstEvaluate": false,
      "hasLoadTask": false,
      "hasMainTask": true,
      "hasTimer": false,
      "hasAddonProvider": false,
      "latestLogs": [
        "10: onUninstalled() - addon id: addon1@tests.mozilla.org",
        "10: onUninstalled() - addon id: 4@personas.mozilla.org",
        "10: updateManifest()",
        "10: _run",
        "10: _main iteration",
        "10: _loadManifest",
        "10: httpGetRequest(http://127.0.0.1:8888/experiments-dummy/manifest)",
        "60: httpGetRequest::onLoad() - Request to http://127.0.0.1:8888/experiments-dummy/manifest returned status 404",
        "60: _loadManifest - failure to fetch/parse manifest (continuing anyway): Error: Experiments - XHR status for http://127.0.0.1:8888/experiments-dummy/manifest is 404",
        "10: _evaluateExperiments",
        "10: onUninstalled() - addon id: remote1@tests.mozilla.org",
        "10: uninit: started",
        "10: uninit: finished with _loadTask",
        "10: uninit: no previous shutdown",
        "10: Unregistering instance with Addon Manager.",
        "10: Removing install listener from add-on manager.",
        "10: Removing addon listener from add-on manager.",
        "10: Finished unregistering with addon manager.",
        "10: Unregistering previous experiment add-on provider.",
        "10: uninit: waiting on _mainTask"
      ],
      "experiments": {
        
      },
      "terminateReason": null
    },
    "filename": "resource://app/modules/experiments/Experiments.jsm",
    "lineNumber": 435,
    "stack": [
      "resource://app/modules/experiments/Experiments.jsm:Experiments.Experiments.prototype.init:435",
      "resource://app/modules/experiments/Experiments.jsm:Experiments.Experiments:416",
      "resource://app/modules/experiments/Experiments.jsm:Experiments.instance:250",
      "resource://app/components/ExperimentsService.js:ExperimentsService.prototype._delayedInit:65",
      "resource://services-common/utils.js:notify:182",
      "null:null:0"
    ]
  }
]
 At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
Blocks: 1057312
Hm, i wonder if we could catch _evaluateExperiments() in a bad state:

(In reply to :Irving Reid from comment #2)
>       "latestLogs": [
[...]
>         "10: _main iteration",
>         "10: _loadManifest",
[...]
>         "10: _evaluateExperiments",
>         "10: onUninstalled() - addon id: remote1@tests.mozilla.org",

... we may get interrupted somewhere after this:
http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/browser/experiments/Experiments.jsm#l1225

>         "10: uninit: started",
>         "10: uninit: finished with _loadTask",
>         "10: uninit: no previous shutdown",
>         "10: Unregistering instance with Addon Manager.",
>         "10: Removing install listener from add-on manager.",
>         "10: Removing addon listener from add-on manager.",
>         "10: Finished unregistering with addon manager.",
>         "10: Unregistering previous experiment add-on provider.",
>         "10: uninit: waiting on _mainTask"

... and swap back to the _mainTask, yielding/waiting on some AddonManager result.
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> Hm, i wonder if we could catch _evaluateExperiments() in a bad state:
> 
> (In reply to :Irving Reid from comment #2)
> >       "latestLogs": [
> [...]
> >         "10: _main iteration",
> >         "10: _loadManifest",
> [...]
> >         "10: _evaluateExperiments",
> >         "10: onUninstalled() - addon id: remote1@tests.mozilla.org",

Actually... that can't come from Experiments.jsm#l1225, otherwise we'd see logging for "_evaluateExperiments() - unknown add-ons [...].
Tweaking the summary to make this TBPL/Treeherder suggestible.
Summary: Intermittent AsyncShutdown timeout in AddonManager: Waiting for providers to shut down (Experiments.jsm) → Intermittent "FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown"..."
Comment hidden (Legacy TBPL/Treeherder Robot)
This is a combination of dangerous code in the test harness, and weak exception handling in the AddonManager's asynchronous provider callback mechanism. Testing a patch for AddonManager now, then I'll look at the test harness code.
Assignee: nobody → irving
Status: NEW → ASSIGNED
gfritzsche spotted this in the TBPL log:

09:47 (gfritzsche) hm: ERROR	Exception calling provider getAddonsByTypes: TypeError: this.timerLocations is null
09:50 (gfritzsche) ok, that looks like test-failure here: http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/test/browser/head.js#1041

The mock provider's getAddonsByTypes callback is being called after the mock provider is shut down. This could be caused by some other provider calling AddonManager.getAddonsByTypes in its own shutdown, or by an async call to getAddonsByTypes running in the background when AddonManager.shutdownManager is called. Investigating...
Just thought of another possible failure path - if AddonManager is in the process of starting up providers and one of those providers calls one of the AddonManager iterate-through-all-providers methods, a provider could be called before it is initialized.

Experiments.jsm / _evaluateExperiments calls installedExerimentAddons() which calls AddonManager.getAddonsByTypes(), so that could be the trigger.

Note that calling getAddonsByTypes forces the AddonManager to load the XPIProviderUtils and the XPI database, so in the long term it would be nice to find a better solution for installedExperimentAddons.

Need to think a bit more about what the right solution is. My current ideas are:
1) Don't allow clients to call AddonManager APIs that call providers, until all the providers are initialized. This might require adding another notification to tell clients when that initialization is done (there might be an existing notification that's good enough).

2) Start up and shut down providers sequentially, and add/remove them on the active providers list as they are ready / stopped. That would make things like the current behaviour of the Experiments provider difficult, since it relies on the XPIProvider being available and having loaded the experiments; this would break if those providers got initialized in the wrong order.
Flags: needinfo?(bmcbride)
On further thought, we probably also need a way to either cancel or wait for completion of in-progress async methods that call providers, to make sure they don't race with shutdown.
https://tbpl.mozilla.org/?tree=Try&rev=4a3d4626750d

I'd really like to burn AsyncObjectCaller with fire, but it supports a mixed sync / async code path that we'd need to deprecate before it's worth striking any matches.
Attachment #8497024 - Flags: review?(bmcbride)
(In reply to :Irving Reid from comment #11)
> I'd really like to burn AsyncObjectCaller with fire

Please.

(In reply to :Irving Reid from comment #9)
> 1) Don't allow clients to call AddonManager APIs that call providers, until
> all the providers are initialized. This might require adding another
> notification to tell clients when that initialization is done (there might
> be an existing notification that's good enough).

+1. This is the simplest and most deterministic.
Flags: needinfo?(bmcbride)
Attachment #8497024 - Flags: review?(bmcbride) → review+
https://hg.mozilla.org/integration/fx-team/rev/5c156a876c7e

(In reply to Blair McBride [:Unfocused] from comment #12)

> (In reply to :Irving Reid from comment #11)
> > I'd really like to burn AsyncObjectCaller with fire
> 
> Please.
> 
> (In reply to :Irving Reid from comment #9)
> > 1) Don't allow clients to call AddonManager APIs that call providers, until
> > all the providers are initialized. This might require adding another
> > notification to tell clients when that initialization is done (there might
> > be an existing notification that's good enough).
> 
> +1. This is the simplest and most deterministic.

Split off to bug 1075625.
Whiteboard: [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/5c156a876c7e
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla35
Irving, do you think that we can uplift this to 34, given bug 1064470 et al?
Flags: needinfo?(irving)
crash-stats is showing a pretty sharp drop off in related shutdown crashes, and the AddonManager exception telemetry isn't showing any new exceptions indicating problems with these patches, so we should request uplift for this and bug 1059674 together.
Flags: needinfo?(irving)
Comment on attachment 8497024 [details] [diff] [review]
Part 1 - call back after exceptions when looping through async provider calls

Approval Request Comment
[Feature/regressing bug #]: 1057312, 1064470
[User impact if declined]: Shutdown hangs / crashes
[Describe test coverage new/current, TBPL]:
Has been on Nightly for a week. Crash-stats shows substantial reduction in AsyncShutdown crashes caused by Experiments.jsm, AddonManager exception telemetry shows no new forms of failure.
[Risks and why]: Low risk - at worst, would cause other forms of shutdown crash.
[String/UUID change made/needed]: None

NOTE: depends on uplift of bug 1059674 attachment 8496467 [details] [diff] [review].
Attachment #8497024 - Flags: approval-mozilla-aurora?
Comment on attachment 8497024 [details] [diff] [review]
Part 1 - call back after exceptions when looping through async provider calls

This looks pretty low risk. Let's see if this has the same impact on Beta after uplift on Monday that's been seen on m-c. Aurora+
Attachment #8497024 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
David or Irving, could you fill the uplift requests to beta & release? Thanks
Flags: needinfo?(irving)
Flags: needinfo?(dteller)
I'd rather let Irving do that, as he knows that piece of code better than me.
Flags: needinfo?(dteller)
Irving, FYI, this is critical for 33.0.1.
Ryan (thanks btw) has launched a try job on Mozilla-release with this patch and bug 1059674.
It is busted:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=533d0b3beb49
Could you have a look to his asap? thanks
Apparently, that's a merge issue:
08:43:39 INFO - Diagnostic: SyntaxError: illegal character at resource://gre/modules/AddonManager.jsm:184
There were several ES6 features used in the JS code for these patches that aren't supported on 33. I'm testing a back-port right now.
Approval Request Comment
[Feature/regressing bug #]: 1057312
[User impact if declined]: Shutdown crashes
[Describe test coverage new/current, TBPL]: Unit tests, running on Nightly / Aurora / Beta for over a week
[Risks and why]: Exposes bug 1081702
[String/UUID change made/needed]: None

The trunk patch used ES6 features not available in the 33 branch (template strings).
Flags: needinfo?(irving)
Attachment #8509010 - Flags: review?(dtownsend+bugmail)
Attachment #8509010 - Flags: approval-mozilla-release?
Attachment #8509010 - Flags: review?(dtownsend+bugmail) → review+
Comment on attachment 8509010 [details] [diff] [review]
Call back after provider exceptions, ported to Release branch

To the sheriff, please land that also in GECKO330_2014101104_RELBRANCH
thanks
Attachment #8509010 - Flags: approval-mozilla-release? → approval-mozilla-release+
Please note that this patch depends on, and needs to land with, bug 1059674.
You need to log in before you can comment on or make changes to this bug.