Closed
Bug 1074135
Opened 10 years ago
Closed 10 years ago
Intermittent "FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown"..."
Categories
(Toolkit :: Add-ons Manager, defect)
Tracking
()
RESOLVED
FIXED
mozilla35
Tracking | Status | |
---|---|---|
firefox33 | --- | fixed |
firefox34 | --- | fixed |
firefox35 | --- | fixed |
firefox-esr31 | --- | unaffected |
People
(Reporter: ttaubert, Assigned: Irving)
References
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
16.26 KB,
patch
|
Unfocused
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
16.34 KB,
patch
|
mossop
:
review+
Sylvestre
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=49096796&tree=Mozilla-Central
FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown","state":{"isShutdown":true,"isEnabled":true,"isRefresh":false,"isDirty":false,"isFirstEvaluate":false,"hasLoadTask":false,"hasMainTask":true,"hasTimer":false,"hasAddonProvider":false,"latestLogs":["10: onUninstalled() - addon id: addon1@tests.mozilla.org","10: onUninstalled() - addon id: 4@personas.mozilla.org","10: updateManifest()","10: _run","10: _main iterat... [exceeded max length]
[Parent 1845] ###!!! ABORT: file resource://app/modules/experiments/Experiments.jsm, line 435
[Parent 1845] ###!!! ABORT: file resource://app/modules/experiments/Experiments.jsm, line 435
255 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_sorting_plugins.js | application terminated with exit code 11
PROCESS-CRASH | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_sorting_plugins.js | application crashed [@ mozalloc_abort(char const*)]
Return code: 1
Reporter | ||
Comment 1•10 years ago
|
||
That's a failing m-e10s-bc run btw.
Assignee | ||
Comment 2•10 years ago
|
||
I suspect this is a similar problem to bug 1057312. The full AsyncShutdown failure status, with the JSON pretty-printed, is:
03:51:05 INFO - FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions:
[
{
"name": "Experiments.jsm shutdown",
"state": {
"isShutdown": true,
"isEnabled": true,
"isRefresh": false,
"isDirty": false,
"isFirstEvaluate": false,
"hasLoadTask": false,
"hasMainTask": true,
"hasTimer": false,
"hasAddonProvider": false,
"latestLogs": [
"10: onUninstalled() - addon id: addon1@tests.mozilla.org",
"10: onUninstalled() - addon id: 4@personas.mozilla.org",
"10: updateManifest()",
"10: _run",
"10: _main iteration",
"10: _loadManifest",
"10: httpGetRequest(http://127.0.0.1:8888/experiments-dummy/manifest)",
"60: httpGetRequest::onLoad() - Request to http://127.0.0.1:8888/experiments-dummy/manifest returned status 404",
"60: _loadManifest - failure to fetch/parse manifest (continuing anyway): Error: Experiments - XHR status for http://127.0.0.1:8888/experiments-dummy/manifest is 404",
"10: _evaluateExperiments",
"10: onUninstalled() - addon id: remote1@tests.mozilla.org",
"10: uninit: started",
"10: uninit: finished with _loadTask",
"10: uninit: no previous shutdown",
"10: Unregistering instance with Addon Manager.",
"10: Removing install listener from add-on manager.",
"10: Removing addon listener from add-on manager.",
"10: Finished unregistering with addon manager.",
"10: Unregistering previous experiment add-on provider.",
"10: uninit: waiting on _mainTask"
],
"experiments": {
},
"terminateReason": null
},
"filename": "resource://app/modules/experiments/Experiments.jsm",
"lineNumber": 435,
"stack": [
"resource://app/modules/experiments/Experiments.jsm:Experiments.Experiments.prototype.init:435",
"resource://app/modules/experiments/Experiments.jsm:Experiments.Experiments:416",
"resource://app/modules/experiments/Experiments.jsm:Experiments.instance:250",
"resource://app/components/ExperimentsService.js:ExperimentsService.prototype._delayedInit:65",
"resource://services-common/utils.js:notify:182",
"null:null:0"
]
}
]
At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
Blocks: 1057312
Comment 3•10 years ago
|
||
Hm, i wonder if we could catch _evaluateExperiments() in a bad state:
(In reply to :Irving Reid from comment #2)
> "latestLogs": [
[...]
> "10: _main iteration",
> "10: _loadManifest",
[...]
> "10: _evaluateExperiments",
> "10: onUninstalled() - addon id: remote1@tests.mozilla.org",
... we may get interrupted somewhere after this:
http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/browser/experiments/Experiments.jsm#l1225
> "10: uninit: started",
> "10: uninit: finished with _loadTask",
> "10: uninit: no previous shutdown",
> "10: Unregistering instance with Addon Manager.",
> "10: Removing install listener from add-on manager.",
> "10: Removing addon listener from add-on manager.",
> "10: Finished unregistering with addon manager.",
> "10: Unregistering previous experiment add-on provider.",
> "10: uninit: waiting on _mainTask"
... and swap back to the _mainTask, yielding/waiting on some AddonManager result.
Comment 4•10 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> Hm, i wonder if we could catch _evaluateExperiments() in a bad state:
>
> (In reply to :Irving Reid from comment #2)
> > "latestLogs": [
> [...]
> > "10: _main iteration",
> > "10: _loadManifest",
> [...]
> > "10: _evaluateExperiments",
> > "10: onUninstalled() - addon id: remote1@tests.mozilla.org",
Actually... that can't come from Experiments.jsm#l1225, otherwise we'd see logging for "_evaluateExperiments() - unknown add-ons [...].
Comment 5•10 years ago
|
||
Tweaking the summary to make this TBPL/Treeherder suggestible.
Summary: Intermittent AsyncShutdown timeout in AddonManager: Waiting for providers to shut down (Experiments.jsm) → Intermittent "FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting for providers to shut down. Conditions: [{"name":"Experiments.jsm shutdown"..."
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 7•10 years ago
|
||
This is a combination of dangerous code in the test harness, and weak exception handling in the AddonManager's asynchronous provider callback mechanism. Testing a patch for AddonManager now, then I'll look at the test harness code.
Assignee: nobody → irving
Status: NEW → ASSIGNED
Assignee | ||
Comment 8•10 years ago
|
||
gfritzsche spotted this in the TBPL log:
09:47 (gfritzsche) hm: ERROR Exception calling provider getAddonsByTypes: TypeError: this.timerLocations is null
09:50 (gfritzsche) ok, that looks like test-failure here: http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/test/browser/head.js#1041
The mock provider's getAddonsByTypes callback is being called after the mock provider is shut down. This could be caused by some other provider calling AddonManager.getAddonsByTypes in its own shutdown, or by an async call to getAddonsByTypes running in the background when AddonManager.shutdownManager is called. Investigating...
Assignee | ||
Comment 9•10 years ago
|
||
Just thought of another possible failure path - if AddonManager is in the process of starting up providers and one of those providers calls one of the AddonManager iterate-through-all-providers methods, a provider could be called before it is initialized.
Experiments.jsm / _evaluateExperiments calls installedExerimentAddons() which calls AddonManager.getAddonsByTypes(), so that could be the trigger.
Note that calling getAddonsByTypes forces the AddonManager to load the XPIProviderUtils and the XPI database, so in the long term it would be nice to find a better solution for installedExperimentAddons.
Need to think a bit more about what the right solution is. My current ideas are:
1) Don't allow clients to call AddonManager APIs that call providers, until all the providers are initialized. This might require adding another notification to tell clients when that initialization is done (there might be an existing notification that's good enough).
2) Start up and shut down providers sequentially, and add/remove them on the active providers list as they are ready / stopped. That would make things like the current behaviour of the Experiments provider difficult, since it relies on the XPIProvider being available and having loaded the experiments; this would break if those providers got initialized in the wrong order.
Flags: needinfo?(bmcbride)
Assignee | ||
Comment 10•10 years ago
|
||
On further thought, we probably also need a way to either cancel or wait for completion of in-progress async methods that call providers, to make sure they don't race with shutdown.
Assignee | ||
Comment 11•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=4a3d4626750d
I'd really like to burn AsyncObjectCaller with fire, but it supports a mixed sync / async code path that we'd need to deprecate before it's worth striking any matches.
Attachment #8497024 -
Flags: review?(bmcbride)
Comment 12•10 years ago
|
||
(In reply to :Irving Reid from comment #11)
> I'd really like to burn AsyncObjectCaller with fire
Please.
(In reply to :Irving Reid from comment #9)
> 1) Don't allow clients to call AddonManager APIs that call providers, until
> all the providers are initialized. This might require adding another
> notification to tell clients when that initialization is done (there might
> be an existing notification that's good enough).
+1. This is the simplest and most deterministic.
Flags: needinfo?(bmcbride)
Updated•10 years ago
|
Attachment #8497024 -
Flags: review?(bmcbride) → review+
Assignee | ||
Comment 13•10 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/5c156a876c7e
(In reply to Blair McBride [:Unfocused] from comment #12)
> (In reply to :Irving Reid from comment #11)
> > I'd really like to burn AsyncObjectCaller with fire
>
> Please.
>
> (In reply to :Irving Reid from comment #9)
> > 1) Don't allow clients to call AddonManager APIs that call providers, until
> > all the providers are initialized. This might require adding another
> > notification to tell clients when that initialization is done (there might
> > be an existing notification that's good enough).
>
> +1. This is the simplest and most deterministic.
Split off to bug 1075625.
Whiteboard: [fixed-in-fx-team]
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla35
Updated•10 years ago
|
status-firefox33:
--- → unaffected
status-firefox34:
--- → unaffected
status-firefox35:
--- → fixed
status-firefox-esr31:
--- → unaffected
Updated•10 years ago
|
Comment 15•10 years ago
|
||
Irving, do you think that we can uplift this to 34, given bug 1064470 et al?
Flags: needinfo?(irving)
Assignee | ||
Comment 16•10 years ago
|
||
crash-stats is showing a pretty sharp drop off in related shutdown crashes, and the AddonManager exception telemetry isn't showing any new exceptions indicating problems with these patches, so we should request uplift for this and bug 1059674 together.
Flags: needinfo?(irving)
Assignee | ||
Comment 17•10 years ago
|
||
Comment on attachment 8497024 [details] [diff] [review]
Part 1 - call back after exceptions when looping through async provider calls
Approval Request Comment
[Feature/regressing bug #]: 1057312, 1064470
[User impact if declined]: Shutdown hangs / crashes
[Describe test coverage new/current, TBPL]:
Has been on Nightly for a week. Crash-stats shows substantial reduction in AsyncShutdown crashes caused by Experiments.jsm, AddonManager exception telemetry shows no new forms of failure.
[Risks and why]: Low risk - at worst, would cause other forms of shutdown crash.
[String/UUID change made/needed]: None
NOTE: depends on uplift of bug 1059674 attachment 8496467 [details] [diff] [review].
Attachment #8497024 -
Flags: approval-mozilla-aurora?
Comment 18•10 years ago
|
||
Comment on attachment 8497024 [details] [diff] [review]
Part 1 - call back after exceptions when looping through async provider calls
This looks pretty low risk. Let's see if this has the same impact on Beta after uplift on Monday that's been seen on m-c. Aurora+
Attachment #8497024 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 19•10 years ago
|
||
Comment 20•10 years ago
|
||
Backed out per bug 1059674 comment 16.
https://hg.mozilla.org/releases/mozilla-aurora/rev/27c4a0956f76
Flags: needinfo?(irving)
Comment 21•10 years ago
|
||
Flags: needinfo?(irving)
Comment 22•10 years ago
|
||
David or Irving, could you fill the uplift requests to beta & release? Thanks
Flags: needinfo?(irving)
Flags: needinfo?(dteller)
I'd rather let Irving do that, as he knows that piece of code better than me.
Flags: needinfo?(dteller)
Comment 24•10 years ago
|
||
Irving, FYI, this is critical for 33.0.1.
Ryan (thanks btw) has launched a try job on Mozilla-release with this patch and bug 1059674.
It is busted:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=533d0b3beb49
Could you have a look to his asap? thanks
Apparently, that's a merge issue:
08:43:39 INFO - Diagnostic: SyntaxError: illegal character at resource://gre/modules/AddonManager.jsm:184
Assignee | ||
Comment 26•10 years ago
|
||
There were several ES6 features used in the JS code for these patches that aren't supported on 33. I'm testing a back-port right now.
Assignee | ||
Comment 27•10 years ago
|
||
Approval Request Comment
[Feature/regressing bug #]: 1057312
[User impact if declined]: Shutdown crashes
[Describe test coverage new/current, TBPL]: Unit tests, running on Nightly / Aurora / Beta for over a week
[Risks and why]: Exposes bug 1081702
[String/UUID change made/needed]: None
The trunk patch used ES6 features not available in the 33 branch (template strings).
Flags: needinfo?(irving)
Attachment #8509010 -
Flags: review?(dtownsend+bugmail)
Attachment #8509010 -
Flags: approval-mozilla-release?
Updated•10 years ago
|
Attachment #8509010 -
Flags: review?(dtownsend+bugmail) → review+
Comment 28•10 years ago
|
||
Comment on attachment 8509010 [details] [diff] [review]
Call back after provider exceptions, ported to Release branch
To the sheriff, please land that also in GECKO330_2014101104_RELBRANCH
thanks
Attachment #8509010 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Assignee | ||
Comment 29•10 years ago
|
||
Please note that this patch depends on, and needs to land with, bug 1059674.
Updated•10 years ago
|
Comment 30•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•