Closed Bug 995027 Opened 6 years ago Closed 6 years ago

Intermittent Experiments.jsm Shutdown "ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431" [@ mozalloc_abort(char const*)]

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

x86
Linux
defect
Not set

Tracking

(firefox30 unaffected, firefox31 fixed, firefox32 fixed, firefox-esr24 unaffected)

VERIFIED FIXED
Firefox 32
Tracking Status
firefox30 --- unaffected
firefox31 --- fixed
firefox32 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: RyanVM, Assigned: benjamin)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: p=3 s=it-32c-31a-30b.2 [qa-])

Attachments

(4 files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=37597133&tree=B2g-Inbound

Ubuntu VM 12.04 b2g-inbound pgo test mochitest-2 on 2014-04-10 14:16:04 PDT for push 68c042c2b2e5
slave: tst-linux32-spot-738

14:27:17     INFO -  7529 INFO TEST-START | Shutdown
14:27:17     INFO -  7530 INFO Passed:  216012
14:27:17     INFO -  7531 INFO Failed:  0
14:27:17     INFO -  7532 INFO Todo:    24270
14:27:17     INFO -  7533 INFO Slowest: 110701ms - /tests/dom/events/test/test_wheel_default_action.html
14:27:17     INFO -  7534 INFO SimpleTest FINISHED
14:27:17     INFO -  7535 INFO TEST-INFO | Ran 1 Loops
14:27:17     INFO -  7536 INFO SimpleTest FINISHED
14:27:28     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
14:28:19     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"}] Phase: profile-before-change
14:28:19     INFO -  [Parent 2397] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431
14:28:19     INFO -  [Parent 2397] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431
14:28:20     INFO -  TEST-INFO | Main app process: killed by SIGSEGV
14:28:20  WARNING -  TEST-UNEXPECTED-FAIL | Shutdown | application terminated with exit code 11

14:28:47  WARNING -  PROCESS-CRASH | Shutdown | application crashed [@ mozalloc_abort(char const*)]
14:28:47     INFO -  Crash dump filename: /tmp/tmpgO4r8u/minidumps/64261ca9-1332-9921-71d57392-084df75f.dmp
14:28:47     INFO -  Operating system: Linux
14:28:47     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
14:28:47     INFO -  CPU: x86
14:28:47     INFO -       GenuineIntel family 6 model 45 stepping 7
14:28:47     INFO -       1 CPU
14:28:47     INFO -  Crash reason:  SIGSEGV
14:28:47     INFO -  Crash address: 0x0
14:28:47     INFO -  Thread 0 (crashed)
14:28:47     INFO -   0  libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:68c042c2b2e5 : 30 + 0x0]
14:28:47     INFO -      eip = 0xb773c34c   esp = 0xbfdca020   ebp = 0xbfdca038   ebx = 0xb773dd74
14:28:47     INFO -      esi = 0xb75d9d9c   edi = 0xbfdca06c   eax = 0x0000000a   ecx = 0xffffffff
14:28:47     INFO -      edx = 0xb75da8ac   efl = 0x00210282
14:28:47     INFO -      Found by: given as instruction pointer in context
14:28:47     INFO -   1  libxul.so!NS_DebugBreak [nsDebugImpl.cpp:68c042c2b2e5 : 421 + 0xd]
14:28:47     INFO -      eip = 0xb39f4ca6   esp = 0xbfdca040   ebp = 0xbfdca488   ebx = 0xb6de0b00
14:28:47     INFO -      esi = 0xbfdca078   edi = 0xbfdca06c
14:28:47     INFO -      Found by: call frame info
14:28:47     INFO -   2  libxul.so!nsDebugImpl::Abort(char const*, int) [nsDebugImpl.cpp:68c042c2b2e5 : 117 + 0x22]
14:28:47     INFO -      eip = 0xb39f4ef8   esp = 0xbfdca490   ebp = 0xbfdca4b8   ebx = 0xb6de0b00
14:28:47     INFO -      esi = 0xbfdca600   edi = 0xbfdca5f8
14:28:47     INFO -      Found by: call frame info
14:28:47     INFO -   3  libxul.so + 0x6407eb
14:28:47     INFO -      eip = 0xb3a377ec   esp = 0xbfdca4c0   ebp = 0xbfdca4e8   ebx = 0xb6de0b00
14:28:47     INFO -      esi = 0xbfdca600   edi = 0xbfdca5f8
14:28:47     INFO -      Found by: call frame info
14:28:47     INFO -   4  libxul.so!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:68c042c2b2e5 : 2405 + 0x15]
14:28:47     INFO -      eip = 0xb37b4b59   esp = 0xbfdca4f0   ebp = 0xbfdca6a8
14:28:47     INFO -      Found by: previous frame's frame pointer
14:28:47     INFO -   5  libxul.so!XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [XPCWrappedNativeJSOps.cpp:68c042c2b2e5 : 1285 + 0xd]
14:28:47     INFO -      eip = 0xb37b7220   esp = 0xbfdca6b0   ebp = 0xbfdca798   ebx = 0xb6de0b00
14:28:47     INFO -      esi = 0xbfdca718   edi = 0x69e7fc6c
14:28:47     INFO -      Found by: call frame info
14:28:47     INFO -   6  libxul.so!js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) [jscntxtinlines.h:68c042c2b2e5 : 239 + 0x7]
14:28:47     INFO -      eip = 0xb39b8c36   esp = 0xbfdca7a0   ebp = 0xbfdcaa38   ebx = 0xb6de0b00
14:28:47     INFO -      esi = 0xae44c188   edi = 0xb7216cc0
14:28:47     INFO -      Found by: call frame info
Suddenly my desire to make init() and uninit() logic more robust seems justified.

This /might/ be a race between the test job running to completion and a timeout fetching the manifest (from a URL behind a firewall). I /think/ we'll currently block on waiting on the "fetch manifest" promise to complete. My patch in part 17 of bug 989137 will fix the manifest URL in testing environments to not point to a remote service, potentially making this crash go away. But that won't prevent the crash in the wild: the XHR request needs a timeout.

If waiting on the XHR request isn't the issue, I'm not sure where the deadlock is occurring. Will need more logging to be sure.
FHR was blocked on enabling in the wild because of a shutdown crash. Assuming the same logic holds here and adding this to the release blockers.
Blocks: telex
Duplicate of this bug: 996601
Looking at the logs, here's some non-conclusive data:

The first mention of experiments is:

16:05:10     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_CTP_plugins.js | Console message: 1397516710711	Browser.Experiments.Experiments	TRACE	Experiments #0::onDisabled() - addon id: {e2c52c1c-5ee1-cc23-15fa-35945fd58806}

16:11:33     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/xpinstall/browser_navigateaway2.js | Console message: 1397517093327	addons.repository	DEBUG	cacheAddons: enabled false IDs ["unsigned-xpi@tests.mozilla.org"]
16:11:33     INFO -  ************************************************************
16:11:33     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
16:11:33     INFO -  [Exception... "[JavaScript Error: "can't access dead object" {file: "resource://gre/components/amWebInstallListener.js" line: 178}]'[JavaScript Error: "can't access dead object" {file: "resource://gre/components/amWebInstallListener.js" line: 178}]' when calling method: [nsIDOMWindow::document]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: resource://gre/components/amWebInstallListener.js :: Installer_checkAllDownloaded :: line 178"  data: yes]
16:11:33     INFO -  ************************************************************


16:11:47     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/xpinstall/browser_whitelist7.js | Console message: [JavaScript Error: "installInfo.originatingURI is null" {file: "chrome://browser/content/browser.js" line: 243}]
(many of these in various tests)

and finally right before the crash (probably unrelated)
16:12:48     INFO -  ************************************************************
16:12:48     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
16:12:48     INFO -  [Exception... "AddonManager is not initialized"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: resource://gre/modules/AddonManager.jsm :: AMI_backgroundUpdateCheck :: line 1043"  data: no]
16:12:48     INFO -  ************************************************************

At the crash itself:
16:12:51     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"}] Phase: profile-before-change

This means that there are three Experiments instances running in the browser, which seems unfortunate in a b-c test (I'd only expect multiple instances in xpcshell unit tests).

In the log there are two "completed uninitialization" logs during browser_experiments.jsm but one is still outstanding (probably the main .instance()). We don't have any tracing for starting to call uninit(), only when it's finished.

It's a little odd that I only see logging for "Experiments #0" when I'd expect to see logging for #1 and #2 also. Maybe that counter system is broken somehow?

I'll take this for now to add some extra logging.
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Attachment #8407734 - Flags: review?(georg.fritzsche)
Comment on attachment 8407734 [details] [diff] [review]
bug995027-logging

Review of attachment 8407734 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/experiments/Experiments.jsm
@@ +318,5 @@
>  Experiments.Experiments = function (policy=new Experiments.Policy()) {
>    this._log = Log.repository.getLoggerWithMessagePrefix(
>      "Browser.Experiments.Experiments",
>      "Experiments #" + gExperimentsCounter++ + "::");
> +  this._log.trace("Experiments constructor");

Nit: This will show up as something like:

> [...] Experiments #0::Experiments constructor

Maybe trace("constructor") instead?
Attachment #8407734 - Flags: review?(georg.fritzsche) → review+
The failure in comment 21 apparently didn't have the additional tracing merged in yet.
We have no failures since then, so maybe this is fixed now by one of the bugs that landed since.
Yoric, can you help suggest next steps here for AsyncShutdown debugging?

Based on the log from comment 25, I would expect to see the Experiments.uninit function being called and logging from this codepath:

http://hg.mozilla.org/mozilla-central/annotate/9d3da41ad0b6/browser/experiments/Experiments.jsm#l400
calling http://hg.mozilla.org/mozilla-central/annotate/9d3da41ad0b6/browser/experiments/Experiments.jsm#l434

We actually call this five times on the same object for stupid reasons relating to re-initializing objects for mock testing, but I see all five promises being listed at 

But in the log there is no "final" call to uninit at shutdown time before this is logged:

08:06:15     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"}] Phase: profile-before-change

what does "state": "(none)" mean in this context?

Oddly, if I look at a non-failing log, I don't see uninit logged at shutdown either, for example https://tbpl.mozilla.org/php/getParsedLog.php?id=38395628&tree=Mozilla-Central&full=1

I really don't know what to make of this. If uninit threw an exception when called by AsyncShutdown, would that be logged here?
Flags: needinfo?(dteller)
Flags: firefox-backlog+
Whiteboard: p=3
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #26)
> 08:06:15     INFO -  ERROR: At least one completion condition failed to
> complete within a reasonable amount of time. Causing a crash to ensure that
> we do not leave the user with an unresponsive process draining resources.
> Conditions: [{"name":"Experiments.jsm
> shutdown","state":"(none)"},{"name":"Experiments.jsm
> shutdown","state":"(none)"},{"name":"Experiments.jsm
> shutdown","state":"(none)"},{"name":"Experiments.jsm
> shutdown","state":"(none)"},{"name":"Experiments.jsm
> shutdown","state":"(none)"}] Phase: profile-before-change
> 
> what does "state": "(none)" mean in this context?

It means that the call to addBlocker did not provide the third (optional) argument that can be used to pass debugging information. This optional argument is expected to be a function that returns an object with state information, meant to be displayed in the above error message.

> Oddly, if I look at a non-failing log, I don't see uninit logged at shutdown
> either, for example
> https://tbpl.mozilla.org/php/getParsedLog.php?id=38395628&tree=Mozilla-
> Central&full=1
> 
> I really don't know what to make of this. If uninit threw an exception when
> called by AsyncShutdown, would that be logged here?

If uninit threw an exception, this exception should be dump()-ed with a message
"A completion condition encountered an error [...]"

I don't have a clear idea of what's going wrong. A few shots in the dark.

1/ I don't know if it's related but there are two instances of "A coding exception was thrown and uncaught in a Task." in the non-failing link above.

2/ If uninit() doesn't log anything, it might mean that uninit() is never called. If you throw from uninit(), this should be reported. This should help you find out whether uninit() is called.

3/ Task.async() is fairly new, so there is a chance that it has pitfalls that we don't know yet. Also, I haven't heard of anyone combining it with bind(). I would try using the regular Task.spawn() instead of Task.async(), just to make sure that we don't have a problem with |this| being captured at the wrong time.

Also, as a side-note, I personally find that
   () => this.uninit()
 is clearer and less likely to hide subtleties than
   this.uninit.bind(this)

I'll tell you if anything else comes to mind.
Flags: needinfo?(dteller)
I'm not seeing uninit in the logs because we're only logging to the console and the console log only shows up in the mochitest log while a test is running, not during shutdown. This logs experiments to the dump-console in mochitests.
Attachment #8413757 - Flags: review?(georg.fritzsche)
Attachment #8413757 - Flags: review?(georg.fritzsche) → review+
11:50:37     INFO -  *** End BrowserChrome Test Results ***
11:50:37     INFO -  1398797437154	addons.xpi	WARN	Add-on hotfix@tests.mozilla.org is missing bootstrap method shutdown
11:50:37     INFO -  1398797437688	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: started
11:50:37     INFO -  1398797437690	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: finished with _loadTask
11:50:37     INFO -  1398797437691	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: no previous shutdown
11:50:37     INFO -  1398797437691	Browser.Experiments.Experiments	TRACE	Experiments #0::Unregistering instance with Addon Manager.
11:50:37     INFO -  1398797437692	Browser.Experiments.Experiments	TRACE	Experiments #0::Unregistering previous experiment add-on provider.
11:50:37     INFO -  1398797437695	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: waiting on _mainTask
11:50:37     INFO -  1398797437696	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: started
11:50:37     INFO -  1398797437697	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: finished with _loadTask
11:50:37     INFO -  1398797437697	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: waiting on _mainTask
11:50:37     INFO -  1398797437699	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: started
11:50:37     INFO -  1398797437700	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: finished with _loadTask
11:50:37     INFO -  1398797437700	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: waiting on _mainTask
11:50:37     INFO -  1398797437702	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: started
11:50:37     INFO -  1398797437703	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: finished with _loadTask
11:50:37     INFO -  1398797437703	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: waiting on _mainTask
11:50:37     INFO -  1398797437705	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: started
11:50:37     INFO -  1398797437705	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: finished with _loadTask
11:50:37     INFO -  1398797437706	Browser.Experiments.Experiments	TRACE	Experiments #0::uninit: waiting on _mainTask
11:50:47     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
11:50:47     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
11:50:47     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
11:50:47     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
11:50:47     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
11:50:57     INFO -  ************************************************************
11:50:57     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
11:50:57     INFO -  [Exception... "AddonManager is not initialized"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: resource://gre/modules/AddonManager.jsm :: AMI_backgroundUpdateCheck :: line 1134"  data: no]
11:50:57     INFO -  ************************************************************
11:51:38     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"}] Phase: profile-before-change
11:51:38     INFO -  [Parent 3430] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431

So "waiting on _mainTask" means that there was a task running. Here's the last available output from the main task:

11:48:57     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings_info.js | Console message: 1398797337553	Browser.Experiments.Experiments	ERROR	Experiments #0::_loadManifest - failure to fetch/parse manifest (continuing anyway): Error: Experiments - XHR status for http://127.0.0.1:8888/experiments-dummy/manifest is 404
11:48:57     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings_info.js | Console message: 1398797337554	Browser.Experiments.Experiments	TRACE	Experiments #0::_evaluateExperiments

Asked about debugging opportunities for Tasks here: https://ask.mozilla.org/question/577/do-the-chrome-debuggers-support-debugging-tasks/
I narrowed this down using a try push with ridiculous logging:

https://tbpl.mozilla.org/php/getParsedLog.php?id=38824717&tree=Try&full=1

The last log from the last iteration of _evaluateExperiments stops at this line: http://hg.mozilla.org/mozilla-central/annotate/b1d730203bb6/browser/experiments/Experiments.jsm#l1051

Which ends up calling AddonManager.getAddonsByTypes but never receiving a response. I don't see anything in the log about an exception, though. Irving or Blair, do you have any recommended next steps or logging flags I can use? I'm not sure what the behavior of AsyncObjectCaller and safeCall is if there are exceptions thrown.
Flags: needinfo?(irving)
Flags: needinfo?(bmcbride)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #44)
> I'm not sure what the behavior of AsyncObjectCaller and safeCall is if there
> are exceptions thrown.

It should always log the exception - safeCall() and callProvider() are doing the actual calling, and their main purpose is to safely handle exceptions and ensure they're logged.

> do you have any recommended next steps or logging flags I can use?

If you can reproduce locally, the xpinstall tests are just browser-chrome tests - run them with the devtools debugger attached (--jsdebugger argument). I suspect you're not getting an exception loged because there isn't one getting thrown - a callback just isn't getting called (could be from the PreviousExperimentsProvider), so everything is just hanging there waiting for it.
Flags: needinfo?(bmcbride)
These log messages are interesting:

12:09:48     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: Experiments.jsm shutdown Phase: profile-before-change State: (none)
...
12:10:10     INFO -  ************************************************************
12:10:10     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
12:10:10     INFO -  [Exception... "AddonManager is not initialized"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: resource://gre/modules/AddonManager.jsm :: AMI_backgroundUpdateCheck :: line 1134"  data: no]
12:10:10     INFO -  ************************************************************
12:10:39     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"}] Phase: profile-before-change


It looks like there are two things going on, and I'm not sure whether they're related or not:

1) A background update check is running after the addon manager is shut down (or before it is started up, but given the test context I'll go with after shutdown). This could be caused by an async step left over from a previous test, if your test isn't invoking it.

2) The Experiments.jsm shutdown handler is blocking on something, but the log doesn't make it clear what.


First, AddonManager already does an async shut down on all registered providers, see http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonManager.jsm?from=AddonManager.jsm#945; as long as your provider has a "shutdown" method that is either synchronous, or returns a Promise, AddonManager will shut your provider down before it shuts itself down, so your provider doesn't need to register its own AsyncShutdown hook. The problem might be caused by the separate async shutdown for Experiments racing with the one for AddonManager, so I'd recommend letting AddonManager shut down your provider by removing your AsyncShutdown hook and renaming 'uninit' to 'shutdown'.
Flags: needinfo?(irving)
(In reply to :Irving Reid from comment #46)
> First, AddonManager already does an async shut down on all registered
> providers, see
> http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/
> AddonManager.jsm?from=AddonManager.jsm#945; as long as your provider has a
> "shutdown" method that is either synchronous, or returns a Promise,
> AddonManager will shut your provider down before it shuts itself down, so
> your provider doesn't need to register its own AsyncShutdown hook. The
> problem might be caused by the separate async shutdown for Experiments
> racing with the one for AddonManager, so I'd recommend letting AddonManager
> shut down your provider by removing your AsyncShutdown hook and renaming
> 'uninit' to 'shutdown'.

It looks very much like the problem I am attempting to fix with bug 985655, also encountered in bug 917883: in general, dependencies are between services, rather than between shutdown phases, so we need a way to express these dependencies explicitly.

Bug 985655 provides a primitive that can be used by AddonManager to let clients register to be informed when AddonManager itself is about to shutdown and they need to complete their own async shutdown.

(it might also be that AddonManager's internal design solves the issue in a different manner with the hooks)
I've been pushing try patches with progressively more logging and got this caught with more details. I don't think this is directly related to the shutdown sequence, we're hanging the main experiments task by failing to resolve a promise. Here's the relevant log from the last call to _evaluateExperiments:

https://tbpl.mozilla.org/php/getParsedLog.php?id=38873976&tree=Try&full=1

07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371285	Browser.Experiments.Experiments	TRACE	Experiments #0::_evaluateExperiments
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371286	Browser.Experiments.Experiments	TRACE	Experiments #0::_evaluateExperiments not shutdown
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371287	addons.manager	WARN	getAddonsByTypes provider list: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371287	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371288	DeferredSave.extensions.json	DEBUG	Starting timer
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371290	addons.manager	WARN	getAddonsByTypes got response from [object Object]: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371291	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371292	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371292	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371294	addons.manager	WARN	getAddonsByTypes got response from [object Object]: [object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371295	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371296	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371296	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371297	Browser.Experiments	TRACE	PreviousExperimentProvider.getAddonsByTypes null
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371298	Browser.Experiments	TRACE	PreviousExperimentProvider._getPreviousExperiments
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371298	Browser.Experiments.Experiments	TRACE	Experiments #0::getExperiments
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371299	Browser.Experiments.Experiments	TRACE	Experiments #0::getExperiments finished _loadTask
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371301	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371301	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371302	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371303	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371303	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371304	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371305	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371305	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371306	Browser.Experiments	TRACE	PreviousExperimentProvider.getAddonsByTypes experiment
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371307	Browser.Experiments	TRACE	PreviousExperimentProvider._getPreviousExperiments
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371307	Browser.Experiments.Experiments	TRACE	Experiments #0::getExperiments
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371308	Browser.Experiments.Experiments	TRACE	Experiments #0::getExperiments finished _loadTask
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371310	Browser.Experiments	TRACE	PreviousExperimentProvider.getAddonsByTypes then ok
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371310	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371311	addons.manager	WARN	getAddonsByTypes calling [object Object]
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371313	Browser.Experiments	TRACE	PreviousExperimentProvider.getAddonsByTypes then ok
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371314	addons.manager	WARN	getAddonsByTypes got response from [object Object]:
07:59:31     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/test-window/browser_inlinesettings.js | Console message: 1398956371314	addons.manager	WARN	getAddonsByTypes calling [object Object]

So the promise stack is:

_evaluateExperiments @ https://hg.mozilla.org/try/file/f73e59995f49/browser/experiments/Experiments.jsm#l1055
installedExperimentsAddons @ https://hg.mozilla.org/try/file/f73e59995f49/browser/experiments/Experiments.jsm#l209
AddonManager.getAddonsByTypes

That appears to end up calling PreviousExperimentProvider.getAddonsByTypes @ https://hg.mozilla.org/try/file/f73e59995f49/browser/experiments/Experiments.jsm#l2075
*twice*: once with a null "types" argument and once with ["experiment"], perhaps nested? But both of those eventually appear to finish correctly.

And we're stuck with the final log output being: getAddonsByTypes calling [object Object] @ https://hg.mozilla.org/try/file/f73e59995f49/toolkit/mozapps/extensions/AddonManager.jsm#l2058

So I think my next question is figuring out how to dump the addon provider a little better so that I know which one is the one that's failing.

To answer Blair's question, I haven't been able to reproduce locally or catch in a debugger effectively.
Benjamin, I don't think this is going to help you in your current debugging session, but in case you need it, the state argument of addBlocker was designed specifically to be able to upload (or log) detailed information on what has blocked when AsyncShutdown decides to crash.
Attachment #8417426 - Flags: review?(irving)
Comment on attachment 8417426 [details] [diff] [review]
bug995027-telex-asyncshutdown

Review of attachment 8417426 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/extensions/test/browser/head.js
@@ +799,1 @@
>        timer.cancel();

I'm not sure I like the idea of firing the callback here; tests that run off the end without reliably waiting for a timer callback really are broken, and I'd much rather fix them by making the wait reliable rather than faking the timeout. In the mean time, tests that deliberately ignore a timer (for example, a "stop me if I've hung" timer to fail the test) might start getting unexpected callbacks.

Existing code, and it's not so important if we're not firing the callback, but we should iterate in a way that's safe if a timer's callback (or cancel) alters this.callbackTimers. In other places I've used (the equivalent of)

while (this.callbackTimers.length > 0) {
  let timer = this.callbackTimers.shift();
  ...
}
Attachment #8417426 - Flags: review?(irving) → review-
Comment on attachment 8417426 [details] [diff] [review]
bug995027-telex-asyncshutdown

From IRC: the problem I have is that this code is running not as part of a single test: it's entering during one test via AM.getAddonsByTypes and depending on I/O, that chain of callbacks may not finish until the current test is finished. This has a a green try run with BC3 retriggered 10 times:

https://tbpl.mozilla.org/?tree=Try&rev=fd3f5f0999f1

Otherwise, mocks in integration tests are just pure pain :-(
Attachment #8417426 - Flags: review- → review?(irving)
Comment on attachment 8417426 [details] [diff] [review]
bug995027-telex-asyncshutdown

Review of attachment 8417426 [details] [diff] [review]:
-----------------------------------------------------------------

OK, I think I understand what's happening enough now to see why this fixes it...

I'd like a follow up bug to change your shutdown mechanism to use the AddonManager's shutdown() callback, rather than using AsyncShutdown directly (comment 46). Experiments.jsm also throws away a lot of exceptions; I'd like to see more _log.warn() calls in your catch clauses.
Attachment #8417426 - Flags: review?(irving) → review+
https://hg.mozilla.org/integration/fx-team/rev/0fa36ab98588

Filed followup as bug 1006771.
Keywords: leave-open
Target Milestone: --- → Firefox 32
https://hg.mozilla.org/mozilla-central/rev/0fa36ab98588
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Almost as soon as this landed, a mochitest I was running while working on another patch tripped over the cleanup code and it failed. Patch arriving shortly...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out Benjamin's patch was using the wrong call to trigger the waiting timer; in the course of debugging this I added tracking to see where the waiting timers came from, which led to the actual fix of the underlying bug:

The test harness cleanup code held an assumption that AddonManager.getAllInstalls() was always synchronous, despite taking a callback - this assumption is true for the built in extension providers. However, the MockProvider defined in the same file breaks that assumption...
Attachment #8419464 - Flags: review?(benjamin)
Attachment #8419464 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/7cebaf7bff2f
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Whiteboard: p=3 [fixed-in-fx-team] → p=3
I had to backout Irving's follow-up fix from Aurora for mochitest-bc timeouts.
https://hg.mozilla.org/releases/mozilla-aurora/rev/59e467fdbcb8

https://tbpl.mozilla.org/php/getParsedLog.php?id=39492625&tree=Mozilla-Aurora

08:53:59     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_about.js | Console message: 1399910038954	addons.manager	ERROR	Exception calling provider getInstallsByTypes
08:53:59     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_about.js | Console message: 1399910038984	addons.manager	ERROR	Exception calling provider getAddonsByTypes

Seems like these should be fatal rather than causing the test to opaquely timeout, but OK.
There is some diagnostic code in the patch that depends on bug 966674. We could either uplift that (which would make some people happy, because it substantially improves error logs for the add-on manager and add-on bootstrapping), or I can strip that diagnostic code out of this patch for the Aurora uplift.
Depends on: 966674
Whiteboard: p=3 → p=3 s=it-32c-31a-30b.2 [qa?]
Whiteboard: p=3 s=it-32c-31a-30b.2 [qa?] → p=3 s=it-32c-31a-30b.2 [qa-]
Status: RESOLVED → VERIFIED
Kwierso, comment 74 was mis-tagged and deserves a new bug report:

14:34:09     INFO -  WARNING: At least one completion condition is taking too long to complete. Conditions: Error getting state: TypeError: state is not a function at safeGetState@resource://gre/modules/AsyncShutdown.jsm:118:5
14:34:09     INFO -  Barrier.prototype<._wait/<@resource://gre/modules/AsyncShutdown.jsm:634:1
14:34:09     INFO -  Spinner.prototype.observe@resource://gre/modules/AsyncShutdown.jsm:344:7
14:34:09     INFO -   Barrier: profile-before-change
14:35:00     INFO -  ERROR: At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources. Conditions: [{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"},{"name":"Experiments.jsm shutdown","state":"(none)"}] Barrier: profile-before-change
14:35:00     INFO -  [Parent 2416] ###!!! ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 691

At least part of the log looks like a bug in the new AsyncShutdown/Barrier impl.
Flags: needinfo?(kwierso)
Split that off to bug 1012924
Flags: needinfo?(kwierso)
Tweaking summary to avoid mis-stars.
Summary: Intermittent Shutdown | application crashed [@ mozalloc_abort(char const*)] in Experiments.jsm after "ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431" → Intermittent Experiments.jsm Shutdown "ABORT: file resource://gre/modules/AsyncShutdown.jsm, line 431" [@ mozalloc_abort(char const*)]
Duplicate of this bug: 1060400
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.