Open Bug 1513855 Opened 9 months ago Updated 2 days ago

Crash in AsyncShutdownTimeout | AddonManager: Waiting for providers to shut down. | EnvironmentAddonBuilder,XPIProvider

Categories

(Toolkit :: Add-ons Manager, defect, P2, critical)

defect

Tracking

()

REOPENED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- affected
firefox68 - wontfix
firefox69 + wontfix
firefox70 --- fix-optional
firefox71 --- affected

People

(Reporter: robwu, Assigned: mixedpuppy)

References

Details

(Keywords: crash, regression, Whiteboard: webext?)

Crash Data

Attachments

(1 file)

Following a report about a startup crash on Reddit [1], I looked for similar crash signatures on Soccoro [2] and found several crashes. I don't have reproduction steps, but the user on Reddit might be in a position to provide more information if needed.

The most useful information for debugging the report is in the "AsyncShutdownTimeout" field at the "metadata" tab of crash reports.
For example, bp-3c9424b7-aca0-43db-958c-f99910181213 has

{
    "phase": "AddonManager: Waiting for providers to shut down.",
    "conditions": [{
        "name": "XPIProvider",
        "state": "(none)",
        "filename": "resource://gre/modules/AddonManager.jsm",
        "lineNumber": 678,
        "stack": ["resource://gre/modules/AddonManager.jsm:_startProvider:678", "resource://gre/modules/AddonManager.jsm:startup:807", "resource://gre/modules/AddonManager.jsm:startup:2802", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/addonManager.js:observe:66"]
    }, {
        "name": "EnvironmentAddonBuilder",
        "state": "(none)",
        "filename": "resource://gre/modules/TelemetryEnvironment.jsm",
        "lineNumber": 502,
        "stack": ["resource://gre/modules/TelemetryEnvironment.jsm:init:502", "resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:896", "resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:59", "resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:66", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/TelemetryStartup.js:annotateEnvironment:40", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/TelemetryStartup.js:TelemetryStartup.prototype.observe:30"]
    }]
}


[1]: https://www.reddit.com/r/firefox/comments/a5qru6/firefox_64_opens_blank_tab_and_crashes/
[2] https://crash-stats.mozilla.com/search/?signature=~AsyncShutdownTimeout%20%7C%20AddonManager%3A%20Waiting%20for%20providers%20to%20shut%20down.&date=%3E%3D2018-12-06T12%3A20%3A24.000Z&date=%3C2018-12-13T12%3A20%3A24.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Priority: -- → P3
(NI to determine if this is telemetry or not)
Flags: needinfo?(kmaglione+bmo)
Priority: P3 → --
Priority: -- → P2
Flags: needinfo?(kmaglione+bmo)

Added a new signature from nightly.

Crash Signature: Waiting for providers to shut down. | JSON store: writing data,XPIProvider] → Waiting for providers to shut down. | JSON store: writing data,XPIProvider] [@ AsyncShutdownTimeout | AddonManager: Waiting to start provider shutdown. | EnvironmentAddonBuilder]

I've stumbled on a few more shutdown crashes mentioning the AddonManager but I'm not sure if they're the same issue or not. See here and here.

See Also: → 1555632

looking at the crashing graph in 69.0b this seems to be regressing in volume again compared to 68 (which was already quite bad).

Jim - Do we have anyone that can investigate this further? See phillip's note in Comment 4.

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Whiteboard: webext?

Getting late in the game to fix for 69, but agreed that it'd be good to do so if possible given the volume.

I have been able to reproduce this crash with the same reproduction outlined in bug 1381633, which is:

# create a new profile
./firefox -CreateProfile testprofile
# run with -silent
./firefox -P testprofile -silent

The first run didn't produce what I expected, but when I did it a second or third time, I got the crash.

I added some additional logging in DeferredTask.jsm to get a traceback on all the console output:

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.

Which always points to save@resource://gre/modules/ExtensionParent.jsm:2014:27

These two tracebacks happen a lot, the first earlier seemingly during startup, the second later seemingly during shutdown (ie. they are not interleaved).

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.
arm@resource://gre/modules/DeferredTask.jsm:195:79
save@resource://gre/modules/ExtensionParent.jsm:2014:27
get@resource://gre/modules/ExtensionParent.jsm:2112:20
async*load@resource://gre/modules/Schemas.jsm:3485:35
async*lazyInit/promise<@resource://gre/modules/ExtensionParent.jsm:196:33
observe@resource://gre/modules/AsyncShutdown.jsm:551:16

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.
arm@resource://gre/modules/DeferredTask.jsm:195:79
save@resource://gre/modules/ExtensionParent.jsm:2014:27
get@resource://gre/modules/ExtensionParent.jsm:2112:20
async*parseManifest@resource://gre/modules/Extension.jsm:1834:35
loadManifest@resource://gre/modules/Extension.jsm:1003:12
loadManifest@resource://gre/modules/Extension.jsm:1850:32
startup@resource://gre/modules/Extension.jsm:2159:18
startup@resource://gre/modules/Extension.jsm:1496:27
callBootstrapMethod@resource://gre/modules/addons/XPIProvider.jsm:1789:33
startup@resource://gre/modules/addons/XPIProvider.jsm:1893:32
startup@resource://gre/modules/addons/XPIProvider.jsm:2462:39
callProvider@resource://gre/modules/AddonManager.jsm:214:31
_startProvider@resource://gre/modules/AddonManager.jsm:650:5
startup@resource://gre/modules/AddonManager.jsm:876:14
startup@resource://gre/modules/AddonManager.jsm:3468:26
observe@resource://gre/modules/addonManager.js:86:29

At the end I get:

JavaScript error: resource://gre/modules/ExtensionSettingsStore.jsm, line 117: Error: The ExtensionSettingsStore was accessed before the initialize promise resolved.
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}] Barrier: AddonManager: Waiting to start provider shutdown.
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"AddonManager: shutting down.","state":[{"name":"AddonManager: Waiting to start provider shutdown.","state":[{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}]},{"name":"AddonManager: Waiting for providers to shut down.","state":"Not started"},{"name":"AddonRepository: async shutdown","state":""}],"filename":"resource://gre/modules/AddonManager.jsm","lineNumber":866,"stack":["resource://gre/modules/AddonManager.jsm:startup:866","resource://gre/modules/AddonManager.jsm:startup:3468","resource://gre/modules/addonManager.js:observe:86"]}] Barrier: profile-before-change
JavaScript error: resource://gre/modules/UpdateTimerManager.jsm, line 349: TypeError: setting getter-only property "gLogEnabled"
JavaScript error: resource://gre/modules/ModulesPing.jsm, line 66: NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "setting getter-only property "gLogEnabled"" {file: "resource://gre/modules/UpdateTimerManager.jsm" line: 349}]'[JavaScript Error: "setting getter-only property "gLogEnabled"" {file: "resource://gre/modules/UpdateTimerManager.jsm" line: 349}]' when calling method: [nsIUpdateTimerManager::registerTimer]
FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting to start provider shutdown. Conditions: [{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.

And the OSX crash results are the same as in this bug.

Based on the last output, I think that the change in bug 1379831 did not accomplish (consistently) what was intended, but am unsure still.

@aswan, any thoughts on comment 7?

Flags: needinfo?(aswan)

Running with -silent does not result in the quit application granted notification.
Without that notification, we have a deadlock in EnvironmentAddonBuilder where it blocks
beforeShutdown waiting on the XPIDatabase to load.

Basically, if startup happens and we do not get the quit-application-granted notification, we deadlock then crash. -silent does this as well as a few other paths in commandline handling.

Flags: needinfo?(aswan)
Assignee: nobody → mixedpuppy
Whiteboard: webext?
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/61dc2ad51719
ensure we notify quit application when running silent r=kmag
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0fd656a563f0
ensure we notify quit application when running silent r=kmag
Status: NEW → RESOLVED
Closed: 29 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

the crash volume on nightly hasn't noticeably decreased since the patch landed - the str from comment #7 maybe are just describing an edge-case for this crash signature. should we attempt to uplift the fix to 69 beta anyway to see if it has an impact for the top crasher there and/or reopen the bug?

(In reply to [:philipp] from comment #16)

maybe are just describing an edge-case for this crash signature.

That is highly possible.

should we attempt to uplift the fix to 69 beta anyway to see if it has an impact for the top crasher there and/or reopen the bug?

I don't think it would hurt do to that. Keeping this bug open is probably good as well.

Flags: needinfo?(mixedpuppy)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: qe-verify+

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Beta/Release Uplift Approval Request

  • User impact if declined: potential reduction in crashes.

This didn't fix the nightly crash rate, wondering if uplift would be ok to see if it has any affect on the beta crash rate (which is higher than nightly).

  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: - create a new profile (don't run firefox)
  • run firefox with command line arg -silent

secondary str

  • verify running firefox -search foo works properly after patch applied
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small change in shutdown to make sure shutdown notifications are sent
  • String changes made/needed: none
Attachment #9085311 - Flags: approval-mozilla-beta?

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Might as well see what affect it has on Beta. Approved for 69.0b15.

Attachment #9085311 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]
Duplicate of this bug: 1575215

unfortunately the patch hasn't made a sizeable difference in the beta crash volume as expected, so i'll change the state back to affected.

Adding the QE+ flagg back, confirmed with :philipp.

Fwiw, managed to confirm the issue with 62.0.8 on Windows 10.
The current patch was checked using steps from comment #7 with 70.0a1 (2019-08-22), 69.0b15 on Windows 10, macOS 10.14, Ubuntu 16.0.4 and I did not encounter the crash at all.

Flags: qe-verify+

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Clearing the approval so this doesn't show up on the needs-uplift queries.

Attachment #9085311 - Flags: approval-mozilla-beta+

There is a touch of this happening on Thunderbird as well. When I look at the crash report by clicking through on the graph itself, Thunderbird is the larger portion of the daily crashes. It may be good for someone over there to look at the patch on this bug and see if there are any shutdown paths that are not causing the quit-application notification.

Flags: needinfo?(mkmelin+mozilla)

Thx. Ben, please have a look at some point.

Flags: needinfo?(mkmelin+mozilla)

Curious if there's something more we can do here. I know we've been chipping away at these shutdown issues.

Whiteboard: webext?

Looking from 9/5 to now (past week), most of the crashes seem to be v60 or earlier. Out of 971 results, roughly 20 are Firefox >= 67 WINNT, only one on 69, and none on 68 release.

Thunderbird 60 is a big chunk of the results, as is Firefox 60. I suspect some earlier changes Kris did largely fixed the problem, my patch here just caught an STR that isn't normal.

edit: strange that my search provides different results than the table after the graph, looking more.

(In reply to [:philipp] from comment #29)

have you looked into all signatures?

yes, after realizing again that linking from the graph does not do that (thus my edit in the comment)

I spent a lot of time yesterday eyeballing code related to this. I've no idea how to reproduce it. However, the shutdown hang is happening somewhere here[1]. _shutdownBlocker, which the reports points to, returns the _pendingTask after a couple other items are done.

A small sampling of the crash reports (manually looking at 20-30 of them) show that many (but certainly not all) happen in the first 60-90 seconds of a session. Most but not all crashes are WINNT. Many show that firefox is non-default browser, but not all. I've no idea if it's possible to get any correlation data on uptime and default browser setting.

In any case, my suspicion is that there is some startup to pass a url to a running firefox, which then quickly exits. This is the code path my prior patch touched, but there could be something windows specific I am unaware of.

I'm not certain where to go from here at this time.

[1] https://searchfox.org/mozilla-release/rev/81e7b8ebd4a907d11ae3794cb9ba705e47e39060/toolkit/components/telemetry/app/TelemetryEnvironment.jsm#598-616

You need to log in before you can comment on or make changes to this bug.