1513855 - Crash in AsyncShutdownTimeout | AddonManager: Waiting for providers to shut down. | EnvironmentAddonBuilder,XPIProvider

Reporter

Description

•

7 years ago

Following a report about a startup crash on Reddit [1], I looked for similar crash signatures on Soccoro [2] and found several crashes. I don't have reproduction steps, but the user on Reddit might be in a position to provide more information if needed. The most useful information for debugging the report is in the "AsyncShutdownTimeout" field at the "metadata" tab of crash reports. For example, bp-3c9424b7-aca0-43db-958c-f99910181213 has { "phase": "AddonManager: Waiting for providers to shut down.", "conditions": [{ "name": "XPIProvider", "state": "(none)", "filename": "resource://gre/modules/AddonManager.jsm", "lineNumber": 678, "stack": ["resource://gre/modules/AddonManager.jsm:_startProvider:678", "resource://gre/modules/AddonManager.jsm:startup:807", "resource://gre/modules/AddonManager.jsm:startup:2802", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/addonManager.js:observe:66"] }, { "name": "EnvironmentAddonBuilder", "state": "(none)", "filename": "resource://gre/modules/TelemetryEnvironment.jsm", "lineNumber": 502, "stack": ["resource://gre/modules/TelemetryEnvironment.jsm:init:502", "resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:896", "resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:59", "resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:66", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/TelemetryStartup.js:annotateEnvironment:40", "jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/omni.ja!/components/TelemetryStartup.js:TelemetryStartup.prototype.observe:30"] }] } [1]: https://www.reddit.com/r/firefox/comments/a5qru6/firefox_64_opens_blank_tab_and_crashes/ [2] https://crash-stats.mozilla.com/search/?signature=~AsyncShutdownTimeout%20%7C%20AddonManager%3A%20Waiting%20for%20providers%20to%20shut%20down.&date=%3E%3D2018-12-06T12%3A20%3A24.000Z&date=%3C2018-12-13T12%3A20%3A24.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

David Durst [:ddurst]

Updated

•

7 years ago

Priority: -- → P3

David Durst [:ddurst]

Comment 1

•

7 years ago

(NI to determine if this is telemetry or not)

Flags: needinfo?(kmaglione+bmo)

Priority: P3 → --

David Durst [:ddurst]

Updated

•

7 years ago

Priority: -- → P2

David Durst [:ddurst]

Updated

•

7 years ago

Flags: needinfo?(kmaglione+bmo)

Gabriele Svelto [:gsvelto]

Comment 2

•

6 years ago

Added a new signature from nightly.

Crash Signature: Waiting for providers to shut down. | JSON store: writing data,XPIProvider] → Waiting for providers to shut down. | JSON store: writing data,XPIProvider] [@ AsyncShutdownTimeout | AddonManager: Waiting to start provider shutdown. | EnvironmentAddonBuilder]

Gabriele Svelto [:gsvelto]

Comment 3

•

6 years ago

I've stumbled on a few more shutdown crashes mentioning the AddonManager but I'm not sure if they're the same issue or not. See here and here.

[:philipp]

Updated

•

6 years ago

Updated

•

6 years ago

status-firefox68: --- → ?

[:philipp]

Comment 4

•

6 years ago

looking at the crashing graph in 69.0b this seems to be regressing in volume again compared to 68 (which was already quite bad).

status-firefox68: ? → affected

status-firefox69: --- → affected

status-firefox70: --- → affected

status-firefox-esr68: --- → affected

tracking-firefox68: --- → ?

tracking-firefox69: --- → ?

Keywords: regression

Marcia Knous [:marcia]

Comment 5

•

6 years ago

Jim - Do we have anyone that can investigate this further? See phillip's note in Comment 4.

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Updated

•

6 years ago

Flags: needinfo?(jmathies)

Whiteboard: webext?

Ryan VanderMeulen [:RyanVM]

Comment 6

•

6 years ago

Getting late in the game to fix for 69, but agreed that it'd be good to do so if possible given the volume.

status-firefox68: affected → wontfix

status-firefox-esr60: --- → unaffected

tracking-firefox68: ? → -

tracking-firefox69: ? → +

Shane Caraveo (:mixedpuppy)

Assignee

Comment 7

•

6 years ago

I have been able to reproduce this crash with the same reproduction outlined in bug 1381633, which is:

# create a new profile
./firefox -CreateProfile testprofile
# run with -silent
./firefox -P testprofile -silent

The first run didn't produce what I expected, but when I did it a second or third time, I got the crash.

I added some additional logging in DeferredTask.jsm to get a traceback on all the console output:

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.

Which always points to save@resource://gre/modules/ExtensionParent.jsm:2014:27

These two tracebacks happen a lot, the first earlier seemingly during startup, the second later seemingly during shutdown (ie. they are not interleaved).

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.
arm@resource://gre/modules/DeferredTask.jsm:195:79
save@resource://gre/modules/ExtensionParent.jsm:2014:27
get@resource://gre/modules/ExtensionParent.jsm:2112:20
async*load@resource://gre/modules/Schemas.jsm:3485:35
async*lazyInit/promise<@resource://gre/modules/ExtensionParent.jsm:196:33
observe@resource://gre/modules/AsyncShutdown.jsm:551:16

JavaScript error: resource://gre/modules/DeferredTask.jsm, line 195: Error: Unable to arm timer, the object has been finalized.
arm@resource://gre/modules/DeferredTask.jsm:195:79
save@resource://gre/modules/ExtensionParent.jsm:2014:27
get@resource://gre/modules/ExtensionParent.jsm:2112:20
async*parseManifest@resource://gre/modules/Extension.jsm:1834:35
loadManifest@resource://gre/modules/Extension.jsm:1003:12
loadManifest@resource://gre/modules/Extension.jsm:1850:32
startup@resource://gre/modules/Extension.jsm:2159:18
startup@resource://gre/modules/Extension.jsm:1496:27
callBootstrapMethod@resource://gre/modules/addons/XPIProvider.jsm:1789:33
startup@resource://gre/modules/addons/XPIProvider.jsm:1893:32
startup@resource://gre/modules/addons/XPIProvider.jsm:2462:39
callProvider@resource://gre/modules/AddonManager.jsm:214:31
_startProvider@resource://gre/modules/AddonManager.jsm:650:5
startup@resource://gre/modules/AddonManager.jsm:876:14
startup@resource://gre/modules/AddonManager.jsm:3468:26
observe@resource://gre/modules/addonManager.js:86:29

At the end I get:

JavaScript error: resource://gre/modules/ExtensionSettingsStore.jsm, line 117: Error: The ExtensionSettingsStore was accessed before the initialize promise resolved.
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}] Barrier: AddonManager: Waiting to start provider shutdown.
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"AddonManager: shutting down.","state":[{"name":"AddonManager: Waiting to start provider shutdown.","state":[{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}]},{"name":"AddonManager: Waiting for providers to shut down.","state":"Not started"},{"name":"AddonRepository: async shutdown","state":""}],"filename":"resource://gre/modules/AddonManager.jsm","lineNumber":866,"stack":["resource://gre/modules/AddonManager.jsm:startup:866","resource://gre/modules/AddonManager.jsm:startup:3468","resource://gre/modules/addonManager.js:observe:86"]}] Barrier: profile-before-change
JavaScript error: resource://gre/modules/UpdateTimerManager.jsm, line 349: TypeError: setting getter-only property "gLogEnabled"
JavaScript error: resource://gre/modules/ModulesPing.jsm, line 66: NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "setting getter-only property "gLogEnabled"" {file: "resource://gre/modules/UpdateTimerManager.jsm" line: 349}]'[JavaScript Error: "setting getter-only property "gLogEnabled"" {file: "resource://gre/modules/UpdateTimerManager.jsm" line: 349}]' when calling method: [nsIUpdateTimerManager::registerTimer]
FATAL ERROR: AsyncShutdown timeout in AddonManager: Waiting to start provider shutdown. Conditions: [{"name":"EnvironmentAddonBuilder","state":"(none)","filename":"resource://gre/modules/TelemetryEnvironment.jsm","lineNumber":596,"stack":["resource://gre/modules/TelemetryEnvironment.jsm:init:596","resource://gre/modules/TelemetryEnvironment.jsm:EnvironmentCache:1037","resource://gre/modules/TelemetryEnvironment.jsm:getGlobal:94","resource://gre/modules/TelemetryEnvironment.jsm:get currentEnvironment:101","resource://gre/modules/TelemetryStartup.jsm:annotateEnvironment:47","resource://gre/modules/TelemetryStartup.jsm:TelemetryStartup.prototype.observe:34"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.

And the OSX crash results are the same as in this bug.

Based on the last output, I think that the change in bug 1379831 did not accomplish (consistently) what was intended, but am unsure still.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 8

•

6 years ago

@aswan, any thoughts on comment 7?

Flags: needinfo?(aswan)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 9

•

6 years ago

Attached file Bug 1513855 ensure we notify quit application when running silent — Details

Running with -silent does not result in the quit application granted notification.
Without that notification, we have a deadlock in EnvironmentAddonBuilder where it blocks
beforeShutdown waiting on the XPIDatabase to load.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 10

•

6 years ago

Basically, if startup happens and we do not get the quit-application-granted notification, we deadlock then crash. -silent does this as well as a few other paths in commandline handling.

Flags: needinfo?(aswan)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 11

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2437918972f5660bb3dbac755d6e98f9dd22b53

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Assignee: nobody → mixedpuppy

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Whiteboard: webext?

Pulsebot

Comment 12

•

6 years ago

Pushed by scaraveo@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/61dc2ad51719 ensure we notify quit application when running silent r=kmag

Dorel Luca [:dluca]

Comment 13

•

6 years ago

Backed out changeset 61dc2ad51719 (bug 1513855) for DevTools failures in devtools/client/framework/test/browser_browser_toolbox.js

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=261694878&repo=autoland&lineNumber=4358

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=61dc2ad51719c1f2a00eb5d08d925cf59bb4d0bb

Backout:
https://hg.mozilla.org/integration/autoland/rev/9468a3143e3826ad7a4a00341a7fca8c5b2d9c77

Flags: needinfo?(mixedpuppy)

Pulsebot

Comment 14

•

6 years ago

Pushed by scaraveo@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0fd656a563f0 ensure we notify quit application when running silent r=kmag

Narcis Beleuzu [:NarcisB]

Comment 15

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0fd656a563f0

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox70: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla70

[:philipp]

Comment 16

•

6 years ago

the crash volume on nightly hasn't noticeably decreased since the patch landed - the str from comment #7 maybe are just describing an edge-case for this crash signature. should we attempt to uplift the fix to 69 beta anyway to see if it has an impact for the top crasher there and/or reopen the bug?

Shane Caraveo (:mixedpuppy)

Assignee

Comment 17

•

6 years ago

(In reply to [:philipp] from comment #16)

maybe are just describing an edge-case for this crash signature.

That is highly possible.

should we attempt to uplift the fix to 69 beta anyway to see if it has an impact for the top crasher there and/or reopen the bug?

I don't think it would hurt do to that. Keeping this bug open is probably good as well.

Flags: needinfo?(mixedpuppy)

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Flags: qe-verify+

Shane Caraveo (:mixedpuppy)

Assignee

Comment 18

•

6 years ago

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Beta/Release Uplift Approval Request

User impact if declined: potential reduction in crashes.

This didn't fix the nightly crash rate, wondering if uplift would be ok to see if it has any affect on the beta crash rate (which is higher than nightly).

Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: Yes
If yes, steps to reproduce: - create a new profile (don't run firefox)

run firefox with command line arg -silent

secondary str

verify running firefox -search foo works properly after patch applied

List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Small change in shutdown to make sure shutdown notifications are sent
String changes made/needed: none

Attachment #9085311 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Comment 19

•

6 years ago

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Might as well see what affect it has on Beta. Approved for 69.0b15.

Attachment #9085311 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 20

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/66a12ddcf77c

status-firefox69: affected → fixed

Bogdan Maris, Desktop Test Engineering

Updated

•

6 years ago

QA Whiteboard: [qa-triaged]

[:philipp]

Comment 22

•

6 years ago

unfortunately the patch hasn't made a sizeable difference in the beta crash volume as expected, so i'll change the state back to affected.

status-firefox69: fixed → affected

status-firefox70: fixed → affected

Flags: qe-verify+

Cristian Fogel [:cfogel]

Comment 23

•

6 years ago

Adding the QE+ flagg back, confirmed with :philipp.

Fwiw, managed to confirm the issue with 62.0.8 on Windows 10.
The current patch was checked using steps from comment #7 with 70.0a1 (2019-08-22), 69.0b15 on Windows 10, macOS 10.14, Ubuntu 16.0.4 and I did not encounter the crash at all.

Flags: qe-verify+

Ryan VanderMeulen [:RyanVM]

Comment 24

•

6 years ago

Comment on attachment 9085311 [details]
Bug 1513855 ensure we notify quit application when running silent

Clearing the approval so this doesn't show up on the needs-uplift queries.

Attachment #9085311 - Flags: approval-mozilla-beta+

Shane Caraveo (:mixedpuppy)

Assignee

Comment 25

•

6 years ago

There is a touch of this happening on Thunderbird as well. When I look at the crash report by clicking through on the graph itself, Thunderbird is the larger portion of the daily crashes. It may be good for someone over there to look at the patch on this bug and see if there are any shutdown paths that are not causing the quit-application notification.

Flags: needinfo?(mkmelin+mozilla)

Magnus Melin [:mkmelin]

Comment 26

•

6 years ago

Thx. Ben, please have a look at some point.

Flags: needinfo?(mkmelin+mozilla)

Kate Hudson :k88hudson

Updated

•

6 years ago

status-firefox69: affected → wontfix

status-firefox71: --- → affected

Jim Mathies [:jimm]

Comment 27

•

6 years ago

Curious if there's something more we can do here. I know we've been chipping away at these shutdown issues.

Whiteboard: webext?

janet [:janet]

Updated

•

6 years ago

status-firefox70: affected → fix-optional

Shane Caraveo (:mixedpuppy)

Assignee

Comment 28

•

6 years ago

•

Edited

Looking from 9/5 to now (past week), most of the crashes seem to be v60 or earlier. Out of 971 results, roughly 20 are Firefox >= 67 WINNT, only one on 69, and none on 68 release.

Thunderbird 60 is a big chunk of the results, as is Firefox 60. I suspect some earlier changes Kris did largely fixed the problem, my patch here just caught an STR that isn't normal.

edit: strange that my search provides different results than the table after the graph, looking more.

[:philipp]

Comment 29

•

6 years ago

have you looked into all signatures? https://crash-stats.mozilla.org/signature/?product=Firefox&signature=AsyncShutdownTimeout%20%7C%20AddonManager%3A%20Waiting%20to%20start%20provider%20shutdown.%20%7C%20EnvironmentAddonBuilder is the one affecting current release versions (and not in small numbers)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 30

•

6 years ago

(In reply to [:philipp] from comment #29)

have you looked into all signatures?

yes, after realizing again that linking from the graph does not do that (thus my edit in the comment)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 31

•

6 years ago

I spent a lot of time yesterday eyeballing code related to this. I've no idea how to reproduce it. However, the shutdown hang is happening somewhere here[1]. _shutdownBlocker, which the reports points to, returns the _pendingTask after a couple other items are done.

A small sampling of the crash reports (manually looking at 20-30 of them) show that many (but certainly not all) happen in the first 60-90 seconds of a session. Most but not all crashes are WINNT. Many show that firefox is non-default browser, but not all. I've no idea if it's possible to get any correlation data on uptime and default browser setting.

In any case, my suspicion is that there is some startup to pass a url to a running firefox, which then quickly exits. This is the code path my prior patch touched, but there could be something windows specific I am unaware of.

I'm not certain where to go from here at this time.

[1] https://searchfox.org/mozilla-release/rev/81e7b8ebd4a907d11ae3794cb9ba705e47e39060/toolkit/components/telemetry/app/TelemetryEnvironment.jsm#598-616

Shane Caraveo (:mixedpuppy)

Assignee

Comment 32

•

6 years ago

Aaron, Jim mentioned the security boostrap work as we were discussing this bug, so I was wondering if you might have some input. I think this crash is still a similar issue as one path I fixed.

Comment 31 above has some basic info that you could read, but the essence of what I think the problem is: "my suspicion is that there is some startup to pass a url to a running firefox, which then quickly exits". There is a potential deadlock in this situation and I fixed one path on the patch that landed in august. Basically if we don't do the quit notification, we get a deadlock and this crash is forced to happen.

Does this sound like a potential problem with the security bootstrap?

Flags: needinfo?(aklotz)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 33

•

6 years ago

•

Edited

Crash volume is quite high in beta 70 for one signature. And it was a fair number in 69 beta as well but no crashes in 69 release. (https://crash-stats.mozilla.org/search/?release_channel=beta&release_channel=aurora&signature=%3DAsyncShutdownTimeout%20%7C%20AddonManager%3A%20Waiting%20to%20start%20provider%20shutdown.%20%7C%20EnvironmentAddonBuilder&product=Firefox&version=69.0b16&version=70.0b1&version=70.0b2&version=70.0b3&version=70.0b4&version=70.0b5&version=70.0b6&version=70.0b7&version=70.0b8&date=%3E%3D2019-08-22T00%3A00%3A00.000Z&date=%3C2019-09-20T17%3A36%3A00.000Z&_facets=install_time&_facets=version&_facets=address&_facets=moz_crash_reason&_facets=reason&_facets=build_id&_facets=platform_pretty_version&_facets=signature&_facets=useragent_locale&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id)

(No longer employed by Mozilla) Aaron Klotz

Comment 34

•

6 years ago

I'm a little hazy on the problem here, but the launcher process should be unaffected by this.

Flags: needinfo?(aklotz)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 35

•

6 years ago

So I can see this crash with the recent Firefox Nightly build on MacOS. See bp-36181aa7-dc3e-4fcb-b909-c882c0191019.

What happens for me is that when I have selected the profile inside of the profile manager Firefox doesn't startup. Sometimes I see the icon in the dock, but other times even not that. Then after a minute of hanging the crash reporter comes up.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 36

•

6 years ago

Is that 100% reproducible for you? This sounds in line with one Firefox process quitting and possibly not calling Services.startup.quit. I'm not able to reproduce this with the profile manager as you describe.

Flags: needinfo?(hskupin)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 37

•

6 years ago

Maybe mossop will know some details about the shutdown in this case (comment 35 and 36)

Flags: needinfo?(dtownsend)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 38

•

6 years ago

Yes, it reproduces 100% of the time for this particular profile. When I'm back at work tomorrow I will make a copy of it, and try to nail down the problem. Leaving the needinfo request for now.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 39

•

6 years ago

Hah, I think that I found at least my problem. So in the shell the CWD is eg. ~/foobar. Then in the Finder I removed that folder. When I now start Firefox from the shell, which operates on a folder which doesn't exist anymore, the above failure can be seen. If CWD still exists the problem is not visible.

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 40

•

6 years ago

Actually on MacOS you will also have to empty the Trash, otherwise the shell will refer to the deleted folder from inside the Trash. Running a debug version of Firefox it looks to be related to the startup cache:

[32842, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[32842, Main Thread] WARNING: dependent window created without a parent: file /builds/worker/workspace/build/src/toolkit/components/startup/nsAppStartup.cpp, line 631
++DOCSHELL 0x1130e6000 == 1 [pid = 32842] [id = {9dd8b9ce-ed4a-5448-9d6e-ec900194cb67}]
++DOMWINDOW == 1 (0x10ff25200) [pid = 32842] [serial = 1] [outer = 0x0]
++DOMWINDOW == 2 (0x115345000) [pid = 32842] [serial = 2] [outer = 0x10ff25200]
[32842, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173

https://searchfox.org/mozilla-central/rev/8512c64c65031321381d12c8f58cc22d5af15bf1/startupcache/StartupCache.cpp#173

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 41

•

6 years ago

Note that the above might only be one situation when this bug could appear, but there might also be others. Especially if that crash is happening a lot, I doubt that those users have the same situation.

Dave Townsend [:mossop]

Comment 42

•

6 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #36)

Is that 100% reproducible for you? This sounds in line with one Firefox process quitting and possibly not calling Services.startup.quit. I'm not able to reproduce this with the profile manager as you describe.

We don't call appStartup.quit when quiting the profile manager to load the selected profile.

Flags: needinfo?(dtownsend)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 43

•

6 years ago

Attached file Bug 1513855 make addon manager block on profileChangeTeardown rather than applicationQuit (obsolete) — Details

This patch avoids blocking on app quit notifications which are not sent in all
situations. profileChangeTeardown always happens from nsXREDirProvider::DoShutdown.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 44

•

6 years ago

Attached file Bug 1513855 force quit if quit has not been called for shutdown (obsolete) — Details

This forces a quit to occur if it had not been called by this point.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 45

•

6 years ago

The patch in comment 43 occasionally causes Firefox to want to enter safe mode on the next startup.
The patch in comment 44 occasionally causes Firefox to show the restore session tab on the next startup.

Both cases above are tested using the -silent command line to force the situation.

So it seems that both of these are probably too late. It seems that profileChangeTeardown is too late to shutdown stuff in the AddonManager, and ScopedXPCOMStartup destructor is too late to call quit.

Both remove the deadlock more generically, but I'm still having a problem getting the right timing.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 46

•

6 years ago

I could probably use some advice from someone who might be more versed in various shutdown stuff. I've got two approaches to trying to deal with this deadlock but neither are completely right.

The second patch could also run into some additional complications due to some windows code that does a hacky shutdown: https://searchfox.org/mozilla-release/rev/167e8d714a6d7cb46cb111cd583b8d5d95318531/widget/windows/nsWindow.cpp#4951-4974

Flags: needinfo?(dtownsend)

Dave Townsend [:mossop]

Comment 47

•

6 years ago

Attached file Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj — Details

We only run the main event loop if at least one window is open. There are various
cases where this doesn't happen, such as -silent. By the time we get there though
some components have started and require appropriate notifications to tell them
to shutdown safely. This patch sends the quit-application-granted and
quit-application notifications in the case that there were no windows open
and nsAppStartup::Quit hasn't already been called when we come to run the
event loop.

Dave Townsend [:mossop]

Comment 48

•

6 years ago

I'm not going to claim I know all the ins and outs of shutdown and particularly the async shutdown stuff. The entire startup and shutdown processes are frankly a mess held together with duct tape and bubble gum. But for the -silent case what is happening is we are not entering the event loop because no windows were opened at startup but Quit was never called so the notifications it sends are not sent. This should be a pretty low risk patch that sends those notifications out in that case. I haven't done a great deal of testing with it (and I wouldn't expect it to have any impact on our automated tests), but for the -silent case it goes from logging a whole bunch of exceptions (none that match this bug!) to logging no abnormal exceptions, so that seems like a step in the right direction.

Flags: needinfo?(dtownsend)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 49

•

6 years ago

Dave, we could have a Marionette restart test for this particular case. So we would have this situation covered by a test. Examples can be found here:

https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py

Shane Caraveo (:mixedpuppy)

Assignee

Comment 50

•

6 years ago

Part of the issue is that I'm not sure how it's being produced on windows. The -silent command line is one way I was able to produce the deadlock on osx, but the prior fix did not address the deadlock on windows. Thus...looking at how to otherwise prevent the deadlock (starting at comment 43).

I'll get around to testing the patch from Dave hopefully this week.

Dave Townsend [:mossop]

Comment 51

•

6 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #49)

Dave, we could have a Marionette restart test for this particular case. So we would have this situation covered by a test. Examples can be found here:

https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py

I'm not sure how. Aside from the fact that we don't seem to know what is triggering this my understanding is that marionette tests run against a running browser and this failure occurs when the browser never fully starts up.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 52

•

6 years ago

It doesn't fully startup because of the -silent argument but otherwise it does? But as Shane also wrote it looks like not always reproducible. So maybe not really testable in automation.

Dave Townsend [:mossop]

Comment 53

•

6 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #52)

It doesn't fully startup because of the -silent argument but otherwise it does? But as Shane also wrote it looks like not always reproducible. So maybe not really testable in automation.

Shane's working theory is that the quit-application notifications are not being sent. As far as I can tell the only way for that to be the case is if we never enter the main event loop.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 54

•

6 years ago

I know it is a deadlock between asyncshutdown blockers in telemetry[1] and the addonmanager[2] (which waits on quit). Calling quit in my first patch addressed the -silent switch, but that isn't what is causing all the windows deadlock/crashes. I'm building with the latest patch now.

[1] https://searchfox.org/mozilla-release/rev/df950fbe858cb644a71b75fe3a8e3bcb0487b478/toolkit/components/telemetry/app/TelemetryEnvironment.jsm#596

[2] https://searchfox.org/mozilla-release/rev/df950fbe858cb644a71b75fe3a8e3bcb0487b478/toolkit/mozapps/extensions/internal/XPIProvider.jsm#2483

Shane Caraveo (:mixedpuppy)

Assignee

Comment 55

•

6 years ago

Ok, using D51736 and running ./mach run -P someprofile -silent then running again without -silent results in the restore session tab occasionally being shown.

Since I don't know how to replicate the windows crashes, I'm uncertain if we'd have the same issue in real life.

Dave Townsend [:mossop]

Comment 56

•

6 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #55)

Ok, using D51736 and running ./mach run -P someprofile -silent then running again without -silent results in the restore session tab occasionally being shown.

Does the same happen without the patch?

I wouldn't expect this change to affect this directly. It would be worth checking the contents of sessionCheckpoints.json in such a case. One thing to note is that we determine if there was a crash based on the "sessionstore-final-state-write-complete" checkpoint having been stored in the file (https://searchfox.org/mozilla-central/rev/8b7aa8af652f87d39349067a5bc9c0256bf6dedc/browser/components/sessionstore/SessionStartup.jsm#227-228) and we don't block shutdown on this happening (https://searchfox.org/mozilla-central/rev/8b7aa8af652f87d39349067a5bc9c0256bf6dedc/toolkit/components/crashmonitor/CrashMonitor.jsm#200-205). It wouldn't surprise me if shutdown was too fast in the case that we haven't actually started up fully and most of the browser services haven't been initialized.

Shane Caraveo (:mixedpuppy)

Assignee

Comment 57

•

6 years ago

•

Edited

Without the patch you get a hang then a hard crash (which we force to get out of the hang). With the patch, there is no crash, at least I don't get the crash dialog, and there are no errors to the console.

ie. without the patch, and removing my prior change to call quit for -silent

Shane Caraveo (:mixedpuppy)

Assignee

Comment 58

•

6 years ago

So, looking at this, we need "sessionstore-final-state-write-complete" to be written.

When quit is granted, RunState is updated[1]. So long as it is "closing", the file should be written[2].

[1] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionStore.jsm#870
[2] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionFile.jsm#431

During the write, a blocker for profileBeforeChange is setup[3], in which the write-complete notification is sent.

[3] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionFile.jsm#478

The crash reporter uses the same profileBeforeChange blocker to write it's notifications received[4].

[4] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/toolkit/components/crashmonitor/CrashMonitor.jsm#176

So, is there a potential for a race given both are using profileBeforeChange? Seems like that would explain why session restore only happens sometimes (in my testing, probably more than half the time, but not always).

Dave Townsend [:mossop]

Comment 59

•

6 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #58)

So, looking at this, we need "sessionstore-final-state-write-complete" to be written.

When quit is granted, RunState is updated[1]. So long as it is "closing", the file should be written[2].

[1] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionStore.jsm#870
[2] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionFile.jsm#431

During the write, a blocker for profileBeforeChange is setup[3], in which the write-complete notification is sent.

[3] https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionFile.jsm#478

This is all pretty tricky to reason about...

So the profileBeforeChange blocker there blocks profileBeforeChange on the worker completing writing sessionstore.jsonlz4. The dispatch of the sessionstore-final-state-write-complete will be scheduled to occur in the next microtask after the blocking promise resolves (https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/browser/components/sessionstore/SessionFile.jsm#494). I think that this will be before the AsyncShutdown event loop for profile-before-change terminates but it might be worth checking.

Either way, that means that we start the asynchronous process of writing out the final checkpoint: https://searchfox.org/mozilla-central/rev/d061ba55ac76f41129618d638f4ef674303ec103/toolkit/components/crashmonitor/CrashMonitor.jsm#196 but we don't wait for it in any specific way since CrashMonitor's profileBeforeChange gets resolved once the "profile-before-change" checkpoint is written.

So now it's a race of the write of the final checkpoint complete against thread shutdown. Thread shutdown happens pretty quickly after the profile-before change notification completes (https://searchfox.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#1233).

OS.File blocks profile-before-change on completing IO but in this case we're posting something after profile-before-change has started so I'm not sure that will save us. But OS.File also blocks thread shutdown until all scheduled events are complete. writeAtomic ends up getting posted to the stream transport service's thread pool and on thread shutdown that also completes all pending requests. But OS.File spins an event loop after it receives the thread shutdown notification, depending on notification ordering the stream transport service may have already received the thread shutdown notification and so will reject a writeAtomic event posted after that point. Without checking I'd actually expect the stream transport service to get the notification much earlier than any JS.

Basically there is a lot of async and threaded stuff going on that will determine whether the final checkpoint gets written out to sessionCheckpoints.json or not. In a Firefox run that has loaded a lot of stuff and so will need to shutdown a lot of stuff we likely win the race every time. In a -silent run it may be more dicey.

All to say, if we're going to use that final checkpoint to decide whether the last run was a crash or not and we care about never showing the crash page by accident we really should explicitly block shutdown on writing the checkpoint.

In my tests, for whatever reason -silent doesn't crash without your fix in place. And when running with -silent that final checkpoint doesn't get written to sessionCheckpoints.json and does when I start up fully.

Dave Townsend [:mossop]

Comment 60

•

6 years ago

In fact it isn't written for me even with your fix in place.

Dave Townsend [:mossop]

Comment 61

•

6 years ago

Ah. It looks like the final session store write never happens anyway during -silent.

In order to trigger the write _sessionInitialized should be true: https://searchfox.org/mozilla-central/source/browser/components/sessionstore/SessionStore.jsm#844
We only set it to true in response to a browser window opening: https://searchfox.org/mozilla-central/source/browser/components/sessionstore/SessionStore.jsm#1526

Shane Caraveo (:mixedpuppy)

Assignee

Comment 62

•

6 years ago

I meant to write more...but had lunch first. I also see that the session file is never written in this case. So there is something that I still don't see that is causing the session restore tab to sometimes appear. The file is getting written but because sessionstore is never started, sessionstore-final-state-write-complete is never sent either.

So, I never see sessionstore-final-state-write-complete in the file when running silent, which seems to indicate we should always get a session restore tab, but we don't always get it. There's some weird logic in deciding whether to show it.

However, what I see is that we would only ever write the session file if we've run a session. And in that case, we should have sessionstore-windows-restored in the file as well. So I'm thinking we should check both here:

https://searchfox.org/mozilla-central/rev/8b7aa8af652f87d39349067a5bc9c0256bf6dedc/browser/components/sessionstore/SessionStartup.jsm#227-228

this._previousSessionCrashed = checkpoints["sessionstore-windows-restored"] && !checkpoints[
  "sessionstore-final-state-write-complete"
];

Shane Caraveo (:mixedpuppy)

Assignee

Comment 63

•

6 years ago

The other presumption I make here is that -silent is essentially synonymous with any startup that does not run a full session. Perhaps that is passing a url over to an already running instance, or something else. In those cases, the already running session should typically close correctly, in which case this file will be fine. So perhaps we don't need to do anything (like I suggest in comment 62).

Dave Townsend [:mossop]

Comment 64

•

6 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #63)

The other presumption I make here is that -silent is essentially synonymous with any startup that does not run a full session. Perhaps that is passing a url over to an already running instance, or something else. In those cases, the already running session should typically close correctly, in which case this file will be fine. So perhaps we don't need to do anything (like I suggest in comment 62).

That is not a valid assumption. -silent gets us all the way to running command line handlers which is about the last step before spinning the event loop to load the first window. There are a number of other cases that cause us to bail out earlier than that, everywhere from not even bringing up XPCOM (which wouldn't cause this kind of problem because async shutdown is never even initialised) all the way up to where -silent does it.

So I can reproduce the sessionstore page showing up with or without the existing patch applied with no apparent crashing going on which leads me to believe that it is likely a separate issue to this crash. I think we should land the patch I have and see what effect it has on the metrics and figure out the sessionstore issue separately. Do you disagree?

Flags: needinfo?(mixedpuppy)

Shane Caraveo (:mixedpuppy)

Assignee

Comment 65

•

6 years ago

(In reply to Dave Townsend [:mossop] (he/him) from comment #64)

Do you disagree?

No.

Flags: needinfo?(mixedpuppy)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 66

•

6 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #62)

I meant to write more...but had lunch first. I also see that the session file is never written in this case. So there is something that I still don't see that is causing the session restore tab to sometimes appear. The file is getting written but because sessionstore is never started, sessionstore-final-state-write-complete is never sent either.

That's interesting. Would that mean if sessionstore never gets started we also don't see a sessionstore-windows-restored notification, or others which depend on it? That's a situation which we have seen for Marionette for a very long time now. See bug 1382162. Nowadays we have a custom marionette-startup-requested observer notification but that also sometimes aren't getting fired. I wonder if those things are somewhat related.

Flags: needinfo?(mixedpuppy)

Pulsebot

Comment 67

•

6 years ago

Pushed by dtownsend@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/40f4f1079552 Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj

Shane Caraveo (:mixedpuppy)

Assignee

Comment 68

•

6 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #66)

(In reply to Shane Caraveo (:mixedpuppy) from comment #62)

I meant to write more...but had lunch first. I also see that the session file is never written in this case. So there is something that I still don't see that is causing the session restore tab to sometimes appear. The file is getting written but because sessionstore is never started, sessionstore-final-state-write-complete is never sent either.

That's interesting. Would that mean if sessionstore never gets started we also don't see a sessionstore-windows-restored notification, or others which depend on it?

Correct. -silent specifically works as a test case for me because it never starts a window, thus sessionstore also is never started.

Flags: needinfo?(mixedpuppy)

Cosmin Sabou [:CosminS]

Comment 69

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/40f4f1079552

Status: REOPENED → RESOLVED

Closed: 6 years ago → 6 years ago

status-firefox72: --- → fixed

Resolution: --- → FIXED

Phabricator Automation

Updated

•

6 years ago

Attachment #9105321 - Attachment is obsolete: true

Phabricator Automation

Updated

•

6 years ago

Attachment #9105446 - Attachment is obsolete: true

Shane Caraveo (:mixedpuppy)

Assignee

Comment 70

•

6 years ago

Comment on attachment 9106375 [details]
Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj

Beta/Release Uplift Approval Request

User impact if declined: Occasional crashes
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce: We only have one STR on osx for this, which the patch covers.
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The patch adds an additional quit which doesn't do anything if quit was already done. Given the low crash rate on nightly, I'd like to uplift this and see if it has any effect on beta.
String changes made/needed: none

Attachment #9106375 - Flags: approval-mozilla-beta?

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Attachment #9085311 - Flags: approval-mozilla-beta?

Shane Caraveo (:mixedpuppy)

Assignee

Updated

•

6 years ago

Attachment #9085311 - Flags: approval-mozilla-beta?

Pascal Chevrel:pascalc

Comment 71

•

6 years ago

Comment on attachment 9106375 [details]
Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj

Fix attempt for a high volume beta and release shutdown crash, uplift approved for 71 beta 10, let's see if this simple patch has an effect on crashes, thanks.

Attachment #9106375 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Narcis Beleuzu [:NarcisB]

Comment 72

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/551870e334f9

status-firefox71: affected → fixed

Cristian Fogel [:cfogel]

Comment 73

•

6 years ago

Marking as verified with the current builds: 71.0b10(20191113144733), 72.0a1 (2019-11-13) on Windows10, macOS 10.13, Ubuntu 18.04.
Can confirm that the startup restore_issue noticed is no longer encountered even after 2-3 -silent starts.

status-firefox71: fixed → verified

status-firefox72: fixed → verified

Ryan VanderMeulen [:RyanVM]

Comment 74

•

6 years ago

•

Edited

Initial data from 71.0b10 looks like there may indeed be a drop in crashes with this signature. We may want to keep this on the radar for ESR68 uplift as it's also showing a pretty high incidence rate. The patch grafts cleanly as-landed.

status-firefox70: fix-optional → wontfix

Ryan VanderMeulen [:RyanVM]

Comment 75

•

6 years ago

Dave, should we consider this for ESR68 uplift?

Flags: needinfo?(dtownsend)

Dave Townsend [:mossop]

Comment 76

•

6 years ago

Comment on attachment 9106375 [details]
Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: We're seeing a drop in the incidence of crashes from this fix.
User impact if declined: User may see crashes.
Fix Landed on Version: 70
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The fix is very straightforward and just causes some additional notifications to be sent out on shutdown.
String or UUID changes made by this patch: None

Flags: needinfo?(dtownsend)

Attachment #9106375 - Flags: approval-mozilla-esr68?

Ryan VanderMeulen [:RyanVM]

Comment 77

•

6 years ago

Comment on attachment 9106375 [details]
Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj

Improves our AsyncShutdown crashes a bit. Approved for 68.4esr.

Attachment #9106375 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Ryan VanderMeulen [:RyanVM]

Comment 78

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr68/rev/9388fd235de5

status-firefox-esr68: affected → fixed

Bug 1513855 ensure we notify quit application when running silent 6 years ago Shane Caraveo (:mixedpuppy) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1513855 make addon manager block on profileChangeTeardown rather than applicationQuit 6 years ago Shane Caraveo (:mixedpuppy) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1513855 force quit if quit has not been called for shutdown 6 years ago Shane Caraveo (:mixedpuppy) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1513855: Make sure that quit-application-granted and quit-application notifications are always sent. r=froydnj 6 years ago Dave Townsend [:mossop] 47 bytes, text/x-phabricator-request	pascalc : approval-mozilla-beta+ RyanVM : approval-mozilla-esr68+	Details \| Review