1395542 - Crash in AsyncShutdownTimeout | profile-before-change | Places Connection shutdown

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is 
report bp-d5246b0c-a246-4853-b4ce-497330170831.
=============================================================

Seen while looking at 55 crash stats: http://bit.ly/2eHfmrw. Currently in 55.0.3 it isn't very high ranking (#45), but it didn't have a bug associated with it.

Also affects 56 and 57.

Sylvestre Ledru [:Sylvestre]

Comment 1

•

7 years ago

Marco, I saw profile-before-change | Places Connection shutdown in the signature. I guess this is for you!

Flags: needinfo?(mak77)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 2

•

7 years ago

I've already seen it (I keep an eye on crash stats usually), but it's not actionable as-is.
It means Places started closing the connection, but something didn't allow that, after 1 minute async shutdown just forcibly crashed Firefox.
Unfortunately all the information in the metadata is "state":{"progress":{}}. Not really useful. There may be a bug in how we pass it, or in async shutdown itself, but I didn't find it so far and didn't have time to dig deeper.

One thing we should do, imo, is to add a circular buffer to async shutdown keeping the last N log messages, and push those along the async shutdown metadata. Async shutdown logging may help us identify what happened just before the crash.

Flags: needinfo?(mak77)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Updated

•

7 years ago

Priority: -- → P5

Sylvestre Ledru [:Sylvestre]

Comment 3

•

6 years ago

Marco, why did you mark it as p5?

This is impacting a lot of users on release (~100k install at least).

status-firefox55: affected → wontfix

status-firefox56: affected → wontfix

status-firefox57: affected → wontfix

status-firefox59: --- → affected

status-firefox60: --- → affected

status-firefox61: --- → affected

Flags: needinfo?(mak77)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 4

•

6 years ago

Hm, it should have been a P3 :(
The problem here is that there's no clear cause nor solution to this problem for now, it's really cryptic.

Flags: needinfo?(mak77)

Whiteboard: [fxsearch]

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Priority: P5 → P3

Takanori MATSUURA

Comment 5

•

4 years ago

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/45153595-2f48-4bc8-b76f-81fc40201116

MOZ_CRASH Reason: MOZ_CRASH()

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp:435
2 xul.dll nsDebugImpl::Abort xpcom/base/nsDebugImpl.cpp:134
3 xul.dll XPTC__InvokebyIndex 
4  @0x1788a2056ef 
5 xul.dll trunc 
6 xul.dll trunc 
7 xul.dll static XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1142
8 xul.dll XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:925
9 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:599

BugBot [:suhaib / :marco/ :calixte]

Comment 6

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 20 desktop browser crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

BMO Automation

Updated

•

2 years ago

Severity: critical → S2

BugBot [:suhaib / :marco/ :calixte]

Comment 7

•

2 years ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 20 desktop browser crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

Chris Bellini (:cbellini)

Comment 9

•

2 years ago

We will continue to monitor the number of crashes, and based on changes, we'll determine if we have enough/more details that could help pinpoint the location of crashes.

BugBot [:suhaib / :marco/ :calixte]

Comment 10

•

2 years ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 11

•

2 years ago

Most of the crashes here are due to exporting bookmarks to html on shutdown:
https://searchfox.org/mozilla-central/rev/b6d0ad6ba24b3d075e357e52160cd8fc7f911ba9/browser/components/BrowserGlue.jsm#3273-3275
Likely these users have many bookmarks...
The browser.bookmarks.autoExportHTML pref is not enabled by default luckily, so it only affects users who manually flipped it on.

Not sure what we could do here, the requirement is to export the most up to date version of bookmarks, since html format is used to sync bookmarks with third party apps...
Likely most of the time is spent in PlacesUtils.promiseFaviconData(), we could maybe make this part of the initial database fetch, instead of having one query per bookmark.

BugBot [:suhaib / :marco/ :calixte]

Comment 12

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 20 desktop browser crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

BugBot [:suhaib / :marco/ :calixte]

Comment 13

•

1 year ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

1 year ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 20 desktop browser crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Updated

•

1 year ago

Depends on: 1815464

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Updated

•

1 year ago

Blocks: 1819671

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 15

•

1 year ago

Looking at various reports, it looks like many of these crashes are due to DeleteFileW taking too long when Sqlite tries to remove the -shm and -wal files. There is an option in Sqlite to persist those files, https://www.sqlite.org/c3ref/c_fcntl_begin_atomic_write.html#sqlitefcntlpersistwal and we could evaluate enabling it in Nightly/Early beta and check the effect.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Updated

•

1 year ago

Depends on: 1820478

Avi Lugassi

Comment 16

•

1 year ago

when I'm running Automation test using Selenium and Firefox Web driver, I'm getting the next output can you please take a look on that:
WARNING: A blocker encountered an error while we were waiting.
Blocker: Waiting for ping task
Phase: TelemetryController: Waiting for pending ping activity
State: (none)
WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
WARNING: addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:727:15
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:523:26
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:458:15
setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
@resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
@resource://gre/modules/osfile.jsm:12:30
@resource://gre/modules/TelemetryStorage.sys.mjs:10:28
_checkPendingPings@resource://gre/modules/TelemetrySend.sys.mjs:859:17
setup@resource://gre/modules/TelemetrySend.sys.mjs:799:18
setup@resource://gre/modules/TelemetrySend.sys.mjs:237:30
setupTelemetry/this._delayedInitTask<@resource://gre/modules/TelemetryControllerParent.sys.mjs:824:36
observe@resource://gre/modules/AsyncShutdown.sys.mjs:576:16

After i take a look on the code i can see that this Warning errors arrives when im using driver.get().quit() method

Flags: needinfo?(mak)

Avi Lugassi

Comment 17

•

1 year ago

(In reply to Avi Lugassi from comment #16)

when I'm running Automation test using Selenium and Firefox Web driver, I'm getting the next output can you please take a look on that:
WARNING: A blocker encountered an error while we were waiting.
Blocker: Waiting for ping task
Phase: TelemetryController: Waiting for pending ping activity
State: (none)
WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
WARNING: addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:727:15
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:523:26
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:458:15
setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
@resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
@resource://gre/modules/osfile.jsm:12:30
@resource://gre/modules/TelemetryStorage.sys.mjs:10:28
_checkPendingPings@resource://gre/modules/TelemetrySend.sys.mjs:859:17
setup@resource://gre/modules/TelemetrySend.sys.mjs:799:18
setup@resource://gre/modules/TelemetrySend.sys.mjs:237:30
setupTelemetry/this._delayedInitTask<@resource://gre/modules/TelemetryControllerParent.sys.mjs:824:36
observe@resource://gre/modules/AsyncShutdown.sys.mjs:576:16

After I take a look on the code, I can see that those messages arrive when call the driver.quit() method.

Flags: needinfo?(mak)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Updated

•

11 months ago

Depends on: 1834043

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 18

•

9 months ago

Most of the crashes here are now related to:

operations when closing the database. Bug 1834043 will replace a file delete with a truncate, that ideally should be cheaper, though apparently some clients are still crashing on the truncate. I must assume it's really slow mechanical disks at this point.
wal journal syncing that is causing an fsync, and again the system IO in these cases seems particularly slow. There's also crashes in LocalStorage and IndexedDB, not just Places.

I'm lowering severity, because I'm not sure we can do much when the disk is just painfully slow.

Severity: S2 → S3

BugBot [:suhaib / :marco/ :calixte]

Comment 19

•

4 months ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash