Crash in [@ AsyncShutdownTimeout | profile-before-change | Sqlite.jsm shutdown blocker,SyncedBookmarksMirror: finalize]

RESOLVED FIXED in Firefox 68

Status

()

defect
--
critical
RESOLVED FIXED
2 months ago
15 days ago

People

(Reporter: jseward, Assigned: lina)

Tracking

(Blocks 1 bug, Regression, {crash})

unspecified
Firefox 68
x86
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox67 unaffected, firefox68 fixed)

Details

(crash signature)

Attachments

(1 attachment)

Reporter

Description

2 months ago

This bug is for crash report bp-1b9e62c8-f961-47fc-aba6-9041d0190510.

This (shutdown crash?) appears 6 times in 6 different installations
in the Windows nightlies of 20190509. The signature first appeared
in the 7 may nightlies, though.

Top 10 frames of crashing thread:

0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 xul.dll static void Abort xpcom/base/nsDebugImpl.cpp:439
2 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp
3 xul.dll nsresult nsDebugImpl::Abort xpcom/base/nsDebugImpl.cpp:133
4 xul.dll NS_InvokeByIndex 
5 xul.dll static bool XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1157
6 xul.dll static bool XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:943
7 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:535
8 xul.dll static bool InternalCall js/src/vm/Interpreter.cpp:590
9 xul.dll static bool Interpret js/src/vm/Interpreter.cpp:3082

Reporter

Updated

2 months ago
Flags: needinfo?(mak77)
Assignee

Comment 1

2 months ago

This looks like a shutdown hang caused by the new bookmark sync, which we enabled by default on Nightly in bug 1548884. I'll take a look.

Blocks: 1433177
Component: Bookmarks & History → Places
Flags: needinfo?(mak77) → needinfo?(lina)
Product: Firefox → Toolkit
Regressed by: 1548884
Assignee

Comment 2

2 months ago

The async shutdown timeout is 1 minute; taking that long to close the connection is concerning, and makes me think we're blocked on some other operation. It's not something on the JS side, since Sqlite.jsm operations add their own shutdown blockers, and I'd expect something like SyncedBookmarksMirror: store to show up in the signature if it were.

So that leaves us with either merging or application as the culprit. Since we merge on the storage thread, the AsyncClose event gets queued behind that, and can't close the connection until the merge finishes. It would be nice if we could include how far we got in the hang report, maybe annotating it ourselves if we can't hook into the AsyncShutdown machinery.

One idea for a fix is to make merges interruptible. I'm inclined to push on that, especially since we've done that in the app services code. We can add something like this to Desktop, change mozISyncedBookmarksMirror::finalize to set an atomic counter, and have each turn of the loops in fetch_{local, remote}_{tree, contents} check that counter to see if it was interrupted. We could also make Dogear support interrupting merges.

This won't make application (update_local_items_in_places) interruptible, since that happens through triggers. If we're hanging because one of the triggers is taking too long, we'll need to use sqlite3_interrupt (already exposed in mozStorage, for read-only connections). But we can figure that out later, if it turns out to be the issue.

Assignee: nobody → lina
Status: NEW → ASSIGNED
Component: Places → Sync
Flags: needinfo?(lina)
Product: Toolkit → Firefox

This commit adds an AbortController to the bookmark merger that
aborts fetching and merging when the mirror is finalized on shutdown.

Even though we're in soft freeze, I think this is low-risk enough to land: the change surface area is small, we have good test coverage, it's pref'd off on Release, and I'd like to see the impact of this patch to decide if we need to do anything more.

Comment 6

Last month
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3fae03e3595a
Abort merging when the synced bookmarks mirror is finalized. r=tcsc

Comment 7

Last month
bugherder
Status: ASSIGNED → RESOLVED
Closed: Last month
Resolution: --- → FIXED
Target Milestone: --- → Firefox 68
Regressions: 1551667

Looks like the min rust version should have been updated (maybe to 1.34.2) with this.
I had 1.34.0, but couldn't build after this change.

Flags: needinfo?(lina)

Hmm, sorry about that, it looks like I have 1.34.1 locally. I'm guessing our build machines also use a newer version, since this passed with a green Try run? In the future, is there a way for me to force building with the minimum required Gecko version, to avoid breakages like this?

Flags: needinfo?(lina)

(In reply to Lina Cambridge (she/her) [:lina] from comment #9)

Hmm, sorry about that, it looks like I have 1.34.1 locally. I'm guessing our build machines also use a newer version, since this passed with a green Try run? In the future, is there a way for me to force building with the minimum required Gecko version, to avoid breakages like this?

No problem and I don't know over forcing to build that way.
I guess froydnj might?

It seems to me that build servers should build with the minimum anyway and then it would always get caught.

Flags: needinfo?(nfroyd)

(In reply to Bob Owen (:bobowen) from comment #10)

(In reply to Lina Cambridge (she/her) [:lina] from comment #9)

Hmm, sorry about that, it looks like I have 1.34.1 locally. I'm guessing our build machines also use a newer version, since this passed with a green Try run? In the future, is there a way for me to force building with the minimum required Gecko version, to avoid breakages like this?

No problem and I don't know over forcing to build that way.
I guess froydnj might?

It seems to me that build servers should build with the minimum anyway and then it would always get caught.

We have jobs that (theoretically) build with the base versions of the toolchains we require, e.g.:

https://searchfox.org/mozilla-central/rev/11cfa0462a6b5d8c5e2111b8cfddcf78098f0141/taskcluster/ci/build/linux.yml#222

and configure should (theoretically) check for the version of Rust used in that job:

https://searchfox.org/mozilla-central/rev/11cfa0462a6b5d8c5e2111b8cfddcf78098f0141/build/moz.configure/rust.configure#144

But obviously 1.34.0 built Lina's change successfully, and looking at our Rust toolchains:

https://searchfox.org/mozilla-central/rev/11cfa0462a6b5d8c5e2111b8cfddcf78098f0141/taskcluster/ci/toolchain/rust.yml

everything else would have used 1.34.0 as well. So I'm a little mystified as to why 1.34.0 wouldn't have built things for you. I don't suppose you have the error message fresh at hand?

Flags: needinfo?(nfroyd)

Sorry Lina ... looks like this was a false alarm, probably because my active toolchain was an old version of nightly and bootstrap didn't pick that up ... maybe.
My guess is that nightly updated when I updated to 1.34.2 and that's why things started working for me locally.

No worries, thanks for following up!

Blocks: 1552621
You need to log in before you can comment on or make changes to this bug.