places.sqlite migration fails if an application has never used the bookmarks service

RESOLVED FIXED in Firefox -esr52

Status

()

P1
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: mak, Assigned: mak)

Tracking

Trunk
mozilla55
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(thunderbird_esr45 unaffected, thunderbird_esr52? affected, firefox52 unaffected, firefox-esr52 fixed, firefox53 unaffected, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: [fxsearch])

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

Thunderbird hit a schema migration problem due to the fact it doesn't use the bookmarks service.
In particulare migrateV35Up() tries to create a mobile root, and in doing so it tries to read previous roots to position the new one.
That's not going to work in TB because there are no roots at all, nothing ever created them.

We need to also add a schema migration test for this case.
(Assignee)

Updated

2 years ago
Assignee: nobody → mak77
Status: NEW → ASSIGNED
(Assignee)

Comment 1

2 years ago
Actually the fact we cannot remove a corrupt database file is a consequence of various facts:
1. some of the migrators run async statements because they are expensive operations
2. but we invoke Close() because the database is corrupt and we need to delete it synchronously and hand back a working one to the consumer, so we cannot wait for asyncClose
3. but Close throws because we didn't invoke asyncClose
4. the connection doesn't get closed, the file is locked and cannot be removed, everyone is sad

To fix that, the only way would be to detect close failing, try asyncClose with event loop spinning. It's not great, we could even decide to do that in mozStorage for everyone, so MOZ_ASSERT if someone calls close instead of asyncClose, and fallback to asyncClose with spinning. It's not great but the opposite (leaving the connection open) is not great as well. We could also still throw, but after closing the connection and then Places could ignore the error and assume close worked.

Andrew, what do you think, could we asyncClose() and spin in mozStorage close() method if asyncClose was not invoked?
Flags: needinfo?(bugmail)
(In reply to Marco Bonardo [::mak] from comment #1)
> Andrew, what do you think, could we asyncClose() and spin in mozStorage
> close() method if asyncClose was not invoked?

Building on your proposal, I suggest:

- Add a new method called "spinningSynchronousClose()" that does what you propose and making it very clear what's going on to anyone copying or reading the code.  This does not assert.  Especially in a WebExtensions-only world where front-end/back-end development are hopefully now more integrated, I think it's reasonable to trust callers to either know what they're doing or trust that we should be able to fix the problem in-tree if there's mis-use.

- In event of close() where an asyncClose() ideally would have happened, do a MOZ_DIAGNOSTIC_ASSERT() that will explode in both debug and release builds for nightly (and previously in aurora) but not beta/release, calling spinningSynchronousClose() if we survive the explosion.
Flags: needinfo?(bugmail)
Uh, and we may want to do the XPConnect stack dump trick proposed in the other bug too, or maybe delay being so aggressive about the MOZ_DIAGNOSTIC_ASSERT until such time as we're in WebExtesions-only land.  Realistically, if there are misbehaving extensions out there right now, they're not super likely to get massively overhauled right before they're retired.
(Assignee)

Comment 4

2 years ago
(In reply to Andrew Sutherland [:asuth] from comment #2)
> - Add a new method called "spinningSynchronousClose()" that does what you
> propose and making it very clear what's going on to anyone copying or
> reading the code.

You mean a public API, right?

> - In event of close() where an asyncClose() ideally would have happened, do
> a MOZ_DIAGNOSTIC_ASSERT() that will explode in both debug and release builds
> for nightly (and previously in aurora) but not beta/release, calling
> spinningSynchronousClose() if we survive the explosion.

SGTM, I will file a separate storage bug for this.
(Assignee)

Updated

2 years ago
Depends on: 1355561
(In reply to Marco Bonardo [::mak] from comment #4)
> You mean a public API, right?

Yes.
Comment hidden (mozreview-request)
(Assignee)

Comment 7

2 years ago
Let's start taking this, I'll work on bug 1355561 separately since it's different issues, even if they happen in related situations.
This should be uplifted to Thunderbird ESR52. It's less likely to affect Firefox since every profile touches bookmarks on startup.
(Assignee)

Comment 8

2 years ago
note that bug 1355561 is instead more critical for Firefox, since some of our users may be stuck in a corrupt database loop. This would also explain some of the reports I've seen in the last years, where we were expecting a corrupt db to be replaced, but it didn't happen.
(Assignee)

Comment 9

2 years ago
ehr, on the other side, thanks to the forced crash, this one will also fix a db corruption loop.
With bug 1355561 we could potentially avoid the forced crash.

Comment 10

2 years ago
mozreview-review
Comment on attachment 8857342 [details]
Bug 1355414 - places.sqlite schema migration fails if an application has never used the bookmarks service.

https://reviewboard.mozilla.org/r/129328/#review132046

::: toolkit/components/places/Database.cpp:644
(Diff revision 1)
> +Database::ForceCrashAndReplaceDatabase()
> +{
> +  Preferences::SetBool(PREF_FORCE_DATABASE_REPLACEMENT, true);
> +  // We could force an application restart here, but we'd like to get these
> +  // cases reported to us, so let's force a crash instead.
> +  MOZ_CRASH("places.sqlite is corrupt and locked, app can't work properly");

Since there are three potential causes for crash (the three call sites), wouldn't it be useful to pass a reason parameter to ForceCrashAndReplaceDatabase so it can get appended to the crash signature? That would inform what further mitigations we might consider.

::: toolkit/components/places/Database.cpp:1926
(Diff revision 1)
>  Database::MigrateV35Up() {
>    MOZ_ASSERT(NS_IsMainThread());
>  
>    int64_t mobileRootId = CreateMobileRoot();
> -  if (mobileRootId <= 0) return NS_ERROR_FAILURE;
> +  if (mobileRootId <= 0)  {
> +    // Either the schema is broken or there isn't any root. This can happen if

"This can happen" -> "The latter can happen"

It's a bit ambiguous what 'this' refers to ("broken schema" or "no root").
Attachment #8857342 - Flags: review?(past) → review+
Comment hidden (mozreview-request)

Comment 12

2 years ago
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/a121d68c6eec
places.sqlite schema migration fails if an application has never used the bookmarks service. r=past

Comment 13

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a121d68c6eec
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Comment 14

2 years ago
Comment on attachment 8857342 [details]
Bug 1355414 - places.sqlite schema migration fails if an application has never used the bookmarks service.

Approval Request Comment
[Feature/Bug causing the regression]: new Sync mobile root in Firefox 51
[User impact if declined]: In case of a places.sqlite corruption the user is stuck with the corrupt database, breaking a good part of the browser features. For other apps like thunderbird this fixes an almost certain corruption.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: this touches an old migration we already went through, and code that is only touched in case of a corrupt database.
[String changes made/needed]: none
Attachment #8857342 - Flags: approval-mozilla-esr52?
Attachment #8857342 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 15

2 years ago
do we need any other uplifts for TB? Will you take care of those?
Flags: needinfo?(jorgk)

Comment 16

2 years ago
If it gets uplifted to M-ESR52, then I don't have to do anything. If that is declined, I'll put it onto the TB release branch.

Thanks for addressing it quickly!
Flags: needinfo?(jorgk)
Depends on: 1356812

Updated

2 years ago
status-firefox54: unaffected → affected
Comment on attachment 8857342 [details]
Bug 1355414 - places.sqlite schema migration fails if an application has never used the bookmarks service.

Fix a database corrupt issue related to bookmark sync. Aurora54+.
Attachment #8857342 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
status-firefox55: unaffected → fixed
status-firefox-esr52: --- → affected
Flags: in-testsuite+

Comment 18

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/f73bb141ba46
status-firefox54: affected → fixed
(Assignee)

Comment 19

2 years ago
Hm, I actually wonder if it would be better to add a call to savePrefFile() before crashing, I fear the pref file may not be saved otherwise... it sounds like it would be safer.
I'll probably land that as a followup changeset here, it would be a safe change also for branches.
Woudl that be fine with you or do you prefer a separate bug?
Flags: needinfo?(gchang)
Hi ::mak,
Can you file a separate bug to track?
Flags: needinfo?(gchang) → needinfo?(mak77)
(Assignee)

Updated

2 years ago
Depends on: 1357366
(Assignee)

Comment 21

2 years ago
filed bug 1357366.
Flags: needinfo?(mak77)
(Assignee)

Comment 22

2 years ago
landing this on ESR52, will also require landing the follow-up in bug 1357366.
that follow-up bug should also land in 54, since this one already landed there.
Comment on attachment 8857342 [details]
Bug 1355414 - places.sqlite schema migration fails if an application has never used the bookmarks service.

Corrupt SQl.lite database is a severe problem, also Thunderbird team needs this, ESR52.2+
Attachment #8857342 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Comment 24

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/28e09d4ac3e9
status-firefox-esr52: affected → fixed
You need to log in before you can comment on or make changes to this bug.