Closed Bug 1607902 Opened 6 years ago Closed 6 years ago

Assertions in TelemetryVFS.cpp with the latest SQLite snapshot build

Categories

(Core :: Storage: Quota Manager, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr68 --- fixed
firefox72 --- wontfix
firefox73 --- fixed
firefox74 --- wontfix

People

(Reporter: RyanVM, Assigned: mak)

References

()

Details

(Keywords: assertion)

Crash Data

Attachments

(1 file)

I recently ran the most recent SQLite snapshot (201912260110) build through Try and hit assertions in TelemetryVFS.cpp across a variety of test suites:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=283704856&repo=try&lineNumber=1396

INFO - Assertion failure: !*cursor, at /builds/worker/workspace/build/src/storage/TelemetryVFS.cpp:217

After discussing with mak and asuth, it was suggested to file a bug and NI janv to take a look. I believe that the next SQLite release isn't due until next month (and AFAIK, we don't have any strong need to update once available), so this isn't immediately urgent.

Flags: needinfo?(jvarga)

Yeah this needs to be investigated, maybe they slightly changed the format of zName. I'll take a look when I have time or I'll try to reassign to someone from our team.

I got the chance to quickly look into this.

It looks like the change was at this spot in this patch: https://github.com/mackyle/sqlite/commit/3c75186534d24f6fa5b9e58b10d22b8382633d6d#diff-c72d2a86ffd101b18ec363440ab734b0L4863 or at least somewhere around there.

As I understand the situation, our helper method DatabasePathFromWALPath at https://searchfox.org/mozilla-central/rev/ba4fab1cc2f1c9c4e07cdb71542b8d441707c577/storage/TelemetryVFS.cpp#170 exists because the implementation of sqlite3_uri_parameter assumes the memory layout described in that comment that's setup by SQLite's pager.c and that it's only being given the initial database name, not the zWal name, so that it can just keep strcmp-ing its way forward until it finds the key or finds a sentinel double-NUL that indicates the end of the parameters. (And this doesn't work for the zWal name because it's after the data in question.)

I'll reach out to the SQLite team for guidance and because I think there's the meta-issue that in many ways we'd prefer to not have to be tunneling stuff through URI's in the first place and it would be handy for the VFS to just be able to know what sqlite3* db it's being asked about.

Flags: needinfo?(jvarga)

Team SQLite will make sqlite3_uri_parameter work consistently across database name/WAL name/journal name. This will also work for ATTACH DATABASE URI filenames. So we can remove DatabasePathFromWALPath.

The revised snapshot is up. I pushed the revised snapshot and a simple patch for TelemetryVFS to try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=493d9d030cbad5e4a0ba43e5d157fce381c63c13

Linux distributions that build with system SQLite potentially mean that we'll need to make a more complicated version of the patch that performs runtime detction of the SQLite version. (Unless those distributions make sure to rebuild Firefox every time they rebuild SQLite and have a hard version dependency, in which case we can use ifdef's.)

This might look like leaving the method as it exists intact and having it check the version and early return zName if sqlite3_uri_parameter is known to have the new behavior. This patch would potentially want to be uplifted so that Firefox doesn't break when the new SQLite version gets rolled out by distributions.

Then when we can make it a hard build-time dependency to require 3.31.0, we can remove the method and bake the requirement into configure.in. Or maybe we can do that already on trunk and we should just uplift the runtime approach to beta?

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #4)

Then when we can make it a hard build-time dependency to require 3.31.0, we can remove the method and bake the requirement into configure.in. Or maybe we can do that already on trunk and we should just uplift the runtime approach to beta?

We already do set a minimum version in configure.in.
https://searchfox.org/mozilla-central/source/old-configure.in#65

Right. My concern is builds that already exist for released versions of Firefox that express a minimum dependency on SQLite (but not a max). In that case, they'll start crashing if they get 3.31.0. It seems safe for us to just bump the configure.in version on trunk as we'll just be expressing a new minimum version. But for those older versions, something needs to be done.

Ah, I understand now. Realistically, given SQLite's release cycle and ours, the next update is very likely to land during the Fx75 nightly cycle. So maybe we land a patch now with runtime check (and uplift to 73 since it's effectively a no-op anyway) and then we can remove the check in 75 when the SQLite update lands. I don't think we need to worry about ESR as I think it's reasonable to assume the distros using an ESR Firefox are using a pinned SQLite version too.

Priority: -- → P2

The assumption that firefox-esr will use old sqlite is false at least on debian (and thus probably ubuntu). I use firefox-esr and have been hit this morning by the bug and given the number of duplicates I'm not alone.

BTW: it affects thunderbird too.

Blocks: SQLite3.31.1

Jan, do you have cycles to implement the patch from comment 4 in time to uplift to Fx73/68.5esr shipping in a bit over a week? Then we can land the 3.31.0 on m-c during the Fx75 Nightly cycle.

Flags: needinfo?(jvarga)
Crash Signature: [@ DatabasePathFromWALPath]

32-bit builds have a different signature.

Crash Signature: [@ DatabasePathFromWALPath] → [@ DatabasePathFromWALPath] [@ (anonymous namespace)::DatabasePathFromWALPath]

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

My concern is builds that already exist for released versions of Firefox that express a minimum dependency on SQLite (but not a max). In that case, they'll start crashing if they get 3.31.0.

If these things keep happening, we should really re-evaluate support for system Sqlite. For how much I'd prefer to keep it working, it has caused more than one problem in these years, and considered reduced resources we have, it becomes a burden to maintain. If we end up losing users due to these issues, the cost largely overweights the benefits.

I'm wondering if the sqlite change that triggered this could be reverted, given it effectively breaks ABI firefox relies on?

Flags: needinfo?(drh)

Yeah, I can work on this, but it seems we need a decision here first.

Flags: needinfo?(jvarga)
Blocks: 1611386

Sqlite 3.31.0 changed the zName layout, breaking our custom parsing of it.
Due to this the VFS started crashing on Sqlite library upgrade.

This fixes the parser, but only for Firefox versions it can be uplifted to,
we can't fix versions already released, that may start crashing once system
Sqlite gets updated to a newer version.

Assignee: nobody → mak
Status: NEW → ASSIGNED

I had some spare time to make a patch, though this will only fix the problem on versions we can uplift to, we can't fix already released versions :(
comment 13 is the only plausible solution, for the future I filed bug 1611386 to finally evaluate dropping system Sqlite,but that's still not a solution to this problem.

(In reply to Julien Cristau [:jcristau] from comment #13)

I'm wondering if the sqlite change that triggered this could be reverted, given it effectively breaks ABI firefox relies on?

The change was to fix a bug. And it has already been released. We cannot easily call it back.

Furthermore, I'm hesitant to call this an ABI. FF was relying on undocumented implementation details concerning where in memory SQLite was storing certain filenames. The locations where we store filenames was never intended to be an "interface". SQLite (like all other software) should be free to organize its internal data in whatever way best accomplishes its mission.

We did not deliberately set out to break FF. We were fixing another, unrelated bug.

After asuth brought FF's perusal of our internal data structures to our attention, we provided new APIs that enable FF to find the information it needs about the relationship between the various filenames in a way that is documented, supported, and that does not rely on random implementation details. We were under the impression that this was sufficient to resolve the concern and so we continued with the release.

Flags: needinfo?(drh)

I understand this was not intended as ABI, but effectively if users rely on it that's what it is...

Very well. We will attempt to revert the internal storage of filenames to the legacy arrangement that FF expects and publish SQLite version 3.31.1. It might be Monday or later before we can get that out - assuming we can get it to work, which is yet to be demonstrated.

(In reply to Julien Cristau [:jcristau] from comment #18)

I understand this was not intended as ABI, but effectively if users rely on it that's what it is...

What we were doing was very far from an ABI. I'm unclear on why the enhancement that's implemented as of SQLite 3.3.1.0 wasn't something we requested at the time of writing DatabasePathFromWALPath given that I think we were a member of the SQLite consortium and I think had already interacted with the SQLite developers in the creation of the quota layer (inferring from SQLite example quota VFS code). I presume the decision had something to do with Firefox OS schedule pressure, something that resulted in a lot of corner cutting.

I believe the following is the code that was reverse-engineered for us to depend on:

  /* Fill in the Pager.zFilename and Pager.zJournal buffers, if required. */
  if( zPathname ){
    assert( nPathname>0 );
    pPager->zJournal =   (char*)(pPtr += nPathname + 1 + nUri);
    memcpy(pPager->zFilename, zPathname, nPathname);
    if( nUri ) memcpy(&pPager->zFilename[nPathname+1], zUri, nUri);
    memcpy(pPager->zJournal, zPathname, nPathname);
    memcpy(&pPager->zJournal[nPathname], "-journal\000", 8+2);
    sqlite3FileSuffix3(pPager->zFilename, pPager->zJournal);
#ifndef SQLITE_OMIT_WAL
    pPager->zWal = &pPager->zJournal[nPathname+8+1];
    memcpy(pPager->zWal, zPathname, nPathname);
    memcpy(&pPager->zWal[nPathname], "-wal\000", 4+1);
    sqlite3FileSuffix3(pPager->zFilename, pPager->zWal);
#endif
    sqlite3DbFree(0, zPathname);
  }

I think the overarching problem we're encountering is that distribution builds, at least Debian, don't seem to run any of our tests before shipping updated versions of Firefox or its dependencies. I don't think this has historically been a major problem, but the recent bug 1601707 was a GCC miscompilation that was caught by our tests (ex: bug 1607001 although since we only use gcc in code coverage builds, it was tier 2) but ended up shipping to Debian users AIUI.

Can somebody please verify that check-in https://www.sqlite.org/src/info/34ab760689fd493e fixes the problem with legacy versions of FF linking against newer versions of SQLite?

On the page linked above, beside the "Downloads:" label, there is a "Tarball" link that will give you a complete source tarball for SQLite. Then do "./configure && make sqlite3.c" to get an appropriate SQLite amalgamation for incorporation in the FF source tree.

If and when you confirm that the changes fix the problem, we will release the revised code as SQLite version 3.31.1.

(In reply to D. Richard Hipp from comment #21)

Can somebody please verify that check-in https://www.sqlite.org/src/info/34ab760689fd493e fixes the problem with legacy versions of FF linking against newer versions of SQLite?

On the page linked above, beside the "Downloads:" label, there is a "Tarball" link that will give you a complete source tarball for SQLite. Then do "./configure && make sqlite3.c" to get an appropriate SQLite amalgamation for incorporation in the FF source tree.

If and when you confirm that the changes fix the problem, we will release the revised code as SQLite version 3.31.1.

I've tested with thunderbird 68.4.2-1 on Archlinux x86_64 with testing repositories enabled, where Thunderbird crashed reproducibly when sqlite 3.31.0-1 from the testing repository was installed. Replacing said sqlite package with a custom one built from sqlite checkout 34ab760689fd493eda482e856047708d74e769a01cc90b69da456d79ffe39aea fixes the issue for me.

Thanks as lot for caring about this and for providing a fix so quickly! :-)

I have built SQLite 3.31.0 (by using the PKGBUILD for version 3.31.0-1 from the testing repository of Arch Linux) with the patch from the mentioned commit and tested it by starting Thunderbird, which didn't crash anymore.

Results look good, thanks for the quick patch!

Attachment #9122857 - Attachment description: Bug 1607902 - Fix Sqlite VFS zName parsing for versions >= 3.31.0. r=asuth → Bug 1607902 - Fix Sqlite VFS zName parsing for Sqlite versions >= 3.31.0 (uplift only patch). r=asuth
No longer blocks: SQLite3.31.1
Regressed by: SQLite3.31.1

I filed bug 1611925 for Nightly, that required Sqlite 3.31.1. This bug will take care of uplifting a fix to Beta/ESR, to cover for the edge case of using Sqlite 3.31.0 and so that potentially the Sqlite team may be able to remove this requirement sooner (it will still take some time before the affected versions have low enough usage).

Comment on attachment 9122857 [details]
Bug 1607902 - Fix Sqlite VFS zName parsing for Sqlite versions >= 3.31.0 (uplift only patch). r=asuth

Beta/Release Uplift Approval Request

  • User impact if declined: Crashing
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Sqlite newer version supports natively our needs, so on those versions we just need a single API call. On previous Sqlite versions we keep running the usual code. This code is covered by various automated tests.
  • String changes made/needed:

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Same as beta, ESR should not crash if Sqlite 3.31.0 is used, and we should try to address the problem sooner, so that Sqlite team can remove the workaround sooner in the future.
  • User impact if declined: Crashing.
  • Fix Landed on Version: 73
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Sqlite newer version supports natively our needs, so on those versions we just need a single API call. On previous Sqlite versions we keep running the usual code. This code is covered by various automated tests.
  • String or UUID changes made by this patch:
Attachment #9122857 - Flags: approval-mozilla-esr68?
Attachment #9122857 - Flags: approval-mozilla-beta?
Blocks: SQLite3.31.1
Keywords: regression
No longer regressed by: SQLite3.31.1

Comment on attachment 9122857 [details]
Bug 1607902 - Fix Sqlite VFS zName parsing for Sqlite versions >= 3.31.0 (uplift only patch). r=asuth

Improves compatibility with various SQLite releases. Approved for 68.5esr and 73.0RC1.

Attachment #9122857 - Flags: approval-mozilla-release+
Attachment #9122857 - Flags: approval-mozilla-esr68?
Attachment #9122857 - Flags: approval-mozilla-esr68+
Attachment #9122857 - Flags: approval-mozilla-beta?
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

Jan, it looks like Quota handling is quite different on ESR68; I asked Ryan to ni you because I don't know much about those differences. Any help is welcome.

I commented in phabricator.

Flags: needinfo?(jvarga)

Updated patch for ESR68 in Phabricator, thanks Jan!

Note that this missed the 68.5esr RC build, so unless there's an RC respin or 68.5.1esr dot release, this won't ship until 68.6esr.

FYI, we're respinning the 68.5esr RC build, so this should be included in the final release.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: