Result of vacuuming is not checked in CreateOrMigrateSchema
Categories
(Core :: Storage: Cache API, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox88 | --- | fixed |
People
(Reporter: sg, Assigned: tt)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
The result of executing the VACUUM
operation is not checked at https://searchfox.org/mozilla-release/rev/4585f47ffd5244d276b33420fc30cd6647a0a37b/dom/cache/DBSchema.cpp#563-566. It can't simply be added, since then dom/cache/test/xpcshell/test_migration.js actually fails. It must be checked if this is indeed a bad test case, or if there is a general problem with schema migration.
Reporter | ||
Comment 1•4 years ago
|
||
Tom, can you give this a look? I am not sure how critical that is.
Assignee | ||
Comment 2•4 years ago
•
|
||
Assignee | ||
Updated•4 years ago
|
Reporter | ||
Comment 3•4 years ago
|
||
This probably brings up the question again (I think we discussed it for IndexedDB at some point, but the case for IndexedDB might be a bit different) if we need to support migrating such old profiles at all, or if we could just wipe the cache when we really encounter an old profile. That seems to be better than corrupting the profile...
Comment 4•4 years ago
|
||
The currently oldest supported version of Firefox per the top row of the "Past Branch Dates" in https://wiki.mozilla.org/Release_Management/Calendar is ESR78. But note that this is if we're talking about a user downloading a fresh install of Firefox. The auto-update mechanism has a concept of Watershed releases which are updates the user must pass through for migration purposes. That is, if you have a very old version of Firefox, the auto-updater won't directly give you the current release version, it will go [watershed 43, watershed 47, watershed 57, watershed 72]. The telemetry on this which is also where I'm finding the set of watershed versions is https://telemetry.mozilla.org/update-orphaning/.
It seems extremely reasonable for us to only support upgrades from, say, ESR68 from here on out since that would also cover the v72 watershed.
And as mentioned somewhat on slack, we want to be clearing the QuotaManager origin, not just Cache API storage. This is both something that's notionally mandated by spec (and in theory would include LocalStorage and cookies, but I think we'd explicitly want to avoid cookies) and something that is likely important to avoid breakage as sites will both in practice and due to spec assume that origin storage will be cleared coherently for these more advanced forms of storage.
Comment 5•4 years ago
|
||
And in terms of the implication of that, reviewing https://wiki.mozilla.org/DOM/QuotaManager/SchemaChanges (which unfortunately has ugly formatting resulting from the migration from etherpad not going great), this might suggest that if we see a QuotaManager storage.sqlite version that's not v2.1, that we delete the entirety of storage.sqlite, PROFILE/storage/ and any legacy PROFILE/indexeddb/ directory.
There's a minor edge-case there that we had the v3/v2.1 downgrade hack. It could make sense to treat v3 as explicitly broken/to be cleared since anyone past v57 shouldn't have a v3 QM.
Reporter | ||
Comment 6•4 years ago
|
||
Thanks for providing the reference to the watershed versions. I wasn't aware of that, but that's very useful and indeed seems to imply that we cannot hit the upgrade paths from pre-72/ESR68 versions. Which for the cache means that no upgrade paths are needed at all. Rather, as you suggested, we should just clear the QM origin in such a case. Probably a separate bug should be filed for that, and this can then be closed as WONTFIX.
Other quota clients and the QM scheme itself should be covered separately, I think.
Comment 7•4 years ago
•
|
||
Ok, but why we are still seeing very old QM storage upgrades for FF 84 ?
https://sql.telemetry.mozilla.org/queries/72503/source?p_version=Release%2084
Comment 8•4 years ago
|
||
If some users restore their profiles from an old backup, we should at least notify them (via a deprecation process) that such profiles are no longer supported.
Reporter | ||
Comment 9•4 years ago
|
||
(In reply to Jan Varga [:janv] from comment #7)
Ok, but why we are still seeing very old QM storage upgrades for FF 84 ?
https://sql.telemetry.mozilla.org/queries/72503/source?p_version=Release%2084
Maybe that's happening in some automation setting, which involves restoration of an ancient profile as you mentiong in comment 8?
Comment 10•4 years ago
|
||
Ok, so first of all, we need to know if automation could be involved. All our packaged profiles which are used for testing of the upgrades are used by xpcshell tests. I would be really surprised if we learned that xpcshell tests send the telemetry.
Comment 11•4 years ago
|
||
Oh, you probably didn't mean our automation (treeherder).
Reporter | ||
Comment 12•4 years ago
|
||
Back to the question of the result of vacuuming: in IndexedDB we are ignoring errors during vacuuming at https://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#14219, but we propagate errors at https://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#1029
Reporter | ||
Comment 13•4 years ago
|
||
(In reply to Jan Varga [:janv] from comment #11)
Oh, you probably didn't mean our automation (treeherder).
Right, I wasn't thinking of our automation, but someone else's, using a build from one of our channels. I hope that doesn't submit anything to telemetry, that would be very confusing. If it did, it would need to be an extra try/autoland channel.
Comment 14•4 years ago
|
||
(In reply to Jan Varga [:janv] from comment #8)
If some users restore their profiles from an old backup, we should at least notify them (via a deprecation process) that such profiles are no longer supported.
The existing product decision for critical data in a user's profile as established by Refresh Firefox does not include QuotaManager-managed data or LocalStorage. (This is also something we push via URL bar interventions and other mechanisms.) Which is to say, the profiles are supported and it's already policy that QuotaManager-managed storage is acceptable data to triage.
I'm not sure exactly what you mean by deprecation process. If you're proposing some kind of UI that's exposed to users, then it seems unlikely to get product and UX sign-off, especially if we expect that most of these profiles involve some kind of test automation relying on historical snapshotted profiles that are restored either in their entirety or partially. If you're talking about proposing a release note (via the "relnote-firefox" flag and generally following the docs in https://wiki.mozilla.org/Release_Management/Release_Notes), then that does make sense to me. If you're proposing intentionally supporting QM releases that pre-date Firefox ESR68 (which means v2.2) or watershed 72 then I don't think that makes sense without an explicit user story explicitly signed off on by product management that recognizes the complexity trade-offs.
Comment 15•4 years ago
•
|
||
So I understand the general situation with VACUUM to be this:
- For the Cache API, this failure is currently expected due to how we handle schema migration for existing databases which I believe ends up being buggy.
- In migrations we use ALTER TABLE to add the column and set it to be NOT NULL with a DEFAULT value. But this explicitly does not modify the contents of the table and I think we were assuming it did.
- We need to be running an UPDATE command here to actually be making the DEFAULT value the one we expect.
- Per SQLite docs on DEFAULT is only evaluated at insertion time.
- When
rewriteSchema
is set to true, after we perform the migrations, we clobber the schema to no longer have the DEFAULT via RewriteEntriesSchema. This is what will cause VACUUM to fail as we've now enacted constraints that will not hold.
- In migrations we use ALTER TABLE to add the column and set it to be NOT NULL with a DEFAULT value. But this explicitly does not modify the contents of the table and I think we were assuming it did.
- In all other cases, VACUUM potentially will be detecting corruption of the database because it ends up looking at every page and therefore effectively doing an integrity check both for file format errors as well and constraint errors. We should be handling this corruption and deleting the database.
- We probably want to ensure we have an exception for use of sqlite3_interrupt as we are planning to use that to speed up shutdown, I believe.
- It likely also makes sense to specially detect the NS_ERROR_STORAGE_CONSTRAINT here to ensure it's surfaced to telemetry as an indication of a potential application logic error. It might make sense to not delete the database unless we see the storage constraints at runtime from non-VACUUM purposes.
- In this case we might want to run a PRAGMA integrity_check which produces a bounded set of string rows that describe the constraint errors and which could be returned via telemetry events.
Comment 16•4 years ago
|
||
I meant a release note.
especially if we expect that most of these profiles involve some kind of test automation relying on historical snapshotted profiles that are restored either in their entirety or partially.
Well, that's only our expectation. I already mentioned this on slack:
FF 84 has 100% storage initialization success rate and 99.70% temporary storage initialization success rate. There are 534,714 successful UpgradeStorageFrom0_0To1_0 (which is quite old upgrade). If the upgrade is successful it will no longer be invoked during next session. This doesn't necessary mean that there are 0.5M affected users. In theory, every session could start with a restored old profile. So we probably need to write a new query that involves client_id.
As I said, I'm not opposed to the idea of getting rid of some old upgrades, but we need to understand the data better and make the decision once we are confident we understand it well.
Comment 17•4 years ago
|
||
We could implement an intermediate step regarding QM storage upgrades and improving the success rate even more. If the storage initialization fails in one of the upgrades for storage version < 2.2, then entire storage/ directory and storage.sqlite would be deleted.
Comment 18•4 years ago
|
||
Note that we will generate a new client id if it's not possible to load the id from PROFILE/datareporting/state.json and that logic seems to operate independently from the cached id in a preference although I haven't dug into the whole control flow there.
Here's the selenium webdriver Firefox profile logic in particular this logic.
Comment 19•4 years ago
•
|
||
(In reply to Jan Varga [:janv] from comment #17)
We could implement an intermediate step regarding QM storage upgrades and improving the success rate even more. If the storage initialization fails in one of the upgrades for storage version < 2.2, then entire storage/ directory and storage.sqlite would be deleted.
The v2.3 schema was introduced in Firefox 70 in bug 1563023. I think it'd be appropriate to wipe storage in any all-or-nothing version upgrade failure given that v70 pre-dates v72 for non-ESR and we would expect ESR68 to have already upgraded to ESR78. It would then be a subsequent enhancement to remove support for older upgrade versions and have them move to using the same storage{.sqlite} clearing mechanism.
In the end state, we'd end up with the startup check deciding on versions sorta like:
- Too Old: The storage is just too old, clear and start from scratch.
- Best Effort: All or Nothing: The storage is old but we'll still try and update the data, but if we can't, we'll clear and start from scratch.
- Current Migration: Clear Broken Origins: For all new upgrades from here on out on Beta and Release, we'll clear origins that fail to upgrade.
- Nightly Active Development: Retain Broken Origins: For recently landed upgrades on nightly perhaps we initially mark the upgrade as one that will not clear the origin, instead letting us ask for help investigating from people. This could perhaps trigger the tab to open up about:quotamanager at most once (ex: guarded by a pref that gets set to the new version upgrade that broke). The idea here is that a newly landed upgrade shouldn't clear everyone's data, instead giving the opportunity to back things out and/or address the problems. This then advances to the next situation:
- Nightly Stabilized: Clear Broken Origins: After we think we've successfully landed the upgrade on nightly or we're down to a long tail of errors, we land a minor change to switch to clearing the broken origins, but we still have the detailed _TRY telemetry logging so the problems are known.
I understand Jan's proposal above to be that we keep everything in "Best Effort" for now and basically nothing in "Too Old", but I think my general argument is that it's appropriate for binary size, general complexity, and general sanity reasons to start moving things into "Too Old" once we think the total clearing logic is sound.
Assignee | ||
Comment 20•4 years ago
|
||
Assignee | ||
Comment 21•4 years ago
|
||
Depends on D107237
Updated•4 years ago
|
Assignee | ||
Comment 22•4 years ago
|
||
Comment 23•4 years ago
|
||
Comment on attachment 9209052 [details]
bug-1687685.md
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
No. This collection will expire in Firefox 96.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
Yes. The DOM Necko-Worker-Storage team is responsible for renewing or removing the collection before it expires in Firefox 96.
Result: datareview+
Assignee | ||
Comment 24•4 years ago
|
||
Comment 25•4 years ago
|
||
Comment 26•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bdfa69ff97f2
https://hg.mozilla.org/mozilla-central/rev/5a464a241fe6
Description
•