Closed Bug 1635348 Opened 4 years ago Closed 4 years ago

Migrate browser.storage.sync data in a sensible way given quotas are not currently enforced.

Categories

(WebExtensions :: Storage, task, P2)


Tracking

(firefox78 fixed, firefox79 fixed)

RESOLVED FIXED
mozilla79

People

(Reporter: markh, Assigned: markh)

References

Details

(Whiteboard: SACI)

Attachments

(1 file, 1 obsolete file)

As reported in bug 1311516, the current browser.storage.sync implementation doesn't implement client side quotas. This means that it's very likely that some extensions have exceeded the nominal quotas imposed by the API.

The new browser.storage.sync implementation will enforce client-side quotas - which begs the question of how we should migrate data which exceeds the quota. I see the following options for extensions which exceed it:

  • Migrate nothing.
  • Migrate everything (ie, ignore the quotas) and accept that after migrations, extensions attempting to add new data might fail due to exceeding the quotas - which is roughly what this github issue describes for syncing, where the quota might be temporarily exceeded.
  • Migrate what we can up to the quota and just silently ignore data once we hit it. As described in the issue above, this doesn't seem great as the dropped data will be arbitrary.
  • Disable the quota entirely! Or maybe change it to a much larger value and then choose one of the above options?

@rpl, thoughts?

Flags: needinfo?(lgreco)

(In reply to Mark Hammond [:markh] [:mhammond] from comment #0)

As reported in bug 1311516, the current browser.storage.sync implementation doesn't implement client side quotas. This means that it's very likely that some extensions have exceeded the nominal quotas imposed by the API.

The new browser.storage.sync implementation will enforce client-side quotas - which begs the question of how we should migrate data which exceeds the quota. I see the following options for extensions which exceed it:

  • Migrate nothing.
  • Migrate everything (ie, ignore the quotas) and accept that after migrations, extensions attempting to add new data might fail due to exceeding the quotas - which is roughly what this github issue describes for syncing, where the quota might be temporarily exceeded.
  • Migrate what we can up to the quota and just silently ignore data once we hit it. As described in the issue above, this doesn't seem great as the dropped data will be arbitrary.
  • Disable the quota entirely! Or maybe change it to a much larger value and then choose one of the above options?

@rpl, thoughts?

I've been thinking a bit about this over the last few days, and personally I think the most reasonable of the above options is the second one:

  • Migrate everything even if over quota, and let attempts to add more data fail until the extension is back under quota

We were also recently discussing a similar scenario that may happen in the future with storage.local, where an extension removes the unlimitedStorage permission in an update and the data already stored is over quota (1); Philipp was in favor of this kind of behavior in that context (as were Shane and myself).

I'm needinfo-ing Shane and Philipp in case they want to add their thoughts or want to think about it.

(1): at the moment IndexedDB (the backend we use for storage.local) keeps the database persisted even if the user removes the persistent permission (and so the quota isn't enforced until the database is destroyed and recreated from scratch), but there was some discussion in Bug 1558478 about changing that (see also our comments related to that in Bug 1558478 comment 18 and comment 19)

Severity: -- → N/A
Flags: needinfo?(philipp)
Flags: needinfo?(mixedpuppy)
Flags: needinfo?(lgreco)

Related to "storage.sync data over-quota" (and/or "storage.sync data fails to sync" in general), something that the current implementation isn't doing right now (and that the extensions can't do on their own either) is:

"make the user aware when the storage.sync data from the extensions they are using is failing to be synced"

At the moment this is a silent error, and so there is a chance that it isn't being reported as often as it is actually happening.

When it is reported, to investigate it we have had to resort to "asking the users to look at about:sync-log", which can be "not exactly simple" for non-technical users.

It would be great to improve that in the future, e.g. an idea could be to make sure that we keep track of that and then show a warning in about:addons on the addon card (and/or a badge on the Firefox Sync toolbar button and something in the related popup menu to allow the user to know more).

This is something we could discuss further in a follow-up, but I thought I'd mention it in this bug while we are discussing a related topic.

Priority: -- → P2

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #3)

Related to "storage.sync data over-quota" (and/or "storage.sync data fails to sync" in general), something that the current implementation isn't doing right now (and that the extensions can't do on their own either) is:

"make the user aware when the storage.sync data from the extensions they are using is failing to be synced"

We actually have the same problem when syncing - imagine 2 devices have a "full" quota, but with different keys. When syncing, the resulting data will be over quota. It's roughly the same problem as migration.

Our intention is to not enforce any quotas when syncing. Thus, whether migration or syncing itself causes local data to go over the quota limit, we will continue to sync - it's just that the extension will fail to add new data.

(in practice, the sync server enforces a limit on the record size, which is 2MB IIRC, so we might need to work out what to do around that size too)

It would be great to improve that in the future, e.g. an idea could be to make sure that we keep track of that and then show a warning in about:addons on the addon card (and/or a badge on the Firefox Sync toolbar button and something in the related popup menu to allow the user to know more).

We've found that sort of notification is fairly hostile to the user when we used them for, eg, sync outages - we found it just creates anxiety as there's no action they can personally take other than to disable the feature/addon. But yeah, let's cross that bridge when we come to it.

Assignee: nobody → markh
Status: NEW → ASSIGNED
Whiteboard: SACI

Migrate everything even if over quota is the only reasonable approach I think.

Flags: needinfo?(mixedpuppy)
Attachment #9146078 - Attachment is obsolete: true

FTR, given the discussions around quotas, and performance concerns around implementing the migration using the public APIs, I ended up implementing the migration in rust here. So while the quotas aren't enforced, I did enforce some sanity checks - if the payload ends up being larger than 20MB, the sync server is going to reject the data anyway - so the limits enforced by the migration are 1.5MB of payload and 2k keys. We'll take everything up until that limit for an extension, so no extension will end up with nothing migrated.
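To make the numbers above concrete, here is a minimal sketch (in Rust, since the migration itself lives in Rust) of how a per-extension migration budget along those lines could be enforced. The struct and constant values are illustrative assumptions based on this comment, not the actual webext-storage code.

    // Illustrative only - not the actual webext-storage migration code.
    // Per-extension limits as described above: ~1.5MB of payload and 2k keys.
    const MAX_LEN: usize = 1_572_864; // ~1.5MB
    const MAX_KEYS: usize = 2_048;

    /// Tracks how much of an extension's data has been accepted so far;
    /// once the budget is exhausted, remaining key/value pairs are skipped.
    struct MigrationBudget {
        bytes: usize,
        keys: usize,
    }

    impl MigrationBudget {
        fn new() -> Self {
            Self { bytes: 0, keys: 0 }
        }

        /// Returns true if this key/value pair still fits within the limits.
        fn accept(&mut self, key: &str, value_json: &str) -> bool {
            let cost = key.len() + value_json.len();
            if self.keys + 1 > MAX_KEYS || self.bytes + cost > MAX_LEN {
                return false;
            }
            self.keys += 1;
            self.bytes += cost;
            true
        }
    }

The effect is that migration stops accepting data for an extension once it is well past the documented quota, rather than dropping that extension entirely.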

I'll be pushing up patches for this shortly.

Depends on D76093

Luca raised some good points in phabricator, which I thought I'd bring back here because it really is more of a "policy" question.

First, I'll describe how things work in the current patch:

  • The migration itself doesn't clear any data from the "target" database, nor does it consider what is already in the database. So if you migrate twice, then assuming the source database is identical, you should end up with a target database that's also identical - however, everything was re-migrated in that case (ie, the second migration was not a no-op). If the source database changed between the first and second migrations, you will end up with different data - extension data that only existed in the first migration will remain, and extension data that existed in both but was changed between them will end up with only the second version. This is all in the rust code here (see the sketch just after this list).

  • The JS side of the world determines if a migration should be done (and just passes the filename of the source database if so). The attached phabricator patch as written uses a preference to indicate if a migration needs to happen, and it sets that preference to false before it even attempts the migration, explicitly so a failure to migrate is never retried, which is to handle one particular problem scenario.
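To illustrate the "nothing is cleared, a later migration just overwrites" behavior described in the first bullet, here is a rough sketch of what each migrated row could look like at the SQL level. The table and column names are made up for illustration - this is not the real storage-sync2 schema.

    use rusqlite::{params, Connection, Result};

    // Hypothetical target table; the real schema differs.
    fn migrate_pair(target: &Connection, ext_id: &str, key: &str, value_json: &str) -> Result<()> {
        // Nothing is deleted from the target first; re-running the migration
        // simply overwrites any row with the same (ext_id, key), so rows that
        // only existed in an earlier migration are left untouched.
        target.execute(
            "INSERT OR REPLACE INTO migrated_storage (ext_id, key, value) VALUES (?1, ?2, ?3)",
            params![ext_id, key, value_json],
        )?;
        Ok(())
    }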

I guess there are 2 prongs to the problem scenarios:

  • Attempting to migrate every time the API is used for the first time will be expensive for the case of a permanent failure. For example, if the source database is corrupt we can expect every attempt to fail. On the other hand, failure scenarios like "disk is full" might be expected to work at some point in the future - but possibly not the next attempt (ie, a simple "try migration twice" isn't necessarily going to help)

vs

  • If we allow any retries of the migration, we run the risk of overwriting new data with old data. For example, consider the following sequence:
    • Migration fails, so extensions find no data - this is obviously data-loss.
    • Extension writes current data, which ends up in the new db.
    • At some point in the future, disk space is cleared, so migration succeeds.
    • Extension now finds "old" data it wrote in the past, not the data it most recently wrote - this is a second data-loss event

So my code as written is opinionated towards "let's try and avoid 2 data-loss events" - which is a possible scenario if we ever attempt to migrate after we've made the API available to extensions. However, the addons team owns this policy decision, so let's discuss what to actually do!

(I'm only needinfo-ing Luca here in the expectation he'll discuss this with Shane and/or anyone else in the team.)

Flags: needinfo?(lgreco)
Flags: needinfo?(philipp)

Luca and I discussed this today for a while. We have one prime example that causes concern. I'll describe the basic use case we talked through and how failures could have cascading problems. This is one example; it is unknown whether there are other comparable examples, but if we've done it, there probably are. Because this example is ours, we could probably mitigate the issues, but we are very concerned about unknowns.

Multi-Account Containers addon, a cascading failure

The Mozilla Multi-Account Containers addon is used to create a set of containers, and to automatically open specific sites in specific containers.

As a user of the addon, I choose to put a bunch of sites into a few different containers. Those sites create data stores, cookies, and various other stuff that remains persistent across restarts.

The addon uses storage.sync to share the configuration across devices.

When the user deletes a container, all associated data in that container is deleted. That is cookies, local storage, etc.

I haven't fully followed the logic in this addon, but it does have code to remove containers in its sync storage code.

Migration Failure

We attempt to migrate data and are partially successful, or fail, but continue with the new implementation. The containers addon sees that some set of containers no longer exists in the storage sync area, and decides to delete those containers assuming the user did the removal. All data associated with the container is gone.

Migration success, but the user decides to flip the preference

We have a successful migration. The user creates new containers and starts using them, storage is created. Later (days/weeks/?) some unknown issue happens with the new implementation, so the user flips the pref to return to using kinto. When Firefox restarts, the addon deletes those containers because they are not in the storage.sync dataset.

As a part of our discussion, summarized in comment 10, we have three points that we think should be done/pursued, but we of course want to discuss this with you @markh before they become some set of requirements.

  1. full migration is the only success
  2. we want to fallback to kinto in readonly mode (actual server sync not necessary) if there is a failure
  3. user switching the pref off/on needs to be handled somehow

(In reply to Shane Caraveo (:mixedpuppy) from comment #11)

  1. full migration is the only success
  2. we want to fallback to kinto in readonly mode (actual server sync not necessary) if there is a failure

I don't really object to this, but this is likely to create scenarios where the user forever has a broken storage.sync implementation - eg, if the source database is corrupt. On the other hand, I guess they already are broken, although we might be making their browser experience worse (eg, trying to migrate every single time storage.sync is used.)

You'll probably also want a path to remove the kinto implementation at some point? It doesn't seem in anyone's interest that we forever keep kinto around just to handle this scenario.

OTOH though, it seems possible this scenario is already being experienced by the users for whom migration might fail? Eg, if the database is corrupt or the file system is full, then we will fail to set information about these containers in the first place, so the extension will later find them missing and make the same erroneous decision?

  3. user switching the pref off/on needs to be handled somehow

I think the only way forward is to explicitly not support that. If you allow that, then a variation of your scenario remains possible - eg, "user adds container to new impl, user switches back to old impl, new container now assumed to be deleted". One way to not support this is to swap the pref out for a check of the file-system, and have migration delete the old database on success.

(In reply to Mark Hammond [:markh] [:mhammond] from comment #12)

(In reply to Shane Caraveo (:mixedpuppy) from comment #11)

  1. full migration is the only success
  2. we want to fallback to kinto in readonly mode (actual server sync not necessary) if there is a failure

I don't really object to this, but this is likely to create scenarios where the user forever has a broken storage.sync implementation - eg, if the source database is corrupt. On the other hand, I guess they already are broken, although we might be making their browser experience worse (eg, trying to migrate every single time storage.sync is used.)

If the old db is corrupt and we cannot migrate, it's useless anyway, so we should just be "successful" in that case.

You'll probably also want a path to remove the kinto implementation at some point? It doesn't seem in anyone's interest that we forever keep kinto around just to handle this scenario.

Absolutely, perhaps after a release or two and we have telemetry showing it's not a big problem?

OTOH though, it seems possible this scenario is already being experienced by the users for whom migration might fail? Eg, if the database is corrupt or the file system is full, then we will fail to set information about these containers in the first place, so the extension will later find them missing and make the same erroneous decision?

If the disk is full and we cannot migrate, I'm fine with the migration being attempted again later. This will be more of an edge case.

  3. user switching the pref off/on needs to be handled somehow

I think the only way forward is to explicitly not support that. If you allow that, then a variation of your scenario remains possible - eg, "user adds container to new impl, user switches back to old impl, new container now assumed to be deleted". One way to not support this is to swap the pref out for a check of the file-system, and have migration delete the old database on success.

can we ignore the pref after a successful migration? "no turning back". if pref.enabled || extension.migrated use new sync.

If the old db is corrupt and we cannot migrate, it's useless anyway, so we should just be "successful" in that case.

By that I mean, we probably should just have an empty new db and mark it migrated.

Besides what Shane already wrote in comment 13 and comment 14, here are some additional notes about what I see in the migration code linked in comment 9:

If I'm reading it correctly (based on these consts here and the comment right above, and the related migration code here), it looks like if some extension has stored an amount of data greater than MAX_LEN and/or more keys than MAX_KEYS, then we will end up with partially migrated data for that extension, which wasn't what I expected.

I was expecting that:

  • we migrate everything that was there originally, no matter the size or the number of keys (even if the limits used in the migration code look higher than the ones we use in the quota checks enforced by the new implementation for the API calls),

and then:

  • we don't sync the data with the server and we don't allow writing more data, until the data is back within the expected quota.

From my perspective, migrating only part of the data is more likely to trigger other unexpected behaviors in the extensions, which may be expecting all of the data they stored, or no data at all (and so that behavior may make the data look like corrupted data from the perspective of the extension).

Flags: needinfo?(lgreco)

This is a difficult, and ultimately impossible, situation to deal with optimally. While I agree that you own this decision, I want to be clear that I think we are prioritizing the wrong edge-cases.

Firstly, if it is true that when the containers addon finds missing data it deletes containers, then that addon is fundamentally broken, period. That addon will encounter that scenario at some point (eg, a user signing in on a new device, realizing they don't know their password and resetting it). At Sync's scale, there are many other "one in a million" chances that come up multiple times per day (eg, bad RAM). Someone should investigate that addon because:

  • If it is true, that addon should be fixed. The traditional way of avoiding that is to keep a "tombstone".
  • If it's not true (eg, if it never was, or even if it was true and is fixed), then in my opinion, there are other more compelling use-cases.

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #15)

From my perspective, migrating only part of the data is more likely to trigger other unexpected behaviors in the extensions, which may be expecting all of the data they stored, or no data at all (and so that behavior may make the data look like corrupted data from the perspective of the extension).

The problem is that this directly contradicts the primary use-case of the "containers" addon scenario in comment 10. That scenario is best served by migrating as much as possible. But now we have another theoretical scenario which is best served by doing the exact opposite. It's clear we can't meet both requirements simultaneously.

This is going to come up in practice. Let's say migration of 1000 rows sees a single "read" error - we will not know what extension the read error was for. Do we throw the entire migration away (and lose all the user's containers and all other addon data) or migrate what we can? Which requirement do we violate? We also can't guarantee every error will fit neatly in a bucket. Corruption of sqlite databases can manifest in many ways, including just reading garbage. Extreme out-of-memory or out-of-space situations might manifest as IO errors doing a read (ie, we'd assume corruption but a retry may work). So a policy that says "treat corruption as X, read errors as Y and write errors as Z" is only ever going to be an approximation anyway - ultimately we will end up misclassifying some - it's not as though we have a corpus of corrupt databases to test against, nor a test infrastructure to ensure out of memory/storage scenarios always work as we'd like to think they do.

it looks that if some extension has storage an amount of data greater than MAX_LEN and/or more keys than MAX_KEYS, then we will ends up with partially migrated data for that extension, which wasn't what I expected.

That's true, but let's keep this in perspective. We have an addon that's explicitly chosen to use storage.sync but somehow has ended up with ~20x the documented limit for this area, a limit which Chrome enforces. The user of this addon will only know that stuff isn't syncing, not why, and will not have levers to fix it. The addon will need to deduce for itself what the problem is (presumably based on a slow trickle of user feedback), and given the addon author is also unlikely to understand how come there's 20x the expected data there, the only effective way of resolving this is to call .clear() on the busted profile. The data might be oversized due to previous corruption. The addon presumably cares that the data is synced, or it would use a different area with a greater quota. I don't see how migrating in a way that guarantees the addon will not sync, but allows for it being (say) 1000x over quota, is a good outcome for anyone; it will appear to both the user and addon author as though the Firefox profile is simply broken. As above, I accept that this is your policy decision, but I think it will be delivering the worst possible outcome for users of that addon. Migrating nothing for that addon would be a better outcome and still meets one of the requirements above.

Also, the migration code ignores read errors - as above, this too doesn't meet the "better to migrate nothing" requirement. Should this change?

(Also, keep in mind that addons must be somewhat robust here - sync's merging means they are likely to see data that no single profile ever wrote - the entire point of this storage area is that the addons may see stuff that they themselves would never write)

It's also worth noting that going around too many circles here means we are likely to miss the next ESR, which (a) has significant ongoing costs as the kinto service must remain alive for much longer and (b) means ESR devices will not sync with non-ESR for a number of months. While that can't be a driver of this decision, it would be a shame for this to be yet another bad outcome in an effort to obtain an impossible ideal outcome.

I hope this doesn't come across as too ranty :) But I'm going to need guidance here - I don't see anything actionable I can do which doesn't violate at least one requirement above, and I'd like to push back on any notion that we can "correctly" handle/classify all possible errors, so we need a "broad" error strategy rather than a fine-grained one.

I'm afraid that we are focusing too much on the multi-account-containers "hypothetical issue" (1). My main point (from my original comment on the phabricator revision) was that we should provide a reasonably consistent behavior and be prepared to communicate to the addon developers:

  • what to expect (e.g. new behaviors around the quota, how the migration works)
  • what their extension should do to deal with some possible corner cases they may find themselves in
  • how they can double-check that their extensions are going to do the right thing (while this is still riding the train) and report any issues that we may have missed.

The hypothetical multi-account-containers issue was mainly interesting as a way to think about what extensions may be doing wrong without even realizing it; we certainly can't handle every edge case optimally, and we can't even list and explicitly test all possible edge cases.
We should just have a consistent behavior and try to avoid introducing more edge cases to deal with.

I'm absolutely sure that none of us wants to be going around too many circles; what we want is to assess this as quickly as possible and agree on a consistent behavior (as simple and clear as possible, because it also has to be communicated to the extension developers) that can allow us to move this forward.

Let's do that.


(1): I looked at the multi-account-containers code and, if I'm not reading it wrong, it is using an explicit deletedIdentityList key stored in the sync data, so losing that key shouldn't make it remove all the existing containers. As for all the other extension developers, we should still make them aware that storage.sync will be migrating data soon so that they can double-check what may go wrong for them.

What follows is my attempt to list the topics we are trying to agree on, and then, for each of them, some potentially actionable points (with a couple of alternative options in some cases) and some open questions.

Avoid unexpected outcomes due to migrating data twice

to deal with this we could:

  • set a preference webextensions.storage.sync.migrated (as a quick way to check if the migration was already done in a previous run, that doesn't require additional I/O)
  • once we have migrated, also rename the old storage-sync.sqlite file to storage-sync.migrated.sqlite (e.g. to avoid redoing a data migration if the prefs file gets lost for unrelated reasons)
  • when starting a migration, if there is already data in the new storage-sync2.sqlite file, then we assume that the data has already been migrated and we do not migrate again (as an additional protection from unexpected issues that may make us migrate the storage-sync.sqlite data twice, e.g. if we fail to rename the old file)

Also, Firefox may start shutting down while we are still migrating data, so we should be sure to wrap the data migration in an AsyncShutdown blocker (if we are not doing that already).
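A minimal Rust sketch of the file-system based guards proposed above; the function names and the exact rename target are assumptions for illustration, not the actual implementation:

    use std::fs;
    use std::path::Path;

    // Only migrate when the legacy database exists and the new one doesn't;
    // anything else means migration already ran, or there is nothing to migrate.
    fn should_migrate(old_db: &Path, new_db: &Path) -> bool {
        old_db.exists() && !new_db.exists()
    }

    // Renaming (rather than deleting) the old file both prevents a second
    // migration and keeps the data around in case it is needed for recovery.
    // "storage-sync.sqlite" becomes "storage-sync.migrated.sqlite".
    fn mark_migrated(old_db: &Path) -> std::io::Result<()> {
        let renamed = old_db.with_extension("migrated.sqlite");
        fs::rename(old_db, renamed)
    }

Relying on file-system state rather than a preference also means the guard survives a lost prefs file.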

Data migration behavior on data over quota

As I was trying to explain in my previous comment, I don't really understand what we would gain by having a second level of quota in the migration, different from the quota enforced at runtime by the API methods themselves.

If the data is over quota, then it is not going to be synced until the extension goes back into quota, is that right?
If that is the case, does 2x the quota or 20x the quota really make a difference from that perspective?
Partially migrating data seems like something that would create additional corner cases (and make it even harder to understand how the extension got into that state if it is then reported as a bug).

In my opinion it would be simpler to explain to the developers one of the following two behaviors:

if an extension has data over quota, then:

  • option 1: we migrate all the data and the extension is responsible for clearing enough data to get back under quota (and finally be able to sync data again)
  • option 2: we don't migrate any data at all? (and the extension, and user, should expect to be starting from scratch)

Behavior on flipping the pref webextensions.storage.sync.kinto back to true

To deal with this scenario we may want to also check the value set to webextensions.storage.sync.migrated (where we are selecting which storage.sync implementation we are going to use here), and if it is true we flip webextensions.storage.sync.kinto back to false and keep using the new storage.sync implementation.

What behavior should be implemented if we need to flip webextensions.storage.sync.kinto back to true?

Questions:

  • what is the plan if something goes wrong and we need to go back to the kinto implementation?

    • is that an option we considered and planned? and if it is, where is the "no going back" line? (e.g. is the "no going back" line "early beta"?)
    • under which conditions are we willing to flip the pref back to true?
  • should "services.sync.prefs.sync.webextensions.storage.sync.kinto" also be flipped back to true?

  • what should we be doing to avoid triggering new unexpected corner case?

    • should we remove (or rename) the storage-sync2.sqlite file?
    • should we rename storage-sync.migrated.sqlite back to storage-sync.sqlite (if the storage-sync.migrated.sqlite file actually exists),
      or just assume that the old data will be synced back from the kinto server?

Other data migration unexpected errors

Aiming for the optimal solution (which doesn't really exist) for all unexpected errors is not possible; let's list the cases we can think of, and agree on what is a reasonable behavior from our perspectives (which could also be ignoring the error, if we agree on that):

Here are some that I could think of (and/or that were already mentioned in the previous comments):

  • corrupted storage-sync.sqlite file => ignore the corrupted file, migrate successfully to the new implementation
  • disk full during the data migration:
    • option 1: temporarily fall back to kinto implementation, retry on the next startup
    • option 2: rename the storage-sync.sqlite file and start from scratch?
    • questions:
      • should we warn the user?
      • should we record in telemetry the size of the storage-sync.sqlite file, in case that is what was growing and filling all the space?
  • unexpected lock on read for the storage-sync.sqlite (I think this is only possible on Windows):
    • questions:
      • can we detect this?
      • how to deal with it? wait for next startup? (or defer migration for N times then just migrate to the new implementation and start from scratch without trying to migrate the data again?)

Document for the user how to repeat the data migrations if they really want to

Some users may still want to force this (for their own reasons); providing some info could reduce the chance of them ending up in one more corner case.
The manual way to force a migration to happen again (at least based on the behaviors described in the previous points) could be something like:

  • remove webextensions.storage.sync.migrated (or flip it to false)
  • stop Firefox
  • in the related Firefox profile, rename storage-sync.migrated.sqlite back to storage-sync.sqlite and remove (or rename/move elsewhere) the new storage-sync2.sqlite file
  • start Firefox again

Shane and I had a good discussion today about the migration - one of the key requirements is that a "write error" (eg, disk full) should abort and keep kinto in place. The complication here is that the migration is async, and this propagates to many places - eg, ext-storage.js can no longer just rely on the Cu.import of the relevant module, but needs to wait for migration to finish before it knows if it's OK to continue with the new implementation.

I've just pushed a new version to phabricator and want to explicitly call it out for early feedback - more for the "shape" of the changes than anything. The new test fails for reasons which are almost certainly shallow - but extensive testing will be difficult due to the fact an xpcom service is involved.

One of the requirements is that we are safe from ever migrating twice, even if the profile loses all preferences - so we must use the state of the file-system to determine if we should migrate. For this reason, the migration is done by that xpcom service - it's critically important the service isn't accidentally used without migration, as otherwise a new database is created and we will assume that means migration previously worked. So this "asynced-ness" propagates through a number of modules (and sadly makes testing more difficult as, IIUC, we can't reinitialize the service even in tests).

So needinfo? both Luca and Lina - again, more for the shape than a final review.

Flags: needinfo?(lina)
Flags: needinfo?(lgreco)

Doing it in rust turns out to be far far easier and far less complex - thanks to both of you for your patience. I'm pushing a new version in which all the existing tests pass - migration tests need more work, but it's very close. LMK what you think.

(In reply to Mark Hammond [:markh] [:mhammond] from comment #20)

Doing it in rust turns out to be far far easier and far less complex - thanks to both of you for your patience. I'm pushing a new version in which all the existing tests pass - migration tests need more work, but it's very close. LMK what you think.

Thanks a lot Mark!!!
I took a quick look at the updated revision and I like it a lot; it looks like we are getting really, really close.

Flags: needinfo?(lgreco)
Pushed by mhammond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/09705f26786b
Migrate browser.storage.sync data from kinto to the new world. r=rpl,lina
Pushed by mhammond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2692b8d159d5
Migrate browser.storage.sync data from kinto to the new world. r=rpl,lina

I forgot to disable that new test on Android.

Flags: needinfo?(markh)
Flags: needinfo?(lina)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Hi Mark,
Lina created uplift requests for Bug 1641005 and Bug 1642271 because they landed on central after the last commit merged to Beta.

I asked Julien if that was the case for this one and Julien confirmed that it is.

And so it looks like you may need to request an uplift for this one too.

Flags: needinfo?(markh)

Comment on attachment 9150361 [details]
Bug 1635348 - Migrate browser.storage.sync data from kinto to the new world. r?rpl

Beta/Release Uplift Approval Request

  • User impact if declined: Data loss for power users who enable the new Rust storage.sync backend
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This functionality is behind a preference. The preference will not be enabled until a full PI review has been done.
  • String changes made/needed: None
Flags: needinfo?(markh)
Attachment #9150361 - Flags: approval-mozilla-beta?

I didn't see an answer here to some of the questions in comment 18, in particular what happens if we turn the pref on and then back off for whatever reason?

Flags: needinfo?(markh)

I can answer some of those.

Here's what we ended up implementing for migration:

  • The first time the storage.sync database is accessed, we check if the new (Rust) and old (Kinto) databases exist. We only proceed with migration if the Kinto database exists, and the Rust database does not.
  • If we determine we need to migrate, we copy all key-value pairs out of the Kinto database, and into the new Rust database. The two databases use different schemas, so we need to do some parsing and validation. If there are any database-level errors reading from the Kinto database (including failing to open the Kinto DB, unexpected read locks, and preparing or running a SQLite query), we bail and don't migrate anything. Any invalid data in Kinto (including unexpected collections or invalid JSON) is skipped over. Other than that, all data is copied over. We don't enforce any kind of quotas here—if something exceeds the quota, but it's valid JSON, we still migrate it for the extension. If there are any database-level errors writing to the new Rust database, we abort migration with the error. So read errors aren't fatal, but write errors are.
  • If migration fails with a write error, we remove the new Rust database, and propagate the error up. In that case, we fall back to the legacy Kinto implementation until Firefox is restarted, at which point we'll try the whole process over again (a rough sketch of this flow follows below).
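A simplified restatement of that flow as a Rust sketch; the error type, helper function, and file handling are placeholders for illustration, not the real webext-storage code:

    use std::fs;
    use std::path::Path;

    enum MigrateError {
        Read(String),  // database-level error reading the Kinto DB
        Write(String), // database-level error writing the new DB
    }

    fn maybe_migrate(kinto_db: &Path, new_db: &Path) -> Result<(), MigrateError> {
        if !kinto_db.exists() || new_db.exists() {
            return Ok(()); // nothing to migrate, or migration already happened
        }
        match copy_rows(kinto_db, new_db) {
            Ok(()) => Ok(()),
            Err(MigrateError::Read(_)) => {
                // Read failures mean the old data is unusable: carry on with
                // an effectively empty new database rather than failing.
                Ok(())
            }
            Err(e @ MigrateError::Write(_)) => {
                // Write failures (e.g. disk full): remove the partial new DB
                // and propagate, so we fall back to Kinto until next restart.
                let _ = fs::remove_file(new_db);
                Err(e)
            }
        }
    }

    // Placeholder: reads everything from Kinto (before writing anything),
    // skipping individually invalid records such as bad JSON or unexpected
    // collections, then writes all the valid pairs to the new database.
    fn copy_rows(_kinto: &Path, _new: &Path) -> Result<(), MigrateError> {
        Ok(())
    }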

So there's no separate migration pref, and the way to force a migration is to quit Firefox, delete storage-sync2.sqlite, and restart. However, since the data in storage-sync2.sqlite is syncing with a different server, the old Kinto storage-sync.sqlite database won't have any of it. So any new data written while the Rust backend was active will be lost. This can manifest as extension settings being reverted.

Conversely, disabling the new Rust backend (by flipping webextensions.storage.sync.kinto back to true) and returning to Kinto means that all new data written to Kinto will not be migrated when the Rust backend is turned back on. As above, this manifests as extension settings being reverted.

Why did we do it this way? We decided it's the least error-prone mitigation for corner cases. Given that the two backends are independent and rolling the pref back will lead to a "split" where neither Kinto nor Rust are aware of each other's changes, we felt a more elaborate strategy that tried to merge changes would cause more unexpected behavior than settings being reverted.

With that in mind, we accepted this risk of data loss because flipping webextensions.storage.sync.kinto back to true is a last resort. We used the pref to gate development, not rollout.

Flags: needinfo?(markh)

Comment on attachment 9150361 [details]
Bug 1635348 - Migrate browser.storage.sync data from kinto to the new world. r?rpl

Thanks Lina for the extra details.

Approved for 78.0b3.

Attachment #9150361 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: in-testsuite+

Thanks so much for the quick turnaround, Julien!

