Open Bug 1465313 Opened 7 years ago Updated 2 years ago

[meta] Firefox Sync should provide durable storage of your profile data "in the cloud"

Categories: Firefox :: Sync, enhancement, P1

People: Reporter: rfkelly, Unassigned

Details: Keywords: meta

We've been telling ourselves for years that "sync is not a backup service", but that's never really made sense to our users and becomes less and less defensible over time. We're re-engineering a bunch of sync-related things, so let's do the work of figuring out how to make it actually provide the reliable backup service that our users expect.

This is a metabug for digging into metrics on data loss, using those to make a plan, and then executing said plan.

The first step is to make sure we understand the reasons why it doesn't currently function this way, which are a combination of server-side and client-side behaviours:

* server-side Storage Node migrations
  * as a result of users resetting their account password
  * as a result of nodes becoming overloaded
    * due to too much write contention from too many concurrent users
    * due to the "heat death" of purge_ttl jobs over time
  * as a result of server upgrades and other routine maintenance
  * as a result of AWS instance failures and other unexpected events
* client-side data wipes:
  * In response to unexpected or corrupted metadata on the server [1]
  * When the user disables syncing of a given data type [2]
  * When extensive bookmark corruption is discovered [3]
  * After restoring bookmarks from a backup [4]
  * As an explicitly-available user action [5]
  * IIRC, under some circumstances when /crypto/keys is busted

:markh and :bobm, can you think of any other potential causes that I may have missed here?

[1] https://dxr.mozilla.org/mozilla-central/rev/f01bb6245db1ea2a87e5360104a4110571265137/mobile/android/services/src/main/java/org/mozilla/gecko/sync/stage/ServerSyncStage.java#529
[2] https://dxr.mozilla.org/mozilla-central/rev/f01bb6245db1ea2a87e5360104a4110571265137/mobile/android/services/src/main/java/org/mozilla/gecko/sync/stage/ServerSyncStage.java#568
[3] https://dxr.mozilla.org/mozilla-central/source/services/sync/modules/bookmark_repair.js#32
[4] https://dxr.mozilla.org/mozilla-central/source/services/sync/modules/engines.js#926
[5] https://dxr.mozilla.org/mozilla-central/source/services/sync/modules/service.js#1328
Flags: needinfo?(markh)
Flags: needinfo?(bobm)
Depends on: 1465314
Depends on: 1291418
Depends on: 1465317
(In reply to Ryan Kelly [:rfkelly] from comment #0)
> * server-side Storage Node migrations
> * as a result of users resetting their account password
> * as a result of nodes becoming overloaded
> * due to too much write contention from too many concurrent users
> * due to the "heat death" of purge_ttl jobs over time
> * as a result of server upgrades and other routine maintenance
> * as a result of AWS instance failures and other unexpected events
> :markh and :bobm, can you think of any other potential causes that I may
> have missed here?

It's probably worth noting here that Tokenserver will, over time, assign more users to a Sync node than its maximum capacity. See bug 1325519. Otherwise, that list looks complete to me.
Flags: needinfo?(bobm)
> * When the user disables syncing of a given data type [2]

It's worth noting that this is the only troubleshooting step available for a wide array of support issues -- "turn it off and on again". Doing so is, of course, a sure-fire way of losing any data that's only stored on the server.

An additional item that's not on your list, Ryan: forgetting the FxA password.

My personal non-exhaustive list of reasons why Sync is not a good backup service:

- If you forget your password -- which is probably common when doing the "wipe and reinstall Windows" dance -- you lose all of your data. It's easy for users to accidentally wipe the only copy of their data.
- Sync does not support undo/rollback/recovery, because it doesn't save snapshots or multiple versions. If you deleted a bookmark and want to get it back, you're out of luck. It's not possible to sync without mutating your backup.
- Indeed, Sync doesn't support 'narrow' recovery (grabbing individual records) at all.
- The UX for 'recovery' lost the Sync Options dialog when we moved to FxA, so it's very easy to accidentally merge instead of replacing local data with the contents of the server.
- You can't recover from the server twice, or dry-run/test your backups; the backup is mutable and routinely mutated.
- Sync doesn't store all of the data that's in each synced data type, let alone the other data in the profile. You cannot sign in to Firefox on a new device and get anywhere close to the same experience you get from taking a local backup of your profile.
- TTLs mean that old data disappears; you won't get more than the last 90 days of history. I regularly get use out of 2+ years of history.
- Data isn't backed up from any device if it's not synchronized between all of your devices. (Backup and sync are conflated.)

It's possible to make Sync _more durable_, but I don't believe that gets it anywhere close to a backup service.
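
The "no snapshots" and "TTL" points in the list above can be illustrated with a toy model. The following is only a sketch under stated assumptions, not how the Sync server or client is implemented: `MutableSyncStore` and `SnapshotStore` are hypothetical names invented for this illustration, and the 90-day figure is simply the one quoted in the comment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

DAY = 86400.0  # seconds


@dataclass
class MutableSyncStore:
    """Hypothetical sync-style store: one mutable copy per record, optional TTL."""
    records: Dict[str, Tuple[str, Optional[float]]] = field(default_factory=dict)

    def put(self, key: str, value: str, now: float, ttl: Optional[float] = None) -> None:
        expires = now + ttl if ttl is not None else None
        self.records[key] = (value, expires)  # overwrites any earlier version

    def delete(self, key: str) -> None:
        self.records.pop(key, None)  # the only copy is gone

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self.records.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and now >= expires:
            return None  # expired records are no longer served
        return value


@dataclass
class SnapshotStore:
    """Hypothetical backup-style store: every change appends an immutable snapshot."""
    snapshots: List[Dict[str, str]] = field(default_factory=lambda: [{}])

    def put(self, key: str, value: str) -> None:
        latest = dict(self.snapshots[-1])
        latest[key] = value
        self.snapshots.append(latest)

    def delete(self, key: str) -> None:
        latest = dict(self.snapshots[-1])
        latest.pop(key, None)
        self.snapshots.append(latest)

    def get(self, key: str, version: int = -1) -> Optional[str]:
        return self.snapshots[version].get(key)


# Deleting a bookmark from the mutable store leaves nothing to recover.
sync = MutableSyncStore()
sync.put("bookmark:1", "https://example.org", now=0.0)
sync.delete("bookmark:1")
after_sync_delete = sync.get("bookmark:1", now=1.0)

# A history record written with a 90-day TTL is unavailable on day 91.
sync.put("history:1", "https://example.com", now=0.0, ttl=90 * DAY)
history_day_89 = sync.get("history:1", now=89 * DAY)
history_day_91 = sync.get("history:1", now=91 * DAY)

# The snapshot store can still read the version from before the delete.
backup = SnapshotStore()
backup.put("bookmark:1", "https://example.org")
backup.delete("bookmark:1")
after_backup_delete = backup.get("bookmark:1")
recovered = backup.get("bookmark:1", version=-2)
```

In this toy model, rollback and "narrow" recovery of a single record only become possible once older versions are retained, which is the distinction between durability and backup drawn above.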
> An additional item that's not on your list, Ryan: forgetting the FxA password.

This was literally the first item on my "server-side Storage Node migrations" list ;-) But you're right, it's different enough from the others that it should be its own top-level item, it would cause you to lose access to the stored data even if it did not trigger a node migration.

> It's possible to make Sync _more durable_, but I don't believe that gets it anywhere close to a backup service.

Thanks, I had forgotten how much baggage and expectations the word "backup" carries with it, and have edited the bug title accordingly.

FWIW I'm also using "Firefox Sync" here in the product-feature sense, fully expecting that part of how we get there is "migrate Firefox Sync to use better underlying technologies, like Mentat".
Summary: [meta] Firefox Sync should provide a reliable backup → [meta] Firefox Sync should provide durable storage of your profile data "in the cloud"
> - If you forget your password -- which is probably common when doing the "wipe and reinstall Windows" dance -- you lose all of your data. It's easy for users to accidentally wipe the only copy of their data.

I think we're starting to get a good handle on this one. Recovery keys are just a start. We hope that with WebAuthn, we can remove the need for passwords most of the time (so long as users have more than 1 device).

> - The UX for 'recovery' lost the Sync Options dialog when we moved to FxA, so it's very easy to accidentally merge instead of replacing local data with the contents of the server.

This UX can be resurfaced pretty easily IIUC. I can't speak for all of the other points rnewman brought up.

> FWIW I'm also using "Firefox Sync" here in the product-feature sense, fully expecting that part of how we get there is "migrate Firefox Sync to use better underlying technologies, like Mentat".

+1

What it comes down to is that the majority of users expect their things to be safely backed up. Let's meet that expectation. Data persistence will also be a requirement for other teams that want to build on top of "sync + fxa", like Lockbox, for example.
Depends on: 1465705
Sorry for the delay, but FTR, I believe comment 0 is accurate.
Flags: needinfo?(markh)
Component: Firefox Sync: Cross-client → Sync
Product: Cloud Services → Firefox
Depends on: 1600212
Priority: P3 → P1
Blocks: 1655337
Severity: normal → S3

To disambiguate a bit: the Mozilla project to move to a more stable storage architecture (called "durable sync") has been live for some time (see https://github.com/mozilla-services/syncstorage-rs).

However, this bug covers the concept of "durability" more broadly, which also gets into data-loss scenarios that are not infrastructure accidents.
