Closed Bug 1422420 Opened 7 years ago Closed 4 years ago

Discuss how to store history tombstones on the server

Categories

(Cloud Services Graveyard :: Server: Sync, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: lina, Unassigned)

References

(Blocks 1 open bug)

Details

In bug 578694, we're working on uploading tombstones for cleared history pages. This can result in Firefox uploading thousands of tombstone records when history is cleared. For pages that are already on the server, we'll replace their records with a simple `{ id: "...", deleted: true }` payload. However, for records that aren't on the server, we'll store *new* tombstones.

This seems wasteful, especially since the server likely doesn't have your full history. On Desktop, we'll only upload visits from the last 30 days on the first sync; I think iOS and Android backfill incrementally. Since we don't currently dedupe history records, clients will end up downloading, then ignoring most of those tombstones.

I thought it might be helpful to extend `POST /storage/{collection}` to support "update only if exists". For example, if we upload 50 tombstones in the POST, and 40 of those tombstones are for items that don't exist on the server, we only store 10 tombstones, and ignore the remaining 40.

*However*, I didn't think about the TTL for history records (60 days)! So it's possible that a page you last visited 3 months ago *has* been synced to your other devices, but is *not* on the server because it was TTLed. In that case, it would be very surprising to clear your entire history on one device, and still see all pages you visited 2+ months ago on the others. In fact, I think it's worse, because it draws attention to all those old pages, as only the new ones will be removed.

So "update only if exists" likely won't help us here, and we'll want to record all tombstones, anyway. Maybe we should use a lower TTL for tombstones...or, if garbage-collecting the TTLed tombstones would create more load on the server, just leave them be.

Either way, I thought I'd get a bug on file so we can discuss. CCing all the usual suspects. :-)
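To make the "update only if exists" idea concrete, here is a minimal sketch of the 50/40/10 example above. The `Tombstone` class and `filter_update_only_if_exists` function are hypothetical names for illustration only; they are not part of the Sync storage API, and the real server would do this check against its own storage backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical stand-in for a `{ id: "...", deleted: true }` payload."""
    id: str
    deleted: bool = True


def filter_update_only_if_exists(incoming, ids_on_server):
    """Split incoming tombstones into (stored, ignored).

    A tombstone is stored only if a record with the same ID is
    currently on the server; all others are ignored.
    """
    stored = [t for t in incoming if t.id in ids_on_server]
    ignored = [t for t in incoming if t.id not in ids_on_server]
    return stored, ignored


# 50 tombstones uploaded; only 10 of those IDs still exist on the server.
incoming = [Tombstone(id=f"page{i:02d}") for i in range(50)]
ids_on_server = {f"page{i:02d}" for i in range(10)}

stored, ignored = filter_update_only_if_exists(incoming, ids_on_server)
print(len(stored), len(ignored))
```

The sketch also shows the TTL problem: a page whose server record has expired lands in `ignored`, so other devices that already synced it never learn it was deleted.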
Thanks Kit! I've been mulling this over for a while, but I can't think of a sensible way to expose this in the protocol without it feeling like a really bolted-on special case. I'm leaning towards just leaving it as-is and eating the potential storage and traffic costs if we don't have anything that's a compellingly better alternative.
This is definitely verging into the "there is no good solution possible" part of the problem. But overall I agree with your 90° turn: we should upload a tombstone regardless of whether the previous record on the server evaporated, and so this bug is WONTFIX. I suspect there are two big buckets for history deletion: deleting very recent history ("oh shit, I wasn't in Private Browsing mode!"), and deleting stuff that shows up in a text search ("oh shit, look what's in my history for 'engagement'!"). The latter is more likely than not to be older than our short TTL.

Resurfacing this bug.

This problem may become more significant with Durable Sync. Every time an item is deleted, a tombstone is created. We have several users who have accumulated several GB of tombstones, which affects the cost of providing the service and approaches the hard limit on how much data we can store for a given user.

To that end, we are investigating enforcing quotas. There has been some discussion in various channels about adding a server-readable flag that marks a record as a tombstone, with some number of tombstones guaranteed to persist. Older tombstones beyond that limit would be dropped by the server.

It would be good to have the following:

  1. The minimum number of tombstone records that need to be preserved for a satisfactory user experience. (I'm going to presume that this will be per collection; however, it would GREATLY simplify the server's life if this were universal. "Satisfactory" should address the needs and usage behaviors of 90% of the population.)

  2. Understanding if it is possible for a user to add any "tombstone" flags to existing tombstones when reading them in, and submitting them as replacement records. (This would allow the server to clean up some of the historic items.)

  3. A universal "TTL" for a given user. (In other words, if we've not heard from a user in X period, can we consider the record set as abandoned and remove it?)

Hi JR, thanks for kicking off the discussion!

I wonder if it would be better to capture your wonderful comment in a new bug? This bug was specifically about history, and described a hypothetical scenario where we might upload thousands of tombstones after clearing history. Currently, we don't do this, because we never fixed bug 578694 for reasons best summed up as "it's really hard".

But, as you pointed out, the general problem of accumulating useless tombstones on the server, sometimes on the order of gigabytes, still exists today. The bug to expose a flag in the cleartext BSO payload to indicate it's a tombstone is bug 1657536. This would help the server prune those tombstones.

Answering your questions:

  1. The minimum number of tombstone records that need to be preserved for a satisfactory user experience. (I'm going to presume that this will be per collection; however, it would GREATLY simplify the server's life if this were universal. "Satisfactory" should address the needs and usage behaviors of 90% of the population.)

I don't think there's a clear-cut number that will work for every case, so I'll reply with a question: how tricky would it be to evict old tombstones based on a combination of time and total collection size? For example, if a collection has 50% tombstones...it might be that the user just did some spring cleaning on their old bookmarks (so we should keep them around until they're synced to other devices), or they might be old garbage. The size of the collection matters here, too.

Once the number of records in a collection exceeds some threshold, can we calculate how many tombstones we'd need to remove to fit under that threshold, and then delete them oldest to newest?
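Roughly, the eviction I have in mind could look like the sketch below. It assumes the server can tell tombstones apart (the `is_tombstone` field stands in for the flag proposed in bug 1657536), and `Record`, `tombstones_to_evict`, and `max_records` are hypothetical names, not anything the server exposes today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    modified: float     # last-modified timestamp, in seconds
    is_tombstone: bool  # assumed server-readable flag


def tombstones_to_evict(records, max_records):
    """Return IDs of tombstones to delete, oldest first.

    Only as many tombstones as needed to bring the collection
    down to `max_records` are selected; live records are never
    selected.
    """
    excess = len(records) - max_records
    if excess <= 0:
        return []
    tombstones = sorted(
        (r for r in records if r.is_tombstone),
        key=lambda r: r.modified,
    )
    return [r.id for r in tombstones[:excess]]


records = [
    Record("live1", 100.0, False),
    Record("t_old", 10.0, True),
    Record("live2", 200.0, False),
    Record("t_new", 300.0, True),
    Record("t_mid", 50.0, True),
    Record("live3", 250.0, False),
    Record("t_newest", 400.0, True),
]

# 7 records with a threshold of 5: evict the 2 oldest tombstones.
evicted = tombstones_to_evict(records, max_records=5)
print(evicted)
```

If there are fewer tombstones than the excess, this evicts all of them and the collection stays over the threshold, so a real quota would still need a policy for live records.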

  2. Understanding if it is possible for a user to add any "tombstone" flags to existing tombstones when reading them in, and submitting them as replacement records. (This would allow the server to clean up some of the historic items.)

You mean, can clients reupload plaintext tombstones for ones that are currently encrypted on the server? They certainly could, but it might be worse than leaving them alone. With the exception of new bookmark sync and Rust logins (which could do this), clients generally don't persist Sync tombstones in their local databases. They would need to read the entire collection from the server, decrypt every record, and reupload the ones with `deleted: true`.

New bookmark sync has been around for some time (Firefox 70, FxiOS 17, Fenix), and Rust logins even longer (FxiOS 16, Fenix, Lockwise), so this might catch a good chunk of those users. We would have to make sure only one client does this, though; we don't want every client with locally cached tombstones reuploading them all, over and over, to the server.

If you meant, can clients do this on the fly, for new incoming tombstones that they see—that might be more tractable. Connecting a new device for the first time would basically do the same thing as above.
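As a sketch of what that client-side pass would involve: the `tombstone` field name is an assumption (bug 1657536 hasn't settled on one), and `decrypt` here is only a JSON placeholder for the real payload decryption, so treat this as an illustration of the flow rather than actual client code.

```python
import json


def decrypt(payload):
    # Placeholder: real Sync payloads are encrypted. For this sketch
    # the "ciphertext" is plain JSON.
    return json.loads(payload)


def flag_tombstones(bsos):
    """Return replacement BSOs for tombstones that lack a cleartext flag.

    Every unflagged record has to be decrypted to find out whether it
    is a tombstone, which is the expensive part for a full collection.
    """
    replacements = []
    for bso in bsos:
        if bso.get("tombstone"):
            continue  # already flagged; don't reupload
        cleartext = decrypt(bso["payload"])
        if cleartext.get("deleted") is True:
            replacements.append({**bso, "tombstone": True})
    return replacements


bsos = [
    {"id": "a", "payload": json.dumps({"id": "a", "deleted": True})},
    {"id": "b", "payload": json.dumps({"id": "b", "title": "Example"})},
    {"id": "c", "payload": json.dumps({"id": "c", "deleted": True}),
     "tombstone": True},
]

replacements = flag_tombstones(bsos)
print([r["id"] for r in replacements])
```

Skipping already-flagged records keeps a second client from reuploading the same tombstones, but it doesn't stop two clients from racing on the first pass.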

  3. A universal "TTL" for a given user. (in other words, if we've not heard from a user in X period, can we consider the record set as abandoned and remove it?)

That's a good idea, too! It's basically what happens now: the longer you're not syncing, the more likely a node reassignment will happen in the meantime, and you'll start with a fresh bucket on a different node when you come back. The risk here is that folks using Sync as a backup would lose their data when signing back in, even if they remember their password. It would also be good to communicate that deadline somehow, maybe via a Firefox account email? Something like "hey, we saw you haven't synced in a while, please connect one of your computers to sync or we'll delete all the data in your account soon."
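A per-user TTL with a warning step might be sketched like this. The 150-day and 180-day windows are made-up values for illustration (nobody has proposed an actual "X period"), and `inactivity_action` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    WARN = "warn"      # e.g. send the "you haven't synced" email
    DELETE = "delete"  # treat the record set as abandoned


def inactivity_action(last_sync, now, warn_after, delete_after):
    """Decide what to do with a user's data based on idle time."""
    idle = now - last_sync
    if idle >= delete_after:
        return Action.DELETE
    if idle >= warn_after:
        return Action.WARN
    return Action.KEEP


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
last_sync = now - timedelta(days=160)

action = inactivity_action(
    last_sync,
    now,
    warn_after=timedelta(days=150),    # hypothetical
    delete_after=timedelta(days=180),  # hypothetical
)
print(action)
```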

Absolutely fine with moving this to a new bug.

Does it make sense to close this one as well?

Thanks for moving the discussion over, JR! Yeah, let's close this as WONTFIX, since this doesn't make much sense without bug 578694.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: Cloud Services → Cloud Services Graveyard