Bug 1598168 (Open) - Opened 5 years ago, Updated 2 years ago

Fail the sync if any record queued for upload exceeds the record size limit

Categories

(Firefox :: Sync, task, P3)

People

(Reporter: lina, Unassigned)

References

(Blocks 1 open bug)

Details

There are two problems with syncing jumbo bookmark sets, and they feed into each other. The first is that a folder with more than ~16k children will exceed the individual record size limit and fail to upload. The second is that that many records also exceed the batch size limit, so they'll be spread out over multiple batches.
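
To make the first problem concrete, here's a rough back-of-envelope sketch in TypeScript (not actual Sync code); the ~256 KiB payload cap and 12-character GUIDs are assumptions for the arithmetic, not values taken from this bug:

const PAYLOAD_LIMIT_BYTES = 256 * 1024; // assumed per-record payload cap
const GUID_LENGTH = 12;                 // Places GUIDs are 12 characters
const JSON_OVERHEAD_PER_CHILD = 3;      // quotes plus comma, roughly

function estimateFolderChildrenBytes(childCount: number): number {
  // A folder record serializes all of its children as a JSON array of GUID strings.
  return childCount * (GUID_LENGTH + JSON_OVERHEAD_PER_CHILD);
}

console.log(estimateFolderChildrenBytes(16_000));                        // 240000 bytes, close to the cap
console.log(estimateFolderChildrenBytes(17_500) > PAYLOAD_LIMIT_BYTES);  // true

The exact threshold depends on the real limits, but the shape of the problem is the same: the child list alone grows linearly with the folder's size, so one big-enough folder makes its record unuploadable.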

Unfortunately, that leaves us with the worst of both worlds. When a user with lots of bookmarks tries to sync their library, we'll only upload some of it before getting to the large folder and failing. However, everything we already uploaded will remain on the server, confusing both other devices and the original device. This is worse with new bookmark sync, which actively tries to fix up corruption in bookmark trees, because it'll move the orphaned children of the missing folder into unfiled and reupload records for them.

Large libraries themselves aren't enough to cause this problem. It's still possible for bad things to happen if you sync a large library at the same time as another client: device A might commit a batch, then device B syncs down the (incomplete) tree, then device A fails to commit the next batch because device B is syncing. But I think that's rarer. The problem here is large libraries with too many bookmarks in one folder.

In the ideal case, we'd "just fix it" by changing the record storage format entirely, to avoid listing children out like that, and bumping both the record size and batch size limits. Chrome Sync works around this exact issue with a better storage format. However, due to a mix of historical reasons, legacy baggage, practical concerns, and not having enough people to work on this, we can't realistically do this. In fact, the new durable sync will lower the batch size limits, because the underlying database limits the number of mutations.

Which leaves us with a stopgap: fail to sync jumbo sets, like bug 1544656, comment 2 suggests. We could try to avoid syncing the offending folder, but that won't work well, since its children will lose their order. We could try not syncing the children, while still syncing other folders and their children, but that gets tricky. What if you have 16k unfiled bookmarks, and you move one from a different folder (which is synced) into unfiled (which isn't)? We've worked to move away from syncing only a subset of your bookmarks, because that caused more confusion and corruption.

Looking at the last 14 days of telemetry pings, there's an order of magnitude drop between the number of users with 1k bookmarks and the number with 5k. (The difference between 5k-10k and 10k+ isn't as dramatic.) We don't want to abandon those users, but the experience we're giving them now is worse.

Since we already buffer all bookmark records into memory before uploading, we can check whether any of them exceeds the record size limit and abort the sync right away. Bookmarks is the only engine that buffers everything like this; for other engines, it's safe to ignore oversized records.
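
A minimal sketch (TypeScript) of what that pre-upload check could look like, assuming the outgoing records are already buffered; the names OutgoingRecord, RecordTooLargeError, and assertRecordsFitWithinLimit are hypothetical, not the real Sync engine API:

interface OutgoingRecord {
  id: string;
  payload: object;
}

class RecordTooLargeError extends Error {}

// Runs before the first batch is committed, so nothing partial lands on the server.
function assertRecordsFitWithinLimit(
  records: OutgoingRecord[],
  maxRecordPayloadBytes: number,
): void {
  for (const record of records) {
    const size = new TextEncoder().encode(JSON.stringify(record.payload)).length;
    if (size > maxRecordPayloadBytes) {
      // Fail the whole sync rather than upload a tree that's missing this record.
      throw new RecordTooLargeError(
        `Record ${record.id} is ${size} bytes, over the ${maxRecordPayloadBytes}-byte limit`,
      );
    }
  }
}

The key property is that it throws before anything is uploaded, so other devices never see a half-written tree; other engines could keep skipping oversized records as they do today.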

The second thing to think about is surfacing some UI. Maybe a warning icon next to bookmarks in about:preferences#sync, showing something like "Your bookmarks library is too big to be synced." When folks see that and ask for support, we can ask them about the structure of their library, and whether they can split their bookmarks up into more folders.

Another option is something like bug 1321021 - while that seems user-hostile, I expect there aren't that many users with both a huge number of bookmarks in a single folder and a carefully curated bookmark hierarchy.

(In reply to :Lina Cambridge from comment #0)

> Large libraries themselves aren't enough to cause this problem. It's still possible for bad things to happen if you sync a large library at the same time as another client: device A might commit a batch, then device B syncs down the (incomplete) tree, then device A fails to commit the next batch because device B is syncing.

Bug 1515790 is related to this first part. It's for the case where everything fits into the record size limit, but not the batch size limit.

Priority: -- → P3
Severity: normal → S3