Millions of Form BSO records

RESOLVED WONTFIX

Status


Firefox
Sync
10 months ago
10 months ago

People

(Reporter: mostlygeek, Assigned: markh)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

10 months ago
I've been investigating the slow growth of disk usage on the go-sync server. In some of the largest databases, users have tens of thousands of form fields [1]. Most of the records are very small (<300 bytes), and most of that is HMAC, IV and JSON overhead.
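To make the overhead claim concrete, here is a rough sketch that estimates how much of a small record is envelope rather than user data. It assumes the payload is a JSON object with `ciphertext`, `IV` and `hmac` fields (base64 ciphertext, base64 16-byte IV, hex HMAC-SHA256) and uses random bytes as a stand-in for real encryption; the function name and the 60-byte plaintext are illustrative only.

```python
import base64
import json
import os
from typing import Tuple


def envelope_overhead(plaintext_len: int) -> Tuple[int, int]:
    """Return (total_payload_bytes, overhead_bytes) for a stand-in payload.

    Assumed layout: {"ciphertext": ..., "IV": ..., "hmac": ...}.
    Random bytes replace real AES output; only the lengths matter here.
    """
    # CBC with PKCS#7 padding rounds up to the next 16-byte block
    # and always adds at least one byte of padding.
    padded_len = (plaintext_len // 16 + 1) * 16
    ciphertext_b64 = base64.b64encode(os.urandom(padded_len)).decode("ascii")
    iv_b64 = base64.b64encode(os.urandom(16)).decode("ascii")  # 24 characters
    hmac_hex = os.urandom(32).hex()  # 64 characters, same length as HMAC-SHA256 hex
    payload = json.dumps(
        {"ciphertext": ciphertext_b64, "IV": iv_b64, "hmac": hmac_hex}
    )
    total = len(payload)
    return total, total - plaintext_len


total, overhead = envelope_overhead(60)
print(f"{total} bytes stored, {overhead} of them overhead")
```

Under these assumptions a 60-byte form entry ends up as a payload of a couple of hundred bytes, which is in line with the "<300 bytes, mostly overhead" observation.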

Here are the numbers from my own synced data. I have FF on iOS, Android and my Desktop, and they are all synced. IMO I have pretty regular browsing behaviour.

Type, count and total size:

History  : 97,028 (51,052KB)
Form     : 7,660  (1,828KB)
Bookmark : 571    (271KB)

From an ops perspective, I'm concerned with the number of form and history BSOs. Form BSOs have a client-set expiration of 1095 days (3 years) and history BSOs have an expiration of 60 days.

- Do these numbers seem reasonable, expected or as designed?
- Is it reasonable to cap the number of form/history BSOs and reduce their TTL?
- Can we reduce form TTLs and history TTLs to 30 days? This would free up a lot of disk space.
  - It would clear out hundreds of millions of records, which would make the Python server's MySQL
    db very happy: smaller indexes.
(Reporter)

Comment 1

10 months ago
:rnewman would appreciate your input here too.
Flags: needinfo?(rnewman)
(Reporter)

Comment 2

10 months ago
correction from the Description: 

- some users have tens of thousands of form ~BSOs~ 
- [1] some users have more than 200K form BSOs
(In reply to Benson Wong [:mostlygeek] from comment #0)

> From an ops perspective, I'm concerned with the number of form and history
> BSOs. Form BSOs have a client set expiration of 1095 days (3 years) and
> history has an expiration of 60 days. 
> 
> - Do these numbers seem reasonable, expected or as designed?

Yes.

> - Is it reasonable to cap the number of form/history BSOs and reduce their
> TTL?

Perhaps a cap, but it would have to be quite a high cap, and clients are not prepared to handle limits like that — they have no error handling for it, and they don't know how to prioritize which records they upload.

Form history is the record of everything you've ever typed in a web form that you'd like to use again — addresses, for example. The reason it has a longer TTL than history is that utility _increases_ with duration — you're less likely to remember the values!

My local form history database has 2,061 entries, and I think that's totally fine.


> - Can we reduce form TTLs and history TTLs to 30 days? This would free up
> much disk space. 

Even if we chose to, that's a local client change (so you're looking at 6-12 months for full market penetration), and it would only apply to newly uploaded records.


Two things come to mind:

- Our clients don't do very well at cleaning up after themselves. The contents of a sync server will continue to grow until a user is node-reassigned, at which point they'll reupload only newer stuff. The Sync client-server dance somewhat expects node reassignments to handle cleanup.

Historically that's been exactly what we want from an operational perspective — our very least favorite thing is a big DELETE. If that's no longer the case, there are probably lots of opportunities for clients to delete stale data from the Sync server.

- Users with tens or hundreds of thousands of form history entries either type a lot, or they have an add-on storing data in form history. A quick trawl through AMO's source repository looking for users of the Satchel interfaces would be informative.
Flags: needinfo?(rnewman)
It's also a good time to note that the form object format is due a redesign, alongside history. It doesn't include enough data, for one thing.
(Reporter)

Comment 5

10 months ago
> Even if we chose to, that's a local client change (so you're looking at 6-12
> months for full market penetration), and it would only apply to newly
> uploaded records.

Speaking only for my golang server, this can be implemented on the server side. TTLs can be overridden and capped at 30 days (or whatever).
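A minimal sketch of what such a server-side cap could look like, written in Python for brevity rather than taken from the golang server. The names `MAX_TTL_SECONDS` and `effective_ttl` are hypothetical, TTLs are assumed to be in seconds, and treating a missing TTL as "use the cap" is an assumption, not documented server behaviour.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400
MAX_TTL_SECONDS = 30 * SECONDS_PER_DAY  # hypothetical cap: 30 days


def effective_ttl(client_ttl: Optional[int], cap: int = MAX_TTL_SECONDS) -> int:
    """Return the TTL (in seconds) the server would actually store.

    A client-supplied TTL longer than the cap is shortened to the cap.
    A missing TTL is also treated as the cap (an assumption of this sketch).
    """
    if client_ttl is None:
        return cap
    return min(client_ttl, cap)


# A form BSO uploaded with the 1095-day client TTL would be stored with 30 days.
print(effective_ttl(1095 * SECONDS_PER_DAY) // SECONDS_PER_DAY)
```

Note that a cap like this only affects records as they are written; records already stored with a longer TTL would need a separate cleanup pass.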

> 
> Two things come to mind:
> ...
> Historically that's been exactly what we want from an operational
> perspective — our very least favorite thing is a big DELETE. If that's no
> longer the case, there are probably lots of opportunities for clients to
> delete stale data from the Sync server.

With the go-syncstorage server, DELETE and vacuum are our friends now.

> A quick trawl through AMO's source repository looking for users of the Satchel
> interfaces would be informative.

Got a link?
We tend to not like the server to throw away data when a client has specified a TTL that's longer; protocols shouldn't be built on lies, even if it's expedient. But mostly we do not want a shorter TTL: this is long-lived data. The Sync server should be able to store everything a new client would need, and your full set of saved form data is part of that.

I can't find the AMO source repo from here (on a train!), but someone in #amo should be able to help you out.
History:

Bug 697352/Bug 697482 is a situation of excessive form history payloads.

Bug 643633 (Firefox 44) is when we made form history actually useful by uploading more than 500 entries and not time-limiting.

Bug 745409 covers designing a better form history format, which will allow us to offer a non-terrible experience on other devices (Bug 997718).
More noodling: it sounds like there's a tension here between the client's desire to have small, incremental payloads and an operational desire to have fewer records on the server.

Our current trajectory is actually to have *more* records, perhaps many more — as we record more history visit metadata, it becomes unfeasible to reupload an entire history record each time one visit is added, and we also want to store more visits than we currently do (20, up from 10 a few years ago).

That might mean one 'data' record for each URL, and one or more 'visit' records that are append-only.

That implies that the Sync storage server should plan to regularly handle more than 100K smaller records per user.

This should be well within SQLite's capabilities, but it's worth you planning for it.
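The split described above, one 'data' record per URL plus append-only 'visit' records, might be modelled roughly as follows. This is only an illustration of why the record count grows; every class and field name here is hypothetical and not a proposed wire format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HistoryData:
    """One mutable 'data' record per URL (hypothetical fields)."""
    url_id: str
    url: str
    title: str


@dataclass(frozen=True)
class Visit:
    """One immutable 'visit' record; never rewritten after upload."""
    url_id: str
    visited_at_ms: int


class HistoryCollection:
    """In-memory stand-in for what a server would hold for one user."""

    def __init__(self) -> None:
        self.data: Dict[str, HistoryData] = {}
        self.visits: List[Visit] = []

    def upsert_data(self, record: HistoryData) -> None:
        # Re-uploading the data record replaces it in place.
        self.data[record.url_id] = record

    def append_visit(self, visit: Visit) -> None:
        # Each new visit is a small new record; existing ones are untouched.
        self.visits.append(visit)

    def record_count(self) -> int:
        return len(self.data) + len(self.visits)
```

With this shape, adding a visit uploads one small record instead of rewriting the whole history record, at the cost of the server holding one record per visit.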

Comment 9

10 months ago
> > our very least favorite thing is a big DELETE.
> With the go-syncstorage server DELETE and vacuum are our friends now. 

TokuDB supposedly handles this a lot better on the MySQL side as well, although I don't think we've tried it out in practice.
(Reporter)

Comment 10

10 months ago
(In reply to Richard Newman [:rnewman] from comment #8)



> More noodling: it sounds like there's a tension here between the client's
> desire to have small, incremental payloads and an operational desire to have
> fewer records on the server.

I also did some noodling [1] and some napkin math with data from the go-sync server. Sampling ~10% of users, I figure we're getting an average of 350MB/day of form BSOs. Over 3 years, we should have about 374GB of purely form BSOs. At that point it should start levelling off. One of the design goals of the go-sync server is to never have to replace a server and do mass user migrations.

With an average form payload size of 244 bytes, that's about 1.64 billion form BSO records. I don't think that is a problem for the go-sync server.
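The napkin math can be reproduced as below. The inputs (350MB/day, a 1095-day TTL, 244-byte average payload) are the figures quoted in this comment; the function name is made up, and binary units (1GB = 1024MB) are assumed because that is what makes the 374GB figure come out.

```python
MB_PER_DAY = 350          # average daily volume of form BSOs (from the comment)
TTL_DAYS = 1095           # client-set form TTL, 3 years
AVG_PAYLOAD_BYTES = 244   # average form payload size (from the comment)


def steady_state(mb_per_day: float, ttl_days: int, avg_bytes: float):
    """Return (total_gb, record_count) once uploads and expiries balance out.

    After ttl_days, every day's new data is offset by a day's worth expiring,
    so the total levels off at mb_per_day * ttl_days.
    """
    total_mb = mb_per_day * ttl_days
    total_gb = total_mb / 1024
    record_count = total_mb * 1024 * 1024 / avg_bytes
    return total_gb, record_count


gb, records = steady_state(MB_PER_DAY, TTL_DAYS, AVG_PAYLOAD_BYTES)
print(f"{gb:.0f} GB, {records / 1e9:.2f} billion records")
```

This lands at roughly 374GB and a little over 1.6 billion records, matching the estimate above to within rounding.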

For the go-sync server there's no tension around the number of records per user, though the additional storage costs should be considered: at our scale, those micro dollars do add up.

Also, thanks for the background info on the evolution of form BSOs. This bug turned into more of an impact investigation of Bug 643633.

[1] Thought I was the only one that used that term. :)
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → WONTFIX