Closed
Bug 768663
Opened 13 years ago
Closed 11 years ago
Review operational implications of storing one record per tab
Categories
(Cloud Services Graveyard :: Server: Sync, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: rfkelly, Assigned: rfkelly)
Details
(Whiteboard: [qa-])
Client proposes to split the current monolithic "tabs" record into a record per tab, and add some additional information to each tab (page position, session cookies, favicon, form data, anything you might get from a session restore).
These records will be big, and there may be several hundred of them per user - in the meeting we had one person with 300 open tabs, another with 400. So we need to consider whether we can handle this with the current setup, or if we need to tweak some things server-side to make it work better.
Assignee
Comment 1•13 years ago
Client team, can you give a ballpark figure for the expected size of one of these records?
Assignee: nobody → rfkelly
Comment 2•13 years ago
It depends what we put in the tab record.
I think we will trend towards replicating session store over time. My sessionrestore.js file is ~130k for ~30 tabs. We also need to throw in about 1kb for a favicon. So, let's call it 6kb per tab.
Of course, if you have a novel saved in a <textarea>, that could easily be >100kb. That would be the exception to the rule, I would think.
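As a rough check of the arithmetic above, here is a back-of-envelope sketch in Python. The inputs are only the ballpark figures quoted in this comment (130k session file, 30 tabs, 1kb favicon); the function names are made up for illustration.

```python
# Back-of-envelope sizing from the figures quoted in this comment.
# All inputs are rough estimates from the thread, not measurements.
SESSION_FILE_KB = 130   # ~130k sessionrestore.js
TABS_IN_SESSION = 30    # ~30 tabs in that file
FAVICON_KB = 1          # ~1kb favicon per tab


def per_tab_kb(session_kb=SESSION_FILE_KB, tabs=TABS_IN_SESSION, favicon_kb=FAVICON_KB):
    """Average session data per tab plus one favicon, in kb."""
    return session_kb / tabs + favicon_kb


def total_kb(tab_count, kb_per_tab=6):
    """Total size for tab_count records at the rounded 6kb-per-tab figure."""
    return tab_count * kb_per_tab


if __name__ == "__main__":
    print(f"per tab: {per_tab_kb():.1f} kb")
    for n in (5, 300, 400):
        print(f"{n} tabs: {total_kb(n)} kb")
```

The raw average lands a little under the rounded 6kb figure, and at 6kb per tab the 300 and 400 tab users mentioned in the description would be in the range of roughly 1.8 to 2.4 MB for a full upload.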
I think we have user studies showing that the overwhelming majority of people only use at most 4 or 5 tabs at a time. How that sample set overlaps with the typical sync user, I have no clue.
Assignee
Comment 3•13 years ago
(In reply to Gregory Szorc [:gps] from comment #2)
> Of course, if you have a novel saved in a <textarea>, that could easily be
> >100kb. That would be the exception to the rule, I would think.
Sooner or later this will hit the 256K limit on individual BSO payloads. Just something to bear in mind.
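To make the limit concrete, a client-side guard might look something like the sketch below. Treating "256K" as 256 * 1024 bytes is an assumption, and the record shape (`url`, `formData`) is hypothetical, since this bug does not define the per-tab record format.

```python
import json

# The 256K per-payload limit is the figure quoted above; interpreting it as
# 256 * 1024 bytes is an assumption made for illustration.
MAX_PAYLOAD_BYTES = 256 * 1024


def make_tab_payload(url, form_data):
    """Hypothetical tab record shape; real field names are not defined in this bug."""
    return json.dumps({"url": url, "formData": form_data})


def payload_fits(payload, limit=MAX_PAYLOAD_BYTES):
    """Return True if the UTF-8 encoded payload is within the limit."""
    return len(payload.encode("utf-8")) <= limit
```

A client could use a check like this to decide whether to drop or truncate form data before upload, rather than having the server reject the record.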
:atoll or :telliott, what is the history and rationale behind storing tabs in memcache rather than in the database, as we do currently?
Comment 4•13 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #3)
> :atoll or :telliott, what is the history and rationale behind storing tabs
> in memcache rather than in the database, as we do currently?
I am not familiar with the rationale, it was before my time.
Comment 5•13 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #3)
> :atoll or :telliott, what is the history and rationale behind storing tabs
> in memcache rather than in the database, as we do currently?
telliott will likely give more accurate reportage, but I seem to recall that the rationale was that almost every tab sync will replace the entire record, and it's eminently recoverable -- no desire for persistence. That rationale has probably been obsoleted, particularly after Instant Sync.
OS: Windows 7 → All
Hardware: x86_64 → All
Comment 6•13 years ago
two main reasons:
1) The data is not terribly valuable. If we lose tab records, it's unlikely to matter
2) The data turns over quickly and with short effective-ttls. Most tabs last a minute or two, then go away and either never get synced or are immediately deleted upon the next sync. As a result, the items are largeish and rapidly change.
The second reason is actually the main operational implication to moving to one-record-per-tab - closed tabs are going to persist unless we move towards a delete-on-close model (which has its own set of problems). That's going to leave a lot more clutter on the db or in memcache.
Comment 7•13 years ago
Further clarification on point 1: the data is not "valuable" in that if we lose it, it's not lost - if the tab is still open upon next sync, it reappears in the monolithic blob. Unlike, say, history, where the client will not resync a record it thinks it has already synced, there's almost no risk of permanent relevant data loss here.
Comment 8•13 years ago
Note that we're now paying a good chunk of money to store, among other things, tab data in a persistent store where they are not lost.
We do not consider "losing" tabs to be any more acceptable than "losing" history, no matter what the original design spec was.
Updated•13 years ago
Whiteboard: [qa?]
Updated•13 years ago
Whiteboard: [qa?] → [qa-]
Comment 9•13 years ago
OK, what's the next step here?
Comment 10•13 years ago
gps and I had a long chat yesterday about this, history and commands. I think consensus was we liked history, had some interesting ideas for commands, and tabs pose a problem as currently structured, especially in the context of tab groups.
The core issue is that we currently sync things with at least some idea of persistence. Even history sticks around for a while. Whereas tabs can come and go in the space of seconds/minutes. The current system is good because it's modeled around the client, and the client is a persistent thing, even if the data itself updates quite a lot.
The "correct" way to do this is probably to sync tabs and tab groups separately, with tab groups being pointers to tab records, then have sync not auto-download individual tab records. In order to make this work, though, tab records need to have very short ttls, or they'll persist long after most windows are closed. That means you'll have pointers in the tab groups to records that no longer exist on the server. If that's not going to be OK, we're going to end up with a lot of no-longer-existent tabs still persisting on the dbs - not the end of the world, but still pretty inefficient. And we'll end up doing a lot more db queries, as tabs change a lot.
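The pointer-plus-short-TTL arrangement described above can be sketched with a toy in-memory store. Everything here (the `Store` class, `resolve_group`, the record shapes) is hypothetical and only illustrates how a group record can end up pointing at tab records that have already expired.

```python
import time


class Store:
    """Toy in-memory stand-in for a server collection with per-record TTLs."""

    def __init__(self):
        self._records = {}  # id -> (payload, expiry timestamp or None)

    def put(self, rid, payload, ttl=None, now=None):
        now = time.time() if now is None else now
        self._records[rid] = (payload, None if ttl is None else now + ttl)

    def get(self, rid, now=None):
        now = time.time() if now is None else now
        item = self._records.get(rid)
        if item is None:
            return None
        payload, expiry = item
        if expiry is not None and expiry <= now:
            del self._records[rid]  # lazily drop expired records
            return None
        return payload


def resolve_group(groups, tabs, group_id, now=None):
    """Fetch the tabs a group points at, separating live records from dangling ids."""
    tab_ids = groups.get(group_id, now=now) or []
    live, dangling = {}, []
    for tid in tab_ids:
        payload = tabs.get(tid, now=now)
        if payload is None:
            dangling.append(tid)
        else:
            live[tid] = payload
    return live, dangling
```

In this sketch the group record itself has no TTL, so any tab whose TTL lapses shows up in the dangling list the next time the group is resolved; a client would have to decide whether to ignore, re-upload, or prune those ids.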
gps may have other thoughts from our conversation to add here.
Comment 11•13 years ago
(In reply to Toby Elliott [:telliott] from comment #10)
> The "correct" way to do this is probably to sync tabs and tab groups
> separately, with tab groups being pointers to tab records, then have sync
> not auto-download individual tab records.
That sounds reasonable.
> In order to make this work,
> though, tab records need to have very short ttls, or they'll persist long
> after most windows are closed.
Remember how we allowed you to refresh TTLs with a POST in 1.x? Maybe we need to bring that back, because I sure don't want Firefox to be uploading 300 tabs every 8 hours.
(And maybe this doesn't work anyway; we need a multi-day TTL for these, so that the feature is useful when you go away for a weekend, but how many tabs does a heavy user open in a week?)
> That means you'll have pointers in the tab
> groups to records that no longer exist on the server.
That's probably going to happen if we sync places, anyway. We need to think about a pattern for broken-but-restorable pseudo-foreign keys.
> be OK, we're going to end up with a lot of no-longer-existent tabs still
> persisting on the dbs - not the end of the world, but still pretty
> inefficient. And we'll end up doing a lot more db queries, as tabs change a
> lot.
I think batch cleanup from clients is the only solution here. And filtering would really help, too; we don't want clients just deleting everything, but maybe tracking which records to delete is too much.
Assignee
Comment 12•13 years ago
(In reply to Richard Newman [:rnewman] from comment #11)
> Remember how we allowed you to refresh TTLs with a POST in 1.x? Maybe we
> need to bring that back, because I sure don't want Firefox to be uploading
> 300 tabs every 8 hours.
Do you mean the ability to POST a set of BSO objects without payloads, to overwrite the metadata of already existing records? Sync2 still has this for PUTs to individual records but not for POST of multiple records at once.
IIRC we removed it because it's fiddly to implement efficiently with batch insert/update operations. It would be possible to bring it back if there's value to be had from it, since I think it makes sense at a protocol level.
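For illustration, a payload-less batch update could look roughly like the sketch below. The `id`, `payload` and `ttl` field names follow the BSO format; the in-memory `store` dict and both function names are hypothetical, and this is not how the server implements it.

```python
import json


def ttl_refresh_body(tab_ids, ttl_seconds):
    """Build a JSON list of payload-less BSOs that only carry a new ttl."""
    return json.dumps([{"id": tid, "ttl": ttl_seconds} for tid in tab_ids])


def apply_batch(store, body, now):
    """Toy server-side handling: merge each incoming BSO into the stored record.

    Fields absent from the incoming BSO (here, the payload) are left untouched.
    """
    for bso in json.loads(body):
        record = store.get(bso["id"])
        if record is None:
            if "payload" not in bso:
                continue  # a metadata-only update needs an existing record
            record = store[bso["id"]] = {}
        if "payload" in bso:
            record["payload"] = bso["payload"]
        if "ttl" in bso:
            record["expires"] = now + bso["ttl"]
    return store
```

The request body in this sketch stays at a few dozen bytes per tab regardless of how large the stored payloads are, which is the point of refreshing TTLs this way.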
Assignee
Comment 13•13 years ago
One other option I was thinking about for this "grouping" construct is just putting different tab groups in different collections. Each group could go in a collection named "tabs.GROUPNAME" and be operated on independently. This might save you having to maintain a top-level mapping from group names to individual records.
Comment 14•13 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #13)
> One other option I was thinking about for this "grouping" construct is just
> putting different tab groups in different collections. Each group could go
> in a collection named "tabs.GROUPNAME" and be operated on independently.
> This might save you having to maintain a top-level mapping from group names
> to individual records.
Implications of this:
* Two-stage lookup for collection => group mappings.
* One fetch per group, rather than one fetch for tabs. Tab groups can be many.
* info/collections would now change every time you add a group.
* Collections don't expire; items do.
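A minimal sketch of the two-stage lookup implied by the first two points, assuming an info/collections-style mapping of collection name to last-modified time. The `fetch` callable and the function names are hypothetical.

```python
GROUP_PREFIX = "tabs."  # naming scheme proposed in comment 13: "tabs.GROUPNAME"


def collection_for(group_name):
    """Collection name that would hold the tabs of one group."""
    return GROUP_PREFIX + group_name


def group_collections(info_collections):
    """Stage 1: pick the per-group collections out of the collection listing."""
    return sorted(name for name in info_collections if name.startswith(GROUP_PREFIX))


def fetch_all_tabs(info_collections, fetch):
    """Stage 2: one fetch per group collection; `fetch` is a hypothetical callable."""
    return {
        name[len(GROUP_PREFIX):]: fetch(name)
        for name in group_collections(info_collections)
    }
```

With N groups this issues N fetches after the listing, where the single "tabs" collection needs one.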
(In reply to Ryan Kelly [:rfkelly] from comment #12)
> (In reply to Richard Newman [:rnewman] from comment #11)
> > Remember how we allowed you to refresh TTLs with a POST in 1.x? Maybe we
> > need to bring that back, because I sure don't want Firefox to be uploading
> > 300 tabs every 8 hours.
>
> Do you mean the ability to POST a set of BSO objects without payloads, to
> overwrite the metadata of already existing records? Sync2 still has this
> for PUTs to individual records but not for POST of multiple records at once.
Yes, exactly. This would be essentially "refresh the TTLs of these tabs, without uploading 5MB".
> IIRC we removed it because it's fiddly to implement efficiently with batch
> insert/update operations. It would be possible to bring it back if there's
> value to be had from it, since I think it makes sense at a protocol level.
Yeah, and we weren't using it. If we go this approach, we'll probably want frequent TTL refreshes.
Comment 15•13 years ago
(In reply to Richard Newman [:rnewman] from comment #14)
> (In reply to Ryan Kelly [:rfkelly] from comment #13)
> > One other option I was thinking about for this "grouping" construct is just
> > putting different tab groups in different collections. Each group could go
> > in a collection named "tabs.GROUPNAME" and be operated on independently.
> > This might save you having to maintain a top-level mapping from group names
> > to individual records.
>
> Implications of this:
>
> * Two-stage lookup for collection => group mappings.
> * One fetch per group, rather than one fetch for tabs. Tab groups can be
> many.
> * info/collections would now change every time you add a group.
> * Collections don't expire; items do.
Rather than overloading collections, you could bake a new concept of "groups" into the schema, and then set up background pruning.
CREATE TABLE groups (group_id, group_name);
CREATE TABLE tabs (tab_id, user_id, group_id, tab_data);
-- Background pruning: remove groups that no longer have any tabs.
DELETE FROM groups WHERE NOT EXISTS (SELECT 1 FROM tabs WHERE tabs.group_id = groups.group_id);
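The pruning idea can be tried out against an in-memory SQLite database, as in the sketch below. The column types and the `tab_groups` table name are assumptions (the name avoids clashing with the GROUPS keyword in some SQL dialects), and the production schema would certainly differ.

```python
import sqlite3


def prune_empty_groups(conn):
    """Delete groups that no tab references; returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM tab_groups WHERE NOT EXISTS "
        "(SELECT 1 FROM tabs WHERE tabs.group_id = tab_groups.group_id)"
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Create one populated group and one empty group, then prune."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab_groups (group_id INTEGER PRIMARY KEY, group_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE tabs (tab_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "group_id INTEGER, tab_data TEXT)"
    )
    conn.executemany(
        "INSERT INTO tab_groups VALUES (?, ?)", [(1, "work"), (2, "empty")]
    )
    conn.execute("INSERT INTO tabs VALUES (1, 42, 1, 'https://example.com')")
    removed = prune_empty_groups(conn)
    remaining = [row[0] for row in conn.execute("SELECT group_name FROM tab_groups")]
    conn.close()
    return removed, remaining
```

Calling `demo()` removes only the group that has no tabs and leaves the populated one in place.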
Assignee
Comment 16•13 years ago
Added Bug 784599 to bring back the bulk-update-of-ttls functionality. After the recent db-access refactoring I think it's clear how I can implement it properly.
Assignee
Comment 17•13 years ago
Unblocking sync2.0-protocol-definition bug, blocking sync2.0-implementation bug. This doesn't seem to call for a change of protocol, but may affect how to manage the "tabs" collection under the hood.
Assignee
Comment 18•11 years ago
We're building a whole new "task continuity" infra that will allow us to do new things with tabs and other data; I don't think there's any value in keeping this bug alive.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Updated•2 years ago
Product: Cloud Services → Cloud Services Graveyard