
[meta] Next-generation LocalStorage (DOMStorage) implementation

Status: ASSIGNED
Assigned to: janv
Product: Core
Component: DOM
Priority: P3
Severity: normal
Reported: a year ago
Last modified: 3 hours ago

People

(Reporter: janv, Assigned: janv)

Tracking

(Blocks: 8 bugs, {meta})

Version: Trunk
Points: ---

Firefox Tracking Flags

(firefox50 affected)

Details

(Whiteboard: [e10s-multi:-])

Attachments

(1 attachment)

(Assignee)

Description

a year ago
The next-generation should solve a bunch of problems:
- unification with other storage APIs (using QuotaManager)
- per origin database files
- database files in temporary storage (eviction of old data)
- e10s multi support
- move from PContent to PBackground
(Assignee)

Updated

a year ago
Blocks: 742822, 666724
(Assignee)

Updated

a year ago
Depends on: 1283609
Blocks: 1207306
Whiteboard: [e10s-multi:M?]
Whiteboard: [e10s-multi:M?] → [e10s-multi:M1]
(Assignee)

Comment 1

a year ago
Created attachment 8772366 [details] [diff] [review]
latest snappy
Assignee: nobody → jvarga
Status: NEW → ASSIGNED
Blocks: 1294386
Comment 2

a year ago
My main profile's 100MB webappstore.sqlite has 2,659 distinct originKeys.

Does that imply that my profile will have a couple of thousand SQLite databases, and a WAL for each one that's open?

That might not work out on mobile, even if you manage to put these files in the Android Cache directory. Heck, it might not be too fun on my MacBook Pro…!

Am I misunderstanding your approach?
Blocks: 857888
Flags: needinfo?(jvarga)
(Assignee)

Comment 3

a year ago
I have a similar profile (60MB, around 1800 domains), so I'm testing with real-world data.
Yes, local storage is special in many regards. It's a synchronous API, which makes it quite hard to plug into the quota manager, but it's been here for some time and many sites use it. However, we never evicted unused data, so we ended up with 2-3 thousand distinct domains.
I'm thinking about creating separate databases only for new local storage and about converting origins with existing local storage lazily. I believe many of those distinct domains won't ever be converted, so we will have to delete them when other domains need to store more data.
Flags: needinfo?(jvarga)
(Assignee)

Comment 4

a year ago
And btw, I'm not going to use SQLite for the new local storage implementation :)
Comment 5

a year ago
\o/

I was worried about the per-database overhead for multi-tab mobile users! Glad to hear you have it all in hand :D
(Assignee)

Comment 6

a year ago
Multi-tab shouldn't be such a big concern, since local storage will be preloaded in memory, so the per-origin database can be "closed" after preloading.
(Assignee)

Updated

a year ago
Blocks: 1296572
Whiteboard: [e10s-multi:M1] → [e10s-multi:?]
Whiteboard: [e10s-multi:?] → [e10s-multi:M3]
(Assignee)

Updated

11 months ago
Depends on: 768074

Updated

7 months ago
Whiteboard: [e10s-multi:M3] → [e10s-multi:-]
See Also: → bug 1350071
See Also: → bug 1362190
Blocks: 1350071
Keywords: meta
Summary: Next-generation DOMStorage implementation → [meta] Next-generation DOMStorage implementation
Priority: -- → P3
Blocks: 1378716
(Assignee)

Comment 7

9 days ago
(In reply to Jan Varga [:janv] from comment #0)
> The next-generation should solve a bunch of problems:
> - unification with other storage APIs (using QuotaManager)
> - per origin database files
> - database files in temporary storage (eviction of old data)
> - e10s multi support
> - move from PContent to PBackground

Just a quick update:
- e10s multi support landed, although there is room for improvement
- the move from PContent to PBackground landed too, bug 1350637
Comment 8

I'm updating the subject to make this easier to find. I also want to give a dump of my current understanding of :janv's plan; I think the design has a lot of wins, but there are trade-offs involved, and we want any discussion/debate to happen sooner rather than later.

## Plan
My mental model of the overhaul plan is (Jan, please correct me if I'm wrong!):
- All (canonical) LocalStorageCache data state is held in the parent process.
- We have actors per-LocalStorageCache which are per-principal (that is, origin+attributes) instead of the current "one actor per process that is conceptually just a remoted database connection" so that we can scope events properly.
- All localStorage reads happen via sync IPC to PBackground.  (Or, if necessary, a new PLocalStorageIsDumbAndNeedsItsOwnThread.)
- localStorage writes probably also happen via sync IPC[1].
- Events happen via async IPC if there are any "storage" event listeners in a content process.  The subscription should happen synchronously for maximum correctness, but it's fine to subscribe async under the spec.  This means that a change can be perceived by reading localStorage for which a "storage" event has not yet been received.  This is fine and consistent with how LocalStorage already works in Chromium/Blink (and probably webkit too?).  It's also how we've worked for a while.
- We use QuotaManager, but the origin never gets more than 5MiB of LocalStorage space (+slack) no matter what.

1: At least some writes could be optimized to be async if we're willing to allow the origin to exceed quota.  For example, Chromium allows quota to be exceeded by 100k because of multi-process races.  Our implementation could similarly allow each process some "slack" quota that permits async write messages up to that quota (and assuming no space would be freed by overwrites) as long as we're (sufficiently) under quota.  If a write would exceed the slack size, a sync write is performed and resets the outstandingSlackBytes to 0.  In the best-case scenario, outstandingSlackBytes never exceeds the threshold, and is continually re-zeroed by async IPC messages that confirm the writes and current quota usage for the given cache.
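To make the slack bookkeeping concrete, here is a minimal sketch (plain JS pseudocode, not actual Gecko code) of what the content-process side could look like; `sendAsyncWrite`, `sendSyncWrite`, and the 100 KiB threshold are all assumptions, not an existing API:
```
// Sketch only: per-process "slack" accounting for async localStorage writes.
// sendAsyncWrite/sendSyncWrite are stand-ins for hypothetical IPC calls.
const SLACK_BYTES = 100 * 1024;   // assumed threshold, like Chromium's 100k
let outstandingSlackBytes = 0;

function sendAsyncWrite(key, value, onConfirmed) {
  // Fire-and-forget IPC in the real thing; the parent later confirms the
  // write and reports the cache's current quota usage.
  setTimeout(() => onConfirmed(), 0);
}

function sendSyncWrite(key, value) {
  // Blocking IPC round-trip; the parent does a precise quota check.
}

function writeItem(key, value) {
  // Conservative growth estimate: assume overwrites free nothing.
  const delta = key.length + value.length;
  if (outstandingSlackBytes + delta <= SLACK_BYTES) {
    outstandingSlackBytes += delta;
    sendAsyncWrite(key, value, () => { outstandingSlackBytes = 0; });
  } else {
    // Would exceed the slack: do a sync write and start a fresh window.
    sendSyncWrite(key, value);
    outstandingSlackBytes = 0;
  }
}
```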

## Upsides
We get the following wins out of this implementation strategy versus our current strategy:
- Reduced memory usage.  Right now, every content process that has a window/iframe loaded for an origin that uses LocalStorage carries the memory burden of that origin's entire LocalStorage data.  For sites that use localStorage and are very popular or are embedded as 3rd-party iframes (and assuming we don't double-key them), this could be a lot of memory[2].
- Globally consistent, sequential view of localStorage across all processes (unless we make any caching optimizations).  The HTML spec provides us basically infinite wiggle room when multiple processes are involved, but I know of at least one instance of LocalStorage being used by a site as a mutex between multiple windows.  There are arguably no downsides to this.
- Improved page load performance where the origin has a ton of data stored in localStorage but only needs a small amount of it near startup.  If the origin has 5 MiB stored and it uses localStorage at all, all 5 MiB needs to be loaded from disk and received by the thread responsible for storage.  And if we go synchronous (in the current implementation), that could mean a simple boolean check has to wait on 5 megs of data.  Based on our current DB strategy, the load is still unavoidable, but we avoid having to ship it over IPC or process it on the content-process main thread.

2: Note that this regressed in bug 1285898 out of necessity.  Previously, we'd drop the storage if a page didn't actually use LocalStorage within ~20 seconds.  This could be added back without the overhaul by addressing bug 1350071.

## Downsides
- Synchronous IPC is synchronous IPC, and this approach means more of it, even if it's in smaller bite-sized pieces.  Although we can mitigate by letting Quantum DOM context-switch to other TabGroups, it's not a win if we're perceptibly stalling a foreground TabGroup the user cares about.

## Potential Mitigations
- Use some kind of clever multi-process shared-memory map to make all reads except the preload non-blocking, and all writes async.  This would increase complexity and lose the globally consistent sequence.
- Create a second mode of operation for localStorage-heavy origins.  Once an activity threshold is crossed,  cache the entirety of the state in the content process and rely on the "storage" events (which must be subscribed to even if not delivered to content pages) to update the local cache.  If there's heavy write-load, consider coalescing the writes locally by tracking dirty keys and flushing on a timer, much like we do in the database thread already.
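A minimal sketch of that second mode, under assumed names (`sendBatchedWrite` and the 5-second flush interval are made up, not the real implementation): keep a full local copy, track dirty keys, and flush them in one batched async message.
```
// Sketch only: content-process cache with dirty-key coalescing.
const FLUSH_DELAY_MS = 5000;   // assumed flush interval
const cache = new Map();       // full local copy of the origin's data
const dirtyKeys = new Set();
let flushTimer = null;

function sendBatchedWrite(batch) {
  // Would be one async IPC message to the parent in the real thing.
}

function setItem(key, value) {
  cache.set(key, value);
  dirtyKeys.add(key);
  if (!flushTimer) {
    flushTimer = setTimeout(flush, FLUSH_DELAY_MS);
  }
}

function flush() {
  flushTimer = null;
  // One batched message instead of one message per setItem() call.
  const batch = [...dirtyKeys].map(k => [k, cache.get(k)]);
  dirtyKeys.clear();
  sendBatchedWrite(batch);
}
```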

## Meta: Metrics / Benchmarks / Synthetic Benchmarks

From my perspective, we don't have a good understanding of how LocalStorage is used in the wild, or a clear goal to optimize for other than to reduce the scalar telemetry LOCALDOMSTORAGE_GETVALUE_BLOCKING_MS and make the boolean telemetry LOCALDOMSTORAGE_PRELOAD_PENDING_ON_FIRST_ACCESS as false as possible.  These are mainly questions of moving the preload trigger as early as possible in the pipeline, optimizing the database queries, and reducing database fragmentation.  These are all orthogonal to whether we do the in-parent overhaul or instead massively clean up the existing implementation.  (Although our current implementation is limited in how early we can move the preload trigger because of how broadcast is implemented.)

It might be beneficial to instrument our LocalStorage implementation with MOZ_LOG statements that can generate optionally-anonymized traces of localStorage usage.  They'd log all reads/writes as `origin "${hashy(perUserRandomKey, origin)}" key "${hashy(perUserRandomKey, key)}" ${key.length} ${readOrWriteOrClear} ${value && value.length}`.  Some kind of "about:networking" "logging"-tab-style affordance could turn on the logs for helpful people, who could then provide us privately with their logs.  The understanding would be that we could reverse things if we brute-forced the per-user key/salt or did traffic analysis, but that we would not do so; this would largely avoid us accidentally learning anything private when skimming or otherwise analyzing the logs.
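For illustration, a rough sketch of what such an anonymizing helper could be, assuming an HMAC keyed by a per-user random key (only `hashy`, `perUserRandomKey`, and the log format come from the idea above; the rest is hypothetical):
```
// Sketch only: keyed, truncated hash so logs are correlatable but not
// trivially reversible without the per-user key.
async function hashy(perUserRandomKey, text) {
  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw", enc.encode(perUserRandomKey),
    { name: "HMAC", hash: "SHA-256" }, false, ["sign"]);
  const mac = await crypto.subtle.sign("HMAC", key, enc.encode(text));
  return Array.from(new Uint8Array(mac).slice(0, 8))
    .map(b => b.toString(16).padStart(2, "0")).join("");
}

async function logAccess(perUserRandomKey, origin, key, readOrWriteOrClear, value) {
  console.log(
    `origin "${await hashy(perUserRandomKey, origin)}" ` +
    `key "${await hashy(perUserRandomKey, key)}" ${key.length} ` +
    `${readOrWriteOrClear} ${value && value.length}`);
}
```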

## Analytical tools
### Per-origin via Devtools console
Paste this into a dev-tools console of any browser on any origin to get a report on what localStorage is up to:
```
(function() {
  let keyBytes = 0, valueBytes = 0;
  for (let i = 0; i < localStorage.length; i++) {
    const key = localStorage.key(i);
    keyBytes += key.length;
    valueBytes += localStorage.getItem(key).length;
  }
  console.log("localStorage has", localStorage.length, "keys and uses",
              keyBytes, "key bytes,", valueBytes, "value bytes,",
              keyBytes + valueBytes, "total bytes-ish");
})();
```
ex, my google.com in my Firefox primary profile:
localStorage has 177 keys and uses 5738 key bytes, 17683 value bytes, 23421 total bytes-ish
ex, my google.com in my Google Chrome MoCo profile:
localStorage has 168 keys and uses 3295 key bytes, 2623 value bytes, 5918 total bytes-ish

### Whole profile via browser console
Open up a browser console via Ctrl-Shift-J (you may need to enable it first), then paste this in and run it:
```
Components.utils.import("resource://gre/modules/Sqlite.jsm");
(async function() {
  const conn = await Sqlite.openConnection({ path: "webappsstore.sqlite",
                                             sharedMemoryCache: false });
  await conn.execute(
    "select sum(length(key) + length(value))/1024 AS total, originkey " +
      "from webappsstore2 GROUP BY originKey HAVING total > 1024 " +
      "ORDER BY total ASC",
    [],
    (row) => { console.log(row.getResultByIndex(0), row.getResultByIndex(1)); });
  conn.close();
  console.log("That's it.");
})()
```

Note that if you have no origins with over 1 meg, you'll just see "That's it."  You can adjust the query as needed.

### Whole profile via sqlite3 command line tool
The same query as the browser console invocation above.

Run this inside a `sqlite3 webappsstore.sqlite` session to get a rough list of localStorage usage in KiB per (reversed) origin for sites using more than a meg:
```
select sum(length(key) + length(value))/1024 AS total, originkey from webappsstore2 GROUP BY originKey HAVING total > 1024 ORDER BY total ASC;
```
or direct from the command-line in your profile directory:
```
sqlite3 webappsstore.sqlite "select sum(length(key) + length(value))/1024 AS total, originkey from webappsstore2 GROUP BY originKey HAVING total > 1024 ORDER BY total ASC;"
```
Summary: [meta] Next-generation DOMStorage implementation → [meta] Next-generation LocalStorage (DOMStorage) implementation
(Assignee)

Comment 9

8 days ago
(In reply to Andrew Sutherland [:asuth] from comment #8)

Thanks for the plan description, nicely written!
As others can see, this is not trivial at all.

I have some more comments:
1. Regarding sync/async writes: we could avoid the slack, if we decide to keep the precise quota-check strategy, by making QuotaObject::MaybeUpdateSize work on the PBackground thread (or a special LocalStorage thread dedicated to IPC in the parent process).
The idea is that in LocalStorage it's easy to calculate how much data (or at least the maximum) will be stored on disk; it's a simple key.length + value.length calculation. So we can try to synchronously pre-allocate the size (which would check the group limit and possibly free some space). Writes would then be finished asynchronously, and once we know the real size we can update the quota object again (we could also try to compress in the content process and send compressed bytes across IPC); see the sketch at the end of this comment.
This can't work well with SQLite because it's hard to estimate how much data will actually be stored on disk, so I'm also considering dropping SQLite in LocalStorage and replacing it with our own simple db system.
All we need is to write key/value pairs using snappy compression and split these pairs across multiple files (to avoid 5 MB writes when only 1 byte is changed in the worst case). The splitting can be done based on hashing the key or something like that.
I already experimented with the simple db system. (Note this is not simpledb, bug 1361330.)

2. Hopefully sync IPC reads won't be a performance problem, since they make everything much cleaner and less complicated.

3. The local storage db can contain thousands of unique domains. The current quota manager storage placement on disk is a directory per origin, and these directories must be scanned during default/temporary storage initialization.
However, we don't need to create all of them up front. We can do it on demand, when they are actually needed. Something like a lazy upgrade of the old local storage db.
This can be improved even more by keeping a zip file with compressed (unused) origin directories and unzipping them when needed.
Finally, we can also cover the case where there's plenty of space on disk, so origin directories never get evicted, but the number of origin directories still gets quite big after some time. We can use the same archive/zip file to archive unused origin directories.

Of course, we don't have to implement everything at once.
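To illustrate point 1, here is a JS-pseudocode sketch of the pre-allocate-then-settle-up flow (the `quota.reserve`/`quota.release` and `db.write` helpers are hypothetical stand-ins, not the real QuotaObject API):
```
// Sketch only: reserve the worst case up front, finish the write async,
// then give back whatever wasn't actually used on disk.
async function setItemPreallocating(quota, db, key, value) {
  // For a plain key/value store, on-disk growth is bounded by the
  // uncompressed key + value bytes; compression can only shrink it.
  const maxBytes = key.length + value.length;

  // Synchronous reservation: checks the group limit and may evict old data.
  if (!quota.reserve(maxBytes)) {
    throw new DOMException("The quota has been exceeded.", "QuotaExceededError");
  }

  // Finish the write asynchronously (possibly snappy-compressed), then
  // release the difference once the real size is known.
  const actualBytes = await db.write(key, value);
  quota.release(maxBytes - actualBytes);
}
```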
Comment 10

8 days ago
(In reply to Jan Varga [:janv] from comment #9)
> This can't work well with SQLite because it's hard to estimate how much data
> will actually be stored on disk, so I'm also considering dropping SQLite in
> LocalStorage and replacing it with our own simple db system.
> All we need is to write key/value pairs using snappy compression and split
> these pairs across multiple files (to avoid 5 MB writes when only 1 byte is
> changed in the worst case). The splitting can be done based on hashing the
> key or something like that.
> I already experimented with the simple db system. (Note this is not
> simpledb, bug 1361330.)

Can we dig into this a bit more?

I agree that given our usage pattern for localStorage, the main thing SQLite is doing is bounding I/O for small writes, as you say.  But I don't view the SQLite disk overhead as particularly concerning[1].  At least not worth the maintenance overhead of implementing a persistence layer that has to juggle multiple files.

That's not to say there isn't a gap in our set of persistence solutions, especially from C++ code.  We basically have SQLite and the ServiceWorkerRegistrar-style "sketchily read and write a text-file that's a downgrade nightmare" strategies.  JS is in better shape because it has easy JSON flat-file serialization and IndexedDB.

My thoughts in this area were that now that we can write Rust code, we can use its "serde" serialization/deserialization library to allow our platform code to easily do JSON-style serialization.  And it seems conceivable that there is some middle ground between flat-file and SQLite, but I think we want there to be an existing open source project we build around that's aligned with our use-case.  Or at least, it should be more than just you taking on an embedded database engine as part of your day job :)

Note that MoCo is a member of the SQLite consortium (https://sqlite.org/consortium.html) which means we actually can get a lot of their time; we're already paying for it, we're just not using it.  I'm sure they'd be happy to advise us on how to optimize for the localStorage case.  And it's worth pointing out that SQLite has a lot of extensibility.  For example, there's the CSV virtual table at https://sqlite.org/csv.html, plus, as you know, there's the VFS layer.  And as a virtual table, we'd get to use all of our existing mozStorage bindings, any rust bindings we adopt, etc.

1: The localStorage 5meg limit is in many ways orthogonal to the QuotaManager limit.  It's a policy decision from the pre-QuotaManager era that now is more about bounding memory use from LocalStorage-sites than it's about disk space.  To avoid breaking on sites that depend on the limit as we previously implemented it, we need to give them that space as we originally measured it, roughly speaking.  QuotaManager does need to deal in real-world disk usage, but there's no reason it has to insist on the localStorage on-disk usage being exactly 5 megabytes.  It can give it 10.  It should enforce our more generous upper limits, which we may need to boost slightly with the localStorage move to QM, but those are more abstract as far as the various origins are concerned.
Flags: needinfo?(jvarga)
(Assignee)

Comment 11

8 days ago
I haven't made any decision regarding the database system for LocalStorage, I just experimented with my own :)
I'm all for an existing open source system that fits our needs.
Comment 12

2¢ on this: adding another database tool of almost any kind is a non-trivial installer size bump, and that really matters on mobile, particularly if it's a tool that's only used for one specific case.

That means I tend to see fewer middle grounds between flat files (which have lots of problems) and SQLite (which solves lots of problems) -- the only acceptable middle grounds are those that have a future across a number of components and kill a bunch of other code, because we don't want to pay a few MB of library size just for one feature.

I say that as someone who very much thinks adding a suitable database tool to the codebase is a good idea!
(Assignee)

Comment 13

7 days ago
If we go the SQLite route, here's a proposed db schema:

CREATE TABLE database (
  origin TEXT NOT NULL PRIMARY KEY,
  last_vacuum_time INTEGER NOT NULL DEFAULT 0,
  last_analyze_time INTEGER NOT NULL DEFAULT 0,
  last_vacuum_size INTEGER NOT NULL DEFAULT 0
) WITHOUT ROWID;

CREATE TABLE data (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL,
  keyCompressed INTEGER NOT NULL,
  valueCompressed INTEGER NOT NULL,
  lastAccessTime INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX data_key ON data(key);

Features (some of them just for consideration):
- WAL is probably not needed for such small databases
- queue changes coming from content processes and apply them to SQLite periodically or when the size gets over a threshold
- after a database update, check if it's worth vacuuming the database (or vacuum only when we get the idle notification)
- compress keys and values if the length is greater than a threshold; if the compressed data isn't significantly smaller, keep the key/value uncompressed (see the sketch below this list)
- compression/decompression would happen in content processes
- pre-loaded data could stay compressed, to send fewer bytes across IPC either for normal operations or for distributing the change events
- every key/value pair would have a last access time; if such a pair is not used for a long time, it doesn't have to be preloaded the next time we preload data for the entire origin
- lastAccessTime may also serve for clearing session data based on time constraints?
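As a small illustration of the compression rule above (all names and thresholds here are assumptions; `compressSnappy` stands in for whatever compressor the content process would use):
```
// Sketch only: compress a key/value string only when it's long enough and
// the compressed form is meaningfully smaller.
const COMPRESS_THRESHOLD = 1024;   // assumed minimum length worth compressing
const MIN_RATIO = 0.8;             // assumed: keep compressed only if <= 80%

function maybeCompress(text, compressSnappy) {
  if (text.length < COMPRESS_THRESHOLD) {
    return { data: text, compressed: false };
  }
  const packed = compressSnappy(text);
  if (packed.length > text.length * MIN_RATIO) {
    // Not significantly smaller; keep it uncompressed so reads don't pay
    // a pointless decompression cost.
    return { data: text, compressed: false };
  }
  return { data: packed, compressed: true };
}
```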

It would be nice if we could calculate the size needed by SQLite on disk for a given key/value pair in the worst case (no free blocks in the database).
That way we could just synchronously pre-allocate space using QuotaManager before the setItem() call finishes.
The pre-allocated space would then be freed once the database update is finished.
Flags: needinfo?(jvarga)