Use the size of structured clone data to choose whether it will be stored in the database or in the filesystem

RESOLVED FIXED in Firefox 52

Status

()

defect
RESOLVED FIXED
6 years ago
3 years ago

People

(Reporter: bent.mozilla, Assigned: janv)

Tracking

Trunk
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 fixed)

Details

(Whiteboard: [games])

Attachments

(11 attachments, 8 obsolete attachments)

14.37 KB, patch
asuth
: review+
Details | Diff | Splinter Review
5.88 KB, patch
asuth
: review+
Details | Diff | Splinter Review
4.25 KB, patch
asuth
: review+
Details | Diff | Splinter Review
4.36 KB, patch
asuth
: review+
Details | Diff | Splinter Review
5.31 KB, patch
asuth
: review+
Details | Diff | Splinter Review
20.26 KB, patch
asuth
: review+
Details | Diff | Splinter Review
7.69 KB, patch
asuth
: review+
Details | Diff | Splinter Review
7.55 KB, patch
asuth
: review+
Details | Diff | Splinter Review
9.88 KB, patch
asuth
: review+
Details | Diff | Splinter Review
2.39 KB, patch
mak
: review+
Details | Diff | Splinter Review
1.13 KB, patch
asuth
: review+
Details | Diff | Splinter Review
Currently all structured clone data (including ArrayBuffer contents) is stored in the database while all Blob/File objects are always stored as actual files in the filesystem. Neither decision takes the data size into account so in the worst case we can have very large values in the database and many tiny files in the filesystem. We should be smarter about this and consistently store small values in the database and larger values in the filesystem.

The size threshold will probably need to be tweakable via pref so that mobile/desktop can have different values.
In particular, this would greatly help in avoiding the general problem described in bug 933398 comment 0.  We'd still have the 1 big copy from the ArrayBuffer (holding the VFS) to the structured clone buffer, but at least we'd avoid the subsequent copy when compressing and again later when storing in sqlite.  (To avoid the structured clone buffer copy, we discussed an orthogonal optimization.)
Blocks: 933398
Whiteboard: [games]
I did a little investigation of Epic's Citadel demo.

<profile>/storage/persistenct/http+++www.unrealengine.com/idb/1873003896EEMH_CPARCE_LDOA
- empty directory (no stored files obviously)

sqlite3 <profile>/storage/persistenct/http+++www.unrealengine.com/idb/1873003896EEMH_CPARCE_LDOA.sqlite
sqlite> pragma page_size;
32768

sqlite> select length(data) from object_data;
102690803
105

so, just 2 rows, one of them almost 100MB (compressed)

There are some performance numbers about whether to store blobs in the database or a filesystem.
It seems the limit is 100k.

We can make this number dependent on the platform, and maybe even configurable from JS, e.g.
var request = indexedDB.open("myDb", 1, { maxInternalBlobSize: 102400 });
We need this to avoid creating an extra buffer in IndexedDB implementation.
Assignee: nobody → Jan.Varga
Attachment #8366754 - Flags: review?(luke)
Comment on attachment 8366754 [details] [diff] [review]
Add JSAutoStructuredCloneBuffer::Copy() which takes a PRFileDesc

Review of attachment 8366754 [details] [diff] [review]:
-----------------------------------------------------------------

Thus far we've tried to keep our dependency on NSPR quite limited.  Do you suppose you could change the API to take an 'int fd' instead so we can just use plain posix freads instead?  MSDN suggests this will just work on MSVC.
ah, ok, I'll rework it using plain posix freads.
Posted patch patch v2 (obsolete) — Splinter Review
Attachment #8366754 - Attachment is obsolete: true
Attachment #8366754 - Flags: review?(luke)
Attachment #8366857 - Flags: review?(luke)
Comment on attachment 8366857 [details] [diff] [review]
patch v2

Review of attachment 8366857 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/StructuredClone.cpp
@@ +44,5 @@
>  
>  #include "jscntxtinlines.h"
>  #include "jsobjinlines.h"
>  
> +#define FILE_COPY_BUFFER_SIZE 32768

Could you move this to a static const unsigned inside JSAutoStructuredCloneBuffer::copy?

@@ +1688,5 @@
> +bool
> +JSAutoStructuredCloneBuffer::copy(FILE *fd, size_t size, uint32_t version)
> +{
> +    uint64_t *newData = static_cast<uint64_t *>(js_malloc(size));
> +    if (!newData)

Could you use a ScopedJSFreePtr<uint64_t>?

@@ +1707,5 @@
> +        }
> +
> +        js_memcpy(pos, copyBuffer, bytesRead);
> +        pos += bytesRead;
> +    } while (true);

Could you add a JS_ASSERT((char*)newData + size == (char*)pos) at the end of the loop?
Attachment #8366857 - Flags: review?(luke) → review+
Depends on: 965084
I think storing Blobs in the database if they are small enough makes a lot of sense.

However I don't think we should be in a hurry to add the complexity involved in storing large structured clone data outside of the database until we have data showing that there's significant performance improvements.

Storing large data in such that it doesn't get in the way of performance of database lookups seems like the job of SQLite. Do we know what it does as the data gets large?

Storing values of the size of 100MB is never going to be fast. No matter where the data is stored we're going to have to read 100MB into memory. This is hardly a common case.

The fact that the games code is doing it right now is hopefully temporary. As I understand it it's a workaround since they need synchronous access to a filesystem. So they keep the whole filesystem in memory. Once we have something like sync IDB or the sync message channel, that will be solved in much better ways by only loading the data that's needed into memory.
Well, it seems that collapsing of small Blobs is harder than storing structured clones outside of the database.
I have the latter mostly implemented.

I agree that long term, they need real sync access to a filesystem, but in order to do that they also need WebGL on workers (that's what Luke told me), so it will take some time. For now, we could fix the crashes at least, by lowering number of allocated buffers and storing large structured clones externally.
(In reply to Jonas Sicking (:sicking) from comment #8)
> Storing large data in such that it doesn't get in the way of performance of
> database lookups seems like the job of SQLite. Do we know what it does as
> the data gets large?

Copies happen even before it gets into the hands of SQLite.  It's copies all the way down :)

> Storing values of the size of 100MB is never going to be fast. No matter
> where the data is stored we're going to have to read 100MB into memory. This
> is hardly a common case.

It doesn't need to be fast, it needs to avoid keeping multiple full-size copies of the structured clone buffer in memory simultaneously.  It seems like this would be really important for FFOS as well where the memory spike from much smaller payloads would unacceptable.  Gregor was telling me yesterday he has seen this be a problem.

> The fact that the games code is doing it right now is hopefully temporary.

Do you expect that *everything* a game needs will be available in Web Workers in less than a year?
btw, https://www.sqlite.org/intern-v-extern-blob.html, that's the page where I found the 100k boundary.
Attachment #8366857 - Attachment is obsolete: true
Attachment #8367607 - Flags: review?(luke)
Attachment #8367607 - Flags: review?(luke) → review+
I posted all the patches on try:
https://tbpl.mozilla.org/?tree=Try&rev=43244dabf359

and also with enforced external storage of structured clones:
https://tbpl.mozilla.org/?tree=Try&rev=2ec7c5676cf4

I will need some help here to do some better testing on mobile before landing.
Depends on: 965982
try run with default "max internal blob size" (100 KB on android, 1 MB on other platforms):
https://tbpl.mozilla.org/?tree=Try&rev=a449f823796e

try run with "max internal blob size" set to 0 (every structured clone stored externally):
https://tbpl.mozilla.org/?tree=Try&rev=0bc8c6d7c317

The bc failures are caused by mochitest timeouts, it's interesting that especially windows 7 and windows 8 is very slow when storing a lot of tiny files.
Posted patch patch v1 (obsolete) — Splinter Review
ok, this is the main patch
Attachment #8368504 - Flags: review?(bent.mozilla)
Depends on: 966233
Random comment on HN complaining about the specific issue this bug solves:
https://news.ycombinator.com/item?id=7068171
Ok, so who is going to make the final decision whether to take this patch or not ?
It's unclear to me where this patch is attempting to prevent copies? I.e. what copies that the current code does is this patch attempting to eliminate? The fact that we're writing to sqlite rather than a separate file doesn't seem to be what creates more copies.

Also, do we really need a new column here? Couldn't we simply write the file-id in the existing file-ids column and leave the data empty to indicate that the data lives in a file rather than the database?
But to more concretely answer your question, I think it's Ben's decision what to do here.
(In reply to Jonas Sicking (:sicking) from comment #18)
> It's unclear to me where this patch is attempting to prevent copies? I.e.
> what copies that the current code does is this patch attempting to
> eliminate? The fact that we're writing to sqlite rather than a separate file
> doesn't seem to be what creates more copies.

I think mozStorage or sqlite creates a copy, but there's also the compression which is done before writing to sqlite, so it seems that this patch eliminates 2 copies.

> Also, do we really need a new column here? Couldn't we simply write the
> file-id in the existing file-ids column and leave the data empty to indicate
> that the data lives in a file rather than the database?

Yes, that might be possible, good catch.
> I think mozStorage or sqlite creates a copy,

Ben thought otherwise, but I'll leave it to him.

> but there's also the compression which is done before writing to sqlite,

Compression seems orthogonal to where we store things. If want to eliminate the copy that compression creates, we can do that while still writing into the database, no?
(In reply to Jonas Sicking (:sicking) from comment #21)
> > I think mozStorage or sqlite creates a copy,
> 
> Ben thought otherwise, but I'll leave it to him.

We have this comment in IDBObjectStore.cpp:
// This will hold our compressed data until the end of the method. The
// BindBlobByName function will copy it.

but it seems it's not correct, hm

> 
> > but there's also the compression which is done before writing to sqlite,
> 
> Compression seems orthogonal to where we store things. If want to eliminate
> the copy that compression creates, we can do that while still writing into
> the database, no?

Well, that's possible, but snappy doesn't support compression in chunks :(
and switching to some other library doesn't sound like a good solution.

Anyway, I still think that we should support storing big structured clones in separate files. We can set the limit high, like several megabytes. There always be people who store big things in the database and then complain about performance.
I don't know all the internals of sqlite, but I have a feeling that even when we eliminate the extra copy for compression, it's better to not store huge blobs in the database.

I think we should also ask someone from sqlite devs.
Flags: needinfo?(drh)
Storing things outside of the database for performance reasons seems like a good idea. It just sounded like we were storing it outside of the database in order to eliminate copies which didn't seem accurate. We can choose to store both compressed and uncompressed data both inside and outside the database.
(In reply to Jan Varga [:janv] from comment #23)
> I don't know all the internals of sqlite, but I have a feeling that even
> when we eliminate the extra copy for compression, it's better to not store
> huge blobs in the database.
> 
> I think we should also ask someone from sqlite devs.

For "smaller" blobs, it is faster to store them in the SQLite database.  SQLite is slower to write content to disk (because of the extra formatting it has to do, and because of transaction control) but storing in the database eliminates an open()/close() operation, which can be expensive.

The cutoff for what constitutes a "small" blob varies by OS and hardware.  For tests we ran on Linux, the cutoff point was about 100KB.  In other words, for blobs smaller than about 100KB, it is faster to store them as fields in an SQLite database than it is to write them into separate files.  See http://www.sqlite.org/intern-v-extern-blob.html for details.

Note that SQLite rewrites an entire row whenever you change any column in that row.  So if you do put massive blobs in a database, it is best to put them in their own table that assigned each blob an integer tracking number:

    CREATE TABLE somekindablob(id INTEGER PRIMARY KEY, content BLOB);

Then use the integer tracking number in the table that contains all the other properties of the blob.  Or if you really, really must put the large blob in the same table with other fields, at least put the large blob at the very end, so that the other fields can be read without having to seek past the entire blob.
Flags: needinfo?(drh)
(In reply to D. Richard Hipp from comment #25)
> Or if you really, really must put the large
> blob in the same table with other fields, at least put the large blob at the
> very end, so that the other fields can be read without having to seek past
> the entire blob.

bent already did this optimization
1. I think we should pick a threshold, maybe 50k based on the chart drh provided in comment 25.
2. We should ask snappy how much space it needs to store our blob or structured clone data.
3. Anything smaller than the threshold should be compressed and stored in the database. I don't think we need to change our schema at all for this.
4. Anything larger should be stored outside the database.
5. I think it makes sense to store blobs uncompressed (they're likely in a format that is already compressed anyway).
6. Structured clone data maybe should always be compressed since our format is not very efficient (lots of 0s last I looked). However, this requires increased memory to give snappy a buffer to work with. We could try tricks like stitching together chunked snappy buffers but we'd need to see if that costs more performance than the speedup we get from writing less data.
Comment on attachment 8368504 [details] [diff] [review]
patch v1

Canceling review until we get a new patch that avoids the extra column.
Attachment #8368504 - Flags: review?(bent.mozilla)
When I force to store all structured clones externally all tests pass except browser_quotaPromptDelete.js and browser_quotaPromptDeny.js:
https://tbpl.mozilla.org/?tree=Try&rev=1daa655e88d4
It's not a big deal, this is just an edge case that shouldn't happen during normal browser-chrome-mochitest runs, but it would be better to fix it.
However, it seems that I need to fix bug 856921 first.
Depends on: 856921
Posted patch Part 3: Fix tests (obsolete) — Splinter Review
Attachment #8400667 - Flags: review?(bent.mozilla)
Replaced custom FileAutoCloser with ScopedCloseFile from FileUtils.h
Attachment #8401073 - Flags: review?(bent.mozilla)
Attachment #8400666 - Attachment is obsolete: true
Attachment #8400666 - Flags: review?(bent.mozilla)
When the structured clone data is stored as a file, will it also be compressed?
(In reply to Luke Wagner [:luke] from comment #36)
> When the structured clone data is stored as a file, will it also be
> compressed?

The current patch doesn't compress it to avoid additional memory copy since snappy doesn't support streaming at the moment. We could do the trick Ben mentioned, but that can be done later.
Hm, this reminds me that I wanted to create a flag for this, so it would be easy to add such support in future, w/o converting/compressing all stored files. Thanks for reminding me :)
Attachment #8401073 - Flags: review?(bent.mozilla)
I didn't look too closely at the patch, but do make sure that we properly handle things like

oS.put({ a: reallyHugeArrayBuffer, b: blob1, c: blob2 });

Also make sure to test this of course.
(In reply to Jonas Sicking (:sicking) from comment #38)
> I didn't look too closely at the patch, but do make sure that we properly
> handle things like
> 
> oS.put({ a: reallyHugeArrayBuffer, b: blob1, c: blob2 });
> 
> Also make sure to test this of course.

That should work correctly, but I'll add a test for it.
I canceled the review because I forgot to add a flag for the compression of externally stored structured clone data.
I'll attach the test separately.
Jan, is this urgent or can we get to this after Jamun is done?  Thanks!
Flags: needinfo?(Jan.Varga)
Also maybe a question for luke ^
Flags: needinfo?(luke)
I don't know of any super-pressing needs for this feature, just that we've hit this (comment 1, and again more recently) and had to tell people to change the code to store incrementally in smaller ABs.
Flags: needinfo?(luke)
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #42)
> Jan, is this urgent or can we get to this after Jamun is done?  Thanks!

Yeah, we'll get to it after Jamun is done.
Flags: needinfo?(Jan.Varga)
Comment on attachment 8400667 [details] [diff] [review]
Part 3: Fix tests

Clearing review request for now.
Attachment #8400667 - Flags: review?(bent.mozilla)
Attachment #8406806 - Flags: review?(bent.mozilla)
I'm going to try to resurrect this patch.
IDB code should be pretty stable now, the only other relevant big change pending is locale-aware indexes (bug 871846).
I have a patch queue for this:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfb8873f8462600cd2bdcb6f1240fe15144dfcd1

It's a prerequisite for WASM in IDB.

This version also supports compression of externally stored structured clone data w/o
creating flat buffers.
So there are no big buffer copies. I created an input stream wrapper for structured clone data
and compression is done using the snappy compression output stream.

I also upgraded the snappy library to the latest version which gives better compression ratio
and is also slightly faster.
Blocks: 1276029
Depends on: 1272855, 768074
Attachment #8367607 - Attachment is obsolete: true
Attachment #8400667 - Attachment is obsolete: true
Attachment #8406806 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Blocks: 1311466
No longer blocks: 1276029
Attachment #8802726 - Flags: review?(bugmail)
Attachment #8802729 - Attachment description: Part 8: Disable externally stored structured clone dat a in file tests; r=asuth → Part 8: Disable externally stored structured clone data in file tests; r=asuth
Attachment #8802730 - Attachment description: Part 9: Add a new test for externally stored structure d clone data; r=asuth → Part 9: Add a new test for externally stored structured clone data; r=asuth
Comment on attachment 8802725 [details] [diff] [review]
Part 4: Update keys directly in the structured clone buffer; r=asuth

Review of attachment 8802725 [details] [diff] [review]:
-----------------------------------------------------------------

This is needed because when we compress externally stored structured clone data we don't need to create one big flat buffer. We read from structured clone buffers directly, so the key must be updated there. The writing is a bit complicated since we can cross buffers after each char of the key in theory.
Comment on attachment 8802726 [details] [diff] [review]
Part 5: Add a data threshold pref; r=asuth

Review of attachment 8802726 [details] [diff] [review]:
-----------------------------------------------------------------

I decided to have 1 MB threshold for now. We can lower it once we are confident there are no bugs related to this new feature.
Comment on attachment 8802727 [details] [diff] [review]
Part 6: Core changes for storing structured clone data outside of the database; r=asuth

Review of attachment 8802727 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +25726,5 @@
> +  if (mDataOverThreshold) {
> +    // Higher 32 bits reserved for flags like compressed/uncompressed. For now,
> +    // we don't compress externally stored structured clone data since snappy
> +    // doesn't support streaming yet and we don't want to allocate another
> +    // large memory buffer for that.

This comment is updated in following patch for compression.
Comment on attachment 8802729 [details] [diff] [review]
Part 8: Disable externally stored structured clone data in file tests; r=asuth

Review of attachment 8802729 [details] [diff] [review]:
-----------------------------------------------------------------

We need this just in case someone runs all the tests with the threshold set to 0.
In that case all structured clones need to allocate a file id.
However, file tests have static assumptions about blob file ids, so file tests don't pass.
It would be really hard and messy to fix that.
Comment on attachment 8802722 [details] [diff] [review]
Part 1: Use a type in StructuredCloneFile instead of a boolean; r=asuth

Review of attachment 8802722 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +8162,5 @@
>  };
>  
>  struct ObjectStoreAddOrPutRequestOp::StoredFileInfo final
>  {
> +  enum Type

note to other readers: Happily this duplicated enum type is removed in bug 1311466's part 1 patch.
Attachment #8802722 - Flags: review?(bugmail) → review+
Attachment #8802723 - Flags: review?(bugmail) → review+
Attachment #8802724 - Flags: review?(bugmail) → review+
Attachment #8802725 - Flags: review?(bugmail) → review+
Comment on attachment 8802726 [details] [diff] [review]
Part 5: Add a data threshold pref; r=asuth

Review of attachment 8802726 [details] [diff] [review]:
-----------------------------------------------------------------

Looking forward to having the threshold lowered!  I expect the target would want to be something in the neighborhood of PAGE_SIZE/20 (per https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid) since object_data is WITHOUT ROWID.  Of course, depending on per-file overhead, it may make sense to adjust the value upwards and/or consider introducing an additional file type for sizes from PAGE/20 up to 4*PAGE_SIZE where we use a dedicated blob table in the database.  Until/unless we have a blob pool database, a value around PAGE_SIZE might be least bad.

Specifically, when referring to per-file overhead, I mean:
- file system overhead for opening a new file, especially:
  - if we continue to use a flat directory structure and run the risk of file-systems not being happy about us accumulating tens of thousands of files in a single directory and encountering scale problems there.
  - if we find that antivirus software is increasing latencies.  We definitely would want to benchmark with a few AV solutions.
- disk waste/storage bloat if the file-system requires that at least one disk sector is used per file, etc.

::: dom/indexedDB/IndexedDatabaseManager.cpp
@@ +398,5 @@
>  #endif
>    Preferences::RegisterCallbackAndCall(LoggingModePrefChangedCallback,
>                                         kPrefLoggingEnabled);
>  
> +  Preferences::RegisterCallbackAndCall(DataThresholdPrefChangedCallback,

note for other readers: I had a question about using the callback idiom versus adding more atomic-capable cache variants like AddAtomicUintVarCache but part 8 adds logic that upgrades -1 to INT32_MAX which seems like an appropriate use for continuing the existing pattern.  (The AtomicBoolPrefChangedCallback could use a new atomic *Cache variant if the atomic bit is really necessary, but the churn doesn't seem worth it anytime soon.)
Attachment #8802726 - Flags: review?(bugmail) → review+
Comment on attachment 8802727 [details] [diff] [review]
Part 6: Core changes for storing structured clone data outside of the database; r=asuth

Review of attachment 8802727 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +4118,5 @@
>  nsresult
> +UpgradeSchemaFrom24_0To25_0(mozIStorageConnection* aConnection)
> +{
> +  // The only change between 24 and 25 was a different file_ids format,
> +  // but it's backwards-compatible.

Note for other readers: The "data" column is created with BLOB type affinity which means no conversion/coercion is performed by SQLite, so there is no schema change to worry about.

@@ +9586,5 @@
>    int32_t id;
>    StructuredCloneFile::Type type;
>  
> +  bool isDot = false;
> +  if ((isDot = aText.First() == '.') ||

Note for other readers: This lint-nit gets mooted in bug 1311466's part 6.

@@ +25748,3 @@
>      size_t compressedLength = snappy::MaxCompressedLength(uncompressedLength);
>  
> +    // We don't have a smart pointer class that calls free, so we need to

UniquePtr supports destructor policies and UniquePtrExtensions.h defines a FreePolicy that can be used.  I don't think this comment/code got mooted in subsequent patches, but I could be wrong.

::: dom/indexedDB/PBackgroundIDBSharedTypes.ipdlh
@@ +53,5 @@
>  };
>  
>  union BlobOrMutableFile
>  {
> +  null_t;

Note for other readers: The ambiguity with NullableMutableFile is eliminated in bug 1311466's Part 2 which eliminates NullableMutableFile by extracting PBackgroundMutableFile up into the BlobOrMutableFile union.
Attachment #8802727 - Flags: review?(bugmail) → review+
Attachment #8802728 - Flags: review?(bugmail) → review+
Comment on attachment 8802729 [details] [diff] [review]
Part 8: Disable externally stored structured clone data in file tests; r=asuth

Review of attachment 8802729 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/IndexedDatabaseManager.cpp
@@ +262,2 @@
>      Preferences::GetInt(aPrefName, kDefaultDataThresholdBytes);
> +  if (dataThresholdBytes == -1) {

This could benefit from a simple comment like your comment 63.  Ex:
// The magic -1 is for use only by tests that depend on stable blob file id's.
Attachment #8802729 - Flags: review?(bugmail) → review+
Attachment #8802730 - Flags: review?(bugmail) → review+
(In reply to Andrew Sutherland [:asuth] from comment #65)
> Comment on attachment 8802726 [details] [diff] [review]
> Part 5: Add a data threshold pref; r=asuth
> 
> Review of attachment 8802726 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looking forward to having the threshold lowered!  I expect the target would
> want to be something in the neighborhood of PAGE_SIZE/20 (per
> https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid) since
> object_data is WITHOUT ROWID.  Of course, depending on per-file overhead, it
> may make sense to adjust the value upwards and/or consider introducing an
> additional file type for sizes from PAGE/20 up to 4*PAGE_SIZE where we use a
> dedicated blob table in the database.  Until/unless we have a blob pool
> database, a value around PAGE_SIZE might be least bad.
> 
> Specifically, when referring to per-file overhead, I mean:
> - file system overhead for opening a new file, especially:
>   - if we continue to use a flat directory structure and run the risk of
> file-systems not being happy about us accumulating tens of thousands of
> files in a single directory and encountering scale problems there.
>   - if we find that antivirus software is increasing latencies.  We
> definitely would want to benchmark with a few AV solutions.
> - disk waste/storage bloat if the file-system requires that at least one
> disk sector is used per file, etc.

Yeah, these are excellent points. However implementation of a blob pool is a complex task. Changing the flat directory structure doesn't look so complex and could be done sooner. But as you said, many small files can slow down things significantly, we should be careful about lowering the threshold.
Anyway, I think we will be in much better shape with the 1MB limit, since it removes the need for big flat buffers and the SQLite database won't be stressed so much.
I think these methods are expected to free the value even when an error occurs. The AdoptedBlobVariant does the job, but NS_ENSURE_ARG_MAX can trigger an early return (before AdoptedBlobVariant is created).
Attachment #8803847 - Flags: review?(mak77)
(In reply to Andrew Sutherland [:asuth] from comment #66)
> @@ +25748,3 @@
> >      size_t compressedLength = snappy::MaxCompressedLength(uncompressedLength);
> >  
> > +    // We don't have a smart pointer class that calls free, so we need to
> 
> UniquePtr supports destructor policies and UniquePtrExtensions.h defines a
> FreePolicy that can be used.  I don't think this comment/code got mooted in
> subsequent patches, but I could be wrong.

Ok, fixed. I also found a possible leak in mozStorage code. See patch Part 10.
(In reply to Andrew Sutherland [:asuth] from comment #67)
> @@ +262,2 @@
> >      Preferences::GetInt(aPrefName, kDefaultDataThresholdBytes);
> > +  if (dataThresholdBytes == -1) {
> 
> This could benefit from a simple comment like your comment 63.  Ex:
> // The magic -1 is for use only by tests that depend on stable blob file
> id's.

Ok, added.
Comment on attachment 8803847 [details] [diff] [review]
Part 10: Fix a possible leak in BindAdoptedBlobByName()/BindAdoptedBlobByIndex(); r=mak

Review of attachment 8803847 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. fwiw, Andrew can review any change to Storage as well (and I honestly think you should act as a peer as well!)
Attachment #8803847 - Flags: review?(mak77) → review+
(In reply to Marco Bonardo [::mak] from comment #72)
> Comment on attachment 8803847 [details] [diff] [review]
> Part 10: Fix a possible leak in
> BindAdoptedBlobByName()/BindAdoptedBlobByIndex(); r=mak
> 
> Review of attachment 8803847 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM. fwiw, Andrew can review any change to Storage as well (and I honestly
> think you should act as a peer as well!)

Oh, ok, will do next time :)
Ok, this is ready, but I want to wait for bug 1311466 and land them together.
The other bug extends the format of file_ids column too. We don't need to add another schema upgrade of IDB databases if we land them together.
Heads-up: Bug 1302036 just landed and increased the IDB schema version, I updated appropriate patch for this bug to use the same schema since both land at the same day. If there are any backouts I'll fix things to get it right.
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/12213ad49dc0
Part 1: Use a type in StructuredCloneFile instead of a boolean; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2166a89ee40
Part 2: Refactor file ids handling for better expandability; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1fcb8001d2a
Part 3: Implement an input stream wrapper around structured clone data; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/97e29df357f7
Part 4: Update keys directly in the structured clone buffer; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/94f3aca1fe73
Part 5: Add a data threshold pref; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/34a3a5034620
Part 6: Core changes for storing structured clone data outside of the database; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/34b5720fa068
Part 7: Compress externally stored structured clone data; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/9113736a76ce
Part 8: Disable externally stored structured clone data in file tests; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e2c6afa8a5e
Part 9: Add a new test for externally stored structured clone data; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1c649956724
Part 10: Fix a possible leak in BindAdoptedBlobByName()/BindAdoptedBlobByIndex(); r=mak
Attachment #8804440 - Flags: review?(bugmail) → review+
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/97d5eb039b32
A follow-up fix. Pass correct buffer size to CopyFileData(); r=asuth
Duplicate of this bug: 933398
You need to log in before you can comment on or make changes to this bug.