Bug 1168152 (Closed) - Opened 5 years ago, Closed 5 years ago

Cache API should use smaller sqlite page size

Categories

(Core :: DOM: Core & HTML, defect)

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla41

Tracking Status

firefox41 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Attachments

(4 files, 3 obsolete files)

The Cache API is currently using the mozStorage default page size of 32KB.  This has a number of downsides:

1) It penalizes write performance.  For example, look at the data in bent's benchmarking spreadsheet [1].
2) It greatly bloats the database when there is not a lot of data stored.

For example, right now if you create an empty Cache by doing the following in the console on some page:

  caches.open('foo')
  caches.delete('foo')
  /* close page and re-open it to flush the connection */

You will see that the origin is using 416KB.  This is because each table and index gets assigned an initial page.  There are currently 10 tables and 3 indices; 32KB * (10 + 3) = 416KB.

The main downside to reducing the page size is that it makes the database more prone to fragmentation.  I plan to mitigate this by setting a larger growth increment (allocating multiple pages at once) and only freeing pages back to the OS when there is an excessive amount of free space.

I plan to use a 4096-byte page size for now.  This matches the sweet spot in bent's data and also corresponds to the most typical file system block size.
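
Concretely, the connection setup would end up looking something like this (a rough sketch only; the exact pragma wiring and growth constant are still to be determined):

  // Sketch of the DBSchema.cpp-style connection setup.
  // page_size only takes effect if it runs before the schema is created.
  nsresult rv = aConn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA page_size = 4096;"));
  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }

  // Grow the file many pages at a time to limit fragmentation.
  // kGrowthSize is illustrative here (e.g. 32 * 1024 or 64 * 1024).
  rv = aConn->SetGrowthIncrement(kGrowthSize, EmptyCString());
  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }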

I plan to use the time required to run the wpt cache-match.https.html test as a performance measure for now.  It's the most sensitive read+write test we have, and it was running so slowly on Windows in automation that it was failing the 1 minute timeouts.  It should show any excessive problems with the page size (and I believe it will show a speedup).

[1] https://docs.google.com/spreadsheets/d/1KbcxgvkCeeuFArrnyA6qrvcuA_HWjWT-X5rn_nNJbIk/edit#gid=1750276979
Summary: Cache API should use smaller sqlite block size → Cache API should use smaller sqlite page size
This reduces the page size to 4KB and sets a growth increment of 64KB.  It does not yet adjust our incremental_vacuum code to avoid freeing the extra pages in our growth chunk.  I will do that in a second patch.

With just this, though, I see an empty Cache take 52KB instead of 416KB.

In addition, the cache-match.https.html test on my Windows machine improved from ~2.5 seconds to ~1.5 seconds.  That's about a 40% improvement.  That will be really nice if it translates to the times on the Windows machines in automation.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c7df455039f
Attachment #8610209 - Flags: review?(ehsan)
Comment on attachment 8610209 [details] [diff] [review]
P1 Use a smaller sqlite page size and a growth increment in Cache. r=ehsan

Review of attachment 8610209 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/cache/DBSchema.cpp
@@ +413,5 @@
>    nsresult rv = aConn->ExecuteSimpleSQL(pragmas);
>    if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
>  
> +  // Limit fragmentation by growing the database by many pages at once.
> +  rv = aConn->SetGrowthIncrement(kGrowthSize, NS_LITERAL_CSTRING(""));

Nit: EmptyCString()
Attachment #8610209 - Flags: review?(ehsan) → review+
I will need to tweak this patch a bit.  There is a bug in the growth constant, and after thinking about it I might want to use 32KB for the growth increment.

Let me know if you want re-review for these.

The good news, though, is that try shows that the Windows cache-match.https.html run time dropped from ~28 seconds to ~17 seconds.  That's pretty much the same 40% improvement I measured locally.
See Also: → 1168682
This changed a little bit.  It's now a 32KB growth chunk instead of 64KB.
Attachment #8610209 - Attachment is obsolete: true
Attachment #8610958 - Flags: review?(ehsan)
This is preparation for the next patch.  It introduces a mozIStorageConnection wrapper so we can hook the close event.  I want to trigger the incremental vacuum on close instead of doing it on every open.  This is possible now that we share the connection for the vast majority of Action objects.

Note, any Actions that don't get a shared connection won't use the wrapper.
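
Roughly, the wrapper looks like this (a simplified sketch, not the full dom/cache/Connection.h; the real class has to implement the entire mozIStorageConnection interface):

  // Sketch: forward everything to the real connection, but hook Close()
  // so cleanup can run when the shared connection is finally released.
  class Connection final : public mozIStorageConnection
  {
  public:
    explicit Connection(mozIStorageConnection* aBase);

    // Overridden to do extra work (e.g. incremental vacuum) before closing.
    NS_IMETHOD Close() override;

    // Every other mozIStorageConnection method simply forwards to mBase
    // (omitted here for brevity).

  private:
    ~Connection();

    nsCOMPtr<mozIStorageConnection> mBase;

    NS_DECL_THREADSAFE_ISUPPORTS
  };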
Attachment #8610960 - Flags: review?(ehsan)
Move our incremental vacuum from connection open to connection close.

While testing this I also found that our auto_vacuum was not working any more.  It seems that it cannot be enabled directly on a fresh database created by mozStorage.  We have to do a VACUUM on the empty database to enable it.  I wrote bug 1168682 about this.  For now, though, we just do the VACUUM because it should be cheap and only happens once.

I also added code to verify that auto_vacuum is enabled and actually doing something.
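
The close hook then boils down to something like this (sketch only; the freelist threshold check and the auto_vacuum verification are left out):

  // Sketch: run the incremental vacuum once, right before the shared
  // connection goes away, instead of on every open.
  NS_IMETHODIMP
  Connection::Close()
  {
    // A real implementation might first check PRAGMA freelist_count and
    // only vacuum when the amount of free space is excessive.
    nsresult rv = mBase->ExecuteSimpleSQL(
      NS_LITERAL_CSTRING("PRAGMA incremental_vacuum;"));
    if (NS_WARN_IF(NS_FAILED(rv))) {
      // Ignore the failure; we still want to close the underlying connection.
    }

    return mBase->Close();
  }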
Attachment #8610962 - Flags: review?(ehsan)
Add a test to verify that the cache disk usage fluctuates as expected.
Attachment #8610963 - Flags: review?(ehsan)
Comment on attachment 8610962 [details] [diff] [review]
P3 Perform incremental vacuum at tail end of Cache db connections. r=ehsan

I need to adjust this patch one more time.  Drop review for now.
Attachment #8610962 - Flags: review?(ehsan)
Marco pointed out in the other bug that it was WAL that was blocking auto_vacuum from being enabled.  So I refactored the patch to do things in the right order; it no longer performs a full VACUUM to enable auto_vacuum.
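
In other words, on a brand new database the pragmas have to run in this order (a sketch, not the exact patch; see bug 1168682 for the underlying mozStorage issue):

  // auto_vacuum can only be changed while the database file is still
  // empty, and switching to WAL writes the first page, so the
  // auto_vacuum pragma has to run first.
  nsresult rv = aConn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA auto_vacuum = INCREMENTAL;"));
  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }

  rv = aConn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA journal_mode = WAL;"));
  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }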
Attachment #8610962 - Attachment is obsolete: true
Attachment #8611219 - Flags: review?(ehsan)
Small change to the test to tighten the check that the end disk usage shrank significantly.
Attachment #8610963 - Attachment is obsolete: true
Attachment #8610963 - Flags: review?(ehsan)
Attachment #8611220 - Flags: review?(ehsan)
Attachment #8610958 - Flags: review?(ehsan) → review+
Comment on attachment 8610960 [details] [diff] [review]
P2 Use a wrapper mozIStorageConnection for shared Cache connections. r=ehsan

Review of attachment 8610960 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/cache/Connection.h
@@ +15,5 @@
> +
> +class Connection final : public mozIStorageConnection
> +{
> +public:
> +  Connection(mozIStorageConnection* aBase);

Nit: explicit
Attachment #8610960 - Flags: review?(ehsan) → review+
Attachment #8611219 - Flags: review?(ehsan) → review+
Comment on attachment 8611220 [details] [diff] [review]
P4 Add a test to verify Cache incremental vacuum works. r=ehsan

Review of attachment 8611220 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/cache/test/mochitest/test_cache_shrink.html
@@ +80,5 @@
> +  }).then(function(usage) {
> +    endUsage = usage;
> +    dump("### ### initial:" + initialUsage + ", full:" + fullUsage +
> +         ", end:" + endUsage + "\n");
> +    ok(endUsage < (fullUsage / 2), 'disk usage should have shrank significantly');

Isn't this prone to intermittent failures?
Attachment #8611220 - Flags: review?(ehsan) → review+
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #13)
> > +    ok(endUsage < (fullUsage / 2), 'disk usage should have shrank significantly');
> 
> Isn't this prone to intermittent failures?

It shouldn't be.  We are guaranteed the DB connection is flushed to disk by the resetStorage().  The size of the resulting DB should be 100% deterministic.

Now, if something changes the underlying structure of the cache then we might have to update the test.  The only thing I really expect to matter, though, is when I remove the url TEXT column indices.  That might shrink the peak size enough to break this assert.
OK, sounds good.
Of course there is a failure in try on Mulet.  I'll have to investigate a bit what's going on there.
So it seems the test is failing in opt Linux builds.  Instrumentation shows there are no free pages to release.  That is... odd.  I'll investigate further.
I think that if the Cache object is not fully GC'd, then the cache ID is still in use during the caches.delete(), which keeps all the Cache entries open.  The resetStorage() then effectively orphans the data, and we don't yet have the code to detect/fix that.

I'm trying to see if a forced GC after dropping the Cache object fixes the problem.

https://hg.mozilla.org/try/pushloghtml?changeset=54affb13e828
The GC was not quite enough.  The failures on opt are due to the way caches.delete() executes in two async steps, even when the DOM object is gone.  If the resetStorage() happens before the second step executes, then the cache entries are orphaned.

To avoid this I added a caches.has().  It performs another async operation which guarantees the second async step is flushed properly before proceeding.  This fixes the problem:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eea81170be12
Somehow I lost the 32KB growth increment when I was refactoring things.  I think 32KB is definitely better, so I pushed a follow-up to use it.  Sorry for the confusion there.
Component: DOM → DOM: Core & HTML