Closed Bug 858680 Opened 11 years ago Closed 9 years ago

Figure out long-term strategy for VACUUM-ing indexedDB databases

Categories

(Core :: Storage: IndexedDB, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla41

Tracking Status
firefox41 --- fixed

People

(Reporter: bent.mozilla, Assigned: bent.mozilla)

References

Details

Attachments

(5 files, 1 obsolete file)

Bug 858674 enables auto_vacuum mode for indexedDB databases on mobile devices but does nothing for desktop. Also, auto_vacuum is a short-term fix that has real problems (increased fragmentation, small perf hits on commit). We need to design a long-term strategy for dealing with indexedDB databases.

Without investigating too much I imagine the solution will be to hook indexedDB databases into the VacuumManager so that they are routinely vacuumed off an idle timer, but we'd need a way to discover these databases and figure out which need vacuuming.
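(For context, a minimal sketch of the setting in question, assuming a mozIStorageConnection* named conn; which mode bug 858674 actually picks isn't shown here:)

  // Sketch only. auto_vacuum must be chosen before the first table is
  // created, or be followed by a full VACUUM to rebuild the file.
  nsresult rv = conn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA auto_vacuum = FULL;"));  // truncate at every commit
  // or, to reclaim space only on demand via PRAGMA incremental_vacuum:
  // rv = conn->ExecuteSimpleSQL(
  //   NS_LITERAL_CSTRING("PRAGMA auto_vacuum = INCREMENTAL;"));
  if (NS_WARN_IF(NS_FAILED(rv))) {
    return rv;
  }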
This is the first part. When databases are open and then go idle (currently that means 2 seconds with no open transactions) we will start reclaiming space. If the database or its thread is needed again, we will stop reclaiming.
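(Roughly the shape of the idea, as a sketch rather than the actual patch; it assumes auto_vacuum is in INCREMENTAL mode and that some flag is flipped when the database or thread is needed again:)

  #include "mozIStorageConnection.h"
  #include "mozIStorageStatement.h"
  #include "mozilla/Atomics.h"
  #include "nsCOMPtr.h"

  // Sketch: reclaim freelist pages in small chunks so we can stop quickly.
  nsresult
  ReclaimWhileIdle(mozIStorageConnection* aConn, mozilla::Atomic<bool>& aStop)
  {
    nsCOMPtr<mozIStorageStatement> stmt;
    nsresult rv = aConn->CreateStatement(
      NS_LITERAL_CSTRING("PRAGMA freelist_count;"), getter_AddRefs(stmt));
    if (NS_WARN_IF(NS_FAILED(rv))) {
      return rv;
    }

    bool hasResult;
    rv = stmt->ExecuteStep(&hasResult);
    if (NS_WARN_IF(NS_FAILED(rv)) || !hasResult) {
      return rv;
    }

    int32_t freelistCount;
    rv = stmt->GetInt32(0, &freelistCount);
    if (NS_WARN_IF(NS_FAILED(rv))) {
      return rv;
    }

    // Free a handful of pages per step; bail out as soon as we're needed.
    while (freelistCount > 0 && !aStop) {
      rv = aConn->ExecuteSimpleSQL(
        NS_LITERAL_CSTRING("PRAGMA incremental_vacuum(10);"));
      if (NS_WARN_IF(NS_FAILED(rv))) {
        return rv;
      }
      freelistCount -= 10;
    }
    return NS_OK;
  }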
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Attachment #8597724 - Flags: review?(Jan.Varga)
This will let QM tell clients that they are idle.
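(A rough sketch of the shape this could take; the method name here is invented for illustration and is not necessarily what the patch uses:)

  // In dom/quota/Client.h -- hypothetical addition, for illustration only:
  class Client
  {
  public:
    // ...existing interface...

    // Called on the main thread when the quota manager decides we're idle;
    // a client may kick off background maintenance from here.
    virtual void
    PerformIdleMaintenance() = 0;
  };

  // In QuotaManager.cpp, driven by the "idle-daily" observer topic:
  //   for (nsRefPtr<Client>& client : mClients) {
  //     client->PerformIdleMaintenance();
  //   }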
Attachment #8597725 - Flags: review?(Jan.Varga)
I haven't done part 3 yet (which will actually do the db maintenance on system idle).
Comment on attachment 8597725 [details] [diff] [review]
Part 2: Add idle notifications to QuotaClient, v1

Review of attachment 8597725 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/quota/QuotaManager.cpp
@@ +2923,5 @@
>      return NS_OK;
>    }
>  
> +  if (!strcmp(aTopic, OBSERVER_TOPIC_IDLE_DAILY)) {
> +    for (auto& client : mClients) {

Nice!
Attachment #8597725 - Flags: review?(Jan.Varga) → review+
(In reply to Jan Varga [:janv] from comment #5)
> > +    for (auto& client : mClients) {


I'm actually going to change that to |nsRefPtr<Client>& client| there. auto is too magical and makes me nervous.
asuth, there's going to be a new SQLite released sometime soon with these sqlite3.c changes, so please ignore them for now. I'll block this bug on getting that new version incorporated.

Just look at the changes to mozStorageConnection if you don't mind?
Attachment #8600154 - Flags: review?(bugmail)
Comment on attachment 8600154 [details] [diff] [review]
Part 3: Add SQLite statistics to the build

That's super awesome to be able to have dbstats available in stock builds; back in the day when I was doing fragmentation analysis, I had to make custom SQLite builds like a sucker!  (And the worst part was the Tcl, oh the Tcl!)
Attachment #8600154 - Flags: review?(bugmail) → review+
A couple things that bugged me in the past, just for consideration:

1. Many users seem to never hit idle daily; I'm not sure if it's due to add-ons, system apps, or bogus mice. For example, using irccloud (but I suppose other web pages can cause the same) causes random active events (bug 973591). We have Places maintenance on IDLE_DAILY and for many users it never runs. I can tell because I get bug reports from users that are resolved just by forcing maintenance through an add-on. Unfortunately the telemetry I have has a bug, so I cannot present it as valid data (I just found the reason for the bug). I think sooner or later I will stop relying on idle at all in Places because I've seen too many issues. It's reliable only over very short intervals (<= 2 minutes).

2. Vacuum-ing too often makes inserts and updates slower, because you are emptying the freelist. And it's still a slow operation, so if you are on a single core it might still be a problem. I think you should not vacuum the same database more than once a month. Or vacuum every N database access. Maybe even limit to database files larger than M KiB.
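(A sketch of the kind of throttle meant here; the constants and names are invented for illustration:)

  #include "prtime.h"

  // Hypothetical guard: skip databases vacuumed recently or that are tiny.
  static const PRTime kMinTimeBetweenVacuums =
    PRTime(30) * 24 * 60 * 60 * PR_USEC_PER_SEC;  // roughly one month
  static const int64_t kMinDatabaseSizeBytes = 512 * 1024;

  bool
  ShouldConsiderVacuum(PRTime aLastVacuumTime, int64_t aFileSizeBytes)
  {
    if (PR_Now() - aLastVacuumTime < kMinTimeBetweenVacuums) {
      return false;
    }
    return aFileSizeBytes >= kMinDatabaseSizeBytes;
  }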
(In reply to Marco Bonardo [::mak] from comment #9)
> A couple things that bugged me in the past, just for consideration:

Thanks! This kind of info is always useful.

> 1. many users seem to never hit idle daily

Hm, that is problematic. Do you have a bug for getting better telemetry on when idle-daily actually runs? Sounds like we should try to get that fixed asap.

> 2. Vacuum-ing too often makes inserts and updates slower, because you are
> emptying the freelist.

Yeah, I'm emptying the freelist pretty aggressively at the moment when they're idle to keep database size down. This is tunable though. I need to set aside some time to get telemetry for these things too.

> And it's still a slow operation, so if you are on a single core it might
> still be a problem.

Yeah, I'm currently toying around with the number of threads to devote to this (I think I'll probably use |cpuCount + 1| threads) and I have an aggressive progress handler to bail us out if the user comes back.
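(For reference, a sketch of what such a progress handler looks like in mozStorage; the class name and the flag it checks are placeholders:)

  #include "mozIStorageConnection.h"
  #include "mozIStorageProgressHandler.h"
  #include "mozilla/Atomics.h"

  // Placeholder class: SQLite invokes OnProgress() every aGranularity VDBE
  // steps; setting *aAbort to true interrupts the running statement (for
  // example a long VACUUM).
  class InterruptOnUserReturn final : public mozIStorageProgressHandler
  {
  public:
    NS_DECL_THREADSAFE_ISUPPORTS

    explicit InterruptOnUserReturn(mozilla::Atomic<bool>& aUserIsBack)
      : mUserIsBack(aUserIsBack)
    { }

    NS_IMETHOD
    OnProgress(mozIStorageConnection* aConnection, bool* aAbort) override
    {
      *aAbort = mUserIsBack;
      return NS_OK;
    }

  private:
    ~InterruptOnUserReturn() { }

    mozilla::Atomic<bool>& mUserIsBack;
  };

  NS_IMPL_ISUPPORTS(InterruptOnUserReturn, mozIStorageProgressHandler)

  // Registered roughly like this (the granularity is a placeholder):
  //   nsCOMPtr<mozIStorageProgressHandler> previous;
  //   conn->SetProgressHandler(100, handler, getter_AddRefs(previous));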

> I think you should not vacuum the same database more than once a month.
> Or vacuum every N database access. Maybe even limit to database files
> larger than M KiB.

Currently I have that set to no sooner than once per week, and then only if 1) percent of noncontiguous pages is more than 30%, else if 2) database has grown by more than 10% since last vacuum and there are few free pages but more than 20% unused bytes on existing pages. I think this basically gets us a vacuum if the db is fragmented or poorly packed.

How does all that sound?
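(For readers following along, here is a rough sketch of what such checks can look like against the dbstat virtual table, adapted from the dbstat documentation examples; the thresholds and the exact queries in the patch may differ. It assumes a dbstat instance created as temp.stats:)

  // Sketch only -- not the queries from the patch.
  // Setup (newer SQLite versions also allow querying dbstat directly):
  //   CREATE VIRTUAL TABLE temp.stats USING dbstat;

  // Rough "poorly packed" measure: fraction of bytes unused on in-use pages.
  static const char kUnusedBytesRatioQuery[] =
    "SELECT CAST(SUM(unused) AS REAL) / SUM(pgsize) FROM temp.stats;";

  // Rough "noncontiguous pages" measure: fraction of pages that do not
  // immediately follow their btree predecessor on disk. Consecutive rows are
  // paired via a temp table because this predates SQLite window functions.
  static const char kOrderedPagesDDL[] =
    "CREATE TEMP TABLE ordered_pages(seq INTEGER PRIMARY KEY, pageno INTEGER);";
  static const char kOrderedPagesFill[] =
    "INSERT INTO ordered_pages(pageno) "
      "SELECT pageno FROM temp.stats ORDER BY path;";
  static const char kNoncontiguousRatioQuery[] =
    "SELECT 1.0 - SUM(p1.pageno + 1 == p2.pageno) * 1.0 / COUNT(*) "
      "FROM ordered_pages AS p1, ordered_pages AS p2 "
      "WHERE p1.seq + 1 == p2.seq;";

  // The DDL/INSERT statements run via mozIStorageConnection::ExecuteSimpleSQL();
  // each SELECT runs via CreateStatement()/ExecuteStep() with GetDouble(0, ...),
  // and the result is compared against the thresholds described above.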
> Currently I have that set

(in the patch I haven't yet attached!)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #10)
> Hm, that is problematic. Do you have a bug for getting better telemetry on
> when idle-daily actually runs? Sounds like we should try to get that fixed
> asap.

I just fixed the places telemetry on m-c. I don't think we have idle-daily telemetry, it might be useful to have something like "days from the last valid idle-daily call". I think it's worth a bug.

> > I think you should not vacuum the same database more than once a month.
> > Or vacuum every N database access. Maybe even limit to database files
> > larger than M KiB.
> 
> Currently I have that set to no sooner than once per week, and then only if
> 1) percent of noncontiguous pages is more than 30%, else if 2) database has
> grown by more than 10% since last vacuum and there are few free pages but
> more than 20% unused bytes on existing pages. I think this basically gets us
> a vacuum if the db is fragmented or poorly packed.

Sure, basing this on stats is a very good idea that we couldn't apply until now; your numbers sound good to me. How expensive is it to calculate the fragmentation of the db?
(In reply to Marco Bonardo [::mak] from comment #12)
> (In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #10)
> > Hm, that is problematic. Do you have a bug for getting better telemetry on
> > when idle-daily actually runs? Sounds like we should try to get that fixed
> > asap.
> 
> I just fixed the places telemetry on m-c. I don't think we have idle-daily
> telemetry, it might be useful to have something like "days from the last
> valid idle-daily call". I think it's worth a bug.

Provided telemetry doesn't itself happen on idle-daily; otherwise it's pointless (but I think it recently moved away from that?)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #10)
> Currently I have that set to no sooner than once per week, and then only if
> 1) percent of noncontiguous pages is more than 30%, else if 2) database has
> grown by more than 10% since last vacuum and there are few free pages but
> more than 20% unused bytes on existing pages. I think this basically gets us
> a vacuum if the db is fragmented or poorly packed.

Can you wrap this logic in a mozStorage IsFragmented() call?  Sounds like a lot of code could use it.

Are you planning to execute "VACUUM;" or "PRAGMA incremental_vacuum;" when you hit this condition?

It was my understanding that incremental vacuum does not fix fragmentation, but just removes pages from the freelist.  Just curious as I'd like to make sure I understand.
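(For reference, a minimal sketch of the difference, not taken from the patches here; assumes a mozIStorageConnection* named conn:)

  // incremental_vacuum only returns freelist pages to the OS; the remaining
  // pages keep their current, possibly scattered, on-disk order. It also
  // requires auto_vacuum = INCREMENTAL to have been set on the database.
  nsresult rv = conn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA incremental_vacuum;"));

  // VACUUM rewrites the whole file into a new, densely packed copy, which is
  // what removes fragmentation and repacks partially used pages.
  rv = conn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("VACUUM;"));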
yes, incremental vacuum is pretty much pointless.
Shrinking the file-size matters, which is what incremental vacuum does.

Fragmentation arguably matters less because any benefit depends on the operating system or disk subsystem speculatively reading and caching the data we will need in the future. That effectively depends on a query performing a linear scan of a btree which is stored on disk with its pages in traversal order. While IndexedDB with WITHOUT ROWID and mmap enabled is much more likely to generate this traversal pattern (if you use mozGetAll, or if our cursors supported batching), and VACUUM is likely to produce something like this ordering because of its insertion order, it doesn't seem like a property that is going to be particularly stable unless incremental vacuum learns to reorder pages this way.

VACUUM seems most important on spinning disks, where database sizes are potentially smaller and usage patterns (like service workers) make it practical for SQLite to slurp the first N bytes of the database directly into the cache, which results in a high hit rate.

In most cases it seems like any win we'd see from defragmenting we'd also get from increasing the page size, since that forces larger reads with guaranteed read locality.  (Which of course can be horribly bad from a disk-usage perspective, as we saw with the service worker cache not liking 32k pages.)
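(Sketch of that alternative, again assuming a mozIStorageConnection* named conn, with error handling elided; the new page size only takes effect when the file is rebuilt, and a WAL-mode database has to leave WAL temporarily since the page size can't change while WAL is active:)

  // Sketch only; requires that no other connection is using the database.
  nsresult rv = conn->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA journal_mode = DELETE;"));
  rv = conn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("PRAGMA page_size = 32768;"));
  rv = conn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("VACUUM;"));  // rebuild with the new page size
  rv = conn->ExecuteSimpleSQL(NS_LITERAL_CSTRING("PRAGMA journal_mode = WAL;"));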


Note that some of my thoughts here are informed by Taras Glek and Mike Hommey's pre-fetch/pre-cache investigations related to Firefox startup time, where it seemed (at the time, which was a while ago) that most operating systems are bad at even the obvious cases.
(In reply to Ben Kelly [:bkelly] from comment #14)
> It was my understanding that incremental vacuum does not fix fragmentation,
> but just removes pages from the freelist.

Right, I was describing the cases in which I will attempt a full vacuum.
(In reply to Marco Bonardo [::mak] from comment #12)
> How expensive is it to calculate the fragmentation of the db?

It requires a full table scan, so it's not cheap.
Ok, this does the rest. All the knobs are tunable so we can hopefully gather data and adjust them.

The way this works is:

1. QM notifies the IDB client that we're idle.
2. We register an additional idle observer so that we can be notified when the user comes back.
3. We scan the storage directory on a background thread looking for idb databases. After we have discovered all of them for a particular (persistence-type, origin) pair, we send a runnable to the main thread.
4. We ask QM to tell us when it's safe to open those databases.
5. Once it's safe we use a thread pool to do maintenance on each database.
6. Once all databases are tuned up we tell QM to carry on with other business.
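(A minimal sketch of the observer registration in step 2; the contract ID matches the patch, while the observer class and the idle threshold are placeholders:)

  #include "nsCOMPtr.h"
  #include "nsIIdleService.h"
  #include "nsIObserver.h"
  #include "nsServiceManagerUtils.h"

  // Sketch for step 2: aObserver is some nsIObserver whose Observe() receives
  // the "active" topic when the user comes back, at which point the
  // maintenance work stops.
  nsresult
  RegisterIdleObserver(nsIObserver* aObserver)
  {
    nsCOMPtr<nsIIdleService> idleService =
      do_GetService("@mozilla.org/widget/idleservice;1");
    if (NS_WARN_IF(!idleService)) {
      return NS_ERROR_FAILURE;
    }
    // The threshold (seconds of inactivity) is a placeholder value.
    return idleService->AddIdleObserver(aObserver, /* aIdleTimeInS */ 1);
  }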
Attachment #8601158 - Flags: review?(Jan.Varga)
Comment on attachment 8601158 [details] [diff] [review]
Part 4: Maintain databases on idle, v1

Review of attachment 8601158 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +4257,5 @@
>    return file.forget();
>  }
>  
>  nsresult
> +GetStorageConnection(nsIFile* aDatabaseFile,

No changes here, this is just extracted from the other GetStorageConnection function below.

::: dom/quota/QuotaManager.cpp
@@ +2594,5 @@
>  }
>  
> +// static
> +nsresult
> +QuotaManager::GetDirectoryMetadata(nsIFile* aDirectory,

No changes here, just exposed it as a static function.
The QM hooks make this less clean than I wanted, so I'm attaching the interdiff that added them. Hopefully once we have origin locks I can reverse-apply this and skip the main thread.
(In reply to Ben Kelly [:bkelly] from comment #14)
> Can you wrap this logic in a mozStorage IsFragmented() call?  Sounds like a
> lot of code could use it.

I don't know... It's slow, so making it easy to call sounds risky. If someone wants to make the argument for it I guess I wouldn't fight too hard, but I'm not sure this is something we'd want to move into storage. Mak can decide!
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> (In reply to Ben Kelly [:bkelly] from comment #14)
> > Can you wrap this logic in a mozStorage IsFragmented() call?  Sounds like a
> > lot of code could use it.
> 
> I don't know... It's slow, so making it easy to call sounds risky. If
> someone wants to make the argument for it I guess I wouldn't fight too hard,
> but I'm not sure this is something we'd want to move into storage. Mak can
> decide!

Its only purpose, so far, is figuring out whether a db needs a vacuum, so it would be better to modify or rewrite the vacuum manager to make use of it or, more generically, to expose a way to vacuum a db safely when needed. Regardless, vacuuming without need is more expensive than doing this check.
I still dream of a centralized way to automatically vacuum all the dbs that are touched by Storage :(

I see that the indexedDB requirements go far beyond what a generic vacuum manager could do, and maybe it's the same for the service workers cache?
I wouldn't object to moving the heuristic into Storage and exposing it through a public header for internal C++ consumers, considering we are going to disable binary components in add-ons very soon. In the future we could reuse it in the vacuum manager, in the service worker cache, and so on. In any case I'd not expose it to JS.
Bug 1162643 adds dbstat to SQLite.
Depends on: SQLite3.8.10.1
(In reply to Marco Bonardo [::mak] from comment #23)
> Its only purpose, so far, is figuring out whether a db needs a vacuum, so
> it would be better to modify or rewrite the vacuum manager to make use of
> it or, more generically, to expose a way to vacuum a db safely when needed.
> Regardless, vacuuming without need is more expensive than doing this check.
> I still dream of a centralized way to automatically vacuum all the dbs that
> are touched by Storage :(

I would rather do anything like that in a different bug. I'm not sure exactly what we would want for that.

> I see that the indexedDB requirements go far over what a generic vacuum
> manager could do

The other problem is that I want to do multiple queries of the dbstats data to determine unused space, and perhaps other consumers of dbstats data would want something else as well. Let's wait until we have another consumer before trying to make this a generic function somewhere?
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #25)
> Let's wait until we have another consumer
> before trying to make this a generic function somewhere?

Sure, we can move the code to a shared location at any time, when that need arises.
Comment on attachment 8597724 [details] [diff] [review]
Part 1: Do incremental_vacuum while open databases are idle, v1

Review of attachment 8597724 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: dom/indexedDB/ActorsParent.cpp
@@ +4445,5 @@
> +  nsresult
> +  CheckpointInternal(CheckpointMode aMode);
> +
> +  nsresult
> +  GetFreelistCount(CachedStatement& aCachedStatement, uint32_t* aFreelist);

You call it aFreelist here, aFreelistCount below.

@@ +8382,5 @@
>  DatabaseConnection::Init()
>  {
>    AssertIsOnConnectionThread();
>    MOZ_ASSERT(!mInWriteTransaction);
> +  MOZ_ASSERT(!mInReadTransaction);

put this before |MOZ_ASSERT(!mInWriteTransaction);| ?

@@ +8447,5 @@
>  {
>    AssertIsOnConnectionThread();
>    MOZ_ASSERT(mStorageConnection);
>    MOZ_ASSERT(!mInWriteTransaction);
> +  MOZ_ASSERT(mInReadTransaction);

put this before |MOZ_ASSERT(!mInWriteTransaction);| ?

@@ +8455,5 @@
>                   js::ProfileEntry::Category::STORAGE);
>  
>    // Release our read locks.
>    CachedStatement rollbackStmt;
> +  nsresult rv = GetCachedStatement("ROLLBACK;", &rollbackStmt);

Unfortunately, all these GetCachedStatement() calls must now pass NS_LITERAL_CSTRINGs :(

@@ +8525,5 @@
>  {
>    AssertIsOnConnectionThread();
>    MOZ_ASSERT(mStorageConnection);
> +  MOZ_ASSERT(mInWriteTransaction);
> +  MOZ_ASSERT(!mInReadTransaction);

put this before |MOZ_ASSERT(!mInWriteTransaction);| ?

@@ +8708,4 @@
>  {
>    AssertIsOnConnectionThread();
> +  MOZ_ASSERT(!mInWriteTransaction);
> +  MOZ_ASSERT(!mInReadTransaction);

put this before |MOZ_ASSERT(!mInWriteTransaction);| ?

@@ +8802,5 @@
> +                                   freelistCount,
> +                                   aNeedsCheckpoint,
> +                                   &freedSomePages);
> +    if (NS_WARN_IF(NS_FAILED(rv))) {
> +      MOZ_ASSERT(!freedSomePages);

Maybe add a comment that we can't simply return here.
Attachment #8597724 - Flags: review?(Jan.Varga) → review+
Comment on attachment 8601158 [details] [diff] [review]
Part 4: Maintain databases on idle, v1

Review of attachment 8601158 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good, although I think that some of the code for idle maintenance could be moved to QM at some point. For example, the DOM Cache implementation could benefit from it.
But let's keep it in IDB for now.

Also, I know that we're "idle" when FindDatabasesForIdleMaintenance() runs, but it would be safer if FindDatabasesForIdleMaintenance() were protected by QM too.
In the future, we might want to run the maintenance in other situations, not only when the system is "idle".
That's not easy with the current WaitForOpenAllowed(), but it should be much easier with directory locks, which are hopefully coming soon.
So I propose that we would lock entire <profile>/storage during FindDatabasesForIdleMaintenance() and then request locks for origins which we collected. After that the lock for <profile>/storage can be released and locks for specific origins will get acquired.

I wonder if this is testable somehow.

::: dom/indexedDB/ActorsParent.cpp
@@ +8078,5 @@
> +  void
> +  operator delete[](void*) = delete;
> +};
> +
> +struct QuotaClient::MaintenanceInfoBase 

extra whitespace

@@ +15424,5 @@
>  {
>    MOZ_ASSERT(NS_IsMainThread());
>    MOZ_ASSERT(sInstance == this, "We expect this to be a singleton!");
> +  MOZ_ASSERT(!mMaintenanceThreadPool);
> +  MOZ_ASSERT(!mMaintenanceInfoHashtable->Count());

This needs to be |MOZ_ASSERT_IF(mMaintenanceInfoHashtable, !mMaintenanceInfoHashtable->Count());| as you suggested on IRC.

@@ +16037,5 @@
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  if (mIdleObserverRegistered) {
> +    nsCOMPtr<nsIIdleService> idleService =
> +      do_GetService("@mozilla.org/widget/idleservice;1");

|"@mozilla.org/widget/idleservice;1"| could be predefined in the "constants" section

@@ +16061,5 @@
> +    // don't set some huge number here. We add 2 in case some threads block on
> +    // the disk I/O.
> +    const uint32_t threadCount =
> +      std::max(int32_t(PR_GetNumberOfProcessors()), int32_t(1)) +
> +      2;

I wonder if we can do something similar for the transaction thread pool too.

@@ +16151,5 @@
> +  static const PersistenceType kPersistenceTypes[] = {
> +    PERSISTENCE_TYPE_PERSISTENT,
> +    PERSISTENCE_TYPE_DEFAULT,
> +    PERSISTENCE_TYPE_TEMPORARY
> +  };

This can be moved to dom/quota/PersistenceType.h
I need it in other patches too.

Is the order important somehow ?

@@ +16157,5 @@
> +  static_assert((sizeof(kPersistenceTypes) / sizeof(kPersistenceTypes[0])) ==
> +                  size_t(PERSISTENCE_TYPE_INVALID),
> +                "Something changed with available persistence types!");
> +
> +  NS_NAMED_LITERAL_STRING(idbDirName, "idb");

This is defined in dom/quota/Client.h as IDB_DIRECTORY_NAME.

@@ +16168,5 @@
> +    }
> +
> +    nsAutoCString persistenceTypeString;
> +    if (persistenceType == PERSISTENCE_TYPE_PERSISTENT) {
> +      // XXX This shouldn't be a special case...

eek

/me hides

@@ +16366,5 @@
> +    OriginOrPatternString::FromOrigin(aMaintenanceInfo.mOrigin);
> +
> +  Nullable<PersistenceType> npt(aMaintenanceInfo.mPersistenceType);
> +
> +  nsresult rv = 

extra whitespace

::: dom/quota/QuotaManager.cpp
@@ +2594,5 @@
>  }
>  
> +// static
> +nsresult
> +QuotaManager::GetDirectoryMetadata(nsIFile* aDirectory,

You will need to change this a bit since there are now two methods for getting directory metadata.

You will also need to change AssertIsOnIOThread() in more places, especially in the StorageDirectoryHelper.

::: dom/quota/QuotaManager.h
@@ +317,5 @@
> +  GetDirectoryMetadata(nsIFile* aDirectory,
> +                       int64_t* aTimestamp,
> +                       nsACString& aGroup,
> +                       nsACString& aOrigin,
> +                       bool* aIsApp);

I think we now want to expose GetDirectoryMetadataWithRestore(). And we can hide this once idle maintenance is generalized in QM.
Attachment #8601158 - Flags: review?(Jan.Varga) → review+
(In reply to Jan Varga [:janv] from comment #28)
> So I propose that we would lock entire <profile>/storage during
> FindDatabasesForIdleMaintenance() and then request locks for origins which
> we collected. After that the lock for <profile>/storage can be released and
> locks for specific origins will get acquired.

Hm, what does this buy you though? Any databases that are created after scanning starts don't need to be maintained anyway, and any databases that are deleted after scanning starts (but before maintenance starts) will simply fail to open and will be ignored.

> I wonder if this is testable somehow.

I've been thinking about it and I have an idea that I will try.

> This can be moved to dom/quota/PersistenceType.h
> I need it in other patches too.
> 
> Is the order important somehow ?

This is just the order in which to do the scanning, so I don't think it should be moved.

> I think we now want to expose GetDirectoryMetadataWithRestore().

I'd *really* prefer not to do this because it can block and sync-wait for the main thread. I want this maintenance to stop very quickly if the user comes back. If the metadata is corrupt we'll just skip it. Does that sound ok for now?
Flags: needinfo?(Jan.Varga)
Here's an idea for a test. It's not perfect, but better than nothing?
Attachment #8608512 - Flags: review?(Jan.Varga)
Comment on attachment 8608512 [details] [diff] [review]
Interdiff of part 4, plus test.

Review of attachment 8608512 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8608512 - Flags: review?(Jan.Varga) → review+
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #29)
> (In reply to Jan Varga [:janv] from comment #28)
> > So I propose that we would lock entire <profile>/storage during
> > FindDatabasesForIdleMaintenance() and then request locks for origins which
> > we collected. After that the lock for <profile>/storage can be released and
> > locks for specific origins will get acquired.
> 
> Hm, what does this buy you though? Any databases that are created after
> scanning starts don't need to be maintained anyway, and any databases that
> are deleted after scanning starts (but before maintenance starts) will
> simply fail to open and will be ignored.
> 

I'm more worried about the case where we start maintenance manually, for example (in the future).
I can imagine a case where QM is in the process of restoring a missing/corrupted metadata file and IDB maintenance is trying to load the same file. I know loading of that file will just fail, but why pollute the debug output with such warnings? It can be confusing when we (or someone else) are debugging other stuff. There's also the loop which iterates over dir entries, and you have |MOZ_ASSERT(exists);| there, which is perfectly fine, but again, QM can delete that file in theory, so you get all the dir entries in the enumerator but the file suddenly doesn't exist.
I know this is highly theoretical stuff, but I don't think it's that much work to protect the scanning with a directory lock in a follow-up bug.

I agree with the rest of your replies.
Flags: needinfo?(Jan.Varga)
In other words, I would like to introduce a rule that if you want to touch something in <profile>/storage you should be using a directory lock. It's a bit more code, but you won't have to be worried that something else is accessing the same dirs/files.
(In reply to Jan Varga [:janv] from comment #33)
> In other words, I would like to introduce a rule that if you want to touch
> something in <profile>/storage you should be using a directory lock. It's a
> bit more code, but you won't have to be worried that something else is
> accessing the same dirs/files.

something else is *deleting* the same dirs/files.
(In reply to Jan Varga [:janv] from comment #32)
> There's also the loop which iterates over dir entries, and you have
> |MOZ_ASSERT(exists);|

I went ahead and loosened that code up to not assert/warn any more. If something happens to those files we should just silently skip to the next.
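(The loosened loop amounts to something like this sketch, not the actual diff; aDirectory is assumed to be the nsIFile* being scanned:)

  // Tolerant directory scan: if an entry disappears underneath us (e.g. QM
  // deletes it concurrently), silently skip to the next one.
  nsCOMPtr<nsISimpleEnumerator> entries;
  nsresult rv = aDirectory->GetDirectoryEntries(getter_AddRefs(entries));
  if (NS_WARN_IF(NS_FAILED(rv))) {
    return rv;
  }

  bool hasMore;
  while (NS_SUCCEEDED(entries->HasMoreElements(&hasMore)) && hasMore) {
    nsCOMPtr<nsISupports> entry;
    if (NS_FAILED(entries->GetNext(getter_AddRefs(entry)))) {
      continue;
    }

    nsCOMPtr<nsIFile> file = do_QueryInterface(entry);
    if (!file) {
      continue;
    }

    bool exists;
    if (NS_FAILED(file->Exists(&exists)) || !exists) {
      continue;  // used to assert; now we just move on
    }

    // ... inspect the database file ...
  }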
Hrm, Android isn't linking to the new SQLite function somehow.
Flags: needinfo?(bent.mozilla)
Comment on attachment 8600154 [details] [diff] [review]
Part 3: Add SQLite statistics to the build

It turns out that SQLite glues the dbstats table into every connection without any additional work needed on our part, so this patch isn't necessary (and causes linker failures).
Attachment #8600154 - Attachment is obsolete: true
Depends on: 1168590