Last Comment Bug 702889 - IndexedDB: Change SQL schema and some cursor queries for faster performance
: IndexedDB: Change SQL schema and some cursor queries for faster performance
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM: IndexedDB (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
Mentors:
Depends on: 703398
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-15 23:16 PST by Ben Turner (not reading bugmail, use the needinfo flag!)
Modified: 2012-03-22 11:52 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch, v1 (38.29 KB, patch)
2011-11-15 23:16 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
jonas: review+
Details | Diff | Review

Description Ben Turner (not reading bugmail, use the needinfo flag!) 2011-11-15 23:16:35 PST
Created attachment 574807 [details] [diff] [review]
Patch, v1

In the last few days we've seen some really bad performance numbers from our IndexedDB implementation. It turns out that most of the problem was that a) our table structure wasn't perfect, and b) we were using queries that fooled the SQLite query optimizer.

Attached patch fixes our cursor queries and table structure, as well as migrating data from old versions of the database. YUCK.
Comment 1 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-11-15 23:20:34 PST
Oh, this patch also vacuums and analyzes when it upgrades an old database. Anyone have an opinion on whether or not we should do that here?
Comment 2 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-16 03:48:06 PST
People who know more about sqlite should probably answer comment 1.
Comment 3 Andrew Sutherland [:asuth] 2011-11-16 10:47:14 PST
I think in production you always want the optimizer doing the exact same thing it did when you ran your EXPLAIN to make sure nothing stupid is happening.  Which is to say, do one or more of the following:
1) never run ANALYZE at all
2) manually insert the contents of the ANALYZE you ran on your own database so the optimizer is always working with the same data you used
3) use the "+" operator to constrain things such that the optimizer can never make a choice you don't like.  (see: http://www.sqlite.org/optoverview.html)

I'll also take this opportunity to plug my script that helps visualize the EXPLAIN output.
- blog post with pretty pictures: http://www.visophyte.org/blog/?p=485
- github project: https://github.com/asutherland/grok-sqlite-explain
Comment 4 Marco Bonardo [::mak] 2011-11-16 13:05:55 PST
Beware, analyze has hidden dangers, see my recent post in the SQLite and Firefox thread in m.d.platform. Places went the analyze path, and is full of blood.
What Asuth said is absolutely correct, you want to have control on what your queries are doing from when you create them, analyze removes this control from your hands, so the query may go really wrong when statistics are out of date (like trying to do a linear scan of a table with hundred thousands rows).

Running analyze on schema change is surely the wrong thing to do. If you need to use it, you should run it every time the number of entries in your table differs substantially from the the number of entries stored in the stats.

Now, with indexedDB, due to its nature, having a clear idea of what the external user will put into it to build performant queries may be problematic, and analyze seems to be the easy solution. But if you want to try it (changing your mind you may just DELETE FROM sqlite_stat1 and stop invoking analyze) you should ensure you analyze every time a certain amount of changes make the db.
Be aware it's not a cheap call, though, since SQLite indices don't have a O(1) shortcut for count(*).
Comment 5 Marco Bonardo [::mak] 2011-11-16 13:07:10 PST
Regarding vacuum, I'm not sure if indexedDB structure and threading allows that, but you may maybe use (or even improve) the VacuumManager I added to mozStorage. It vacuums registered databases once a month, on idle.
Comment 6 Marco Bonardo [::mak] 2011-11-16 13:14:36 PST
Asuth, would it be possible to have your script in an add-on, it would be so much easier to be able to open up a page (about:queryplan?), paste a query and get the graph :)
Comment 7 Marco Bonardo [::mak] 2011-11-16 13:29:00 PST
now, there are some things you may do to improve these connections performances, but first I'd like to know what are the indexedDB constraints regarding data integrity. Should it completely satisfy ACID? May we lose in durability? your createconnection method doesn't seem to try to limit fsyncs or improve concurrency, maybe you could make some test with WAL journaling and without the shared cache?
Comment 8 Andrew Sutherland [:asuth] 2011-11-16 14:36:24 PST
(In reply to Marco Bonardo [:mak] from comment #6)
> Asuth, would it be possible to have your script in an add-on, it would be so
> much easier to be able to open up a page (about:queryplan?), paste a query
> and get the graph :)

Since it's written in python and uses (natively compiled) graphviz, it would not be trivial.  Also, if you're just pasting in the SQL but not the explain output, we'd also have to jump through the hoops to connect to the right database to issue the query, have the chrome privs for that, etc.

If there's going to be a massive manpower push behind fixing up Places and/or IndexedDB, it might be worth it to streamline the process (and get the opcode and I/O counts built-in to boot) with an extension, but my suspicion would be that the cost/benefit is in favor of keeping it python for now and paying the cost of having the developer have to scrape the SQL out of the NSPR log manually, run it through the SQLite command line manually, and then run the script (manually).  (nb: Thunderbird's gloda is able to automatically run the EXPLAINs itself and other goodness, but it is implemented purely in JS, so such things are stupid easy for it compared to Places.)
Comment 9 Marco Bonardo [::mak] 2011-11-16 14:56:57 PST
(In reply to Andrew Sutherland (:asuth) from comment #8)
> Since it's written in python and uses (natively compiled) graphviz, it would
> not be trivial.  Also, if you're just pasting in the SQL but not the explain
> output, we'd also have to jump through the hoops to connect to the right
> database to issue the query, have the chrome privs for that, etc.

Well, the idea was to have an UI where you select a profile database (similar to SQLite Manager) and run the query on top of it. I'm sure there are good graph APIs on JS. btw this is OT and I don't want to steal space to the bug. So I'll stop now.
Comment 10 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-11-16 15:23:44 PST
Thanks guys. I'll lose the ANALYZE call for now and we can look at whether that's important later.

As for VACUUM, I looked at the vacuum service and I can't really use it. IndexedDB can have lots of different databases, created and deleted by web pages, so we can't really register a listener for each database. Ideally we'd try to do something more clever when databases are closed, or even allow web pages to do it themselves. For now I'll probably just leave that out.
Comment 11 Marco Bonardo [::mak] 2011-11-16 15:30:37 PST
(In reply to ben turner [:bent] from comment #10)
> As for VACUUM, I looked at the vacuum service and I can't really use it.

It's fine, when I made it, it was for simple cases relative to the profile databases (even if actually Places is the only user), I see indexedDB is a bit special. I think it may be improved in future to act on closed dbs and cover these needs, as usual it's matter of resources :(
Comment 12 Nathan Froyd [:froydnj] 2011-11-17 11:31:12 PST
Drive-by question: object_data's fields are reordered to place the BLOB member last.  Is that so SQLite can do a better job indexing the other fields?  Is placing BLOBs last a generally preferred technique, or is it only applicable in this case?
Comment 13 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-11-17 11:36:56 PST
(In reply to Nathan Froyd (:froydnj) from comment #12)

From one of the SQLite developers:

"It is generally best to put the smaller fields of a table first and big BLOBs and whatnot toward the end.  That way if you are accessing one of the shorter fields, SQLite doesn't have to search past the big BLOBs to find it."
Comment 14 Jonas Sicking (:sicking) 2011-11-17 19:10:16 PST
Comment on attachment 574807 [details] [diff] [review]
Patch, v1

Review of attachment 574807 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below fixes.

::: dom/indexedDB/IDBTransaction.cpp
@@ +449,5 @@
>      );
>    }
>    if (aOverwrite) {
>      return GetCachedStatement(
> +      "INSERT OR REPLACE INTO index_data ("

This is due to the known index-update bug, right?

Don't we need this on the index-updating queries too?

We really need to fix that bug. It's a dataloss bug :(

::: dom/indexedDB/OpenDatabaseHelper.cpp
@@ +557,5 @@
> +  ));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = aConnection->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> +    "INSERT INTO index_data "

I think you'll need the same "OR REPLACE" here, in case there is broken data due to the index bug.

Same for the other indexes.

@@ +755,5 @@
> +  ));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = aConnection->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> +    "ANALYZE;"

Remove this
Comment 15 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-11-21 20:22:02 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/f41194217097
Comment 16 Ed Morley [:emorley] 2011-11-22 09:08:31 PST
https://hg.mozilla.org/mozilla-central/rev/f41194217097

Note You need to log in before you can comment on or make changes to this bug.