702889 - IndexedDB: Change SQL schema and some cursor queries for faster performance

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Description

•

13 years ago

Attached patch Patch, v1 — Details — Splinter Review

In the last few days we've seen some really bad performance numbers from our IndexedDB implementation. It turns out that most of the problem was that a) our table structure wasn't perfect, and b) we were using queries that fooled the SQLite query optimizer.

Attached patch fixes our cursor queries and table structure, as well as migrating data from old versions of the database. YUCK.

Attachment #574807 - Flags: review?(jonas)

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 1

•

13 years ago

Oh, this patch also vacuums and analyzes when it upgrades an old database. Anyone have an opinion on whether or not we should do that here?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 2

•

13 years ago

People who know more about sqlite should probably answer comment 1.

Andrew Sutherland [:asuth] (he/him)

Comment 3

•

13 years ago

I think in production you always want the optimizer doing the exact same thing it did when you ran your EXPLAIN to make sure nothing stupid is happening.  Which is to say, do one or more of the following:
1) never run ANALYZE at all
2) manually insert the contents of the ANALYZE you ran on your own database so the optimizer is always working with the same data you used
3) use the "+" operator to constrain things such that the optimizer can never make a choice you don't like.  (see: http://www.sqlite.org/optoverview.html)

I'll also take this opportunity to plug my script that helps visualize the EXPLAIN output.
- blog post with pretty pictures: http://www.visophyte.org/blog/?p=485
- github project: https://github.com/asutherland/grok-sqlite-explain

Marco Bonardo [:mak]

Comment 4

•

13 years ago

Beware, analyze has hidden dangers, see my recent post in the SQLite and Firefox thread in m.d.platform. Places went the analyze path, and is full of blood.
What Asuth said is absolutely correct, you want to have control on what your queries are doing from when you create them, analyze removes this control from your hands, so the query may go really wrong when statistics are out of date (like trying to do a linear scan of a table with hundred thousands rows).

Running analyze on schema change is surely the wrong thing to do. If you need to use it, you should run it every time the number of entries in your table differs substantially from the the number of entries stored in the stats.

Now, with indexedDB, due to its nature, having a clear idea of what the external user will put into it to build performant queries may be problematic, and analyze seems to be the easy solution. But if you want to try it (changing your mind you may just DELETE FROM sqlite_stat1 and stop invoking analyze) you should ensure you analyze every time a certain amount of changes make the db.
Be aware it's not a cheap call, though, since SQLite indices don't have a O(1) shortcut for count(*).

Marco Bonardo [:mak]

Comment 5

•

13 years ago

Regarding vacuum, I'm not sure if indexedDB structure and threading allows that, but you may maybe use (or even improve) the VacuumManager I added to mozStorage. It vacuums registered databases once a month, on idle.

Marco Bonardo [:mak]

Comment 6

•

13 years ago

Asuth, would it be possible to have your script in an add-on, it would be so much easier to be able to open up a page (about:queryplan?), paste a query and get the graph :)

Marco Bonardo [:mak]

Comment 7

•

13 years ago

now, there are some things you may do to improve these connections performances, but first I'd like to know what are the indexedDB constraints regarding data integrity. Should it completely satisfy ACID? May we lose in durability? your createconnection method doesn't seem to try to limit fsyncs or improve concurrency, maybe you could make some test with WAL journaling and without the shared cache?

Andrew Sutherland [:asuth] (he/him)

Comment 8

•

13 years ago

(In reply to Marco Bonardo [:mak] from comment #6)
> Asuth, would it be possible to have your script in an add-on, it would be so
> much easier to be able to open up a page (about:queryplan?), paste a query
> and get the graph :)

Since it's written in python and uses (natively compiled) graphviz, it would not be trivial.  Also, if you're just pasting in the SQL but not the explain output, we'd also have to jump through the hoops to connect to the right database to issue the query, have the chrome privs for that, etc.

If there's going to be a massive manpower push behind fixing up Places and/or IndexedDB, it might be worth it to streamline the process (and get the opcode and I/O counts built-in to boot) with an extension, but my suspicion would be that the cost/benefit is in favor of keeping it python for now and paying the cost of having the developer have to scrape the SQL out of the NSPR log manually, run it through the SQLite command line manually, and then run the script (manually).  (nb: Thunderbird's gloda is able to automatically run the EXPLAINs itself and other goodness, but it is implemented purely in JS, so such things are stupid easy for it compared to Places.)

Marco Bonardo [:mak]

Comment 9

•

13 years ago

(In reply to Andrew Sutherland (:asuth) from comment #8)
> Since it's written in python and uses (natively compiled) graphviz, it would
> not be trivial.  Also, if you're just pasting in the SQL but not the explain
> output, we'd also have to jump through the hoops to connect to the right
> database to issue the query, have the chrome privs for that, etc.

Well, the idea was to have an UI where you select a profile database (similar to SQLite Manager) and run the query on top of it. I'm sure there are good graph APIs on JS. btw this is OT and I don't want to steal space to the bug. So I'll stop now.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 10

•

13 years ago

Thanks guys. I'll lose the ANALYZE call for now and we can look at whether that's important later.

As for VACUUM, I looked at the vacuum service and I can't really use it. IndexedDB can have lots of different databases, created and deleted by web pages, so we can't really register a listener for each database. Ideally we'd try to do something more clever when databases are closed, or even allow web pages to do it themselves. For now I'll probably just leave that out.

Marco Bonardo [:mak]

Comment 11

•

13 years ago

(In reply to ben turner [:bent] from comment #10)
> As for VACUUM, I looked at the vacuum service and I can't really use it.

It's fine, when I made it, it was for simple cases relative to the profile databases (even if actually Places is the only user), I see indexedDB is a bit special. I think it may be improved in future to act on closed dbs and cover these needs, as usual it's matter of resources :(

Nathan Froyd [:froydnj]

Comment 12

•

13 years ago

Drive-by question: object_data's fields are reordered to place the BLOB member last.  Is that so SQLite can do a better job indexing the other fields?  Is placing BLOBs last a generally preferred technique, or is it only applicable in this case?

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 13

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #12)

From one of the SQLite developers:

"It is generally best to put the smaller fields of a table first and big BLOBs and whatnot toward the end.  That way if you are accessing one of the shorter fields, SQLite doesn't have to search past the big BLOBs to find it."

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Updated

•

13 years ago

Depends on: 703398

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 14

•

13 years ago

Comment on attachment 574807 [details] [diff] [review]
Patch, v1

Review of attachment 574807 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below fixes.

::: dom/indexedDB/IDBTransaction.cpp
@@ +449,5 @@
>      );
>    }
>    if (aOverwrite) {
>      return GetCachedStatement(
> +      "INSERT OR REPLACE INTO index_data ("

This is due to the known index-update bug, right?

Don't we need this on the index-updating queries too?

We really need to fix that bug. It's a dataloss bug :(

::: dom/indexedDB/OpenDatabaseHelper.cpp
@@ +557,5 @@
> +  ));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = aConnection->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> +    "INSERT INTO index_data "

I think you'll need the same "OR REPLACE" here, in case there is broken data due to the index bug.

Same for the other indexes.

@@ +755,5 @@
> +  ));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  rv = aConnection->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> +    "ANALYZE;"

Remove this

Attachment #574807 - Flags: review?(jonas) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 15

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f41194217097

Ed Morley [:emorley]

Comment 16

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/f41194217097

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla11

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Component: DOM → DOM: IndexedDB

Target Milestone: mozilla11 → ---

Version: Trunk → unspecified