Bug 872741 (Open) — Opened 11 years ago, Updated 2 years ago

Consider adding support for arbitrary key list queries to IndexedDB

Categories: Core :: Storage: IndexedDB, enhancement, P3

Tracking: blocking-b2g: -; tracking-b2g18: +

People: (Reporter: bkelly, Unassigned)

Keywords: perf; Whiteboard: [c= p=5 s= u=]

Attachments: (2 files)

Currently it's possible to query IDB using a specific key or a bounded, ordered range.  It would also be nice to support querying for an arbitrary list of keys.  In SQL, for example, this is done with syntax like "SELECT * FROM my_table WHERE key IN (1, 2, 3)".

The attached patch provides a proof-of-concept which extends IDBKeyRange to offer this support.  For example:

  var keyRange = IDBKeyRange.inList([245, 9106, 123]);
  var request = objectStore.mozGetAll(keyRange);

A second attached patch demonstrates a potential use case for this support.  It modifies ContactsDB.jsm to use inList() when retrieving contacts from a cached query.

Testing shows that when loading 1000 contacts this improves load times from ~4910ms to ~4366ms. This is just over a 10% improvement.  Time to retrieve the first contact remains the same at ~370ms in these tests.
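For comparison, the baseline the patch improves on is issuing one independent lookup per key.  A rough sketch of that pattern (hypothetical helper; a plain Map-backed stub stands in for the real asynchronous objectStore.get() requests):

```javascript
// Hypothetical helper approximating an arbitrary-key-list fetch without
// inList(): resolve each key independently and preserve the input order.
// `store` is any object with an async get(key); a Map-backed stub here.
async function getInList(store, keys) {
  return Promise.all(keys.map((key) => store.get(key)));
}

// A stub store standing in for an IDB object store.
const backing = new Map([[123, "c"], [245, "a"], [9106, "b"]]);
const stub = { get: async (k) => backing.get(k) };

getInList(stub, [245, 9106, 123]).then((values) => {
  console.log(values); // ["a", "b", "c"]
});
```

With real IDB, each of those get() calls becomes a separate request queued on the transaction, which is exactly the per-key overhead inList() avoids.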
Assignee: nobody → bkelly
Whiteboard: c=
Blocks: 865747
This blocks bug 865747, which is currently tracking.
Status: NEW → ASSIGNED
tracking-b2g18: --- → +
How are we currently querying IDB? Are we sending multiple .get() requests at once so that they are all queued up on the IDB IO thread? Or are we waiting for the result from one .get() before we send the next .get()?

Are there other ways we could get smarter in the IDB implementation without changing the API? One way might be to detect that we're getting multiple .get() requests to the same objectStore/index one after another and turn that into an IN query behind the scenes.

If there's a 10% performance win here that we can't get any other way, then that certainly makes modifying IDB more attractive.
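A rough sketch of that behind-the-scenes batching idea (hypothetical names; a Map stands in for the store, and real IDB would turn the combined key list into a single IN-style query):

```javascript
// Sketch: buffer get() calls made in the same tick and resolve them
// with one combined lookup, at the cost of one deferral of latency.
function makeBatchingGetter(bulkLookup) {
  let pending = null; // { keys: [...], resolvers: [...] }
  return function get(key) {
    return new Promise((resolve) => {
      if (!pending) {
        pending = { keys: [], resolvers: [] };
        // Flush on the next microtask, once all same-tick gets are queued.
        Promise.resolve().then(() => {
          const batch = pending;
          pending = null;
          const values = bulkLookup(batch.keys); // one "IN"-style query
          batch.resolvers.forEach((r, i) => r(values[i]));
        });
      }
      pending.keys.push(key);
      pending.resolvers.push(resolve);
    });
  };
}

// Usage with a Map standing in for the backing store:
const db = new Map([[1, "a"], [2, "b"], [3, "c"]]);
const get = makeBatchingGetter((keys) => keys.map((k) => db.get(k)));
Promise.all([get(1), get(2), get(3)]).then((vals) => {
  console.log(vals); // ["a", "b", "c"]
});
```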
(In reply to Jonas Sicking (:sicking) from comment #3)
> How are we currently querying IDB? Are we sending multiple .get() requests
> at once so that they are all queued up on the IDB IO thread? Or are we
> waiting for the result from one .get() before we send the next .get()?

In this case ContactDB.jsm issues all of the get() requests at once in a single transaction:

  http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactDB.jsm#55

From what I understand, however, if we could get locale-based sorting (bug 871846), then this code path in ContactDB.jsm would just go away.  That being said, querying for an arbitrary set of keys does seem like a reasonable general use case.

> Are there other ways we could get smarter in the IDB implementation without
> changing the API? One way might be to detect that we're getting multiple
> .get() requests to the same objectStore/index one after another and turn
> that into a IN query behind the scenes?

I like this idea.  It seems, however, that we would have to pay some penalty in latency on get().  If a nextTick() is a reasonable cost to pay, then it might be relatively straightforward to detect and convert to an "IN" query.

I still think it would be optimal if we could enhance the API since that allows the client code to be more explicit and therefore would avoid a general penalty on get().

In any case, I can circle back to prototype this approach after investigating some other ContactsAPI changes.

> If there's a 10% performance win here that we can't get any other way, then
> that certainly makes modifying IDB more attractive.

I should probably clarify that the performance change depends on where you measure.  The 10% improvement is based on measurements at the ContactsAPI public interface.  Measuring at the IDB public interface I see a 25% improvement.  The ContactsAPI has some leakage due to additional sorting work it needs to perform (bug 871846).
Also, bent suggested I email the w3c public-webapps list for feedback on the API change.

For those following along, there is some previous w3c discussion here:

  https://www.w3.org/Bugs/Public/show_bug.cgi?id=16595

It seems like the past conclusion was "not now", but there was no strong objection.
(In reply to Ben Kelly [:bkelly] from comment #4)
> Measuring at the IDB public interface I see a 25%
> improvement.

Wait, what exactly are we talking about here?

Are you saying that the difference between:

  objectStore.get(key1);
  objectStore.get(key2);
  objectStore.get(key3);

and:

  objectStore.get(IDBKeyRange.inList([key1, key2, key3]));

is a 25% speedup?
(In reply to ben turner [:bent] from comment #6)
> (In reply to Ben Kelly [:bkelly] from comment #4)
> > Measuring at the IDB public interface I see a 25%
> > improvement.
> 
> Wait, what exactly are we talking about here?
> 
> Are you saying that the difference between:
> 
>   objectStore.get(key1);
>   objectStore.get(key2);
>   objectStore.get(key3);
> 
> and:
> 
>   objectStore.get(IDBKeyRange.inList([key1, key2, key3]));
> 
> is a 25% speedup?

Yes, for N=1000 keys, which is a target workload for the ContactsAPI. I can double-check my math when I get back to my laptop later today.
So to be more specific, here is where I am seeing ~25% speed up.  (Sorry if this is too much detail):

First, look at this google doc for some raw data:  http://bit.ly/125yjoH

We are interested in:

  Runs 1 and 2:  clean m-c + profiling printf's
  Run 9: m-c + attachment 750045 + attachment 750055 + profiling printf's

The two profiling points we are interested in and where they are located:

  Row 15 - ContactDB.js-sendChunk-CACHE:
    http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactDB.jsm#50

  Row 19 - ContactService.js-receiveMessage-DBCALLBACK:
    http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactService.jsm#103

Adding up Rows 16 to 19 for each run:

  Run 1: 2001ms
  Run 2: 1936ms
  Run 9: 46 + 1206 + 253 = 1505ms

So Run 9 completes 24.7% faster than Run 1 and 22.2% faster than Run 2.

Granted, this is not a strictly clean measurement of the time to the start of the IDB callback.  I found this was difficult to do with the get() approach since the printf() overhead became significant when it was called 1000 times.

The test data used here is the reference-workload-heavy set from gaia master.  The test device is a Buri.

The original ~10% improvement from comment 2 refers to the change in total time on row 25.  On a second look, the drop from 25% to 10% mainly represents a dilution of the time savings over a longer test period.
Whiteboard: c= → c= p=5
No longer blocks: 865747
As requested, I've broken out a test that can be run in isolation in desktop Firefox.  I stuck it on GitHub since it's easy to self-host with GitHub Pages:

  http://blog.wanderview.com/idb-inlist-test/

It's pretty crude and I'm obviously not a designer, but it works as a portable test across different browsers and platforms.

On my MBPr quad core with a nightly built with attachment 750045, I get the following results.

  cursor: 329ms   [1]
  get:    88ms    [2]
  getAll: 71ms    [3]
  inList: 44ms    [4]

Values are mean time to query 250 keys 10 times using the given method.
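In the same spirit as those measurements, a minimal timing harness (hypothetical, not the actual test-page code; the real tests run against IndexedDB, while a synchronous Map lookup stands in for the workload here) might look like:

```javascript
// Minimal benchmark harness sketch: run an async query function
// several times and report the mean wall-clock duration in ms.
async function meanTimeMs(runQuery, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await runQuery();
    total += Date.now() - start;
  }
  return total / iterations;
}

// Usage: time 10 runs of looking up 250 keys in a Map.
const store = new Map();
for (let i = 0; i < 250; i++) store.set(i, { id: i });
meanTimeMs(async () => {
  for (let i = 0; i < 250; i++) store.get(i);
}, 10).then((mean) => console.log(`mean: ${mean.toFixed(2)}ms`));
```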

If you want to modify the tests, configuration, or data you will need to clone the github repo.

I've undoubtedly done something wrong in the tests somewhere.  Pull requests or other feedback welcome.

[1]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/cursor.js
[2]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/get.js
[3]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/getAll.js
[4]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/inList.js
blocking-b2g: --- → koi?
Keywords: perf
Whiteboard: c= p=5 → c= p=5 ,
Priority: -- → P2
Blocks: 888666
needinfo: mlee to check if this is a koi blocker.
Flags: needinfo?(mlee)
My personal opinion is this won't make it for v1.2.
Not a koi blocker.
blocking-b2g: koi? → -
Flags: needinfo?(mlee)
Whiteboard: c= p=5 , → [c= p=5 s= u=]
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #6)
> Are you're saying that the difference between:
> 
>   objectStore.get(key1);
>   objectStore.get(key2);
>   objectStore.get(key3);
> 
> and:
> 
>   objectStore.get(IDBKeyRange.inList([key1, key2, key3]));
> 
> is a 25% speedup?

After a year of pondering and using sqlite more, I think I can explain this.

While the keys are stored in an sqlite index, this does not make lookup O(1).  Instead, we're looking at O(log n) to look up a key in the index.

Since the separate get() calls each use a separate SQL statement, sqlite is forced to re-search the index each time.  So essentially O(n log n) behavior.

With a single SQL statement that uses an IN clause, the sqlite query engine can scan the index only once.  Depending on the exact data used, this ends up between O(log n) and O(n) in the worst case.
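The difference can be made concrete outside of sqlite: N independent binary searches over an index of size M cost O(N log M), while one merged pass over a sorted index with a sorted key list costs O(M + N).  A sketch (sqlite's actual IN handling is more sophisticated than this):

```javascript
// N separate index lookups: one binary search per key, O(N log M).
function lookupEach(sortedIndex, keys) {
  return keys.map((key) => {
    let lo = 0, hi = sortedIndex.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (sortedIndex[mid] === key) return key;
      if (sortedIndex[mid] < key) lo = mid + 1;
      else hi = mid - 1;
    }
    return undefined; // key not present in the index
  });
}

// One merged scan: walk the index once against a sorted key list,
// O(M + N), analogous to what a query engine can do for an IN clause.
function lookupMerged(sortedIndex, sortedKeys) {
  const found = [];
  let i = 0;
  for (const key of sortedKeys) {
    while (i < sortedIndex.length && sortedIndex[i] < key) i++;
    if (sortedIndex[i] === key) found.push(key);
  }
  return found;
}

const index = [1, 3, 5, 7, 9, 11];
console.log(lookupEach(index, [3, 9, 11]));   // [3, 9, 11]
console.log(lookupMerged(index, [3, 9, 11])); // [3, 9, 11]
```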

Ben, do you know if something like KeyRange.inList() is still on the table for IDB 2.0?
Flags: needinfo?(bent.mozilla)
Joshua Bell tells me that supporting something like KeyRange.inList() is still being considered.  They have been exploring various API enhancements, including this feature.
Flags: needinfo?(bent.mozilla)
Priority: P2 → P3
Assignee: ben → nobody
Status: ASSIGNED → NEW
Severity: normal → S3