Bug 872741 (Open) — Opened 11 years ago, Updated 2 years ago

Consider adding support for arbitrary key list queries to IndexedDB

Categories: Core :: Storage: IndexedDB, enhancement, P3

Tracking: blocking-b2g: -; tracking-b2g18: +

People: (Reporter: bkelly, Unassigned)

Keywords: perf; Whiteboard: [c= p=5 s= u=]

Attachments: (2 files)

Currently it's possible to query IDB using a specific key or a bounded, ordered range.  It would also be nice to support querying for an arbitrary list of keys.  In SQL, for example, this is done with syntax like "SELECT * FROM my_table WHERE key IN (1, 2, 3)".

The attached patch provides a proof-of-concept which extends IDBKeyRange to offer this support.  For example:

  var keyRange = IDBKeyRange.inList([245, 9106, 123]);
  var request = objectStore.mozGetAll(keyRange);

A second attached patch demonstrates a potential use case for this support.  It modifies ContactsDB.jsm to use inList() when retrieving contacts from a cached query.

Testing shows that when loading 1000 contacts this improves load times from ~4910ms to ~4366ms. This is just over a 10% improvement.  Time to retrieve the first contact remains the same at ~370ms in these tests.
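For comparison, the baseline the patch improves on is issuing one independent lookup per key.  A rough sketch of that pattern (hypothetical helper; a plain Map-backed stub stands in for the real asynchronous objectStore.get() requests):

```javascript
// Hypothetical helper approximating an arbitrary-key-list fetch without
// inList(): resolve each key independently and preserve the input order.
// `store` is any object with an async get(key); a Map-backed stub here.
async function getInList(store, keys) {
  return Promise.all(keys.map((key) => store.get(key)));
}

// A stub store standing in for an IDB object store.
const backing = new Map([[123, "c"], [245, "a"], [9106, "b"]]);
const stub = { get: async (k) => backing.get(k) };

getInList(stub, [245, 9106, 123]).then((values) => {
  console.log(values); // ["a", "b", "c"]
});
```

With real IDB, each of those get() calls becomes a separate request queued on the transaction, which is exactly the per-key overhead inList() avoids.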
Assignee: nobody → bkelly
Whiteboard: c=
Blocks: 865747
This blocks bug 865747, which is currently tracking.
Status: NEW → ASSIGNED
tracking-b2g18: --- → +
How are we currently querying IDB? Are we sending multiple .get() requests at once so that they are all queued up on the IDB IO thread? Or are we waiting for the result from one .get() before we send the next .get()?

Are there other ways we could get smarter in the IDB implementation without changing the API? One way might be to detect that we're getting multiple .get() requests to the same objectStore/index one after another and turn that into an IN query behind the scenes.

If there's a 10% performance win here that we can't get any other way, then that certainly makes modifying IDB more attractive.
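A rough sketch of that behind-the-scenes batching idea (hypothetical names; a Map stands in for the store, and real IDB would turn the combined key list into a single IN-style query):

```javascript
// Sketch: buffer get() calls made in the same tick and resolve them
// with one combined lookup, at the cost of one deferral of latency.
function makeBatchingGetter(bulkLookup) {
  let pending = null; // { keys: [...], resolvers: [...] }
  return function get(key) {
    return new Promise((resolve) => {
      if (!pending) {
        pending = { keys: [], resolvers: [] };
        // Flush on the next microtask, once all same-tick gets are queued.
        Promise.resolve().then(() => {
          const batch = pending;
          pending = null;
          const values = bulkLookup(batch.keys); // one "IN"-style query
          batch.resolvers.forEach((r, i) => r(values[i]));
        });
      }
      pending.keys.push(key);
      pending.resolvers.push(resolve);
    });
  };
}

// Usage with a Map standing in for the backing store:
const db = new Map([[1, "a"], [2, "b"], [3, "c"]]);
const get = makeBatchingGetter((keys) => keys.map((k) => db.get(k)));
Promise.all([get(1), get(2), get(3)]).then((vals) => {
  console.log(vals); // ["a", "b", "c"]
});
```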
(In reply to Jonas Sicking (:sicking) from comment #3)
> How are we currently querying IDB? Are we sending multiple .get() requests
> at once so that they are all queued up on the IDB IO thread? Or are we
> waiting for the result from one .get() before we send the next .get()?

In this case ContactDB.jsm issues all of the get() requests at once in a single transaction:

  http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactDB.jsm#55

From what I understand, however, if we could get locale-based sorting (bug 871846), then this code path in ContactDB.jsm would just go away.  That being said, querying for an arbitrary set of keys does seem like a reasonable general use case.

> Are there other ways we could get smarter in the IDB implementation without
> changing the API? One way might be to detect that we're getting multiple
> .get() requests to the same objectStore/index one after another and turn
> that into a IN query behind the scenes?

I like this idea.  It seems, however, that we would have to pay some penalty in latency on get().  If a nextTick() is a reasonable cost to pay, then it might be relatively straightforward to detect and convert to an "IN" query.

I still think it would be optimal if we could enhance the API since that allows the client code to be more explicit and therefore would avoid a general penalty on get().

In any case, I can circle back to prototype this approach after investigating some other ContactsAPI changes.

> If there's a 10% performance win here that we can't get any other way, then
> that certainly makes modifying IDB more attractive.

I should probably clarify that the performance change depends on where you measure.  The 10% improvement is based on measurements at the ContactsAPI public interface.  Measuring at the IDB public interface I see a 25% improvement.  The ContactsAPI has some leakage due to additional sorting work it needs to perform (bug 871846).
Also, bent suggested I email the w3c public-webapps list for feedback on the API change.

For those following along, there is some previous w3c discussion here:

  https://www.w3.org/Bugs/Public/show_bug.cgi?id=16595

It seems like the past conclusion was "not now", but there was no strong objection.
(In reply to Ben Kelly [:bkelly] from comment #4)
> Measuring at the IDB public interface I see a 25%
> improvement.

Wait, what exactly are we talking about here?

Are you saying that the difference between:

  objectStore.get(key1);
  objectStore.get(key2);
  objectStore.get(key3);

and:

  objectStore.get(IDBKeyRange.inList([key1, key2, key3]));

is a 25% speedup?
(In reply to ben turner [:bent] from comment #6)
> (In reply to Ben Kelly [:bkelly] from comment #4)
> > Measuring at the IDB public interface I see a 25%
> > improvement.
> 
> Wait, what exactly are we talking about here?
> 
> Are you saying that the difference between:
> 
>   objectStore.get(key1);
>   objectStore.get(key2);
>   objectStore.get(key3);
> 
> and:
> 
>   objectStore.get(IDBKeyRange.inList([key1, key2, key3]));
> 
> is a 25% speedup?

Yes, for N=1000 keys, which is a target workload for the ContactsAPI. I can double-check my math when I get back to my laptop later today.
So to be more specific, here is where I am seeing ~25% speed up.  (Sorry if this is too much detail):

First, look at this google doc for some raw data:  http://bit.ly/125yjoH

We are interested in:

  Runs 1 and 2:  clean m-c + profiling printf's
  Run 9: m-c + attachment 750045 + attachment 750055 + profiling printf's

The two profiling points we are interested in and where they are located:

  Row 15 - ContactDB.js-sendChunk-CACHE:
    http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactDB.jsm#50

  Row 19 - ContactService.js-receiveMessage-DBCALLBACK:
    http://mxr.mozilla.org/mozilla-central/source/dom/contacts/fallback/ContactService.jsm#103

Adding up Rows 16 to 19 for each run:

  Run 1: 2001ms
  Run 2: 1936ms
  Run 9: 46 + 1206 + 253 = 1505ms

So Run 9 completes 24.7% faster than Run 1 and 22.2% faster than Run 2.

Granted, this is not a strictly clean measurement of the time to the start of the IDB callback.  I found this was difficult to do with the get() approach since the printf() overhead became significant when it was called 1000 times.

The test data used here is the reference-workload-heavy set from gaia master.  The test device is a Buri.

The original ~10% improvement from comment 2 refers to the change in total time on row 25.  On a second look, the drop from 25% to 10% mainly represents a dilution of the time savings over a longer test period.
Whiteboard: c= → c= p=5
No longer blocks: 865747
As requested, I've broken out a test that can be run in isolation in desktop Firefox.  I stuck it on GitHub since it's easy to self-host with GitHub Pages:

  http://blog.wanderview.com/idb-inlist-test/

It's pretty crude and I'm obviously not a designer, but it works as a portable test across different browsers and platforms.

On my MBPr quad core with a nightly built with attachment 750045, I get the following results.

  cursor: 329ms   [1]
  get:    88ms    [2]
  getAll: 71ms    [3]
  inList: 44ms    [4]

Values are mean time to query 250 keys 10 times using the given method.
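In the same spirit as those measurements, a minimal timing harness (hypothetical, not the actual test-page code; the real tests run against IndexedDB, while a synchronous Map lookup stands in for the workload here) might look like:

```javascript
// Minimal benchmark harness sketch: run an async query function
// several times and report the mean wall-clock duration in ms.
async function meanTimeMs(runQuery, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await runQuery();
    total += Date.now() - start;
  }
  return total / iterations;
}

// Usage: time 10 runs of looking up 250 keys in a Map.
const store = new Map();
for (let i = 0; i < 250; i++) store.set(i, { id: i });
meanTimeMs(async () => {
  for (let i = 0; i < 250; i++) store.get(i);
}, 10).then((mean) => console.log(`mean: ${mean.toFixed(2)}ms`));
```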

If you want to modify the tests, configuration, or data you will need to clone the github repo.

I've undoubtedly done something wrong in the tests somewhere.  Pull requests or other feedback welcome.

[1]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/cursor.js
[2]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/get.js
[3]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/getAll.js
[4]: https://github.com/wanderview/idb-inlist-test/blob/gh-pages/scripts/tests/inList.js
blocking-b2g: --- → koi?
Keywords: perf
Whiteboard: c= p=5 → c= p=5 ,
Priority: -- → P2
Blocks: 888666
needinfo: mlee to check if this is a koi blocker.
Flags: needinfo?(mlee)
My personal opinion is this won't make it for v1.2.
Not a koi blocker.
blocking-b2g: koi? → -
Flags: needinfo?(mlee)
Whiteboard: c= p=5 , → [c= p=5 s= u=]
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #6)
> Are you're saying that the difference between:
> 
>   objectStore.get(key1);
>   objectStore.get(key2);
>   objectStore.get(key3);
> 
> and:
> 
>   objectStore.get(IDBKeyRange.inList([key1, key2, key3]));
> 
> is a 25% speedup?

After a year of pondering and using sqlite more, I think I can explain this.

While the keys are stored in an sqlite index, this does not make lookup O(1).  Instead, we're looking at O(log n) to look up a key in the index.

Since the separate get() calls each use a separate SQL statement, sqlite is forced to re-search the index each time.  So essentially O(n log n) behavior.

With a single SQL statement that uses an IN clause, the sqlite query engine can scan the index only once.  Depending on the exact data used, this ends up between O(log n) and O(n) in the worst case.
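The difference can be made concrete outside of sqlite: N independent binary searches over an index of size M cost O(N log M), while one merged pass over a sorted index with a sorted key list costs O(M + N).  A sketch (sqlite's actual IN handling is more sophisticated than this):

```javascript
// N separate index lookups: one binary search per key, O(N log M).
function lookupEach(sortedIndex, keys) {
  return keys.map((key) => {
    let lo = 0, hi = sortedIndex.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (sortedIndex[mid] === key) return key;
      if (sortedIndex[mid] < key) lo = mid + 1;
      else hi = mid - 1;
    }
    return undefined; // key not present in the index
  });
}

// One merged scan: walk the index once against a sorted key list,
// O(M + N), analogous to what a query engine can do for an IN clause.
function lookupMerged(sortedIndex, sortedKeys) {
  const found = [];
  let i = 0;
  for (const key of sortedKeys) {
    while (i < sortedIndex.length && sortedIndex[i] < key) i++;
    if (sortedIndex[i] === key) found.push(key);
  }
  return found;
}

const index = [1, 3, 5, 7, 9, 11];
console.log(lookupEach(index, [3, 9, 11]));   // [3, 9, 11]
console.log(lookupMerged(index, [3, 9, 11])); // [3, 9, 11]
```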

Ben, do you know if something like KeyRange.inList() is still on the table for IDB 2.0?
Flags: needinfo?(bent.mozilla)
Joshua Bell tells me that supporting something like KeyRange.inList() is still being considered.  They have been exploring various API enhancements, including this feature.
Flags: needinfo?(bent.mozilla)
Priority: P2 → P3
Assignee: ben → nobody
Status: ASSIGNED → NEW
Severity: normal → S3