Closed Bug 866238 Opened 12 years ago Closed 5 years ago

implement simple key-value store module on top of indexedDB for storing small amounts of data

Categories

(Toolkit :: General, enhancement)

enhancement
Not set
normal
Points:
13

Tracking

()

RESOLVED DUPLICATE of bug 1490496

People

(Reporter: Gavin, Assigned: gal)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [Async:team])

Attachments

(2 files)

The goal is to stop using preferences for things that aren't actually configuration options, or that aren't required early during startup.

The API should be asynchronous, and fairly simple. bsmedberg proposes:
> ChromeData.get('key', function(value) {
>   // null if unset
> });
> ChromeData.set('key', value [, function()]); // asynchronous

But we may need additional methods like "has".

Some example prefs that should probably be switched to using this module:
- app.update.lastUpdateTime.* and similar timestamp prefs
- browser.download.lastDir
- browser.migration.version
- browser.newtabpage.blocked
- extensions.installCache
- many more
Looping in Perf early. IMO we should have a Perf/design review before any code is written so we know what we're dealing with.
Assignee: nobody → gal
If anyone would like to steal this, please go ahead. Otherwise I will try to get to this over the next few days, since I have been bitching about the use of Prefs for this.
What's the rationale for using IndexedDB and not, say, Sqlite.jsm?
(In reply to David Rajchenbach Teller [:Yoric] from comment #3)
> What's the rationale for using IndexedDB and not, say, Sqlite.jsm?

* SQLite (even through Sqlite.jsm) is full of foot guns. IndexedDB shields those from us.
* Concerns over separate vs shared databases. (Should we have one unified database for all of Gecko or separate databases? If the latter, what are the perf implications?)

That being said, if we're writing a wrapper, I suppose we could have it talk to Sqlite.jsm instead of IndexedDB. That would be an interesting discussion. On one hand Sqlite.jsm would be reinventing the wheel. On the other, fewer levels of abstraction and thus better performance? I don't know enough about IndexedDB's implementation to weigh in here.
(In reply to Andreas Gal :gal from comment #2)
> If anyone would like to steal this, please go ahead. Otherwise I will try to
> get to this over the next few days, since I have been bitching about the use
> of Prefs for this.

Andreas: While I share your enthusiasm for fixing this and trust that your implementation would be solid, I think we should hold off writing code until we have a little more buy-in. If this is going to be *the* generic storage API for Gecko, we arguably have one shot to get it right before people start using it and before we need to worry about things like data migrations, API compatibility, etc. If we get it wrong, it seems like there would be a steep price to pay.

I think strawman API designs would be great. But I fear anything more carries significant risk of wasting time.

You obviously know what you are doing. Perhaps I am being too risk averse [due to my unfamiliarity with IndexedDB]?
Blocks: 865893
David wrote an indexedDB wrapper in gaia to help people move out of localStorage. This could be a good start to make a chrome version. It's at https://github.com/mozilla-b2g/gaia/blob/master/shared/js/async_storage.js
bsmedberg had it right. Lets start with a trivial API and iterate from there.
Attached patch patch, untestedSplinter Review
API:

Registry.set(key, value, success, error);
Registry.get(key, success, error);

This obviously needs a bunch of tests.
Comment on attachment 742661 [details] [diff] [review]
patch, untested

Review of attachment 742661 [details] [diff] [review]:
-----------------------------------------------------------------

Minor drive-by nits.

::: toolkit/modules/Registry.jsm
@@ +20,5 @@
> +const REGISTRYDB_NAME = "registry";
> +const REGISTRYDB_VERSION = 1;
> +const REGISTRYSTORE_NAME = "registry";
> +
> +this.Registry = function Registry(aGlobal) {

First, we don't name functions after assignments any more. The engine and tools are smart enough to figure out the name based on what's to the left of the '=' or ':'. That's as of Firefox 19 or so IIRC.

Second, some of us are moving away from the C++ style of prefixing arguments with "a" inside JS. I can't remember if Toolkit has officially adopted it yet. But I know I've landed some a-less code in Toolkit without anybody saying anything :)

@@ +25,5 @@
> +  this._global = aGlobal;
> +}
> +
> +Registry.prototype = {
> +  __proto__: IndexedDBHelper.prototype,

I prefer to avoid the "fragile base class" problem and wrap instead of inheriting. If we're going with inheritance, you'll want a IndexedDBHelper.call(this) in the ctor.

@@ +35,5 @@
> +  upgradeScheme: function upgradeSchema(aTransaction, aDb, aOldVersion, aNewVersion) {
> +    aDb.createObjectStore(REGISTRYSTORE_NAME, { keyPath: "key" });
> +  },
> +
> +  set: function set(aKey, aValue, aSuccessCb, aErrorCb) {

So, are we going to go with callbacks or promises or both?

I think I'd like a promise version at some point so we can work Task.jsm magic:

Task.spawn(function () {
    yield Registry.set("foo", "bar");
    yield Registry.set("biz", "baz");
});

@@ +57,5 @@
> +                  };
> +                },
> +                aSuccessCb,
> +                aErrorCb);
> +  }

Should we throw in multiget and multiset while we're here?
(In reply to Gregory Szorc [:gps] from comment #11)

> I can't remember if Toolkit has officially adopted it
> yet. But I know I've landed some a-less code in Toolkit without anybody
> saying anything :)

Toolkit strictly hasn't, but *a lot* of thought leaders have declared that it's time for it to die, and broad swathes of the tree (particularly newer code that isn't in an existing file with the old convention) are doing modern two-space all-braced no-polish style.

I strongly advocate for doing the modern thing here. But as Gavin would say, it's not worth worrying about, particularly at this stage. Let's leave syntax for now.


> So, are we going to go with callbacks or promises or both?

If we go with promises, we ought to wait for Bug 810490. If we go with callbacks, it should be trivial to layer promises on top. I agree that this ought to be decided sooner rather than later.


> Should we throw in multiget and multiset while we're here?

That raises interesting error handling and rollback questions. (The kinds of thing that aren't often considered in straw-man API sketches, alas.)

I definitely think we should consider those questions. Is versioning/transactionality/ref semantics something we're considering in scope?


(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #0)

> But we may need additional methods like "has".

Personally I'm in favor of 'has', but someone with a little more clarity of thought than I have on a Friday night should think about concurrent safety. Can we guarantee the correctness of

  let hasIt = yield storage.has("foo");
  if (!hasIt) {
    yield storage.put("foo", "bar");
  }

without some kind of synchronization (and the consequent possibility of deadlock)? My instinct is no, but I'd be happy to be proved wrong.

> Some example prefs that should probably be switched to using this module:

* All of Sync
* All of PICL (CCing ckarlof as a dispatcher; better tools are on the horizon, Chris!)
* FHR state? gps is in the process of rolling up a file-based state blob right now...

"Many more" indeed...
(In reply to Richard Newman [:rnewman] from comment #12)
> > But we may need additional methods like "has".
> 
> Personally I'm in favor of 'has', but someone with a little more clarity of
> thought than I have on a Friday night should think about concurrent safety.
> Can we guarantee the correctness of
> 
>   let hasIt = yield storage.has("foo");
>   if (!hasIt) {
>     yield storage.put("foo", "bar");
>   }
> 
> without some kind of synchronization (and the consequent possibility of
> deadlock)? My instinct is no, but I'd be happy to be proved wrong.

If my understanding of the spec is correct, all I/O operations must be attached to a transaction. And, the behavior of multiple transactions is clearly defined and IMO is sane. I'm still trying to figure out how deadlock is handled. But alas.

In your above example, this would be guaranteed safe if all the operations were performed in a single transaction. But, in the current straw-man, since every function call is a new transaction, there would be no guarantee.

Poking around, I see IndexedDB is implemented on top of SQLite. It's using transactions and saved points for IndexedDB transactions. With FHR's SQLite performance tuning fresh in my head, I know that decreasing the number of transactions can help performance a lot. Therefore I think we should design the API to encourage transactions so consumers are put in the best position for optimal performance.

Speaking of SQLite, it appears IndexedDB is not using SQLite's WAL journal mode. This means every IndexedDB transaction will fsync(). That seems... bad. I'm more interested than ever in what the Perf team has to say about using IndexedDB.
Flagging! :D
Flags: needinfo?(taras.mozilla)
(In reply to Richard Newman [:rnewman] from comment #12)
> Personally I'm in favor of 'has', but someone with a little more clarity of
> thought than I have on a Friday night should think about concurrent safety.

I think this is an issue with or without "has":

  let n = yield storage.get("count");
  storage.set("count", n + 1);

We may need to expose some control over transactions to handle these use cases in general.  Whether we do that or not, we'll need some big warning labels plastered all over so developers can learn about these hazards.
I am not enthusiastic about this first version of Registry.jsm, but perhaps the issue is actually main thread IndexedDB.

What I generally want to with data do is:
1. read data (asynchronously);
2. progressively accumulate data (asynchronously);
3. at some point, effectively write my changes to the disk (asynchronously).

With Registry.jsm/main thread IndexedDB, the only support I have for accumulating data is to keep it in memory. Once I have all my data (think 60Mb of session restore), effectively writing my changes to disk requires a main thread serialization that effectively blocks the main thread.

By contrast, with off main thread IndexedDB, we could send the data progressively to a worker as it is gathered, hence removing the expensive one-time serialization cost.
One thing I forgot to mention: workers are good^H^H^H^Ha critical necessity in the very near future. We need to make sure that any API we design is worker-friendly.

Attaching an alternative API draft, more worker-oriented.
Attachment #742953 - Flags: feedback?(gps)
Attachment #742953 - Flags: feedback?(gal)
And yes, I realize that I have somehow shifted the conversation from "small amounts of data" to "large amounts of data", if you believe that this not appropriate, I will open another bug.
I think if holding the data in memory is a concern, this mid-size/everyday storage is not the right solution to begin with? Either way we should probably be super clear about which problem this is intended to solve. I expect at some point there will be an MDN page with guidance on making storage choices. Drafting that now might keep us from wandering off into the weeds
(In reply to Sam Foster [:sfoster] from comment #19)
> I think if holding the data in memory is a concern, this mid-size/everyday
> storage is not the right solution to begin with?

Just for clarification: my problem is not holding data in memory. My problem is serializing lots of data at once.
(In reply to David Rajchenbach Teller [:Yoric] from comment #20)
> Just for clarification: my problem is not holding data in memory. My problem
> is serializing lots of data at once.

I think session restore might be large and special enough to warrant its own storage considerations.

I'm going to assert that what we want this bug to be about a prefs-like key-value store for small to medium values (I'll arbitrarily pick <1M). If it scales to larger values and can service the needs of consumers like session restore, great. But if there is any conflict, we should side with the "simple" consumers.
Comment on attachment 742953 [details] [diff] [review]
Alternative API draft

Review of attachment 742953 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/modules/Registry.jsm
@@ +51,5 @@
> +   * Write to disk all the changes since the last call to |flush()|.
> +   *
> +   * @return {promise}
> +   */
> +  flush: function() {

By introducing flush, we're in the same boat as preferences are today: uncertainty of mutation persistence.

* How often does the store automatically flush?
* What happens when a shutdown occurs before auto flush?
* What happens during Windows fast shutdown?
* What happens when an auto flush occurs in the middle of a client "transaction." (Hint: You don't expose an explicit transaction API so you could have mixed state on disk.)

This will inevitably manifest in "I really need to ensure my writes are persisted - I'll just force a flush." Then, enough people start manually flushing and a goal of your API is undermined.

It also makes the implementation more complicated as now you need to reconcile differences between the DB and the in-memory cache.

Call me naive, but why can't we rely on IndexedDB (or whatever we use) to do its job and "just work" and not require consumers to worry about things like this?
Attachment #742953 - Flags: feedback?(gps)
Please folks, check out https://github.com/mounirlamouri/storage.js and see what it's missing. If we need more for chrome then web apps will probably need it as well. Let's try to not invent more than one new thing here.
(In reply to ben turner [:bent] from comment #23)
> Please folks, check out https://github.com/mounirlamouri/storage.js and see
> what it's missing. If we need more for chrome then web apps will probably
> need it as well. Let's try to not invent more than one new thing here.

Both storage.js and gal's patch have race issues as discussed in comment 12 and comment 14.  (storage.js even has an "increment" example in the README that breaks if you run it in a "for" loop.)

storage.js lacks any way for callers to detect or handle errors.

As discussed in comment 11 and comment 12, many developers will want an API based on promises.  I'd even suggest that this is the only API we need, since it would support both Task.jsm style and "traditional" callback style:

  let value = yield get('key');
  doSomething(value);

  // or...
  get('key').then(function(value) {
    doSomething(value);
  });
(In reply to Matt Brubeck (:mbrubeck) from comment #24)
> Both storage.js and gal's patch have race issues as discussed in comment 12
> and comment 14.  (storage.js even has an "increment" example in the README
> that breaks if you run it in a "for" loop.)

Yeah, this is a long known issue with localStorage... Except that most web devs either don't care or don't need to do this kind of thing. Most don't want to bother with transactions, so I don't know that we should care too much. (If you want transactions then maybe indexedDB is the best choice)

> storage.js lacks any way for callers to detect or handle errors.

Well, I think we've convinced ourselves that handling errors is not something that web devs do really well in the first place... However, aside from Quota/IO errors there really shouldn't be any. And what can a web page do in response to such errors besides log it in the console?

> As discussed in comment 11 and comment 12, many developers will want an API
> based on promises.

It's unclear to me that we can do this and still have storage.js be used in all browsers. If the promise stuff is something that we can't support there then we should be able to do the extra bit of magic in our JSM.
For the record. I think this is a very bad idea. 
gal's api requires fsyncs on every write. Yoric's requires explicit flushing. I don't see how either solution is an improvement over writing atomicly JSON to indiv files.
Flags: needinfo?(taras.mozilla)
(In reply to ben turner [:bent] from comment #25)

> Yeah, this is a long known issue with localStorage... Except that most web
> devs either don't care or don't need to do this kind of thing. Most don't
> want to bother with transactions, so I don't know that we should care too
> much. (If you want transactions then maybe indexedDB is the best choice)

It might be stating the obvious, so this is more for those following along at home, but browser features very different from the kind of "web dev" development that you're describing. We very much care about transactions, about error handling, and so on.

And in reverse:

> And what can a web page
> do in response to such errors besides log it in the console?

As I understand it, supporting web pages for this feature is an explicit non-goal. That web pages and Firefox both use JS is something of a coincidence. A feature like Firefox Health Report does lots of things with errors, including both log4moz logging *and* submitting them as part of Health Report payloads.

> It's unclear to me that we can do this and still have storage.js be used in
> all browsers. If the promise stuff is something that we can't support there
> then we should be able to do the extra bit of magic in our JSM.

So this sounds like storage.js is not suitable for achieving the goal of this bug. It is:

* Non-transactional, has race issues
* Doesn't provide extensive error handling
* Needs to be cross-platform.

whereas we want something:

* Transactional, async, worker-friendly, fast
* Robust, with error handling and sound semantics
* Explicitly designed for supporting Firefox browser components.

Do you agree?
(In reply to ben turner [:bent] from comment #25)
> > Both storage.js and gal's patch have race issues as discussed in comment 12
> > and comment 14.
> 
> Yeah, this is a long known issue with localStorage... Except that most web
> devs either don't care or don't need to do this kind of thing.

This bug is specifically about providing a better solution for chrome code that currently uses nsIPreferService for data storage (see comment 0).

I think proper handling of races is a requirement for some of those use cases, especially the cases where we make incremental changes to JSON data stored in a pref (browser.newtabpage.blocked, extensions.installCache, etc.).  It's hard to write those safely in an API like storage.js.

As you said yourself, "if we need more for chrome then web apps will probably need it as well" so maybe we are underestimating web developers' needs.

> (If you want transactions then maybe indexedDB is the best choice)

I agree.  This bug is specifically about implementing a key-value API on top of IndexedDB, and I think it would be useful for this API to optionally allow callers to do a read and subsequent write within a single IDBTransaction.

I think this could be made very simple to use; for example a transaction() method that accepts a promise could be used like this:

  storage.transaction(Task(function() {
    let value = yield storage.get("key");
    yield storage.set("key", value + 1);
  }));

> Well, I think we've convinced ourselves that handling errors is not
> something that web devs do really well in the first place...
> However, aside from Quota/IO errors there really shouldn't be any. And
> what can a web page do in response to such errors besides log it in the
> console?

It would be useful to *at least* throw an exception rather than silently swallowing the error.  I think web developers are quite used to the idea that APIs they call may result in an error.

The Promises API has an optional onError callback built in, and Task.jsm automatically converts errors to exceptions.  Gal's Registry.jsm patch also supports an error callback (which could be made optional, to better support the cases where we don't care about errors).  I don't get the argument for not providing this facility.

> It's unclear to me that we can do this and still have storage.js be used in
> all browsers. If the promise stuff is something that we can't support there
> then we should be able to do the extra bit of magic in our JSM.

Promises are easily used cross-browser.  jQuery implements and uses them, for example.  Are there specific issues you're thinking of that would prevent cross-browser code from using Promises?

There are already cross-browser IndexedDB wrappers like DB.js that use Promises as implemented provided by jQuery or other libraries.
I think we should take a step back from discussing API implementation details, read Taras's posts on dev.platform regarding perf and simplicity, and reconsider whether IndexedDB is the right tool for the job.
(In reply to Gregory Szorc [:gps] from comment #29)
> I think we should take a step back from discussing API implementation
> details, read Taras's posts on dev.platform regarding perf and simplicity,
> and reconsider whether IndexedDB is the right tool for the job.

I agree, lets put this bug on hold until we have have concrete requirements for what the IO needs are.
I filed this bug with pretty specific use cases in mind: people currently using prefs for small amounts of data that shouldn't be. So the requirements are a model pretty roughly equivalent to how prefs currently work, minus the drawbacks mentioned in Greg's dev.platform post:

> a) Poor durability guarantees. See bugs 864537 and 849947 for real-life
> issues. tl;dr writes get dropped!
> b) Integers limited to 32 bit (JS dates overflow b/c milliseconds since
> Unix epoch).
> c) I/O is synchronous.
> d) The whole method for saving them to disk is kind of weird.
> e) The API is awkward. See Preferences.jsm for what I'd consider a better API.
> f) Doesn't scale for non-trivial data sets.
> g) Clutters about:config (all preferences aren't config options).

(Apart perhaps for f, which isn't really an issue for the consumers targeted here. And to c) I would add "and blocks startup on the main thread", which is unnecessary for many of the preferences in question here.)

It may not actually be cost-effective to actually migrate all prefs that could potentially make use of this new system, given the cost of migration in some cases. But it would be nice for this system to exist for new users/add-ons at the very least. Losing about:config for easy hackability might also be a hard sell (unless we introduce about:simpledata for managing data stored in this new system, or something).

Lazy initialization and only reading once into memory (like prefs) seems obvious, but for writes it's less clear. Periodic automatic writes? Write-on-set? Batched write-on-set? Write on shutdown only? Some combination?

(It's not clear to me how may of the current durability issues with prefs are caused by it by default only writing on shutdown and requiring explicit savePrefFile() calls otherwise.)
Bug 836263 is also an interesting read, if you're interested in how our APIs are (ab)used by extensions in practice.
So many talented cooks in this kitchen, must be a special episode of Iron Chef! :)

One brief comment: I'm a little bit concerned that we're conflating requirements -- async/OMT IO vs. an API with callbacks. Making the API async incurs a cost (of simplicity/convenience), anywhere from "minor annoyance" to "can't use it here". I don't understand the use-cases this is intended for, so it's hard to judge the tradeoffs.

But if a main goal is for something simple to use, I'd think that tends to argue for a simple "synchronous" API -- with a backend that periodically flushes (OMT/non-blocking) to disk -- and all that implies about guarantees (or lack thereof!) to callers. If a caller wants fancier aspects of ACID, then maybe the "simple key-value store" isn't actually for them.
(In reply to Gregory Szorc [:gps] from comment #22)
> Comment on attachment 742953 [details] [diff] [review]
> Alternative API draft
> 
> Review of attachment 742953 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/modules/Registry.jsm
> @@ +51,5 @@
> > +   * Write to disk all the changes since the last call to |flush()|.
> > +   *
> > +   * @return {promise}
> > +   */
> > +  flush: function() {
> 
> By introducing flush, we're in the same boat as preferences are today:
> uncertainty of mutation persistence.
> 
> * How often does the store automatically flush?
> * What happens when a shutdown occurs before auto flush?
> * What happens during Windows fast shutdown?
> * What happens when an auto flush occurs in the middle of a client
> "transaction." (Hint: You don't expose an explicit transaction API so you
> could have mixed state on disk.)

Simple answer: nothing is auto-flushed, ever.

> This will inevitably manifest in "I really need to ensure my writes are
> persisted - I'll just force a flush." Then, enough people start manually
> flushing and a goal of your API is undermined.

Well, I'm ok with every add-on flushing its subset (assuming one consumer == one file). Unless I'm mistaken, that's still widely better than what mozStorage or indexedDB do out of the box.

> It also makes the implementation more complicated as now you need to
> reconcile differences between the DB and the in-memory cache.

In my proposal, nothing is shared between workers unless it has first been flushed. So, no, that doesn't make things complicated.
(In reply to David Rajchenbach Teller [:Yoric] from comment #34)
> Well, I'm ok with every add-on flushing its subset (assuming one consumer ==
> one file). Unless I'm mistaken, that's still widely better than what
> mozStorage or indexedDB do out of the box.

I'm not sure that one consumer == one file is optimal, if there will be many consumers who only want to store one or two values (most of the examples given in this bug so far).
Whiteboard: [Async] → [Async:team]
If I think this should be in our backlog so that we fix it, is adding it as blocking the backlog bug enough? :-)
Flags: needinfo?(gavin.sharp)
Yes, in general. But like any other bug, if you want other people to fix it you might need to make a case for it :)
Flags: needinfo?(gavin.sharp)
Attachment #742953 - Flags: feedback?(gal)
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Whiteboard: [Async:team] → [Async:team] p=13 [qa?]
Points: --- → 13
Whiteboard: [Async:team] p=13 [qa?] → [Async:team]
See Also: → 1134265
Blocks: 577606

Justin, should we wontfix this bug?

Type: defect → enhancement
Flags: needinfo?(dolske)

nsIKeyValueService for C++ and kvstore.jsm for JS is a thing now! It's backed by LMDB, not IDB, but the API is in line with the discussion in this bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(dolske)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: