Closed Bug 548589 Opened 13 years ago Closed 13 years ago

Implement a simple storage module

Categories

(Add-on SDK Graveyard :: General, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adw, Assigned: adw)

References

()

Details

Attachments

(1 file, 3 obsolete files)

Implement simple storage as proposed here:
https://wiki.mozilla.org/Labs/Jetpack/Reboot/JEP/104

Most work will be done in bug 548556, a generic object data store.  Simple storage is a thin layer on top that makes it unnecessary to specify a filename.  If in the future we decide to enforce quotas, that'd be done here, too.
Hmm, now that I've written text streams and JSON stores, I look at implementing simple storage using DOM storage instead.  Looks like it can be done in only a handful of lines... plus it automatically obeys private browsing mode and history clearing actions and quotas...

var url = "http://myExtensionId.jetpack.mozilla.com/";
var uri = Cc["@mozilla.org/network/io-service;1"].
          getService(Ci.nsIIOService).
          newURI(url, null, null);
var princ = Cc["@mozilla.org/scriptsecuritymanager;1"].
            getService(Ci.nsIScriptSecurityManager).
            getCodebasePrincipal(uri);
var store = Cc["@mozilla.org/dom/storagemanager;1"].
            getService(Ci.nsIDOMStorageManager).
            getLocalStorageForPrincipal(princ, url);
store.setItem("root", JSON.stringify(root));
root = JSON.parse(store.getItem("root"));
Depends on: 548870
Target Milestone: 0.2 → --
Depends on: 549324
Depends on: 565694
No longer depends on: 549324
Attached patch patch (obsolete) — Splinter Review
Myk, think this is ready for API review and whatever else you'd like to look at.  I want to draw some points to your attention:

* This implements quotas, because as long as this implementation
  is naive -- reading files on the main thread on startup,
  writing out the entire store even when nothing changes -- we
  shouldn't let people save the Web.

* The doc explains it, but `onOverQuota` is a Collection of
  listeners that are notified when you go over quota, and
  `quotaUsage` is a number [0, Infinity) that indicates the
  percentage of quota you're using.  If you're over quota when
  the module is unloaded, your storage is automatically trimmed
  until you're under.

* Quota is 5 MiB (overridable by hidden pref), which AFAICT is
  the default DOM storage quota.

* The `storage` property itself can be set -- not just props on
  it.  That wasn't in the JEP but I thought it might be handy.

* The backing JSON file is stored at
  <profile dir>/jetpack/<packaging.jetpackID>/simple-storage/store.json.
  To be consistent with pref names, maybe there should be a package
  name in there somewhere, but modules may move around in the near
  future, so...
Attachment #445318 - Flags: review?(myk)
Comment on attachment 445318 [details] [diff] [review]
patch

Dave, would you have time to look at the small portion of this patch that uses the 1.9.2 and 1.9.3 extension manager APIs?  It's the UninstallNotifier class at the bottom of simple-storage.js.  I need to know when an extension is uninstalled, and listening for uninstall and cancel seems to work OK...
Attachment #445318 - Flags: review?(dtownsend)
CC'ing Brian in case he has thoughts on the quota setup described in comment 2.
Comment on attachment 445318 [details] [diff] [review]
patch

>+// Notifies when the given extension is being uninstalled.  observer is an
>+// object that must define methods onUninstalling and onCancelled.
>+function UninstallNotifier(addonId, observer) {
>+  this.observer = observer;
>+  try {
>+    Components.utils.import("resource://gre/modules/AddonManager.jsm");
>+  }
>+  catch (err) {}
>+  if (typeof(AddonManager) === "object")
>+    this._init193(addonId);
>+  else
>+    this._init192(addonId);
>+}

I think it might be cleaner to do something like:

if ("@mozilla.org/extensions/manager;1" in Cc)
  this._init192(addonId);
else
  this._init193(addonId);

Then put the import in the 193 method. You shouldn't need to worry about catching it or testing the AddonManager type then.

>+UninstallNotifier.prototype = {
>+
>+  // Triggers the observer's onUninstalling callback.
>+  trigger: function () {
>+    this.observer.onUninstalling();
>+    if (this.unload)
>+      this.unload();
>+  },
>+
>+  // Gecko 1.9.2 implementation.
>+  _init192: function (addonId) {
>+    const self = this;
>+    let obsServ = require("observer-service");
>+    obsServ.add("em-action-requested", function observe(subject, data) {
>+      if (subject instanceof Ci.nsIUpdateItem && subject.id === addonId) {
>+        switch (data) {
>+        case "item-uninstalled":
>+          self.observer.onUninstalling();
>+          break;
>+        case "item-cancel-action":
>+          self.observer.onCancelled();
>+          break;
>+        }
>+      }
>+    });
>+  },
>+
>+  // Gecko 1.9.3+ implementation.
>+  _init193: function (addonId) {
>+    const self = this;
>+    let listener = {
>+      onOperationCancelled: function onOperationCancelled(addon) {
>+        if (addon.id === addonId &&
>+            !(addon.pendingOperations & AddonManager.PENDING_UNINSTALL))
>+          self.observer.onCancelled();
>+      },
>+      onUninstalling: function onUninstalling(addon) {
>+        if (addon.id === addonId)
>+          self.observer.onUninstalling();
>+      }
>+    };
>+    AddonManager.addAddonListener(listener);
>+    this.unload = function unload() AddonManager.removeAddonListener(listener);
>+    unload.ensure(this);
>+  }
>+};

This all looks good. In both cases you may end up seeing onCancelled called multiple times when cancelling disabling of an add-on f.e. but I don't think that is a problem here.
Attachment #445318 - Flags: review?(dtownsend) → review+
Comment on attachment 445318 [details] [diff] [review]
patch

(In reply to comment #2)
> * This implements quotas, because as long as this implementation
>   is naive -- reading files on the main thread on startup,
>   writing out the entire store even when nothing changes -- we
>   shouldn't let people save the Web.

Sure, that makes sense.  It might be worth noting in the docs that quotas may increase or be eliminated in the future, though.  On the other hand, my crystal ball is fairly opaque on the matter, so maybe it's better to leave out such forwarding-looking statements.


> * The `storage` property itself can be set -- not just props on
>   it.  That wasn't in the JEP but I thought it might be handy.

Yup, that sounds good, and it makes perfect sense that it would be initialized to an empty object by default, but you can set it to any value.


> * The backing JSON file is stored at
>   <profile dir>/jetpack/<packaging.jetpackID>/simple-storage/store.json.
>   To be consistent with pref names, maybe there should be a package
>   name in there somewhere, but modules may move around in the near
>   future, so...

Yeah, modules probably will move around.  I'm also not sure there'll always be a 1-1 mapping between modules (or even packages) and the files they store.  I could imagine multiple modules (perhaps even across package boundaries) sometimes sharing files, especially if those files are SQLite databases.

Although there'll probably still always be a principal owner module for any given file.  And in any case this is all speculation, and this path is certainly good for now.


>diff --git a/packages/jetpack-core/docs/simple-storage.md b/packages/jetpack-core/docs/simple-storage.md

>+Introduction

Somewhere in this documentation, perhaps in the introduction (or some "notes"/"recommendations" section), it might be worth recommending against storing serialized JSON in simple storage, since doing so will take up more storage and require more time to stringify/parse unnecessarily.  This could say something like:

Simple storage supports all valid JSON data types, so you can store JSON data
in its parsed form by assigning the JSON data structure to a property
of the `storage` object.  It is not necessary (and not recommended) to stringify
it first to a string and then parse the string when retrieving it from storage,
since doing so would needlessly take up more storage space and require more time
and processing cycles to stringify/parse.


>+You can also set the `storage` property itself.  The types above are also legal
>+here.
>+
>+    simpleStorage.storage = "O frabjous day!";
>+
>+Be careful with this approach, though: You must set the `storage` property on
>+the module, so the following pattern will not work:
>+
>+    // This is no good!
>+    var myStorage = simpleStorage.storage;
>+    myStorage = ":(";

This all makes perfect sense and is consistent with the JavaScript language (and the earlier statement that the `storage` object behaves like a normal JavaScript object), since myStorage is a (reassignable) reference to the *value* of simpleStorage.storage, not the simpleStorage.storage property itself.


>+The simple storage available to your extension is limited.  Currently this limit
>+is five mebibytes (5,242,880 bytes).  You will be notified when you go over
>+quota, and you should respond by reducing the amount of data in storage.  If the
>+user quits the application or unloads your extension while you are over quota,
>+your storage will automatically be trimmed so that it fits.

Given our primary target audience of casual developers, it would probably be better to describe the limit as "five megabytes", even though that is technically incorrect (and mibibytes is much cuter, as names go).  Alternately, you could say something like "this limit is five megabytes (technically mibibytes, i.e. 5,242,880 bytes)."

Also, it might be worth mentioning here that the data size is measured on a JSON-stringified representation of the stored data.

Finally, nit: will be notified -> can be notified (since you aren't actually required to register a listener).


>+To find out how much of your quota you're using, check the module's `quotaUsage`
>+property.  If you're within your quota, it returns a value from 0 to 1,
>+inclusive, which indicates what percentage of quota your storage occupies.  If
>+you're outside your quota, this value will be a number greater than 1.
>+
>+Therefore, when you're notified that you're over quota, remove some storage
>+until your `quotaUsage` is less than or equal to 1.  For example:
>+
>+    simpleStorage.storage = [ /* some long array */ ];
>+    simpleStorage.onOverQuota = function () {
>+      while (simpleStorage.quotaUsage > 1)
>+        simpleStorage.storage.pop();
>+    };

It might be better to have two properties, quota and usage, that API consumers can check to determine how much they are over quota:

    simpleStorage.storage = [ /* some long array */ ];
    simpleStorage.onOverQuota = function () {
      while (simpleStorage.usage > simpleStorage.quota)
        simpleStorage.storage.pop();
    };

That way they can determine the absolute amount by which they have exceeded the quota, which is useful when a consumer has a good sense of the size of each individual item they are storing (f.e. a consumer could use the information to remove multiple items from an array in one fell swoop via `splice` to get back under quota).

It also enables consumers to determine the current quota (which may vary over time or even be eliminated at some point in the future as the implementation warrants) and their current usage (in which they may be interested independent of its relationship to the quota, or because they would like to proactively maintain their storage under quota).


>diff --git a/packages/jetpack-core/lib/simple-storage.js b/packages/jetpack-core/lib/simple-storage.js

>+// These are exported for testing purposes only.
>+collection.addCollectionProperty(exports, "_onWrite");
>+exports._forceWrite = function () manager.forceWrite();
>+exports._simulateUninstall = function () manager.simulateUninstall();

This is ok, although it makes me think we need a better long term solution that gives unit tests access to private data when they need it without the module having to export it to all consumers.

Hmm, some digging suggests this is already possible.  For example, the testSetget test continues to work with the following code change:

-    ss._onWrite = function () {
-      file.remove(storeFilename);
-      test.done();
-    };
+    loader.sandboxes["resource://jetpack-core-jetpack-core-lib/simple-storage.js"].
+           globalScope.manager.jsonStore.writeObservers.add(function () {
+      file.remove(storeFilename);
+      test.done();
+    });

Evil?


>+  // True iff the root is an empty object.

Nit: iff -> if


>diff --git a/packages/jetpack-core/tests/test-simple-storage.js b/packages/jetpack-core/tests/test-simple-storage.js

>+exports.testSetGetRootString = function (test) {
>+  setGetRoot(test, "sho nuff");

Nit: sho' 'nuff! ;-)

All review comments are recommendations, ruminations, or nits, so r=myk!
Attachment #445318 - Flags: review?(myk) → review+
Thanks Myk.

(In reply to comment #6)
> Sure, that makes sense.  It might be worth noting in the docs that quotas may
> increase or be eliminated in the future, though.  On the other hand, my crystal
> ball is fairly opaque on the matter, so maybe it's better to leave out such
> forwarding-looking statements.

I agree with the other hand.  Let's not speculate.

> Somewhere in this documentation, perhaps in the introduction (or some
> "notes"/"recommendations" section), it might be worth recommending against
> storing serialized JSON in simple storage

I think the second paragraph is clear and concise on that point.  But to reinforce it, I'll add more examples of setting values of different types.

> Also, it might be worth mentioning here that the data size is measured on a
> JSON-stringified representation of the stored data.

I'd actually not like to mention JSON at all, since JSON serialization is an implementation detail.  And I don't want to encourage people to resort to manually stringifying their data and painstakingly counting characters.

> It might be better to have two properties, quota and usage, that API consumers
> can check to determine how much they are over quota:

That was my first impulse, and partly for the same reason of counting characters I decided against it.  My hunch is that the common reaction when exceeding quota will be to throw away objects that are known to be small or less important until you're under, and I'm not sure that knowing the absolute delta helps that.  As for keeping under quota to begin with, I would think you'd want to say "I'm 10% under" rather than 2 KB or whatever.  But let's keep an ear to the ground.

> Hmm, some digging suggests this is already possible.  For example, the
> testSetget test continues to work with the following code change:
> 
> -    ss._onWrite = function () {
> -      file.remove(storeFilename);
> -      test.done();
> -    };
> +   
> loader.sandboxes["resource://jetpack-core-jetpack-core-lib/simple-storage.js"].
> +           globalScope.manager.jsonStore.writeObservers.add(function () {
> +      file.remove(storeFilename);
> +      test.done();
> +    });
> 
> Evil?

Muhuhwahaha.
Attached patch patch v2 (obsolete) — Splinter Review
Addresses Myk's comments, fixes some bugs.
Attachment #445318 - Attachment is obsolete: true
Attachment #445532 - Flags: review?(avarma)
My only concern about the quota is that "If you're over quota when the module is unloaded, your storage is automatically trimmed until you're under" sounds functionally equivalent to "we'll randomly corrupt your data until you stop storing so much". There's no good way to "trim" a stored JSON object to a given size that won't cause unknown corruption, so you might as well just delete the whole thing.

I think developer ergonomics would be best served by throwing an exception at the moment of storing the property that causes the total serialized size to grow beyond the quota, but I recognize that this would require a performance-hurting serialize-and-save on every single property set. I don't know of a reasonable way to handle this. The failure indication (exception or over-quota notification) will always come at the wrong time, and seen by the wrong person: worst case is that everything works fine for the original developer, and nothing goes wrong until an end-user has had the add-on installed for a year and finally accumulates enough data to exceed the quota, and they see a meaningless-to-them exception.

Also, I'd be inclined to describe the default quota value as "about 5 megabytes" and perhaps have a footnote with the exact value, rather than trying to teach people about the difference between SI and MiB in the middle of the API docs. (my personal pet-peeves are: confusing people by using "MB" to mean 2^30 despite a clear SI definition to the contrary, and arguing a lot about whether to use "MB" or "MiB" when simply adding the word "about" would resolve the problem quickly :).
(In reply to comment #9)
> storing so much". There's no good way to "trim" a stored JSON object to a given
> size that won't cause unknown corruption, so you might as well just delete the
> whole thing.

Hmm, you might be right.  Do you think the fact that some cases may be better served by not deleting the whole thing is reason enough to trim?  For example, if your storage consists of an array of URLs of the user's history, it seems better to retain as much of that data as possible than to remove all of it.

Maybe it would be helpful to have persistent error flags, like require("simple-storage").trimmedError or purgedError, that are set if those things occurred and you haven't yet un-set them.  They'd just be confirmation that your data is gone, though...

> I think developer ergonomics would be best served by throwing an exception at
> the moment of storing the property that causes the total serialized size to
> grow beyond the quota, but I recognize that this would require a
> performance-hurting serialize-and-save on every single property set.

Yeah, I agree.

> The failure indication (exception or
> over-quota notification) will always come at the wrong time, and seen by the
> wrong person: worst case is that everything works fine for the original
> developer, and nothing goes wrong until an end-user has had the add-on
> installed for a year and finally accumulates enough data to exceed the quota,
> and they see a meaningless-to-them exception.

Yeah... the best we can do with the current patch is hope people read the docs -- or examples on blogs, etc. :\  Can we chalk it up to being a best practice?

Thanks for the doc suggestion.  I'll update it.
> For example, if your storage consists of an array of URLs of the user's
> history, it seems better to retain as much of that data as possible than to
> remove all of it.

Yeah, but the storage API doesn't have a good way to discover this detail of
the data model. If the jetpack code gets an error, it could respond by
truncating the array and trying again, but that requires that it gets the
error at a useful time.

> Maybe it would be helpful to have persistent error flags, like
> require("simple-storage").trimmedError or purgedError, that are set if
> those things occurred and you haven't yet un-set them. They'd just be
> confirmation that your data is gone, though...

Right.

How about adding a .save() or .flush() method? The semantics would be
something like:

 * You can modify properties on the 'storage' object at any time. You will
   always see those new property values when you look at .storage later. Data
   will be persisted to disk just just before the application shuts down, or
   the jetpack is unloaded. It might also be persisted at other times, like
   30 seconds after the last modification occurs.
 * Calling .save()/.flush() will cause the changed data to be persisted
   immediately. If the JSON-encoded data is larger than the quota, then
   .flush() will raise an exception and the persisted form will be left
   untouched.

To trim an array until history fits, the code would use a loop vaguely like
this:

 storage.history.append(newstuff)
 while True:
   try:
     storage.flush()
     break
   except OverQuotaError:
     if storage.history:
       storage.history.pop(0)
     else:
       raise StillWontFitError

For this to work, you'd have to guarantee that each call to .flush() would
keep returning the same error (so that any internal invocations wouldn't lose
the error information).
(In reply to comment #11)
> How about adding a .save() or .flush() method? The semantics would be
> something like:

>  * Calling .save()/.flush() will cause the changed data to be persisted
>    immediately.

Unfortunately as long as the thread calling a save() or flush() is the main thread, this isn't an option.
Comment on attachment 445532 [details] [diff] [review]
patch v2

>+  // Triggers the uninstall notifier and unloads the JsonStore.
>+  simulateUninstall: function manager_simulateUninstall() {
>+    this.unotif.trigger();
>+    this.jsonStore.unload();
>+  }

Nit: might want to mention in the comment that this is used only for unit testing. I had that impression when I saw the word 'simulate', but it might be useful to mention it here just to be clear. Although I suppose a slightly more 'real' way of simulating the uninstall would be to remove this method and make it possible to provide a fake UninstallNotifier object to manager.init(), I suspect the only way to do that would be with module overrides or somesuch, e.g. this.unotif = require("uninstall-notifier").register(...), with a different "uninstall-notifier" substituted for the particular platform version or test suite, but we can probably leave that kind of refactoring for the lifecycle JEP.

>+storeFile.append("jetpack");
>+storeFile.append(packaging.jetpackID);

Is there a reason we're not using require("self").id here instead? I vaguely recall asking you about this before but I forget what the rationale was...

Anyways, other than that, this looks great!
Attachment #445532 - Flags: review?(avarma) → review+
Attached patch patch v2.1 (obsolete) — Splinter Review
Thanks Atul.  This patch uses self.id and makes some of the changes to the uninstall stuff that you suggested.  (Instead of passing a notifier object to init, I pass a notifier class to a newUninstallNotifier() method, since I can't not call init from within the module itself.)

This patch also clarifies parts of the doc and adds a section on private browsing.
Attachment #445532 - Attachment is obsolete: true
Attached patch patch v2.2Splinter Review
As discussed after today's meeting, this patch drops all over-quota data on the floor if the module is unloaded while over quota, rather than trimming the store until it fits.  AFAICT this is what our DOM storage does, and it seems like a better idea, e.g.:

http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/localstorage/test_localStorageQuota.html?force=1#83

http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/localstorage/frameQuota.html?force=1#59
Attachment #446022 - Attachment is obsolete: true
http://hg.mozilla.org/labs/jetpack-sdk/rev/2b5f8ff3fddd
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 567293
The Add-on SDK is no longer a Mozilla Labs experiment and has become a big enough project to warrant its own Bugzilla product, so the "Add-on SDK" product has been created for it, and I am moving its bugs to that product.

To filter bugmail related to this change, filter on the word "looptid".
Component: Jetpack SDK → General
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack-sdk → general
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.