Closed Bug 1131362 Opened 5 years ago Closed 5 years ago

Create a reading list SQLite database and JSM to provide access to it

Categories

(Firefox Graveyard :: Reading List, defect, P1)

defect

Tracking

(firefox38 fixed, firefox39 fixed)

RESOLVED FIXED
Firefox 39
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: adw, Assigned: adw)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 10 obsolete files)

+++ This bug was initially created as a clone of Bug #1127165 +++

Broken down from bug 1127165.  Create a reading list SQLite database and JSM to provide access to it.
Flags: qe-verify-
Flags: firefox-backlog+
Assignee: nobody → adw
Status: NEW → ASSIGNED
Iteration: --- → 38.3 - 23 Feb
Blocks: 1131416
Blocks: 1132074
Attached patch mozIStorageRow.getColumnName (obsolete) — Splinter Review
Marco, I'd really like a mozIStorageRow.getColumnName().  I'm kind of surprised it doesn't exist already.  It would let me do this:

  function itemFromRow(row) {
    let item = {};
    for (let i = 0; i < row.numEntries; i++) {
      let key = row.getColumnName(i);
      if (key != "id") {
        item[key] = row.getResultByIndex(i);
      }
    }
    return item;
  }

Would you r+ this patch if it had a test?  Please say yes! :-)
Attachment #8563737 - Flags: feedback?(mak77)
Comment on attachment 8563737 [details] [diff] [review]
mozIStorageRow.getColumnName

Review of attachment 8563737 [details] [diff] [review]:
-----------------------------------------------------------------

It's possible, we actually have getColumnName but it's only for mozIStorageStatement, so async statements are missing that opportunity, and they can't grow one because sqlite3_column_name() requires a prepared statement.

Note btw, if you know the column names (and you basically always do), you can use map and reduce to convert between row and objects, for example

Task.spawn(function* () {
  let db = yield PlacesUtils.promiseDBConnection();
  let cols = [ "id", "keyword" ];
  let rows = yield db.execute(`SELECT * FROM moz_keywords`);
  rows = rows.map(r => cols.reduce((prev, col) => { prev[col] = r.getResultByName(col); return prev; }, {}));
  console.log(JSON.stringify(rows));
});

we do something similar in Bookmarks.jsm, with automatic conversion to the expected js data format for each column (to convert dates for example):
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/Bookmarks.jsm#1113

If bug 512761 were fixed, you could also basically use the returned row... but that's another sad story...

So, if it's easy to use map/reduce I'd go that way. Otherwise adding the requested API should not be a big deal, but note that sooner or later (when bug 512761 gets fixed) mozIStorageRow won't be the preferred js way to access a row; it may become [noscript], and your code, or any other code using this new suggested API, would have to be changed.
Attachment #8563737 - Flags: feedback?(mak77)
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #2)
> Note btw, if you know the column names (and you basically always do), you
> can use map and reduce to convert between row and objects, for example

Yes, but I don't want to have to repeat column names everywhere.  In fact, I want to write them in only one place: the SQL that creates the table.  mozIStorageRow.getColumnName lets me do that.  I have seen some code that puts schemas in objects, like { column1: "TEXT", column2: "INTEGER" }, which could accomplish the same thing, but that's a little more complex than just writing the SQL directly.
(In reply to Drew Willcoxon :adw from comment #3)
> (In reply to Marco Bonardo [::mak] (needinfo? me) from comment #2)
> > Note btw, if you know the column names (and you basically always do), you
> > can use map and reduce to convert between row and objects, for example
> 
> Yes, but I don't want to have to repeat column names everywhere.

Well, you could just define them once in a const; I assume your code will live in a module.
My only problem here is that you are adding new APIs for javascript to an interface that is not our target for js developers. And when you add APIs, you know people are going to use them. That means adding more code and consumers that we'll break in the future.
So, please evaluate well whether the alternatives really suck for your use-case.
If they do, we'll add the API. In the end, I figure we may never actually be able to remove the mozIStorageRow interface, since it likely has _MANY_ consumers in add-ons...
Attached patch WIP (obsolete) — Splinter Review
This kind of follows the API introduced in bug 1123517 but I didn't limit myself to it.

In particular, I made it a bit of a pain to load all items into memory at once.  That's a big difference from bug 1123517.  (Whether the backing store keeps all items in memory is up to it; this SQLite implementation doesn't, of course.)  Instead, there are forEach-type methods to enumerate over items one at a time.  There are get-type methods that yield an array of items, but you must specify a limit.  As a nod to practicality, for the tests at least, you can specify Infinity.

Query conditions/options: I kept it simple.  You pass one flat object in.  It can have the special properties "limit" (as mentioned), "sort", and "descending" (a boolean, to sort descending).  It can also have properties for each of the attributes of the article/item data type.  Each key-value pair is a predicate, so if you say `unread: true`, you'll get only unread items.

On the "one" and "many" in method names: I started with names like getItem for one and getItems for many.  Only differing in an S is pretty dangerous though I think.  "One" and "many" mean more typing, but you can't screw them up.  I'm not actually sure though how useful having both one- and many-type methods will be.

It still needs:

* A way to delete items.

* A way to mark items as read.

* A fuller test.

* The sync cache/API that we need to update the Mac menu with, apparently.  (Do we have to wait until it's opened to update it?  That would be the other way to fix that: update it when the top N items change in the store.)

* To recognize when conflicts occur.  Mark, we talked briefly about the possibility of conflicts.  What's their exact nature?  They pivot around URLs, right?  The server will have foo.com with one GUID and one set of metadata, and then the client will have the same foo.com with a different GUID and set of metadata.  The conflict on the client (which is all we on desktop care about, right?) occurs when the syncer pulls items from the server and tries to put them in the local store.  Right?  What I'm thinking is, the item update method will recognize when a passed-in item has a URL that already exists in the store.  When that happens, it'll call a conflict-resolution callback that the consumer will have provided.  The callback would return the resolved item, which would be used for the update in the store.
Attachment #8566302 - Flags: feedback?(mhammond)
Comment on attachment 8566302 [details] [diff] [review]
WIP

Blair, please see the previous comment.  (Wish it were possible to flag two people when posting a patch!)
Attachment #8566302 - Flags: feedback?(bmcbride)
(In reply to Drew Willcoxon :adw from comment #6)
> (Do we have to wait until it's opened to update it?  That would be the other
> way to fix that: update it when the top N items change in the store.)

Probably harder to do that than the sync cache though.  At the least it reduces to keeping a cache of the top N items because you have to compare them to new and changed items.  So nevermind.
Comment on attachment 8566302 [details] [diff] [review]
WIP

Review of attachment 8566302 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me, but I guess you'll also need additional feedback on the storage/Sqlite changes

::: browser/modules/ReadingList.jsm
@@ +7,5 @@
> +this.EXPORTED_SYMBOLS = [
> +  "ReadingList",
> +];
> +
> +const SINGLETON_BASENAME = "reading-list.sqlite";

this const name doesn't seem ideal (but I can guess what it is from the value ;) It might as well be inline though IMO.

@@ +11,5 @@
> +const SINGLETON_BASENAME = "reading-list.sqlite";
> +
> +const { classes: Cc, interfaces: Ci, utils: Cu } = Components;
> +
> +Cu.import("resource://gre/modules/Task.jsm");

Any particular reason this isn't lazy-loaded like the others below?

@@ +35,5 @@
> +
> +SQLiteStore.prototype = {
> +
> +  getNumItems: Task.async(function* (opts={}) {
> +    opts = clone(opts);

this clone seems unnecessary (unless executeCached mutates it, which would suck)

@@ +38,5 @@
> +  getNumItems: Task.async(function* (opts={}) {
> +    opts = clone(opts);
> +    let count = 0;
> +    let conn = yield this._connectionPromise;
> +    yield conn.executeCached(`

template strings! Nice :)

@@ +52,5 @@
> +    }
> +    let conn = yield this._connectionPromise;
> +    yield conn.executeCached(`
> +      SELECT * FROM items ${sqlFromOptions(opts)};
> +    `, opts, row => callback(itemFromRow(row)));

it would be perfect if the callback could return a promise like OS.File's forEach does. It might be as simple as appending promises to an array and yield Promise.all() on it?

@@ +55,5 @@
> +      SELECT * FROM items ${sqlFromOptions(opts)};
> +    `, opts, row => callback(itemFromRow(row)));
> +  }),
> +
> +  getManyItems: Task.async(function* (opts={}) {

These "many" functions seem strange as public functions - I guess I was expecting just forEach and a sommewhat generic search function (eg, I can imagine the sync engine wanting to specify "modified after some date") - but I'll defer to Blair there.

@@ +56,5 @@
> +    `, opts, row => callback(itemFromRow(row)));
> +  }),
> +
> +  getManyItems: Task.async(function* (opts={}) {
> +    opts = clone(opts);

this clone seems unnecessary

@@ +77,5 @@
> +      guid: guids,
> +    }));
> +  }),
> +
> +  getManyItemsByGUID: Task.async(function* (guids) {

"many by guid" certainly seems wrong - I'd expect a guid to be unique (and the schema seems to enforce it)

@@ +99,5 @@
> +  getOneItemByURL: Task.async(function* (url) {
> +    return (yield this.getManyItemsByURL([url]))[0] || null;
> +  }),
> +
> +  updateManyItems: Task.async(function* (items) {

I think the "many" here is jarring too - just "updateItems()" (and updateOneItem's one-liner could possibly also be left to the callers, but I don't really care about that)

::: browser/modules/test/xpcshell/test_ReadingList.js
@@ +27,5 @@
> +  // Make Sqlite.jsm dump its logs to the terminal so we can see its SQL.
> +  conn._connectionData._log.level = Log.Level.All;
> +  let appender = new Log.DumpAppender();
> +  appender.level = Log.Level.All;
> +  Log.repository.rootLogger.addAppender(appender);

this looks a little wrong for this test. Note that new Appenders are already Level.All, so I'd think something like:

conn._connectionData._log.level = Log.Level.All;
conn._connectionData._log.addAppender(new Log.DumpAppender())

is all you need (it looks like you might be seeing many more logs than you care about in this test)

@@ +72,5 @@
> +  let newTitle = "new title yo";
> +  newItems[0].title = newTitle;
> +  yield store.updateManyItems(newItems);
> +  items = yield store.getManyItems({
> +    limit: Infinity,

heh - such a simple way to get all items doesn't really qualify as "a bit of a pain to load all items into memory at once" ;)

::: toolkit/modules/Sqlite.jsm
@@ +769,5 @@
>      throw new Error("Sqlite.jsm has been shutdown. Cannot open connection to: " + options.path);
>    }
>  
>    // Retains absolute paths and normalizes relative as relative to profile.
> +  let path = OS.Constants.Path.profileDir ?

I'm a little confused why OS.Constants.Path.profileDir would be falsey and/or why this particular patch cares if it is!
Attachment #8566302 - Flags: feedback?(mhammond) → feedback+
(In reply to Mark Hammond [:markh] from comment #9)
> @@ +11,5 @@
> > +const SINGLETON_BASENAME = "reading-list.sqlite";
> > +
> > +const { classes: Cc, interfaces: Ci, utils: Cu } = Components;
> > +
> > +Cu.import("resource://gre/modules/Task.jsm");
> 
> Any particular reason this isn't lazy-loaded like the others below?

Task.async is used a bunch to define methods.

>> @@ +35,5 @@
> > +
> > +SQLiteStore.prototype = {
> > +
> > +  getNumItems: Task.async(function* (opts={}) {
> > +    opts = clone(opts);
> 
> this clone seems unnecessary (unless executeCached mutates it, which would
> suck)

I just clone them all to be consistent and defensive to future changes.  It's much easier and safer to adopt that policy than to think about every case.

> @@ +52,5 @@
> > +    }
> > +    let conn = yield this._connectionPromise;
> > +    yield conn.executeCached(`
> > +      SELECT * FROM items ${sqlFromOptions(opts)};
> > +    `, opts, row => callback(itemFromRow(row)));
> 
> it would be perfect if the callback could return a promise like OS.File's
> forEach does. It might be as simple as appending promises to an array and
> yield Promise.all() on it?

Oh, interesting, so the callback would not be called again until the promise that it previously returned was resolved?  Is that how it works?

I admit to not knowing much of the OS.File API, but OS.File.DirectoryIterator looks interesting.  Maybe my API should work more like it.  Instead of store.forEach, store.iterator, which has forEach, nextBatch, and next methods.

> @@ +55,5 @@
> > +      SELECT * FROM items ${sqlFromOptions(opts)};
> > +    `, opts, row => callback(itemFromRow(row)));
> > +  }),
> > +
> > +  getManyItems: Task.async(function* (opts={}) {
> 
> These "many" functions seem strange as public functions - I guess I was
> expecting just forEach and a sommewhat generic search function (eg, I can
> imagine the sync engine wanting to specify "modified after some date") - but
> I'll defer to Blair there.

getManyItems is the generic search function.  There's no way right now to do range queries like that though, but there should be.

> @@ +77,5 @@
> > +      guid: guids,
> > +    }));
> > +  }),
> > +
> > +  getManyItemsByGUID: Task.async(function* (guids) {
> 
> "many by guid" certainly seems wrong - I'd expect a guid to be unique (and
> the schema seems to enforce it)

You pass in an array of GUIDs, and it yields an array of items with those GUIDs.  Not sure whether that's useful in practice, but implementing it is no harder than implementing getOneItemByGUID (the latter calls the former in fact), which does seem useful.

> ::: browser/modules/test/xpcshell/test_ReadingList.js
> @@ +27,5 @@
> > +  // Make Sqlite.jsm dump its logs to the terminal so we can see its SQL.
> > +  conn._connectionData._log.level = Log.Level.All;
> > +  let appender = new Log.DumpAppender();
> > +  appender.level = Log.Level.All;
> > +  Log.repository.rootLogger.addAppender(appender);
> 
> this looks a little wrong for this test. Note that new Appenders are already
> Level.All, so I'd thing something like:
> 
> conn._connectionData._log.level = Log.Level.All;
> conn._connectionData._log.addAppender(new Log.DumpAppender())
> 
> is all you need (it looks like you might be seeing many more logs than you
> care about in this test)

Thanks, I'll try that.  I thought I did though and it didn't work.

> @@ +72,5 @@
> > +  let newTitle = "new title yo";
> > +  newItems[0].title = newTitle;
> > +  yield store.updateManyItems(newItems);
> > +  items = yield store.getManyItems({
> > +    limit: Infinity,
> 
> heh - such a simple way to get all items doesn't really qualify as "a bit of
> a pain to load all items into memory at once" ;)

Fair enough!  It's more like you have to think about what you're doing to do it.  I kind of think maybe there are no circumstances when you should want to load *everything*, so maybe I'll remove that.

> ::: toolkit/modules/Sqlite.jsm
> @@ +769,5 @@
> >      throw new Error("Sqlite.jsm has been shutdown. Cannot open connection to: " + options.path);
> >    }
> >  
> >    // Retains absolute paths and normalizes relative as relative to profile.
> > +  let path = OS.Constants.Path.profileDir ?
> 
> I'm a little confused why OS.Constants.Path.profileDir would be falsey
> and/or why this particular patch cares if it is!

It's undefined in xpcshell tests.  I could call do_get_profile in the test, but it's totally unnecessary to set up a profile directory just to load a DB that I need in the test.
Bug 512761 would be great and is clearly the best answer, but in the meantime I really would like mozIStorageRow.getColumnName.  I see that your Bugzilla status is currently sick :-(, but you're the best person to review this and if you are reading bugmail, please let me know if you'd like me to ask someone else.  I need to finish this bug by the end of the iteration.
Attachment #8563737 - Attachment is obsolete: true
Attachment #8566822 - Flags: review?(mak77)
Attached patch patch (obsolete) — Splinter Review
The API isn't perfect and we need to reconcile it with the syncer and front-end, but it's at least internally consistent I think.  I'd like to land it and iterate on it.  Unless it's like totally bad.

Except I'm not sure about URLs.  Mark, do you know if they're supposed to be unique?  Is that where conflicts between the server and client pivot (see last paragraph in comment 6)?  We should try to make the schema as right as we can before landing.

This assumes some of the directories created in the other bugs.  I need to update xpcshell.ini, moz.build, etc. before landing.  (I had been building this in browser/modules locally and running the test there.)
Attachment #8566302 - Attachment is obsolete: true
Attachment #8566302 - Flags: feedback?(bmcbride)
Attachment #8567407 - Flags: review?(mhammond)
(Sorry I couldn't get to this sooner, been battling a migraine)

(In reply to Drew Willcoxon :adw from comment #12)
> Except I'm not sure about URLs.  Mark, do you know if they're supposed to be
> unique?

Yes, URLs are meant to be unique.
(In reply to Drew Willcoxon :adw from comment #6)
> This kind of follows the API introduced in bug 1123517 but I didn't limit
> myself to it.
> 
> In particular, I made it a bit of a pain to load all items into memory at
> once.  That's a big difference from bug 1123517.

Good :) The mock prefs-based store in bug 1123517 was not built to be realistic - I just needed the stupidest thing I could do as quickly as possible, to build and test the UI parts and help flesh out some use-cases for an API.

> Query conditions/options: I kept it simple.  You pass one flat object in. 
> It can have the special properties "limit" (as mentioned), "sort", and
> "descending" (a boolean, to sort descending).  It can also have properties
> for each of the attributes of the article/item data type.  Each key-value
> pair is a predicate, so if you say `unread: true`, you'll get only unread
> items.

This is very nice :)

> * The sync cached/API that we need to update the Mac menu with apparently. 
> (Do we have to wait until it's opened to update it?  That would be the other
> way to fix that: update it when the top N items change in the store.)

Yes, that would also work.

Though I should note, the common use case will be to just fetch the first X items. So a cache will be of benefit anyway. A useful optimization around this would be to detect when it can fall back to using the cache for forEachItem (ie, the query matches what is used to build the cache).
Comment on attachment 8567407 [details] [diff] [review]
patch

Review of attachment 8567407 [details] [diff] [review]:
-----------------------------------------------------------------

Looked at this mostly from the point of view of using it. General notes (which can be followups):
* This would really benefit from some comments/documentation
* In the sidebar for v1, we'll have a "Show more" button. eg, I'd want to fetch 5 items, sorted by date, starting from the 10th item. Does SQLite support "LIMIT 10,5" syntax?
* The ReadingList.Item wrapper is quite useful. It'd be nice if only one Item wrapper ever existed for a given item, and it were a live representation (ie, always represents what is in the DB). I think that means SQLiteStore itself would have to hand out these wrappers, and cache them (WeakMap, keyed on guid?).

::: toolkit/modules/Sqlite.jsm
@@ +771,5 @@
>  
>    // Retains absolute paths and normalizes relative as relative to profile.
> +  let path = OS.Constants.Path.profileDir ?
> +             OS.Path.join(OS.Constants.Path.profileDir, options.path) :
> +             options.path;

I wonder if this would be better served by allowing options.absPath to explicitly specify an absolute path, rather than this magic behaviour.
Attachment #8567407 - Flags: feedback+
Mass change of ReadingList bugs, moving to their own component. Filter bugspam on the following quote:

“Reading is to the mind what exercise is to the body.”
― Joseph Addison
Component: General → Reading List
> > Except I'm not sure about URLs.  Mark, do you know if they're supposed to be
> > unique?
> 
> Yes, URLs are meant to be unique.

More thoroughly: both 'url' and 'resolved_url' are expected to be independently unique. 'resolved_url' can be null.

Upon uploading to the server, if a uniqueness constraint is violated, the server will reply with a 303 See Other, and the client is expected to merge accordingly.

See <https://github.com/mozilla-services/readinglist/wiki/API-Design-proposal#user-content-conflicts>.

The API doc doesn't make it clear, and perhaps the server doesn't do this (it should), but if you POST a new record with the same URL as a record that's already deleted, it will create a new record (effectively un-deleting the old one with a new GUID and wiped fields).

We'll test this and fix the server if it doesn't work!
(In reply to Blair McBride [:Unfocused] (I don't read bugmail - needinfo? me!) from comment #15)

> * In the sidebar for v1, we'll have a "Show more" button. eg, I'd want to
> fetch 5 items, sorted by date, starting from the 10th item. Does SQLite
> support "LIMIT 10,5" syntax?

Yes, SQLite supports LIMIT and OFFSET. You can also use appropriate sorting and <=, which is more stable when insertions occur.

The server also supports paginated fetches.
Comment on attachment 8567407 [details] [diff] [review]
patch

Review of attachment 8567407 [details] [diff] [review]:
-----------------------------------------------------------------

I guess I've got no real problem with iterating on this to get it usable, although I do think we should get the identified use-cases (by Blair) handled now (ie, "get next n items" and a weak-map or similar to ensure items are live-updated).  I also think we should work to make sure we are quite confident the schema is fairly close - eg, I think all the unique constraints should be set up and tested, and all currently identified indexes added - it doesn't make sense to add a new migration function in the very next iteration.  Also, as Blair also mentioned, I think some comments for the public functions are needed.

TBH, I'm a bit reluctant to say these can all be in followups as they seem fundamental to the module being ready for use and they don't seem that painful to fix - if it's possible to be done and review re-requested tomorrow we can still make the iteration deadline :)

::: browser/components/readinglist/SQLiteStore.jsm
@@ +43,5 @@
> +    opts = clone(opts);
> +    let conn = yield this._connectionPromise;
> +    yield conn.executeCached(`
> +      SELECT * FROM items ${sqlFromOptions(opts)};
> +    `, opts, row => callback(itemFromRow(row)));

I still think we will want to allow the callback to return a promise (but I guess this can be a followup when it's first needed)

@@ +143,5 @@
> +    limit: 20,
> +  },
> +
> +  refreshCache: Task.async(function* (opts=this.cacheOptions) {
> +    this.cache = yield this.items(opts);

I don't understand how this cache is used.  If it is for a possible future, I think we should remove it for a followup (eg, we'd want to be confident that the additional IO being done to keep the cache current is a net-win - best I can tell it is currently a total loss as the cached data is not actually used)

(Blair and I had a quick chat in IRC, and while he has some ideas for how a cache could be useful, he agreed that having ".cache" as part of the API probably isn't ideal, so I'm still inclined to think we should only implement this once we have an identified use-case and via an API where the cache becomes simply an implementation detail and not part of the API)

@@ +163,5 @@
> +    yield conn.setSchemaVersion(this._schemaVersion);
> +  }),
> +
> +  _migrateSchema0To1: Task.async(function* (conn) {
> +    //XXX url UNIQUE?  If not, need an index on it.

Given rnewman's comments, we probably want unique constraints on url and resolvedURL?  (IIUC, unique and null work as we need with sqlite?)  I'm also expecting to see some indexes here (eg, the cache code seems to always sort by addedOn)

@@ +205,5 @@
> +  }
> +  return item;
> +}
> +
> +function timestamp() {

it seems a little overkill to have this function given it is used only once.

@@ +218,5 @@
> +  // are server timestamps:
> +  // https://github.com/mozilla-services/readinglist/wiki/API-Design-proposal#data-model
> +  let now = timestamp();
> +  if (!("addedOn" in item)) {
> +    item.addedOn = now;

As above, why not drop the timestamp method and call Date.now() directly here, with the comment about the server?

@@ +234,5 @@
> +      sortDir = "DESC";
> +    }
> +    delete opts.descending;
> +  }
> +  if ("sort" in opts) {

This entire function is building an SQL string from externally supplied arguments, which sounds like a foot-gun.  While in theory we do trust the code supplying them, I'm not sure that is enough to mitigate the risks of building stuff this way.  It would be better if we can work out how to do this using params (it *looks* easy enough to do with positional args).

At the very least, if using params is problematic I think we need a second opinion on the safety here.

@@ +247,5 @@
> +    let where = ["WHERE"];
> +    for (let key of Object.keys(opts)) {
> +      if (Array.isArray(opts[key])) {
> +        // Assume opts[key] is an array of strings.
> +        let strs = opts[key].map(s => "'" + s.replace(/'/g, "''") + "'"); //"

this scares me a little - is there really no valid string that could defeat this? And why the strange trailing comment?

::: browser/components/readinglist/test/xpcshell/test_SQLiteStore.js
@@ +31,5 @@
> +  // Make Sqlite.jsm dump its logs to the terminal so we can see its SQL.
> +  conn._connectionData._log.level = Log.Level.All;
> +  let appender = new Log.DumpAppender();
> +  appender.level = Log.Level.All;
> +  Log.repository.rootLogger.addAppender(appender);

I really don't think we want a dumpappender on the root log - I mentioned that in my previous feedback - what problems did you have?

@@ +201,5 @@
> +    for (let prop in expectedItems[i]) {
> +      Assert.ok(prop in actualItems[i]);
> +      Assert.equal(actualItems[i][prop], expectedItems[i][prop]);
> +    }
> +  }

I think we need some tests for our constraints - eg, adding a duplicate GUID or URL

::: toolkit/modules/Sqlite.jsm
@@ +776,1 @@
>  

I'm still not convinced this is a worthwhile change for just this test and just to avoid a do_get_profile call, when no other xpcshell test using Sqlite.jsm felt it necessary.  I don't think this test is special enough to add these special cases (and similarly, I don't think Blair's idea of allowing an options.absPath just for this test is worthwhile, for the exact same reason)
(In reply to Mark Hammond [:markh] from comment #19)
> (Blair and I had a quick chat in IRC, and while he has some ideas for how a
> cache could be useful, he agreed that having ".cache" as part of the API
> probably isn't ideal, so I'm still inclined to think we should only
> implement this once we have an identified use-case and via an API where the
> cache becomes simply an implementation detail and not part of the API)

Heh, kinda. I think we could solve the bookmarks menu issue via either a sync API like .cache, or via keeping that list continually updated (eg, by refreshing it when we get a notification that something changed).

Though, thinking about that some more... if we were keeping a live list, the UI would essentially be re-implementing .cache anyway, but in a more complicated way. Which makes me think .cache is the right approach (albeit maybe with a different name - priorityItems or something?)
(In reply to Blair McBride [:Unfocused] (I don't read bugmail - needinfo? me!) from comment #20)
> Though, thinking about that some more... if we were keeping a live list, the
> UI would essentially be re-implementing .cache anyway, but in a more
> complicated way. Which makes me think .cache is the right approach (albeit
> maybe with a different name - priorityItems or something?)

I remain unconvinced :)  If this is purely for bookmarks, and only due to an issue on OSX, I still believe we would be better off with an observer that fires at relevant times and leave this "cache" to the external code in question.  It knows better what the "limit" should be and it knows the policy for ordering (eg, are we sure lastModified isn't the key we should be using?).  It also would own the policy of when to initialize the cache (which IIUC, isn't done here at all) - eg, I assume this patch would need to change so the cache is populated as the module is initialized, whereas the code working around the OSX issue might better decide to delay pre-populating the cache for a few seconds (or maybe to work-around the problem some other way, such as destroying-and-recreating the popup should it be opened before the cache is populated as a trade-off against the performance hit). And that code might decide to only perform this hackery on OSX, and only while there's no platform-level workaround.

This code's observer notification would only need to be a few lines long and IMO it would be a better separation of responsibilities, especially when it seems the module itself isn't able to leverage the cache itself and it also can't know what platforms or other considerations are necessary for the work-around to be made active.
(In reply to Mark Hammond [:markh] from comment #19)

> TBH, I'm a bit reluctant to say these can all be in followups as they seem
> fundamental to the module being ready for use and they don't seem that
> painful to fix - if it's possible to be done and review re-requested
> tomorrow we can still make the iteration deadline :)

Don't be a slave to nonsense arbitrary deadlines. Fight the power.

> Given rnewman's comments, we probably want unique constraints on url and
> resolvedURL?  (IIUC, unique and null work as we need with sqlite?)  I'm also
> expecting to see some indexes here (eg, the cache code seems to always sort
> by addedOn)

On Android these are not (yet) marked as UNIQUE. We have application-level logic to control uniqueness ("already exists?" is "matches either URL or RESOLVED_URL"), and moreover the server will return 303s whenever a dupe is uploaded.

> > +  // are server timestamps:
> > +  // https://github.com/mozilla-services/readinglist/wiki/API-Design-proposal#data-model
> > +  let now = timestamp();
> > +  if (!("addedOn" in item)) {
> > +    item.addedOn = now;

The only time it's acceptable to set addedOn is when this is the device adding the item. It's not the case that every row in the DB must have addedOn.

It looks like this method is used by updateItems (which is really UPDATE OR INSERT), so be careful here. The value -- including if it's null! -- shouldn't change after initial creation, with the possible exception of some 303 processing or downloaded changes.


> As above, why not drop the timestamp method and call Date.now() directly
> here, with a comment the comment about the server?

Usually the preferred way to do this is to make methods take a 'now' param:

  addSomething(foo, bar, now=Date.now()) {

-- the hook is to allow for tests and callers to control time, which makes the function purer and easier to test.
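For example (`markRead` here is hypothetical, shown only to illustrate the pattern):

```javascript
// The now-parameter pattern: callers and tests can inject a fixed
// timestamp instead of depending on the wall clock.
function markRead(item, deviceName, now = Date.now()) {
  item.markedReadBy = deviceName;
  item.markedReadOn = now;
  return item;
}

// Production call sites use the default:
//   markRead(item, "laptop");
// A test pins time so assertions are deterministic:
//   markRead(item, "laptop", 1424000000000);
```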


> This entire function is building an SQL string from externally supplied
> arguments, which sounds like a foot-gun.  While in theory we do trust the
> code supplying them, I'm not sure that is enough to mitigate the risks of
> building stuff this way.  It would be better if we can work out how to do
> this using params (it *looks* easy enough to do with positional args).

Either do it right:

https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/database/sqlite/SQLiteQueryBuilder.java

or do it with bind args, yeah.
(In reply to Mark Hammond [:markh] from comment #19)
> I'm also expecting to see some indexes here (eg, the cache code seems to
> always sort by addedOn)

You're right addedOn needs an index.  But for the others, UNIQUE automatically creates an index:

> In most cases, UNIQUE and PRIMARY KEY constraints are implemented
> by creating a unique index in the database. (The exceptions are
> INTEGER PRIMARY KEY and PRIMARY KEYs on WITHOUT ROWID tables.)
> Hence, the following schemas are logically equivalent:
> 
>     CREATE TABLE t1(a, b UNIQUE);
> 
>     CREATE TABLE t1(a, b PRIMARY KEY);
> 
>     CREATE TABLE t1(a, b);
>     CREATE UNIQUE INDEX t1b ON t1(b);

http://sqlite.org/lang_createtable.html

> @@ +234,5 @@
> > +      sortDir = "DESC";
> > +    }
> > +    delete opts.descending;
> > +  }
> > +  if ("sort" in opts) {
> 
> This entire function is building an SQL string from externally supplied
> arguments, which sounds like a foot-gun.  While in theory we do trust the
> code supplying them, I'm not sure that is enough to mitigate the risks of
> building stuff this way.  It would be better if we can work out how to do
> this using params (it *looks* easy enough to do with positional args).
> 
> At the very least, if using params is problematic I think we need a second
> opinion on the safety here.

It is using params actually -- this builds up a string of params, not literals.  e.g., it makes strings like "WHERE guid = :guid", not "WHERE guid = 'some-literal-guid'".  The one exception is for "IN" queries like "WHERE guid IN (<some guid list>)" -- the isArray thing below -- since I don't think it's possible to use params in that case.

> @@ +247,5 @@
> > +    let where = ["WHERE"];
> > +    for (let key of Object.keys(opts)) {
> > +      if (Array.isArray(opts[key])) {
> > +        // Assume opts[key] is an array of strings.
> > +        let strs = opts[key].map(s => "'" + s.replace(/'/g, "''") + "'"); //"
> 
> this scares me a little - is there really no valid string that could defeat
> this?

Yeah, GUIDs are the only strings this is used for, but you're right it's bad.  I'll try to find something else.  The point of this is to support looking up multiple items by GUID at once, but I'm not sure we even need that.
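One literal-free alternative is to generate numbered named parameters and bind the values (a sketch; `inClause` is an assumed helper name):

```javascript
// Build "guid IN (:guid_0,:guid_1,...)" plus a params object so the
// values are bound by the SQL engine rather than spliced into the SQL.
function inClause(column, values) {
  let params = {};
  let names = values.map((value, i) => {
    let name = `${column}_${i}`;
    params[name] = value;
    return `:${name}`;
  });
  return { sql: `${column} IN (${names.join(",")})`, params };
}

// e.g. inClause("guid", ["a", "b"]) yields
//   { sql: "guid IN (:guid_0,:guid_1)",
//     params: { guid_0: "a", guid_1: "b" } }
```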

> And why the strange trailing comment?

To calm my syntax highlighter. :-/

> ::: browser/components/readinglist/test/xpcshell/test_SQLiteStore.js
> @@ +31,5 @@
> > +  // Make Sqlite.jsm dump its logs to the terminal so we can see its SQL.
> > +  conn._connectionData._log.level = Log.Level.All;
> > +  let appender = new Log.DumpAppender();
> > +  appender.level = Log.Level.All;
> > +  Log.repository.rootLogger.addAppender(appender);
> 
> I really don't think we want a dumpappender on the root log - I mentioned
> that in my previous feedback - what problems did you have?

It just didn't work, nothing was logged.  No errors either, silent.  I don't really have time to wade through abstraction after abstraction trying to figure out why it didn't work so I went with something that did.  I'll just remove it from the final version.

> ::: toolkit/modules/Sqlite.jsm
> @@ +776,1 @@
> >  
> 
> I'm still not convinced this is a worthwhile change for just this test and
> just to avoid a do_get_profile call, when no other xpcshell test using
> Sqlite.jsm felt it necessary.  I don't think this test is special enough to
> add these special cases (and similarly, I don't think we should use Blair's
> idea of allowing an options.absPath just for this test is worthwhile for the
> exact same reason)

I don't get it.  If you pass in an absolute path *and* there is a profile directory, OS.Path.join(OS.Constants.Path.profileDir, options.path) returns the absolute path you passed in, so you get a SQLite DB at that path.  If you pass in an absolute path but there is no profile directory, OS.Path.join throws an error, so you get nothing.  Why wouldn't we want to fix that by returning the same absolute path?  It's not magic.

About the other tests creating a profile directory just to load test files -- they do and that's just dumb.
(In reply to Richard Newman [:rnewman] from comment #22)
> Don't be a slave to nonsense arbitrary deadlines. Fight the power.

Uh, easy for you to say. :-(
Attached patch strawman 2 (obsolete) — Splinter Review
Please see if this looks any better.

* Added comments.

* Removed the addedOn default value.  Let the syncer or front-end set it, and let storage be dumb.

* ReadingListItem with live update.  Unfortunately items have to be manually "disconnected" from their backing store when you're done with them.  I can't see any way to use WeakMap or WeakSet to make this nicer.  It's a WeakMap's *keys* that are weak, not its values. :-(


* Added ReadingListIterator.  An easy way, hopefully, to pick up where you left off.  Looking at the test should give you some idea about how to use it if it's not clear.

* Removed a bunch of methods, made everything ReadingListItem-based.

* Removed the cache.
Attachment #8567407 - Attachment is obsolete: true
Attachment #8567407 - Flags: review?(mhammond)
Attachment #8568323 - Flags: feedback?(mhammond)
Comment on attachment 8568323 [details] [diff] [review]
strawman 2

Please see the previous comment.
Attachment #8568323 - Flags: feedback?(bmcbride)
Comment on attachment 8568323 [details] [diff] [review]
strawman 2

Review of attachment 8568323 [details] [diff] [review]:
-----------------------------------------------------------------

A quick look, but I think this looks good.  I'm not quite convinced of the usefulness of "invalidating iterators" - *any* store operation seems to make all iterators invalid, which basically means it's *never* safe to hang on to them without handling the invalidation exception, and thus they need a fallback to work out how to fetch another chunk - and if they need that fallback they may as well just use that fallback all the time.

::: browser/components/readinglist/ReadingList.jsm
@@ +166,5 @@
> +   *        enumerate them all.
> +   * @return Promise<null> Resolved when the enumeration completes *and* the
> +   *         last promise returned by the callback is resolved.
> +   */
> +  forEach: Task.async(function* (callback, count=-1) {

It seems the iterator could be made invalid while it is iterating, so maybe there should also be an _ensureValid() call in the loop?  I guess it depends on what the implications are of the iterator going bad - but ISTM it can't be "less bad" for it to go invalid during the loop than before the next loop.

@@ +190,5 @@
> +    let opts = clone(this.options);
> +    opts.offset = this.index;
> +    opts.limit = count;
> +    let items = yield this.store.items(opts);
> +    this.index += items.length;

and I guess the same issue applies here - if the iterator became invalid while .items() is resolving we probably want to throw instead of returning items?

::: browser/components/readinglist/SQLiteStore.jsm
@@ +97,5 @@
> +    yield conn.executeCached(`
> +      SELECT * FROM items ${sqlFromOptions(opts)};
> +    `, opts, row => {
> +      promiseChain.then(() => {
> +        let promise = callback(this._newItem(row));

So I think this wants to change to something like _getOrCreateItem(row), which either finds an existing item in the map or creates a new one, then updates the item to reflect the current row value.

Another nice thing we could consider in a followup is to allow "event handlers" on an item, so a consumer could do something like:

item.on("deleted", () => { do_something_to_reflect_deletion() });
item.on("updated", () => { do_something_to_reflect_update() });
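That follow-up emitter could be quite small (a sketch only — `ItemEvents` and its method names are assumptions, not the patch's API):

```javascript
// Minimal per-item event emitter of the shape suggested above.
class ItemEvents {
  constructor() {
    this._handlers = new Map(); // event name -> Set of callbacks
  }
  on(event, callback) {
    if (!this._handlers.has(event)) {
      this._handlers.set(event, new Set());
    }
    this._handlers.get(event).add(callback);
  }
  off(event, callback) {
    let set = this._handlers.get(event);
    if (set) {
      set.delete(callback);
    }
  }
  // The store would call this when it mutates the item.
  emit(event, ...args) {
    for (let callback of this._handlers.get(event) || []) {
      callback(...args);
    }
  }
}
```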

@@ +110,5 @@
> +  /**
> +   * Adds an item to the store if it's new, and otherwise updates its
> +   * properties in the store.  The item may have as few or many properties that
> +   * the caller wants to set; only the properties that are present are updated.
> +   * The item must have a GUID, however. //XXXadw or URL?

I'd think guid is fine, and some "resolution" process can mark one or another as deleted - which in theory should only happen after a sync.  OTOH, I guess multiple maps (eg, one mapping URL) might also make sense so the client could do some sanity checking and not allow a new item to be created when a matching URL already exists.  Or something :)

@@ +119,5 @@
> +  updateItem: Task.async(function* (item) {
> +    //XXXadw
> +    // (1) INSERT the item if it's new or UPDATE it if it's not.
> +    // (2) Find all items in the this._itemsByGUID map and update their
> +    //     properties.

I think that if we make the items a singleton we could skip (2) and don't need to do (4) as the item must already be connected.

@@ +158,5 @@
> +   * again.
> +   */
> +  destroy: Task.async(function* (guid) {
> +    let conn = yield this._connectionPromise;
> +    yield conn.close();

Maybe set this._connectionPromise = Promise.reject(...);?

@@ +170,5 @@
> +   */
> +  _newItem(row) {
> +    let item = new ReadingList.Item(propertiesFromRow(row));
> +    item.store = this;
> +    this._itemsByGUID.add(item);

I think you want .set(item.guid, item) here, right?  Also, it appears you are setting this up for an item to be a singleton (ie, there's only ever one item alive with a particular guid), which I think makes sense but (a) means an item.close() isn't going to work correctly and (b) I think the caller of _newItem will need to change to find an existing item and update/return that.

As discussed briefly on IRC, I think (a) can be fixed by using Ci.getWeakReference() and for (b) see below.

@@ +267,5 @@
> +    let where = ["WHERE"];
> +    for (let key of Object.keys(opts)) {
> +      if (Array.isArray(opts[key])) {
> +        // Assume opts[key] is an array of strings.
> +        let strs = opts[key].map(s => "'" + s.replace(/'/g, "''") + "'"); //"

same comment as last time here :)

::: browser/components/readinglist/test/xpcshell/test_SQLiteStore.js
@@ +36,5 @@
> +  Log.repository.rootLogger.addAppender(appender);
> +
> +  gItems = [];
> +  for (let i = 0; i < 3; i++) {
> +    gItems.push(new ReadingList.Item({

If we go the singleton route, then we won't want to expose ReadingList.Item() in that way - we'd need a "factory" type function that takes a URL (and/or a GUID I guess) and returns the Item() object.  A consumer of this must know the URL of the item they are going to create.

::: toolkit/modules/Sqlite.jsm
@@ +771,5 @@
>  
>    // Retains absolute paths and normalizes relative as relative to profile.
> +  let path = OS.Constants.Path.profileDir ?
> +             OS.Path.join(OS.Constants.Path.profileDir, options.path) :
> +             options.path;

If we insist this method must handle no profile dir just for this one test, I think we need to do what Unfocused suggested and use something like options.absPath, or insist that options.path is already absolute - we don't want someone to pass a relative path and have the actual location be indeterminate.
Attachment #8568323 - Flags: feedback?(mhammond) → feedback+
Iteration: 38.3 - 23 Feb → 39.1 - 9 Mar
(In reply to Mark Hammond [:markh] from comment #27)
> A quick look, but I think this looks good.  I'm not quite convinced of the
> usefulness of "invalidating iterators" - *any* store operation seems to make
> all iterators invalid

Only the mutating operations.

An iterator can't become invalid while it's querying the DB.  SQLite serializes queries so no two queries will run at the same time.  It won't be the case that while an iterator's SELECT query is in process some other DELETE or INSERT or UPDATE could run at the same time.  That's why an iterator can only become invalid in between calls you make to it.

However, the thing where the callback can return a promise to block the callback from being called again -- that complicates the picture because now the SELECT statement might be finished but your callback hasn't been called for all the rows yet, and the promise of the iterator method you called hasn't been resolved yet.  I think the caller is on the hook for that, though.  It also makes me wonder whether that whole idea is not so great.

As for what to do about it when an iterator becomes invalid, well, I don't think the storage can say.  What I was picturing was that the caller would check in between calls, either using the `invalid` property or by try-catching.  If the iterator is invalid, then the caller starts over whatever it was doing from the beginning.

> So I think this wants to change to something like _getOrCreateItem(row),
> which either finds an existing item in the map or creates a new one, then
> updates the item to reflect the current row value.

Yeah, that makes sense.

> Another nice thing we could consider in a followup is to allow "event
> handlers" on an item, so a consumer could do something like:
> 
> item.on("deleted", () => { do_something_to_reflect_deletion() });
> item.on("updated", () => { do_something_to_reflect_update() });

Yeah, I was wondering about that too -- see one of the XXX comments.

> @@ +158,5 @@
> > +   * again.
> > +   */
> > +  destroy: Task.async(function* (guid) {
> > +    let conn = yield this._connectionPromise;
> > +    yield conn.close();
> 
> Maybe set this._connectionPromise = Promise.reject(...);?

Good idea.

> As discussed briefly on IRC, I think (a) can be fixed by using
> Ci.getWeakReference()

I missed that conversation -- I forgot about Cu.getWeakReference!  That might work, let me try it.
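The shape that approach might take (a sketch only — Gecko chrome code with assumed names, not runnable outside Firefox and not the patch's actual implementation):

```javascript
// Sketch: hold weak references so the GUID map never keeps items alive
// on its own.  Cu.getWeakReference(obj).get() returns the object, or
// null once it has been garbage collected.
const { utils: Cu } = Components;

let itemWeakRefs = new Map(); // guid -> weak reference wrapper

function cacheItem(item) {
  itemWeakRefs.set(item.guid, Cu.getWeakReference(item));
}

function getCachedItem(guid) {
  let ref = itemWeakRefs.get(guid);
  let item = ref ? ref.get() : null;
  if (!item) {
    // The item was collected (or never cached); drop the stale entry.
    itemWeakRefs.delete(guid);
  }
  return item;
}
```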

> ::: browser/components/readinglist/test/xpcshell/test_SQLiteStore.js
> @@ +36,5 @@
> > +  Log.repository.rootLogger.addAppender(appender);
> > +
> > +  gItems = [];
> > +  for (let i = 0; i < 3; i++) {
> > +    gItems.push(new ReadingList.Item({
> 
> If we go the singleton route, then we won't want to expose
> ReadingList.Item() in that way - we'd need a "factory" type function that
> takes a URL (and/or a GUID I guess) and returns a the Item() object.  A
> consumer of this must know the URL of the item they are going to create.

I was thinking that when the syncer pulls from the server, it would create items using the Item constructor that are disconnected from the store.  Then it would call the update method to update the store with the items, which would connect the items.  The server may have items that are not present in the store, so it doesn't make sense for the syncer to go ask the store to vend items that it may not have.

I'm still fuzzy about how the interface between the syncer and the store will work, though.

> ::: toolkit/modules/Sqlite.jsm
> @@ +771,5 @@
> >  
> >    // Retains absolute paths and normalizes relative as relative to profile.
> > +  let path = OS.Constants.Path.profileDir ?
> > +             OS.Path.join(OS.Constants.Path.profileDir, options.path) :
> > +             options.path;
> 
> If we insist this method must handle no profile dir just for this one test,
> I think we need to do what Unfocused suggested and use something like
> options.absPath, or insist that options.path is already absolute - we don't
> want someone to pass a relative path and have the actual location be
> indeterminate.

Sigh, it's not just for this one test.  It's a bug that this Sqlite.jsm method blows up when the profile directory doesn't exist.  Indeed it doesn't always exist, so do something reasonable when it doesn't.

I actually don't think this method should be assuming the caller wants to use the profile directory at all.  It should take an absolute path, period.  Is that what you're suggesting?
The syncer might work like this:

let itemsFromServer = <array of simple JS objects>;
let itemsFromServerByGUID = itemsFromServer.reduce((map, item) => {
  map.set(item.guid, item);
  return map;
}, new Map());
let guidsFromServer = [...itemsFromServerByGUID.keys()];

// Reconcile server items that already exist -- by GUID -- in the store.
let guidsInStore = new Set();
store.forEachItem(itemFromStore => {
  guidsInStore.add(itemFromStore.guid);
  let itemFromServer = itemsFromServerByGUID.get(itemFromStore.guid);
  // Update itemFromServer by setting its properties.
}, { guid: guidsFromServer });

// Reconcile server items that already exist -- by URL -- in the store.
let newItems = itemsFromServer.filter(item => !guidsInStore.has(item.guid));
let newItemURLs = newItems.map(i => i.url);
store.forEachItem(itemFromStore => {
  guidsInStore.add(itemFromStore.guid);
  let itemFromServer = itemsFromServerByGUID.get(itemFromStore.guid);
  // something something something
}, { url: newItemURLs });

// Finally, add new items to the store -- items whose GUIDs and URLs aren't
// already in the store.
newItems = itemsFromServer.filter(item => !guidsInStore.has(item.guid));
store.addItems(newItems);

-------------

The store is dumb and all the reconciliation smarts are in the syncer, using simple methods on the store.  Instead of an update method that both updates and adds items, the store has one method for each.
Comment on attachment 8566822 [details] [diff] [review]
mozIStorageRow.getColumnName with test

Think I'll need the column names outside of a SQL context after all.  Thanks for considering it anyway Marco!
Attachment #8566822 - Attachment is obsolete: true
Attachment #8566822 - Flags: review?(mak77)
(In reply to Drew Willcoxon :adw from comment #28)
> (In reply to Mark Hammond [:markh] from comment #27)
> > A quick look, but I think this looks good.  I'm not quite convinced of the
> > usefulness of "invalidating iterators" - *any* store operation seems to make
> > all iterators invalid
> 
> Only the mutating operations.

Yeah, that's what I meant.  Eg, a sync seems likely to invalidate all iterators.  My point is that the calling code becomes more complex as it needs to handle the 2 cases of "the previous iterator is OK" and "the previous iterator must be thrown away and use a fallback". I'm wondering if we might as well force them to always use that fallback.

> As for what to do about it when an iterator becomes invalid, well, I don't
> think the storage can say.  What I was picturing was that the caller would
> check in between calls, either using the `invalid` property or by
> try-catching.  If the iterator is invalid, then the caller starts over
> whatever is was doing from the beginning.

I'm not sure it will just start again - eg, one use-case Blair mentioned was a "fetch 5 more" button.  I'm assuming the intent was the code could hold the iterator and just grab the next 5 there - unless the iterator is invalid, so it needs to re-create the query and skip some to get back to the same point (which isn't "start over", it's explicitly "do something different").  It is exactly this scenario that I believe becomes complex - 2 ways to "get next 5" and it being completely unpredictable which one will actually be used.  There might as well only be one path here that is always used.

> > ::: toolkit/modules/Sqlite.jsm
> > @@ +771,5 @@
> > >  
> > >    // Retains absolute paths and normalizes relative as relative to profile.
> > > +  let path = OS.Constants.Path.profileDir ?
> > > +             OS.Path.join(OS.Constants.Path.profileDir, options.path) :
> > > +             options.path;
> > 
> > If we insist this method must handle no profile dir just for this one test,
> > I think we need to do what Unfocused suggested and use something like
> > options.absPath, or insist that options.path is already absolute - we don't
> > want someone to pass a relative path and have the actual location be
> > indeterminate.
> 
> Sigh, it's not just for this one test.  It's a bug that this Sqlite.jsm
> method blows up when the profile directory doesn't exist.

So you say - is there a bug on file?  I haven't seen it come up before and other test authors coped.  Are you simply trying to save releng resources?  I'd argue it's a bug that it allows an absolute path at all (ie, what's the use-case for opening sqlite files outside the profile directory?  I'd guess an edge-case such as profile migration etc, but that's it).

> Indeed it doesn't always exist, so do something reasonable when it doesn't.

What's the use-case for this?  I don't think xpcshell tests qualify - for all intents and purposes, in all actual products that use this module, that dir *must* exist.

> I actually don't think this method should be assuming the caller wants to
> use the profile directory at all.  It should take an absolute path, period. 
> Is that what you're suggesting?

That would be preferable to allowing it to silently reference an indeterminate directory, but IMO it would be preferable for it to *insist* it be in the profile directory and not support abspaths at all, but as above, I could imagine it might break profile migration or other edge-cases where there really is a requirement to reach outside the current profile dir.
Okie dokie, white flag, do_get_profile() it is.

(In reply to Mark Hammond [:markh] from comment #31)
> I'm not sure it will just start again - eg, one use-case Blair mentioned was
> a "fetch 5 more" button.  I'm assuming the intent was the code could hold
> the iterator and just grab the next 5 there - unless the iterator is
> invalid, so it needs to re-create the query and skip some to get back to the
> same point (which isn't "start over", it's explicitly "do something
> different").  It is exactly this scenario that I believe becomes complex - 2
> ways to "get next 5" and it being completely unpredictable which one will
> actually be used.  There might as well only be one path here that is always
> used.

That's the case I was thinking of.  The consumer wouldn't skip back to some point, it would start over.  As in, recreate whatever view it's showing.
I think we may need some kind of notification API where the store can tell observers that its changed -- a single item or many items, and consumers could update themselves accordingly.
(In reply to Drew Willcoxon :adw from comment #33)
> I think we may need some kind of notification API where the store can tell
> observers that its changed -- a single item or many items, and consumers
> could update themselves accordingly.

Yeah, I agree - and wonder if we can just rely on this instead of invalidating iterators, and make the behaviour undefined if they continue to use the iterator?

Eg, what harm would we expect in that "fetch 5 more" use-case if a write happened?  If the data is simply going to be stale - but just as stale as if the write happened immediately *after* the iteration - then I doubt it would be that bad.  If the code really insists on having up-to-date data, it could listen for these notifications, decide if it is impacted and drop the iterators itself.

But maybe I'm underestimating the problems in such cases?
You can end up with duplicate entries in the UI, or missed entries, for example.

Say you've already updated your UI with the first 5 items yielded by the iterator, and you keep the iterator around so you can get the next 5.  Meanwhile, someone deleted the first item.  So when you get your next 5, you're starting at index 5, which used to be index 6, so your updated UI will be missing one item.  Or if someone added a new first item instead, index 4 is now at index 5 so your UI will have a duplicate entry.

Maybe in those cases the iterator could be smart enough to adjust its index.  Your existing UI is out-of-date, so what.  But what if someone changes the properties of a bunch of items that the iterator is sorting on?

Maybe you could argue that we should try to restrict the cases where an iterator is invalidated.  An UPDATE mutation shouldn't affect an iterator -- but only if the update didn't change a property that it's sorting on.  Or an insert "after" the iterator shouldn't invalidate it.  But I think implementing that would be kind of complex and probably unnecessary if we had a mutation notification mechanism.
(In reply to Drew Willcoxon :adw from comment #35)
> You can end up with duplicate entries in the UI, or missed entries, for
> example.

Yeah, fair enough. It sounds to me like we simply don't support "long lived" iterators - ie, iterators can't be used to solve this use-case.

OTOH though, it sounds like the same underlying problem could happen even with short-lived iterators - being promise-based, there's no reason why the same problems can't happen when doing the initial iteration, so even those first 5 items could end up with duplicated items, for example.

IOW, it sounds a little like these iterators are always going to be a foot-gun unless they somehow "snapshot".
(In reply to Mark Hammond [:markh] from comment #36)
> OTOH though, it sounds like the same underlying problem could happen even
> with short-lived iterators - being promise-based, there's no reason why the
> same problems can't happen when doing the initial iteration, so even those
> first 5 items could end up with duplicated items, for example.

No, the storage fetch kicked off by a single call to an iterator method is "atomic," even though there may be lots of time in between the calls to your callback because your callback is returning promises.  So the set of items passed to your callback will be a snapshot.  They may be out of date by the time some of the calls to your callback happen, but that's your problem.  You're the one returning promises. :-)
(still not sure the idea of allowing the callback to return promises is good)
(In reply to Drew Willcoxon :adw from comment #38)
> (still not sure the idea of allowing the callback to return promises is good)

In that case we should avoid that in the first iteration - it's easier to add it if something turns out to really want it than to remove it later :)
this is a generic needinfo to take a look at this, not blocking you, if I have comments I can bring them up later.

since I saw some discussion related to LIMIT and OFFSET, I would like to point out something (maybe not so much) obvious. It sucks from a performance point of view.

Due to how a b-tree works, when you OFFSET, SQLite won't skip N entries, it will instead FETCH them. Basically if you offset 10 and limit 10, it will FETCH 20 entries and then discard the first 10. This is common to many db engines.

That means, when performance matters, it's better to fetch once and then offset into the results in a memory cache than to OFFSET directly in the SELECT (since then you would fetch the same items multiple times).
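Concretely, the in-memory approach amounts to one SELECT followed by cheap slices (a sketch; `makePager` is an assumed helper, and `allRows` would be the result of a single fetch):

```javascript
// Page in memory over one fetched result set instead of re-querying
// with LIMIT/OFFSET, which makes SQLite walk and discard the skipped
// rows on every call.
function makePager(allRows, pageSize) {
  let index = 0;
  return function nextPage() {
    let page = allRows.slice(index, index + pageSize);
    index += page.length;
    return page;
  };
}

// One SELECT fetches the rows once; each "fetch 5 more" is then an
// array slice rather than another query.
```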
Flags: needinfo?(mak77)
(In reply to Drew Willcoxon :adw from comment #7)
> (Wish it were possible to flag two people when posting a patch!)

You can comma separate multiple usernames in the feedback?/review? requestee fields.
Attached patch new patch (obsolete) — Splinter Review
Attachment #8568323 - Attachment is obsolete: true
Attachment #8568323 - Flags: feedback?(bmcbride)
Attachment #8569592 - Flags: review?(mhammond)
Comment on attachment 8569592 [details] [diff] [review]
new patch

Review of attachment 8569592 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/SQLiteStore.jsm
@@ +28,5 @@
> +this.SQLiteStore = function SQLiteStore(pathRelativeToProfileDir) {
> +  this.pathRelativeToProfileDir = pathRelativeToProfileDir;
> +  this._connectionPromise = Task.spawn(function* () {
> +    let conn = yield Sqlite.openConnection({
> +      path: pathRelativeToProfileDir,

please pass sharedMemoryCache: false option, it's usually slightly faster, especially if in future you add a concurrent connection (I said many times the default should be this one in Sqlite.jsm, but nobody listened to me)

note you are not picking a journal mode, so you are defaulting to TRUNCATE+normal, you might want to change to WAL+normal (Sqlite.jsm doesn't make a choice for you). If you do so, I suggest setting also PRAGMA journal_size_limit.

you might also want to try PRAGMA locking_mode = EXCLUSIVE, for additional perf wins. It should work fine for your use-case.
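Putting those suggestions together, the connection setup might look roughly like this (a sketch only — it belongs inside a Task.spawn/Task.async body, and the journal_size_limit value is just an example; Places uses 3*512 KiB):

```javascript
// Inside a Task.spawn(function* () { ... }) body:
let conn = yield Sqlite.openConnection({
  path: pathRelativeToProfileDir,
  sharedMemoryCache: false, // private page cache, per review
});
// WAL + normal instead of the TRUNCATE+normal default, with a bounded
// journal and exclusive locking for this single-connection use-case.
yield conn.execute("PRAGMA journal_mode = WAL");
yield conn.execute("PRAGMA synchronous = NORMAL");
yield conn.execute("PRAGMA journal_size_limit = 1572864"); // 3 * 512 KiB
yield conn.execute("PRAGMA locking_mode = EXCLUSIVE");
```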

@@ +146,5 @@
> +
> +  _migrateSchema0To1: Task.async(function* (conn) {
> +    yield conn.execute(`
> +      CREATE TABLE items (
> +        id INTEGER PRIMARY KEY AUTOINCREMENT,

do you actually use the id, or plan to use it in future (for a relation table, for example)? Cause otherwise you might want to make guid the PRIMARY KEY and use WITHOUT ROWID. Note the benefit is low, you just save some space not storing ids, for small tables it's likely ignorable.

@@ +151,5 @@
> +        guid TEXT UNIQUE,
> +        lastModified INTEGER,
> +        url TEXT UNIQUE,
> +        title TEXT,
> +        resolvedURL TEXT UNIQUE,

url and resolvedURL indices here (due to unique) will take up quite some space.
Provided we expect fewer than 5000 urls it might not be a big deal, otherwise using a unique hash for equality checks would be better.

@@ +164,5 @@
> +        addedOn INTEGER,
> +        storedOn INTEGER,
> +        markedReadBy TEXT,
> +        markedReadOn INTEGER,
> +        readPosition INTEGER

many of these would benefit from a NOT NULL (guid, lastModified, url, resolvedURL, status, unread, probably others) and some from a DEFAULT, so you can avoid inserting values when they are the common value.
constraints are usually good, since they prevent developer mistakes, so it's better not to avoid them.
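A constrained version of the schema, folding in the earlier WITHOUT ROWID suggestion, might look like this (a sketch with an abridged column set; the exact NOT NULL / DEFAULT choices were still being discussed in this bug):

```javascript
// Sketch only: guid as PRIMARY KEY (short TEXT, so an acceptable index)
// with WITHOUT ROWID, plus NOT NULL / DEFAULT constraints per review.
yield conn.execute(`
  CREATE TABLE items (
    guid TEXT PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    resolvedURL TEXT UNIQUE,      -- nullable; UNIQUE permits NULLs
    lastModified INTEGER NOT NULL,
    title TEXT,
    status INTEGER NOT NULL DEFAULT 0,
    unread INTEGER NOT NULL DEFAULT 1,
    addedOn INTEGER               -- nullable: only set by the adding device
  ) WITHOUT ROWID;
`);
```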

@@ +236,5 @@
> +    let whereTerms = [];
> +    for (let key of Object.keys(opts)) {
> +      if (Array.isArray(opts[key])) {
> +        // Convert arrays to IN expressions.  e.g., { guid: ['a', 'b', 'c'] }
> +        // becomes "guid IN (SELECT :guid_0, :guid_1, :guid_2)".  The guid_i

for a simple IN (list_of_values) you don't need a SELECT
Flags: needinfo?(mak77)
Thanks Marco!

(In reply to Marco Bonardo [::mak] -- catching up, delayed answers -- from comment #43)
> please pass sharedMemoryCache: false option

Will do.

> note you are not picking a journal mode, so you are defaulting to
> TRUNCATE+normal, you might want to change to WAL+normal (Sqlite.jsm doesn't
> make a choice for you). If you do so, I suggest setting also PRAGMA
> journal_size_limit.

Do you have an idea of what journal_size_limit should be?  I see places/Database sets it to 3*512 KiB.

> you might also want to try PRAGMA locking_mode = EXCLUSIVE, for additional
> perf wins. It should work fine for your use-case.

OK.

> > +  _migrateSchema0To1: Task.async(function* (conn) {
> > +    yield conn.execute(`
> > +      CREATE TABLE items (
> > +        id INTEGER PRIMARY KEY AUTOINCREMENT,
> 
> do you actually use the id, or plan to use it in future (for a relation
> table, for example)? Cause otherwise you might want to make guid the PRIMARY
> KEY and use WITHOUT ROWID. Note the benefit is low, you just save some space
> not storing ids, for small tables it's likely ignorable.

Yeah, I definitely thought about making guid the primary key.  I don't use the id, and I don't have plans to have any other tables right now, but who knows?

In the breakdown bug you said, "think thrice before adding an index on a TEXT field," which is why I kept id.  But I forgot you followed that up with "(unless... the text is expected to be less than 50 chars)", which guid is.  So I should probably make guid the primary key.  (I applied this advice inconsistently because when it came to URLs -- below -- I ignored it... :-/)

> @@ +151,5 @@
> > +        guid TEXT UNIQUE,
> > +        lastModified INTEGER,
> > +        url TEXT UNIQUE,
> > +        title TEXT,
> > +        resolvedURL TEXT UNIQUE,
> 
> url and resolvedURL indices here (due to unique) will take up quite some
> space.
> Provided we expect fewer than 5000 urls it might not be a big deal,
> otherwise using a unique hash for equality checks would be better.

Well, not constraining ourselves to a small number of items was central to my reasoning for choosing SQLite.  Could you describe what you mean by using a hash?  Adding another column that stores the hash of the URL with a UNIQUE index on it instead?  Any recommendations for the hash function I should use?  Should I hash in JS and INSERT the hashed value, or write a SQL function and do it in SQL, or does it not matter much?

> @@ +164,5 @@
> > +        addedOn INTEGER,
> > +        storedOn INTEGER,
> > +        markedReadBy TEXT,
> > +        markedReadOn INTEGER,
> > +        readPosition INTEGER
> 
> many of these would benefit of a NOT NULL (guid, lastModified, url,
> resolvedURL, status, unread, probably others) and some of a DEFAULT, so you
> can avoid inserting values when they are the common value.
> constraints are usually good, since they prevent developer mistakes, so
> better not avoiding them.

Hmm, I'm not sure about this.  NOT NULL on the required columns definitely makes sense: at least guid and url.  I should do that.  Not sure which others are actually required, and which others still could have null values that are interpreted by the JS as some other value -- e.g., a null unread is falsey, a null status is 0.

But I should probably play it safe -- NOT NULL and defaults shouldn't hurt -- and just do what the Android schema does: http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/BrowserDatabaseHelper.java#321

> > +        // Convert arrays to IN expressions.  e.g., { guid: ['a', 'b', 'c'] }
> > +        // becomes "guid IN (SELECT :guid_0, :guid_1, :guid_2)".  The guid_i
> 
> for a simple IN (list_of_values) you don't need a SELECT

Oh, so "guid IN (:guid_0, :guid_1, :guid_2)" is valid SQL?  It's been a while since I've written any...  I'll check and fix it.
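For illustration, a sketch of building such an IN expression directly, without the SELECT (the function and binding names are made up here, not the patch's actual code):

```javascript
// Convert an array-valued option into a bound IN expression, e.g.
// inClause("guid", ["a", "b", "c"]) -> "guid IN (:guid_0, :guid_1, :guid_2)"
// plus the matching params object.
function inClause(column, values) {
  let params = {};
  let names = values.map((value, i) => {
    params[`${column}_${i}`] = value;
    return `:${column}_${i}`;
  });
  return { sql: `${column} IN (${names.join(", ")})`, params };
}
```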
Flags: needinfo?(mak77)
I think I'm going to want to hold off landing this until I see how the schema may need to be adjusted in the face of syncing (bug 1131416).  Mobile's schema has fields related to syncing.  But there's still plenty to review, so leaving the r?.
Marco, WITHOUT ROWID sounded good until I read this part in http://sqlite.org/withoutrowid.html:

> WITHOUT ROWID tables work best when individual rows are not too large. A good
> rule-of-thumb is that the average size of a single row in a WITHOUT ROWID
> table should be less than about 1/20th the size of a database page. That
> means that rows should not contain more than about 50 bytes each for a 1KiB
> page size or about 200 bytes each for 4KiB page size. WITHOUT ROWID tables
> will work (in the sense that they get the correct answer) for arbitrarily
> large rows - up to 2GB in size - but traditional rowid tables tend to work
> faster for large row sizes.

It looks like our default page size is 32768 bytes:

http://mxr.mozilla.org/mozilla-central/search?string=PREF_TS_PAGESIZE_DEFAULT
http://sqlite.org/pragma.html#pragma_page_size

So that's 1638 bytes per row.  I'm storing 200-word excerpts in each row, so that's not great.  Is it worth trading off some performance for some space savings?  Is it worth storing excerpts in a separate table?  Keep in mind that the UI calls for displaying excerpts alongside each item in some cases.
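The 1638-byte figure is just the quoted rule of thumb applied to our page size:

```javascript
// Worked check of the WITHOUT ROWID rule of thumb quoted above.
const pageSize = 32768;                  // our default page size, in bytes
const ruleOfThumbRowMax = pageSize / 20; // 1638.4, i.e. about 1638 bytes/row
```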
Comment on attachment 8569592 [details] [diff] [review]
new patch

Review of attachment 8569592 [details] [diff] [review]:
-----------------------------------------------------------------

Drew mentioned in IRC he is making a new version with Mak's changes and also to support better querying of objects, but I have no problems with this beyond what I've written below (and which I think should be simple to do)

::: browser/components/readinglist/ReadingList.jsm
@@ +20,5 @@
> + * A reading list contains ReadingListItems.
> + *
> + * A list maintains only one copy of an item per GUID.  So if for example you
> + * use an iterator to get two references to items with the same GUID, your
> + * references actually refer to the same JS object.  Another way to look at this

A nit, but I think the "Another way to look at this..." sentence could just say "all items with the same GUID are shared".  (eg, imagine if you didn't share the items, but kept an array of items in the map, and after a commit() etc, you updated each distinct object in that array (ie, with that guid.)  That could still qualify as "live updated" but isn't what we are describing here.)

@@ +32,5 @@
> + * whose names are drawn from ITEM_PROPERTY_NAMES.  These properties constrain
> + * the acted on items to those with the same properties.  If an options object
> + * has a single property, then all items that are acted on by the method must
> + * have the same property.  If an options object has many properties, then they
> + * are logically OR'ed together, so items must match any one of the properties.

hmm - I haven't got to the code yet, but this sounds a little wrong.  If I'm enumerating items and pass an object with {unread: true, favorite: true}, I'd expect to get all unread favorites, not all items that are *either* unread or favorites.

Your "For example" a couple of lines down should probably include multiple fields to make this clear.

But as we discussed on IRC, you said you are tweaking this a little anyway.

@@ +90,5 @@
> +          if (promise instanceof Promise) {
> +            return promise.then(resolve, reject);
> +          }
> +          resolve();
> +          return undefined;

A normal "return;" seems OK to me here.

@@ +94,5 @@
> +          return undefined;
> +        });
> +      });
> +    }, opts);
> +    yield promiseChain;

I'd have thought an array of promises and returning Promise.all() would be clearer here, but I'm not too bothered.

@@ +119,5 @@
> +   * the object are those in ITEM_PROPERTY_NAMES.  The object may have as few or
> +   * as many properties that you want to set, but it must have a `guid`
> +   * property.
> +   *
> +   * It's an error to call this with objects whose `guid` or `url` properties

might be worth expanding on the error condition here (ie, I assume a rejection with a specific error?).  We probably want the error state to allow the caller to know what actually happened (eg, if the second item had a dupe, will the first be written?  If so, how will the caller know which one failed?

Should we return an array of "status" objects instead (eg, [{error: null}, {error: Error("duplicate item")}] - that would allow the caller to see that item 0 was written but item1 failed.

@@ +142,5 @@
> +   *
> +   * It's an error to call this for an item that doesn't belong to the list.
> +   *
> +   * @param item The ReadingListItem to update.
> +   * @return Promise<null> Resolved when the list is updated.

ditto here WRT the docs - we should give some indication of the rejection value in the error case (this seems to also apply for most of the method docs below).

@@ +160,5 @@
> +   * @return Promise<null> Resolved when the list is updated.
> +   */
> +  deleteItem: Task.async(function* (item) {
> +    this._assertItemBelongsToList(item);
> +    yield this._store.deleteItemByGUID(item.guid);

shouldn't we invalidate item here?  ie, delete the .list attribute and remove it from the map?

@@ +168,5 @@
> +  /**
> +   * Call this when you're done with the list.  Don't use it afterward.
> +   */
> +  destroy: Task.async(function* () {
> +    yield this._store.destroy();

maybe setting this._store = null is worthwhile here?  Probably doesn't matter that much as the store object is going to throw

@@ +224,5 @@
> +    }
> +    this._iterators.clear();
> +  },
> +
> +  _assertItemBelongsToList(item) {

I'd prefer "ensure" rather than "assert" like in the iterator - "assert" implies to me a possibly-debug-only check.

@@ +225,5 @@
> +    this._iterators.clear();
> +  },
> +
> +  _assertItemBelongsToList(item) {
> +    if (item.list != this) {

can't we do this check by something like |this._itemsByGUID.get(item.guid).get() === item|, and drop the |item.list| property?

@@ +239,5 @@
> + * ReadingList that the item doesn't belong to.
> + *
> + * @param props The properties of the item, as few or many as you want.
> + */
> +function ReadingListItem(props={}) {

Having a .toJSON() method on this object might be nice (eg, then Log.jsm etc will "do the right thing")

@@ +251,5 @@
> +   *
> +   * @param props A simple object containing the properties to set.
> +   * @param commit If true, commit() is called.
> +   */
> +  setProperties: Task.async(function* (props, commit=true) {

This doesn't seem necessary as a public function as I'd expect callers to just set the properties in the usual way.  It would also avoid the "smell" of the constructor calling a generator (and I'm not sure it would actually work if another yield was added to the generator in the future)

@@ +433,5 @@
> +}
> +
> +this.ReadingList.ItemPropertyNames = ITEM_PROPERTY_NAMES;
> +
> +Object.defineProperty(this.ReadingList, "singleton", {

I think this module should have an opinion on how instances are used:

1) It could not expose an instance and have callers create instances as necessary. This would mean that one of the UI/SyncEngine/etc pieces is responsible for creating an instance and passing it to the other pieces.  This would theoretically allow us to have 2 UI parts, 2 schedulers, 2 sync engines etc, each sharing a different instance.  However, in practice, I expect there will be exactly 1 instance created.

2) It could expose a single instance, and not expose a way to create another. This means the module is exposing a singleton (and the fact that another instance could theoretically be created is moot if the module (a) doesn't expose a way and (b) promises to only ever create one.)  This would mean that the UI, Scheduler, Sync Engine etc can just import this module to get the single instance and need not pass them around. Or to phrase it another way, the contract/interface exposed by the module guarantees a single instance.

I think it should do (2) as I can't think of when we will actually need (1).  This is a common pattern already in the tree (eg, the readinglist modules you reviewed last week use this pattern, as does the FxA modules, Bookmarks.jsm, etc.)  If we *do* use (2), then I think the .list attributes are not needed as this module guarantees only a single instance - all .list attributes are guaranteed to be that instance. If that means you remove the constructor and make it a simple object (a-la Bookmarks.jsm), so be it.

If we want to do (1) then I agree the .list instances are probably needed, but think this will add to the complexity of the consumers of this module (as above, we'd need to agree who owns/creates the single instance that will be used in practice, and make sure it can pass this instance to all the parts that need it)  We'd probably also need to fix the scheduler module as that assumes a singleton scheduler instance, which wouldn't be true in this model.

IOW, I believe we should drop the "singleton" attribute and have the single "ReadingList" export be a single instance/simple object and, if necessary, make some simple concessions for xpcshell tests (mochitests probably don't need any such concessions as they will tend to be testing the things that *use* the singleton rather than the instance itself) - but the tests in this patch seem to work fine using the singleton.
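A minimal sketch of what option (2) might look like in the module, assuming a lazily created instance (names are illustrative, not the actual ReadingList.jsm code):

```javascript
// The module creates exactly one instance and only ever hands that out;
// no constructor is exported, so no second instance can be created.
let _singleton = null;
function getReadingList() {
  if (!_singleton) {
    _singleton = { itemsByGUID: new Map() }; // stand-in for the real list object
  }
  return _singleton;
}
```

With this shape, every consumer that imports the module gets the same object, which is why the per-item .list attribute becomes redundant.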

::: browser/components/readinglist/test/xpcshell/test_ReadingList.js
@@ +92,5 @@
> +
> +  // promises resolved after a delay
> +  items = [];
> +  let i = 0;
> +  let promises = {};

shouldn't promises be an array here and the code below just set the index to undefined instead of delete?

@@ +105,5 @@
> +    promises[this_i] = new Promise(resolve => {
> +      // Resolve the promise one second from now.  The idea is that if
> +      // forEachItem works correctly, then the callback should not be called
> +      // again before the promise resolves -- before one second elapases.
> +      // Maybe there's a better way to do this that doesn't hinge on timeouts.

hrm, yeah, the timeout is somewhat sucky and 3 seconds for this to complete seems a shame.  I'm not sure why a delay matters - if it doesn't work with a timeout of 0 it seems like a future orange if the test runs *very* slow.

@@ +210,5 @@
> +  let item;
> +  yield gList.forEachItem(i => item = i, {
> +    guid: gItems[0].guid,
> +  });
> +  Assert.ok(!!item);

I don't think .ok insists on a bool - ie, Assert.ok(item) should work the same (and in other places)

@@ +222,5 @@
> +  let item = (yield iter.items(1))[0];
> +  Assert.ok(!!item);
> +
> +  // item.setProperties(commit=false).  After fetching the item again, its title
> +  // shoud be the old title.

should

@@ +226,5 @@
> +  // shoud be the old title.
> +  let oldTitle = item.title;
> +  let newTitle = "item_setProperties title 1";
> +  Assert.notEqual(oldTitle, newTitle);
> +  item.setProperties({ title: newTitle }, false);

this is a good example of why I don't think setProperties makes much sense - it would be far more natural for the caller to say item.title = "..." and an explicit commit.

@@ +232,5 @@
> +  iter = gList.iterator({
> +    sort: "guid",
> +  });
> +  let sameItem = (yield iter.items(1))[0];
> +  Assert.ok(item, sameItem);

it looks like this should be Assert.deepEqual(item, sameItem) or maybe Assert.ok(item === sameItem)?

@@ +292,5 @@
> +  checkItems(items, gItems.slice(2));
> +
> +  // delete third item with list.deleteItem()
> +  yield gList.deleteItem(gItems[2]);
> +  gItems[2].list = null;

As mentioned before, I'd expect .list to be nuked by deleteItem (but I'd prefer .list die completely, or at least be ._list)
Attachment #8569592 - Flags: review?(mhammond)
(In reply to Marco Bonardo [::mak] -- catching up, delayed answers -- from comment #43)

> do you actually use the id, or plan to use it in future (for a relation
> table, for example)? Cause otherwise you might want to make guid the PRIMARY
> KEY and use WITHOUT ROWID. Note the benefit is low, you just save some space
> not storing ids, for small tables it's likely ignorable.

Bear in mind that GUIDs are allocated by the server, and are thus allowed to be null (and expected to be so for non-FxA clients).

There are some circumstances (e.g., switching between Firefox Accounts) where the sanest thing to do is empty the GUID column and refill it from the new server.

It would be possible to use some distinct space for "local-only" GUIDs, but that seems like more hassle than it's worth.

On Android we use the numeric record ID to apply the final server-supplied version of a record after a POST.
(In reply to Drew Willcoxon :adw from comment #44)
> In the breakdown bug you said, "think thrice before adding an index on a
> TEXT field," which is why I kept id.  But I forgot you followed that up with
> "(unless... the text is expected to be less than 50 chars)", which guid is. 
> So I should probably make guid the primary key.  (I applied this advice
> inconsistently because when it came to URLs -- below -- I ignored it... :-/)

Well, if you can rely on guid being unique and not null, you probably should.
But as I said, the cost of id in this case is ignorable so you might keep it to be on the safe side (in future could use it for a relation table).

> > @@ +151,5 @@
> > > +        guid TEXT UNIQUE,
> > > +        lastModified INTEGER,
> > > +        url TEXT UNIQUE,
> > > +        title TEXT,
> > > +        resolvedURL TEXT UNIQUE,
> > 
> > url and resolvedURL indices here (due to unique) will take up quite some
> > space.
> > Provided we expect fewer than 5000 urls it might not be a big deal,
> > otherwise using a unique hash for equality checks would be better.
> 
> Well, not constraining ourselves to a small number of items was central to
> my reasoning for choosing SQLite.  Could you describe what you mean by using
> a hash?  Adding another column that stores the hash of the URL with a UNIQUE
> index on it instead?

Yes, that's what I meant. The index is on the hash column rather than on the TEXT column. Clearly that way you can only do == or != checks, not < or >.
I did some research for Places, and I think we will use a 32-bit integer hash (the platform has a good hash in C++, used for hash tables, but crc32 could also do). Note that it won't guarantee uniqueness, but a search like "hash = AND url =" would not be that expensive, because at most the hash would return 2 or 3 dupes to scan through.

> Any recommendations for the hash function I should
> use?  Should I hash in JS and INSERT the hashed value, or write a SQL
> function and do it in SQL, or does it not matter much?

It doesn't matter much.
This is also feasible in the future; you don't have to do it immediately (indeed, in Places we will make that change later). You could, for example, start by adding telemetry that reports the average number of items in the table. If it's low, you might not need the hash at all.
Telemetry should guide the choice.
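For illustration, here is one way a 32-bit url hash could look in JS. This uses FNV-1a as an assumption; it is not the platform hash or crc32 mentioned above. Since collisions are possible, lookups would pair it with the url, as in "hash = :hash AND url = :url".

```javascript
// 32-bit FNV-1a hash of a string, suitable for an indexed integer column.
function hash32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return h >>> 0; // force an unsigned 32-bit result
}
```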

> Oh, so "guid IN (:guid_0, :guid_1, :guid_2)" is valid SQL?  It's been a
> while since I've written any...  I'll check and fix it.

yes, it's valid.
Flags: needinfo?(mak77)
(In reply to Richard Newman [:rnewman] from comment #48)
> (In reply to Marco Bonardo [::mak] -- catching up, delayed answers -- from
> comment #43)
> 
> > do you actually use the id, or plan to use it in future (for a relation
> > table, for example)? Cause otherwise you might want to make guid the PRIMARY
> > KEY and use WITHOUT ROWID. Note the benefit is low, you just save some space
> > not storing ids, for small tables it's likely ignorable.
> 
> Bear in mind that GUIDs are allocated by the server, and are thus allowed to
> be null (and expected to be so for non-FxA clients).
> 
> There are some circumstances (e.g., switching between Firefox Accounts)
> where the sanest thing to do is empty the GUID column and refill it from the
> new server.

That sounds "fancy".  A null guid doesn't make sense, since it's a guid.  It can make sense temporarily in the code, not in a schema.
Why is the guid created server-side? Why can't it be synced down to the clients?
Honestly, the rowid of a table should never be used outside of relations in the schema itself.  The guid is a good candidate for an external id, and every entry should always be recognizable by one.  In this case it could even just be the url; this is not bookmarks, where many bookmarks can point to the same url, so the url sounds unique enough to immediately identify an entry.
Attached patch patch (obsolete) — Splinter Review
Next I'll attach a diff between the previous patch and this, in case that helps.

(In reply to Mark Hammond [:markh] from comment #47)
> ::: browser/components/readinglist/ReadingList.jsm
> @@ +20,5 @@
> > + * A reading list contains ReadingListItems.
> > + *
> > + * A list maintains only one copy of an item per GUID.  So if for example you
> > + * use an iterator to get two references to items with the same GUID, your
> > + * references actually refer to the same JS object.  Another way to look at this
> 
> A nit, but think the "Another way too look at this..." sentence could just
> say "all items with the same GUID are shared".

I just removed that part.

> @@ +90,5 @@
> > +          if (promise instanceof Promise) {
> > +            return promise.then(resolve, reject);
> > +          }
> > +          resolve();
> > +          return undefined;
> 
> A normal "return;" seems OK to me here.

It would work, but it seems wrong to me for one return statement to return a value and the other to return nothing.  It looks like an error, like whoever wrote it forgot to return a value in that case.

> @@ +94,5 @@
> > +          return undefined;
> > +        });
> > +      });
> > +    }, opts);
> > +    yield promiseChain;
> 
> I'd have thought an array of promises and returning Promise.all() would be
> clearer here, but I'm not too bothered.

The problem with that is the array wouldn't be helpful while the iteration is ongoing, so building it would only be useful for blocking the method's return, but in that case yielding on promiseChain does the same thing.

> @@ +119,5 @@
> > +   * the object are those in ITEM_PROPERTY_NAMES.  The object may have as few or
> > +   * as many properties that you want to set, but it must have a `guid`
> > +   * property.
> > +   *
> > +   * It's an error to call this with objects whose `guid` or `url` properties
> 
> might be worth expanding on the error condition here (ie, I assume a
> rejection with a specific error?).  We probably want the error state to
> allow the caller to know what actually happened (eg, if the second item had
> a dupe, will the first be written?  If so, how will the caller know which
> one failed?
> 
> Should we return an array of "status" objects instead (eg, [{error: null},
> {error: Error("duplicate item")}] - that would allow the caller to see that
> item 0 was written but item1 failed.

Good point, but rather than trying to make addItems smarter, I changed it to addItem (singular).  I updated comments to say that promises are rejected with Errors.

> @@ +160,5 @@
> > +   * @return Promise<null> Resolved when the list is updated.
> > +   */
> > +  deleteItem: Task.async(function* (item) {
> > +    this._assertItemBelongsToList(item);
> > +    yield this._store.deleteItemByGUID(item.guid);
> 
> shouldn't we invalidate item here?  ie, delete the .list attribute and
> remove it from the map?

It could go either way I think, but I think you're probably right.

> @@ +168,5 @@
> > +  /**
> > +   * Call this when you're done with the list.  Don't use it afterward.
> > +   */
> > +  destroy: Task.async(function* () {
> > +    yield this._store.destroy();
> 
> maybe setting this._store = null is worthwhile here?  Probably doesn't
> matter that much as the store object is going to throw

Yeah, I think a rejected promise/thrown error saying "the store has been destroyed" is a better failure than the error that would be thrown trying to call something on null.

> @@ +225,5 @@
> > +    this._iterators.clear();
> > +  },
> > +
> > +  _assertItemBelongsToList(item) {
> > +    if (item.list != this) {
> 
> can't we do this check by something like
> |this._itemsByGUID.get(item.guid).get() === item|, and drop the |item.list|
> property?

item needs a list property so when you change a property on it, it updates itself with its list.  So checking the list property here is equivalent to doing _itemsByGUID.get().

> @@ +251,5 @@
> > +   *
> > +   * @param props A simple object containing the properties to set.
> > +   * @param commit If true, commit() is called.
> > +   */
> > +  setProperties: Task.async(function* (props, commit=true) {
> 
> This doesn't seem necessary as a public function as I'd expect callers to
> just set the properties in the usual way.  It would also avoid the "smell"
> of the constructor calling a generator (and I'm not sure it would actually
> work if another yield was added to the generator in the future)

I'm imagining that the sync module would call this after it pulls down the items from the server.  It would look up the fetched items in the local list, and for each that exists in the local list, it would update it by calling setProperties on it, passing it an object with reconciled item properties (comment 29).  I haven't written the sync module yet so maybe it'll end up not working that way, but I can't imagine it not doing something like that -- iterating over properties and setting each one.  So that's why this convenience method exists.  If it turns out to not be a convenience, I'll remove it.

FWIW other ORMs I've seen have similar methods, and setProperties doesn't actually yield if commit=false, which it is when the constructor calls it.
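A sketch of the convenience being described, assuming the sync module hands over a plain object of reconciled properties (the function body here is illustrative, not the patch's actual implementation):

```javascript
// Assign each reconciled property through the item's ordinary setters,
// so each assignment goes through whatever update logic the setters have.
function setProperties(item, props) {
  for (let name of Object.keys(props)) {
    item[name] = props[name];
  }
  return item;
}
```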

> I think this module should have an opinion on how instances are used:
> 
> 1) It could not expose an instance and have callers create instances as

Yeah, I don't like that either.

> 2) It could expose a single instance, and not expose a way to create
> another.

Yeah, I like this too, but I'm not dropping the list properties or changing ReadingList from a constructor/prototype to a single object.

> @@ +232,5 @@
> > +  iter = gList.iterator({
> > +    sort: "guid",
> > +  });
> > +  let sameItem = (yield iter.items(1))[0];
> > +  Assert.ok(item, sameItem);
> 
> it looks like this should be Assert.deepEqual(item, sameItem) or maybe
> Assert.ok(item === sameItem)?

Typo, should be .equal().
Attachment #8569592 - Attachment is obsolete: true
Attachment #8571699 - Flags: review?(mhammond)
Attached patch diff (obsolete) — Splinter Review
Comment on attachment 8571699 [details] [diff] [review]
patch

Review of attachment 8571699 [details] [diff] [review]:
-----------------------------------------------------------------

I'm really disappointed you chose to ignore most previous comments and insist on implementing things in this "should-be-a-singleton-but-maybe-not" approach, but I think there'd be even greater disappointment if I gave r-, forced you to find another reviewer and caused this patch to go into its 4th week (which TBH I'd personally prefer to do). Damned if I do, damned if I don't :(
Attachment #8571699 - Flags: review?(mhammond) → review+
I think I didn't answer the journal size limit question. Places uses 1.5 MiB, but it has a lot of traffic; in this case you can stay much lower. 512 KiB should be more than enough (indeed, very likely in most cases it will merge on shutdown).
Comment on attachment 8571699 [details] [diff] [review]
patch

Review of attachment 8571699 [details] [diff] [review]:
-----------------------------------------------------------------

Er, are you planning on landing this as-is? Cos it's based on the tree from 2 weeks ago, before anything landed - so it's going to break all existing code, including all its tests.

::: browser/components/readinglist/ReadingList.jsm
@@ +159,5 @@
> +   * Adds an item to the list that isn't already present.
> +   *
> +   * The given object represents a new item, and the properties of the object
> +   * are those in ITEM_PROPERTY_NAMES.  It may have as few or as many properties
> +   * that you want to set, but it must have a `guid` property.

If the GUID is allocated by the server, then this won't allow the UI to add an item.

@@ +346,5 @@
> +
> +// Add getters and setters to ReadingListItem.prototype for all the properties
> +// above.  The getters are ordinary getters, but the setters update the item in
> +// its list.
> +for (let n of ITEM_PROPERTY_NAMES) {

I'd far rather have this in long-form defined properties on the prototype so it can be properly documented, and so the getters can return something other than primitives (Date, nsIURL, etc).

@@ +353,5 @@
> +    get() {
> +      return this[`_${name}`];
> +    },
> +    set(value) {
> +      this[`_${name}`] = value;

If this was setting properties on an internal object (this._data), ReadingListItem can just become a wrapper, and there'd be no need for simpleObjectFromItem() (and maybe not even setProperties). It'd also greatly simplify toJSON().
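A minimal sketch of what this suggestion might look like (names are illustrative): the item becomes a thin wrapper over an internal data object, so serialization is trivial.

```javascript
// The item keeps all raw values on this._data; getters/setters delegate
// to it, and toJSON just returns it.
function ReadingListItem(props = {}) {
  this._data = Object.assign({}, props);
}
ReadingListItem.prototype.toJSON = function () {
  return this._data;
};
Object.defineProperty(ReadingListItem.prototype, "title", {
  get() { return this._data.title; },
  set(value) { this._data.title = value; },
});
```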
(In reply to Blair McBride [:Unfocused] (I don't read bugmail - needinfo? me!) from comment #55)
> Er, are you planning on landing this as-is? Cos it's based on a the tree
> from 2 weeks ago, before anything landed - so is going to break all existing
> code, including all its tests.

Yeah, I need to reconcile it with the stuff that has landed recently.  To be honest I kind of forgot that you had already landed a ReadingList.jsm, and I didn't add my ReadingList.jsm until the last moment; I was thinking the only conflicts would be supporting files like moz.build and xpcshell.ini.  I'm not sure what to do about that.  I had been thinking we should land the storage and then do the reconciliation with the front-end in a future bug.

> ::: browser/components/readinglist/ReadingList.jsm
> @@ +159,5 @@
> > +   * Adds an item to the list that isn't already present.
> > +   *
> > +   * The given object represents a new item, and the properties of the object
> > +   * are those in ITEM_PROPERTY_NAMES.  It may have as few or as many properties
> > +   * that you want to set, but it must have a `guid` property.
> 
> If the GUID is allocated by the server, then this won't allow the UI to add
> an item.

Good point, I'll need to fix that.

> @@ +346,5 @@
> > +
> > +// Add getters and setters to ReadingListItem.prototype for all the properties
> > +// above.  The getters are ordinary getters, but the setters update the item in
> > +// its list.
> > +for (let n of ITEM_PROPERTY_NAMES) {
> 
> I'd far rather have this in long-form defined properties on the prototype so
> it can be properly documented, and so the getters can return something other
> than primatives (Date, nsIURL, etc).

Returning something other than primitives is a good point.  I'll fix that somehow too.

> @@ +353,5 @@
> > +    get() {
> > +      return this[`_${name}`];
> > +    },
> > +    set(value) {
> > +      this[`_${name}`] = value;
> 
> If this was setting properties on an internal object (this._data),
> ReadingListItem can just become a wrapper, and there'd be no need for
> simpleObjectFromItem() (and maybe not even setProperties). It'd also greatly
> simplify toJSON().

Interesting idea, thanks, especially in the context of returning higher-level objects instead of primitives.
Attached patch patch (obsolete) — Splinter Review
Requesting review again because Blair's comments prompted some non-trivial changes.  I'll attach another patch, maybe in a new bug, for updating front-end consumers to use this ReadingList.jsm and ask Blair to review it.

* This is based on the latest tree, so it replaces the existing ReadingList.jsm.

* Changes the schema to allow null GUIDs so the front-end can create local items.

* As a result, all the methods that looked up objects by GUID now do so by URL.

* Keeps item properties from the store in a new item._properties object and adds some item getters and setters for properties that convert _properties values to nicer-to-use values.  e.g., item.uri maps to an nsIURI version of item.url, item.lastModified is now a Date instead of a number.

* Renames the sqlite file to reading-list-temp.sqlite to reflect the fact that the schema may change to meet the requirements of the sync module, like you suggested on IRC.

* Fixes a PRAGMA journal_size_limit error -- it takes bytes, not KiB.

* Adds some tests.

Next I'll post a diff between this patch and the last, in case it helps.
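For illustration, the property mapping described in the bullets above could look roughly like this. The names follow the patch's description (_properties, lastModified), but the exact code is an assumption:

```javascript
// Raw storage values live on _properties; a getter/setter pair converts
// between the stored primitive (a timestamp in ms) and a richer type (Date).
function ReadingListItem(props = {}) {
  this._properties = Object.assign({}, props);
}
Object.defineProperty(ReadingListItem.prototype, "lastModified", {
  get() {
    let ms = this._properties.lastModified;
    return ms == null ? null : new Date(ms);
  },
  set(date) {
    this._properties.lastModified = date ? date.getTime() : null;
  },
});
```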
Attachment #8571699 - Attachment is obsolete: true
Attachment #8572261 - Flags: review?(mhammond)
Attached patch diff (obsolete) — Splinter Review
Attachment #8571700 - Attachment is obsolete: true
Comment on attachment 8572262 [details] [diff] [review]
diff

Review of attachment 8572262 [details] [diff] [review]:
-----------------------------------------------------------------

(Oops - these comments are in the interdiff, but they apply to the real patch)

Looks OK to me with comments addressed, but I think we need review from Blair too.

::: browser/components/readinglist/ReadingList.jsm
@@ +223,3 @@
>        let item = itemWeakRef.get();
>        if (item) {
>          item.list = null;

*sigh*

@@ +351,5 @@
> +// Add getters and setters to ReadingListItem.prototype for properties that need
> +// to be mapped to and from storage types.
> +const ITEM_PROPERTY_MAP = {
> +  // Each of these maps the name of a getter and setter to [the corresponding
> +  // name in _properties, the getter/setter type].

Did you mean "ITEM_PROPERTY_NAMES" instead of "_properties" here?  I think we should only keep one such map (ie, replace ITEM_PROPERTY_NAMES with a map to the type name - it reduces redundancy, is less error-prone, and means you can handle all properties in the same block.)

@@ +369,3 @@
>      get() {
> +      let [privatePropName, type] = ITEM_PROPERTY_MAP[propName];
> +      let fnName = `${type}FromPrimitive`;

This looks a little too magic for me - a switch/case would be better given every one of the "helpers" is 1 line long, and would avoid the |global| hack
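For what it's worth, the switch/case alternative could look something like this - a sketch with assumed type names, not the patch's actual helpers:

```javascript
// Sketch: convert a stored primitive to a typed value with an explicit
// switch instead of building a helper function name dynamically.
function fromPrimitive(type, value) {
  switch (type) {
    case "date":
      // Dates are stored as milliseconds since the epoch.
      return value === null ? null : new Date(value);
    case "boolean":
      // SQLite stores booleans as 0/1.
      return !!value;
    default:
      return value;
  }
}

console.log(fromPrimitive("boolean", 1)); // true
console.log(fromPrimitive("date", 0) instanceof Date); // true
console.log(fromPrimitive("string", "hi")); // hi
```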

@@ +535,5 @@
>  
>  Object.defineProperty(this, "ReadingList", {
>    get() {
>      if (!this._singleton) {
> +      let store = new SQLiteStore("reading-list-temp.sqlite");

I think an XXX comment here is appropriate that spells out why we are doing this (ie, because we are going to explicitly "wipe" all user data at some point in the near future).

::: browser/components/readinglist/test/xpcshell/test_ReadingList.js
@@ +35,5 @@
>        url: `http://example.com/${i}`,
>        resolvedURL: `http://example.com/resolved/${i}`,
>        title: `title ${i}`,
>        excerpt: `excerpt ${i}`,
> +      unread: 0,

I'm surprised booleans aren't boolean! Is this just to test your conversion functions? If so, I think that should be a separate explicit test (eg, IIUC, you now lack coverage for bool->bool)

@@ +626,5 @@
>  function checkItems(actualItems, expectedItems) {
>    Assert.equal(actualItems.length, expectedItems.length);
>    for (let i = 0; i < expectedItems.length; i++) {
>      for (let prop in expectedItems[i]) {
> +      if (prop != "list") {

*deep sigh*
Attachment #8572262 - Flags: feedback+
Attachment #8572261 - Flags: review?(mhammond) → review?(bmcbride)
Priority: -- → P1
Comment on attachment 8572262 [details] [diff] [review]
diff

Review of attachment 8572262 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/ReadingList.jsm
@@ +369,3 @@
>      get() {
> +      let [privatePropName, type] = ITEM_PROPERTY_MAP[propName];
> +      let fnName = `${type}FromPrimitive`;

What Mark said about this - this is a lot of magic/metaprogramming for not much gain.

One of my main issues around the general metaprogramming approach for this is that it doesn't let us document the properties. And especially not in a way that's understandable by any tools. As a consumer of the API, that's *really* painful.

I guess I don't see the justification for the magic, given the alternative of hand-coding the properties (see ReadingList.jsm in the tree) is shorter, simpler, & less bug-prone, allows docs.
Attachment #8572261 - Flags: review?(bmcbride)
Comment on attachment 8572261 [details] [diff] [review]
patch

Review of attachment 8572261 [details] [diff] [review]:
-----------------------------------------------------------------

I think we benefit more by landing this now as-is, and doing everything else in followups. I don't think any new comments here explicitly get in the way of actually using the module, nor are they breaking issues. But the UI work in particular benefits by integrating this patch sooner rather than later.
Attachment #8572261 - Flags: review+
(In reply to Blair McBride [:Unfocused] (I don't read bugmail - needinfo? me!) from comment #60)
> One of my main issues around the general metaprogramming approach for this
> is that it doesn't let us document the properties.

That's true -- at least it doesn't lend itself to the regular flower-box style of comments.

> I guess I don't see the justification for the magic, given the alternative
> of hand-coding the properties (see ReadingList.jsm in the tree) is shorter,
> simpler, & less bug-prone, allows docs.

I disagree.  Writing these methods by hand would mean a ton of copying and pasting.  You end up with N methods that all do basically the same thing -- 2N actually, since each property has a getter and a setter.  That's not simpler.  It's certainly not less bug prone, because you take on the bug-proneness of copy and paste, and more importantly each method can be buggy independently.  I wish we did more metaprogramming on our team.

In all the time I've been here, I truly think I'm unique on the team in liking to use metaprogramming in Firefox code, and that sucks for me (world's smallest violin), but I understand it, so I don't mind changing it as you request.
Attached patch consumers patchSplinter Review
This updates all the ReadingList.jsm consumers I found.  I added an `id` property to ReadingListItems.  Items don't have to have GUIDs because GUIDs come from the server.  It doesn't seem wise to make up GUIDs for locally created items because then they have to be reconciled with the GUIDs created by the server.  It also doesn't seem wise to expose the database rowid as an ID.

At first I tried doing without IDs altogether, but then I saw that sidebar.js sets some aria properties based on item IDs.  I don't know how to work around that without using IDs of some sort, so I chose an ID that's the hash of the item's URL.  All items must have URLs, and URLs are unique, so basing the ID off the URL seems good (or even making it the URL itself, although that seems unwieldy).  The front-end will probably never have to deal with GUIDs at all.
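A stable URL-derived ID could be produced along these lines.  The FNV-1a hash below is only an illustrative placeholder, since the bug doesn't show which hash the patch actually uses:

```javascript
// Sketch: derive a stable, DOM-friendly id from an item's URL.
// FNV-1a is an illustrative stand-in, not the patch's algorithm.
function idForURL(url) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < url.length; i++) {
    hash ^= url.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept to 32 bits
  }
  return "readinglist-item-" + hash.toString(16);
}

// The same URL always yields the same id, so things like the aria
// properties in sidebar.js can refer to items consistently.
console.log(idForURL("http://example.com/1") ===
            idForURL("http://example.com/1")); // true
console.log(idForURL("http://example.com/1") ===
            idForURL("http://example.com/2")); // false
```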

I'm planning to update ReadingList.jsm to satisfy comment 60.  I haven't done that yet though, so this patch is based on the last patch I posted here with some minor tweaks.
Attachment #8573547 - Flags: review?(bmcbride)
Attached patch patchSplinter Review
This satisfies comment 60 and adds getters and setters by hand.  It keeps ITEM_PROPERTY_NAMES because it's useful for SQLiteStore but renames it ITEM_BASIC_PROPERTY_NAMES.
Attachment #8572261 - Attachment is obsolete: true
Attachment #8572262 - Attachment is obsolete: true
Comment on attachment 8573583 [details] [diff] [review]
patch

Review of attachment 8573583 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/ReadingList.jsm
@@ +371,4 @@
>    },
>  
>    /**
> +   * The item's resolved URL as an nsIURI.

I still think a map of field->typename would make this simpler, smaller, easier to read, and less error prone. Is there a reason you didn't do it the way I suggested, which Blair seemed to agree was reasonable?
See Also: → 1140194
Comment on attachment 8573583 [details] [diff] [review]
patch

Review of attachment 8573583 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/ReadingList.jsm
@@ -25,5 @@
> -  parentLog.addAppender(new Log.ConsoleAppender(formatter));
> -  parentLog.addAppender(new Log.DumpAppender(formatter));
> -})();
> -
> -let log = Log.repository.getLogger("readinglist.api");

Keep the logging setup - it's going to be *really* useful.

@@ +613,5 @@
> +    yield this.list.updateItem(this);
> +  }),
> +
> +  toJSON() {
> +    return JSON.stringify(this._properties);

Confusingly, toJSON is expected to return an object.
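That is, JSON.stringify invokes toJSON itself and then serializes whatever it returns, so returning a string here gets the JSON double-encoded. A minimal illustration:

```javascript
// toJSON should return the plain object to serialize; JSON.stringify
// handles the actual encoding.
let good = {
  _properties: { url: "http://example.com/", unread: true },
  toJSON() {
    return this._properties;
  },
};

let bad = {
  _properties: { url: "http://example.com/", unread: true },
  toJSON() {
    // Returning a string means JSON.stringify encodes it *again*.
    return JSON.stringify(this._properties);
  },
};

console.log(JSON.stringify(good));
// {"url":"http://example.com/","unread":true}
console.log(JSON.stringify(bad));
// "{\"url\":\"http://example.com/\",\"unread\":true}"
```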
Attachment #8573583 - Flags: review+
Attachment #8573547 - Flags: review?(bmcbride) → review+
Pushed this, so it can stop blocking things. However, on my Windows machine the xpcshell tests were failing due to the DB file being locked at the end of the test. It's a classic Windows issue, and everything else worked fine, so I've disabled those tests and filed a followup (bug 1140389).


https://hg.mozilla.org/integration/fx-team/rev/d2776d6d1d22
https://hg.mozilla.org/integration/fx-team/rev/d2aca2f1ae0f
Blocks: 1140389
https://hg.mozilla.org/mozilla-central/rev/d2776d6d1d22
https://hg.mozilla.org/mozilla-central/rev/d2aca2f1ae0f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 39
No longer blocks: 1140389
Depends on: 1140389
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.