Closed Bug 1131416 Opened 9 years ago Closed 9 years ago

Desktop syncing module for reading list service

Categories

(Firefox Graveyard :: Reading List, defect, P1)

defect

Tracking

(firefox38 fixed, firefox39 fixed)

RESOLVED FIXED
Firefox 39
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: markh, Assigned: adw)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 16 obsolete files)

40.62 KB, patch
markh
: review+
Details | Diff | Splinter Review
17.68 KB, patch
adw
: review+
Details | Diff | Splinter Review
71.23 KB, patch
adw
: review+
Details | Diff | Splinter Review
This is the desktop version of bug 1117830.  It is for an "engine" (ie, a module) that is capable of syncing the client state with the server and sending any necessary notifications for the UI to react to incoming changes.
Flags: qe-verify-
Flags: firefox-backlog+
Depends on: 1131362
Blocks: 1132074
Mass change of ReadingList bugs, moving to their own component. Filter bugspam on the following quote:

“Reading is to the mind what exercise is to the body.”
― Joseph Addison
Component: General → Reading List
Summary: Desktop client for reading list service → Desktop syncing module for reading list service
Assignee: nobody → adw
Status: NEW → ASSIGNED
Iteration: --- → 39.1 - 9 Mar
Priority: -- → P1
Iteration: 39.1 - 9 Mar → 39.2 - 23 Mar
Attached patch rough sketch (obsolete) — Splinter Review
This is a rough sketch.  I'd like feedback on the basic idea, which I'll describe now.

There's a Sync prototype, and the module will expose a singleton Sync object.  Sync creates a SyncSession each time you sync.  SyncSession maintains state needed for one sync session and can be thrown away when a sync is done.

When SyncSession starts, it fetches items from the server in batches since it seems like a bad idea to try to pull down a huge list of items at once.  (This'll be a GET /articles with appropriate parameters and headers (especially Last-Modified).)  For each fetched batch, it looks up by URL the items that exist locally.  For each of those items, if the local item has been modified more recently than the server item, then SyncSession ignores the server item and queues up an outbound request to tell the server that it should match the local item.  If the server item has been modified more recently, then SyncSession updates the local item to match the server item.

Then SyncSession adds any remaining items that don't exist locally to the local store.

Finally it gets all the local items that have never been synced and queues up requests to tell the server about them.  I'm using a new syncStatus property on local items for this.  (It'll be in the schema/DB of course.)  It may make more sense to use a lastSyncDate.

In the same way that GET requests are batched, outbound POST requests would be batched too I'm thinking -- inside a single POST /batch, not as actual individual requests -- to keep request sizes down.  Whenever the max request count is reached, SyncSession would fire off a new actual request.  But I don't know if that matters.  I don't have much experience with talking to servers.

I'm wondering how much if any of this should be done off the main thread in workers.
Attachment #8577026 - Flags: feedback?(mhammond)
(In reply to Drew Willcoxon :adw from comment #2)
> items that exist locally.  For each of those items, if the local item has
> been modified more recently than the server item

Should elaborate on this part: I'm planning on getting the server's idea of "now" from the Date header in responses.  Not sure if that's legit.  I'll need to be careful when comparing lastModified dates that they come from the same machine, i.e., compare server dates to server dates and local dates to local dates.  That might mean storing two lastModified dates in the schema, one for the server and one for the local, not sure.
Comment on attachment 8577026 [details] [diff] [review]
rough sketch

Review of attachment 8577026 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure what value the SyncSession is adding, and I think we really need to get our heads around how the android client is doing a sync.  https://github.com/mozilla-services/android-sync/blob/develop/docs/reading-list.md is quite terse but has some information.

But really, I think the most important thing we can do in the short-term is to get *something* working here and not boil the ocean. I think for a first cut it's reasonable to assume a relatively small number of elements. We can then iterate as we understand the problem better.

::: browser/components/readinglist/Sync.jsm
@@ +4,5 @@
> +}
> +
> +Sync.prototype = {
> +
> +  start() {

I think it would make more sense for start to throw if it is already running - after all, nothing could actually be started.

@@ +27,5 @@
> +SyncSession.prototype = {
> +
> +  start() {
> +    if (!this.promise) {
> +      this.promise = Task.spawn(function* () {

isn't |this.promise = this._start()| the same?

@@ +71,5 @@
> +    }
> +
> +    // Reconcile items that already locally exist by URL.
> +    let localURLs = new Set();
> +    yield this.list.forEachItem(localItem => {

I expect that in most cases when these numbers are huge, the number of server URLs will be roughly equal to the number of local items, so I'd think iterating all local items without a filter would give roughly the same performance and avoid the XXX comment below.

@@ +90,5 @@
> +    }
> +  }),
> +
> +  _reconcileItem(serverNow, serverItem, localItem) {
> +    let localNow = Date.now();

this worries me a little (clock skew, timezones, latency etc).  Can we treat the server timestamp as opaque?  ie, after updating a record re-get the server timestamp?  It looks like the server doesn't allow you set explicitly set lastModified anyway.  But regardless, I think working out what rnewman has done here is what we should do too (but best I can from his patch, the local timestamp is always a copy of the server timestamp). I *think* he uses a state flag to indicate when a record has been modified locally
Attachment #8577026 - Flags: feedback?(mhammond) → feedback+
(In reply to Drew Willcoxon :adw from comment #3)

> Should elaborate on this part: I'm planning on getting the server's idea of
> "now" from the Date header in responses.  Not sure if that's legit.  I'll
> need to be careful when comparing lastModified dates that they come from the
> same machine, i.e., compare server dates to server dates and local dates to
> local dates.  That might mean storing two lastModified dates in the schema,
> one for the server and one for the local, not sure.

You should try never to compare modification times, not even from the same clock.
> this worries me a little (clock skew, timezones, latency etc).  Can we treat
> the server timestamp as opaque?  ie, after updating a record re-get the
> server timestamp?  It looks like the server doesn't allow you set explicitly
> set lastModified anyway.

Server "timestamps" in RL actually aren't timestamps; they're opaque versions that just happen to be quite close to a real clock.

You cannot set the server modification time.

The server timestamp (version) should be treated as opaque: you should only use it to check if a record is the same as a copy of it that you've already seen (e.g., as the response to an upload), or to fetch changes since a previous version.

> But regardless, I think working out what rnewman
> has done here is what we should do too (but best I can from his patch, the
> local timestamp is always a copy of the server timestamp). I *think* he uses
> a state flag to indicate when a record has been modified locally

Yes, we use a combination of two flags to accurately track changes. We use sequencing of operations to avoid the need to reconcile conflicts.

This doc gives a good overview, in combination with the doc you linked in Comment 4.

I'd be happy to walk through anything you like next week.

https://github.com/mozilla-services/readinglist/wiki/Client-phases
(In reply to Drew Willcoxon :adw from comment #2)
> For each fetched batch, it looks up by URL the
> items that exist locally.

I particularly want to advise against doing this. URL-based conflicts are trivially resolved by the server, and the server allocates IDs.

New records, or records with changed URLs, should be uploaded before changes are downloaded.

All non-new records will have an ID allocated by the server, which makes applying changes easy. No need to do lookups by URL.
This makes some changes in preparation for the sync patch -- plus some probably unrelated changes mixed in that I think are improvements.  These are standalone changes from the sync patch so I think they can be reviewed now, but if you want to wait for the sync patch that's OK.

* Renames _properties to _record and changes all related terminology to use "record".  Writing the sync patch kind of made me go nuts because I need a term to refer to the non-ReadingListItem objects from the DB and server.

* Renames _itemsByURL to _itemsByNormalizedURL.

* Gets rid of stripNonItemProperties.  Instead, normalizeRecord sanity-checks the passed-in object and throws an error for unrecognized properties (in addition to normalizing the properties).

* Adds ReadingList.item(), which fetches a single matching item.

* Renames getItemForURL to itemForURL, which better matches the other method names.

* Fixes a couple of bugs/typos I noticed (exerpt -> excerpt, one or two others I forget now).

* Gets rid of item.setProperties in favor of a proper _record getter/setter.  The setter normalizes properties to guarantee that wrong values won't ever end up in _record.

* Adds item._updateRecord, which sets _record, and makes all the item public setters call it -- so you get normalization for free.  (The sync patch will hook into this method.)

* Removes the string-valued item.url and item.resolvedURL in favor of the nsIURI versions.

* Removes the item.uri setter since the URL shouldn't change after item creation.

* Removes the item.guid setter.

* Removes item.lastModified -- this'll be a serverLastModified in the sync patch, and I'm not sure the front-end will need a last modified.  Even if it does, the server last modified value should be treated as opaque, so a locally generated date should be used instead.
Attachment #8577026 - Attachment is obsolete: true
Attachment #8579642 - Flags: review?(mhammond)
Comment on attachment 8579642 [details] [diff] [review]
standalone ReadingList changes in preparation for sync changes

Review of attachment 8579642 [details] [diff] [review]:
-----------------------------------------------------------------

Looks OK, but I've a few concerns we should discuss:
* I think the lack of a url getter is a problem, see comments below.
* I've no real objection to removing the |url| setter given it (should be) immutable, but if we take that approach we should take it for all immutable fields.

It's also worth considering if we should be looking at these improvements after we have the engine working - I doubt many of these changes make a huge difference to the engine.

::: browser/components/readinglist/ReadingList.jsm
@@ +265,5 @@
>     *         Error on error.
>     */
>    deleteItem: Task.async(function* (item) {
>      this._ensureItemBelongsToList(item);
> +    yield this._store.deleteItemByURL(item._record.url);

see below, but I think removing the |url| getter is a mistake, which would mean this could remain as item.url

@@ +455,5 @@
>    // items via a message manager. If |this.list| is set, the item can't be
>    // transferred directly, so .toJSON is implicitly called and the object
>    // returned via that is sent. However, once the item is deleted and |this.list|
>    // is null, the item *can* be directly serialized - so the message handler
> +  // sees the _record object.

I think the old comment was clearer here - ie, once this.list is null we see *more than* this._record - we see the entire "raw" object including its internal properties such as "_record"

@@ +508,2 @@
>     * @type string
>     */

TBH I'm not that keen on killing the 'url' getter - it seems strange that the only way to create an item is passing an object with |url|, but the returned object has no |url| - and vice-versa (ie, that an object with |uri| will fail to work even though the resulting object has |uri|.

Same issue with the JSON representation of an object - IIUC, it will have a url property even though the object itself will fail if you try and access it.

I think the cost of these helpers are low enough we should just support them both as getters (no setters is fine though I think), and I think your normalizeRecord function should (a) check both url and URI are *not* specified and (b) convert a specified URI to a URL. (ie, at no point do we need to store a URI, but we will happily accept and return them)

@@ +510,1 @@
>    get domain() {

Can we kill domain?  I assume it's not a field in the schema, and getting it is relatively trivial given the .URI getter

@@ +525,4 @@
>             undefined;
>    },
>    set resolvedURI(val) {
> +    this._updateRecord({ resolvedURL: val });

Why do we still support this as a setter?  Isn't this the same concept as URI/url?

@@ +581,3 @@
>    },
>    set favorite(val) {
> +    this._updateRecord({ favorite: Number(!!val) });

Number() seems strange here (and for all booleans below) - isn't '!!' always going to give us a bool, which is exactly what we want?

@@ +626,4 @@
>             undefined;
>    },
>    set addedOn(val) {
> +    this._updateRecord({ addedOn: val.valueOf() });

If we are dropping setters due to the fields being immutable (such as URI), I think a setter for this seems suspect too.  Ditto for storedOn below.

@@ +700,5 @@
> +   * that the local store and server use.  Records passed in by the consumer are
> +   * not normalized, but everywhere else, records are always normalized unless
> +   * otherwise stated.  The setter normalizes the passed-in value, so it will
> +   * throw an error if the value is not a valid record.
> +   */

I don't understand the need for this extra complexity - I can't see anyone else setting ._record (and if code external to this is trying to go behind our back, then can continue to do so by using .__record.)

::: browser/components/readinglist/test/xpcshell/test_ReadingList.js
@@ +61,5 @@
>    Assert.ok(item);
>  
>    Assert.ok(item.uri);
>    Assert.ok(item.uri instanceof Ci.nsIURI);
> +  Assert.equal(item.uri.spec, item._record.url);

this change would not be necessary if we allow that getter.
Attachment #8579642 - Flags: review?(mhammond) → feedback+
Keeping the url getter is fine by me.  It just seemed unnecessary since there's a uri property, so I erred on the side of a smaller API surface.

About other immutable properties -- I'm still a little fuzzy on which properties are immutable from the front-end's POV.  At least I haven't thought much about it beyond url.  There may be properties that are not immutable from the server API POV (resolved_url is one IIRC, you can actually set it) but maybe there's no reason for the front-end to be changing them.  Definitely OK with me to remove setters for immutable properties.

(In reply to Mark Hammond [:markh] from comment #9)
> @@ +510,1 @@
> >    get domain() {
> 
> Can we kill domain?  I assume it's not a field in the schema, and getting it
> is relatively trivial given the .URI getter

I'm not sure -- Blair's original ReadingList.jsm had it.  It's not in the schema, but IIRC the sidebar uses it to display the domain in each sidebar entry.  But as you say it could be up to the consumer to get the domain, so I can gladly remove it and modify the one (IIRC) caller.

> @@ +581,3 @@
> >    },
> >    set favorite(val) {
> > +    this._updateRecord({ favorite: Number(!!val) });
> 
> Number() seems strange here (and for all booleans below) - isn't '!!' always
> going to give us a bool, which is exactly what we want?

Well, I was trying to keep these boolean properties as numbers in the raw record because SQLite doesn't have a boolean type really -- it treats them as numbers.  I was thinking the server is the same, but I checked now and it's not, it uses real booleans.  So I'm probably overcorrecting here.  (Somewhere along the way bools are converted to numbers for SQLite anyhow.)  So I'll drop this.

> @@ +700,5 @@
> > +   * that the local store and server use.  Records passed in by the consumer are
> > +   * not normalized, but everywhere else, records are always normalized unless
> > +   * otherwise stated.  The setter normalizes the passed-in value, so it will
> > +   * throw an error if the value is not a valid record.
> > +   */
> 
> I don't understand the need for this extra complexity - I can't see anyone
> else setting ._record (and if code external to this is trying to go behind
> our back, then can continue to do so by using .__record.)

I'm not sure what you mean by complexity -- we can talk about it tomorrow -- but the sync module sets _record too.  It takes a server record, converts it to a local record, and then sets localItem._record.  So it's nice for the normalization and sanity checking to all happen there in one place, and "atomically," if you will.

> It's also worth considering if we should be looking at these improvements
> after we have the engine working - I doubt many of these changes make a huge
> difference to the engine.

Fair point, but while some of the improvements are technically unrelated to sync, they make me happier and help me keep my sanity as I'm making the changes that are directly related to sync. :-)  The sync module is pretty much exercising all of this code.
For reference, in case you want to look at it.
First cut at the "server client" - we might as well keep this in thus bug rather than creating a new one because it really is a part of the engine. A better name would be nice but this looks in reasonable shape to me.
Attachment #8579766 - Flags: feedback?(adw)
Comment on attachment 8579766 [details] [diff] [review]
0006-first-wip-for-server-client.patch

Review of attachment 8579766 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine from my POV.

::: browser/components/readinglist/ServerClient.jsm
@@ +45,5 @@
> +    }
> +    if (path.startsWith("/")) {
> +      path = path.slice(1);
> +    }
> +    return result + "/" + path;

As a consumer, I would expect that if I passed "articles" as the path, instead of "/articles", then I'd end up with a bad URL: https://readinglist.services.mozilla.com/v1articles.  "articles" isn't actually the right path, and I'd be surprised to see my passed-in path modified IOW.  But not a big deal.

@@ +138,5 @@
> +        } catch (e) {
> +          log.error("Failed to parse JSON body ${body}: ${e}",
> +                    {body: request.response.body, e});
> +          // We don't reject due to this, but just set a '.rawBody' attribute.
> +          result.rawBody = response.body;

The word rawBody kind of creeps me out. :-O  How about bodyText or bodyString?  Shouldn't actually matter to me though.  I don't think I'd ever use it.  I'd check if body is undefined and then deal with that.
Attachment #8579766 - Flags: feedback?(adw) → feedback+
Addresses comments.
Attachment #8579642 - Attachment is obsolete: true
Attachment #8580311 - Flags: review?(mhammond)
Had to modify the SQL generation stuff and store API in order to filter out items whose syncStatus is DELETED.
Attachment #8579706 - Attachment is obsolete: true
Attachment #8580318 - Flags: feedback?(mhammond)
Comment on attachment 8580311 [details] [diff] [review]
standalone ReadingList changes in preparation for sync changes v2

Review of attachment 8580311 [details] [diff] [review]:
-----------------------------------------------------------------

looks great, thanks!

::: browser/components/readinglist/ReadingList.jsm
@@ +137,5 @@
>     * @param {String/nsIURI} url - URL to check.
>     * @returns {Promise} Promise that is fulfilled with a boolean indicating
>     *                    whether the URL is in the list or not.
>     */
> +  hasItemForURL: Task.async(function* (url) {

There are 2 consumers of this that need updating.  I had a quick look at them and I don't think we should bother removing this function and forcing them to fetch an item, so these consumers just need updating to the new name.
Attachment #8580311 - Flags: review?(mhammond) → review+
Comment on attachment 8580318 [details] [diff] [review]
WIP sync patch, builds on the other patch v2

Review of attachment 8580318 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/ReadingList.jsm
@@ +724,5 @@
>     */
>    _updateRecord(partialRecord) {
>      let record = this._record;
> +
> +    // Update the syncStatus flag.

This part should only kick in when the current status is SYNC_STATUS_SYNCED.

::: browser/components/readinglist/SQLiteStore.jsm
@@ +245,5 @@
>   * @param optsList See Options Objects in ReadingList.jsm.
>   * @return An array [sql, args].  sql is a string of SQL.  args is an object
>   *         that contains arguments for all the parameters in sql.
>   */
> +function sqlFromOptions(userOptsList, controlOpts) {

Whoops, meant to delete this function.  It's replaced by the two new ones below it.
Comment on attachment 8580318 [details] [diff] [review]
WIP sync patch, builds on the other patch v2

Review of attachment 8580318 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is looking really good!  I'm surprised to not see more stuff around server timestamps and if-modified-since/if-not-modified-since headers) but I do understand this is just a WIP - so looking good!

::: browser/components/readinglist/ReadingList.jsm
@@ +59,5 @@
>  `.trim().split(/\s+/);
>  
> +
> +// These options filter out all records in the store that don't match.
> +const STORE_CONTROL_OPTIONS = {

I think a better name and comment here would be nice - something like STORE_NON_DELETED_OPTIONS and maybe even a comment indicating that locally deleted items aren't deleted from the DB until after a sync.  (Hmm - but after looking at SQLLiteStore.jsm I see why it is named this way, so I'm not too bothered - but I think some clarifying comments would help - I *think* it's just the CONTROL part that seems strange - STORE_OPTIONS might be fine.  Whatever.)

An interesting side-effect of this is that if the RL engine is disabled items will *never* be removed from the store, which kinda sucks.  I wonder if we can instantly delete items without a GUID (implying that the item has never before made it to the server and thus can be safely deleted now?)

@@ +280,5 @@
>     *         Error on error.
>     */
>    deleteItem: Task.async(function* (item) {
>      this._ensureItemBelongsToList(item);
> +    item._record.syncStatus = SYNC_STATUS_DELETED;

see above - this looks like a reasonable place to nuke the item from the DB if it has no guid.

@@ +391,5 @@
>    _itemFromRecord(record) {
>      let itemWeakRef = this._itemsByNormalizedURL.get(record.url);
>      let item = itemWeakRef ? itemWeakRef.get() : null;
>      if (item) {
> +//       item._record = record;

/me ignores this :)

@@ +705,5 @@
>     * throw an error if the value is not a valid record.
> +   *
> +   * This object should reflect the item's representation in the local store, so
> +   * when calling the setter, be careful that it doesn't drift away from the
> +   * store's record.  If you set it, you should also call updateItem() around

it looks like this comment belongs in the other patch - there's no need to change it, but if that's wrong I'm probably misunderstanding something.

@@ +726,5 @@
>      let record = this._record;
> +
> +    // Update the syncStatus flag.
> +    let hasMaterialChange = Object.keys(partialRecord).some(prop => {
> +      return RECORD_PROPERTIES_CHANGED_MATERIAL.includes(prop);

Bug 1141505 was due to .includes() not existing on Aurora, so it's not clear this is safe.

::: browser/components/readinglist/SQLiteStore.jsm
@@ +210,5 @@
>          storedOn INTEGER,
>          markedReadBy TEXT,
>          markedReadOn INTEGER,
> +        readPosition INTEGER,
> +        syncStatus INTEGER

can we get away with a trailing comma here?  (I wouldn't be surprised to hear the sql engine throws though...)  I think this is likely to conflict with Jaw's recent patch too (he added the 'preview' column)

@@ +434,5 @@
> +}
> +
> +/**
> + * Returns a SQL expression in disjunctive normal form generated from the given
> + * options list.

might be worth a quick summary of what "disjunctive normal form" is and why it matters ;)

::: browser/components/readinglist/Sync.jsm
@@ +106,5 @@
> +    yield this._uploadChanges(SYNC_STATUS_CHANGED_STATUS,
> +                              RECORD_PROPERTIES_CHANGED_STATUS,
> +                              Task.async(function* (response) {
> +      if (response.status != 200) {
> +        log.warn("Unexpected response: ", response);

A simple comment along the lines of "by design, status-only changes will never conflict" or similar might be worthwhile.

I wonder if we can start thinking about auth errors too - eg, maybe a method like this._handleUnexpectedResponse(response) that for now does nothing but log, but later can do whatever it is we need to do on a 401.

@@ +139,5 @@
> +      defaults: {
> +        method: "PATCH",
> +      },
> +      requests: requests,
> +    });

we probably need to check batchResponse.status here (if for no better reason than auth errors, but presumably for 50x etc too, which should at least be logged). Similarly in the other phases below.

@@ +140,5 @@
> +        method: "PATCH",
> +      },
> +      requests: requests,
> +    });
> +    for (let response of batchResponse.data.responses) {

Given the current ServerClient,jsm, this will be batchResponse.body (which I think is the correct name, given the sub-responses also use body).  Similarly in the other phases.

@@ +145,5 @@
> +      if (response.status == 404) {
> +        // item deleted
> +        yield this._deleteItemForGUID(response.body.id);
> +        continue;
> +      }

other response.status values here?  (But this would *not* call the unexpectedResponse method above, as this is a "sub" status.

@@ +328,5 @@
> +  return localRecord;
> +}
> +
> +
> +//XXXadw not sure if we want to expose a singleton or ctor

singleton ;)
Attachment #8580318 - Flags: feedback?(mhammond) → feedback+
Thanks, Mark.

(In reply to Mark Hammond [:markh] from comment #18)
> An interesting side-effect of this is that if the RL engine is disabled
> items will *never* be removed from the store, which kinda sucks.  I wonder
> if we can instantly delete items without a GUID (implying that the item has
> never before made it to the server and thus can be safely deleted now?)

Yeah, I think so.

It's a good point, and deletes for items with guids is something we should think about how to address.  It seems to me that if you do use sync, then we do need to keep around the guids of items you've deleted -- indefinitely, until the next sync.  Otherwise you could have items that mysteriously reappear, which we should try pretty hard to avoid IMO.

But what does it mean to "use" sync?  If we had a crystal ball we'd know whether the user would ever sync in the future.  If so, keep around deleted guids indefinitely, until the next sync.  If not, then delete them.  We should probably ask rnewman what he's doing.

What do you think about this.  Keep a separate table for deleted items with only a guid column basically.  For items without guids, truly delete them from the main table.  For items with guids, move their guids to the deleted table.  When sync is disabled, your new items won't have guids, so the deleted table won't grow beyond whatever items you have that have already synced.  But we should probably take some steps to not let it grow unbounded.

Or maybe we don't need a separate table?  Another table would make the storage getters a little nicer since all the extra "control" options I just added wouldn't be necessary.  And there's probably some performance benefit to keeping the main table as small as possible.  We should probably at least null out the non-guid fields, since keeping that info around is a form of data leakage.
(In reply to Drew Willcoxon :adw from comment #19)

> Yeah, I think so.

On Android we immediately delete items that are in STATE_NEW (which coincidentally also don't have GUIDs, but the state formalism is more explicit).

See logic here:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/ReadingListProvider.java#199


> But what does it mean to "use" sync?  If we had a crystal ball we'd know
> whether the user would ever sync in the future.  If so, keep around deleted
> guids indefinitely, until the next sync.  If not, then delete them.  We
> should probably ask rnewman what he's doing.

The case of detaching then reattaching is a little thorny, and I can talk about it tomorrow if you like.

In short: if you reconnect to the same FxA, it makes sense to keep the old data around. If you switch accounts, you should sanitize the DB: clear GUIDs and change flags, remove any deleted items. It's theoretically acceptable to always do so immediately on disconnect, but more efficient to delay.

Deletions between detach and reattach are relatively straightforward: the set of items to mark as DELETED is finite, so not worth worrying about (if the DB is surviving right now, it'll definitely survive with some items flagged for deletion); any new items added since then can be deleted immediately.


> We should probably at least
> null out the non-guid fields, since keeping that info around is a form of
> data leakage.

I elected to simply null out columns, rather than using a separate table.

http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/ReadingListProvider.java#163
Comment on attachment 8580311 [details] [diff] [review]
standalone ReadingList changes in preparation for sync changes v2

https://hg.mozilla.org/integration/fx-team/rev/c7897f709ff2

Bug 1131416 - Desktop syncing module for reading list service (prepatory changes). r=markh
Keywords: leave-open
Whoops, meant to include this in the thing I just pushed but I didn't qref.  This actually fixes a test failure introduced by bug 1144823.

https://hg.mozilla.org/integration/fx-team/rev/6305033b955e

Bug 1131416 - Desktop syncing module for reading list service (test fix). r=me
... aaaand some follow-up fixes noticed by Jared.

https://hg.mozilla.org/integration/fx-team/rev/648e3888d9bd

Bug 1131416 - Desktop syncing module for reading list service (fixes to prepatory changes). rs=jaws
(In reply to Drew Willcoxon :adw from comment #23)
> ... aaaand some follow-up fixes noticed by Jared.
> 
> https://hg.mozilla.org/integration/fx-team/rev/648e3888d9bd
> 
> Bug 1131416 - Desktop syncing module for reading list service (fixes to
> prepatory changes). rs=jaws

Backed out:

https://hg.mozilla.org/integration/fx-team/rev/a10307812cf8
And relanded:

https://hg.mozilla.org/integration/fx-team/rev/07f9d9dd82d3

Bug 1131416 - Desktop syncing module for reading list service (fixes to prepatory changes). rs=jaws

Problem was that the tests created a record with a null resolvedURL, and my patch passed that null value to normalizeURI, which doesn't handle nulls.  So I added a conditional to make sure that properties are truthy before passing them to normalizeURI.
Missed another consumer affected by the getItemForURL -> itemForURL name change.  Sorry for the churn.  rs=jaws IRL.

https://hg.mozilla.org/integration/fx-team/rev/70cf7bc0d638

Bug 1131416 - Desktop syncing module for reading list service (one more fix to prepatory changes). r=jaws
Attached patch WIP sync patch v3 (obsolete) — Splinter Review
Includes Mark's scheduler changes.  I'm still trying to test this, so there may be errors, but I think it's nearly done...
Attachment #8580318 - Attachment is obsolete: true
(In reply to Drew Willcoxon :adw from comment #27)
> Includes Mark's scheduler changes.  I'm still trying to test this, so there
> may be errors, but I think it's nearly done...

Need to handle Last-Modified still...
Attached patch WIP sync patch v4 (obsolete) — Splinter Review
Fixes some syntax errors, adds some more logging, makes some logging changes that Mark suggested (adds readinglist.* logs, hackily sets log level to debug).  This patch actually runs, the previous one didn't.  Still need to handle Last-Modified.
Attachment #8580973 - Attachment is obsolete: true
Attached patch WIP sync patch v5 (obsolete) — Splinter Review
This uses Last-Modified.  Not 100% sure I'm doing it right, and I'm definitely not handling 412 responses after uploading status-only changes.  And still not handling material changes ("phase 3") correctly -- not reconciling conflicts or iterating until there are no more conflicts.

I can successfully sync with it, but I haven't stress-tested it at all.  Barely tested.

And no automated tests.
Attachment #8580990 - Attachment is obsolete: true
Attachment #8580999 - Flags: feedback?(mhammond)
Comment on attachment 8580999 [details] [diff] [review]
WIP sync patch v5

Review of attachment 8580999 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, this is looking great!

::: browser/components/readinglist/ReadingList.jsm
@@ +320,5 @@
> +    // the store.  Otherwise mark it as deleted but don't actually delete it so
> +    // that its status can be synced.
> +    if (item._record.syncStatus == SYNC_STATUS_NEW) {
> +      yield this._store.deleteItemByURL(item.url);
> +    }

nit: else on same line as closing brace

::: browser/components/readinglist/Sync.jsm
@@ +252,5 @@
> +        //XXXadw ???
> +        continue;
> +      }
> +      //XXXadw server seems to return 200 if an identical item already exists.
> +      // (so should ignore 200s?)

to be clear, we don't expect that to be the case here though, right?  ie, our logic should never upload an identical item when this is the only client who has ever put stuff on the server?  (I understand there might be a server bug, in which case I'd expect the comment to reflect that, and I also understand we might land without fully understanding this, but that would be a shame and the comment should reflect that too :)

@@ +269,5 @@
> +   *
> +   * Downloads items that were modified since the last sync.
> +   */
> +  _downloadModifiedItems: Task.async(function* () {
> +    log.info("Phase 2");

I think these phases would be a good example of log.debug instead of .info

::: services/common/logmanager.js
@@ +109,5 @@
>  
>      // now attach the appenders to all our logs.
>      for (let logName of logNames) {
>        let log = Log.repository.getLogger(logName);
> +      log.level = Log.Level.Debug;

As you mentioned, this is a hack and shouldn't land in this patch - I'll try and have a fix for this up by Monday.
Attachment #8580999 - Flags: feedback?(mhammond) → feedback+
Comment on attachment 8580999 [details] [diff] [review]
WIP sync patch v5

Review of attachment 8580999 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/readinglist/ReadingList.jsm
@@ +1006,5 @@
>  
>  Object.defineProperty(this, "ReadingList", {
>    get() {
>      if (!this._singleton) {
> +      let store = new SQLiteStore("reading-list-temp3.sqlite");

if we are confident of things once this patch is ready, maybe this is also a good opportunity to give the file it's real and final name?
Mostly the same as the last one, but feedback comments addressed and utf-8 handling added (the server was rejecting non-ascii chars as we failed to utf-8 encode them - I added tests for this too)
Attachment #8579766 - Attachment is obsolete: true
Attachment #8581216 - Flags: review?(adw)
Attached patch for-review sync patch v1 (obsolete) — Splinter Review
This includes a test -- well, the start of a test.  It actually helped me find several bugs.  Next I'll post a diff from the previous patch, if it helps.

One thing I'm not certain about is telling the server about locally deleted items.  There's not actually a phase for that.  The sync doc doesn't mention it specifically.  I'm wondering if that's one purpose of the `status` article field.  The data model doc says it can take a value meaning deleted.  So I'm wondering whether we should be setting `status` on deleted items and uploading them in the status-only-changes phase.  This patch only sets `syncStatus` for deleted items.

(In reply to Mark Hammond [:markh] from comment #33)
> if we are confident of things once this patch is ready, maybe this is also a
> good opportunity to give the file it's real and final name?

I'm not confident. :-)  In seriousness, I'd like to defer that to the last possible moment, until they tell us we have to ship the thing NOW.  What do you think?

(In reply to Mark Hammond [:markh] from comment #31)
> nit: else on same line as closing brace

I've been doing it the other way in ReadingList.jsm and Sync.jsm so far, so I hope you don't mind that I left it as-is for consistency.

> ::: browser/components/readinglist/Sync.jsm
> @@ +252,5 @@
> > +        //XXXadw ???
> > +        continue;
> > +      }
> > +      //XXXadw server seems to return 200 if an identical item already exists.
> > +      // (so should ignore 200s?)
> 
> to be clear, we don't expect that to be the case here though, right?  ie,
> our logic should never upload an identical item when this is the only client
> who has ever put stuff on the server?

Right.  I only noticed it because sync wasn't working properly at first, so in my manual testing I ended up uploading an item that already existed on the server.

Since we wouldn't expect that in normal bug-free usage, I removed the 200 check here so that only 201 is expected, so if we get a 200, it'll be logged as an unexpected response.  Is that OK?  Although you could legitimately make a new item on one device and then make an identical item on another before either one syncs...

> @@ +269,5 @@
> > +   *
> > +   * Downloads items that were modified since the last sync.
> > +   */
> > +  _downloadModifiedItems: Task.async(function* () {
> > +    log.info("Phase 2");
> 
> I think these phases would be a good example of log.debug instead of .info

Fixed.

> ::: services/common/logmanager.js
> @@ +109,5 @@
> >  
> >      // now attach the appenders to all our logs.
> >      for (let logName of logNames) {
> >        let log = Log.repository.getLogger(logName);
> > +      log.level = Log.Level.Debug;
> 
> As you mentioned, this is a hack and shouldn't land in this patch - I'll try
> and have a fix for this up by Monday.

Removed.
Attachment #8580999 - Attachment is obsolete: true
Attachment #8582115 - Flags: review?(mhammond)
Comment on attachment 8581216 [details] [diff] [review]
0004-Bug-1131416-module-for-the-readinglist-sync-engine-t.patch

Review of attachment 8581216 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Mark Hammond [:markh] from comment #34)
> I added tests for this too)

Nice!
Attachment #8581216 - Flags: review?(adw) → review+
Oops, I forgot there were some points in comment 18 that I still needed to address.  I don't think any are major (?), so I'll batch those with whatever comments you have on the latest patch.
(In reply to Drew Willcoxon :adw from comment #35)

> One thing I'm not certain about is telling the server about locally deleted
> items.  There's not actually a phase for that.  The sync doc doesn't mention
> it specifically.  I'm wondering if that's one purpose of the `status`
> article field.  The data model doc says it can take a value meaning deleted.
> So I'm wondering whether we should be setting `status` on deleted items and
> uploading them in the status-only-changes phase.  This patch only sets
> `syncStatus` for deleted items.

`status` was removed from the server format a while ago, replaced by individual `deleted` and `archived` flags (which are mutually exclusive).

Deletions are communicated to the server via DELETE requests to the appropriate record URL. Outgoing records should never include `deleted`.

These can happen at any time -- ideally in the first phase. Deletes typically shouldn't conflict; they will only do so if you provide If-Unmodified-Since. Once an item is deleted it can't be undeleted.

See the API doc:

http://readinglist.readthedocs.org/en/latest/api/resource.html#delete-articles-id
Comment on attachment 8582116 [details] [diff] [review]
diff between sync patch WIP v5 and for-review v1

Review of attachment 8582116 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, although it sounds like we need to address rnewman's comment 39 before landing?

::: browser/components/readinglist/ReadingList.jsm
@@ +1006,5 @@
>  
>  Object.defineProperty(this, "ReadingList", {
>    get() {
>      if (!this._singleton) {
> +      let store = new SQLiteStore("reading-list-temp4.sqlite");

as mentioned, I think we should take this opportunity to rename this to the "real" name once rnewman's comments have been addressed and if we are relatively confident no schema changes will be needed for the next week or so :)

::: browser/components/readinglist/Sync.jsm
@@ +20,5 @@
>    "resource:///modules/readinglist/ReadingList.jsm");
>  XPCOMUtils.defineLazyModuleGetter(this, "ServerClient",
>    "resource:///modules/readinglist/ServerClient.jsm");
>  
> +Object.defineProperty(this, "log", {

I don't understand these gymnastics.

@@ +414,4 @@
>    get _serverLastModifiedHeader() {
>      if (!("__serverLastModifiedHeader" in this)) {
>        this.__serverLastModifiedHeader =
>          Preferences.get("browser.readinglist.sync.serverLastModified",

I should have picked this up before, but I think this pref name should drop the "browser." prefix.

@@ +474,5 @@
>      return this._singleton;
>    },
>  });
> +
> +// This is exposed for the test.

Do we really need this?  I think the backstage-pass exposed for tests should make everything public (eg, see createTestableScheduler in Scheduler.jsm and test_scheduler which manages to use it.

::: browser/components/readinglist/test/xpcshell/test_Sync.js
@@ +2,5 @@
> + * http://creativecommons.org/publicdomain/zero/1.0/ */
> +
> +"use strict";
> +
> +let gDBFile = do_get_profile();

maybe add a comment saying this will end up as gDBFile later in the test (or just rename it to gProfileDir, then clone it to create gDBFile?)
Attachment #8582116 - Flags: feedback+
Iteration: 39.2 - 23 Mar → 39.3 - 30 Mar
Attached patch for-review sync patch v2 (obsolete) — Splinter Review
I'll post the diff between this and the previous patch next.

(In reply to Mark Hammond [:markh] from comment #18)
> > +// These options filter out all records in the store that don't match.
> > +const STORE_CONTROL_OPTIONS = {
> 
> I think a better name and comment here would be nice - something like
> STORE_NON_DELETED_OPTIONS and maybe even a comment indicating that locally
> deleted items aren't deleted from the DB until after a sync.  (Hmm - but
> after looking at SQLLiteStore.jsm I see why it is named this way, so I'm not
> too bothered - but I think some clarifying comments would help - I *think*
> it's just the CONTROL part that seems strange - STORE_OPTIONS might be fine.
> Whatever.)

Changed to STORE_OPTIONS_IGNORE_DELETED and expanded the comment.

> @@ +726,5 @@
> >      let record = this._record;
> > +
> > +    // Update the syncStatus flag.
> > +    let hasMaterialChange = Object.keys(partialRecord).some(prop => {
> > +      return RECORD_PROPERTIES_CHANGED_MATERIAL.includes(prop);
> 
> Bug 1141505 was due to .includes() not existing on Aurora, so it's not clear
> this is safe.

D'oh, yes, changed to indexOf.  There were three uses of includes() in the file, so I changed them all.

> ::: browser/components/readinglist/SQLiteStore.jsm
> @@ +210,5 @@
> >          storedOn INTEGER,
> >          markedReadBy TEXT,
> >          markedReadOn INTEGER,
> > +        readPosition INTEGER,
> > +        syncStatus INTEGER
> 
> can we get away with a trailing comma here?  (I wouldn't be surprised to
> hear the sql engine throws though...)  I think this is likely to conflict
> with Jaw's recent patch too (he added the 'preview' column)

As we discussed that's a syntax error unfortunately.  What we could probably do (untested) is:

CREATE TABLE foo (
  bar TEXT
, baz TEXT
, quux TEXT

> @@ +434,5 @@
> > +}
> > +
> > +/**
> > + * Returns a SQL expression in disjunctive normal form generated from the given
> > + * options list.
> 
> might be worth a quick summary of what "disjunctive normal form" is and why
> it matters ;)

Clarified the comment.  (I looked up "disjunctive normal form" to make sure the definition applied here and actually it doesn't quite, so I removed the term.)

> ::: browser/components/readinglist/Sync.jsm
> @@ +106,5 @@
> > +    yield this._uploadChanges(SYNC_STATUS_CHANGED_STATUS,
> > +                              RECORD_PROPERTIES_CHANGED_STATUS,
> > +                              Task.async(function* (response) {
> > +      if (response.status != 200) {
> > +        log.warn("Unexpected response: ", response);
> 
> A simple comment along the lines of "by design, status-only changes will
> never conflict" or similar might be worthwhile.

Added that sentence to the _uploadStatusChanges method comment.

> I wonder if we can start thinking about auth errors too - eg, maybe a method
> like this._handleUnexpectedResponse(response) that for now does nothing but
> log, but later can do whatever it is we need to do on a 401.

Yeah, see next comment:

> @@ +139,5 @@
> > +      defaults: {
> > +        method: "PATCH",
> > +      },
> > +      requests: requests,
> > +    });
> 
> we probably need to check batchResponse.status here (if for no better reason
> than auth errors, but presumably for 50x etc too, which should at least be
> logged). Similarly in the other phases below.

Yes, good point!  I added _handleUnexpectedResponse and called it after all _sendRequest calls.

> @@ +145,5 @@
> > +      if (response.status == 404) {
> > +        // item deleted
> > +        yield this._deleteItemForGUID(response.body.id);
> > +        continue;
> > +      }
> 
> other response.status values here?  (But this would *not* call the
> unexpectedResponse method above, as this is a "sub" status.

Why wouldn't we call unexpectedResponse for the responses within batched responses?  (That's what this patch does.)  All it does right now is log a warning, but I agree we should be careful when we modify it so that it still does what we want in the batched case.

I moved the 412 handling to the _uploadChanges helper.  I also added a 409 check.  After that, _uploadStatusChanges and _uploadMaterialChanges do nearly the same things, so I removed _uploadChanges's callback param.

(In reply to Richard Newman [:rnewman] from comment #39)
> `status` was removed from the server format a while ago, replaced by
> individual `deleted` and `archived` flags (which are mutually exclusive).
> 
> Deletions are communicated to the server via DELETE requests to the
> appropriate record URL. Outgoing records should never include `deleted`.

Thanks, Richard.  I added an _uploadDeletedItems as phase 1 "part 3".

Since list.forEachItem filters out synced deleted items by design, I added list.forEachSyncedDeletedItem.

(In reply to Mark Hammond [:markh] from comment #41)
> ::: browser/components/readinglist/ReadingList.jsm
> @@ +1006,5 @@
> >  
> >  Object.defineProperty(this, "ReadingList", {
> >    get() {
> >      if (!this._singleton) {
> > +      let store = new SQLiteStore("reading-list-temp4.sqlite");
> 
> as mentioned, I think we should take this opportunity to rename this to the
> "real" name once rnewman's comments have been addressed and if we are
> relatively confident no schema changes will be needed for the next week or
> so :)

OK, changed the name to reading-list.sqlite.

> ::: browser/components/readinglist/Sync.jsm
> @@ +20,5 @@
> >    "resource:///modules/readinglist/ReadingList.jsm");
> >  XPCOMUtils.defineLazyModuleGetter(this, "ServerClient",
> >    "resource:///modules/readinglist/ServerClient.jsm");
> >  
> > +Object.defineProperty(this, "log", {
> 
> I don't understand these gymnastics.

Was lazy-loading the log, but that's probably dumb, so I replaced it with a simple assignment.

> @@ +414,4 @@
> >    get _serverLastModifiedHeader() {
> >      if (!("__serverLastModifiedHeader" in this)) {
> >        this.__serverLastModifiedHeader =
> >          Preferences.get("browser.readinglist.sync.serverLastModified",
> 
> I should have picked this up before, but I think this pref name should drop
> the "browser." prefix.

Changed, and moved the string to a const.

> @@ +474,5 @@
> >      return this._singleton;
> >    },
> >  });
> > +
> > +// This is exposed for the test.
> 
> Do we really need this?  I think the backstage-pass exposed for tests should
> make everything public (eg, see createTestableScheduler in Scheduler.jsm and
> test_scheduler which manages to use it.

Fixed.

> ::: browser/components/readinglist/test/xpcshell/test_Sync.js
> @@ +2,5 @@
> > + * http://creativecommons.org/publicdomain/zero/1.0/ */
> > +
> > +"use strict";
> > +
> > +let gDBFile = do_get_profile();
> 
> maybe add a comment saying this will end up as gDBFile later in the test (or
> just rename it to gProfileDir, then clone it to create gDBFile?)

Fixed.
Attachment #8582115 - Attachment is obsolete: true
Attachment #8582115 - Flags: review?(mhammond)
Attachment #8582659 - Flags: review?(mhammond)
Attachment #8582116 - Attachment is obsolete: true
Attached patch for-review sync patch v3 (obsolete) — Splinter Review
Changes we talked about plus a few others:

The `!("addedOn" in record)` etc. checks in list.addItem are so that the sync module can set those properties with whatever the truth is on the server.

Replaced the `status` property with archived and deleted in Sync.jsm -- missed that in the previous patch.

Changed some log.info's to log.debug's.

Improved test_Sync.js a little.
Attachment #8582659 - Attachment is obsolete: true
Attachment #8582659 - Flags: review?(mhammond)
Attachment #8582817 - Flags: review?(mhammond)
Attachment #8582660 - Attachment is obsolete: true
Comment on attachment 8582817 [details] [diff] [review]
for-review sync patch v3

Review of attachment 8582817 [details] [diff] [review]:
-----------------------------------------------------------------

awesome, let's ship this sucker!

::: browser/components/readinglist/ReadingList.jsm
@@ +313,2 @@
>        record.addedBy = Services.prefs.getCharPref("services.sync.client.name");
>      }

I think we should add an else here and set record.addedBy = "Firefox" or something as a short-term work-around.  The issue is that the server rejects the item without an addedBy, and the very first sync in a new profile is unlikely to have that pref yet, so the very first RL sync is likely to fail.  I'll open a new bug to get a better value.

@@ +367,5 @@
> +    }
> +    else {
> +      // To prevent data leakage, only keep the record fields needed to sync
> +      // the deleted status: guid and syncStatus.
> +      item._record = {

If I'm reading SQLiteStore.jsm correctly this will not nuke the other fields.  I agree we probably should, but it's OK to do in a followup - so unless I'm mistaken we should update the comment.

(Also, aren't you going to have issues nuking URL - that has a non-null constraint and resolvedURL has a unique constraint (but maybe null is ok in that case)?
Attachment #8582817 - Flags: review?(mhammond) → review+
Attached patch diff between from v3 (obsolete) — Splinter Review
You're totally right, please have a look.
Attachment #8582865 - Flags: review?(mhammond)
Attachment #8582865 - Attachment is patch: true
Attachment #8582865 - Attachment is obsolete: true
Attachment #8582865 - Flags: review?(mhammond)
Attachment #8582877 - Flags: review?(mhammond)
r+'ed by Mark over IRC.

https://hg.mozilla.org/integration/fx-team/rev/fbeeabe755e7
Attachment #8582817 - Attachment is obsolete: true
Attachment #8582818 - Attachment is obsolete: true
Attachment #8582877 - Attachment is obsolete: true
Attachment #8582877 - Flags: review?(mhammond)
Attachment #8582891 - Flags: review+
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/fbeeabe755e7
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 39
Blocks: 1147713
Depends on: 1148060
Depends on: 1148217
Depends on: 1148252
Depends on: 1148273
Depends on: 1149302
Depends on: 1149336
Depends on: 1149337
Depends on: 1151077
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: